How the Arena Works

  1. Recording

Users launch the Dojo and begin interacting with a target application. This could be a browser-based SaaS tool, a desktop program, or a game.

The platform captures:

  • Cursor movement and click sequences

  • Keyboard inputs and field interactions

  • UI layouts and screen transitions

  • Timing, pauses, corrections, and multi-step workflows
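
To make this concrete, here is a minimal sketch of how one recorded interaction event might be represented. The schema and field names are illustrative assumptions, not the platform's actual recording format:

```python
from dataclasses import dataclass, field

@dataclass
class InteractionEvent:
    timestamp_ms: int   # offset from session start
    event_type: str     # e.g. "click", "keypress", "scroll"
    target: str         # semantic identifier of the UI element involved
    payload: dict = field(default_factory=dict)  # typed text, cursor coords, etc.

# A short click-then-type workflow captured as an event stream:
session = [
    InteractionEvent(1200, "click", "login.username_field"),
    InteractionEvent(1850, "keypress", "login.username_field", {"text": "alice"}),
    InteractionEvent(2400, "click", "login.submit_button"),
]
```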

  2. Structuring

Once the demo is complete, the raw interaction data is parsed and mapped into a structured format that agents can interpret. This includes:

  • Action timelines (e.g., "click this", "type that")

  • Decision branches (e.g., conditional steps)

  • Interface object recognition (e.g., buttons, fields, menus)

This structure forms the basis of a Behavior Tree — a modular, composable logic graph that the agent can use to reason through the task.
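
A behavior tree of this kind can be sketched in a few lines. The node structure, sequence semantics, and driver stubs below are illustrative assumptions about how parsed demo steps might map onto leaf actions:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

def click(target: str) -> bool:
    print(f"click {target}")   # placeholder; a real agent dispatches to the app
    return True

def type_text(text: str) -> bool:
    print(f"type {text!r}")
    return True

@dataclass
class BTNode:
    name: str
    children: List["BTNode"] = field(default_factory=list)
    action: Optional[Callable[[], bool]] = None  # leaves carry an executable action

    def tick(self) -> bool:
        # Sequence semantics: succeed only if every child succeeds, in order.
        if self.action is not None:
            return self.action()
        return all(child.tick() for child in self.children)

# "Log in" expressed as a sequence of leaves parsed from a demo:
login = BTNode("login", children=[
    BTNode("focus_username", action=lambda: click("login.username_field")),
    BTNode("type_username", action=lambda: type_text("alice")),
    BTNode("submit", action=lambda: click("login.submit_button")),
])
print(login.tick())  # True once every step succeeds
```

Because each node is self-contained, subtrees learned from one demo can be recombined into larger workflows, which is what makes the graph modular and composable.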

  3. Generalization

The agent's training pipeline draws on multiple demonstrations from different users, applications, and contexts to generalize beyond one-off examples. Over time, this builds resilience to layout changes, dynamic content, and varied input patterns.

The more diverse the demos, the more robust the agent becomes.
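
A toy example of why diversity matters: if steps are keyed by semantic element labels rather than raw screen coordinates, demos recorded on different layouts collapse toward a single canonical sequence. The data and the majority-vote helper here are assumptions for illustration only:

```python
from collections import Counter

# (semantic_target, action) pairs extracted from three users' demos:
demos = [
    [("search_box", "type"), ("search_button", "click"), ("first_result", "click")],
    [("search_box", "type"), ("search_button", "click"), ("first_result", "click")],
    [("search_box", "type"), ("enter_key", "press"), ("first_result", "click")],
]

def canonical_sequence(demos):
    # At each step index, keep the (target, action) pair most demos agree on.
    length = min(len(d) for d in demos)
    return [Counter(d[i] for d in demos).most_common(1)[0][0] for i in range(length)]

print(canonical_sequence(demos))
# [('search_box', 'type'), ('search_button', 'click'), ('first_result', 'click')]
```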

  4. Validation

Community validators or automated test harnesses evaluate the trained agent's performance:

  • Can it complete the task without intervention?

  • How does it handle edge cases or interface changes?

  • Does it execute efficiently and accurately?

Validations contribute to agent rankings and help determine token rewards for contributors.
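
As an illustration, an automated harness might reduce those three questions to a numeric score. The rubric, weights, and field names below are hypothetical, not the platform's actual ranking formula:

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    completed: bool            # finished without human intervention
    steps_taken: int
    optimal_steps: int         # shortest known path through the task
    edge_cases_recovered: int  # interface changes hit and handled

def score_run(r: RunResult) -> float:
    # Completion gates the score; efficiency and robustness then refine it.
    if not r.completed:
        return 0.0
    efficiency = min(1.0, r.optimal_steps / r.steps_taken)
    return min(1.0, 0.7 + 0.25 * efficiency + 0.05 * r.edge_cases_recovered)

print(score_run(RunResult(True, 12, 10, 1)))  # ~0.96
```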

  5. Reward

Contributors receive $SOL based on:

  • Quality and clarity of the demonstration

  • Reusability and generalizability of the behavior

  • The size of the reward pool set by the Dojo creator

This reward loop creates a powerful incentive to share real, high-quality digital workflows.
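
As a schematic example, a fixed pool might be split in proportion to validated quality scores. The payout logic and scores below are illustrative assumptions; the actual distribution is defined per Dojo:

```python
def split_pool(pool_sol: float, scores: dict[str, float]) -> dict[str, float]:
    # Pro-rata split: each contributor's share tracks their validation score.
    total = sum(scores.values())
    return {who: pool_sol * s / total for who, s in scores.items()}

# Combined quality/generalizability scores from validation, per contributor:
scores = {"alice": 0.92, "bob": 0.61, "carol": 0.78}
print(split_pool(100.0, scores))  # alice ~39.8, bob ~26.4, carol ~33.8 SOL
```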
