How the Arena Works
Users launch the Dojo and begin interacting with a target application. This could be a browser-based SaaS tool, a desktop program, or a game.
The platform captures (see the sketch after this list):
- Cursor movement and click sequences
- Keyboard inputs and field interactions
- UI layouts and screen transitions
- Timing, pauses, corrections, and multi-step workflows
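To make the captured data concrete, here is a hypothetical TypeScript sketch of what one recorded demonstration might look like. The type names and fields (`CapturedEvent`, `DemoRecording`) are illustrative assumptions, not the platform's actual schema.

```typescript
// Illustrative shapes for captured interaction data; these names and
// fields are assumptions, not the platform's actual schema.
type CapturedEvent =
  | { kind: "pointer"; x: number; y: number; action: "move" | "click" | "drag"; ts: number }
  | { kind: "key"; key: string; field: string; ts: number }                  // keyboard input into a field
  | { kind: "screen"; layout: string; transitionFrom?: string; ts: number }; // UI layout / screen transition

interface DemoRecording {
  app: string;             // target application (SaaS tool, desktop program, game)
  events: CapturedEvent[]; // ordered stream; timestamps preserve pauses and corrections
}
```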
Once the demo is complete, the raw interaction data is parsed and mapped into a structured format that agents can interpret. This includes (illustrated in the sketch after this list):
- Action timelines (e.g., "click this", "type that")
- Decision branches (e.g., conditional steps)
- Interface object recognition (e.g., buttons, fields, menus)
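Continuing the sketch above, the parsing step might reduce a raw event stream to an action timeline roughly like this. The `Action` shape and the `resolveElementAt` helper are hypothetical stand-ins for the platform's parsing and interface-object-recognition logic.

```typescript
// Continuing the sketch above: reduce a raw event stream to an action timeline.
type Action =
  | { op: "click"; target: string }                                            // "click this"
  | { op: "type"; target: string; text: string }                               // "type that"
  | { op: "branch"; condition: string; then: Action[]; otherwise: Action[] };  // conditional step

// Stand-in for interface object recognition: a real implementation would
// hit-test (x, y) against the recognized buttons, fields, and menus.
function resolveElementAt(x: number, y: number): string {
  return `element@${x},${y}`;
}

function toTimeline(demo: DemoRecording): Action[] {
  const timeline: Action[] = [];
  for (const ev of demo.events) {
    if (ev.kind === "pointer" && ev.action === "click") {
      timeline.push({ op: "click", target: resolveElementAt(ev.x, ev.y) });
    } else if (ev.kind === "key") {
      // A real parser would coalesce consecutive keystrokes into one "type" action.
      timeline.push({ op: "type", target: ev.field, text: ev.key });
    }
  }
  return timeline;
}
```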
This structure forms the basis of a Behavior Tree — a modular, composable logic graph that the agent can use to reason through the task.
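For intuition, a minimal behavior-tree interpreter looks like the following. The `sequence` and `fallback` semantics shown are the standard textbook constructs; the platform's internal representation may differ.

```typescript
// A minimal behavior tree with standard sequence/fallback semantics.
type Status = "success" | "failure";

type BTNode =
  | { type: "action"; run: () => Status }     // leaf: perform one UI step
  | { type: "sequence"; children: BTNode[] }  // succeeds only if every child succeeds, in order
  | { type: "fallback"; children: BTNode[] }; // tries children until one succeeds

function tick(node: BTNode): Status {
  switch (node.type) {
    case "action":
      return node.run();
    case "sequence":
      for (const child of node.children) {
        if (tick(child) === "failure") return "failure";
      }
      return "success";
    case "fallback":
      for (const child of node.children) {
        if (tick(child) === "success") return "success";
      }
      return "failure";
  }
}
```

A sequence models a linear multi-step workflow; a fallback models a decision branch with alternative routes to the same outcome.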
The agent’s training model uses multiple demonstrations — from different users, applications, or contexts — to generalize beyond one-off examples. Over time, this builds resilience to layout changes, dynamic content, or slightly different input patterns.
The more diverse the demos, the more robust the agent becomes.
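Continuing the behavior-tree sketch, here is one concrete way that generalization could look: if several demonstrations reached the same logical control through different layouts, the observed variants can be composed under a fallback node, so a redesign that breaks one selector does not break the agent. The selector strings and the `clickBySelector` helper are purely illustrative.

```typescript
// Stand-in for a selector-based click; it "fails" here so the fallback
// moves on to the next variant. Selector strings are illustrative.
function clickBySelector(selector: string): Status {
  console.log(`trying ${selector}`);
  return "failure";
}

// Three demos clicked the same logical "Submit" control in different layouts.
const submitVariants: BTNode = {
  type: "fallback",
  children: [
    { type: "action", run: () => clickBySelector("button#submit") },     // demo 1 (original layout)
    { type: "action", run: () => clickBySelector("button[name=send]") }, // demo 2 (redesigned UI)
    { type: "action", run: () => clickBySelector("text=Submit") },       // demo 3 (text match)
  ],
};

tick(submitVariants); // tries each variant in turn
```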
Community validators or automated test harnesses evaluate the trained agent's performance (a scoring sketch follows this list):
- Can it complete the task without intervention?
- How does it handle edge cases or interface changes?
- Does it execute efficiently and accurately?
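A validation harness might score runs along those three axes roughly as follows. The metric names and weights are assumptions for illustration, not the platform's actual rubric.

```typescript
// Sketch of an automated validation score; metrics and weights are
// assumptions for illustration, not the platform's actual rubric.
interface ValidationResult {
  completed: boolean;      // finished without human intervention?
  edgeCasesPassed: number; // perturbed layouts / dynamic content survived
  edgeCasesTotal: number;
  steps: number;           // actions taken by the agent
}

function score(r: ValidationResult, baselineSteps: number): number {
  if (!r.completed) return 0; // unattended completion is the gate
  const robustness = r.edgeCasesTotal === 0 ? 1 : r.edgeCasesPassed / r.edgeCasesTotal;
  const efficiency = Math.min(1, baselineSteps / r.steps); // 1 = matched the demo baseline
  return 0.5 + 0.3 * robustness + 0.2 * efficiency;
}
```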
Validations contribute to agent rankings and help determine token rewards for contributors.
Contributors receive $SOL based on the following criteria (an illustrative payout sketch appears after the list):
- Quality and clarity of the demonstration
- Reusability and generalizability of the behavior
- The size of the reward pool set by the Dojo creator
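Purely as an illustration of how such a split could work (not the actual tokenomics), a payout function might weight each contribution's quality and generalizability scores against the Dojo's pool:

```typescript
// Purely illustrative payout split; not the platform's actual tokenomics.
interface Contribution {
  contributor: string;
  demoQuality: number;      // quality/clarity of the demonstration, 0..1
  generalizability: number; // reusability across apps and layouts, 0..1
}

// Split the Dojo's reward pool (in SOL) pro rata by weighted score.
function payouts(pool: number, contribs: Contribution[]): Map<string, number> {
  const weight = (c: Contribution) => 0.6 * c.demoQuality + 0.4 * c.generalizability;
  const total = contribs.reduce((sum, c) => sum + weight(c), 0);
  return new Map(
    contribs.map((c): [string, number] => [c.contributor, total > 0 ? (pool * weight(c)) / total : 0]),
  );
}
```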
This reward loop creates a powerful incentive to share real, high-quality digital workflows.