How the Arena Works

  1. Recording

Users launch the Dojo and begin interacting with a target application. This could be a browser-based SaaS tool, a desktop program, or a game.

The platform captures:

  • Cursor movement and click sequences

  • Keyboard inputs and field interactions

  • UI layouts and screen transitions

  • Timing, pauses, corrections, and multi-step workflows
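
To make this concrete, here is a minimal sketch of how one recorded interaction event might be represented. The schema and field names are illustrative assumptions, not the platform's actual recording format:

```python
from dataclasses import dataclass, field

@dataclass
class InteractionEvent:
    timestamp_ms: int   # offset from session start
    event_type: str     # e.g. "click", "keypress", "scroll"
    target: str         # semantic identifier of the UI element involved
    payload: dict = field(default_factory=dict)  # typed text, cursor coords, etc.

# A short click-then-type workflow captured as an event stream:
session = [
    InteractionEvent(1200, "click", "login.username_field"),
    InteractionEvent(1850, "keypress", "login.username_field", {"text": "alice"}),
    InteractionEvent(2400, "click", "login.submit_button"),
]
```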

  2. Structuring

Once the demo is complete, the raw interaction data is parsed and mapped into a structured format that agents can interpret. This includes:

  • Action timelines (e.g., "click this", "type that")

  • Decision branches (e.g., conditional steps)

  • Interface object recognition (e.g., buttons, fields, menus)

This structure forms the basis of a Behavior Tree — a modular, composable logic graph that the agent can use to reason through the task.
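
A behavior tree of this kind can be sketched in a few lines. The node structure, sequence semantics, and driver stubs below are illustrative assumptions about how parsed demo steps might map onto leaf actions:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

def click(target: str) -> bool:
    print(f"click {target}")   # placeholder; a real agent dispatches to the app
    return True

def type_text(text: str) -> bool:
    print(f"type {text!r}")
    return True

@dataclass
class BTNode:
    name: str
    children: List["BTNode"] = field(default_factory=list)
    action: Optional[Callable[[], bool]] = None  # leaves carry an executable action

    def tick(self) -> bool:
        # Sequence semantics: succeed only if every child succeeds, in order.
        if self.action is not None:
            return self.action()
        return all(child.tick() for child in self.children)

# "Log in" expressed as a sequence of leaves parsed from a demo:
login = BTNode("login", children=[
    BTNode("focus_username", action=lambda: click("login.username_field")),
    BTNode("type_username", action=lambda: type_text("alice")),
    BTNode("submit", action=lambda: click("login.submit_button")),
])
print(login.tick())  # True once every step succeeds
```

Because each node is self-contained, subtrees learned from one demo can be recombined into larger workflows, which is what makes the graph modular and composable.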

  3. Generalization

The agent's training pipeline draws on multiple demonstrations from different users, applications, and contexts to generalize beyond one-off examples. Over time, this builds resilience to layout changes, dynamic content, and varied input patterns.

The more diverse the demos, the more robust the agent becomes.
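
A toy example of why diversity matters: if steps are keyed by semantic element labels rather than raw screen coordinates, demos recorded on different layouts collapse toward a single canonical sequence. The data and the majority-vote helper here are assumptions for illustration only:

```python
from collections import Counter

# (semantic_target, action) pairs extracted from three users' demos:
demos = [
    [("search_box", "type"), ("search_button", "click"), ("first_result", "click")],
    [("search_box", "type"), ("search_button", "click"), ("first_result", "click")],
    [("search_box", "type"), ("enter_key", "press"), ("first_result", "click")],
]

def canonical_sequence(demos):
    # At each step index, keep the (target, action) pair most demos agree on.
    length = min(len(d) for d in demos)
    return [Counter(d[i] for d in demos).most_common(1)[0][0] for i in range(length)]

print(canonical_sequence(demos))
# [('search_box', 'type'), ('search_button', 'click'), ('first_result', 'click')]
```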

  4. Validation

Community validators or automated test harnesses evaluate the trained agent's performance:

  • Can it complete the task without intervention?

  • How does it handle edge cases or interface changes?

  • Does it execute efficiently and accurately?

Validations contribute to agent rankings and help determine token rewards for contributors.
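
As an illustration, an automated harness might reduce those three questions to a numeric score. The rubric, weights, and field names below are hypothetical, not the platform's actual ranking formula:

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    completed: bool            # finished without human intervention
    steps_taken: int
    optimal_steps: int         # shortest known path through the task
    edge_cases_recovered: int  # interface changes hit and handled

def score_run(r: RunResult) -> float:
    # Completion gates the score; efficiency and robustness then refine it.
    if not r.completed:
        return 0.0
    efficiency = min(1.0, r.optimal_steps / r.steps_taken)
    return min(1.0, 0.7 + 0.25 * efficiency + 0.05 * r.edge_cases_recovered)

print(score_run(RunResult(True, 12, 10, 1)))  # ~0.96
```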

  5. Reward

Contributors receive $SOL based on:

  • Quality and clarity of the demonstration

  • Reusability and generalizability of the behavior

  • The size of the reward pool set by the Dojo creator

This reward loop creates a powerful incentive to share real, high-quality digital workflows.
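
As a schematic example, a fixed pool might be split in proportion to validated quality scores. The payout logic and scores below are illustrative assumptions; the actual distribution is defined per Dojo:

```python
def split_pool(pool_sol: float, scores: dict[str, float]) -> dict[str, float]:
    # Pro-rata split: each contributor's share tracks their validation score.
    total = sum(scores.values())
    return {who: pool_sol * s / total for who, s in scores.items()}

# Combined quality/generalizability scores from validation, per contributor:
scores = {"alice": 0.92, "bob": 0.61, "carol": 0.78}
print(split_pool(100.0, scores))  # alice ~39.8, bob ~26.4, carol ~33.8 SOL
```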
