[!TIP] Objective: Evolve from single-step prompts to a “Plan-Simulate-Execute” closed-loop, introducing YAML-based DAG asynchronous orchestration.

1. From “Chatting” to “Execution”: The AX (Architect-Executive) Paradigm

The watershed between a simple Chatbot and an Agent is the ability to decompose vague goals into actionable steps. In VISAGENT, we implemented the AX Planner logic:

  1. Architect: Receives requirements and outputs a TODO.json. Execution is forbidden; only planning is allowed.
  2. Simulation: Before execution, a “Security Expert” role performs a risk assessment on the plan.
  3. Executive: Executes each step via execute_step.

This “think before you act” mechanism can be implemented on top of any LLM CLI with simple prompt constraints:

# Core AX_PLANNING_MODE Instruction
plan_prompt = (
    "You are a specialized Task Architect. Output ONLY valid JSON.\n"
    'Schema: {"goal": "...", "steps": [{"id": 1, "desc": "...", "type": "EXECUTION"}]}'
)
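Since the Architect is forbidden from executing anything, the Executive needs to validate the plan before acting on it. A minimal sketch of that boundary, assuming a `parse_plan` helper of our own (not part of VISAGENT's actual API), which rejects malformed output so the caller can re-prompt instead of executing garbage:

```python
import json

def parse_plan(raw: str) -> dict:
    """Parse and validate the Architect's JSON plan (hypothetical helper)."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Architect did not return valid JSON: {exc}")
    if "goal" not in plan or not isinstance(plan.get("steps"), list):
        raise ValueError("Plan is missing 'goal' or 'steps'")
    for step in plan["steps"]:
        # Every step must carry the fields the Executive dispatches on
        if not {"id", "desc", "type"} <= step.keys():
            raise ValueError(f"Malformed step: {step}")
    return plan

# A well-formed Architect response passes validation
raw = '{"goal": "ship it", "steps": [{"id": 1, "desc": "run tests", "type": "EXECUTION"}]}'
plan = parse_plan(raw)
```

Raising `ValueError` here gives the loop a natural retry point: on failure, feed the error message back to the Architect and ask for a corrected plan.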

2. The Failure of Linearity: Why We Need a DAG

Early AX Planners were linear. Real-world engineering tasks, however, rarely are: their steps form a dependency graph in which some steps can run in parallel while others must wait for upstream results. This led to the creation of the Flow Architect.

3. Orchestration Core: YAML-based DAG

We designed a minimalist YAML orchestration protocol. Instead of heavy frameworks like LangGraph, we drive it directly with Python:

id: "deploy_service_v1"
steps:
  - id: "build_docker"
    role: "builder"
    desc: "Build Image"
    
  - id: "health_check"
    depends_on: ["build_docker"] # Declare dependency, constructing the DAG
    skill: "web_fetcher"
    desc: "Check Deployment Status"
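The `depends_on` semantics above can be driven by a tiny ready-set scheduler. The sketch below mirrors the YAML as a Python dict (in practice you would load it with `yaml.safe_load`); `run_step` is a hypothetical stand-in for the real role/skill dispatcher:

```python
# Flow definition, mirroring the YAML above
flow = {
    "id": "deploy_service_v1",
    "steps": [
        {"id": "build_docker", "desc": "Build Image"},
        {"id": "health_check", "depends_on": ["build_docker"],
         "desc": "Check Deployment Status"},
    ],
}

def run_flow(flow, run_step):
    """Execute DAG steps in dependency order; returns the execution order."""
    done, order = set(), []
    pending = {s["id"]: s for s in flow["steps"]}
    while pending:
        # A step is ready when every declared dependency has completed
        ready = [s for s in pending.values()
                 if set(s.get("depends_on", [])) <= done]
        if not ready:
            raise RuntimeError("Cycle or unsatisfiable dependency in DAG")
        for step in ready:  # everything in `ready` could run in parallel
            run_step(step)
            done.add(step["id"])
            order.append(step["id"])
            del pending[step["id"]]
    return order

order = run_flow(flow, run_step=lambda s: None)
```

Swapping the inner `for` loop for `asyncio.gather` over the ready set is what turns this into the asynchronous orchestration the protocol was designed for.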

Implementation Details:

  • State Persistence: Each Flow has a corresponding .state.json file. If a run is interrupted, the system resumes from the last completed node.
  • L3 Context Propagation: Output from preceding nodes is automatically appended to l3_context, serving as the reasoning basis for subsequent nodes.
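Both details boil down to a small save/load round trip. The sketch below assumes a flat state layout (`completed` ids plus the accumulated `l3_context`); the actual VISAGENT file format may differ:

```python
import json
import os
import tempfile

def save_state(path, completed, l3_context):
    # Persist completed node ids and accumulated context after each step
    with open(path, "w") as f:
        json.dump({"completed": sorted(completed), "l3_context": l3_context}, f)

def load_state(path):
    # Returns (completed, l3_context); empty if no prior run exists
    if not os.path.exists(path):
        return set(), []
    with open(path) as f:
        data = json.load(f)
    return set(data["completed"]), data["l3_context"]

# Round trip: simulate a run interrupted after build_docker, then resumed
path = os.path.join(tempfile.mkdtemp(), "deploy_service_v1.state.json")
save_state(path, {"build_docker"}, ["builder: image built ok"])
completed, l3_context = load_state(path)
```

On resume, the scheduler seeds its `done` set from `completed` and skips those nodes, while `l3_context` is prepended to the prompt of each subsequent node.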

4. Shadow Reflection

To enhance the stability of our “hand-rolled” system, we added a Shadow Reflection mechanism before each step:

reflection_prompt = f"Executing: '{desc}'. Assess risks. Reply 'SAFE' or 'WARNING'."
reflection_res = self.engine.invoke(reflection_prompt)
if "WARNING" in reflection_res:
    # Auto-cutoff: abort the step, or escalate for human intervention
    raise RuntimeError(f"Shadow Reflection blocked step: {desc}")

This “self-questioning” logic gives the Agent a much-needed sense of caution during write operations (like file edits or deployments).

Conclusion

The transition from linear to graph-based (DAG) orchestration is essential for an Agent to move from an experimental toy to an engineering tool. Through simple YAML definitions and state persistence, we achieved reliable long-cycle task scheduling without heavy external dependencies.

Next part: How to make your Agent “see” and understand multi-modal inputs, and the design of dynamic Skill Trees.