> [!TIP]
> Objective: Build an Agent engine with state management and isolation by wrapping a CLI (e.g., gemini CLI) without framework reliance.

1. Why CLI Over Frameworks?

VISAGENT’s philosophy is CLI-Native. Directly wrapping a CLI (like gemini CLI) offers superior control and transparency:

  • Atomicity: Each call is a single inference step.
  • State Transparency: Session resume/suspension is strictly controlled by paths and the Resume flag.
  • Isolation: Permission sandboxing is achieved through native OS tools.

2. Core Wrapper: RoleEngineBase

Execution is triggered via subprocess. To avoid exceeding shell argument-length limits with long prompts, we pass them via stdin:

import subprocess
from typing import List, Optional

def _do_raw_invoke(self, message: str, context: str = "",
                   files: Optional[List[str]] = None) -> dict:
    # Resume the latest session only if this role has already taken a turn
    cmd = [AI_BIN, "--resume", "latest"] if self._get_turns() > 0 else [AI_BIN]
    prompt = f"{message}\n\n{context}" if context else message

    # Use stdin for long prompts to avoid shell argument-length limits
    use_stdin = len(prompt) > 2000
    if use_stdin:
        cmd += ["-p", "-"]       # "-" tells the CLI to read the prompt from stdin
    else:
        cmd += ["-p", prompt]    # short prompts can travel on argv

    result = subprocess.run(
        cmd, input=prompt if use_stdin else None,
        capture_output=True, text=True, encoding='utf-8'
    )
    if result.returncode == 0:
        self._inc_turns()
        return {"ok": True, "output": result.stdout}
    return {"ok": False, "error": result.stderr}

3. Isolation & Session Management

To allow multi-role/multi-project concurrency without credential leakage, we use Symlinks to dynamically inject .gemini configs:

import os

def _link_auth_session(self):
    global_gemini = os.path.expanduser("~/.gemini")
    role_gemini_base = os.path.join(self.local_home, ".gemini")
    os.makedirs(role_gemini_base, exist_ok=True)
    # Link shared credentials into the isolated role execution directory
    for f in ["oauth_creds.json", "settings.json"]:
        target = os.path.join(role_gemini_base, f)
        if not os.path.lexists(target):  # idempotent: skip links that already exist
            os.symlink(os.path.join(global_gemini, f), target)

Handoff (Auto-Distillation)

While the CLI supports resume, excessive context degrades performance. We set handoff_threshold = 20. When the turn count exceeds this threshold, the system automatically distills the session into a summary and starts a fresh one.
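The handoff check itself is a one-line comparison; the surrounding flow can be sketched as below. The helper names (`invoke`, `reset_session`, `seed_context`) are illustrative assumptions, not APIs from the original engine:

```python
HANDOFF_THRESHOLD = 20  # turns before the session is distilled and restarted

def should_handoff(turns: int, threshold: int = HANDOFF_THRESHOLD) -> bool:
    """Return True once the session has grown past the handoff threshold."""
    return turns >= threshold

def handoff(engine):
    """Sketch of auto-distillation: summarize, reset, then seed the new session."""
    if should_handoff(engine._get_turns()):
        # Ask the model to compress its own history into a dense summary
        summary = engine.invoke("Summarize this session's key decisions and state.")
        engine.reset_session()       # hypothetical: zero the turn count, drop --resume
        engine.seed_context(summary) # hypothetical: inject the summary as L3 context
```

The design choice here is that distillation happens proactively at a fixed turn budget, rather than reactively when the CLI starts truncating context.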

4. Cleaning: Agentic Noise Filter

To ensure high-signal output that automated scripts can easily parse, we strip AI preamble (e.g., “Sure, I will…”) using anchored prefix regex patterns:

import re

def _clean_agentic_noise(self, text: str) -> str:
    patterns = [
        r"^(I will|I am going to|Let me|First, I'll)\s+.*?\n+",
        r"^(Searching for|Checking|Listing|Reading)\s+.*?\n+",
    ]
    for p in patterns:
        # MULTILINE lets ^ anchor at every line, not just the start of the string
        text = re.sub(p, "", text, flags=re.IGNORECASE | re.MULTILINE)
    return text.strip()
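As a concrete example, a response prefixed with a muttering is reduced to its payload. This standalone demo uses the same anchored patterns as the method above:

```python
import re

def clean(text: str) -> str:
    # Same anchored prefix patterns as _clean_agentic_noise
    patterns = [
        r"^(I will|I am going to|Let me|First, I'll)\s+.*?\n+",
        r"^(Searching for|Checking|Listing|Reading)\s+.*?\n+",
    ]
    for p in patterns:
        text = re.sub(p, "", text, flags=re.IGNORECASE).strip()
    return text

raw = "I will now list the files.\n\nsrc/engine.py\nsrc/skills.py"
print(clean(raw))  # → src/engine.py\nsrc/skills.py
```

Only the leading narration is removed; the file list that a downstream script actually cares about survives untouched.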

5. Soul: Layered Context Injection (L0-L3)

The final Prompt sent to the CLI is layered:

  • L0 (Identity): Persona and architectural Lore defined in GEMINI.md.
  • L1 (Skills): Dynamic API lists from SkillHandler.
  • L2 (Substance): Current code/env metadata (extracted via AST or Git Diff).
  • L3 (Recent Context): Distilled recent conversational history.
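The four layers above can be concatenated in a fixed order before each invocation. A minimal sketch, assuming each layer arrives as a pre-rendered string (the section-header format and function name are assumptions):

```python
def build_prompt(identity: str, skills: str, substance: str,
                 recent: str, task: str) -> str:
    """Assemble the L0-L3 layered context plus the current task into one prompt."""
    layers = [
        ("L0 IDENTITY", identity),    # persona / architectural lore from GEMINI.md
        ("L1 SKILLS", skills),        # dynamic API list from SkillHandler
        ("L2 SUBSTANCE", substance),  # code / env metadata (AST or git diff)
        ("L3 RECENT", recent),        # distilled recent conversation history
    ]
    # Empty layers are skipped so short sessions don't carry blank headers
    sections = [f"## {name}\n{body}" for name, body in layers if body]
    sections.append(f"## TASK\n{task}")
    return "\n\n".join(sections)
```

Fixing the layer order keeps the persona (L0) at the top of every prompt, where instruction-following models weight it most heavily, while the volatile layers (L2/L3) sit closest to the task.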

Conclusion

This is the beginning of the “Hand-rolled Claw”: wrapping a simple CLI into an engineering-grade RoleEngine. In the next part, we will discuss defining complex asynchronous task flows with DAGs.