> [!TIP]
> Objective: Build an agent engine with state management and isolation by wrapping a CLI (e.g., the gemini CLI), without relying on a framework.
1. Why CLI Over Frameworks?
VISAGENT’s philosophy is CLI-Native. Directly wrapping a CLI (like gemini CLI) offers superior control and transparency:
- Atomicity: Each call is a single inference step.
- State Transparency: Session resume/suspension is strictly controlled by paths and the `--resume` flag.
- Isolation: Permission sandboxing is achieved through native OS tools.
2. Core Wrapper: RoleEngineBase
Execution is triggered via `subprocess`. To avoid exceeding the shell's argument-length limit with long prompts, we transmit them via stdin:
```python
from typing import List, Optional
import subprocess

def _do_raw_invoke(self, message: str, context: str = "",
                   files: Optional[List[str]] = None) -> dict:
    # Resume the latest session once at least one turn has been taken
    cmd = [AI_BIN, "--resume", "latest"] if self._get_turns() > 0 else [AI_BIN]
    prompt = f"{message}\n\n{context}" if context else message
    # Use stdin for long prompts to avoid the shell's argument-length limit;
    # short prompts are passed directly as the -p argument
    use_stdin = len(prompt) > 2000
    cmd += ["-p", "-"] if use_stdin else ["-p", prompt]
    result = subprocess.run(
        cmd,
        input=prompt if use_stdin else None,
        capture_output=True, text=True, encoding="utf-8",
    )
    if result.returncode == 0:
        self._inc_turns()
        return {"ok": True, "output": result.stdout}
    return {"ok": False, "error": result.stderr}
```
3. Isolation & Session Management
Credential Sandboxing (Symlink)
To allow multiple roles and projects to run concurrently without credential leakage, we use symlinks to dynamically inject the shared `.gemini` configs into each role's isolated directory:
```python
import os

def _link_auth_session(self):
    global_gemini = os.path.expanduser("~/.gemini")
    role_gemini_base = os.path.join(self.local_home, ".gemini")
    os.makedirs(role_gemini_base, exist_ok=True)
    # Link shared credentials into the isolated role execution directory
    for f in ["oauth_creds.json", "settings.json"]:
        link = os.path.join(role_gemini_base, f)
        if not os.path.lexists(link):
            os.symlink(os.path.join(global_gemini, f), link)
```
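For the CLI to resolve `~/.gemini` against the role directory rather than the real home, one common approach (our assumption here, not shown in the original) is to override `HOME` in the child process environment:

```python
import os
import subprocess

def run_isolated(cmd: list, local_home: str) -> subprocess.CompletedProcess:
    # The child resolves "~" against the overridden HOME, so it reads the
    # role's symlinked .gemini instead of the user's global config.
    env = dict(os.environ, HOME=local_home)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```

Because only symlinks live in the role directory, deleting a role never touches the real credentials.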
Handoff (Auto-Distillation)
While the CLI supports session resume, an overly long context degrades performance. We therefore set `handoff_threshold = 20`: once the turn count exceeds this threshold, the system automatically distills the session into a summary and starts a fresh one seeded with it.
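The handoff rule can be sketched as follows; `distill_session` and `reset_session` are hypothetical helpers standing in for whatever summarization and session bootstrap the engine actually performs:

```python
HANDOFF_THRESHOLD = 20  # turn budget before forcing a handoff

def maybe_handoff(engine) -> bool:
    """Distill and restart the session once the turn budget is exhausted."""
    if engine._get_turns() < HANDOFF_THRESHOLD:
        return False
    summary = engine.distill_session()          # assumed: summarize history
    engine.reset_session(seed_context=summary)  # assumed: fresh session
    return True
```

Checking the budget on every turn keeps the resumed context bounded without the caller ever having to think about it.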
4. Cleaning: Agentic Noise Filter
To ensure terse, substance-only output that automated scripts can reliably parse, we strip AI preamble chatter (e.g., “Sure, I will…”) with anchored prefix regexes:
```python
import re

def _clean_agentic_noise(self, text: str) -> str:
    patterns = [
        r"^(I will|I am going to|Let me|First, I'll)\s+.*?\n+",
        r"^(Searching for|Checking|Listing|Reading)\s+.*?\n+",
    ]
    for p in patterns:
        # MULTILINE lets ^ match the start of every line, not just the first
        text = re.sub(p, "", text, flags=re.IGNORECASE | re.MULTILINE)
    return text.strip()
```
5. Soul: Layered Context Injection (L0-L3)
The final Prompt sent to the CLI is layered:
- L0 (Identity): Persona and architectural lore defined in GEMINI.md.
- L1 (Skills): Dynamic API lists from SkillHandler.
- L2 (Substance): Current code/environment metadata (extracted via AST or Git Diff).
- L3 (Recent Context): Distilled recent conversational history.
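The layering above can be sketched as simple concatenation; the `build_prompt` helper and the `##` header format are illustrative assumptions, not the actual VISAGENT code:

```python
def build_prompt(identity: str, skills: str, substance: str, recent: str) -> str:
    """Assemble the layered L0-L3 prompt; empty layers are skipped."""
    layers = [
        ("L0 Identity", identity),
        ("L1 Skills", skills),
        ("L2 Substance", substance),
        ("L3 Recent Context", recent),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in layers if body)
```

Ordering matters: identity first anchors the persona before any task detail, and the distilled recent context comes last so it sits closest to the new user message.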
Conclusion
This is the beginning of the “Hand-rolled Claw”: wrapping a simple CLI into an engineering-grade RoleEngine. In the next part, we will discuss defining complex asynchronous task flows with DAGs.