Posts

Sovereign Downloader: Navigating the n-sig Fog of YouTube

[!TIP] Target Audience: Developers interested in self-hosted tools and anyone who regularly battles web restrictions.
Core Objective: Build a fully self-controlled YouTube media collection system and clear the n-challenge signature (n-sig) hurdle.
Problem-Solution Mapping:
Login Wall/Cookies $\rightarrow$ cookie sync via a Manifest V3 browser extension
Dynamic Signatures (n-sig) $\rightarrow$ external JS solver (Node.js) integrated into the Docker image
Real-time Observation $\rightarrow$ stdout streamed over Socket.io

Lately, I’ve been tinkering with my “Digital Sovereignty” infrastructure, and YouTube downloading is a core requirement. There are plenty of tools out there, but YouTube’s tightening enforcement of n-sig (a signature generated dynamically by obfuscated JS) has caused many open-source projects to fail outright when fetching high-quality audio tracks. ...
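The “Real-time Observation” item boils down to reading a child process’s stdout line by line and forwarding each line as it arrives. Below is a minimal sketch of that loop, assuming the downloader runs as a subprocess; `stream_stdout` and its `emit` callback are hypothetical names, and `emit` merely stands in for whatever Socket.io emit call the server uses.

```python
import subprocess
import sys
from typing import Callable, List


def stream_stdout(cmd: List[str], emit: Callable[[str], None]) -> int:
    """Run cmd and pass each stdout line to emit() as soon as it is printed.

    emit is a placeholder for a Socket.io emit (hypothetical); any callable
    that accepts one string works.
    """
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors are streamed too
        text=True,
        bufsize=1,  # line-buffered on our side (text mode)
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            emit(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set.
    return proc.returncode


if __name__ == "__main__":
    # Stand-in command; a real setup would launch the downloader here.
    demo = [sys.executable, "-u", "-c", "for i in range(3): print(f'progress {i}')"]
    print("exit code:", stream_stdout(demo, emit=print))
```

The child must flush its output for this to feel real-time: the demo uses `python -u`, and yt-dlp’s `--newline` option prints progress as separate lines instead of redrawing a single one.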

Hand-rolled Lobster (3): Connecting WeChat to the CLI-Native Kernel

Following our previous discussion of the RoleEngine Core, it’s time to bridge the gap between a raw CLI “brain” and a real-world communication platform: WeChat. In VisAgent, we don’t believe in heavy, bloated frameworks; instead, we use a CLI-Native Bridge pattern. This post explores how we connected “Clawbot” (our WeChat interface) to a hand-rolled Gemini CLI kernel.

The Request Flow: From Chat to CLI
The architecture is a chain of specialized tools, each doing one thing well. Here’s how a message travels from your phone to the AI: ...

Hand-rolling a 'Concentration Tool' for My Son: From Visual Warmup to State Feedback

[!NOTE] The Origin: To help my son enter a deep state of focus during study sessions, I “hand-rolled” a lightweight “Concentration Tool” in pure HTML. No heavy frameworks, just three raw, direct modules. In this post, I’ll share the design logic behind the tool. You can try it yourself by clicking Concentration Tool in the navigation bar.

1. Why “Hand-roll”?
There are countless focus apps out there, but they often: ...

Hand-rolled Lobster (4): Digital Metabolism & Architecture Autonomy — The Agent's Path to Self-Healing

[!TIP] Objective: Implement “zero-intrusion” synchronization of architecture documents and a closed loop for AI-driven system self-repair.

1. Zero-Intrusion Metabolism (Metabolic Governance)
In modern Agent development, keeping ARCHITECTURE.md and the routing table up to date is often a developer’s nightmare. Our solution is a metadata-free “metabolism” mechanism:
Perceive Changes: Mounted on a Git hook, it captures code changes via git diffs.
Semantic Inference: The AI analyzes the intent of the code and automatically updates the machine-readable .anti_bot_map.md (RAG routing table) and the human-readable ARCHITECTURE.md (Architectural Lore).
Strong Consistency: The cost of manual documentation maintenance disappears; docs are code, and code is the map.

2. The System’s Pulse: Heartbeat Engine
To keep the Agent stable in unattended environments, we configured an independent Heartbeat process. It’s more than resource monitoring; it’s a “self-awareness” probe: ...
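The “Perceive Changes” step can be sketched as a small hook script that lists the staged files and turns them into a prompt for the AI. This is a minimal sketch under stated assumptions: it assumes a pre-commit hook (hence `git diff --cached`), and the helper names and prompt wording are hypothetical; the AI call that actually rewrites the docs is not shown.

```python
import subprocess
from typing import List, Tuple

# The generated docs named in the post; skipped so the hook ignores its own output.
DOC_FILES = {".anti_bot_map.md", "ARCHITECTURE.md"}


def parse_name_status(output: str) -> List[Tuple[str, str]]:
    """Parse `git diff --name-status` output into (status, path) pairs."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        # Renames look like "R100<TAB>old<TAB>new"; keep the new path.
        changes.append((parts[0], parts[-1]))
    return changes


def code_changes(changes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop the generated docs themselves from the change list."""
    return [(s, p) for s, p in changes if p not in DOC_FILES]


def staged_changes() -> List[Tuple[str, str]]:
    """Ask git for the staged files (assumes this runs as a pre-commit hook)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-status"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_name_status(out)


def build_prompt(changes: List[Tuple[str, str]]) -> str:
    """Hypothetical prompt handed to the AI step for semantic inference."""
    listing = "\n".join(f"{status}\t{path}" for status, path in changes)
    return (
        "The following files changed in this commit:\n"
        f"{listing}\n"
        "Update .anti_bot_map.md and ARCHITECTURE.md to match the new code."
    )


if __name__ == "__main__":
    changes = code_changes(staged_changes())
    if changes:
        print(build_prompt(changes))  # pass this to the AI step of your choice
```

Filtering out the generated docs matters once the AI writes them back: without it, every doc refresh would itself count as a change worth reacting to.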

Hand-rolled Claw (3): All-around Perception — Multi-modal & Dynamic Skill Tree

[!TIP] Objective: Enable multi-modal (vision/files) perception via CLI parameters and build a dynamically loaded, plugin-based skill system.

1. Letting the CLI “See” the World
Multi-modal capabilities don’t necessarily require complex SDKs. In VISAGENT, we leverage the gemini CLI’s native support for file references (@path) to integrate sensory input at the RoleEngine layer:

```python
def _do_raw_invoke(self, message, files=None):
    # Construct the multi-modal suffix: one "@path" reference per attached file
    mm_suffix = ""
    if files:
        mm_suffix = "\n" + "\n".join(f"@{f}" for f in files)
    # Append the suffix to the final prompt
    full_input = f"{message}{mm_suffix}"
    # ... execute subprocess
```

Field Experience: To handle complex visual tasks, we encapsulated a dedicated vision_expert skill. Using the DEEP reasoning mode, we guide the AI through Chain-of-Thought reasoning, which enables precise identification of screenshots and UI components. ...
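For the “dynamically loaded, plugin-based skill system” in the objective, one way to do it in plain Python is to import every file in a skills directory at runtime. This is a minimal sketch, not VISAGENT’s loader: the convention that each skill module defines `SKILL_NAME` and a `run()` function is a hypothetical assumption, as is the `load_skills` name.

```python
import importlib.util
from pathlib import Path
from types import ModuleType
from typing import Dict


def load_skills(skill_dir: Path) -> Dict[str, ModuleType]:
    """Import every *.py file in skill_dir; register modules that define
    SKILL_NAME and a callable run() (hypothetical convention)."""
    skills: Dict[str, ModuleType] = {}
    for path in sorted(skill_dir.glob("*.py")):
        spec = importlib.util.spec_from_file_location(f"skill_{path.stem}", path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the skill file's top-level code
        name = getattr(module, "SKILL_NAME", None)
        if name and callable(getattr(module, "run", None)):
            skills[name] = module
    return skills
```

A skill such as vision_expert then becomes a single file dropped into the directory; files that don’t follow the convention (shared helpers, for example) are imported but not registered.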

Hand-rolled Claw (2): The Strategist's Brain — AX Planner & Flow Architect

[!TIP] Objective: Evolve from single-step prompts to a “Plan-Simulate-Execute” closed loop, introducing YAML-based asynchronous DAG orchestration.

1. From “Chatting” to “Execution”: The AX (Architect-Executive) Paradigm
The watershed between a simple chatbot and an Agent is the ability to decompose vague goals into actionable steps. In VISAGENT, we implemented the AX Planner logic:
Architect: Receives requirements and outputs a TODO.json. Execution is forbidden; only planning is allowed.
Simulation: Before execution, a “Security Expert” role performs a risk assessment of the plan.
Executive: Executes each step via execute_step.
This “think before you act” mechanism is easy to implement on any CLI using simple prompt constraints: ...
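The Architect, Simulation, and Executive stages above can be wired together as a small gate: parse the plan, review it, and only then run each step. This is a sketch under stated assumptions: the TODO.json shape and the prompt text are hypothetical, and the keyword check merely stands in for the LLM-based “Security Expert” review the post describes. Only the name `execute_step` comes from the post; its signature here is assumed.

```python
import json
from typing import Callable, Dict, List

# Hypothetical Architect constraint; the real prompt wording is not shown in the post.
ARCHITECT_PROMPT = "You are the Architect. Output only a TODO.json plan. Do not execute anything."

# Assumed TODO.json shape: {"steps": [{"id": 1, "action": "..."}, ...]}


def load_plan(todo_json: str) -> List[Dict]:
    return json.loads(todo_json)["steps"]


def simulate(steps: List[Dict], risky=("rm -rf", "drop table", "sudo")) -> List[str]:
    """Stand-in for the Security Expert review: flag steps containing risky keywords."""
    findings = []
    for step in steps:
        action = step["action"].lower()
        findings += [f"step {step['id']}: contains '{kw}'" for kw in risky if kw in action]
    return findings


def run_plan(todo_json: str, execute_step: Callable[[Dict], str]) -> List[str]:
    """Plan -> Simulate -> Execute: nothing runs unless the review passes."""
    steps = load_plan(todo_json)
    findings = simulate(steps)
    if findings:
        raise RuntimeError("plan rejected: " + "; ".join(findings))
    return [execute_step(step) for step in steps]
```

The point of the structure is ordering: a rejected plan raises before the first `execute_step` call, so no step of a risky plan has side effects.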

Hand-rolled Lobster (1): RoleEngine Core Based on CLI

[!TIP] Objective: Build an Agent engine with state management and isolation by wrapping a CLI (e.g., gemini CLI), without relying on a framework.

1. Why CLI Over Frameworks?
VISAGENT’s philosophy is CLI-Native. Directly wrapping a CLI (like gemini CLI) offers superior control and transparency:
Atomicity: Each call is a single inference step.
State Transparency: Session resumption/suspension is strictly controlled by paths and the Resume flag.
Isolation: Permission sandboxing is achieved through native OS tools.

2. Core Wrapper: RoleEngineBase
Execution is triggered via subprocess. To keep long prompts from exceeding the OS limit on command-line argument length, we pass them through stdin: ...
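Here is a minimal sketch of the stdin approach, assuming the wrapped CLI reads its prompt from standard input. The constructor parameters (`cli_cmd`, `cwd`, `timeout`) and the `invoke` method are hypothetical and not the post’s actual RoleEngineBase; no gemini CLI flags are assumed, so the command is passed in by the caller.

```python
import subprocess
import sys
from typing import List, Optional


class RoleEngineBase:
    """Sketch: send the prompt through stdin instead of argv."""

    def __init__(self, cli_cmd: List[str], cwd: Optional[str] = None, timeout: float = 300.0):
        self.cli_cmd = cli_cmd  # the CLI plus whatever flags your setup uses
        self.cwd = cwd          # assumed per-role working directory
        self.timeout = timeout

    def invoke(self, prompt: str) -> str:
        # stdin has no argv-style size limit, so a very long prompt that would
        # fail as a single command-line argument goes through unchanged.
        result = subprocess.run(
            self.cli_cmd,
            input=prompt,
            capture_output=True,
            text=True,
            cwd=self.cwd,
            timeout=self.timeout,
        )
        if result.returncode != 0:
            raise RuntimeError(f"CLI failed ({result.returncode}): {result.stderr.strip()}")
        return result.stdout


if __name__ == "__main__":
    # Stand-in "CLI" that reports how many characters arrived on stdin.
    engine = RoleEngineBase([sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"])
    print(engine.invoke("x" * 500_000))
```

On Linux a single argument is capped at 128 KiB, so the 500,000-character prompt in the demo would be rejected if passed in argv, while stdin delivers it without complaint.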

AI-Native Metabolic Governance V3.0: Evolution from Documentation Debt to Metabolism

[!IMPORTANT] Core Pain Point: In fast-paced projects, the high cost of manual documentation updates leads to persistent “drift” between code and architectural documentation (README, ARCHITECTURE.md).
The Solution: Offload the cognitive burden of documentation maintenance from human developers to AI Agents with semantic reasoning capabilities, enabling “metabolic” renewal of architectural maps.

After multiple rounds of conversation with AI about documentation governance, I’ve distilled a concept I call “AI-Native Metabolic Governance V3.0.” This is not just an automation script but a governance logic designed to make code architecture “come alive.” ...

D2R: An In-depth Understanding of NoDrop and Loot Mechanics

[!TIP] Target Audience: D2R players looking to optimize farming efficiency by understanding the underlying game logic.
Core Objective: Clarify the mathematical relationship between the /players X command and actual drop rates.
Problem-Solution Mapping:
Misconception: Simply enabling 8-player difficulty maximizes loot drops.
Solution: A detailed breakdown of how X (total players) and Y (partied players in the same area) jointly determine NoDrop rates.

With Diablo II: Resurrected Season 13 in full swing, many of us are discussing how to improve our farming runs. Intuition plays a role, but understanding the technical NoDrop (zero-drop probability) mechanism lets us strike a better balance between difficulty and speed. ...

2026-02-09 Retrospective | Hallucinations in Tauri v2 Migration: A Record of Pitfalls

A deep dive into the ‘hallucinogenic’ migration from Tauri v1.x to v2.0, covering ACL permission changes, Minisign signature updates, and GitHub Actions automation pitfalls.