Silicon Team S2E05: Every New Product Makes the Core Fatter

EP01 built the family calendar and added LLM Pool fallback logic and design token scanning to the core. EP02 built the education tool and added multi-tenant schema handling and OAuth provider routing to the core. After two products, the core pipeline grew from 800 to 2,000 lines — every line made sense individually, but together they were a tangled plate of spaghetti.

The fear of modifying the core was concrete: every time I added domain logic to the harness, I worried about breaking other flows. The education tool’s migration check and the family calendar’s design token scan shared the same startup hook — changing one could break the other. Worse, this coupling was implicit: no line of code declared a dependency between these two features. You only discovered the shared execution path after breaking something. Implicit coupling is harder to manage than explicit dependencies — you can’t find it by searching imports.

EP04 looked at Cline’s 3,764-line orchestrator and came away somewhat reassured: god files are acceptable in orchestrator scenarios. But OPC was bloating in the wrong direction. Cline’s 3,764 lines are orchestration logic, while a third of OPC’s 2,000 lines was domain logic. The difference comes down to change frequency: orchestration logic rarely changes once the architecture stabilizes, but domain logic changes with every new product. Mixing high-frequency-change code with low-frequency-change code means every domain edit risks collateral damage to orchestration. An orchestrator can be big, but it shouldn’t hold domain knowledge.

OPC was simultaneously playing two roles: a general-purpose multi-role task orchestration engine, and a carrier for domain-specific workflows. The code for both roles was mixed together. Time to split.

The essence of the problem was at the product level: every new product contaminates the tool core with its special logic. Two products were tolerable; by the third, the fear of modifying the core would outweigh the excitement of building the product. The three approaches below all solve this problem.

Three Approaches

Approach A: Core + plugin system. The standard solution — define plugin interfaces, external code implements interfaces, load dynamically at runtime. EP04 saw LobeHub (100 lines for one Piece) and Activepieces (30 lines plus SDK learning cost) taking this path. But plugin system maintenance costs are enormous: lifecycle management, version compatibility, API stability guarantees, discovery and registration mechanisms. For a tool with exactly one user, all that infrastructure is waste.

Approach B: Monolith with configuration layer. Don’t split the core, but use config files to control which features are enabled. Simple, but treats symptoms not causes — configuration keeps growing, eventually becoming another form of bloat.

Approach C: Open-source core + private extensions. The core only does orchestration; domain logic becomes independent extension modules. Extensions aren’t plugins — no dynamic loading, version management, or registry needed. They simply communicate with the core at fixed contact points.

The final path was a B+C hybrid: core stays open-source with orchestration logic in a single file. Flow definitions split into built-in and extension categories, with the only coupling points being the flow definition spec and five fixed hook interfaces.

Capability Contracts: Neither Side Knows the Other

Decoupling between core and extensions works through capability contracts. S1E04 covered the early version — “joints are extension points; you can’t hang muscle in the middle of bone.” v0.5 formalized it.

Design principle: neither side knows the other directly; they only know a shared vocabulary. A flow node declares “I need the visual-consistency-check capability”; an extension declares “I provide visual-consistency-check.” Non-empty intersection triggers execution; otherwise, silent skip. Add a new flow template declaring a capability, and all extensions providing that capability automatically run on it — zero lines of extension code changed.
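The matching rule can be sketched in a few lines of Python. This is only an illustration under assumptions: `Extension`, `FlowNode`, and `matching_extensions` are hypothetical names, and the `session-log` capability is invented for the example. Only the "non-empty intersection triggers execution" rule comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Extension:
    """Hypothetical extension record: a name plus the capabilities it provides."""
    name: str
    provides: set[str] = field(default_factory=set)


@dataclass
class FlowNode:
    """Hypothetical flow node: a name plus the capabilities it needs."""
    name: str
    needs: set[str] = field(default_factory=set)


def matching_extensions(node: FlowNode, extensions: list[Extension]) -> list[Extension]:
    """Return the extensions whose capabilities intersect the node's needs.

    An empty result means the node is silently skipped by every extension.
    """
    return [ext for ext in extensions if ext.provides & node.needs]


extensions = [
    Extension("design-lint", {"visual-consistency-check"}),
    Extension("session-logex", {"session-log"}),  # capability name is invented
]
node = FlowNode("review-ui", {"visual-consistency-check"})
print([ext.name for ext in matching_extensions(node, extensions)])  # ['design-lint']
```

Neither class refers to the other. A new flow node that declares `visual-consistency-check` would pick up `design-lint` without any change to the extension.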

This solved the core problem: adding a new product doesn’t require modifying the core. The product’s domain logic becomes an extension declaring what capabilities it provides. The core is only responsible for asking “who can do this?” at the right moment, then dispatching the work.

Five Hook Types

Extensions cut into the pipeline through five hook types at different phases:

Startup: validate preconditions — check whether dependencies are in place, configuration is complete. If preconditions aren’t met, the extension opts out voluntarily rather than slowing down the pipeline.

Pre-dispatch: inject context — before reviewers receive their assignments, inject domain-specific scoring criteria into the prompt. For instance, Design Intelligence injects design tokens and visual consistency scoring dimensions at this phase.

Review: submit mechanized findings — the extension runs its own mechanical checks (no LLM involved) and submits results as additional review findings to the gate.

Execute: run side effects — screenshots, performance measurements, data collection. These operations have side effects, hence a separate phase rather than the review phase.

Post-completion: generate output artifacts — summary reports, visualizations, archival.

Five hook types cover the pipeline’s entire lifecycle. Extensions only interact with the core through hooks and capability declarations — they don’t touch the core’s execution logic. This rule has no exceptions — if an extension needs to read or modify the core’s internal state, it shouldn’t be an extension; it should be merged into the core.
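As a rough sketch of what a fixed hook surface might look like, the Python below models the five phases as method names and lets the core call whichever hooks an extension defines. The phase identifiers, the `ctx` dictionary, and `DesignTokenExtension` are assumptions made for illustration, not OPC’s actual interface.

```python
from typing import Any

# The five fixed phases from the text; these identifiers are assumed spellings.
HOOK_PHASES = ("startup", "pre_dispatch", "review", "execute", "post_completion")


class DesignTokenExtension:
    """Hypothetical extension that implements only two of the five hooks."""

    def startup(self, ctx: dict[str, Any]) -> bool:
        # Opt out voluntarily when the precondition (a token file path) is missing.
        return "design_tokens_path" in ctx

    def pre_dispatch(self, ctx: dict[str, Any]) -> None:
        # Inject a domain-specific scoring criterion into the reviewer prompt.
        ctx.setdefault("prompt_additions", []).append(
            "Score visual consistency against the design tokens."
        )


def active_extensions(extensions: list[Any], ctx: dict[str, Any]) -> list[Any]:
    """Keep extensions whose startup hook passes; no startup hook means stay in."""
    active = []
    for ext in extensions:
        startup = getattr(ext, "startup", None)
        if startup is None or startup(ctx):
            active.append(ext)
    return active


def run_phase(phase: str, extensions: list[Any], ctx: dict[str, Any]) -> list[Any]:
    """Call the hook for `phase` on every extension that defines it."""
    if phase not in HOOK_PHASES:
        raise ValueError(f"unknown hook phase: {phase}")  # no custom hook types
    results = []
    for ext in extensions:
        hook = getattr(ext, phase, None)
        if callable(hook):
            results.append(hook(ctx))
    return results
```

The extension only ever sees `ctx` and its own hooks. It has no handle on the core’s execution logic, which is the boundary the rule above describes.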

The Circuit Breaker and the Bug That Wasn’t Fixed

Extensions run inside the pipeline, so a bad extension can kill the entire flow. The per-extension circuit breaker is the safety net: three consecutive failures disable the extension for the current flow run; any single success resets the counter.
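A minimal in-memory sketch of that counting rule, assuming a threshold of three and treating any raised exception as a failure. `CircuitBreaker` and `call_guarded` are hypothetical names, not OPC’s code.

```python
from typing import Any, Callable


class CircuitBreaker:
    """Counts consecutive failures per extension; state lives only in this object."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures: dict[str, int] = {}

    def is_open(self, ext_id: str) -> bool:
        # "Open" means the extension is disabled for this run.
        return self.failures.get(ext_id, 0) >= self.threshold

    def record(self, ext_id: str, ok: bool) -> None:
        # Any success resets the counter; a failure increments it.
        self.failures[ext_id] = 0 if ok else self.failures.get(ext_id, 0) + 1


def call_guarded(breaker: CircuitBreaker, ext_id: str, hook: Callable[[], Any]) -> Any:
    """Run a hook unless its extension is disabled; never let it crash the flow."""
    if breaker.is_open(ext_id):
        return None
    try:
        result = hook()
    except Exception:
        breaker.record(ext_id, ok=False)
        return None
    breaker.record(ext_id, ok=True)
    return result
```

Note that the counter lives in the `CircuitBreaker` instance, so it only protects work done within a single process.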

The day after v0.5 shipped, a bug appeared: circuit breaker state only lived in memory; each CLI invocation reset it. OPC runs each tick as a CLI invocation — a repeatedly crashing extension gets three fresh chances every tick. The circuit breaker was effectively decorative.

The bug came down to a contradiction between “stateless CLI” and “stateful mechanism.” A CLI is inherently stateless: start, execute, exit. A circuit breaker needs to remember failure counts across invocations. v0.5.1 persisted breaker state to a JSON file indexed by extension ID and flow ID. In hindsight, this should have been anticipated during design, but while building it my mind was on “the circuit breaker needs consecutive counting,” and I forgot that “consecutive” isn’t free in a stateless CLI.
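One way such persistence could look, sketched in Python. The file name, the `flow_id/ext_id` key format, and the function names are assumptions; the text only says the state is a JSON file indexed by extension ID and flow ID.

```python
import json
import tempfile
from pathlib import Path

THRESHOLD = 3  # consecutive failures before the extension is disabled


def _load(path: Path) -> dict[str, int]:
    """Read persisted counters, or start empty if the file does not exist yet."""
    return json.loads(path.read_text()) if path.exists() else {}


def record_result(path: Path, flow_id: str, ext_id: str, ok: bool) -> int:
    """Update the persisted consecutive-failure count and return the new value."""
    state = _load(path)
    key = f"{flow_id}/{ext_id}"  # key format is an assumption
    state[key] = 0 if ok else state.get(key, 0) + 1
    path.write_text(json.dumps(state))
    return state[key]


def is_disabled(path: Path, flow_id: str, ext_id: str) -> bool:
    """True once the persisted count for this flow and extension hits the threshold."""
    return _load(path).get(f"{flow_id}/{ext_id}", 0) >= THRESHOLD


# Each call below stands in for a separate CLI invocation (one tick).
state_file = Path(tempfile.mkdtemp()) / "breaker-state.json"
for _ in range(3):
    record_result(state_file, "flow-1", "visual-eval", ok=False)
print(is_disabled(state_file, "flow-1", "visual-eval"))  # True
print(is_disabled(state_file, "flow-2", "visual-eval"))  # False
```

Because every call re-reads the file, the count survives process exit. Keying by flow ID keeps a failure streak in one flow from disabling the extension in another.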

From v0.5 to v0.8

Four rounds of hardening. Each round didn’t add new features — it added clarity.

v0.5: Extension interface first release. Capability contracts, five hook types, circuit breaker.

v0.5.1: Fixed 7 defects. Most critical: circuit breaker persistence. Second: prefix fallback for capability matching — visual-consistency-check-v2 matches extensions declaring visual-consistency-check.
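The prefix fallback could be sketched as follows. Two details are assumptions rather than facts from the release: exact matches are preferred over prefix matches, and a prefix only counts when it ends at a hyphen boundary. The capability names assigned to the extensions are also illustrative.

```python
def providers_for(requested: str, declarations: dict[str, set[str]]) -> list[str]:
    """Return extension IDs that can serve `requested`.

    Exact matches win; only if there are none do we fall back to declared
    capabilities that are a hyphen-delimited prefix of the requested name.
    """
    exact = sorted(ext for ext, caps in declarations.items() if requested in caps)
    if exact:
        return exact
    return sorted(
        ext
        for ext, caps in declarations.items()
        if any(requested.startswith(cap + "-") for cap in caps)
    )


declarations = {
    "visual-eval": {"visual-consistency-check"},
    "git-changeset-review": {"changeset-review"},  # capability name is invented
}
print(providers_for("visual-consistency-check-v2", declarations))  # ['visual-eval']
print(providers_for("visual-consistency-checker", declarations))   # []
```

The hyphen boundary keeps `visual-consistency-checker` from accidentally matching `visual-consistency-check`. Whether OPC draws the line there is not stated in the text.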

v0.6: Five production extensions mounted simultaneously: design-lint, visual-eval, memex-recall, git-changeset-review, session-logex. Five extensions coexisting was a real stress test for capability contracts — their capability declarations couldn’t conflict, and hook execution order couldn’t interfere with each other.

v0.7: Validated whether an “outsider” could write a complete extension from documentation alone. The materials were a 7,800-word extension-authoring.md, a 121-line starter template, and a 70-line reference extension. The reference extension was written by an “outsider” agent that read only the docs and never looked at the core source.

The agent could write one, but the process exposed 5 DX defects. Three examples: the severity-to-emoji mapping table was buried mid-document where newcomers couldn’t find it; the docs didn’t say whether ctx.task is a string or an object; and there was no list of valid capability names, which calls for a registry.
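To make the registry idea concrete, here is a deliberately small Python sketch. `KNOWN_CAPABILITIES` and its entries are hypothetical; the point is only that a single list of valid names lets a typo surface at declaration time instead of turning into a silent skip.

```python
# Hypothetical registry of valid capability names (illustrative entries only).
KNOWN_CAPABILITIES = frozenset({"visual-consistency-check", "changeset-review"})


def unknown_capabilities(declared: set[str]) -> list[str]:
    """Return declared capability names that are missing from the registry."""
    return sorted(declared - KNOWN_CAPABILITIES)


# The second name contains a typo and is reported.
print(unknown_capabilities({"visual-consistency-check", "visual-consistancy-check"}))
# ['visual-consistancy-check']
```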

One issue deserves to be confronted directly: this “outsider” was another AI agent, not a human developer. An AI has infinite patience; it won’t close the browser after the third documentation pitfall. If the test subject were a human developer, the defect count might double. Using AI to test AI tool documentation is closed-loop validation: necessary but not sufficient.

v0.8: Closed remaining friction items and added a runbook mechanism. By v0.8, all 258 tests were passing.

Convention Over Configuration

EP04 saw the cost of LangChain’s 67.7k-line core — the more universal the standard, the heavier the core. OPC chose the opposite: convention over configuration, no abstraction layer.

No Runnable interface, no general pipeline SDK, no plugin API version management. Flow definitions are a JSON schema plus a set of conventions. Extension hook signatures are fixed at five types, with no support for custom hook types.

This limits flexibility: you can’t invent new hook points or insert custom phases mid-pipeline. In return, the core codebase stays small (an order of magnitude smaller than LangChain’s), and newcomers can read extension-authoring.md and write extensions without first understanding an abstract type system.

EP04’s plugin difficulty triangle (DX, security, discoverability) also helped lock the position. OPC is a single-person tool; the extension author is the user themselves. So DX first, security via trust boundary (trust boundary = user boundary), discoverability deferred (one user, no marketplace needed). This trade-off would likely break in a multi-user scenario — but for a tool with exactly one user, paying security costs in advance for a hypothetical second user isn’t worthwhile.

The extension system stopped the core from getting fatter. But EP06 will discover that some review capabilities can’t be solved by just adding a code-level extension — when the problem is visual, code reviewers can’t see it.

The essence of infrastructure work: nobody applauds when it’s fixed; everybody complains when it’s not.
