Touchskyer's Thinking Wall

Silicon Team S3E01: The Second User Didn't Not Show Up — They Got Stuck at the Door

Previously on Silicon Team: Two seasons building a machine, then using it to build eight products.

S1’s core discovery: AI can write code, but you can’t trust its self-assessment. So I built OPC — the agent that does the work never evaluates it, mechanical gates count red and yellow flags instead of making judgment calls, and nothing ships until it passes. Trust built through constraints.

S2 took OPC into the field. A family calendar exposed direction problems, Pi-Math exposed termination problems, 30 open-source projects provided reference points, and opc-viewer made the review process visible. Each product pushed a boundary, each boundary forced a framework upgrade. Products expose the toolchain’s blind spots; blind spots drive the toolchain’s evolution.

But S2E08 audited OPC with itself and found an uncomfortable truth: eight products, dozens of gate decisions, and the “fail means loop back” mechanism never fired once. The core enforcement promise had never actually been exercised.

S2’s closing question: What stands between “I can use it” and “others can use it”?

S3 isn’t about how well OPC’s open-source launch went, nor about nobody showing up. S3 is about how trust transfers in layers — people came, but they only trust the outermost layer.

163 Stars Does Not Equal 163 Users

After OPC went open source, numbers arrived.

163 stars. 39 forks. A brief appearance on GitHub trending — one day brought 15+ forks alone. Issues started appearing. Pull requests started appearing.

In the context of these numbers, “people showed up” is true.

But “people showed up” and “people are using it” are two different things. Starring costs one click — it carries zero commitment to the code itself. Forking costs slightly more: it means at least “I want to look at the source,” but not “I ran this thing.” 163 people expressed interest, but interest isn’t usage. Between interest and usage lies a step no metric captures: actually getting the code to run.

Then the first truly valuable external signal arrived. Not praise. Not a feature request. One sentence:

It doesn’t run on Linux.

skill.md vs. SKILL.md

OPC installs as a Claude Code skill at ~/.claude/skills/opc/. The skill definition file was named skill.md.

Here’s the problem: Claude Code’s skill loader expects SKILL.md — all uppercase.

macOS’s default filesystem (APFS, and HFS+ before it) is case-insensitive but case-preserving. On a default macOS install, skill.md and SKILL.md resolve to the same file. You write skill.md, the loader looks for SKILL.md, finds it anyway. Everything works.

Linux’s ext4 is case-sensitive. skill.md ≠ SKILL.md. The loader looks for SKILL.md. The filesystem says: no such file. The skill silently fails to load. OPC doesn’t work.

No error message. No exception. It just doesn’t work.

This is the most pernicious failure mode. An error means the system knows something went wrong — the caller gets diagnostic information and can locate the cause. Silent failure means the system doesn’t know something went wrong — it believes everything is fine, but the functionality has simply vanished. The user can’t tell whether they did something wrong or the tool itself is broken. Their only takeaway is “this tool doesn’t seem to work.”

All of S1 and S2 (two seasons of toolchain development, eight products, 109 test files, bash test/run-all.sh all green) was completed on macOS, on a case-insensitive filesystem. This bug existed from day one, but it could never be triggered in my environment.

A GitHub issue only needed one sentence — “skill doesn’t load on Linux” — but the information density behind that sentence exceeded any feature request: your test coverage is bounded by your runtime environment’s assumptions.

The Fix Touches 11 Files

PR #21’s core fix is one shell command:

git mv skill.md SKILL.md

But renaming isn’t enough. Every file in the repository that referenced the old path needed updating:

  • bin/opc.mjs: the MANAGED_ENTRIES array
  • package.json: the files field in the install manifest
  • bin/hooks/opc-post-compact.sh: the context recovery prompt after compaction
  • CONTRACTS.md, CONTRIBUTING.md, INTEGRATION.md
  • 5 pipeline documents: gate-protocol.md, loop-protocol.md, report-format.md, and more

11 files, +16 / -16 lines. Every individual change is tiny — swap skill.md for SKILL.md. But these 11 files span four different layers: code, configuration, hook scripts, and documentation. One filename’s capitalization rippled across four layers of infrastructure.

This spread itself tells a story. If skill.md appeared in only one place, the fix would be a one-line change. But personal tools don’t manage their reference graphs during development — if it works, you use it wherever you need it. Over two seasons of iteration, one filename got embedded in the entry script, the package manifest, a recovery hook, and design documents across four distinct layers. Nobody builds a reference index for a filename on Day 1, but when renaming day arrives, those scattered references become the cost of repair.
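Finding those scattered references is mechanical once you know to look. The sketch below is one way to enumerate them, assuming a GNU or BSD grep that supports -r and --exclude-dir. list_references is a hypothetical helper name, and this is not necessarily how PR #21 was assembled.

```shell
# Hypothetical helper: list every file under a directory tree that
# mentions a given name, so a rename can be followed by a full sweep.
list_references() {
  root=$1
  name=$2
  # -r: recurse, -l: print file names only, -F: treat the name as a
  # literal string (so the "." in skill.md is not a regex wildcard).
  # .git is skipped because object files are not references to fix.
  grep -rlF --exclude-dir=.git -- "$name" "$root" | sort
}
```

Run from the repository root as list_references . skill.md. Inside a git checkout, git grep -n -F 'skill.md' gives a similar list restricted to tracked files, with line numbers.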

All 109 test files passed — they never tested whether “the filename can be found correctly on a case-sensitive filesystem.” Because the runtime environment itself was case-insensitive. Tests running in their own environment can’t see their own blind spots. This is the same class of problem S2E08 discovered with the “FAIL path never triggered”: an untested mechanism and a nonexistent mechanism are indistinguishable in production.
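A check along these lines could have caught the mismatch even on macOS, because it compares the name stored in the directory listing byte for byte instead of asking the filesystem whether the path exists. It is a minimal sketch, not something taken from the OPC test suite: check_skill_file and its exit codes are hypothetical.

```shell
# Hypothetical test helper (not part of OPC): succeed only if the
# directory holds a file whose stored name is exactly "SKILL.md".
check_skill_file() {
  dir=$1
  # Compare names from the directory listing as literal, whole-line
  # strings. A plain `[ -f "$dir/SKILL.md" ]` would also pass for
  # skill.md on a case-insensitive filesystem, which is the blind spot.
  if ls -1 "$dir" 2>/dev/null | grep -Fqx 'SKILL.md'; then
    echo "ok: $dir/SKILL.md"
    return 0
  fi
  # Same comparison, ignoring case, to explain why the check failed.
  variant=$(ls -1 "$dir" 2>/dev/null | grep -Fix 'skill.md' | head -n 1 || true)
  if [ -n "$variant" ]; then
    echo "error: found '$variant' in $dir but expected 'SKILL.md'" >&2
    return 2
  fi
  echo "error: no SKILL.md in $dir" >&2
  return 1
}
```

Called as check_skill_file ~/.claude/skills/opc, it exits 0 on an exact match, 2 when only a case variant exists, and 1 when nothing matches. The separate exit code for the case variant is the point: the failure names its own cause instead of vanishing.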

Trust Has Five Layers

S2E08’s closing question: “What stands between ‘I can use it’ and ‘others can use it’?”

PR #21 provides the first answer: invisible environment assumptions.

But #21 is just the first layer. Looking back at all external signals since OPC went open source — stars, forks, issues, PRs — a pattern emerges. Trust doesn’t transfer all at once. It transfers layer by layer:

Layer 1: Infrastructure. Can others even run it? #21 says no. Everything works on macOS; silent failure on Linux. The first user’s experience isn’t your carefully designed review flow — it’s a skill that won’t load.

Layer 2: Pattern. Can others understand your design patterns? This isn’t just “can they read the code” — readable code is the bare minimum. Pattern means: your conventions, your naming rules, the way you organize files, the decisions about what’s extensible and what’s hardcoded. Later PRs answer this — someone read the roles/*.md format and added new roles following the existing structure.

Layer 3: Contribution. Do others dare to add things? Understanding is passive — reading code is enough. Contributing is active — you have to make a judgment call: “will my change break something that already works?” That judgment requires trusting your own understanding of the system, and trusting that the system has enough safety nets (tests, CI, code review) to catch your mistakes.

Layer 4: Core. Do others dare to touch harness, gate, review flow — the core mechanisms? Core mechanisms are the skeleton of the constraint system — changing them affects not one feature but the behavioral boundaries of every feature. Touching core means “I understand why this system was designed this way, and I believe the design itself should change.” That’s the highest degree of trust.

Layer 5: Resilience. Can the system withstand external users’ failures, misunderstandings, and unexpected usage patterns? A system used only by its author has never been stress-tested by misunderstanding. External users will pass wrong parameters, skip prerequisite steps, combine features in ways you never anticipated. The system either gives meaningful error messages or silently collapses under unexpected usage — just like skill.md did on Linux.

163 stars measure interest. 39 forks measure curiosity. But between “interest” and “can actually run it,” there’s a filename’s capitalization. What S3 documents is how trust transfers from layer one upward, one layer at a time.

The First Lesson of Personal Tools Going Public

PR #21’s lesson isn’t “remember to capitalize.” It exposes a more general problem: personal tools carry a large number of undeclared environment assumptions.

My development environment: macOS + zsh + Homebrew + English locale. OPC was refined in this environment across two seasons. Every shell script, path handler, and filename convention implicitly assumed the runtime environment matches mine.

This assumption was harmless in S1-S2 — the only user was me. But the moment OPC went open source, that assumption became a landmine. You don’t know whether the first person to clone uses Ubuntu or Arch Linux, bash or fish, en_US.UTF-8 or zh_CN.GBK.
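One low-cost way to make such assumptions visible is to print them. The sketch below is hypothetical (describe_env is not part of OPC) and covers only the three dimensions named above: operating system, login shell, and locale.

```shell
# Hypothetical helper: print the environment facts a tool silently
# depends on, one key=value pair per line.
describe_env() {
  # Kernel name, e.g. Darwin or Linux.
  printf 'os=%s\n' "$(uname -s)"
  # Login shell as recorded in the environment, if any.
  printf 'shell=%s\n' "${SHELL:-unknown}"
  # Effective locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
  printf 'locale=%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-unset}}}"
}
```

Pasting this output into a bug report turns “it doesn’t work” into something the author can compare against their own machine.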

S2E08 called implicit connections “ticking time bombs” — referring to implicit dependencies between tools. #21 says the same thing from a different angle: implicit environment assumptions are ticking time bombs that only detonate on someone else’s machine.

This isn’t solvable by “testing more” on macOS. The fix is to test on different operating systems, different filesystems, and different shells. CI matrices, Docker containers, and cross-platform smoke tests aren’t nice-to-haves. They’re the foundation of the infrastructure trust layer.
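A cross-platform smoke test can start by recording which kind of filesystem it is running on, so a green run on a case-insensitive volume is not mistaken for full coverage. This is a minimal sketch under stated assumptions: is_case_sensitive is a hypothetical name, and it assumes a mktemp that accepts a template path with -d, as the GNU and BSD versions do.

```shell
# Hypothetical probe: exit 0 if the filesystem holding the given
# directory is case-sensitive, 1 if it is case-insensitive, 2 on error.
is_case_sensitive() {
  dir=${1:-.}
  probe=$(mktemp -d "$dir/caseprobe.XXXXXX") || return 2
  # Create a lower-case file, then ask for its upper-case spelling.
  : > "$probe/probe"
  if [ -e "$probe/PROBE" ]; then
    result=1   # the upper-case name resolved to the lower-case file
  else
    result=0   # the two spellings are distinct names
  fi
  rm -rf "$probe"
  return "$result"
}
```

A test runner could call is_case_sensitive . up front and print a warning when it returns 1, which would make the environment’s blind spot part of the test output.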

The First Gate to “Others Can Use It” Isn’t Documentation

Looking back at this episode’s core:

The first truly valuable external feedback wasn’t “your review flow is clever” or “can you add a feature.” It was “can’t run it.”

You spent two seasons optimizing review accuracy — adding roles, adding gates, adding design review, making processes visible. Then the second person arrived and got stuck at installation.

The first gate to “others can use it” isn’t how thorough your documentation is or how clean your README reads. It’s the real differences in environments — operating systems, filesystems, shells — things you’ll never encounter on your own machine.

PR #21 is still in OPEN status. The fix is ready — 11 files, clean and straightforward — but what it points to isn’t just a bug. The distance between “works on one platform” and “works for anyone on any platform” is far greater than you’d think.

Next episode: someone not only got it running, they started adding things. But what they added wasn’t what you’d expect.

