AI-Conductor Model
Human-AI co-creation as artistic practice
AI-Conductor Model Concept Sketch
The Model
Every creative team in 2026 is figuring out how to work with AI. Most approaches are "AI replaces human work" or "human ignores AI."[1] Shneiderman's framework, set out in his 2022 book Human-Centered AI, proposes a third path: systems that amplify human capability rather than replacing it. I developed the AI-conductor model, where the human acts as conductor — setting direction, maintaining quality, making every structural decision — while the AI acts as an instrument capable of producing drafts at speed. This isn't "AI wrote my portfolio." It's a designed practice with explicit roles, quality gates, and attribution.[2]
The Four-Phase Workflow
The conductor model operates through four distinct phases, each with defined role assignments and quality criteria that must be met before advancing. The separation enforces a principle: AI generation without human direction produces plausible but untrustworthy output, while human direction without AI generation produces trustworthy but unscalable output.[3] Csikszentmihalyi's research on creative flow emphasizes that the critical creative act is selection: choosing what to keep, what to discard, what to reshape. The conductor model formalizes this at industrial scale.
Phase 1: Context Assembly (Human-Led)
The human assembles everything the AI needs to generate accurate documentation: the repository's actual code, directory structure, test suites, and configuration files; the registry entry (the registry is a centralized, queryable data store that serves as the single source of truth for all entities and their relationships in the system) declaring name, org, status, dependencies, tier, and portfolio relevance; the organ's mission statement and documentation standards; constitutional principles and quality gates; cross-references to related repos; and specific details the AI cannot infer: test counts, coverage percentages, deployment URLs, architectural decisions not evident from code alone.
Context assembly is the most important phase because it determines the ceiling of output quality. An AI given thin context produces thin documentation — generic boilerplate about "leveraging cutting-edge technologies." An AI given rich context — specific test counts, real architectural decisions, actual deployment details — produces documentation that reads like it was written by someone who understands the project. The difference is not in the AI's capability. It is in the human's preparation. For the Silver Sprint (58 READMEs), context assembly took approximately 15-30 minutes per repo, totaling 20-30 hours across all repos.[4]
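Phase 1 can be sketched as a small bundling step. This is a minimal illustration, not the system's actual tooling: the function name, registry fields, and `extras` keys are hypothetical, standing in for whatever the real registry schema provides.

```python
# Sketch of Phase 1 context assembly. All names and fields here are
# illustrative assumptions, not the system's actual API.
import json
from pathlib import Path

def assemble_context(repo_dir, registry_entry, extras):
    """Bundle everything the AI needs into one context dict."""
    repo = Path(repo_dir)
    return {
        # Structural facts the model can be pointed at directly.
        "tree": sorted(str(p.relative_to(repo))
                       for p in repo.rglob("*") if p.is_file()),
        # Registry entry: name, org, status, dependencies, tier.
        "registry": registry_entry,
        # Facts the AI cannot infer from code alone.
        "extras": extras,  # e.g. test counts, coverage, deploy URLs
    }

ctx = assemble_context(
    ".",
    {"name": "demo-repo", "org": "ORGAN-I", "status": "active"},
    {"test_count": 42, "coverage": "91%"},
)
print(json.dumps(ctx["registry"]))
```

The point of the structure is that everything downstream — generation, review, validation — reads from this one dict, so the ceiling of output quality is set here.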
Phase 2: Generation (AI-Led)
The AI generates the first draft from assembled context, quality standards (word count targets, rubric criteria), template structure (problem statement, architecture, usage, testing, positioning, cross-references), and specific instructions about what to include and what to avoid. A single generation pass consumes approximately 15,000-20,000 tokens and produces 3,000-4,500 words of structured documentation in seconds to minutes. The raw generation is never the final product — it contains approximately 70-80% usable content and 20-30% material requiring revision: overstatements, imprecise technical descriptions, incorrect cross-references, and occasionally fabricated details hallucinated from pattern-matching rather than the provided context.
Phase 3: Review (Human-Led)
The human reviews for factual accuracy (does the README correctly describe the code?), voice consistency (does it match the authorial register across the system?), portfolio positioning (does it connect appropriately to the eight-organ system?), and cross-reference integrity (are links correct, and do dependency claims match the registry?). Review for a standard README takes 30-45 minutes. Flagship READMEs take 60-90 minutes because quality standards and portfolio positioning must be precise.[5] Kahneman's distinction between System 1 (fast, automatic) and System 2 (slow, deliberate) thinking maps directly: AI operates in System 1 mode, producing fluent text rapidly; human review is System 2, catching errors that fluency masks.
Phase 4: Validation (Automated + Human)
After review, documents pass through automated quality gates: registry gate (is the entry updated?), portfolio gate (does it pass the Stranger Test with score >= 90/100 for flagships?), dependency gate (do cross-references respect unidirectional flow?), and completeness gate (zero TBD markers, zero broken links, no placeholder content). Six validation scripts run across the full corpus after each sprint.[6]
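A completeness gate of the kind described can be sketched in a few lines. The rules below (a bare `TBD` token, a markdown link with an empty target) are assumptions for illustration; the system's real validation scripts are not shown in this document.

```python
# Minimal sketch of a completeness gate. The two rules are
# illustrative assumptions, not the system's actual checks.
import re

def completeness_gate(text):
    """Return a list of violations; an empty list means the gate passes."""
    violations = []
    if re.search(r"\bTBD\b", text):
        violations.append("TBD marker present")
    # A markdown link with an empty target is treated as broken.
    for label, target in re.findall(r"\[([^\]]+)\]\(([^)]*)\)", text):
        if not target.strip():
            violations.append(f"empty link target for '{label}'")
    return violations

assert completeness_gate("Done. See [docs](https://example.com).") == []
assert "TBD marker present" in completeness_gate("Status: TBD")
```

Running such a gate over every document after each sprint is cheap, which is what makes "zero TBD markers, zero broken links" enforceable at corpus scale.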
| Human Role | AI Role |
|---|---|
| Strategic direction | Draft generation at speed |
| Context assembly | Consistent formatting |
| Quality criteria definition | Template compliance |
| Voice and tone calibration | Volume production |
| Factual accuracy review | Cross-reference checking |
| Portfolio positioning | Revision iteration |
| Final approval | Structural scaffolding |
Token Economics
The fundamental economic insight: the bottleneck shifts from production to review. Writing 740K words is no longer the expensive part. Reviewing 740K words for accuracy, voice, and positioning is. Effort is measured in LLM API tokens, not human-hours.[7]
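The equivalence figures reported in this section can be checked with a quick back-of-envelope calculation, using the corpus totals the document itself states:

```python
# Back-of-envelope check of the manual-authoring equivalence,
# using the totals reported in this document.
words_total = 740_907     # words produced across the corpus
manual_rate = 200         # stated words/hour for manual authoring
actual_hours = 400        # reported human time (direction + review)

manual_hours = words_total / manual_rate   # equivalent manual hours
multiplier = manual_hours / actual_hours   # efficiency over manual authoring
print(round(manual_hours), round(multiplier, 1))  # → 3705 9.3
```

The ~3,700-hour equivalence and ~9x multiplier quoted in the totals below follow directly from these three inputs.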
| Task Type | Token Budget | Human Time | Equivalent Writer Cost |
|---|---|---|---|
| README Rewrite (flagship) | ~72K tokens | 60-90 min | $375-750 |
| README Populate (new) | ~88K tokens | 45-60 min | $375-750 |
| README Revise (standard) | ~50K tokens | 30-45 min | $225-375 |
| README Evaluate | ~24K tokens | 15-20 min | $75-150 |
| Essay (4,000-5,000 words) | ~120K tokens | 90-120 min | $750-1,500 |
| Validation Pass (per repo) | ~15K tokens | 5-10 min | $37-75 |
| GitHub Actions Workflow | ~55K tokens | 30-45 min | $225-375 |
Total system budget: ~6.5 million tokens
Words produced: 740,907
Token-to-word ratio: ~16:1 (includes context, revision, validation)
Human hours equivalent: ~3,700 hours at 200 words/hour
Actual human time: ~400 hours (direction + review)
Efficiency multiplier: ~9x over manual authoring
API cost per README: ~$1-3 (at Feb 2026 pricing)
Traditional writer cost: ~$36K-108K for equivalent corpus
Phase-Level Budgets
The 6.5M token budget was distributed across three implementation phases. The compressed timeline — all phases completed in approximately 48 hours rather than the originally envisioned multi-week sprint cycle — meant human review was intensive. The token budget proved accurate as a measure of AI processing; the human time was underestimated because 48 hours of continuous review is qualitatively different from 4 weeks of distributed review.
| Phase | TE Budget | Deliverables |
|---|---|---|
| Phase 1: Documentation (Bronze + Silver + Gold) | ~4.4M TE | 72 READMEs, 8 org profiles, 5 essays |
| Phase 2: Micro-Validation | ~1.0M TE | Link audit, TBD scan, registry reconciliation |
| Phase 3: Integration | ~1.1M TE | Gap-fill sprint, orchestration flagship |
Quality Assurance Pipeline
Governance. Every AI-generated document passes through the same promotion state machine (a model describing all possible states a system can be in and the transitions between them) as everything else in the eight-organ system. Specifications, quality gates, validation checklists — authorship method is irrelevant to the standard.[8] Deming's principle that quality must be built into the process, not inspected into the product, applies: the conductor model prevents the most common AI failure — plausible text that doesn't say anything useful — by embedding quality checks at every stage.
Automated Validation Suite
Six automated validation scripts run after each sprint across the full corpus:
- V1 Link Audit: 1,267 links scanned → 7 broken found, fixed
- V2 TBD Scan: 12 matches → all false positives (contextual usage)
- V3 Registry Reconcile: 1 missing entry + 40 description mismatches
- V4 Dependency Check: 1 back-edge violation (III→II) → fixed
- V5 Constitution Gates: Registry, Portfolio, Dependency, Completeness → all pass
- V6 Organ Checks: 8/8 organs verified individually

System-Wide Auditing
The monthly organ audit workflow runs on the first of each month, producing a comprehensive health report. This creates a time series of system quality — not just a snapshot at launch, but an ongoing record that tracks drift, regression, and improvement. The audit surfaces slow-moving quality issues: documentation that becomes inaccurate after code changes, new repos missing from the registry, dependency relationships that changed without registry updates.
Risks and Mitigations
Amendment D of the system constitution — the AI Non-Determinism Acknowledgment — states: "Same inputs produce different strategic outputs across AI models and across time. All AI-generated deliverables require human review."[9] Floridi argues that AI transparency is not merely ethical — it's epistemologically necessary. The risks are real, specific, and observed during production.
| Risk | Incidence | Detection | Mitigation |
|---|---|---|---|
| Hallucinated details | 15-20% of drafts | Manual fact-checking | Every number verified against actual repos |
| Generic boilerplate | Increases with scale | Specificity enforcement | Replace vague claims with concrete numbers |
| Broken cross-references | 1 structural violation | Automated V4 script | Layered human + automated validation |
| Voice drift | After ~40 READMEs | Organ-batch comparison | Batch by organ with organ-specific voice |
Voice drift deserves particular attention. Over a long generation session (58 READMEs in a single sprint), AI output becomes more formulaic. The first 5 READMEs in a sprint are the most distinctive; by README 40, the output converges toward formula and the reviewer's attention to voice diminishes. The mitigation is batching by organ: all ORGAN-I repos generated together with ORGAN-I-specific context, then ORGAN-II, then ORGAN-III. The theoretical precision of ORGAN-I READMEs remains distinct from the commercial, metric-oriented register of ORGAN-III.
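The batching step itself is trivial to mechanize. A minimal sketch, with hypothetical repo records standing in for the real registry entries:

```python
# Sketch of batching generation by organ to limit voice drift.
# The repo records and field names are illustrative assumptions.
from itertools import groupby

repos = [
    {"name": "theory-core", "organ": "ORGAN-I"},
    {"name": "theory-notes", "organ": "ORGAN-I"},
    {"name": "metrics-app", "organ": "ORGAN-III"},
]

def batches_by_organ(repos):
    """Group repos so each organ is generated as one contiguous batch."""
    ordered = sorted(repos, key=lambda r: r["organ"])
    return {organ: [r["name"] for r in group]
            for organ, group in groupby(ordered, key=lambda r: r["organ"])}

batches = batches_by_organ(repos)
print(batches)
```

Each batch is then generated with its organ-specific voice direction attached, so the register resets at every organ boundary instead of drifting across the whole corpus.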
Attribution and Transparency
Attribution. Every document is transparent about its production method. No pretense that a human typed 740K words — and no pretense that AI produced quality work unsupervised. Readers who know the production method can calibrate their trust appropriately.[10]
Artistic intent. The conductor metaphor is literal. A conductor doesn't play instruments, but the performance is their artistic vision. Bourdieu's concept of cultural capital applies: the structural decisions — what goes in each document, how documents relate, what to emphasize for which audience — these are human decisions that constitute the creative work. AI is the orchestra.
What This Model Does Not Do
The conductor model does not write code. The 740K words are documentation — READMEs, essays, profiles, governance files. The code in the system was written by a human developer over years of sustained effort. The model does not replace human judgment about strategy, positioning, or quality. It does not guarantee accuracy — Amendment D exists because AI-generated content requires verification. And it does not produce creative insight: the meta-system essays are structured by human thinking; the AI executes the structure and generates the prose.[11]
Results
The eight-organ system is proof that human-AI collaboration produces real output at scale — not blog posts, but governance specifications, technical documentation, and systems architecture. Licklider's 1960 vision of "man-computer symbiosis" — where humans set goals and computers handle routine processing — is realized here at the scale of an entire institutional system.
Sprint Breakdown
The initial documentation campaign ran across four sprints in approximately 48 hours. A professional technical writer producing 3,000 words per day would need approximately 90 working days — four and a half months — to produce the equivalent output.
| Sprint | Deliverables | Words | Duration |
|---|---|---|---|
| Bronze | 7 flagship READMEs (4,000+ words each) | ~28K | ~8 hours |
| Silver | 58 standard READMEs (3,000+ words each) | ~202K | ~24 hours |
| Gold | 5 essays, community health files, orchestration | ~22K | ~10 hours |
| Gap-Fill | 13 additional READMEs, orchestration flagship | ~18K | ~6 hours |
Lessons Learned
- Context is the bottleneck, not generation. The 15-30 minutes spent assembling context per repo determines output quality more than any model parameter or prompt engineering technique. Rich context produces specific documentation; thin context produces boilerplate regardless of model capability.
- Volume vs. voice. AI generates text quickly, but maintaining a consistent authorial voice across 740K words requires extensive human editing. The seams show if review is rushed.
- Transparency vs. perception. Being fully transparent about AI involvement risks the reaction "AI wrote your portfolio." The mitigation is honesty: every essay explains the process, and the quality speaks for itself. Hiding AI involvement would be worse.[12]
- Quality gates add time but prevent catastrophe. Template compliance, link checking, and human review add 30-40% overhead to each document. Without them, plausible-sounding text masks factual errors. The overhead is the cost of trustworthy output. Deming was right: build quality into the process.
- Batching by domain preserves voice. Generating all repos within an organ together, with organ-specific voice direction, prevents the corpus from converging toward a generic mean.
- Maintenance is harder than production. Producing 740K words is the solved problem. Keeping 740K words accurate as 116 repositories evolve requires automated drift detection, efficient regeneration, and continuous re-validation — the next frontier for this methodology.
Replicability
The methodology is reusable. The components are: a registry or inventory of what needs to be documented; a quality rubric defining "good" documentation; templates structuring generation; access to a capable AI model; human expertise in the domain being documented (not writing expertise, but domain expertise sufficient to verify output); and automated validation checking structural correctness. The eight-organ system is one configuration; the methodology is general.
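The four phases compose into one loop per documented item. The skeleton below is a sketch under stated assumptions: every function name is illustrative, the bodies are placeholders, and the real system's interfaces are not shown here.

```python
# Skeleton of the four-phase conductor loop for one repo.
# All names are hypothetical; bodies are placeholders.
def run_conductor(repo, generate, review, gates):
    """Assemble -> generate -> review -> validate for one item."""
    context = {"repo": repo}          # Phase 1: context assembly (human-led)
    draft = generate(context)         # Phase 2: generation (AI-led)
    approved = review(draft)          # Phase 3: review (human-led)
    failures = [name for name, gate in gates.items()
                if not gate(approved)]
    return approved, failures         # Phase 4: validation (automated)

doc, failures = run_conductor(
    "demo-repo",
    generate=lambda ctx: f"README for {ctx['repo']}",
    review=lambda d: d + " (reviewed)",
    gates={"nonempty": lambda d: bool(d.strip())},
)
print(failures)  # → []
```

Swapping in a different registry, rubric, or model changes the arguments, not the loop — which is the sense in which the methodology is general.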
The quality gates are adaptable. The attribution model is honest. For a creative team evaluating how to integrate AI into practice, this is a working model, not a pitch deck.
References
- Shneiderman, Ben. Human-Centered AI. Oxford University Press, 2022.
- Schön, Donald A. The Reflective Practitioner: How Professionals Think in Action. Basic Books, 1983.
- Csikszentmihalyi, Mihaly. Creativity: Flow and the Psychology of Discovery and Invention. Harper Perennial, 1996.
- Brooks, Frederick P. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, 1975.
- Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011.
- Humble, Jez and David Farley. Continuous Delivery: Reliable Software Releases through Build, Test, and Deploy Automation. Addison-Wesley, 2010.
- Brynjolfsson, Erik and Andrew McAfee. The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton, 2014.
- Deming, W. Edwards. Out of the Crisis. MIT Press, 1986.
- Floridi, Luciano. The Ethics of Artificial Intelligence. Oxford University Press, 2023.
- Bourdieu, Pierre. The Field of Cultural Production. Columbia University Press, 1993.
- Licklider, J.C.R. Man-Computer Symbiosis. IRE Transactions on Human Factors in Electronics, 1960.
- Zuboff, Shoshana. The Age of Surveillance Capitalism. PublicAffairs, 2019.