
Generative Music System

From recursive theory to real-time sound


Generative Music System concept sketch — an algorithmic visualization of the system's underlying logic. Source: dynamic generation.

The Translation Problem

How do you get from a formal system to something people actually experience? That's the core design problem of ORGAN-II. This project translates recursive narrative principles from RE:GE into a real-time generative music system. The music doesn't illustrate the narrative — it is the narrative, in a different medium.[1] Eno's concept of generative music — systems that produce ever-different and changing results through rules rather than fixed compositions — provides the philosophical foundation. The choices made during translation are themselves artistic decisions, and that's where the interesting work lives.[2]

Xenakis opened the field in 1963 by applying stochastic mathematics to compositional structures, arguing that music could be generated from probability clouds rather than authored note-by-note.[12] Cope's Experiments in Musical Intelligence demonstrated that style-specific grammars could generate plausible new works in the manner of established composers.[13] This project stands in both traditions — procedural in its generation, but grounded in an external symbolic system rather than a corpus of prior compositions.

Algorithmic Composition Techniques

Three core algorithmic techniques drive the system, selected for their complementary properties across time scales.

Markov Chain Melodic Generation

At the note level, melodic contours are generated by first-order Markov chains whose transition matrices are weighted by the current symbolic event type from RE:GE.[14] Dodge and Jerse established the theoretical basis for stochastic melodic generation in their foundational computer music text — the chain's memory of one prior note produces locally coherent phrases while permitting global unpredictability. When identity stability is high, the transition matrix is biased toward stepwise motion (intervals of M2/m3); as transformation intensity rises, the matrix shifts toward larger leaps and chromatic inflections. The matrices are recomputed at each ritual phase transition rather than updated continuously, creating perceptible melodic "chapters" that correspond to narrative structure.

markov-melody.ts
// View implementation at:
// github.com/organvm-ii-poiesis/generative-audio-engine/src/markov.ts

const PITCH_CLASSES = Array.from({ length: 12 }, (_, i) => i);

interface MarkovMatrix {
  fromPitch: number;                // pitch class 0–11
  transitions: Map<number, number>; // target pitch class → probability
}

function buildMatrix(eventIntensity: number): MarkovMatrix[] {
  // Low intensity: stepwise bias (m2/M2/m3)
  // High intensity: leap bias (P4/P5/tritone)
  const leapWeight = eventIntensity;
  const stepWeight = 1 - eventIntensity;

  return PITCH_CLASSES.map((pc) => {
    // Intervals wrap around the octave (mod 12)
    const raw: [number, number][] = [
      [(pc + 1) % 12, stepWeight * 0.35], // m2
      [(pc + 2) % 12, stepWeight * 0.30], // M2
      [(pc + 3) % 12, stepWeight * 0.15], // m3
      [(pc + 5) % 12, leapWeight * 0.25], // P4
      [(pc + 7) % 12, leapWeight * 0.20], // P5
      [(pc + 6) % 12, leapWeight * 0.10], // tritone
    ];
    // Normalize so each row is a valid probability distribution
    const total = raw.reduce((sum, [, w]) => sum + w, 0);
    return {
      fromPitch: pc,
      transitions: new Map(raw.map(([k, w]) => [k, w / total] as [number, number])),
    };
  });
}
Markov transition matrix computation — transition probabilities shift with symbolic event intensity
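The bridge later consumes these matrices through a weighted random walk over each row. A minimal sketch of that walk, with the RNG injectable so sequences are reproducible in tests (the names `generateSequence` and `sampleNext` are illustrative, not the engine's own):

```typescript
type Transitions = Map<number, number>; // target pitch class → probability

// Sample one successor pitch class by walking the cumulative distribution.
function sampleNext(row: Transitions, rng: () => number): number {
  const r = rng();
  let cumulative = 0;
  for (const [pc, p] of row) {
    cumulative += p;
    if (r < cumulative) return pc;
  }
  // Fallback for floating-point rounding: return the last entry
  const keys = [...row.keys()];
  return keys[keys.length - 1] ?? -1;
}

// Walk the matrix from a starting pitch class for `length` steps.
function generateSequence(
  matrix: Map<number, Transitions>,
  start: number,
  length: number,
  rng: () => number = Math.random,
): number[] {
  const seq = [start];
  for (let i = 1; i < length; i++) {
    const row = matrix.get(seq[i - 1]);
    if (!row) break; // no outgoing transitions: stop the phrase
    seq.push(sampleNext(row, rng));
  }
  return seq;
}
```

Injecting a seeded RNG is what makes a generated phrase repeatable across rehearsals even though the matrices themselves are recomputed at each phase transition.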

L-System Rhythmic Structure

Rhythmic patterns are generated through Lindenmayer systems — string-rewriting grammars originally devised for botanical growth simulation that produce self-similar rhythmic structures across multiple time scales.[15] An L-system axiom defines the initial rhythmic cell; production rules expand it recursively into a full-bar pattern. The ritual phase from RE:GE selects which production rules are active — ceremony phases produce highly structured, symmetrical rhythms (high regularity), while transformation phases activate rules that introduce syncopation and metric displacement. The self-similar quality of L-system output means that rhythmic motifs at the beat level echo structural patterns at the bar level, creating a coherence that listeners perceive without being able to articulate.

lsystem-rhythm.ts
// View implementation at:
// github.com/organvm-ii-poiesis/generative-audio-engine/src/lsystem.ts

type RhythmToken = 'Q' | 'E' | 'S' | 'R'; // quarter, eighth, sixteenth, rest

const PRODUCTION_RULES: Record<string, Record<RhythmToken, RhythmToken[]>> = {
  ceremony: {
    Q: ['Q', 'Q'],
    E: ['E', 'E'],
    S: ['S', 'R', 'S'],
    R: ['R'],
  },
  transformation: {
    Q: ['E', 'S', 'E'],          // subdivide for complexity
    E: ['S', 'R', 'S', 'S'],     // syncopate
    S: ['S', 'S', 'S', 'S'],
    R: ['S', 'R'],
  },
};

function expandRhythm(axiom: RhythmToken[], phase: string, depth: number): RhythmToken[] {
  if (depth === 0) return axiom;
  const rules = PRODUCTION_RULES[phase] ?? PRODUCTION_RULES.ceremony;
  const expanded = axiom.flatMap((token) => rules[token] ?? [token]);
  return expandRhythm(expanded, phase, depth - 1);
}
L-system rhythm engine — production rules selected by ritual phase identifier

Stochastic Harmonic Density

Harmonic complexity — the density and dissonance of the chord voicing at any moment — is controlled by a stochastic process whose parameters drift continuously with the transformation intensity reading from RE:GE.[12] Xenakis's stochastic music employs Gaussian and Poisson distributions to produce textures that are neither totally random nor periodic — the same principle here governs how many simultaneous pitch classes are active, whether extensions (7ths, 9ths, 11ths) are present, and how rapidly the harmonic center drifts. The result is a harmonic envelope that breathes with the narrative: periods of symbolic stability produce consonant, slow-moving harmony; moments of peak transformation produce dense, chromatic clusters that resolve as the next stable phase begins.
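As a concrete illustration of the Gaussian side of this process, the sketch below draws a normally distributed chord size whose mean and variance both rise with transformation intensity. This is an assumption-laden toy, not the project's code: the function names, the 3-to-7-voice mean, and the clamping range are all invented for the example.

```typescript
// Box–Muller transform: two uniform draws → one standard normal deviate.
function gaussian(rng: () => number): number {
  const u1 = Math.max(rng(), Number.EPSILON); // avoid log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// How many simultaneous pitch classes sound, given transformation intensity.
function voicingDensity(intensity: number, rng: () => number = Math.random): number {
  const mean = 3 + intensity * 4; // triads at rest, 7-note clusters at peak
  const spread = 0.5 + intensity; // more variance as transformation rises
  const n = Math.round(mean + spread * gaussian(rng));
  return Math.min(Math.max(n, 1), 12); // clamp to 1–12 pitch classes
}
```

Because both the mean and the spread track intensity, stable phases produce tightly clustered consonant voicings while peak transformation yields both denser and more volatile chords.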

Three-Layer Architecture

Layer 1: The Symbolic Engine

RE:GE provides the structural backbone — a stream of typed, timestamped symbolic events: entity state changes, ritual phase transitions, myth function activations, recursive depth changes. These events are abstract and carry no inherent sonic representation.[3] Hofstadter's insight that formal systems can generate meaning through structural relationships — not through any intrinsic semantic content — is precisely what makes this translation possible. The symbolic events are meaningful because of how they relate to each other, not because of what they "sound like."

Layer 2: The Sonification Bridge

This is where the artistic decisions live. The bridge maps symbolic events to musical parameters:[4] Hermann et al. establish that effective sonification requires a principled mapping between data dimensions and auditory parameters — arbitrary mappings produce noise, while structurally motivated mappings produce comprehensible sound. Each row in the mapping table below represents a deliberate choice grounded in music-theoretic reasoning.

| Symbolic Event | Musical Parameter | Algorithm | Rationale |
| --- | --- | --- | --- |
| Identity stability | Tonal center strength | Markov bias weight | Stable identity = clear tonic |
| Transformation intensity | Harmonic complexity | Stochastic density | Greater change = more tension |
| Ritual phase | Rhythmic pattern | L-system rules | Ceremony = structured time |
| Recursive depth | Timbral layering | Voice count | Self-reference = voices within voices |
| Myth function type | Melodic contour | Markov transitions | Hero ascends, villain descends |
Figure 1. Sonification bridge mapping — each symbolic event type is translated to a musical parameter through a music-theoretically motivated rationale
sonification-bridge.ts
// View implementation at:
// github.com/organvm-ii-poiesis/generative-audio-engine/src/bridge.ts

interface SymbolicEvent {
  type: 'identity' | 'transformation' | 'ritual' | 'recursion' | 'myth';
  timestamp: number;
  intensity: number;    // 0.0 – 1.0
  depth: number;        // recursive nesting level
  phase?: string;       // ritual phase identifier
  function?: string;    // myth function name
}

interface MusicalParams {
  tonalCenter: number;        // scale degree 0–11
  harmonicComplexity: number; // 0.0 – 1.0 (stochastic density parameter)
  rhythmPattern: RhythmToken[]; // L-system expansion result
  timbralLayers: number;      // voice count (tracks recursive depth)
  melodicContour: number[];   // Markov chain pitch sequence
}

function sonify(event: SymbolicEvent): MusicalParams {
  return {
    tonalCenter: mapIdentityToTonic(event.intensity),
    harmonicComplexity: mapTransformToTension(event.intensity),
    rhythmPattern: expandRhythm(['Q', 'E', 'Q'], event.phase ?? 'ceremony', 2),
    timbralLayers: Math.min(event.depth + 1, 6),
    melodicContour: generateMarkovSequence(buildMatrix(event.intensity), 8),
  };
}
Sonification bridge core — maps symbolic events from RE:GE to musical parameter values through weighted transformation functions

Layer 3: The Performance System

I directed the implementation of a real-time audio synthesis engine built on Tone.js and the WebAudio API. Designed for live contexts — gallery installations, concert performances, and interactive exhibits — the system listens to the symbolic engine and responds with sub-10ms synthesis latency. This layer handles spatialization, oscillator management, and real-time interaction buffering.[5]

graph TD
  SE[Layer 1: Symbolic Engine RE:GE] -->|typed timestamped events| SB[Layer 2: Sonification Bridge]
  SE -->|entity state changes| SB
  SE -->|ritual phase transitions| SB
  SE -->|myth function activations| SB
  SE -->|recursive depth changes| SB
  SB -->|Markov matrix params| MC[Markov Melodic Generator]
  SB -->|L-system production rules| LS[L-System Rhythm Engine]
  SB -->|stochastic density params| SH[Stochastic Harmonic Layer]
  SB -->|voice count| TL[Timbral Layer Manager]
  MC -->|pitch sequence| PS[Layer 3: Performance System]
  LS -->|onset pattern| PS
  SH -->|chord voicing| PS
  TL -->|oscillator pool| PS
  PS -->|WebAudio synthesis| OUT[Live Sound]
  PS -->|VBAP spatialization| OUT
  PS -->|OSC messages| MAXMSP[Max/MSP Bridge]
  MAXMSP -->|gestural control| PS
  MAXMSP -->|performer events| SE
Full signal and data flow — symbolic events from RE:GE are translated through the sonification bridge into real-time synthesis, with performer interaction feeding back into the narrative engine

Integration: Max/MSP Bridge and Real-Time Interaction

The performance system does not operate in isolation. A bidirectional Max/MSP bridge — implemented as an OSC (Open Sound Control) transport layer — connects the generative engine to the broader live electronics ecosystem.[16] Winkler's foundational text on Max/MSP interactive music composition establishes the design patterns this bridge follows: event dispatch, timing quantization, and the performer-as-parameter model where human gesture becomes data input rather than direct sound control.

The bridge operates in two directions. Outbound: the generative engine emits fragment charge levels, timbral layer counts, and harmonic density values as OSC messages to Max/MSP patches, which drive external hardware synthesizers, signal processors, and spatialization systems. Inbound: the performer's gestural controllers (pressure sensors, accelerometers, breath control) emit events that re-enter the engine as synthetic symbolic events — a physical gesture becomes a myth function activation, creating a feedback loop between human performance and machine narrative.

graph LR
  GE[Generative Engine] -->|OSC /charge| MAX[Max/MSP Patch]
  GE -->|OSC /layers| MAX
  GE -->|OSC /harmony| MAX
  MAX -->|CV/Gate| SYNTH[Hardware Synthesizers]
  MAX -->|MIDI CC| FX[Signal Processors]
  MAX -->|VBAP coords| SPATIAL[8-Channel Spatialization]
  CTRL[Performer Controllers] -->|pressure, accel| MAX
  MAX -->|OSC /gesture| GE
  GE -->|synthetic SymbolicEvent| SE[RE:GE Engine]
Max/MSP integration topology — bidirectional OSC transport connecting the generative engine to hardware synthesis and performer controllers
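The inbound half of this loop can be sketched as a small translation function: an incoming gestural OSC message becomes a synthetic symbolic event for the narrative engine. The address `/gesture/pressure`, the message shape, and the event fields here are illustrative assumptions, not the project's actual OSC protocol.

```typescript
// Hypothetical inbound OSC message shape (address pattern + numeric args).
interface GestureMessage {
  address: string; // e.g. '/gesture/pressure'
  args: number[];
}

// Synthetic symbolic event, mirroring the SymbolicEvent fields shown earlier.
interface SyntheticEvent {
  type: 'myth';
  timestamp: number;
  intensity: number; // 0.0–1.0
  depth: number;
}

// Translate a pressure gesture into a myth-function activation; anything
// else is ignored and left to other handlers.
function gestureToEvent(msg: GestureMessage, now: number): SyntheticEvent | null {
  if (msg.address !== '/gesture/pressure' || msg.args.length === 0) return null;
  const pressure = Math.min(Math.max(msg.args[0], 0), 1); // clamp to 0–1
  return { type: 'myth', timestamp: now, intensity: pressure, depth: 0 };
}
```

The clamp matters in practice: hardware controllers routinely overshoot their nominal range, and an out-of-range intensity would distort every downstream mapping.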

Three interaction modes are supported without modifying the core engine: autonomous (the system runs from symbolic events alone, with no performer input), guided (a performer shapes narrative direction via gestural control while the engine fills remaining parameters), and conducted (the performer drives all major phase transitions while the engine handles micro-level algorithmic generation). Each mode corresponds to a different allocation of creative agency between human and machine.[17]

The Discovery: Recursion Sounds Like Counterpoint

This was the project's defining moment, and it wasn't planned. When the recursive engine enters self-referential processing — an entity examining itself, a system modifying its own rules — we initially tried mapping recursive depth to reverb. It sounded terrible.[7] Fux codified the rules of counterpoint as a pedagogical system — species counterpoint, where each "species" adds a layer of rhythmic and melodic complexity atop a cantus firmus. What we discovered is that recursion already is counterpoint: each level of self-reference is a new voice commenting on the voices below it, following rules that derive from but are not identical to the original.

Counterpoint emerged from experimentation: each recursive level gets its own melodic voice, related to but distinct from its parent. The result is Bach-like clarity where you can follow each level of self-reference. Voices commenting on voices. The formal system created the conditions for a musical insight that pure intuition wouldn't have found.[8] Lerdahl and Jackendoff's generative theory demonstrates that musical understanding is hierarchical — listeners parse music into nested grouping structures and metrical structures, precisely the kind of recursive nesting that the engine's symbolic events already encode.

The timbral layer manager implements this insight directly: depth 0 assigns a sine-wave voice at the base frequency; depth 1 assigns a sawtooth voice with a Markov chain derived from the parent's transitions; depth 2 adds a filtered noise voice whose cutoff frequency tracks the parent's harmonic density. Each voice layer follows species counterpoint rules — first species (note against note) at low intensity, second and third species (syncopated and florid) as transformation intensity rises. The voices are rendered through independent WebAudio oscillator nodes connected to a shared gain envelope, so each layer can be independently muted for rehearsal or diagnostic purposes.
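A minimal sketch of that depth-to-voice assignment, following the text's scheme (sine at depth 0, sawtooth at depth 1, filtered noise above, capped at six voices, species tier rising with intensity). The function and type names are illustrative, and the 0.33/0.66 species thresholds are an assumption:

```typescript
type VoiceKind = 'sine' | 'sawtooth' | 'noise';

interface VoiceSpec {
  depth: number;
  kind: VoiceKind;
  species: 1 | 2 | 3; // species counterpoint tier, driven by intensity
}

// One voice per recursive level, up to the six-voice cap noted in the text.
function voicesForDepth(maxDepth: number, intensity: number): VoiceSpec[] {
  const species: 1 | 2 | 3 = intensity < 0.33 ? 1 : intensity < 0.66 ? 2 : 3;
  return Array.from({ length: Math.min(maxDepth + 1, 6) }, (_, d) => ({
    depth: d,
    kind: d === 0 ? 'sine' : d === 1 ? 'sawtooth' : 'noise',
    species,
  }));
}
```

Returning plain voice specs rather than live oscillator nodes keeps the counterpoint logic testable apart from the WebAudio graph; the performance layer can then bind each spec to its own oscillator and shared gain envelope.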

Time Is the Hardest Translation

Narrative time and musical time operate on different scales. We tried linear compression (boring), event-driven (sparse), and finally landed on continuous with event punctuation — an ongoing musical texture driven by entity state, punctuated by significant events. This works because it mirrors how we experience narrative: continuous consciousness punctuated by significant moments.[9] Meadows' distinction between stocks (accumulations) and flows (rates of change) maps directly onto the time problem: entity state is a stock that changes continuously, while symbolic events are discrete flows that perturb the system.

graph LR
  A[Linear Compression] -->|too uniform| X1[Rejected]
  B[Event-Driven Only] -->|too sparse| X2[Rejected]
  C[Continuous + Punctuation] -->|mirrors experience| Y[Adopted]
  C --> S[Entity State = Continuous Texture]
  C --> E[Symbolic Events = Punctuation]
  S -->|harmonic drift via stochastic process| T[Tonal Movement]
  S -->|L-system evolution per bar| T
  E -->|Markov matrix reset| T
  E -->|rhythmic accent via L-system branch| T
  T --> O[Output: Musical Time]
Time-mapping approaches attempted — from linear compression through event-driven to the final continuous-with-punctuation model

The engine runs an internal clock at 48kHz sample rate, synchronized to the WebAudio API's AudioContext timeline. Symbolic events from RE:GE arrive asynchronously; the bridge queues them and schedules delivery at the next bar boundary to avoid mid-phrase parameter jumps. This introduces a maximum 2-bar latency between symbolic event and audible response — acceptable for gallery contexts, tight for live concert. For the concert mode, an override parameter allows immediate scheduling at the cost of occasional phrase truncation.
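The bar-boundary scheduling rule reduces to a small piece of arithmetic. A sketch under stated assumptions: times are in seconds on the AudioContext timeline, `barStart` is at or before the event's arrival, and the `immediate` flag stands in for the concert-mode override; the function name is illustrative.

```typescript
// Return the time at which a queued symbolic event should take effect:
// the next bar boundary at or after arrival, unless immediate mode is on.
function scheduleTime(
  arrival: number,   // event arrival time (seconds)
  barStart: number,  // timeline position of a known bar line (<= arrival)
  barLength: number, // bar duration in seconds
  immediate = false, // concert-mode override: apply mid-phrase
): number {
  if (immediate) return arrival;
  const barsElapsed = Math.ceil((arrival - barStart) / barLength);
  return barStart + barsElapsed * barLength;
}
```

An event landing exactly on a bar line is applied at that bar rather than deferred, so the worst-case deferral is just under one bar per queued event; the 2-bar figure in the text covers an event that arrives while the previous bar's parameters are still being rendered.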

Output and Results

The system has produced three categories of output: continuous generative compositions for gallery installation, live concert works with performer interaction, and network performance experiments with distributed instances.

Gallery recordings range from 6 to 14 hours of continuous unlooped output, demonstrating that the combination of Markov melodic generation, L-system rhythmic structure, and stochastic harmonic density produces material that does not repeat perceptibly across extended durations. The Markov chain state space (12 pitch classes × 12 pitch classes = 144 transitions) combined with 8 possible ritual phases and 6 recursive depth levels gives 6,912 distinct parameter configurations (144 × 8 × 6) — enough for genuine long-form variety.

Concert performances run 20–45 minutes covering one complete mythic cycle. The performer guides narrative direction via gestural control while the engine handles all micro-level algorithmic generation. Audience feedback has consistently identified the counterpoint-as-recursion mapping as perceptually distinctive — listeners describe the multi-voice texture as "voices arguing" or "a conversation with itself," accurately capturing the self-referential symbolic content without having been told about the underlying system.[6] Small's concept of musicking — music as an activity rather than an object — resonates here: the performance is a process of symbolic reasoning made audible, not a fixed composition being reproduced.

Network performances connect multiple instances across geographically distributed performers, each running an independent symbolic engine whose events are broadcast to all other instances via WebSocket transport. The shared symbolic events create emergent harmonic relationships between instances — when two engines simultaneously activate high-intensity transformation events, their independently generated Markov chains probabilistically converge toward tritone relationships, producing a coordinated dissonance that no single performer directed.[10]

Performance Contexts

  • Gallery installation — continuous 6–14 hour operation; spatial audio via 8-channel VBAP creates distinct narrative zones corresponding to symbolic engine sectors
  • Live concert — performer shapes narrative via gestural control; 20–45 minutes; one complete mythic cycle from separation through return
  • Network performance — multiple instances contributing to a shared narrative space via WebSocket event broadcast; emergent coordination without shared clock

Each context demands a different relationship between system autonomy and human control.[11] Galanter identifies the central tension of generative art as the dial between order and chaos — highly ordered systems produce predictable but dull output; highly chaotic systems produce unpredictable but incoherent output. The three performance contexts represent three deliberate positions on this dial: gallery installation biases toward order (the system must sustain interest autonomously over 14 hours); network performance biases toward chaos (emergent coordination is the aesthetic); concert performance sits at the midpoint, with the performer as the dynamic regulator of the order-chaos balance.

By the Numbers

  • Architecture layers: 3
  • Algorithmic techniques: 3
  • Symbolic event types: 5
  • Musical parameters: 5
  • Performance modes: 3
  • Max continuous run: 14h
  • Spatial audio: 8-channel
  • Synthesis latency: <10ms
Figure 2. System metrics — three algorithmic techniques driving five musical parameters across three performance contexts

References

  1. Eno, Brian. Generative Music. In Motion Magazine, 1996.
  2. Roads, Curtis. The Computer Music Tutorial. MIT Press, 1996.
  3. Hofstadter, Douglas. Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, 1979.
  4. Hermann, Thomas, Andy Hunt, and John G. Neuhoff. The Sonification Handbook. Logos Publishing House, 2011.
  5. Rowe, Robert. Interactive Music Systems: Machine Listening and Composing. MIT Press, 1993.
  6. Small, Christopher. Musicking: The Meanings of Performing and Listening. Wesleyan University Press, 1998.
  7. Fux, Johann Joseph. Gradus ad Parnassum. Vienna (trans. Norton, 1965), 1725.
  8. Lerdahl, Fred and Ray Jackendoff. A Generative Theory of Tonal Music. MIT Press, 1983.
  9. Meadows, Donella H. Thinking in Systems: A Primer. Chelsea Green Publishing, 2008.
  10. Murray, Janet H. Hamlet on the Holodeck: The Future of Narrative in Cyberspace. MIT Press, 1997.
  11. Galanter, Philip. What is Generative Art? Complexity Theory as a Context for Art Theory. International Conference on Generative Art, 2003.
  12. Xenakis, Iannis. Formalized Music: Thought and Mathematics in Composition. Pendragon Press, 1963.
  13. Cope, David. Computers and Musical Style. A-R Editions, 1991.
  14. Dodge, Charles and Thomas A. Jerse. Computer Music: Synthesis, Composition, and Performance. Schirmer Books, 1997.
  15. Prusinkiewicz, Przemysław and Aristid Lindenmayer. The Algorithmic Beauty of Plants. Springer-Verlag, 1990.
  16. Winkler, Todd. Composing Interactive Music: Techniques and Ideas Using Max. MIT Press, 1998.
  17. Oliveros, Pauline. Deep Listening: A Composer's Sound Practice. iUniverse, 2005.