
Generative Music System

From recursive theory to real-time sound


Generative Music System concept sketch — an algorithmic visualization of the system's underlying logic. Source: dynamic generation.

The Translation Problem

How do you get from a formal system to something people actually experience? That's the core design problem of ORGAN-II. This project translates recursive narrative principles from RE:GE into a real-time generative music system. The music doesn't illustrate the narrative — it is the narrative, in a different medium.[1] Eno's concept of generative music — systems that produce ever-different and changing results through rules rather than fixed compositions — provides the philosophical foundation. The choices made during translation are themselves artistic decisions, and that's where the interesting work lives.[2]

Xenakis opened the field in 1963 by applying stochastic mathematics to compositional structures, arguing that music could be generated from probability clouds rather than authored note-by-note.[12] Cope's Experiments in Musical Intelligence demonstrated that style-specific grammars could generate plausible new works in the manner of established composers.[13] This project stands in both traditions — procedural in its generation, but grounded in an external symbolic system rather than a corpus of prior compositions.

Algorithmic Composition Techniques

Three core algorithmic techniques drive the system, selected for their complementary properties across time scales.

Markov Chain Melodic Generation

At the note level, melodic contours are generated by first-order Markov chains whose transition matrices are weighted by the current symbolic event type from RE:GE.[14] Dodge and Jerse established the theoretical basis for stochastic melodic generation in their foundational computer music text — the chain's memory of one prior note produces locally coherent phrases while permitting global unpredictability. When identity stability is high, the transition matrix is biased toward stepwise motion (intervals of M2/m3); as transformation intensity rises, the matrix shifts toward larger leaps and chromatic inflections. The matrices are recomputed at each ritual phase transition rather than updated continuously, creating perceptible melodic "chapters" that correspond to narrative structure.

markov-melody.ts
// View implementation at:
// github.com/organvm-ii-poiesis/generative-audio-engine/src/markov.ts

const PITCH_CLASSES = Array.from({ length: 12 }, (_, i) => i);

interface MarkovMatrix {
  fromPitch: number;                // pitch class 0–11
  transitions: Map<number, number>; // target pitch class → probability
}

function buildMatrix(eventIntensity: number): MarkovMatrix[] {
  // Low intensity: stepwise bias (m2/M2/m3)
  // High intensity: leap bias (P4/P5/tritone)
  const leapWeight = eventIntensity;
  const stepWeight = 1 - eventIntensity;

  return PITCH_CLASSES.map((pc) => {
    // Intervals wrap around the octave (mod 12)
    const raw: [number, number][] = [
      [(pc + 1) % 12, stepWeight * 0.35], // m2
      [(pc + 2) % 12, stepWeight * 0.30], // M2
      [(pc + 3) % 12, stepWeight * 0.15], // m3
      [(pc + 5) % 12, leapWeight * 0.25], // P4
      [(pc + 7) % 12, leapWeight * 0.20], // P5
      [(pc + 6) % 12, leapWeight * 0.10], // tritone
    ];
    // Normalize so each row is a valid probability distribution
    const total = raw.reduce((sum, [, w]) => sum + w, 0);
    return {
      fromPitch: pc,
      transitions: new Map(raw.map(([k, w]) => [k, w / total] as [number, number])),
    };
  });
}
Markov transition matrix computation — transition probabilities shift with symbolic event intensity
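The bridge later consumes these matrices through a weighted random walk over each row. A minimal sketch of that walk, with the RNG injectable so sequences are reproducible in tests (the names `generateSequence` and `sampleNext` are illustrative, not the engine's own):

```typescript
type Transitions = Map<number, number>; // target pitch class → probability

// Sample one successor pitch class by walking the cumulative distribution.
function sampleNext(row: Transitions, rng: () => number): number {
  const r = rng();
  let cumulative = 0;
  for (const [pc, p] of row) {
    cumulative += p;
    if (r < cumulative) return pc;
  }
  // Fallback for floating-point rounding: return the last entry
  const keys = [...row.keys()];
  return keys[keys.length - 1] ?? -1;
}

// Walk the matrix from a starting pitch class for `length` steps.
function generateSequence(
  matrix: Map<number, Transitions>,
  start: number,
  length: number,
  rng: () => number = Math.random,
): number[] {
  const seq = [start];
  for (let i = 1; i < length; i++) {
    const row = matrix.get(seq[i - 1]);
    if (!row) break; // no outgoing transitions: stop the phrase
    seq.push(sampleNext(row, rng));
  }
  return seq;
}
```

Injecting a seeded RNG is what makes a generated phrase repeatable across rehearsals even though the matrices themselves are recomputed at each phase transition.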

L-System Rhythmic Structure

Rhythmic patterns are generated through Lindenmayer systems — string-rewriting grammars originally devised for botanical growth simulation that produce self-similar rhythmic structures across multiple time scales.[15] An L-system axiom defines the initial rhythmic cell; production rules expand it recursively into a full-bar pattern. The ritual phase from RE:GE selects which production rules are active — ceremony phases produce highly structured, symmetrical rhythms (high regularity), while transformation phases activate rules that introduce syncopation and metric displacement. The self-similar quality of L-system output means that rhythmic motifs at the beat level echo structural patterns at the bar level, creating a coherence that listeners perceive without being able to articulate.

lsystem-rhythm.ts
// View implementation at:
// github.com/organvm-ii-poiesis/generative-audio-engine/src/lsystem.ts

type RhythmToken = 'Q' | 'E' | 'S' | 'R'; // quarter, eighth, sixteenth, rest

const PRODUCTION_RULES: Record<string, Record<RhythmToken, RhythmToken[]>> = {
  ceremony: {
    Q: ['Q', 'Q'],
    E: ['E', 'E'],
    S: ['S', 'R', 'S'],
    R: ['R'],
  },
  transformation: {
    Q: ['E', 'S', 'E'],          // subdivide for complexity
    E: ['S', 'R', 'S', 'S'],     // syncopate
    S: ['S', 'S', 'S', 'S'],
    R: ['S', 'R'],
  },
};

function expandRhythm(axiom: RhythmToken[], phase: string, depth: number): RhythmToken[] {
  if (depth === 0) return axiom;
  const rules = PRODUCTION_RULES[phase] ?? PRODUCTION_RULES.ceremony;
  const expanded = axiom.flatMap((token) => rules[token] ?? [token]);
  return expandRhythm(expanded, phase, depth - 1);
}
L-system rhythm engine — production rules selected by ritual phase identifier

Stochastic Harmonic Density

Harmonic complexity — the density and dissonance of the chord voicing at any moment — is controlled by a stochastic process whose parameters drift continuously with the transformation intensity reading from RE:GE.[12] Xenakis's stochastic music employs Gaussian and Poisson distributions to produce textures that are neither totally random nor periodic — the same principle here governs how many simultaneous pitch classes are active, whether extensions (7ths, 9ths, 11ths) are present, and how rapidly the harmonic center drifts. The result is a harmonic envelope that breathes with the narrative: periods of symbolic stability produce consonant, slow-moving harmony; moments of peak transformation produce dense, chromatic clusters that resolve as the next stable phase begins.
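As a concrete illustration of the Gaussian side of this process, the sketch below draws a normally distributed chord size whose mean and variance both rise with transformation intensity. This is an assumption-laden toy, not the project's code: the function names, the 3-to-7-voice mean, and the clamping range are all invented for the example.

```typescript
// Box–Muller transform: two uniform draws → one standard normal deviate.
function gaussian(rng: () => number): number {
  const u1 = Math.max(rng(), Number.EPSILON); // avoid log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// How many simultaneous pitch classes sound, given transformation intensity.
function voicingDensity(intensity: number, rng: () => number = Math.random): number {
  const mean = 3 + intensity * 4; // triads at rest, 7-note clusters at peak
  const spread = 0.5 + intensity; // more variance as transformation rises
  const n = Math.round(mean + spread * gaussian(rng));
  return Math.min(Math.max(n, 1), 12); // clamp to 1–12 pitch classes
}
```

Because both the mean and the spread track intensity, stable phases produce tightly clustered consonant voicings while peak transformation yields both denser and more volatile chords.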

Three-Layer Architecture

Layer 1: The Symbolic Engine

RE:GE provides the structural backbone — a stream of typed, timestamped symbolic events: entity state changes, ritual phase transitions, myth function activations, recursive depth changes. These events are abstract and carry no inherent sonic representation.[3] Hofstadter's insight that formal systems can generate meaning through structural relationships — not through any intrinsic semantic content — is precisely what makes this translation possible. The symbolic events are meaningful because of how they relate to each other, not because of what they "sound like."

Layer 2: The Sonification Bridge

This is where the artistic decisions live. The bridge maps symbolic events to musical parameters:[4] Hermann et al. establish that effective sonification requires a principled mapping between data dimensions and auditory parameters — arbitrary mappings produce noise, while structurally motivated mappings produce comprehensible sound. Each row in the mapping table below represents a deliberate choice grounded in music-theoretic reasoning.

| Symbolic Event | Musical Parameter | Algorithm | Rationale |
| --- | --- | --- | --- |
| Identity stability | Tonal center strength | Markov bias weight | Stable identity = clear tonic |
| Transformation intensity | Harmonic complexity | Stochastic density | Greater change = more tension |
| Ritual phase | Rhythmic pattern | L-system rules | Ceremony = structured time |
| Recursive depth | Timbral layering | Voice count | Self-reference = voices within voices |
| Myth function type | Melodic contour | Markov transitions | Hero ascends, villain descends |
Figure 1. Sonification bridge mapping — each symbolic event type is translated to a musical parameter through a music-theoretically motivated rationale
sonification-bridge.ts
// View implementation at:
// github.com/organvm-ii-poiesis/generative-audio-engine/src/bridge.ts

interface SymbolicEvent {
  type: 'identity' | 'transformation' | 'ritual' | 'recursion' | 'myth';
  timestamp: number;
  intensity: number;    // 0.0 – 1.0
  depth: number;        // recursive nesting level
  phase?: string;       // ritual phase identifier
  function?: string;    // myth function name
}

interface MusicalParams {
  tonalCenter: number;        // scale degree 0–11
  harmonicComplexity: number; // 0.0 – 1.0 (stochastic density parameter)
  rhythmPattern: RhythmToken[]; // L-system expansion result
  timbralLayers: number;      // voice count (tracks recursive depth)
  melodicContour: number[];   // Markov chain pitch sequence
}

function sonify(event: SymbolicEvent): MusicalParams {
  return {
    tonalCenter: mapIdentityToTonic(event.intensity),
    harmonicComplexity: mapTransformToTension(event.intensity),
    rhythmPattern: expandRhythm(['Q', 'E', 'Q'], event.phase ?? 'ceremony', 2),
    timbralLayers: Math.min(event.depth + 1, 6),
    melodicContour: generateMarkovSequence(buildMatrix(event.intensity), 8),
  };
}
Sonification bridge core — maps symbolic events from RE:GE to musical parameter values through weighted transformation functions

Layer 3: The Performance System

I directed the implementation of a real-time audio synthesis engine built on Tone.js and the WebAudio API. Designed for live contexts — gallery installations, concert performances, and interactive exhibits — the system listens to the symbolic engine and responds with sub-10ms synthesis latency. This layer handles spatialization, oscillator management, and real-time interaction buffering.[5]

graph TD
  SE[Layer 1: Symbolic Engine RE:GE] -->|typed timestamped events| SB[Layer 2: Sonification Bridge]
  SE -->|entity state changes| SB
  SE -->|ritual phase transitions| SB
  SE -->|myth function activations| SB
  SE -->|recursive depth changes| SB
  SB -->|Markov matrix params| MC[Markov Melodic Generator]
  SB -->|L-system production rules| LS[L-System Rhythm Engine]
  SB -->|stochastic density params| SH[Stochastic Harmonic Layer]
  SB -->|voice count| TL[Timbral Layer Manager]
  MC -->|pitch sequence| PS[Layer 3: Performance System]
  LS -->|onset pattern| PS
  SH -->|chord voicing| PS
  TL -->|oscillator pool| PS
  PS -->|WebAudio synthesis| OUT[Live Sound]
  PS -->|VBAP spatialization| OUT
  PS -->|OSC messages| MAXMSP[Max/MSP Bridge]
  MAXMSP -->|gestural control| PS
  MAXMSP -->|performer events| SE
Full signal and data flow — symbolic events from RE:GE are translated through the sonification bridge into real-time synthesis, with performer interaction feeding back into the narrative engine

Integration: Max/MSP Bridge and Real-Time Interaction

The performance system does not operate in isolation. A bidirectional Max/MSP bridge — implemented as an OSC (Open Sound Control) transport layer — connects the generative engine to the broader live electronics ecosystem.[16] Winkler's foundational text on Max/MSP interactive music composition establishes the design patterns this bridge follows: event dispatch, timing quantization, and the performer-as-parameter model where human gesture becomes data input rather than direct sound control.

The bridge operates in two directions. Outbound: the generative engine emits fragment charge levels, timbral layer counts, and harmonic density values as OSC messages to Max/MSP patches, which drive external hardware synthesizers, signal processors, and spatialization systems. Inbound: the performer's gestural controllers (pressure sensors, accelerometers, breath control) emit events that re-enter the engine as synthetic symbolic events — a physical gesture becomes a myth function activation, creating a feedback loop between human performance and machine narrative.

graph LR
  GE[Generative Engine] -->|OSC /charge| MAX[Max/MSP Patch]
  GE -->|OSC /layers| MAX
  GE -->|OSC /harmony| MAX
  MAX -->|CV/Gate| SYNTH[Hardware Synthesizers]
  MAX -->|MIDI CC| FX[Signal Processors]
  MAX -->|VBAP coords| SPATIAL[8-Channel Spatialization]
  CTRL[Performer Controllers] -->|pressure, accel| MAX
  MAX -->|OSC /gesture| GE
  GE -->|synthetic SymbolicEvent| SE[RE:GE Engine]
Max/MSP integration topology — bidirectional OSC transport connecting the generative engine to hardware synthesis and performer controllers
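The inbound half of this loop can be sketched as a small translation function: an incoming gestural OSC message becomes a synthetic symbolic event for the narrative engine. The address `/gesture/pressure`, the message shape, and the event fields here are illustrative assumptions, not the project's actual OSC protocol.

```typescript
// Hypothetical inbound OSC message shape (address pattern + numeric args).
interface GestureMessage {
  address: string; // e.g. '/gesture/pressure'
  args: number[];
}

// Synthetic symbolic event, mirroring the SymbolicEvent fields shown earlier.
interface SyntheticEvent {
  type: 'myth';
  timestamp: number;
  intensity: number; // 0.0–1.0
  depth: number;
}

// Translate a pressure gesture into a myth-function activation; anything
// else is ignored and left to other handlers.
function gestureToEvent(msg: GestureMessage, now: number): SyntheticEvent | null {
  if (msg.address !== '/gesture/pressure' || msg.args.length === 0) return null;
  const pressure = Math.min(Math.max(msg.args[0], 0), 1); // clamp to 0–1
  return { type: 'myth', timestamp: now, intensity: pressure, depth: 0 };
}
```

The clamp matters in practice: hardware controllers routinely overshoot their nominal range, and an out-of-range intensity would distort every downstream mapping.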

Three interaction modes are supported without modifying the core engine: autonomous (the system runs from symbolic events alone, with no performer input), guided (a performer shapes narrative direction via gestural control while the engine fills remaining parameters), and conducted (the performer drives all major phase transitions while the engine handles micro-level algorithmic generation). Each mode corresponds to a different allocation of creative agency between human and machine.[17]

The Discovery: Recursion Sounds Like Counterpoint

This was the project's defining moment, and it wasn't planned. When the recursive engine enters self-referential processing — an entity examining itself, a system modifying its own rules — we initially tried mapping recursive depth to reverb. It sounded terrible.[7] Fux codified the rules of counterpoint as a pedagogical system — species counterpoint, where each "species" adds a layer of rhythmic and melodic complexity atop a cantus firmus. What we discovered is that recursion already is counterpoint: each level of self-reference is a new voice commenting on the voices below it, following rules that derive from but are not identical to the original.

Counterpoint emerged from experimentation: each recursive level gets its own melodic voice, related to but distinct from its parent. The result is Bach-like clarity where you can follow each level of self-reference. Voices commenting on voices. The formal system created the conditions for a musical insight that pure intuition wouldn't have found.[8] Lerdahl and Jackendoff's generative theory demonstrates that musical understanding is hierarchical — listeners parse music into nested grouping structures and metrical structures, precisely the kind of recursive nesting that the engine's symbolic events already encode.

The timbral layer manager implements this insight directly: depth 0 assigns a sine-wave voice at the base frequency; depth 1 assigns a sawtooth voice with a Markov chain derived from the parent's transitions; depth 2 adds a filtered noise voice whose cutoff frequency tracks the parent's harmonic density. Each voice layer follows species counterpoint rules — first species (note against note) at low intensity, second and third species (syncopated and florid) as transformation intensity rises. The voices are rendered through independent WebAudio oscillator nodes connected to a shared gain envelope, so each layer can be independently muted for rehearsal or diagnostic purposes.
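A minimal sketch of that depth-to-voice assignment, following the text's scheme (sine at depth 0, sawtooth at depth 1, filtered noise above, capped at six voices, species tier rising with intensity). The function and type names are illustrative, and the 0.33/0.66 species thresholds are an assumption:

```typescript
type VoiceKind = 'sine' | 'sawtooth' | 'noise';

interface VoiceSpec {
  depth: number;
  kind: VoiceKind;
  species: 1 | 2 | 3; // species counterpoint tier, driven by intensity
}

// One voice per recursive level, up to the six-voice cap noted in the text.
function voicesForDepth(maxDepth: number, intensity: number): VoiceSpec[] {
  const species: 1 | 2 | 3 = intensity < 0.33 ? 1 : intensity < 0.66 ? 2 : 3;
  return Array.from({ length: Math.min(maxDepth + 1, 6) }, (_, d) => ({
    depth: d,
    kind: d === 0 ? 'sine' : d === 1 ? 'sawtooth' : 'noise',
    species,
  }));
}
```

Returning plain voice specs rather than live oscillator nodes keeps the counterpoint logic testable apart from the WebAudio graph; the performance layer can then bind each spec to its own oscillator and shared gain envelope.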

Time Is the Hardest Translation

Narrative time and musical time operate on different scales. We tried linear compression (boring), event-driven (sparse), and finally landed on continuous with event punctuation — an ongoing musical texture driven by entity state, punctuated by significant events. This works because it mirrors how we experience narrative: continuous consciousness punctuated by significant moments.[9] Meadows' distinction between stocks (accumulations) and flows (rates of change) maps directly onto the time problem: entity state is a stock that changes continuously, while symbolic events are discrete flows that perturb the system.

graph LR
  A[Linear Compression] -->|too uniform| X1[Rejected]
  B[Event-Driven Only] -->|too sparse| X2[Rejected]
  C[Continuous + Punctuation] -->|mirrors experience| Y[Adopted]
  C --> S[Entity State = Continuous Texture]
  C --> E[Symbolic Events = Punctuation]
  S -->|harmonic drift via stochastic process| T[Tonal Movement]
  S -->|L-system evolution per bar| T
  E -->|Markov matrix reset| T
  E -->|rhythmic accent via L-system branch| T
  T --> O[Output: Musical Time]
Time-mapping approaches attempted — from linear compression through event-driven to the final continuous-with-punctuation model

The engine runs an internal clock at 48kHz sample rate, synchronized to the WebAudio API's AudioContext timeline. Symbolic events from RE:GE arrive asynchronously; the bridge queues them and schedules delivery at the next bar boundary to avoid mid-phrase parameter jumps. This introduces a maximum 2-bar latency between symbolic event and audible response — acceptable for gallery contexts, tight for live concert. For the concert mode, an override parameter allows immediate scheduling at the cost of occasional phrase truncation.
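The bar-boundary scheduling rule reduces to a small piece of arithmetic. A sketch under stated assumptions: times are in seconds on the AudioContext timeline, `barStart` is at or before the event's arrival, and the `immediate` flag stands in for the concert-mode override; the function name is illustrative.

```typescript
// Return the time at which a queued symbolic event should take effect:
// the next bar boundary at or after arrival, unless immediate mode is on.
function scheduleTime(
  arrival: number,   // event arrival time (seconds)
  barStart: number,  // timeline position of a known bar line (<= arrival)
  barLength: number, // bar duration in seconds
  immediate = false, // concert-mode override: apply mid-phrase
): number {
  if (immediate) return arrival;
  const barsElapsed = Math.ceil((arrival - barStart) / barLength);
  return barStart + barsElapsed * barLength;
}
```

An event landing exactly on a bar line is applied at that bar rather than deferred, so the worst-case deferral is just under one bar per queued event; the 2-bar figure in the text covers an event that arrives while the previous bar's parameters are still being rendered.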

Output and Results

The system has produced three categories of output: continuous generative compositions for gallery installation, live concert works with performer interaction, and network performance experiments with distributed instances.

Gallery recordings range from 6 to 14 hours of continuous unlooped output, demonstrating that the combination of Markov melodic generation, L-system rhythmic structure, and stochastic harmonic density produces material that does not repeat perceptibly across extended durations. The Markov chain state space (12 pitch classes × 12 pitch classes = 144 transitions) combined with 8 possible ritual phases and 6 recursive depth levels gives 6,912 distinct parameter configurations (144 × 8 × 6) — enough for genuine long-form variety.

Concert performances run 20–45 minutes covering one complete mythic cycle. The performer guides narrative direction via gestural control while the engine handles all micro-level algorithmic generation. Audience feedback has consistently identified the counterpoint-as-recursion mapping as perceptually distinctive — listeners describe the multi-voice texture as "voices arguing" or "a conversation with itself," accurately capturing the self-referential symbolic content without having been told about the underlying system.[6] Small's concept of musicking — music as an activity rather than an object — resonates here: the performance is a process of symbolic reasoning made audible, not a fixed composition being reproduced.

Network performances connect multiple instances across geographically distributed performers, each running an independent symbolic engine whose events are broadcast to all other instances via WebSocket transport. The shared symbolic events create emergent harmonic relationships between instances — when two engines simultaneously activate high-intensity transformation events, their independently generated Markov chains probabilistically converge toward tritone relationships, producing a coordinated dissonance that no single performer directed.[10]

Performance Contexts

  • Gallery installation — continuous 6–14 hour operation; spatial audio via 8-channel VBAP creates distinct narrative zones corresponding to symbolic engine sectors
  • Live concert — performer shapes narrative via gestural control; 20–45 minutes; one complete mythic cycle from separation through return
  • Network performance — multiple instances contributing to a shared narrative space via WebSocket event broadcast; emergent coordination without shared clock

Each context demands a different relationship between system autonomy and human control.[11] Galanter identifies the central tension of generative art as the dial between order and chaos — highly ordered systems produce predictable but dull output; highly chaotic systems produce unpredictable but incoherent output. The three performance contexts represent three deliberate positions on this dial: gallery installation biases toward order (the system must sustain interest autonomously over 14 hours); network performance biases toward chaos (emergent coordination is the aesthetic); concert performance sits at the midpoint, with the performer as the dynamic regulator of the order-chaos balance.

By the Numbers

  • Architecture layers: 3
  • Algorithmic techniques: 3
  • Symbolic event types: 5
  • Musical parameters: 5
  • Performance modes: 3
  • Max continuous run: 14h
  • Spatial audio: 8-channel
  • Synthesis latency: <10ms
Figure 2. System metrics — three algorithmic techniques driving five musical parameters across three performance contexts

References

  1. Eno, Brian. Generative Music. In Motion Magazine, 1996.
  2. Roads, Curtis. The Computer Music Tutorial. MIT Press, 1996.
  3. Hofstadter, Douglas. Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books, 1979.
  4. Hermann, Thomas, Andy Hunt, and John G. Neuhoff. The Sonification Handbook. Logos Publishing House, 2011.
  5. Rowe, Robert. Interactive Music Systems: Machine Listening and Composing. MIT Press, 1993.
  6. Small, Christopher. Musicking: The Meanings of Performing and Listening. Wesleyan University Press, 1998.
  7. Fux, Johann Joseph. Gradus ad Parnassum. Vienna (trans. Norton, 1965), 1725.
  8. Lerdahl, Fred and Ray Jackendoff. A Generative Theory of Tonal Music. MIT Press, 1983.
  9. Meadows, Donella H. Thinking in Systems: A Primer. Chelsea Green Publishing, 2008.
  10. Murray, Janet H. Hamlet on the Holodeck: The Future of Narrative in Cyberspace. MIT Press, 1997.
  11. Galanter, Philip. What is Generative Art? Complexity Theory as a Context for Art Theory. International Conference on Generative Art, 2003.
  12. Xenakis, Iannis. Formalized Music: Thought and Mathematics in Composition. Pendragon Press, 1963.
  13. Cope, David. Computers and Musical Style. A-R Editions, 1991.
  14. Dodge, Charles and Thomas A. Jerse. Computer Music: Synthesis, Composition, and Performance. Schirmer Books, 1997.
  15. Prusinkiewicz, Przemysław and Aristid Lindenmayer. The Algorithmic Beauty of Plants. Springer-Verlag, 1990.
  16. Winkler, Todd. Composing Interactive Music: Techniques and Ideas Using Max. MIT Press, 1998.
  17. Oliveros, Pauline. Deep Listening: A Composer's Sound Practice. iUniverse, 2005.