LingFrame — Linguistic Atomization Framework
Computational rhetoric across 46 works and 15 languages
The Problem: Rhetoric Without Instruments
For twenty-four centuries, the study of rhetoric has operated with essentially the same analytical toolkit: close reading, taxonomic classification, and argumentative intuition.[1] Aristotle's tripartite framework — ethos, pathos, logos — remains the dominant analytical lens, not because it is complete but because no systematic alternative has emerged that preserves humanistic categories while enabling computational precision. The digital humanities have produced powerful tools for text analysis, but they overwhelmingly target semantic content: what a text means (topic modeling, sentiment analysis, named entity recognition) rather than what a text does — how its syntactic structures produce rhythmic effects, how its figurative language distributes across argumentative architecture, how its rhetorical moves sequence to produce persuasion.[2] Burke's dramatistic framework — his insistence that language is fundamentally a mode of action, not merely representation — provides the theoretical mandate for this project. The Linguistic Atomization Framework treats texts not as containers of meaning but as machines that produce effects, and it decomposes those machines into their smallest operational parts.
Corpus and Scale
The framework operates on a curated corpus of 46 canonical works spanning 15+ languages and 12 distinct literary-rhetorical traditions. This is not a convenience sample — it is a deliberate attempt to test every analytical claim against genuine linguistic diversity.[3] Curtius demonstrated that the rhetorical traditions of Europe form a continuous chain from antiquity through the Middle Ages to modernity, with topoi (commonplaces) serving as the connective tissue. The corpus extends this genealogy beyond Europe: Sanskrit rhetoric (the alamkara tradition), classical Arabic (balagha), Chinese parallel prose (pianwen), and Japanese zuihitsu all have independent theoretical frameworks for analyzing how texts produce effects. By including works from each tradition alongside its theoretical apparatus, the framework can test whether analytical categories developed for Greek oratory generalize to Heian-period Japanese prose — and where they do not, the failures are as informative as the successes.[4] Auerbach's method of anchoring broad historical claims in microscopic textual analysis — his famous comparison of Homer and Genesis in the opening chapter — is the direct methodological ancestor of this framework's approach: every macro-level claim about rhetorical pattern must be grounded in atomized, verifiable textual evidence.
| Tradition | Language(s) | Representative Work | Period |
|---|---|---|---|
| Greek Classical | Ancient Greek | Aristotle, Rhetoric | 4th c. BCE |
| Roman Oratory | Latin | Cicero, De Oratore | 1st c. BCE |
| Sanskrit Poetics | Sanskrit | Bharata, Natyashastra | 2nd c. BCE |
| Arabic Rhetoric | Classical Arabic | Al-Jurjani, Dala'il al-I'jaz | 11th c. CE |
| Medieval European | Latin, Old French | Dante, De Vulgari Eloquentia | 14th c. CE |
| Chinese Parallel Prose | Classical Chinese | Liu Xie, Wenxin Diaolong | 5th c. CE |
| Japanese Zuihitsu | Classical Japanese | Sei Shonagon, The Pillow Book | 11th c. CE |
| Renaissance Humanism | Italian, Latin | Erasmus, De Copia | 16th c. CE |
| Enlightenment | English, French | Blair, Lectures on Rhetoric | 18th c. CE |
| Russian Formalism | Russian | Shklovsky, Art as Device | 20th c. CE |
| Structuralism | French | Genette, Narrative Discourse | 20th c. CE |
| Latin American Boom | Spanish | Borges, Ficciones | 20th c. CE |
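The corpus table above lends itself to structured metadata. The following is a minimal sketch of how a corpus registry might be represented; the class and field names are illustrative assumptions, not the framework's actual schema, and only three of the 46 works are shown.

```python
from dataclasses import dataclass

# Hypothetical corpus-registry entry; field names are illustrative,
# not the framework's actual schema.
@dataclass(frozen=True)
class CorpusWork:
    tradition: str
    languages: tuple[str, ...]
    author: str
    title: str
    period: str

CORPUS = [
    CorpusWork("Greek Classical", ("Ancient Greek",), "Aristotle", "Rhetoric", "4th c. BCE"),
    CorpusWork("Arabic Rhetoric", ("Classical Arabic",), "Al-Jurjani", "Dala'il al-I'jaz", "11th c. CE"),
    CorpusWork("Japanese Zuihitsu", ("Classical Japanese",), "Sei Shonagon", "The Pillow Book", "11th c. CE"),
]

# Group works by tradition, the index a cross-tradition query starts from.
by_tradition: dict[str, list[CorpusWork]] = {}
for work in CORPUS:
    by_tradition.setdefault(work.tradition, []).append(work)
```

Keeping the registry as frozen value objects makes it easy to hash, deduplicate, and feed into the aggregate queries discussed later.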
Six Analysis Modules
The framework applies six configurable analysis modules, each operating across the atomization hierarchy, from morpheme to whole-work architecture, at the granularity levels it declares support for. The modules are: Figurative Language (tropes, schemes, and their distribution patterns), Rhythmic Structure (prosodic analysis, clause-length variation, periodic vs. loose sentence construction), Argumentative Topology (enthymeme detection, topos mapping, warrant analysis), Narrative Mechanics (focalization, temporality, voice), Lexical Stratification (register analysis, etymological layering, code-switching patterns), and Pragmatic Force (speech act classification, implicature, illocutionary sequencing).[5] Genette's taxonomy of narrative categories — order, duration, frequency, mood, voice — structures the Narrative Mechanics module, providing a formal vocabulary for phenomena that close readers intuit but rarely formalize. Each module can operate at multiple granularity levels: the Figurative Language module can identify a metaphor within a single clause or trace the distribution of metaphorical clusters across an entire work's argumentative architecture.[6] Jakobson's six functions of language — referential, emotive, conative, phatic, metalingual, poetic — inform the Pragmatic Force module, ensuring that the framework captures not just what a text says but what it does to its reader at each level of structure.
```python
from dataclasses import dataclass, field
from typing import Protocol, Sequence
from enum import Enum


class GranularityLevel(Enum):
    MORPHEME = "morpheme"
    WORD = "word"
    CLAUSE = "clause"
    SENTENCE = "sentence"
    PARAGRAPH = "paragraph"
    RHETORICAL_MOVE = "rhetorical_move"
    ARGUMENT = "argument"
    WHOLE_WORK = "whole_work"


class AnalysisModule(Protocol):
    """Each module implements this protocol at the granularity levels it supports."""

    name: str

    def analyze(self, unit: "LinguisticUnit") -> "ModuleResult": ...
    def supported_levels(self) -> set[GranularityLevel]: ...


@dataclass
class LinguisticUnit:
    text: str
    level: GranularityLevel
    children: list["LinguisticUnit"] = field(default_factory=list)
    annotations: dict[str, "ModuleResult"] = field(default_factory=dict)
    source_work: str = ""
    position: tuple[int, int] = (0, 0)  # start, end offsets

    def decompose(self, target: GranularityLevel) -> list["LinguisticUnit"]:
        """Recursively decompose to target granularity."""
        if self.level == target:
            return [self]
        return [
            sub_unit
            for child in self.children
            for sub_unit in child.decompose(target)
        ]

    def annotate(self, modules: Sequence[AnalysisModule]) -> None:
        """Apply all applicable modules, then recurse to children."""
        for module in modules:
            if self.level in module.supported_levels():
                self.annotations[module.name] = module.analyze(self)
        for child in self.children:
            child.annotate(modules)
```

Visualization and Output
Raw analysis data is only as useful as its representation. The framework generates two classes of output: interactive HTML visualizations for exploratory analysis and structured data exports (JSON, CSV, TEI-XML) for downstream computational work. The HTML views allow a reader to navigate the atomization hierarchy — clicking a rhetorical move zooms into its constituent sentences, then clauses, then morphemes — with annotations from each analysis module overlaid as color-coded layers. Heat maps show where figurative density clusters; timeline views trace argumentative structure; side-by-side comparisons align parallel passages across translations.[7] Moretti's distant reading methodology — the deliberate refusal to close-read in favor of quantitative pattern detection across large corpora — informs the aggregate views, where individual works dissolve into tradition-level patterns: How does metaphor density in Greek oratory compare to Arabic balagha? Does periodic sentence structure correlate with argumentative complexity across languages? The structured exports feed these questions into statistical analysis, while the interactive views keep the individual text visible and navigable.[8] Manovich's cultural analytics framework — treating cultural artifacts as data while preserving their individuality — provides the design philosophy: every visualization maintains a path from aggregate pattern back to specific textual evidence.
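A structured export reduces to a walk over the unit tree. The following self-contained sketch shows JSON serialization of an annotated hierarchy; the simplified `Unit` class and the annotation label are illustrative stand-ins for the framework's `LinguisticUnit` and module output, not its actual export format.

```python
import json
from dataclasses import dataclass, field

# Simplified stand-in for the framework's LinguisticUnit; the real class
# also carries source-work identifiers and character offsets.
@dataclass
class Unit:
    text: str
    level: str
    annotations: dict[str, str] = field(default_factory=dict)
    children: list["Unit"] = field(default_factory=list)

def to_dict(unit: Unit) -> dict:
    """Recursively flatten the hierarchy into JSON-serializable dicts."""
    return {
        "level": unit.level,
        "text": unit.text,
        "annotations": unit.annotations,
        "children": [to_dict(c) for c in unit.children],
    }

sentence = Unit(
    "Ask not what your country can do for you.",
    "sentence",
    annotations={"figurative_language": "chiasmus (with the following clause)"},
)
print(json.dumps(to_dict(sentence), indent=2))
```

The same recursive walk, retargeted at a templating layer instead of `json.dumps`, would produce the nested, clickable HTML views described above.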
Cross-Tradition Analysis
The most theoretically significant capability of the framework is cross-tradition comparison. Because every work is decomposed using the same hierarchical structure and annotated by the same six modules, it becomes possible to ask questions that have never been systematically addressable: Does the distribution of figurative language in Cicero's periodic oratory resemble the parallel structures of Liu Xie's pianwen prose? Are the argumentative topologies of the Natyashastra commensurable with Aristotle's enthymematic reasoning? These are not idle exercises in comparative rhetoric — they test whether the analytical categories we inherited from the Greco-Roman tradition are genuinely universal or culturally specific artifacts.[3] Curtius traced the survival of classical topoi through medieval Latin into the modern European vernaculars, but his method could not extend beyond the Latin-Christian tradition. The Linguistic Atomization Framework operationalizes a version of his genealogical method that crosses civilizational boundaries, using formal decomposition rather than philological intuition to detect structural homologies.[4] Auerbach demonstrated that a single passage, analyzed with sufficient care, can reveal an entire civilization's relationship to reality. The framework preserves this depth while extending its reach: not one passage from one tradition but every passage from twelve.
| Module | Morpheme | Word | Clause | Sentence | Paragraph | Rhet. Move | Whole Work |
|---|---|---|---|---|---|---|---|
| Figurative Language | * | * | * | * | * | * | |
| Rhythmic Structure | * | * | * | * | * | * | |
| Argumentative Topology | | | * | * | * | * | * |
| Narrative Mechanics | | | * | * | * | * | * |
| Lexical Stratification | * | * | * | * | * | | |
| Pragmatic Force | | | * | * | * | * | * |
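Aggregate questions of the kind posed above (for example, comparing metaphor density across traditions) reduce to simple folds over annotated units. A hedged sketch on toy data follows; the tuples stand in for real output from the Figurative Language module.

```python
from collections import defaultdict

# Toy annotated clauses: (tradition, clause carries a metaphor annotation).
# Real input would be drawn from the Figurative Language module's results.
clauses = [
    ("Greek Classical", True), ("Greek Classical", False),
    ("Arabic Rhetoric", True), ("Arabic Rhetoric", True),
    ("Arabic Rhetoric", False),
]

counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [metaphors, total]
for tradition, has_metaphor in clauses:
    counts[tradition][0] += int(has_metaphor)
    counts[tradition][1] += 1

# Metaphor density = metaphor-bearing clauses / total clauses per tradition.
density = {t: m / n for t, (m, n) in counts.items()}
```

Because every tradition's works are decomposed to the same clause level, the densities are directly comparable; the interesting work lies in deciding whether the annotation categories themselves are commensurable.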
Connection to ORGAN-II: From Analysis to Generation
The Linguistic Atomization Framework exists within ORGAN-I (Theoria) — the theoretical foundation of the eight-organ system. Its purpose is not merely scholarly: it directly enables the generative work of ORGAN-II (Poiesis). You cannot generate compelling text — whether procedural poetry, data-driven narrative, or interactive fiction — without understanding how compelling text is constructed at every level of granularity.[9] Reas and Fry's Processing project demonstrated that creative coding requires a deep understanding of the formal principles underlying visual art — color theory, composition, gestalt perception — before generative algorithms can produce aesthetically meaningful output. The Linguistic Atomization Framework provides the equivalent foundation for text: a formal inventory of rhetorical devices, rhythmic patterns, argumentative structures, and narrative mechanics that generative systems can draw upon as compositional primitives.[10] Galanter's definition of generative art — a practice where the artist creates a system that in turn creates the artwork — maps directly onto the relationship between ORGAN-I and ORGAN-II. The atomization framework is the knowledge base; the generative systems are the creative agents that query it. Without atomization, generation is blind pattern-matching. With it, generation becomes informed composition.
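The handoff from ORGAN-I to ORGAN-II can be pictured as a query interface over the atomized inventory. The sketch below is purely illustrative: the inventory contents, function names, and alternation policy are invented for the example and do not describe the actual ORGAN-II interface.

```python
import random

# Hypothetical inventory of rhetorical devices mined by ORGAN-I;
# in the real system these would come from the annotated corpus.
DEVICE_INVENTORY = {
    "scheme": ["anaphora", "chiasmus", "isocolon"],
    "trope": ["metaphor", "metonymy", "hyperbole"],
}

def compose_plan(rng: random.Random, moves: int) -> list[tuple[str, str]]:
    """Pick one device per rhetorical move, alternating scheme and trope."""
    plan = []
    for i in range(moves):
        kind = "scheme" if i % 2 == 0 else "trope"
        plan.append((kind, rng.choice(DEVICE_INVENTORY[kind])))
    return plan

plan = compose_plan(random.Random(0), 4)
```

Even this toy shows the division of labor: the inventory is declarative knowledge produced by analysis, while the composition policy belongs to the generative agent that queries it.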
Testing Rhetoric Computationally
The test suite contains 142 tests organized across three categories: decomposition correctness (does the hierarchical structure preserve source text fidelity?), module agreement (do independent modules produce consistent annotations on shared units?), and cross-tradition validation (do analytical categories produce meaningful results outside their tradition of origin?).[5] Genette's own methodology — rigorously testing narratological categories against texts that should resist them — provides the testing philosophy. The most informative tests are the ones that fail: when the Argumentative Topology module, designed primarily around Aristotelian enthymeme structure, encounters a passage from the Natyashastra that uses a fundamentally different reasoning framework, that failure illuminates the limits of the analytical category rather than a bug in the code.[6] Jakobson's structural method — isolating the poetic function by contrasting it with the other five functions of language — informs the module agreement tests: if the Figurative Language and Pragmatic Force modules both annotate the same clause, their annotations should be complementary (describing different aspects of the same phenomenon) rather than contradictory.
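A test in the first category (decomposition correctness) might look like the following self-contained sketch. The simplified `Unit` stands in for `LinguisticUnit`, and the fidelity criterion (leaf texts rejoin to the parent's text) is an assumed invariant for illustration, not necessarily the suite's exact assertion.

```python
from dataclasses import dataclass, field

# Simplified stand-in for the framework's LinguisticUnit.
@dataclass
class Unit:
    text: str
    level: str
    children: list["Unit"] = field(default_factory=list)

    def decompose(self, target: str) -> list["Unit"]:
        if self.level == target:
            return [self]
        return [u for c in self.children for u in c.decompose(target)]

def test_decomposition_preserves_text() -> None:
    """Assumed fidelity invariant: leaf texts rejoin to the source text."""
    sentence = Unit(
        "Veni, vidi, vici.",
        "sentence",
        children=[
            Unit("Veni,", "clause"),
            Unit("vidi,", "clause"),
            Unit("vici.", "clause"),
        ],
    )
    clauses = sentence.decompose("clause")
    assert " ".join(u.text for u in clauses) == sentence.text

test_decomposition_preserves_text()
```

The cross-tradition validation tests invert this logic: there the expected outcome of applying a category outside its home tradition is sometimes a documented failure rather than a pass.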
By the Numbers
- 46 canonical works across 15 languages and 12 literary-rhetorical traditions
- 6 analysis modules spanning 8 granularity levels, from morpheme to whole work
- 142 tests in three categories: decomposition correctness, module agreement, cross-tradition validation
References
1. Aristotle. Rhetoric. Oxford University Press (trans. Kennedy, 2007), c. 350 BCE.
2. Burke, Kenneth. A Rhetoric of Motives. University of California Press, 1950.
3. Curtius, Ernst Robert. European Literature and the Latin Middle Ages. Princeton University Press (trans. Trask), 1948.
4. Auerbach, Erich. Mimesis: The Representation of Reality in Western Literature. Princeton University Press (trans. Trask), 1946.
5. Genette, Gérard. Narrative Discourse: An Essay in Method. Cornell University Press (trans. Lewin), 1972.
6. Jakobson, Roman. "Linguistics and Poetics." In Style in Language, ed. Sebeok. MIT Press, 1960.
7. Moretti, Franco. Distant Reading. Verso Books, 2013.
8. Manovich, Lev. Cultural Analytics. MIT Press, 2020.
9. Reas, Casey, and Ben Fry. Processing: A Programming Handbook for Visual Designers and Artists. MIT Press, 2007.
10. Galanter, Philip. "What Is Generative Art? Complexity Theory as a Context for Art Theory." International Conference on Generative Art, 2003.