TheSu XML (Thesis-Support XML) is a stand-off annotation schema designed for modelling ideas conveyed by textual sources and for linking those ideas to their surrounding discourse contexts.

The name derives from its two core components:

  • 'thesis' — A declarative statement (a claim, assertion, or proposition) conveyed explicitly or implicitly by a source
  • 'support' — A textual element that employs or targets a thesis (or another support, or other discourse components), providing its discursive context through argumentative justifications, explanatory reformulations, framing introductions, or other functional relationships

How It Works

The schema enables researchers to annotate individual theses, specifying details such as thematic classifiers, formal structure, and associated speakers. More significantly, it connects theses to relevant discursive contexts modelled as supports, allowing accurate mapping of the internal consistency of a discourse—that is, of the functional interconnections between its components.

For example, in the annotation of a passage from Plutarch, a support employs the thesis "Lead can produce the most cooling of deadly drugs" as an argumentative premise to justify the conclusion "Lead is among the naturally cold substances". The same support therefore employs the first thesis (as evidence) and targets the second (as the claim being supported). More generally, a support can employ one or more theses to target other theses, which is how TheSu XML captures complex argumentative structures within a discourse.
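The employs/targets relationship in this example can be sketched in Python. This is only an illustrative in-memory model: the class names, field names, and identifiers (Thesis, Support, "t1", "s1") are assumptions made for the sketch and are not TheSu XML's actual element or attribute names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thesis:
    """A declarative statement conveyed by a source (hypothetical model)."""
    id: str
    text: str


@dataclass(frozen=True)
class Support:
    """A discourse element that employs some theses in order to target others."""
    id: str
    function: str                    # e.g. "argumentative"; label assumed for this sketch
    employs: tuple[str, ...] = ()    # ids of theses used as material (here: premises)
    targets: tuple[str, ...] = ()    # ids of theses the support bears on (here: conclusions)


theses = {
    "t1": Thesis("t1", "Lead can produce the most cooling of deadly drugs"),
    "t2": Thesis("t2", "Lead is among the naturally cold substances"),
}
supports = [Support("s1", "argumentative", employs=("t1",), targets=("t2",))]


def premises_for(thesis_id: str) -> list[str]:
    """Return the texts of theses employed by supports that target the given thesis."""
    return [
        theses[premise_id].text
        for support in supports
        if thesis_id in support.targets
        for premise_id in support.employs
    ]


print(premises_for("t2"))  # ['Lead can produce the most cooling of deadly drugs']
```

Keeping employs and targets as separate lists of identifiers mirrors the idea that one support can draw on several theses and bear on several others.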

Comparative Analysis Across Sources

A distinctive feature is TheSu XML's support for comparative analysis across different parts of the same source or multiple sources. The schema distinguishes between:

  • Theses — Specific statements occurring within sources
  • Propositions — Abstract ideas that multiple theses represent

By linking multiple statements interpreted as expressing essentially the same core idea under a single proposition—even across different contexts or sources—researchers can track how that idea varies in its details and is presented, used, or supported throughout a text or corpus.

For instance, Plutarch's thesis "Lead white is the most cooling of deadly drugs" and Dioscorides' statement that lead white has a "cooling" property can both be linked to the abstract proposition "Lead white is a cooling substance". With both theses attached to one proposition, researchers can systematically compare how the same core idea appears, is presented, and functions in each source, revealing patterns of intellectual transmission and variation.
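A minimal sketch of proposition-based linking, in Python, using the two attestations mentioned above. The tuple layout and the identifiers ("t1", "p1") are hypothetical and chosen for illustration only; they do not reproduce the schema's own syntax.

```python
from collections import defaultdict

# Abstract propositions, keyed by a hypothetical id.
propositions = {"p1": "Lead white is a cooling substance"}

# Each thesis: (thesis id, source, wording, id of the proposition it expresses).
theses = [
    ("t1", "Plutarch", "Lead white is the most cooling of deadly drugs", "p1"),
    ("t2", "Dioscorides", "Lead white has a cooling property", "p1"),
]


def attestations(thesis_records):
    """Group theses by the proposition they are interpreted as expressing."""
    by_proposition = defaultdict(list)
    for _thesis_id, source, wording, proposition_id in thesis_records:
        by_proposition[proposition_id].append((source, wording))
    return dict(by_proposition)


for proposition_id, items in attestations(theses).items():
    print(propositions[proposition_id])
    for source, wording in items:
        print(f"  {source}: {wording}")
```

Grouping by proposition id rather than by wording is what allows differently phrased statements to be compared as instances of the same idea.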

Visualisation and Applications

Both thesis-support relationships within individual sources and proposition-based comparative connections result in network structures that can be automatically visualised. One proposed approach combines two complementary visualisation designs:

  • Detailed argumentation maps — Hierarchical tree structures for close reading and individual argument analysis
  • Larger network graphs — Force-directed layouts for overviews, distant reading methodologies, and comparative analysis

These visualisation strategies represent one possible use of TheSu XML's machine-readable data. The schema's flexible structure enables researchers to develop custom visualisation approaches tailored to their specific research questions and analytical needs.

The schema is tailored for research in the history of ideas, philosophy, science, and technology, where systematic analysis of discourse structures and comparative study of ideas across sources are central concerns.

Purpose and Scope

The Challenge of Historical Text Analysis

Historical research on ideas, philosophy, and science often requires examining claims across multiple sources, genres, and time periods. This presents a methodological challenge: close reading of individual sources provides depth but limits scope, while broad comparative studies risk de-contextualising statements, leading to misinterpretation.

When researchers extract claims from their original contexts to compare them across sources, they may lose crucial information about how those claims function within their immediate discourse. A statement that serves as a serious philosophical tenet in one context might appear as rhetorical embellishment in another. Without access to the surrounding discourse, researchers cannot accurately assess a claim's purpose, commitment level, or argumentative role.

TheSu XML addresses this challenge by enabling researchers to map claims to their discursive contexts systematically. It preserves the connection between statements and their surrounding discourse, making it possible to conduct broad-scope comparative studies while maintaining access to the micro-contexts that prevent misinterpretation.

In his dialogue On the Apparent Face in the Orb of the Moon, Plutarch's spokesperson draws a parallel between the cosmos and the human body when presenting the claim that "the cosmos is rationally arranged": the stars are like eyes, the sun spreads heat and light as a heart spreads blood and breath, and so forth. However, this does not necessarily mean that Plutarch derived his conclusion about the cosmos being rationally arranged from human physiology, or that he lacked other grounds for this belief. He may have had other reasons, logically prior or more scientific, and chosen to express them through biological comparisons for clarity or stylistic effect. Not all analogies and examples serve argumentative functions: some are intended for clarification, embellishment, or other purposes.

When analysing Plutarch's philosophical and scientific thought, how can we determine whether a claim such as "the cosmos is rationally arranged" depends structurally on an analogy with the human body, or is supported by other forms of argument, such as logical or empirical demonstrations? The answer requires examining all instances of the claim across Plutarch's works, as well as in sources that influenced him, and comparing how they are presented and justified in different contexts.

TheSu XML supports this type of comparative analysis: through digital annotation, researchers can gain an overview of all discourses containing the claim, and assess whether a biological analogy serves as the sole supporting argument or appears alongside logical demonstrations, empirical considerations, or appeals to authorities. When other types of arguments are present, the conclusion that Plutarch applied biological thinking to his understanding of the cosmos becomes unwarranted, at least for this particular case. These analytical capabilities support hypothesis falsification and offer stronger methodological foundations for drawing historical conclusions.
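The check described here (does a claim rest only on an analogy, or also on other kinds of argument?) can be sketched as a small query over annotation records. Everything in the sketch is assumed for illustration: the record fields, the "kind" labels, the proposition id, and above all the three sample records, which are invented placeholders and do not report findings about Plutarch's text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportRecord:
    passage: str    # placeholder locator, not a real citation
    function: str   # "argumentative", "expositive", ... (labels assumed)
    kind: str       # "analogy", "empirical", "authority", ... (labels assumed)
    target: str     # id of the proposition the support bears on


def argument_kinds(records, proposition_id):
    """Kinds of argumentative supports that target the given proposition."""
    return {
        r.kind
        for r in records
        if r.target == proposition_id and r.function == "argumentative"
    }


def rests_only_on(records, proposition_id, kind):
    """True if every argumentative support for the proposition is of one kind."""
    return argument_kinds(records, proposition_id) == {kind}


# Invented placeholder records, for demonstration only.
records = [
    SupportRecord("passage A", "expositive", "analogy", "p_cosmos"),
    SupportRecord("passage B", "argumentative", "analogy", "p_cosmos"),
    SupportRecord("passage C", "argumentative", "empirical", "p_cosmos"),
]

print(rests_only_on(records, "p_cosmos", "analogy"))      # False
print(rests_only_on(records[:2], "p_cosmos", "analogy"))  # True
```

Note that the expositive analogy in "passage A" is ignored by the check: only supports annotated as argumentative count as grounds for the claim.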

Systematic Methods for Discourse Analysis

TheSu XML enables researchers to transform their interpretative readings of sources into structured, machine-readable datasets. The schema provides systematic methods for:

  • Distinguishing support functions — Supports can function in multiple ways: argumentatively (as justifications), expositively (as clarifications), expansively (as elaborations), or contextually (as framing). This classification enables researchers to determine whether analogies, examples, or other elements contribute structurally to arguments or serve merely stylistic purposes.
  • Interpretative reconstruction — Encode reasonable argumentative and rhetorical links implied by textual evidence, including implicit premises, conclusions, and enthymematic arguments. This mirrors established practices in history and philology, acknowledging interpreter mediation while maintaining fidelity to sources.
  • Procedural analysis — Model recipes and technical procedures as sequences of phases, with detailed annotation of ingredients, duration, repetitions, and conditions. Variant procedures can be mapped to primary processes, identifying commonalities and divergences—essential for research in the history of science and technology.
  • Speaker attribution and commitment — Associate claims with authors or speakers and specify commitment levels (earnest, jest, hypothesis, objection), enabling analysis of how different voices function within dialogues or how authors present ideas they may not fully endorse.
  • Quantitative analysis — The network structures formed by annotations enable computational analysis, including centrality measures to identify key theses, clustering to reveal thematic groups, and path analysis to trace argumentative chains across larger corpora.
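As one illustration of the methods listed above, the procedural analysis item can be sketched as a comparison between a primary procedure and a variant, each modelled as a sequence of phases. The Phase fields follow the details named in the list (ingredients, duration, repetitions), but the class itself, the compare function, and the sample phases are hypothetical placeholders, not taken from any source or from the schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phase:
    action: str
    ingredients: tuple = ()
    duration: Optional[str] = None
    repetitions: int = 1


def compare(primary, variant):
    """List the actions two procedures share and those unique to each."""
    primary_actions = [phase.action for phase in primary]
    variant_actions = [phase.action for phase in variant]
    return {
        "shared": [a for a in primary_actions if a in variant_actions],
        "only_primary": [a for a in primary_actions if a not in variant_actions],
        "only_variant": [a for a in variant_actions if a not in primary_actions],
    }


# Invented placeholder procedures, for demonstration only.
primary = [Phase("grind"), Phase("wash", repetitions=3), Phase("dry", duration="1 day")]
variant = [Phase("grind"), Phase("heat"), Phase("dry")]

print(compare(primary, variant))
# {'shared': ['grind', 'dry'], 'only_primary': ['wash'], 'only_variant': ['heat']}
```

This comparison looks only at action labels; a fuller treatment would also compare ingredients, durations, and the order of phases.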

Target Domains

TheSu XML is tailored for research in:

History of Ideas

Compare ideas across sources through proposition-based linking, track intellectual traditions, map influence networks between authors

History of Philosophy

Reconstruct argumentative structures including implicit reasoning, compare philosophical positions, trace how arguments evolve across works

History of Science and Technology

Analyse recipes and technical procedures through sequence annotation, compare variant processes, support experimental archaeology through detailed procedure mapping

Philology

Map discourse structures and rhetorical techniques, distinguish between literal statements and deliberate metaphors, analyse how analogies function within arguments

Digital Humanities

Create interoperable datasets following FAIR principles, enable network visualisation and quantitative analysis of discourse structures

Key Characteristics

Stand-Off Annotation

Annotations are stored separately from source texts, referencing text spans through external pointers rather than embedding markup directly in sources. TheSu XML does not standardise source segmentation—methods may vary according to project needs. This flexible approach preserves source integrity and enables:

  • Flexible referencing without modifying original sources
  • Multiple annotation layers on the same text
  • Discontinuous text references—skipping irrelevant words in the middle of a statement (e.g., referencing "the most cooling of deadly drugs [...] lead white" while skipping an intervening verb)
  • Preservation of source structure—existing markup (such as TEI divs and milestones) remains accessible for precise citation
  • Avoidance of overlapping hierarchy issues inherent in embedded markup
  • Applicability beyond text—can be adapted for any medium where components can be assigned identifiers (images, audio, video)
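The stand-off principle, including discontinuous references, can be sketched in Python as follows. The token identifiers ("w1" to "w9"), the span format, and the resolve function are assumptions made for this sketch; since TheSu XML does not standardise source segmentation, a real project may identify text spans quite differently. The "[verb]" token stands in for the intervening verb that the example above skips.

```python
# Source text, segmented into identified tokens. The annotation never edits this.
source_tokens = {
    "w1": "the", "w2": "most", "w3": "cooling", "w4": "of",
    "w5": "deadly", "w6": "drugs", "w7": "[verb]", "w8": "lead", "w9": "white",
}

# A stand-off annotation: it only points at token ranges (inclusive),
# and may skip material between them.
annotation = {"id": "t1", "spans": [("w1", "w6"), ("w8", "w9")]}


def resolve(spans, tokens):
    """Rebuild the referenced wording, marking each skipped stretch with [...]."""
    ids = list(tokens)  # dicts preserve insertion order, i.e. text order
    segments = []
    for start_id, end_id in spans:
        start, end = ids.index(start_id), ids.index(end_id)
        segments.append(" ".join(tokens[i] for i in ids[start:end + 1]))
    return " [...] ".join(segments)


print(resolve(annotation["spans"], source_tokens))
# the most cooling of deadly drugs [...] lead white
```

Because the annotation holds only pointers, several independent annotation layers can reference the same tokens without conflicting with one another.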

Interpretative Reconstruction

TheSu XML encodes reasonable argumentative and rhetorical links implied by textual evidence, not just surface syntax. This interpretative approach includes:

  • Implicit premises and conclusions—reconstructing missing steps in enthymematic arguments (e.g., if a text argues "Socrates is mortal" using the evidence "All men are mortal", the implicit premise "Socrates is a man" can be reconstructed to show the logical connection)
  • Pragmatic interpretation of discourse functions—determining whether analogies serve argumentative purposes or merely clarify or embellish
  • Acknowledgment of interpreter mediation while maintaining fidelity to sources—mirroring practices in scholarly translation and commentary
  • Distinction from surface-level annotation frameworks that map only explicit discourse markers
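The enthymeme example from the list above can be sketched as data in which reconstructed steps are flagged explicitly, so that interpreter mediation stays visible. The "implicit" flag and the class and function names are assumptions of this sketch, not the schema's own vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thesis:
    id: str
    text: str
    implicit: bool = False  # True when reconstructed by the interpreter


premises = [
    Thesis("t1", "All men are mortal"),
    Thesis("t2", "Socrates is a man", implicit=True),  # not stated in the text
]
conclusion = Thesis("t3", "Socrates is mortal")


def reconstructed_steps(theses):
    """Return the theses supplied by the interpreter rather than stated in the source."""
    return [thesis.text for thesis in theses if thesis.implicit]


print(reconstructed_steps(premises + [conclusion]))  # ['Socrates is a man']
```

Marking reconstructed theses in this way lets later analyses include or exclude interpretative additions as needed.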

Network Modelling

Thesis-support relationships and proposition-based connections naturally form network structures, enabling:

  • Automatic visualisation through various approaches—for example, detailed hierarchical maps for close reading or force-directed network graphs for overviews and comparative analysis
  • Quantitative analysis—centrality measures identify key theses, clustering reveals thematic groups, path analysis traces argumentative chains
  • Pattern discovery through spatial proximity in network layouts
  • Both qualitative examination of individual nodes and computational analysis of network properties
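Two of the quantitative measures mentioned above, a simple centrality count and path tracing, can be sketched with the Python standard library alone. The edge list is a hypothetical toy graph in which each edge runs from an employed thesis to the thesis it supports; only the t1 to t2 link corresponds to the Plutarch example given earlier, and t3 and t4 are invented. In-degree is used here as the simplest possible centrality measure, and the sketch does not claim to reflect any particular tool used with TheSu XML.

```python
from collections import Counter, deque

# (employed thesis, targeted thesis); t3 and t4 are invented for the sketch.
edges = [("t1", "t2"), ("t3", "t2"), ("t2", "t4")]


def in_degree(edge_list):
    """Count how many supporting links point at each thesis."""
    return Counter(target for _source, target in edge_list)


def find_chain(edge_list, start, goal):
    """Breadth-first search for a shortest argumentative chain from start to goal."""
    successors = {}
    for source, target in edge_list:
        successors.setdefault(source, []).append(target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in successors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain connects the two theses


print(in_degree(edges).most_common(1))  # [('t2', 2)]
print(find_chain(edges, "t1", "t4"))    # ['t1', 't2', 't4']
```

For larger corpora, a dedicated graph library would normally replace these hand-written functions, but the underlying data remains a list of directed links between annotated components.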

Comparative Analysis

Proposition-based linking, a distinctive feature of the schema, connects multiple theses expressing essentially the same core idea under a single abstract proposition, enabling:

  • Cross-source comparison—multiple statements from different authors (e.g., Plutarch's "Lead white is the most cooling of deadly drugs" and Dioscorides' "Lead white has a cooling property") linked to the proposition "Lead white is a cooling substance"
  • Tracking idea variation—systematic comparison of how the same core idea appears, is presented, and functions across different contexts
  • Identifying thematic clusters and intellectual traditions through proposition-based grouping
  • Mapping influence networks between authors by tracing proposition connections

Origin Story

2016: Bachelor's Thesis Inspiration

Daniele Morrone began developing TheSu XML during his bachelor's thesis research in ancient philosophy at Sapienza University of Rome. Inspired by G.E.R. Lloyd's Polarity and Analogy (1966), he was examining how biological metaphors and vitalistic thinking influenced ancient philosophers' cosmologies and astro-meteorological theories. Working across multiple sources, genres, and time periods, he needed a digital method to catalogue claims and compare them systematically—a need that existing tools could not address.

2016-2018: Master's Thesis Development

During his master's thesis (2016-2018), Morrone developed the first prototype, initially for his own research. He tested it by annotating Plutarch's dialogue On the Apparent Face in the Orb of the Moon, which revealed TheSu XML's potential for analysing complex interweavings of scientific, cosmological, and religious content.

2018-2023: Recognition and Expansion

As the prototype proved its value, Morrone recognised that TheSu XML could serve researchers beyond his own work. Development continued during his PhD (2016-2022) as part of the ERC project AlchemEast (2018-2022), and later within the ERC project PlatoViaAristotle (2022-2023). In 2023, he publicly released the schema, publishing the definition in the KU Leuven RDR repository (DOI: 10.48804/KD8QPO, published 2023-11-27).

Current Development (2024-2027): FWO Fellowship

Morrone is currently advancing TheSu XML through an FWO Postdoctoral Fellowship at KU Leuven, focusing on:

  • Refining ontology and syntax through historical case studies
  • Developing TheSu Annotator GUI for streamlined annotation
  • Establishing standards for dataset sharing
  • Designing a web environment for publishing and consulting annotations

Project Title: "Advancing Digital History with TheSu XML: The Next Step in its Development, with a Historical-Philosophical and Chemical Exploration of Lead and Lead White in Greco-Roman Sources"

Host Institution: KU Leuven (De Wulf-Mansion Centre)
Supervisor: Prof. Jan Opsomer
Co-supervisors: Prof. Margherita Fantoli (LECTIO, KU Leuven), Prof. Matteo Martelli (University of Bologna)

Contact

TheSu XML was created and is maintained by Daniele Morrone.

FWO special research associate and PlatoViaAristotle ERC Project Associated Member
De Wulf - Mansion Centre for Ancient, Medieval and Renaissance Philosophy
Kardinaal Mercierplein 2 - box 3200
3000 Leuven, Belgium