### Programmable Corpora: Towards a Combined Words/Networks Analysis of Literary Texts

[Frank Fischer](https://www.hse.ru/en/org/persons/182492735) ¹ ² · Eugenia Ustinova ¹

¹ Higher School of Economics, Moscow
² DARIAH-EU

This presentation: **bit.ly/2WPybvF**

[SUNBELT 2019](https://www.fourwav.es/view/717/info/) · Montréal 🇨🇦 · 20 June 2019

--

### TOC

1. Network Research on Drama
2. Programmable Corpora: Networks for Literary Studies
3. Catching Protagonists: Quantitative Dominance Relations

---

## Chapter 1.
## Network Research on Drama

--

## Literary Network Data as Example Files (1993)
Donald Knuth: The Stanford GraphBase: A Platform for Combinatorial Computing. ACM Press, 1993.
Networks extracted from fiction („Anna Karenina“, „Les Misérables“, etc.) are based on co-occurrences.
--

### One of the First Literary Network Analyses (1998)
![Simple Storys, Network Graph, with Pajek (1998)](images/simple-stories-network.png)
Thomas Schweizer, Michael Schnegg: *Die soziale Struktur der „Simple Storys“. Eine Netzwerkanalyse.* 1998. ([PDF](https://www.ethnologie.uni-hamburg.de/pdfs-de/michael-schnegg/simple-stories-publikation-michael-schnegg.pdf))
Network extracted from [Ingo Schulze’s](https://en.wikipedia.org/wiki/Ingo_Schulze) novel „Simple Storys“, done by ethnologists. 38 nodes (characters).
Edges: positive, negative or exchange relationships. Visualisation with [Pajek](http://vlado.fmf.uni-lj.si/pub/networks/pajek/).
--

### „Les Misérables“ as Example File in Gephi (2008)

![Gephi Screenshot, CC](images/gephi-les-miserables.png)
(Screenshot taken from [workshop material](https://jasonheppler.org/courses/csu-workshop/gephi.html) by Jason Heppler, 2016. Licensed under CC BY-NC-SA 4.0.)
--

### Franco Moretti’s Analysis of „Hamlet“ (2011)

![Network of Hamlet according to Moretti](images/moretti-hamlet.gif)
(Source: [newleftreview.org](https://newleftreview.org/II/68/franco-moretti-network-theory-plot-analysis).)
--

## Distant-Reading Showcase (2016)

![465 drama networks at a glance](images/distant-reading-showcase-poster.jpg)
*Distant-Reading Showcase* (released at DHd2016, Leipzig).
Download via Figshare. DOI: [10.6084/m9.figshare.3101203.v2](https://dx.doi.org/10.6084/m9.figshare.3101203.v2).
--

### Extraction of Literary Network Data: A „Digital Spectator“

- following an idea of Solomon Marcus, „Poetica matematică“, 1970
- operationalisation of ‘interaction’:

„Two characters interact with one another if they perform a speech act within the same segment of a drama (usually a ‘scene’).“

---

## Chapter 2.
## Programmable Corpora: Networks for Literary Studies

--

## **DraCor**

- DraCor: Drama Corpora Platform (https://dracor.org/)
- DraCor is a showcase for the concept of „Programmable Corpora“
- Programmable Corpora: collection of full-text literary corpora with an API and Linked Open Data

--

## Frontend dracor.org

![DraCor-Frontpage](images/dracor-frontpage.png)
https://dracor.org/ (public beta!)
--

## Connected Corpora

- German, Russian, Spanish, Swedish, Italian, Shakespeare, Ancient Greek, Roman, …
- full texts encoded in TEI (XML standard format in the Humanities, ~ 550 elements for the encoding of documents)
- example: [Shakespeare’s „Hamlet“](https://dracor.org/shake/hamlet)

--

## DraCor Technology Stack

![DraCor Technology Stack](images/dracor-drawio.svg)
All repos are open source: https://github.com/dracor-org
--

## DraCor API

- provides metadata and network data for all plays in CSV or GEXF format
- character constellations per segment (*dynamic graphs!*)
- spoken text (all characters, only female/male, or per character)
- live documentation via Swagger: https://dracor.org/documentation/api/

--

### Pushkin’s „Boris Godunov“ (1/2)

![betweenness](images/boris-godunov-gephi.png)
Correlations: Label size = betweenness centrality; heat of nodes = word-based measures.
(Data source: GEXF file from https://dracor.org/rus/pushkin-boris-godunov.)
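Network data such as the file used for this graph can also be pulled programmatically. The following is a minimal sketch, not the project's own client: the endpoint path in `network_csv_url` and the `Source`/`Target`/`Weight` column names are assumptions that should be checked against the live API documentation, and the sample rows are made up for offline illustration.

```python
import csv
import io
import urllib.request

# Assumption: base path and endpoint layout; verify against
# https://dracor.org/documentation/api/ before relying on them.
API_BASE = "https://dracor.org/api/v1"


def network_csv_url(corpus: str, play: str) -> str:
    """Build the (assumed) URL of a play's network data in CSV format."""
    return f"{API_BASE}/corpora/{corpus}/plays/{play}/networkdata/csv"


def parse_edges(csv_text: str) -> list[tuple[str, str, int]]:
    """Parse an edge list with Source, Target and Weight columns (assumed header)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Source"], row["Target"], int(row["Weight"])) for row in reader]


def fetch_edges(corpus: str, play: str) -> list[tuple[str, str, int]]:
    """Download and parse the edge list of one play (needs network access)."""
    with urllib.request.urlopen(network_csv_url(corpus, play)) as response:
        return parse_edges(response.read().decode("utf-8"))


# Offline illustration with made-up rows in the assumed column layout.
SAMPLE = (
    "Source,Type,Target,Weight\n"
    "boris,Undirected,shuisky,3\n"
    "boris,Undirected,feodor,2\n"
)
edges = parse_edges(SAMPLE)
```

Calling `fetch_edges("rus", "pushkin-boris-godunov")` would request the play shown on this slide, provided the assumed endpoint matches the deployed API.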
--

### Pushkin’s „Boris Godunov“ (2/2)

![Boris Godunov (dynamic graph)](images/pushkin-boris-godunov.gif)

**Dynamic graph**, generated with the [**ndtv**](https://cran.r-project.org/web/packages/ndtv/index.html) package. Data comes directly from the **DraCor API**.
Script by Ivan Pozdniakov ([source code on RPubs.com](https://rpubs.com/Pozdniakov/godunov)).
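The interaction rule from Chapter 1 (two characters interact if they both speak within the same segment) is what turns per-segment character constellations into a weighted network. A minimal sketch of that rule follows; the function name and the toy play are hypothetical, and this is not the code behind the animation above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(segments):
    """Count, for every pair of characters, the segments in which both speak."""
    weights = Counter()
    for speakers in segments:
        # Each unordered pair is counted once per segment, however often they speak.
        for a, b in combinations(sorted(set(speakers)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical toy play: one list of speakers per segment (scene).
segments = [
    ["A", "B"],
    ["A", "B", "C"],
    ["C"],            # a monologue adds no edge
    ["B", "C", "B"],  # repeated speech acts do not add weight
]
edges = cooccurrence_edges(segments)
```

Applying the same function to growing prefixes of `segments` gives one network snapshot per segment, which is one plausible way to feed a dynamic graph.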
--

## DraCor Shiny App

![Shiny App](images/shiny-kaethchen.png)
https://shiny.dracor.org/ (by Ivan Pozdniakov, based entirely on DraCor API).
--

## Small-World Phenomenon in Russian Drama

![Small-World Phenomenon in Russian Drama](images/sw_size_v2.svg)
Testing all plays against the criteria in Watts/Strogatz 1998.
Repo: https://github.com/pixelmagenta/rusdracor-small-worlds.
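Watts and Strogatz characterise a small-world network by a clustering coefficient C far above that of a comparable random graph, together with an average path length L close to the random one. The sketch below computes both values and the random-graph approximations C_random ≈ k/n and L_random ≈ ln(n)/ln(k) for a connected, undirected graph. It is an illustration under those assumptions, not the code from the repository, and it deliberately leaves out a decision threshold, which the slides do not state.

```python
import math
from collections import deque


def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        nbr_list = list(nbrs)
        links = sum(
            1
            for i, u in enumerate(nbr_list)
            for v in nbr_list[i + 1:]
            if v in adj[u]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all node pairs (graph assumed connected)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_stats(adj):
    """Return C and L together with their random-graph approximations."""
    n = len(adj)
    k = sum(len(nbrs) for nbrs in adj.values()) / n  # mean degree, must exceed 1
    return {
        "C": average_clustering(adj),
        "L": average_path_length(adj),
        "C_random": k / n,
        "L_random": math.log(n) / math.log(k),
    }


# Toy input: ring of 10 nodes, each linked to its two nearest neighbours per side.
n = 10
ring = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}
stats = small_world_stats(ring)
```

For a play, `adj` would map each character to the set of characters it shares at least one segment with.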
--

## Gamification: „Brecht Beats Shakespeare!“

![card game](images/card-game-dh2018.jpg)
„Brecht Beats Shakespeare!“ (released at DH2018, México).
Download via Figshare. DOI: [10.6084/m9.figshare.5926363.v1](https://doi.org/10.6084/m9.figshare.6667424.v1).
---

## Chapter 3.
## Catching Protagonists: Quantitative Dominance Relations

--

## What Are „Quantitative Dominance Relations“?

- term coined by Manfred Pfister (1977)
- modelled by us as multi-dimensional vector of network- and word-based features

--

## Calculation of Rankings for 8 Measures (For Every Character in Every Play)

**5 network-based**:

- Degree
- Weighted degree
- Closeness centrality
- Betweenness centrality
- Eigenvector centrality

**3 word-based**:

- Number of words
- Number of speech acts
- Number of appearances

--

## Example: Character Rankings for „Hamlet“ (Excerpt)
|Character |Words|Speech Acts|Appearances|Betweenness|Closeness|Weighted Degree|Degree|Eigenvector|TEXT (total)|NETWORK (total)|
|----------|-----|-----------|-----------|-----------|---------|---------------|------|-----------|------------|---------------|
|Horatio   |4    |4          |4          |1          |1        |4              |1     |3          |4           |1              |
|Gertrude  |7    |6          |3          |3          |2        |3              |2     |1          |5           |2              |
|Claudius  |2    |2          |2          |4          |3        |2              |3     |2          |2           |3              |
|Hamlet    |**1**|1          |1          |2          |4        |1              |**4** |4          |1           |4              |
|Laertes   |5    |5          |7          |5          |5        |6              |5     |6          |6           |5              |
|Polonius  |3    |3          |5          |6          |6        |5              |6     |5          |3           |6              |
|Ophelia   |6    |7          |8          |7          |7        |9              |7     |7          |7           |7              |
|…         |…    |…          |…          |…          |…        |…              |…     |…          |…           |…              |
Top-7 characters of the play regarding their degree values (out of 38 characters in total).
Hamlet ranks 1st for the number of words, but only 4th for degree (too many monologues? 🤔).
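The slides do not say how the two total columns are derived. One reading that reproduces the excerpt is to sum a character's ranks within each group of measures and then rank those sums. The sketch below checks that reading against the seven rows shown; it is an assumption about the method, and the full table of 38 characters might be built differently.

```python
# Ranks copied from the excerpt: (words, speech acts, appearances) and
# (betweenness, closeness, weighted degree, degree, eigenvector).
RANKS = {
    "Horatio":  ((4, 4, 4), (1, 1, 4, 1, 3)),
    "Gertrude": ((7, 6, 3), (3, 2, 3, 2, 1)),
    "Claudius": ((2, 2, 2), (4, 3, 2, 3, 2)),
    "Hamlet":   ((1, 1, 1), (2, 4, 1, 4, 4)),
    "Laertes":  ((5, 5, 7), (5, 5, 6, 5, 6)),
    "Polonius": ((3, 3, 5), (6, 6, 5, 6, 5)),
    "Ophelia":  ((6, 7, 8), (7, 7, 9, 7, 7)),
}


def total_rank(rank_sums):
    """Rank characters by their summed ranks (smaller sum = better rank).

    Ties are not handled specially; none occur in this excerpt.
    """
    ordered = sorted(rank_sums, key=rank_sums.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


text_total = total_rank({name: sum(text) for name, (text, _) in RANKS.items()})
network_total = total_rank({name: sum(net) for name, (_, net) in RANKS.items()})
```

Under this reading Hamlet comes out first on the word-based side and fourth on the network side, matching the two total columns of the excerpt.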
--

## Correlation of Word- & Network-Based Rankings

![Spearman’s rank correlation coefficient](images/cor_coeff.svg)
Distribution of **Spearman’s rank correlation coefficient** for
word- vs. network-based measures in Russian Drama Corpus.
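For two rankings without ties, Spearman's coefficient reduces to 1 − 6·Σd² / (n·(n² − 1)), where d is the rank difference per character. As a small worked illustration, the sketch below applies this to the TEXT and NETWORK totals of the seven characters in the „Hamlet“ excerpt. Which pairs of measures the plot actually correlates is not stated, so this pairing is an assumption, and seven of 38 characters say little about the whole play.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho for two rankings without ties."""
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# TEXT (total) and NETWORK (total) ranks from the excerpt, in the order
# Horatio, Gertrude, Claudius, Hamlet, Laertes, Polonius, Ophelia.
text_total = [4, 5, 2, 1, 6, 3, 7]
network_total = [1, 2, 3, 4, 5, 6, 7]

rho = spearman_rho(text_total, network_total)
print(round(rho, 3))  # 0.321
```

The squared differences sum to 38, giving 1 − 228/336 ≈ 0.321: a weak positive agreement between the two rankings for these seven characters.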
--

## Ranking Results

- for > 50% of plays, a definite *quantitative* main character is identified, ranking first in at least 7 of the 8 measures (in Russian and German Drama Corpus)
- yet literary texts can have more than one definite protagonist

--

## Major and Minor Characters

- division into **four groups** (following „Reallexikon der deutschen Literaturwissenschaft“, 2003):
  - I. main characters
  - II. secondary characters
  - III. marginal characters
  - IV. functional characters
- using `cut` function (in R)

--

## Groups of Importance

![Groups of importance](images/percentages_five.svg)
Shares for each group of characters in the entire Russian Drama Corpus.
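R's `cut` bins a numeric vector into intervals. The slides do not say which variable is binned or where the breaks lie, so the following Python sketch only illustrates the general idea with four equal-width, right-closed bins over a hypothetical importance score (higher means more important). It is similar in spirit to `cut(x, breaks = 4)` but not identical, since R slightly widens the range before splitting it.

```python
import math
from collections import Counter

# Group labels from the slide, ordered from lowest to highest score.
LABELS = ["IV. functional", "III. marginal", "II. secondary", "I. main"]


def cut_equal_width(values, labels):
    """Assign each value to one of len(labels) equal-width, right-closed bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)
    if width == 0:
        return [labels[0] for _ in values]  # all values identical
    groups = []
    for value in values:
        # A value on an inner boundary falls into the lower bin.
        index = math.ceil((value - lo) / width) - 1
        index = min(max(index, 0), len(labels) - 1)
        groups.append(labels[index])
    return groups


# Hypothetical scores for six characters of one play.
scores = [0, 1, 2, 3, 5, 8]
groups = cut_equal_width(scores, LABELS)

# Share of each group, analogous to the percentages in the chart.
shares = {label: count / len(groups) for label, count in Counter(groups).items()}
```

With equal-width bins most characters of a skewed distribution land in the lowest groups, which is one reason the choice of breaks matters for the resulting shares.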
--

## The „Chekhov Effect“

![Chekhov effect](images/first_group_degree.svg)
Percentages of first-group characters divided
by **degree** in the entire corpus **by decade**.
---

Thanks.
https://dracor.org/
#ProgrammableCorpora