Towards a Combined Words/Networks
Analysis of Literary Texts
Frank Fischer ¹ ² · Eugenia Ustinova ¹
¹ Higher School of Economics, Moscow
² DARIAH-EU
This presentation:
bit.ly/2WPybvF
SUNBELT 2019 · Montréal 🇨🇦 · 20 June 2019
Donald Knuth: The Stanford Graph Base: A Platform for Combinatorial Computing. ACM Press, 1993.
Networks extracted from fiction („Anna Karenina“, „Les Misérables“, etc.) are based on co-occurrences.
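A minimal sketch of co-occurrence-based extraction, assuming the text is already segmented (e.g. into scenes or chapters) and each segment lists the characters present. The segmentation and the function names are hypothetical illustrations, not the method of any cited work.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(segments):
    """Count how often each pair of characters shares a segment."""
    weights = Counter()
    for speakers in segments:
        # sorted(set(...)) gives each undirected pair one canonical key
        for a, b in combinations(sorted(set(speakers)), 2):
            weights[(a, b)] += 1
    return weights


def degree(edges):
    """Number of distinct co-occurrence partners per character."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


# Hypothetical toy segmentation, not taken from any real text
segments = [
    ["Hamlet", "Horatio"],
    ["Hamlet", "Gertrude", "Claudius"],
    ["Hamlet"],  # a monologue adds words but no edges
    ["Claudius", "Gertrude"],
]
edges = cooccurrence_edges(segments)
print(edges)
print(degree(edges))
```

Note that a segment with a single character contributes nothing to the network, however long the speech is.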
Thomas Schweizer, Michael Schnegg: Die soziale Struktur der „Simple Storys“. Eine Netzwerkanalyse. 1998. (PDF)
Network extracted from Ingo Schulze’s novel „Simple Storys“, done by ethnologists. 38 nodes (characters).
Edges: positive, negative or exchange relationships. Visualisation with Pajek.
(Screenshot taken from workshop material by Jason Heppler, 2016. Licensed under CC BY-NC-SA 4.0.)
(Source: newleftreview.org.)
Distant-Reading Showcase (released at DHd2016, Leipzig).
Download via Figshare. DOI: 10.6084/m9.figshare.3101203.v2.
https://dracor.org/ (public beta!)
All repos are open source: https://github.com/dracor-org
Correlations: Label size = betweenness centrality; heat of nodes = word-based measures.
(Data source: GEXF file from https://dracor.org/rus/pushkin-boris-godunov.)
Dynamic graph, generated with the ndtv package. Data comes directly from the DraCor API.
Script by Ivan Pozdniakov (source code on RPubs.com).
https://shiny.dracor.org/ (by Ivan Pozdniakov, based entirely on DraCor API).
Testing all plays against the criteria in Watts/Strogatz 1998.
Repo: https://github.com/pixelmagenta/rusdracor-small-worlds.
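A sketch of such a small-world test, assuming an undirected, connected network given as a dict of neighbour sets. Following Watts/Strogatz 1998, a network counts as small-world if its clustering coefficient C is much larger than that of a comparable random graph while its average path length L stays close to the random one. The random-graph references use the approximations C_random ≈ k/n and L_random ≈ ln(n)/ln(k); whether the linked repo computes them this way is not stated here, so treat the helper names as hypothetical.

```python
import math
from collections import deque


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with < 2 neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # count edges among the neighbours (each unordered pair once)
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_check(adj):
    """Observed C and L next to the random-graph approximations."""
    n = len(adj)
    k = sum(len(nbrs) for nbrs in adj.values()) / n  # mean degree, must be > 1
    return {
        "C": average_clustering(adj),
        "L": average_path_length(adj),
        "C_random": k / n,
        "L_random": math.log(n) / math.log(k),
    }


# Hypothetical toy network: a triangle A-B-C with D attached to C
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(small_world_check(adj))
```

The toy graph is far too small for the comparison to mean anything; it only shows the mechanics.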
„Brecht Beats Shakespeare!“ (released at DH2018, México).
Download via Figshare. DOI: 10.6084/m9.figshare.5926363.v1.
5 network-based: betweenness, closeness, weighted degree, degree, eigenvector.
3 word-based: words, speech acts, appearances.
Character | Words | Speech Acts | Appearances | Betweenness | Closeness | Weighted Degree | Degree | Eigenvector | TEXT (total) | NETWORK (total) |
---|---|---|---|---|---|---|---|---|---|---|
Horatio | 4 | 4 | 4 | 1 | 1 | 4 | 1 | 3 | 4 | 1 |
Gertrude | 7 | 6 | 3 | 3 | 2 | 3 | 2 | 1 | 5 | 2 |
Claudius | 2 | 2 | 2 | 4 | 3 | 2 | 3 | 2 | 2 | 3 |
Hamlet | 1 | 1 | 1 | 2 | 4 | 1 | 4 | 4 | 1 | 4 |
Laertes | 5 | 5 | 7 | 5 | 5 | 6 | 5 | 6 | 6 | 5 |
Polonius | 3 | 3 | 5 | 6 | 6 | 5 | 6 | 5 | 3 | 6 |
Ophelia | 6 | 7 | 8 | 7 | 7 | 9 | 7 | 7 | 7 | 7 |
… | … | … | … | … | … | … | … | … | … | … |
Top-7 characters of the play regarding their degree values (out of 38 characters in total).
Hamlet ranks 1st for the number of words, but only 4th for degree (too many monologues? 🤔).
Distribution of Spearman’s rank correlation coefficient for
word- vs. network-based measures in Russian Drama Corpus.
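To make the coefficient concrete, here is a sketch that computes Spearman's rho for the seven Hamlet rows shown in the table above, comparing the TEXT (total) and NETWORK (total) ranks. It uses the simple formula rho = 1 - 6·Σd² / (n(n² - 1)), which assumes no tied ranks. The result covers only this seven-character excerpt, not the full play and not the corpus distribution.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rank correlation for two tie-free rankings of equal length."""
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Ranks copied from the table rows:
# Horatio, Gertrude, Claudius, Hamlet, Laertes, Polonius, Ophelia
text_total = [4, 5, 2, 1, 6, 3, 7]
network_total = [1, 2, 3, 4, 5, 6, 7]

rho = spearman_rho(text_total, network_total)
print(round(rho, 3))  # 0.321
```

A value near 1 would mean that word-based and network-based rankings agree; the low value for this excerpt reflects cases like Hamlet, who leads by words but not by degree.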
cut function (in R).
Shares for each group of characters in the entire Russian Drama Corpus.
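For readers who do not use R: a rough Python analogue of binning with explicit breakpoints, using right-closed intervals (a, b] as in R's cut default. The breakpoints, labels, and degree values below are hypothetical; the slides do not state which ones were used.

```python
from bisect import bisect_left
from collections import Counter


def cut(values, breaks, labels):
    """Assign each value to the right-closed interval (breaks[i-1], breaks[i]]."""
    groups = []
    for x in values:
        i = bisect_left(breaks, x)
        # values outside (breaks[0], breaks[-1]] get None, like NA in R
        groups.append(labels[i - 1] if 1 <= i < len(breaks) else None)
    return groups


# Hypothetical degree values and breakpoints
degrees = [1, 2, 5, 9, 12]
groups = cut(degrees, breaks=[0, 3, 8, 20], labels=["low", "mid", "high"])

# Share of characters per group
counts = Counter(groups)
shares = {label: counts[label] / len(groups) for label in counts}
print(groups)
print(shares)
```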
Percentages of first-group characters divided
by degree in the entire corpus by decade.
Thanks.
#ProgrammableCorpora