![workshop logo cropped](images/workshop-logo-cropped.png)

### Tolstoy Everywhere

Unleashing the Information Hidden in the 90-Volume Collected Works
[Boris Orekhov](http://nevmenandr.net/bo.php) ¹ · [Frank Fischer](https://www.hse.ru/en/org/persons/182492735) ¹ ²

¹ Higher School of Economics, Moscow

² DARIAH-EU

[Slavic DH Workshop](https://cdh.princeton.edu/events/2019/05/slavic-dh-workshop-russian-literary-studies-digital-age/) · Princeton University 🇺🇸 · 28 May 2019, 1:30–3:00 p.m.

--

### **Chapters**
1. The Tolstoy Project
2. The 91st Volume – An Exceptional Index
3. The Web App
4. Summary

---

# Chapter 1.

### The Tolstoy Project

--

### Digital Edition
- digital edition of the collected works in 90 volumes, published between 1928 and 1958: novels, short stories, plays, letters, diaries, … (~46,000 pages)
- cooperation between the [Tolstoy Museum](http://tolstoymuseum.ru/) and the Higher School of Economics, Moscow

--

### Digitising Tolstoy

![movie](images/startup.jpg)

- *Startup* (2014), a movie about the history of Yandex
- 0:32:46: “The complete works of Leo Tolstoy is now uploaded to the system.”
- in the 1990s, when the movie is set, Tolstoy had not yet been fully digitised 😊

--

### Digitising Tolstoy
- it started with [“All of Tolstoy in One Click”](http://tolstoy.ru/projects/tolstoy-in-one-click/): all volumes were scanned and OCRed (with support from ABBYY)
- corrections were crowdsourced
- abundant media coverage: [The New Yorker](https://www.newyorker.com/books/page-turner/crowdsourcing-tolstoy), [The Guardian](https://www.theguardian.com/books/2013/oct/16/all-leo-tolstoy-one-click-project-digitisation), etc.
- texts available in PDF, EPUB, FB2, MOBI and XHTML (see http://tolstoy.ru/creativity/90-volume-collection-of-the-works/)

--

### Our Contribution
- texts converted to [basic TEI](https://github.com/tolstoydigital/TEI)
- [online search interface](http://search.effits.ru/)
- richer markup
- ontologies
- linked open data
- where do we start? With the index!

---

# Chapter 2.
### The 91st Volume – An Exceptional Index

--

### Index

- indexes have long accompanied academic editions, and not by chance
- here the book serves not as a device for reading but as an instrument (for looking up references and quotes)
- indexes make this work efficient
- proper names in a work of fiction have a special function

--

### Proper Names

- they draw attention (e.g. through capitalisation)
- they carry heavy associative loads
- they often have a special spelling (foreign names)
- they are an important factor in the overall structure of a text, especially when they appear with high density

--

### 91st Volume

![91st volume (cover)](images/91st-volume-cover.jpg)

- supplementary volume containing indexes of works and of proper names (covering fiction as well as non-fiction such as diaries and letters)
- 16,256 registered entries
- a good starting point for building an ontology

--

### Ontology
| entity | ID | Wikidata | comment | geo (lat, lon) |
|:--------------------------:|:----:|:---------:|:---------------------:|:--------------------:|
| Лао-Цзы (Лао-тзе) | 7627 | Q24446595 | | |
| Алексеева Мария Васильевна | 272 | | дочь В. И. Алексеева. (daughter of V. I. Alekseev) | |
| Алексеевка Тульской губ. | 274 | | | 53.590620, 38.093302 |
| Алексеевка Самарской губ. | 275 | | | 52.580285, 51.275200 |

--

### Next Step: Web App
- web app: http://index.tolstoy.ru/
- target audience: specialists as well as enthusiasts
- general question: even with the digitised full texts at hand, how can we benefit from the structuring and indexing efforts of the past?

---

# Chapter 3.
### The Web App

--

![web app statistics](images/webapp-statistics.png)
brown: number of references in the text – beige: number of mentions in the comments
--

### Search for Entities

- original index functionality retained: mapping proper names to volumes and pages
- one-click direct jumps to pages
- added functionality:
    - flexible search: entering “ava” lists results such as “Poltava”, “Bavariâ”, “Abdulla-al’-Mamun Zuravardi”
    - a word-cloud representation gives a first idea of the most frequent words in the corpus

--

### Flexible Search
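Judging from the “ava” example, the flexible search behaves like a case-insensitive substring match over entry names rather than a prefix match. A minimal Python sketch under that assumption; the helper `flexible_search` and the in-memory entry list are hypothetical, and the web app itself may well query a database instead:

```python
def flexible_search(query: str, entries: list[str]) -> list[str]:
    """Return all entries containing `query` anywhere in the name, ignoring case.

    Hypothetical helper: assumes the index is a plain list of entry labels.
    """
    needle = query.casefold()
    return [name for name in entries if needle in name.casefold()]


# Hypothetical sample of transliterated entry labels.
entries = ["Poltava", "Bavariâ", "Abdulla-al’-Mamun Zuravardi", "Dürnstein", "Victor Hugo"]

print(flexible_search("ava", entries))
# ['Poltava', 'Bavariâ', 'Abdulla-al’-Mamun Zuravardi']
```

A substring match explains why “Abdulla-al’-Mamun Zuravardi” turns up for “ava”: the query matches inside the surname, not at the start of the entry.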
![search](images/001.png)

--

### Case in Point: Dürnstein (Austria)

![search](images/duernstein.jpg)

--

### Search With Tag

![tag](images/003.png)

--

### Alphabetical List

![tag](images/002.png)

--

### Co-Occurrence Calculator

![calculator](images/008.png)

--

### Entities Network
![Victor Hugo co-occurrence network](images/screenshot-victor-hugo.png)
word-cloud representation for entry ‘Victor Hugo’ showing co-occurring named entities – URL: http://index.tolstoy.ru/person/4079/
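The co-occurring entities behind such a word cloud can be counted directly from index data, taking “co-occurrence” to mean appearing on the same page, as in the network-analysis slides. A minimal sketch, assuming the index is loaded as a mapping from each entry to the set of (volume, page) locations where it is indexed; the entries and page numbers are toy data, and `co_occurring` is a hypothetical helper:

```python
from collections import Counter

# A location in the edition: (volume, page).
Location = tuple[int, int]


def co_occurring(target: str, index: dict[str, set[Location]]) -> Counter:
    """Count, for every other entry, how many pages it shares with `target`."""
    target_pages = index[target]
    counts: Counter = Counter()
    for entry, pages in index.items():
        if entry == target:
            continue
        shared = len(target_pages & pages)  # set intersection = shared pages
        if shared:
            counts[entry] = shared
    return counts


# Toy data, not real index entries.
index = {
    "A": {(1, 10), (2, 5), (3, 7)},
    "B": {(1, 10), (2, 5)},
    "C": {(2, 5), (4, 1)},
    "D": {(4, 1)},
}

print(co_occurring("A", index).most_common())
# [('B', 2), ('C', 1)]
```

The counts could serve as word sizes in the cloud, and `len(co_occurring(...))` gives the number of distinct co-occurring names, the kind of figure quoted for the “Bhagavad Gita” (43 other names on 5 pages).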
--

### Individual Pages for All Entities

![card](images/009.png)

--

### Studying Life and Works of Leo Tolstoy by Means of Network Analysis (1/3)

![whole graph](images/general-graph.jpg)

co-occurrence graph for all 90 volumes – the network can be downloaded and customised – URL: http://index.tolstoy.ru/general/graph/
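A downloadable whole-corpus network can be stored as a weighted edge list. The sketch below assumes the index is available as a mapping from each page to the set of names indexed on it (a hypothetical format); every pair of names sharing a page becomes an edge weighted by the number of shared pages, and the result is written as CSV with `Source`, `Target` and `Weight` columns, header names that Gephi's spreadsheet import recognises. This is an illustration, not necessarily how index.tolstoy.ru builds its graph:

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Toy data, not real index entries: (volume, page) -> names indexed on that page.
pages = {
    (1, 10): {"A", "B"},
    (2, 5): {"A", "B", "C"},
    (4, 1): {"C", "D"},
}


def build_edges(pages: dict) -> Counter:
    """Weight each unordered pair of names by the number of pages they share."""
    edges: Counter = Counter()
    for names in pages.values():
        # Sorting makes each pair canonical, so ("A", "B") and ("B", "A") merge.
        for pair in combinations(sorted(names), 2):
            edges[pair] += 1
    return edges


def to_csv(edges: Counter) -> str:
    """Serialise the edge list with Source/Target/Weight headers."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()


print(to_csv(build_edges(pages)), end="")
```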
--

### Studying Life and Works of Leo Tolstoy by Means of Network Analysis (2/3)

- studying co-occurrences of proper names in the same environment (in our case, on the same page) helps to understand larger contexts
- example:
    - the Hindu scripture “Bhagavad Gita” appears on 5 pages of Tolstoy’s Complete Works, see http://index.tolstoy.ru/person/2138/
    - it shares these five pages with a total of 43 other names

--

### More Examples

- these mentions are close to each other not by accident: in our case they form a kind of “India cluster” containing works such as “Hitopadesha”, “Dhammapada” and “Vamana Purana”, and names such as Sri Ramakrishna Paramahansa
- for Tolstoy, these titles and names belong to a set of carriers of philosophical knowledge and are associated with names such as Xenophon, Montaigne, Montesquieu, Pascal, Skovoroda and Socrates
- such networks offer great opportunities for understanding the full range of Tolstoy’s interests and ideas

--

### Studying Life and Works of Leo Tolstoy by Means of Network Analysis (3/3)

Nodes ranked by weighted degree in the fiction network (vols. 1–45):
1. Россия/Русь (2778)
2. Наполеон I Бонапарт (2712)
3. Москва (2689)
4. Александр I Павлович (1873)
5. Петербург (1830)
6. Кутузов Михаил Илларионович/Голенищев-Кутузов (1515)
7. Франция (1390)
8. Европа (1212)
9. Англия (938)
10. Германия (789)
Many thanks to Daniil Skorinkin for providing this data.
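Weighted degree sums the weights of all edges attached to a node, so a name scores high when it co-occurs often and with many others. A minimal sketch, assuming an undirected edge list whose weights count shared pages; the data are toy values and `weighted_degree` is a hypothetical helper, and the weighting behind the ranking above may differ:

```python
from collections import Counter


def weighted_degree(edges: dict[tuple[str, str], int]) -> Counter:
    """Sum the weights of all edges incident to each node of an undirected graph."""
    degree: Counter = Counter()
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    return degree


# Toy weighted edge list (e.g. weight = number of shared pages), not real data.
edges = {("A", "B"): 1, ("A", "C"): 4, ("B", "C"): 2, ("C", "D"): 1}

for rank, (node, score) in enumerate(weighted_degree(edges).most_common(), start=1):
    print(f"{rank}. {node} ({score})")
# 1. C (7)
# 2. A (5)
# 3. B (3)
# 4. D (1)
```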
--

### Exploring the Text Structure

![heatmap](images/007.png)

---

# Summary

--

- the work of the editors of Tolstoy’s complete works is exceptional
- it makes it possible to systematise the entities in Tolstoy’s texts and to build an ontology without NLP
- the entities allow us to reconstruct how Tolstoy thinks
- the entities allow us to see the whole picture of the text
- making indexes machine-readable opens up an additional way to research the collected works of single authors, or similar printed text collections that come with rich and detailed indexes
- the functionality of traditional indexes is not only retained but also enhanced (flexible searches, visualisations of co-occurrences, etc.)

--

Thanks.
– https://hum.hse.ru/digital/ –