stone house on a hill in Leshten Bulgaria, wagon in front

Read, Hot & Digitized: Bulgarian Dialectology as Living Tradition

Read, hot & digitized: Librarians and the digital scholarship they love. In this new series, librarians from UTL’s Arts, Humanities and Global Studies Engagement Team briefly present, explore and critique existing examples of digital scholarship. Our hope is that these monthly reviews will inspire critical reflection on, and future creative contributions to, the growing fields of digital scholarship.

Creating and publishing open access linguistic data is an invaluable way to support research in digital approaches to linguistics and to help make more scholarly research openly available to a broad audience. Bulgarian Dialectology as Living Tradition contributes to this body of open access data with its searchable and interactive database of oral speech in Bulgarian, representing a wide range of dialects recorded in 69 different Bulgarian villages. Data is presented in the oral recordings themselves and in the 184 transcriptions of those recordings, with a variety of features, such as tokens with associated tags for grammatical, lexical, and linguistic trait information, available for each text. This collection of Bulgarian linguistic materials is an important resource for studying the language, and the project will be of interest to anyone working in computational linguistics, digital approaches to studying and analyzing languages, and, of course, Slavic languages.

The website’s homepage.

The site breaks down its texts into lines, which are in turn composed of tokens with associated tags for grammatical, lexical, and linguistic trait information. Each text can be viewed in three ways: the Glossed View, which shows tokens with grammatical information, English glosses, and Bulgarian lemmas; the Line Display, which shows a line of text and its English translation; and the Cyrillic Line Display, with the original Bulgarian lines in Cyrillic script. In addition to these views, the website offers users five types of search: wordform search, lexeme search, linguistic trait search, thematic content search, and phrase search.
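For readers who think in code, the text, line, and token structure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual Drupal data model: every class, field, and function name is hypothetical, and the sample sentence is invented rather than taken from the corpus.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    wordform: str   # the form as transcribed
    lemma: str      # Bulgarian lemma (dictionary form)
    gloss: str      # English gloss
    tags: list[str] = field(default_factory=list)  # grammatical / trait tags


@dataclass
class Line:
    tokens: list[Token]
    translation: str  # English translation of the whole line


@dataclass
class Text:
    village: str
    lines: list[Line]


def line_display(line: Line) -> str:
    """Rough analogue of the Line Display: the line, then its translation."""
    return " ".join(t.wordform for t in line.tokens) + "\n" + line.translation


def glossed_view(line: Line) -> list[tuple[str, str, str, str]]:
    """Rough analogue of the Glossed View: one row of details per token."""
    return [(t.wordform, ".".join(t.tags), t.gloss, t.lemma) for t in line.tokens]


def wordform_search(texts: list[Text], query: str) -> list[tuple[str, int, Token]]:
    """Find tokens whose transcribed form matches the query, ignoring case."""
    q = query.casefold()
    return [
        (text.village, i, tok)
        for text in texts
        for i, line in enumerate(text.lines)
        for tok in line.tokens
        if tok.wordform.casefold() == q
    ]


def lexeme_search(texts: list[Text], lemma: str) -> list[tuple[str, int, Token]]:
    """Find every token that belongs to the given lemma, whatever its form."""
    q = lemma.casefold()
    return [
        (text.village, i, tok)
        for text in texts
        for i, line in enumerate(text.lines)
        for tok in line.tokens
        if tok.lemma.casefold() == q
    ]


# Invented sample data (placeholder village name, made-up sentence and tags).
sample = Text(
    village="example-village",
    lines=[
        Line(
            tokens=[
                Token("kashtata", "kashta", "house", ["N", "sg", "def"]),
                Token("e", "sam", "be", ["V", "3sg", "pres"]),
                Token("stara", "star", "old", ["Adj", "f", "sg"]),
            ],
            translation="The house is old.",
        )
    ],
)

for village, line_no, tok in lexeme_search([sample], "kashta"):
    print(village, line_no, tok.wordform, tok.tags)
```

The distinction between the two searches is the point of the sketch: a wordform search matches the form as it was actually spoken and transcribed, while a lexeme search gathers every inflected or dialectal form filed under one lemma.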

This project succeeds at its goal “to return the focus of dialectology to its source in living, natural speech, to provide a broad, representative covering of this speech throughout the chosen region, and to make this material accessible to a wide spectrum of users.” The use of field recordings not only makes this speech broadly accessible in a way that would be difficult without digital technologies, but also allows users, whether casual browsers of the site or researchers in an academic setting, to hear the language and its many dialects as they are actually spoken. The foregrounding of this dialectal speech in its “natural village context” brings forward forms of the language that are often markedly different from standardized, more urban ways of speaking.

A map showing where interviews were recorded within Bulgaria.

The site’s creators took care to make the documentation of the site’s creation publicly available, so that others who might wish to create similar digital collections could draw on the work. The site was developed using the open source content management system Drupal, a framework that makes the work easier to reproduce and repurpose and that furthers the goals and values of open source software development by fostering a healthier, more robust ecosystem of scholarship and digital humanities work built on freely accessible technologies.

The wordform search interface.

The project serves as an important contribution to digital scholarship in Slavic Studies. The large volume and unique content of the recordings and texts make for a valuable corpus, and the creators’ commitment to supporting other projects by using open source software and making their documentation on the site’s creation publicly available is also very admirable. I hope to see it inspire other projects that likewise support open source within the digital humanities.

For more information on digital linguistic methods, open source projects, and the Bulgarian language, please consult the UTL resources below:

Blagoeva, Diana, Svetla Koeva, Vladko Murdarov, Georg Rehm, and Hans Uszkoreit. The Bulgarian Language in the Digital Age. Berlin: Springer, 2012.

Computational Linguistics. Cambridge, MA: MIT Press Journals, 1984.

Crompton, Constance, Richard J. Lane, and Ray Siemens, eds. Doing More Digital Humanities: Open Approaches to Creation, Growth, and Development. Milton: Routledge, 2020.

Gold, Matthew K., ed. Debates in the Digital Humanities. Minneapolis: University of Minnesota Press, 2012.
