After five years of collaboration across campus, the University of Texas Libraries along with partners at the Harry Ransom Center and Blanton Museum of Art unveiled a new way to explore the university’s world-renowned cultural and research collections in one place. The Campus Collections search interface – accessible through the Libraries catalog – connects users with digitized materials from the Libraries, Harry Ransom Center and the Blanton Museum of Art.
The new service is part of a Mellon Foundation–funded project (2020–2025) to create a unified discovery platform for the university’s arts and cultural heritage holdings. While each partner institution continues to manage its own digital collections, the new interface allows researchers, students and the public to search across all three collections simultaneously – a first for the university.
The grant project team consisted of several Libraries staff across the organization, including Aaron Choate, Wendy Martin, Mirko Hanke, Devon Murphy, Melanie Cofield, Alisha Quagliana, Mandy Ryan, and Dustin Slater. As members of the Access Systems unit – which manages the Alma/Primo library services platform powering the Libraries’ catalog and discovery environment – Cofield and Quagliana collaborated closely with Metadata Analyst Devon Murphy and colleagues at the Blanton Museum and Ransom Center. Together, they worked to align descriptive standards and ensure system compatibility across institutions.
The service relies on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to synchronize records across systems, setting a baseline for shared metadata that includes titles, rights statements, identifiers and thumbnail links. The project’s success now positions the project partners to share their records more broadly through other aggregation platforms like TxHUB, operated by the Texas Digital Library.
“Very few institutions have used the library catalog’s capacity to harvest metadata through OAI-PMH,” said Metadata Analyst Devon Murphy. “We really forged a new path to broaden the scope of discovery and access for our users.”
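For readers curious about what an OAI-PMH harvest involves, the following is a minimal sketch rather than the Libraries’ actual configuration. It builds a standard `ListRecords` request and pulls the baseline fields (identifier, title, rights) out of a Dublin Core response. The base URL, the sample record, and the function names are hypothetical, and thumbnail links are left out because their encoding varies by repository.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumptionToken replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "rights": dc.findtext("dc:rights", default="", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)

# Hypothetical response page, used here instead of a live network call.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:item-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample map</dc:title>
          <dc:rights>http://rightsstatements.org/vocab/NoC-US/1.0/</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, next_token = parse_records(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=next_token)
```

A real harvester would fetch each URL, parse the page, and repeat until no resumption token is returned.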
The Campus Collections search builds on the Libraries’ ongoing work to integrate more of the university’s digital assets into its discovery environment. Since launching Alma/Primo in 2020, the Libraries have harvested metadata from the Texas ScholarWorks institutional repository and the Libraries Collections Portal. These integrations allow users to find open-access research, digitized archives, maps, audio and video alongside traditional catalog materials.
The Access Systems team also oversees Alma Digital, a companion service used to manage licensed and restricted digital content such as streaming media and digital scores. Together, these tools create a more cohesive and accessible digital ecosystem for the university community.
The Campus Collections interface is now available for public use at search.lib.utexas.edu.
Each year during International Open Access Week, the University of Texas Libraries joins a global conversation about the equitable sharing of knowledge. This year’s theme – Who Owns Our Knowledge? – challenged us to consider how scholarship is created, shared, and sustained in the public interest.
Through Texas ScholarWorks, the Libraries amplifies the ideas of our campus community by providing open, long-term access to the research and creative works that shape our world. The digital repository showcases the vast and varied knowledge produced across the Forty Acres – from innovative language education to community-based research.
Among the open access collections available through the repository that we highlighted during this year’s recognition:
Hindi Urdu for Health: Language for Health
Developed for the healthcare profession, this project expands communication and cultural understanding through Hindi-Urdu language learning. Designed for advanced learners and professionals, it offers materials that bridge linguistic skills with real-world applications in medicine.
Latino Research Institute
Supporting interdisciplinary study of Latino populations in Texas and beyond, the Institute’s archive provides an invaluable resource for scholars, policymakers, and community advocates working to improve the lives of Latino communities across the U.S.
John L. Warfield Center for African & African American Studies
A hub for activist scholarship, the Warfield Center advances critical race theory, Black feminism, and creative expression. Its digital collections reflect a commitment to civic engagement, cultural production, and the global study of Black life.
National Deaf Center on Postsecondary Outcomes
Federally funded to close gaps in education and employment for deaf people, the National Deaf Center provides open, evidence-based strategies to improve accessibility and opportunity across communities.
Teresa Lozano Long Institute of Latin American Studies (LLILAS)
A cornerstone of Latin American scholarship since 1940, LLILAS connects disciplines and nations. Its repository collections include conference proceedings, scholarly publications, and papers that advance understanding of Latin America’s cultures and histories.
As we reflect on who owns – and who benefits from – our collective knowledge, Texas ScholarWorks stands as a testament to the power of open access to break barriers, foster collaboration, and make scholarship truly public.
Read, hot & digitized: Librarians and the digital scholarship they love — In this series, librarians from UTL’s Arts, Humanities and Global Studies Engagement Team briefly present, explore and critique existing examples of digital scholarship. Our hope is that these monthly reviews will inspire critical reflection of and future creative contributions to the growing fields of digital scholarship.
On September 10, Princeton University Library unveiled a new digital and physical exhibition titled “Forms and Function: The Splendors of Global Book Making.” The exhibition is a feast for anyone interested in book history, and especially for those who want to learn how the format of a “book” has varied through time and space. It is also a rare opportunity for the public to view some of the least-known gems in Princeton’s collections.
The exhibition includes manuscripts and printed books from Western, Islamic, East, South and Southeast Asian, and Mesoamerican cultures. There are seventy-four items on digital display, and they represent many materials for book making that may not be familiar to a contemporary and Western audience, including bark, textiles, shell, lacquer, and copper. The earliest produced book on display is an Egyptian clay cylinder from the 6th century BCE, while the latest is an Indian artistic book made with copper plates from 2020.
Three “traditions” of book formats are featured in the exhibition: the codex tradition, the East Asian tradition, and the pothī tradition.
Codices, defined in the exhibition as “single- or multi-gatherings of sheets folded inside each other, with texts on both sides, sewn together, and usually attached onto covers,” gradually replaced scrolls and became the preferred format for early Christianity; the format later spread to Central and South Asia and was also adopted by Islamic and Hindu traditions. The exhibit includes an extremely rare early Coptic manuscript of the Gospel of St. Matthew, and a palimpsest parchment whose original text was erased to allow reuse.
Figure 1. Georgian palimpsest
Also included is a Chinese edition of Missale Romanvm, produced by the Jesuits in 1670, which was printed from woodblocks but bound in a European codex format.
The East Asian tradition, which encompasses Chinese, Japanese, and Korean cultures, demonstrates a wide range of media and materials used to produce reading matter, as well as extensive influence on other regions of Eurasia. Among the bamboo slips and Dunhuang scrolls is an inner garment inscribed with more than 700 exemplary “eight-legged” exam essays, totaling more than half a million miniature characters.
Figure 2. Pulinsidun daxue baguwen sichou chenyi
Another rare item on display is a reproduction of ink rubbings that the late-Qing statesman, Duanfang (端方, 1861–1911), made from Egyptian and Greek objects during his diplomatic missions in the early 1900s.
Figure 3. Aiji wuqiannian guke
The pothī tradition, heavily influenced by the palm leaf, one of the earliest writing materials used in the region, is no less diverse in the materials and formats that carried its texts. The exhibition features one of the earliest examples of paper making from Nepal (1140), on which the popular Pañcarakṣā sūtra (Sūtra of the five protectresses) is written.
Figure 4. Pañcarakṣā sūtra (Sūtra of the five protectresses)
Later materials that came after the palm leaf, such as birch bark, gold, and paper, mimicked their progenitor’s shape. The loose pages were usually stacked to make a bundle. With Brahmanism and Buddhism, the format spread across South and Southeast Asia and reached the Mongols and Manchus through Tibet.
Figure 5. Coqbbertv (The emergence and migration of humankind)
Here is an example of a relatively understudied Dongba manuscript from the Naxi people, an ethnic minority living in China’s Yunnan province.
Beyond the three main themes, the exhibition also showcases some formats that different traditions share: single sheets, scrolls, and accordion-style books. One of the highlights from this section is one of the earliest printed texts in the world, the Hyakumantō darani from Nara-era Japan.
Figure 6. Hyakumantō darani (A dhāraṇī from inside a one-million-pagoda)
The work was commissioned by the court in 764. Printed Buddhist spells were inserted into mini pagodas. These short texts, also known as “mantras,” are verbal formulas and chants for various spiritual purposes. Currently, “tens of thousands of the pagodas and several thousand printed spells still exist.”
Last but not least, the exhibit sheds light on still more materials that served as media for texts. The hard surfaces of stone, metal, and bone were widely used across the globe. For example, a conch shell with Maya glyphs is on display in this section.
Figure 7. 1 Ajaw 3 Chakat (17 March, 761 CE)
The exhibition was curated by Dr. Martin Heijdra, Director of the East Asian Library at Princeton. The online version includes an interactive timeline and map, where viewers can click on the numbered titles of the items to go to their catalogue records, each of which has a brief but detailed description of the item and additional readings about the research on it.
Figure 8. A section of the interactive map
Online viewers can also download PDF files of the accompanying catalogue and exhibition brochure. The digital exhibition not only provides an alternative for those who cannot see the show in person, but also gives it another form of life that will continue after the exhibition hall welcomes another array of objects.
Tian, Tian. “Duanfang’s Egyptian Rubbings: The First Egyptian Collection in Late Imperial China.” Antiquity 99, no. 406 (2025): 1129–42. https://doi.org/10.15184/aqy.2025.10098.
Galambos, Imre. “The Chinese Pothi: A Missing Link in the History of the Chinese Book.” The Medieval History Journal 27, no. 1 (2024): 152–72. https://doi.org/10.1177/09719458241231669.
Thanks to the generous support of the Center for European Studies and the UT Libraries, I was recently able to travel to London, England and Lisbon, Portugal. On my trip, I had the chance to attend a scholarly conference; network with academics, vendors, and librarians; and acquire books and unique materials for the UT Libraries’ collections.
A street in London lined with bookstores containing antiquarian and rare books.
My time in London was an invaluable opportunity to build stronger connections with an international cohort of colleagues. For example, I met with Carl Slienger, one of the UT Libraries’ vendors, with whom I work to procure rare materials on early twentieth-century European politics. He frequently supplies us with items not held by any other North American library, which makes the materials he sources very important for our distinctive holdings of pamphlets and other propagandistic literature, as well as for the antiquarian books that enhance our holdings of rare and unique European occult and spiritualist materials. I also met with a colleague at the British Library to discuss coding workflows and best practices for working with digital materials. That meeting was likewise very beneficial, as much of my work with digital methodologies involves programming in Python and other languages, and I am currently supervising a project that uses Python to automate digital archival workflows.
Ian outside of the British Library.
In Lisbon, I attended and presented at the Alliance of Digital Humanities Organizations (ADHO) Digital Humanities 2025 conference. My poster presentation focused on software packages I have written in the Rust programming language to support multilingual computational approaches to linguistics and digital humanities. The poster highlighted three packages: one for performing lemmatization, a key natural language processing task, on text; one for assessing the readability of a text, with a variety of algorithms to choose from; and one for performing stylometric analysis on text. All three were built with multilingual support in mind, and as such are specifically designed to move outside of the Anglocentric paradigm often found in technologies for natural language processing and textual analysis, creating new opportunities for multilingual and non-English textual analysis and digital humanities. Beyond my own presentation, I was able to attend talks on other digital research methodologies throughout the conference. Hearing from colleagues from all around the globe was both invigorating and rewarding, and an invaluable way to stay on top of current research in the digital humanities. I also took the opportunity to acquire a small number of zines while in Lisbon, adding to our collection of unique materials that we would not be able to purchase without undertaking a foreign acquisitions trip.
The poster session area at the DH 2025 conference in Lisbon.
This trip allowed me the opportunity to represent UT Austin internationally to a diverse group of colleagues, and I’m grateful that I was able to serve the Libraries in such a capacity. I look forward to building on our distinctive holdings and further expanding UT’s collections while continuing to work on using digital methodologies to enhance accessibility for research and open source software.
The Libraries welcome the arrival of new Senior Vice Provost and Director Robert H. McDonald this semester, marking the beginning of a new era for one of the university’s most vital academic resources.
McDonald, who brings decades of leadership experience in academic libraries and digital scholarship from institutions including the University of Colorado Boulder and Indiana University Bloomington, arrived at the Forty Acres eager to connect with the campus community and immerse himself in Longhorn traditions.
McDonald has launched his tenure with a full slate of onboarding activities – meeting administrative staff, touring library facilities, and connecting with colleagues across campus.
The Libraries’ participation in Longhorn Welcome activities provided the new senior vice provost with an early opportunity to experience university traditions firsthand. Two of UT’s notable kickoff events, Moov-In and Gone To Texas, took place shortly after McDonald began his term, offering the newcomer a glimpse into burnt orange culture.
At a co-sponsored graduate student social reception in the Perry-Castañeda Library’s Scholars Lab just before the beginning of classes, McDonald had the opportunity to interact with new students from a range of disciplines, including data science, ethnic studies, social work, and mathematics. He also made the rounds at Libraries-hosted Welcome Week events such as the Game Night, Zine Fest, and Bibliogarden – where he checked out his first book from the UT collections: Born to Run: A Hidden Tribe, Superathletes, and the Greatest Race the World Has Never Seen by Christopher McDougall.
In addition to meeting students and other members of the university community, McDonald ventured to several of the Libraries’ specialized facilities, including the Collections Preservation and Research Center, the Fine Arts Library, the Architecture and Planning Library and Alexander Architectural Archives, the Benson Latin American Collection, and the Collections Deposit Library. The visits, he noted, helped him deepen his understanding of the Libraries’ system and the breadth of its collections.
McDonald has continued establishing ties with other campus leaders, meeting with Libraries’ stakeholders and UT leadership around the Forty Acres, and plans further outreach to deans, faculty, and administrative partners to advance shared priorities.
Fresh from hosting his first all-staff meeting, where he had the opportunity to meet with staff in a more informal setting and hear more about work and activity around the Libraries, McDonald also attended his first UT football game – a rite of passage he shared with visiting colleague Michael Meth, Dean of the University Library at San José State University, who was in town to support the Spartans in what was a losing bid against the favored Longhorns.
As he continues to settle into his new role, McDonald has emphasized continuity and collaboration. His early weeks attest to a leader eager not only to understand the Libraries’ legacy but also to shape its future at the heart of the university’s teaching and research mission.
The Libraries kicked off the fall semester with a slate of engaging events as part of Longhorn Welcome Week 2025 (8/25-29), offering students opportunities to connect, create, and explore library resources and spaces in new ways. From trivia contests and art-making to zine collaging activities, the Libraries helped set the stage for a vibrant start to the academic year.
Game Night
The week began on Monday evening at the Perry-Castañeda Library with Game Night, where nearly 90 attendees gathered for an evening of friendly competition, pizza, and prizes. Students tested their knowledge in UT- and Austin-themed trivia rounds, winning much-coveted Labubu collectibles, while others played Bingo for UTL swag. Tables buzzed with card and board games like Uno, Bananagrams, Connect 4, and dominoes. The library also showcased game-related books from its collections – Dungeons & Dragons titles proved especially popular. Beyond the games, the event provided opportunities for connection and community-building, with students exchanging numbers to plan future gaming sessions and meet-ups.
Exploring Color and Geometry in Islamic Art
The Fine Arts Library hosted Color & Geometry in Islamic Art on Tuesday, where more than 30 students explored traditional and contemporary craft techniques. Participants painted wooden puzzles, decorated fabric and paper with Foundry-made stamps, designed jewelry, and experimented with Arabic calligraphy. Foundry tours highlighted the library’s creative technologies, and puzzle- and bead-stringing activities were crowd favorites. The event blended hands-on learning with cultural exploration, giving students a chance to engage with both artistic traditions and cutting-edge library resources.
Zine Making Party
The popular annual Zine Making Party drew about 55 participants on Wednesday, where attendees flexed their DIY muscles, collaging, cutting, and creating minimalist artworks. Students used magazines donated by UTL staff to craft one-page zines on topics meaningful to them, and many explored the Fine Arts Library’s extensive zine collection. Faculty even joined in the fun, underscoring the event’s wide appeal. Past years’ collages remain on display at the library’s entrance, offering a living archive of student creativity.
Bibliogarden
Activities returned to PCL on Wednesday, where the Libraries’ Bibliogarden brought together nearly 50 attendees and UT Libraries staff from across disciplines. Students designed bookmarks, explored leisure-reading recommendations from the new “leisure cart,” and browsed a curated selection of zines, chapbooks, cookbooks, and global literature. Highlights included Southeast Asian cookbooks paired with homemade photos, and a table from Austin Public Library where students could sign up for local library cards. The event fostered community while showcasing the breadth of UT Libraries’ collections and services.
Closing with Cinema: Minari Screening
The week concluded on Friday with a screening of Minari, the acclaimed 2020 film about a Korean American family building a life in rural Arkansas. Students and community members gathered at the Perry-Castañeda Library to share in the story of resilience and belonging – an apt reflection of the welcoming spirit that defined the week. Co-sponsored by the Center for East Asian Studies’ Korea Program and the Center for Asian American Studies, the film drew a full house, and capped an excellent week of introductory experiences for new and returning Libraries users.
The Libraries’ efforts did not go unnoticed. The Libraries’ Game Night was honored as a Longhorn Welcome Event of the Year, a recognition that underscores the event’s positive impact on helping new and returning Longhorns feel at home on the Forty Acres.
Through Longhorn Welcome Week 2025, the Libraries underscored its role not only as a resource hub but also as a vibrant community space where students can learn, create, and connect. Whether through games, art, zines, or shared stories, the Libraries offered students multiple ways to launch into the new academic calendar with curiosity and connection.
This post was written by Sojeong Ryoo, the Global Studies Digital Projects GRA at Perry-Castañeda Library and a current graduate student at the School of Information.
Sometimes, an ordinary personal diary can be an extraordinary historical resource that provides a glimpse into its times. The Jiam Diary Digital Humanities Project, led by Professor JiYoung Jung at Ewha Womans University, is based on the nearly eight years (95 months) of journals kept by Yun Yi-hu (pen name: Jiam), a yangban (nobleman) of the Joseon dynasty in the late 17th century.
Between 1692 and 1699, Yun Yi-hu — a retired nobleman in the Honam region — kept a meticulous diary of his daily life. Known as Jiam Ilgi, the three-volume, 920-page record captures everything from farming, fishing, and travel to visits with friends, political events, and even the activities of household slaves. It offers a vivid portrait of both personal routines and the broader social world of 17th-century Korea.
The Jiam Diary Digital Humanities Project transforms this rich historical source into searchable, visualized data. More than 80,000 pieces of semantic information have been extracted and organized into interactive maps, timelines, and relationship networks, enabling users to explore the people, places, events, and lifestyles mentioned in the diary. This digital approach opens a new window into the everyday life of a Joseon noble.
On the site, you can explore data visualizations in five themes: Lifestyle, Person, Place, Event, and Slave.
Lifestyle visualization is designed to help users explore lifestyle-related data. The eight-year calendar is arranged vertically by year and horizontally by month, with original and translated diary entries for each day appearing in the blank space on the right. Clicking the small square menu box in the upper left allows you to choose from over 80 lifestyle categories and view related records. Examples of lifestyle categories include: family, returning home, visiting, slaves, bathing, private punishment, guests, lodging, prison, transactions, architecture, capital affairs, civil service examinations, weather, farming, theft, literature, unusual events, funerals, hunting, fishing, arts, entertainment, medicine, disease, and slave hunting.
Person visualization highlights people mentioned in the diary. Yun Yi-hu’s family, kinship ties, and political connections are shown as an interactive network. Clicking a person icon allows you to filter and view the diary text for the date the person was mentioned.
Place visualization displays a map of Korea, marking locations mentioned in the diary. Places can be categorized by type — administrative districts, buildings, roads, mountains, fields, rivers, and islands — and filtered using the menu on the bottom left. Clicking on a place reveals diary entries linked to that location. For example, selecting Seoul — the capital of the Joseon dynasty — returns an impressive 455 entries. Clicking on a diary date on the right reveals the original and translated text of the diary entries that mention “Seoul.”
Event visualization organizes events from the diary along a timeline, making it easy to see which major events Yun Yi-hu experienced and how long they lasted. Clicking on an event box shows the relevant diary entry, with entries displayed chronologically below.
Slave visualization examines the nobi (slave). The slave system existed during the Joseon dynasty. As a nobleman, Yun Yi-hu owned many household slaves. The visualization links each slave in the center to the lifestyle activities associated with them, displayed around the outside. This allows us to examine the roles played by specific slaves in Yun Yi-hu’s life.
Looking into the records of Yun Yi-hu, a man who lived centuries ago, we see him interacting with family and friends, traveling, rejoicing, and grieving — experiences not so different from our own lives today. The Jiam Diary Digital Humanities Project brings this rich life to light through diverse visualizations of his diary, offering deep insight and inspiration.
To learn more about the Joseon dynasty or Korea in general, check out these resources in the UT Libraries’ collections:
As we quickly become absorbed in the fall semester, I want to extend a warm welcome to you all – whether you’re arriving on the Forty Acres for the first time or returning for another academic year. I joined the University of Texas at Austin just a little over two weeks ago, and in that brief time, I’ve been struck by the energy, character, and sense of possibility that define this community.
My career path has led me through a variety of roles at institutions including the University of Colorado Boulder, Indiana University, and the San Diego Supercomputer Center (UCSD), where I built expertise in digital preservation, new models of scholarly communication, and library leadership and management, all with an eye toward the future of higher education. I am honored to bring that experience to this exceptional university.
In these early days, I’ve been getting to know campus life, meeting colleagues, exploring libraries, and discussing ways to support your work. With new leadership taking on key positions, it’s a period of renewal, and I am grateful to be involved.
The University of Texas Libraries are central to this momentum. More than collections and spaces, our libraries are places where ideas are tested and forged, collaborations are formed, and knowledge is shared. This fall, we have been proud to be part of welcome events across the campus, and we want to welcome all students and help connect them both with our unbelievable content and with each other. It is the strength of our community that enables our success.
As I continue to settle into both the campus and the Austin community, I look forward to meeting many of you in our libraries, at events, or simply as we cross paths on campus. Please don’t hesitate to stop and say hello. Together with our dedicated library staff, I am committed to ensuring the Libraries remain a vital partner in your academic, professional, and personal journey. I’m grateful for the opportunity to join you at this exciting moment for UT, and I look forward to what we will accomplish together in the year ahead.
The foundation of digital humanities is data. Lots of it.
As the early phases of AI have shown us, there is a staggering amount of textual data available to manipulate and compute – both openly available and that which exists behind paywalls. All too often, however, the depth and accessibility of digital scholarly textual data in non-English and non-Roman scripts are lacking. Rather than be left behind or constrained by these lacunae, individual scholars are working to generate their own digital research corpora, often building upon AI tools.
Recently I was introduced to the MITRA project and have been nothing short of amazed.
A research project from the University of California-Berkeley’s AI Research Lab, MITRA “focuses on bridging the linguistic divide between ancient wisdom source languages and contemporary languages through the application of advanced Deep Learning and AI technologies.” Using Gemini APIs, MITRA builds upon an extensive digitized text corpus and contributions from translators and researchers alike to “harness AI technologies to promote the scholarly study and personal practice of the dharma and to accelerate academic and individual research through open-source collaboration on datasets, models and applications.” In so doing, MITRA aims to “overcome the challenges inherent in low-resource language translation,” to “minimize language barriers,” and to create “more equitable access to literature and wisdom.”
I have engaged with OCR and digital text conversion for years but have always found it to be a labor-intensive and ultimately less-than-satisfying [or accurate] experience, especially for non-roman languages and scripts. Of interest to me, therefore, is how MITRA has harnessed AI to allow one to drag and drop PDF files into the tool, at which point it can both detect the language (Sanskrit & other Devanagari-based languages, Tibetan, scriptural Chinese or English) and use OCR to produce a relatively accurate text file. That in itself is pretty amazing. From there, however, one can quickly transliterate, translate and/or explain the text into Sanskrit, Buddhist & Modern Chinese, Russian, Korean, Japanese, German, French, Italian, Hindi or Spanish.
To test it out, I grabbed a small amount of openly accessible text from HathiTrust. I chose an early Hindi novel, namely Rāmalāla Varmmā’s Banārasī Dupaṭṭā Yā Gularū Zarīnā from 1916 which is readily available in PDF form on HathiTrust. I grabbed the first page of the novel which looks like this:
Page one of Banārasī Dupaṭṭā Yā Gularū Zarīnā from HathiTrust
I then put a PDF of that page into MITRA to see if it could OCR the text. Despite some blurriness in the original source, it most certainly could (even if not with 100% accuracy):
MITRA’s OCR of page one of Banārasī Dupaṭṭā Yā Gularū Zarīnā
Encouraged, I then asked MITRA to both transliterate (take the text written in Devanagari script and convert it to roman script) and translate the text, which it also did quite quickly and easily:
MITRA’s transliteration and translation of page one of Banārasī Dupaṭṭā Yā Gularū Zarīnā
Ever more optimistic, I then clicked on “English explained” and MITRA was also quite adept at parsing the translated text, the original script of the text, and the grammar and vocabulary.
MITRA’s “English Explained” of page one of Banārasī Dupaṭṭā Yā Gularū Zarīnā
I repeat, I stand amazed.
While MITRA has clearly captured my attention and my appreciation, I will note that there are other similar projects currently available and equally commendable, from Andrew Ollett’s Indological and OCR tools [and fabulous related explanations] to Tyler Neill’s toolkit, Skrutable.
Likewise, the UT Libraries is here to help you explore producing your own digital content for research. The Scan Tech Studio in the PCL Scholars Lab has the hardware and software you might need to convert print into digital texts, as well as a group of specialists to help you. We have online guides to introduce the practices and concepts of OCR as well as recordings from OCR workshops.
I encourage anyone interested in exploring non-English or non-roman digital texts to jump in, kick the tires, and have some fun with these impressive conversion projects.
In the digital age, historical maps hold a wealth of information, but unlocking their full potential for geospatial analysis and historical research often requires labor-intensive georeferencing. An innovative project at the University of Texas Libraries is transforming this process through the power of machine learning.
The Libraries boast a vast cartographic collection in the Perry-Castañeda Library Maps Collection including thousands of items that have been scanned for online and digital use, yet only a fraction of them are georeferenced, hampering their utility for scholars and researchers. Recognizing the immense challenge of manually georeferencing tens of thousands of maps, the Libraries have turned to cutting-edge technology to automate this arduous task.
Georeferencing – the process of assigning geographic coordinates to a map image – is essential for accurately situating maps on the Earth’s surface within GIS (Geographic Information System) software. Traditionally, this has been painstaking manual work, but the emergence of machine learning offers a promising alternative.
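For readers curious what "assigning geographic coordinates" can look like in practice, here is a minimal Python sketch. It is an illustration only, not the Libraries' actual workflow. It assumes three hypothetical control points (pixel positions on a scanned map paired with longitude and latitude) and fits a simple affine transform through them. The function names and the sample coordinates are invented for the example.

```python
def _det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m * x = v using Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; pick different ones")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(_det3(replaced) / d)
    return solution


def fit_affine(control_points):
    """Fit lon = a*px + b*py + c and lat = d*px + e*py + f.

    control_points: exactly three ((px, py), (lon, lat)) pairs.
    Returns ((a, b, c), (d, e, f)).
    """
    if len(control_points) != 3:
        raise ValueError("this sketch expects exactly three control points")
    m = [[px, py, 1.0] for (px, py), _geo in control_points]
    lons = [geo[0] for _pix, geo in control_points]
    lats = [geo[1] for _pix, geo in control_points]
    return tuple(_solve3(m, lons)), tuple(_solve3(m, lats))


def pixel_to_geo(transform, px, py):
    """Apply a fitted affine transform to one pixel position."""
    (a, b, c), (d, e, f) = transform
    return a * px + b * py + c, d * px + e * py + f


# Hypothetical control points: (pixel x, pixel y) -> (longitude, latitude)
SAMPLE_POINTS = [
    ((0, 0), (-97.75, 30.28)),
    ((1000, 0), (-97.74, 30.28)),
    ((0, 1000), (-97.75, 30.27)),
]

transform = fit_affine(SAMPLE_POINTS)
lon, lat = pixel_to_geo(transform, 500, 500)
print(round(lon, 6), round(lat, 6))
```

Three points are the minimum needed to define an affine transform. GIS software typically accepts more control points and fits the transform by least squares, which averages out small placement errors.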
Enter the proof-of-concept project spearheaded by geospatial and data specialists at the Libraries, which focuses on automating the georeferencing of Sanborn Fire Insurance maps – a pivotal component of the collection. Sanborn maps provide invaluable insights into urban development and infrastructure from the late 19th and early 20th centuries.
To tackle this ambitious undertaking, the project team developed a custom annotation tool to identify street intersections on a small subset of maps from the collection. Object detection models trained on these annotated examples then detect intersections automatically on other maps, streamlining the georeferencing process.
Optical character recognition (OCR) technology is then employed to extract street labels associated with the intersections identified by the object detection model. This data is then cross-referenced with a modern street intersection dataset derived from OpenStreetMap, enabling the precise georeferencing of the historical maps.
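The cross-referencing step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project team's code: it assumes the OCR output arrives as a pair of street labels per detected intersection, and that the modern dataset is a dictionary of street-name pairs mapped to coordinates. The sample entries, the coordinates, the suffix list and the 0.8 similarity threshold are all invented for the example. Fuzzy matching with Python's standard `difflib` stands in for whatever matching method the team actually uses.

```python
from difflib import SequenceMatcher

# Street-type words dropped before comparison (an illustrative, partial list)
SUFFIXES = {"st", "street", "ave", "avenue", "blvd", "boulevard", "rd", "road"}


def normalize(name):
    """Lowercase a street label, drop periods and street-type words."""
    words = name.lower().replace(".", "").split()
    return " ".join(w for w in words if w not in SUFFIXES)


def similarity(a, b):
    """Similarity between two street labels, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_intersection(ocr_pair, modern, threshold=0.8):
    """Find the modern intersection that best matches two OCR'd labels.

    ocr_pair: (label_1, label_2) read from the historical map.
    modern: dict mapping (street_a, street_b) -> (lon, lat).
    Returns (lon, lat), or None if no candidate reaches the threshold.
    """
    best_score, best_coords = 0.0, None
    for (s1, s2), coords in modern.items():
        # Streets may be listed in either order, so score both pairings.
        # The weaker of the two label matches decides each pairing's score.
        score = max(
            min(similarity(ocr_pair[0], s1), similarity(ocr_pair[1], s2)),
            min(similarity(ocr_pair[0], s2), similarity(ocr_pair[1], s1)),
        )
        if score > best_score:
            best_score, best_coords = score, coords
    return best_coords if best_score >= threshold else None


# Hypothetical modern intersections with made-up coordinates
MODERN = {
    ("Congress Avenue", "East 6th Street"): (-97.7428, 30.2678),
    ("Guadalupe Street", "West 24th Street"): (-97.7419, 30.2880),
}

# An OCR result with a typical misread ("CONGRES") still finds its match
print(match_intersection(("EAST 6TH ST.", "CONGRES AVE"), MODERN))
```

Each matched intersection yields a pixel position on the scan paired with a modern coordinate, which is exactly the kind of control point that georeferencing requires. A threshold that rejects weak matches matters here, because street names change over time and a wrong match would misplace the whole map.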
Remarkably, the automated process has already achieved a significant milestone, successfully georeferencing 14% of the Sanborn maps with a level of accuracy comparable to manual methods. This initial success paves the way for scaling up the project to encompass the entire collection of Sanborn Fire Insurance maps, as well as extending the approach to other map collections in the future.
Looking ahead, the project team is exploring ways to enhance the process and improve its accuracy. Continued refinement of the machine learning models, improvements to the OCR process for reading street labels, and collaboration with other experts in the field are just a few of the avenues being explored to optimize the georeferencing workflow.
In an era where data-driven insights are increasingly shaping our understanding of the past, initiatives like the Libraries’ machine learning project offer a glimpse into the transformative potential of technology in historical research. By harnessing the power of machine learning, the Libraries are discovering ways to unlock the spatial dimensions of history and illuminate new pathways for scholarship and discovery.