Read, Hot and Digitized: AI for OCR & Translation

Read, Hot & Digitized: Librarians and the digital scholarship they love. In this series, librarians from UTL's Arts, Humanities and Global Studies Engagement Team briefly present, explore and critique existing examples of digital scholarship. Our hope is that these monthly reviews will inspire critical reflection on, and future creative contributions to, the growing field of digital scholarship.


The foundation of digital humanities is data.  Lots of it.

As the early phases of AI have shown us, there is a staggering amount of textual data available to manipulate and compute, some of it openly available and some behind paywalls. All too often, however, digital scholarly text in non-English languages and non-Roman scripts lacks that depth and accessibility. Rather than be left behind or constrained by these lacunae, individual scholars are generating their own digital research corpora, often building on AI tools.

Recently I was introduced to the MITRA project and have been nothing short of amazed.

A research project from the University of California-Berkeley’s AI Research Lab, MITRA “focuses on bridging the linguistic divide between ancient wisdom source languages and contemporary languages through the application of advanced Deep Learning and AI technologies.”  Using Gemini APIs, MITRA builds upon an extensive digitized text corpus and contributions from translators and researchers alike to “harness AI technologies to promote the scholarly study and personal practice of the dharma and to accelerate academic and individual research through open-source collaboration on datasets, models and applications.”  In so doing, MITRA aims to “overcome the challenges inherent in low-resource language translation,” to “minimize language barriers,” and to create “more equitable access to literature and wisdom.” 

I have engaged with OCR and digital text conversion for years but have always found it a labor-intensive and ultimately less-than-satisfying [or accurate] experience, especially for non-Roman languages and scripts. Of interest to me, therefore, is how MITRA has harnessed AI so that one can drag and drop PDF files into the tool, at which point it both detects the language (Sanskrit and other languages written in Devanagari, Tibetan, scriptural Chinese, or English) and uses OCR to produce a relatively accurate text file. That in itself is pretty amazing. Beyond that, one can quickly transliterate, translate and/or explain the text in Sanskrit, Buddhist and Modern Chinese, Russian, Korean, Japanese, German, French, Italian, Hindi or Spanish.

To test it out, I took a small amount of openly accessible text from HathiTrust: the first page of an early Hindi novel, Rāmalāla Varmmā's Banārasī Dupaṭṭā Yā Gularū Zarīnā (1916), which is readily available in PDF form on HathiTrust. That first page looks like this:

Page one of Banārasī Dupaṭṭā Yā Gularū Zarīnā from HathiTrust

I then put a PDF of that page into MITRA to see if it could OCR the text.  Despite some blurriness of the original source text, it most certainly could OCR it (even if not 100% accurate):

MITRA’s OCR of page one of Banārasī Dupaṭṭā Yā Gularū Zarīnā

Encouraged, I then asked MITRA both to transliterate the text (converting it from Devanagari script to Roman script) and to translate it, which it also did quickly and easily.
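MITRA's own transliteration method is not described here. As a rough illustration of what the transliteration step involves, the following is a minimal Python sketch that converts Devanagari to IAST-style Roman script. The mapping tables and the `transliterate` function are hypothetical and deliberately incomplete: they cover basic consonants, vowels, vowel signs (matras), the virama and a few nukta letters, and they keep the inherent "a" Sanskrit-style, so Hindi schwa deletion is not modeled.

```python
import unicodedata

# Hypothetical, deliberately incomplete mapping tables (IAST-style output).
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ṅ",
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "ñ",
    "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "v",
    "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
          "ऋ": "ṛ", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}
# Dependent vowel signs (matras), written as escapes because they are combining marks.
MATRAS = {"\u093e": "ā", "\u093f": "i", "\u0940": "ī", "\u0941": "u",
          "\u0942": "ū", "\u0943": "ṛ", "\u0947": "e", "\u0948": "ai",
          "\u094b": "o", "\u094c": "au"}
SIGNS = {"\u0902": "ṃ", "\u0903": "ḥ", "\u0964": "."}  # anusvara, visarga, danda
NUKTA_FORMS = {"क": "q", "ज": "z", "फ": "f"}  # consonant + nukta, e.g. ज़ -> z
VIRAMA, NUKTA = "\u094d", "\u093c"


def transliterate(text: str) -> str:
    """Convert Devanagari text to IAST-style Roman script (simplified)."""
    # NFD splits precomposed nukta letters (e.g. U+095B) into base + nukta.
    text = unicodedata.normalize("NFD", text)
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in CONSONANTS:
            roman = CONSONANTS[ch]
            j = i + 1
            if j < n and text[j] == NUKTA:
                roman = NUKTA_FORMS.get(ch, roman)
                j += 1
            out.append(roman)
            if j < n and text[j] in MATRAS:
                out.append(MATRAS[text[j]])  # explicit vowel sign
                j += 1
            elif j < n and text[j] == VIRAMA:
                j += 1  # virama: bare consonant, no vowel
            else:
                out.append("a")  # inherent vowel
            i = j
        else:
            # Independent vowels, stray signs, or anything else (spaces, digits).
            out.append(VOWELS.get(ch) or MATRAS.get(ch) or SIGNS.get(ch, ch))
            i += 1
    return "".join(out)


print(transliterate("बनारसी दुपट्टा"))  # → banārasī dupaṭṭā
```

Keeping the inherent vowel is why this sketch yields "rāmalāla varmmā" for the author's name, matching the catalog romanization, whereas a reader's Hindi pronunciation would drop some of those final vowels.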

Ever more optimistic, I then clicked on "English Explained," and MITRA proved quite adept at parsing the translated text, the original script of the text, and the grammar and vocabulary.

MITRA’s “English Explained” of page one of Banārasī Dupaṭṭā Yā Gularū Zarīnā

I repeat, I stand amazed.

While MITRA has clearly captured my attention and my appreciation, I will note that other similar projects are currently available and equally commendable, from Andrew Ollett's Indological and OCR tools [and fabulous related explanations] to Tyler Neill's toolkit, Skrutable.

Likewise, the UT Libraries are here to help you explore producing your own digital content for research. The Scan Tech Studio in the PCL Scholars Lab has the hardware and software you might need to convert print into digital texts, as well as a group of specialists to help you. We also have online guides introducing the practices and concepts of OCR, as well as recordings of OCR workshops.

I encourage anyone interested in exploring non-English or non-Roman digital texts to jump in, kick the tires, and have some fun with these impressive conversion projects.

Historic Maps, New Coordinates

Machine Learning Meets the Sanborn Maps

In the digital age, historical maps hold a wealth of information, but unlocking their full potential for geospatial analysis and historical research often requires labor-intensive georeferencing. An innovative project at the University of Texas Libraries is streamlining this process through the power of machine learning.

The Libraries boast a vast cartographic collection in the Perry-Castañeda Library Maps Collection including thousands of items that have been scanned for online and digital use, yet only a fraction of them are georeferenced, hampering their utility for scholars and researchers. Recognizing the immense challenge of manually georeferencing tens of thousands of maps, the Libraries have turned to cutting-edge technology to automate this arduous task.

Georeferencing – the process of assigning geographic coordinates to a map image – is essential for accurately situating maps on the Earth’s surface within GIS (Geographic Information System) software. Traditionally, this has been painstaking manual work, but the emergence of machine learning offers a promising alternative.
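To make the idea concrete: once a handful of points on a scanned map have known real-world coordinates (often called ground control points), a transformation from pixel positions to coordinates can be fitted. The sketch below assumes a simple affine transformation solved by least squares with NumPy. The source does not say which transformation the Libraries' project uses, and GIS software commonly offers polynomial and other options as well. The function names are hypothetical, and the demo uses synthetic points, not project data.

```python
import numpy as np


def _design_matrix(pixels):
    """Rows of [col, row, 1] for each pixel position."""
    pixels = np.asarray(pixels, dtype=float)
    return np.column_stack([pixels, np.ones(len(pixels))])


def fit_affine(pixels, coords):
    """Least-squares affine transform from pixel (col, row) to map (x, y).

    Returns a 3x2 parameter matrix P such that [col, row, 1] @ P ~ [x, y].
    """
    if len(pixels) < 3:
        raise ValueError("an affine fit needs at least 3 control points")
    A = _design_matrix(pixels)
    P, *_ = np.linalg.lstsq(A, np.asarray(coords, dtype=float), rcond=None)
    return P


def apply_affine(P, pixels):
    """Map pixel positions to coordinates with a fitted transform."""
    return _design_matrix(pixels) @ P


def rmse(P, pixels, coords):
    """Root-mean-square distance between predicted and known coordinates."""
    residuals = apply_affine(P, pixels) - np.asarray(coords, dtype=float)
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))


# Synthetic demo: invent a "true" transform, generate control points, recover it.
true_P = np.array([[2e-5, 0.0],       # change in x and y per pixel column
                   [0.0, -2e-5],      # change per pixel row (rows run downward)
                   [-97.75, 30.28]])  # offsets (illustrative lon/lat values)
gcp_pixels = np.array([[100, 200], [1500, 220], [120, 1800], [1480, 1790]])
gcp_coords = apply_affine(true_P, gcp_pixels)
P = fit_affine(gcp_pixels, gcp_coords)
print(rmse(P, gcp_pixels, gcp_coords))  # close to zero for noise-free points
```

An affine fit absorbs shifts, scaling, rotation and skew but not local distortions in a scan. With real, noisy control points, the RMSE gives a rough check of fit quality, which is a common way GIS tools report georeferencing accuracy.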

Enter the proof-of-concept project spearheaded by geospatial and data specialists at the Libraries, which focuses on automating the georeferencing of Sanborn Fire Insurance maps, a pivotal component of the collection. Sanborn maps provide invaluable insights into urban development and infrastructure from the late 19th and early 20th centuries.

To tackle this ambitious undertaking, the project team developed a custom annotation tool to identify street intersections on a small subset of maps from the collection. Using machine learning object detection models, the tool then automatically detects these intersections, streamlining the georeferencing process.

Optical character recognition (OCR) is then used to extract the street labels associated with the intersections identified by the object detection model. These labels are cross-referenced with a modern street intersection dataset derived from OpenStreetMap, enabling precise georeferencing of the historical maps.
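The cross-referencing step can be pictured as fuzzy string matching. The hypothetical sketch below normalizes two OCR'd street labels, expands a few common abbreviations, finds the closest street names with Python's standard `difflib`, and looks up the intersection of those two streets in a small table standing in for an OpenStreetMap-derived dataset. The table, its placeholder coordinates, the abbreviation list and the 0.8 similarity cutoff are all assumptions for illustration, not details of the Libraries' workflow.

```python
import difflib
import re

# Hypothetical abbreviation list; real street-label conventions vary by map.
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "AV": "AVENUE",
                 "BLVD": "BOULEVARD", "RD": "ROAD"}

# Stand-in for an OpenStreetMap-derived table: each intersection is keyed by
# the pair of (normalized) street names that meet there. Coordinates are
# placeholders, not surveyed values.
OSM_INTERSECTIONS = {
    frozenset({"CONGRESS AVENUE", "5TH STREET"}): (-97.743, 30.267),
    frozenset({"CONGRESS AVENUE", "6TH STREET"}): (-97.743, 30.269),
}


def normalize(label):
    """Uppercase, strip punctuation, and expand common abbreviations."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", label.upper()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def match_intersection(label_a, label_b, intersections, cutoff=0.8):
    """Return coordinates for the intersection best matching two OCR labels."""
    names = sorted({name for pair in intersections for name in pair})
    best_a = difflib.get_close_matches(normalize(label_a), names, n=1, cutoff=cutoff)
    best_b = difflib.get_close_matches(normalize(label_b), names, n=1, cutoff=cutoff)
    if not best_a or not best_b:
        return None  # at least one label had no sufficiently similar street name
    return intersections.get(frozenset({best_a[0], best_b[0]}))


# "Congres Ave." simulates an OCR error (a dropped "s").
print(match_intersection("Congres Ave.", "6th St", OSM_INTERSECTIONS))  # → (-97.743, 30.269)
```

Each successful match pairs a pixel location on the scan with real-world coordinates, which is exactly the kind of control point georeferencing needs. String similarity alone cannot catch an OCR misreading that produces a different valid name, such as "6th" read as "5th".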

Remarkably, the automated process has already achieved a significant milestone, successfully georeferencing 14% of the Sanborn maps with a level of accuracy comparable to manual methods. This initial success paves the way for scaling up the project to encompass the entire collection of Sanborn Fire Insurance maps, as well as extending the approach to other map collections in the future.

Looking ahead, the project team is considering enhancements to the process and ways to further improve its accuracy. Continued refinement of the machine learning models, improvements to the OCR of street labels, and collaboration with other experts in the field are just a few of the avenues being explored to optimize the georeferencing workflow.

In an era where data-driven insights are increasingly shaping our understanding of the past, initiatives like the Libraries’ machine learning project offer a glimpse into the transformative potential of technology in historical research. By harnessing the power of machine learning, the Libraries are discovering ways to unlock the spatial dimensions of history and illuminate new pathways for scholarship and discovery.


Read a research article about the project at: https://www.tandfonline.com/doi/full/10.1080/15420353.2025.2462737

“Visiting Days”: An Archive of Family Care at São Paulo’s Largest Women’s Prison

An archive acquired through the LLILAS Benson Archiving Black América–Black Diaspora Archive initiative documents scenes from prison visiting days

Brazil is among the top incarcerators of women worldwide, with Black women accounting for 65 percent of this population. The largest women’s prison in the country is the Penitenciaria Feminina Santana (Santana Women’s Penitentiary) in São Paulo. Every weekend, families of incarcerated women arrive to visit their loved ones on the inside. On Avenida Ataliba Leonel, the busy thoroughfare just outside the prison, two tents, or barracas, serve as informal storage sites where visitors pay to store their belongings prior to lining up to enter the prison. The tents also offer food for purchase.

A group of several dozen people cluster around the entrance gates of the large women's penitentiary in São Paulo, Brazil. One woman sits at the curb, a small child by her side. Many of the people have white plastic bags on the ground near them. The prison entrance is a pale yellow archway, trimmed in medium grayish blue, with a gate of the same blue and the name of the prison written above. In the foreground is the surface of the street, with many lines painted for crosswalks.
Facade of the largest women’s prison complex in Latin America, the Santana Women’s Penitentiary. Photo: Flávia Biazeto. Black Diaspora Archive.

In the archive Dias de Visita/Visiting Days: Strategies for Connections, Affections and Black Encounters in Latin America’s Largest Women’s Penitentiary, LLILAS PhD student Ana Luiza Biazeto has assembled images and oral histories from her visits to the barracas, where she interviewed family members of incarcerated women, as well as some of the people who set up and run the tents. Biazeto became familiar with the prison and the visiting area during research for her master’s thesis, which was about Black incarcerated women in the prison.

A large blue tarp creates a tent with an open front. People can be seen standing or sitting under the tarp—one with an umbrella, one bent over holding a white plastic bag. Various bags and at least one suitcase are visible. On the rainy street in the foreground, a man rides by on a bicycle.
Loira’s tent welcomes visitors on a rainy day. Photo: Flávia Biazeto. Black Diaspora Archive.

While many incarcerated women are completely separated from the lives of their families and loved ones during their imprisonment, others are visited by family on a regular basis. During her first year as a PhD student, in a 2022 seminar on urban Brazil, Professor Lorraine Leu asked Biazeto some pointed questions about the women she had interviewed in the prison: How were their children doing? Who were their families? Leu's questions inspired Biazeto to think more deeply about the dias de visita and what she could learn in this setting. She applied for, and received, an Archiving Black America–Black Diaspora Archive (ABA–BDA) archival acquisition award, which afforded her an opportunity to better understand the dynamics of the families.

Two small boys, both with shorn heads, face away from the camera. They are standing on a paved median facing a two-lane road. In the background, a pale yellow three-story building can be seen. The sky above it is gray. The boys stand with their shoulders touching. They wear flip-flop sandals, matching voluminous sweatpants that are light blue with a wide navy blue band across the knee, and long-sleeved sweatshirts.
Brothers, brought by their grandmother, wait to visit their mother in the Santana Women's Penitentiary. Photo: Ana Luiza Biazeto. Black Diaspora Archive.

“I learned of things that I never would have imagined,” said Biazeto in an interview during spring 2025. “I was in connection with many mothers, many grandmothers, who visit women. There were many children, running around there on the avenue. And as I interviewed, I cried along with the women. The children came and showed me the drawings they were making, many the age of [my youngest child]. And just as my master’s thesis involved a painful process, it is also a painful thing to confront these realities.”

Barraca da Loira and Barraca da Adriana, named for the women who run them, are part of the informal economy and are protected by the Primeiro Comando da Capital (PCC), a criminal organization in São Paulo that is sometimes called upon by the state to act. Biazeto says the PCC might be on hand to make sure people line up in an orderly manner to visit the prison.

A group of white plastic bags sit on a dirty orange tarp. Each one is tied with a knot at the top. On some, a small yellow square of paper with a handwritten number is visible attached with a metallic hook. Some belongings, such as a dark plaid umbrella, can be seen peeking out of the bags.
Visitors’ items are put in plastic bags and locked with a password in Adriana’s tent. Photo: Flávia Biazeto. Black Diaspora Archive.

During her fieldwork, Biazeto conducted interviews with Adriana, who runs one tent, and with Karina, the daughter of “a Loira” (“Blondie”), who runs the other. Additionally, Adriana and her son, Paulo, recommended visiting family members for Biazeto to interview.

“They knew the people, they heard their stories, they sold them coffee, they welcomed the people,” Biazeto said.

Closed containers of cake and several individually wrapped sandwiches made with white bread sit on a wooden table. Two cake containers are stacked one atop the other, while a knife sits atop a single container.
The cake and snacks sold at Barraca da Loira. Photo: Flávia Biazeto. Black Diaspora Archive.

In the excerpt below, Biazeto discusses her fieldwork in more depth. The following conversation has been translated from the Portuguese and condensed.

Q: What were some of the things that surprised you?

Biazeto: I saw a mother putting on makeup to show her daughter that she was ok. Because she said that her role was to maintain her daughter’s well-being inside the prison. She said, “I cry here with you, but I go in there with a smile for her to have hope, that I’m waiting for her out here, and that everything is all right.” So she puts on makeup, she applies eye shadow, puts on lipstick, fixes her hair.

In a grainy photo, a woman applies red lipstick to her mouth. She holds a mirror and the silver top of the lipstick tube in one hand, while applying the color to her open mouth in the other. She is wearing a leopard-print jacket.
A mother applies lipstick before visiting her daughter in the women’s penitentiary. Photo: Flávia Biazeto. Black Diaspora Archive.

I also saw—although the statistics say the opposite—I saw many men going to visit their women. Taking their kids. So this happens in a way that research doesn’t show. These men are also invisibilized. Principally Black men, because when we talk about the Brazilian prison system, we’re principally talking about race. I saw a grandfather bringing a grandson to see his daughter. I saw a father bringing a little girl in a stroller, giving her a bottle. Cooking for the women. Breaking those gender barriers somewhat.

Professor Christen Smith commented [on my research], “You are bringing in new viewpoints [novos olhares].” Because it’s the man who works the dawn hours as garbage collector, street sweeper; comes back home, cooks, takes his daughter, and goes to the gate of the penitentiary. So those were the things that surprised me.

A slender man in loose gray sweatshirt and sweatpants stands with his back to the camera. He is facing a crowded line across the street from the entrance gate to the women's prison. He holds a small child against the left side of his chest. The child is wearing gray sweats, a blue-and-white hat with ear covers and a pompom on top, and bright red sneakers. The man has short black hair and a cigarette tucked behind his right ear.
A father takes his child to see the child's mother, who is serving a sentence in the Santana Women's Penitentiary (PFS). Photo: Flávia Biazeto. Black Diaspora Archive.

Also, mothers who brought food not just for their own daughters, but for the cellmates, and the block-mates, because they didn’t have visitors. A mother said, “My daughter shares the food I bring, even if it’s a spoonful for each person.”

Q: What are these women serving time for?

Biazeto: In general, it is drug trafficking. Sometimes it’s a family business; sometimes inherited from the mother, or along with a male partner. Generally it is the user who is criminalized, not the dealer.

Q: What more would you like to share about your work?

Biazeto: [I’ve been encouraged by Lorraine Leu to think about the (im)possibilities of Black futures in the context of the prison.] To see the children running around there, in the middle of a busy avenue, is to think about Black resistance. Right there, you witness the formation of a community that supports and sustains its members somehow, whether it’s sharing information on legal issues, or the workings of the prison system. Many times, this information comes from outside, from family members exchanging information between themselves. I could see a solidarity among those family members. I think that this archive keeps alive the memory of people who are resisting the Brazilian police state. It is a new way of resisting.

A small dark blue tent, open on two sides and held up by metal poles, reads "Barraca da Loira" (Loira's Tent) in bright yellow letters. Inside the tent, there is a small metal table with a few full plastic bags, one or two large Thermos bottles, and two round plastic containers of cake. Outside, a rope strung from a larger pole serves as a makeshift clothesline, with a few articles of clothing hanging from it, among them a hot-pink long-sleeved sweatshirt. Several people stand facing the tent with their backs to the camera. In the foreground, a small amount of the street is visible, including an orange traffic cone.
Barraca da Loira sells flip-flops, soft drinks, coffee, cake, and underwear. They also rent out clothes and serve as a locker. Photo: Flávia Biazeto. Black Diaspora Archive.

Ana Luiza Biazeto will spend the 2025–2026 academic year in Brazil to continue her dissertation research on resistance and resilience among Black women and their families in the Brazilian carceral system.

The contents of the Visiting Days archive can be reviewed via Texas Archival Resources Online (TARO). The Black Diaspora Archive is an initiative of Black Studies, LLILAS Benson Latin American Studies and Collections, and the Office of the President. The archive is housed at the Nettie Lee Benson Latin American Collection.

Read, Hot and Digitized: Country of Words | بلد من كلام

One of the more complex questions we encounter in area studies is how we define a nation. Is it lines on a political map? A shared territory? For Palestinians, traditional maps can often feel inadequate, showing borders and divisions but failing to capture the full, lived reality of a people. The remarkable born-digital project Country of Words | بلد من كلام : A Transnational Atlas for Palestinian Literature, by Refqa Abu-Remaileh (Freie Universität Berlin), rethinks the very idea of a map. Instead of depicting political boundaries, it offers a form of literary cartography.

Screenshot of literature under British occupation essay title
Screenshot of Literary Diasporas essay title

At its heart, Country of Words is an interactive, web-based atlas that visualizes the vast geography of Palestinian literature. When you visit the site, you are met with a world map dotted with points and featuring an accompanying timeline. Each dot represents a location—Gaza, Jerusalem, Beirut, but also Paris, Santiago, and Iowa City—that appears in a work of Palestinian fiction or poetry. Clicking on a dot reveals an excerpt from the literary work set in that place, presented in both its original Arabic and in English translation.

The accompanying timeline summarizes essential events in Palestinian history and allows you to read essays on how these historical moments influenced and were shaped by key works of Palestinian literature. Additionally, you can look at an overall network visualization; a variety of visualizations of biographies, historical events, publishing histories, and publishing networks; and audio interviews with key current and recent Palestinian literary figures.

The project is the work of Prof. Dr. Refqa Abu-Remaileh and her team at Freie Universität Berlin. It grew not from a desire to create a simple database, but rather from a potent intellectual argument. The project contends that for a people so often defined by exile and displacement, literature itself has created a “country”—a homeland of memory, imagination, and shared experience that transcends physical borders. This atlas makes that homeland visible.

As a librarian, I see this as a powerful tool for teaching and research. It allows students to literally see the global reach of the Palestinian experience. For scholars, it is a dynamic data visualization that can spark new questions about place, identity, and literary networks. It is a beautiful, poignant, and profoundly human entry point into a rich literary tradition. It invites you to wander through this country of words and discover the stories that connect a people, wherever they may be.

To dive deeper into the literary world mapped by the project, here are a few key works from the UT Libraries’ collections that speak to the themes of place, exile, and memory:

  • غسان كنفاني، الآثار الكاملة (Ghassan Kanafani, The Complete Works). The collected works of a foundational writer of modern Palestinian literature. Included is his novella Men in the Sun, about Palestinian men seeking to cross a border in a water tanker. It is a searing allegory of the search for life and dignity in the face of statelessness.
  • After the Last Sky: Palestinian Lives by Edward Said. A landmark book of essays and reflections paired with photographs by Jean Mohr. Said, a major figure in postcolonial studies, meditates on the nature of Palestinian identity in exile.
  • Enter Ghost by Isabella Hammad. Isabella Hammad’s second novel centers on Sonia, an actress who journeys back to Palestine and joins a production of Hamlet in the West Bank. Enter Ghost offers a vivid portrait of contemporary Palestine and explores themes of exile, belonging, and the deep bonds formed through family and collective struggle.