Navigating the Data Landscape: An Open Source Workflow

Recent years have witnessed explosive growth in the volume of research publications (Hanson et al., 2024). In order to maintain the basic tenets of scholarship, stakeholders such as funders and publishers are increasingly introducing policies to promote research best practices. For example, the 2022 Nelson Memo directed federal agencies that dispense at least $100m in research funding to revise policies around making the outputs of federally funded research available. Concurrent with the evolution of these policies, research institutions are innovating and developing the necessary infrastructure to support researchers, for which the libraries are an essential component.

These stakeholders and various subgroups within them have a range of interests in tracking the publishing of research outputs. In order to make data-driven decisions around what services we provide in the libraries and how we provide them, we need data about our research community. There is a long history of tracking publication of articles and books, and the infrastructure for doing so is relatively well-developed (e.g., Web of Science, Scopus, Google Scholar). In this regard, we are well-positioned to continue monitoring these outputs in line with the new stipulations for immediate public access in the Nelson Memo. However, the Nelson Memo also stipulated that the research data supporting publications need to be shared publicly. Compared to open access publishing, open sharing of data is less developed culturally and structurally, which makes it all the more important to develop a workflow to begin to gather data on this front.

Predictably, the infrastructure for tracking the sharing of data is not nearly as well-developed as that for articles or books. While some of this is likely due to the relative lack of emphasis on data publishing, there are a variety of reasons why tracking data isn’t quite as easy for motivated parties. Journals, in spite of wide-ranging aesthetic and syntax standards, have relatively uniform metadata standards. In large part, this is because of the homogeneity of their products, across disciplines, which are primarily peer-reviewed research articles that are typeset into PDFs. This allows proprietary solutions like Web of Science and Scopus to harvest vast amounts of metadata (through CrossRef) and to make it available in a readily usable format with relatively little work required to format, clean, or transform. In contrast, research data are published in a wide variety of formats, ranging from loosely structured text-based documents like letters or transcripts to objects with complex or structured formatting like geospatial data and genomic data. As a result, there can be significant differences between platforms that host and publish research data, ranging from general to discipline-specific metadata and file support, level of detail in author information, use of persistent identifiers like DOIs, and curation and quality assurance measures (or lack thereof).

Comparison of annual volume of dataset publications. ‘All’ refers to the volume across all discovered repositories and is compared to our institutional repository, the Texas Data Repository, and two common generalists, Dryad and Zenodo.

While a few proprietary solutions are beginning to emerge that purport to be able to track institutional research data outputs (e.g., Web of Science), these products have notable shortcomings, including significant cost, difficulty assessing thoroughness of retrieval, and limited number of retrievals. In order to create a more sustainable and transparent solution, the Research Data Services team has developed a Python-based workflow that uses a number of publicly accessible APIs for data repositories and DOI registries. The code for running this workflow has been publicly shared through the UT Libraries GitHub at https://github.com/utlibraries/research-data-discovery so that others can also utilize this open approach to gathering information about research data outputs from user-defined institutions; the code will continue to be maintained and expanded to improve coverage and accuracy. To date, the workflow has identified more than 3,000 dataset publications by UT Austin researchers across nearly 70 different platforms, ranging from generalist repositories that accept any form of data like Dryad, figshare, and Zenodo to highly specialized repositories like the Digital Rocks Portal (for visualizing porous microstructures), DesignSafe (for natural hazards), and PhysioNet (for physiological signal data).
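As a rough illustration of the kind of request such a workflow can send to a DOI registry, the sketch below builds a search URL for the public DataCite REST API and flattens a response into simple rows. It is a minimal sketch, not the code in the repository linked above: the query field `creators.affiliation.name`, the helper names, and the sample response are assumptions made for illustration and should be checked against the DataCite API documentation before use.

```python
from urllib.parse import urlencode

# Public DataCite REST API endpoint for DOI records.
DATACITE_DOIS_URL = "https://api.datacite.org/dois"


def build_affiliation_query(affiliation, page_size=100):
    """Return a DataCite search URL for datasets whose creators list `affiliation`.

    Hypothetical helper; the query field and parameters are assumptions.
    """
    params = {
        "query": f'creators.affiliation.name:"{affiliation}"',
        "resource-type-id": "dataset",  # restrict results to datasets
        "affiliation": "true",          # request affiliation details
        "page[size]": page_size,
    }
    return f"{DATACITE_DOIS_URL}?{urlencode(params)}"


def summarize_records(payload):
    """Flatten a JSON:API-style response into one small dict per DOI."""
    rows = []
    for record in payload.get("data", []):
        attrs = record.get("attributes", {})
        rows.append({
            "doi": attrs.get("doi"),
            "publisher": attrs.get("publisher"),
            "year": attrs.get("publicationYear"),
        })
    return rows


# Made-up response fragment using a test DOI prefix; not real data.
SAMPLE_RESPONSE = {
    "data": [
        {"attributes": {"doi": "10.5072/example-1",
                        "publisher": "Example Repository A",
                        "publicationYear": 2023}},
        {"attributes": {"doi": "10.5072/example-2",
                        "publisher": "Example Repository B",
                        "publicationYear": 2024}},
    ]
}

if __name__ == "__main__":
    print(build_affiliation_query("University of Texas at Austin"))
    for row in summarize_records(SAMPLE_RESPONSE):
        print(row)
```

In practice the URL would be fetched with an HTTP client and the results paged through; that step is omitted here so the sketch runs without network access.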

Horizontal bar chart comparing the total number of UT-Austin-affiliated datasets published in different repositories. Only repositories with at least 30 datasets are individually listed; the remainder are grouped into an 'Other' category. The Texas Data Repository has the most discovered datasets (nearly 1,250), followed by Dryad, Zenodo, Harvard Dataverse, the aggregated 'other', ICPSR, figshare, DesignSafe, Mendeley Data, the Digital Rocks Portal, and EMSL. No repository other than the Texas Data Repository has more than 400 datasets.
Comparison of total number of dataset publications between repositories. Only repositories with more than 30 UT-affiliated publications are depicted individually; all others are grouped into ‘Other.’

This work is still very much in progress. Perhaps as important as the data that we were able to obtain are the data that we suspect exist but were unable to retrieve via our workflow (e.g., we didn’t retrieve any UT-affiliated datasets from the Qualitative Data Repository, even though we are an institutional member). Equally informative is the variation in metadata schemas, cross-walks, and quality, which can help to inform our strategies around providing guidance on the importance of high-quality metadata. For example, this process relies on proper affiliation metadata being recorded and cross-walked to DataCite. Some repositories simply don’t record or cross-walk any affiliation metadata, making it essentially impossible to identify which, if any, of their deposits are UT-affiliated. Others record the affiliation in a field that isn’t the actual affiliation field (e.g., in the same field as the author name); some even record the affiliation as an author. All of this is on top of the complexity introduced by the multiple ways in which researchers record their university affiliation (UT Austin, University of Texas at Austin, the University of Texas at Austin, etc.).

Horizontal bar chart comparing the frequency of different name permutations of UT Austin that were entered in UT Austin datasets. A total of eight different permutations were detected, ranging from 'University of Texas at Austin' to 'UT Austin.' The most common is to use 'at Austin' rather than some form of punctuation like a comma or hyphen instead of 'at.'
Comparison of the frequency of different permutations of ‘UT Austin’ that were entered as affiliation metadata in discovered datasets.
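One way to cope with these permutations is to reduce each affiliation string to a normalized key before matching. The sketch below is a minimal illustration under stated assumptions, not the matching logic used in our repository: the function names and the set of accepted keys are hypothetical, and a production version would need a longer list of variants.

```python
import re
from collections import Counter

# Normalized forms treated as UT Austin (illustrative, not exhaustive).
UT_AUSTIN_KEYS = {"university of texas austin", "ut austin"}


def normalize(raw):
    """Lowercase, drop punctuation variants, and remove 'the' and 'at'."""
    text = raw.casefold()
    text = re.sub(r"[,\-\u2013\u2014()]", " ", text)  # commas, hyphens, dashes, parentheses
    text = re.sub(r"\b(the|at)\b", " ", text)         # filler words that vary between entries
    return " ".join(text.split())                     # collapse repeated whitespace


def is_ut_austin(raw):
    """True if the normalized string contains a UT Austin key as whole words."""
    padded = f" {normalize(raw)} "
    return any(f" {key} " in padded for key in UT_AUSTIN_KEYS)


def count_variants(affiliations):
    """Count how often each raw spelling of a matching affiliation occurs."""
    return Counter(raw for raw in affiliations if is_ut_austin(raw))


if __name__ == "__main__":
    entered = [
        "University of Texas at Austin",
        "The University of Texas at Austin",
        "University of Texas, Austin",
        "UT-Austin",
        "University of Texas at Arlington",
    ]
    for spelling, n in count_variants(entered).items():
        print(n, spelling)
```

Matching on whole words rather than raw substrings keeps other campuses (for example, Arlington) and unrelated institutions with "Austin" in their names from being counted.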

We also have to account for variation in the granularity of objects, particularly those that receive a PID. For example, in our Texas Data Repository (TDR), which is built on Dataverse software, both a dataset and each of its constituent files receive a unique DOI – each file is also recorded as a ‘dataset’ because the metadata schema used by the DOI minter, DataCite, doesn’t currently support a ‘file’ resource type. We thus have to account for a raw data output that will initially inflate the number of datasets in TDR by at least two orders of magnitude. The inverse of this is Zenodo, which assigns a parent DOI that always resolves to the most recent version, with each version of an object getting its own DOI (so all Zenodo deposits have at least two DOIs, even if they are never updated).
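The two adjustments described above can be sketched as a single de-duplication pass. This is a minimal sketch under stated assumptions rather than our production code: it assumes that a Dataverse file DOI is the dataset DOI plus one extra path segment, that each Zenodo version record carries an `IsVersionOf` related identifier pointing to its parent DOI, and that records are plain dictionaries with `doi` and `relatedIdentifiers` keys. The function names and the sample DOIs (which use test prefixes) are hypothetical.

```python
def is_dataverse_file_doi(doi, dataset_segments=2):
    """Assume file DOIs have more suffix segments than their parent dataset DOI."""
    suffix = doi.split("/", 1)[1]
    return len(suffix.split("/")) > dataset_segments


def parent_doi(record):
    """Return the DOI this record is a version of, if one is declared."""
    for rel in record.get("relatedIdentifiers", []):
        if (rel.get("relationType") == "IsVersionOf"
                and rel.get("relatedIdentifierType") == "DOI"):
            return rel["relatedIdentifier"].lower()
    return None


def collapse(records, dataverse_prefixes=()):
    """Keep one record per dataset: skip file-level DOIs, merge versions."""
    kept = {}
    for rec in records:
        doi = rec["doi"].lower()
        if doi.startswith(tuple(dataverse_prefixes)) and is_dataverse_file_doi(doi):
            continue  # file-level DOI, already represented by its dataset
        key = parent_doi(rec) or doi
        kept.setdefault(key, rec)  # first record seen represents the group
    return kept


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        {"doi": "10.5072/T8/AAAAAA"},
        {"doi": "10.5072/T8/AAAAAA/FILE01"},
        {"doi": "10.5072/T8/AAAAAA/FILE02"},
        {"doi": "10.5555/zenodo.100"},
        {"doi": "10.5555/zenodo.101",
         "relatedIdentifiers": [{"relationType": "IsVersionOf",
                                 "relatedIdentifierType": "DOI",
                                 "relatedIdentifier": "10.5555/zenodo.100"}]},
    ]
    print(sorted(collapse(records, dataverse_prefixes=("10.5072/",))))
```

Keying every version on its parent DOI means a deposit is counted once regardless of how many versions exist; which record is kept for each group is arbitrary here and would need a deliberate rule (for example, the latest version) in real use.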

The custom open source solution that we have developed using Python, one of the most common programming languages (per GitHub), offers the flexibility to overcome the challenges posed by differences between data repositories and variations in the metadata provided by researchers. Our approach also avoids the shortcomings of proprietary solutions: it is transparent, so users can understand exactly how dataset information is retrieved, and it is available at no cost to anyone who might want to use it. In many ways, this workflow embodies the best practices that we encourage researchers to adopt – open, freely available, transparent processes. It also allows others (at UT or beyond) to adopt our workflow and, if necessary, to adapt it for their own purposes.

Spanish Paleography + Digital Humanities Institute Focuses Research on Colonial Texts

Scholars and graduate students from institutions across the country gathered at the Benson Latin American Collection for the Spanish Paleography + Digital Humanities Institute. The immersive three-day program provided intensive training in reading and transcribing Spanish manuscripts from the 16th to 18th centuries while introducing participants to digital humanities tools that enhance historical research.

Funded by LLILAS’s U.S. Department of Education Title VI Program and the Excellence Fund for Technology and Development in Latin America, the institute sought to equip researchers with specialized skills to navigate colonial texts, visualize historical data, and foster a collaborative academic community. The event was spearheaded by LLILAS Benson Digital Scholarship Coordinator Albert A. Palacios and brought together a cohort of graduate students and faculty members specializing in history, literature, linguistics, and related disciplines.

The institute focused on three key objectives: providing paleography training, introducing participants to digital humanities tools, and fostering a collaborative research network. Participants engaged in hands-on workshops to develop their ability to accurately read and transcribe colonial manuscripts. They also received instruction on open-source technologies for text extraction, geospatial analysis, and network visualization. The program fostered a community of scholars who will continue sharing insights and resources beyond the institute.

Participants had the opportunity to work with historical materials, including royal documents, inquisition records, religious texts, and economic transactions. Case studies were examined through paleography working groups, where scholars collaboratively deciphered difficult handwriting styles and abbreviations.

To apply their newly acquired digital humanities skills, each participant developed a pilot research project using Spanish colonial manuscripts. These projects utilized handwritten text recognition (HTR) technology, geographical text analysis, and data visualization tools to enhance historical inquiry. The final day of the institute featured a lightning round of presentations, allowing scholars to showcase their preliminary findings and discuss future applications.

This year’s participants hailed from universities across the U.S., including the University of Chicago, the University of North Texas, Columbia University, the University of Texas at El Paso, the University of California-Santa Barbara, Purdue University, City College of New York, West Liberty University, Oklahoma State University, and the University of California-Merced. The interdisciplinary nature of the group enriched discussions, providing diverse perspectives on archival research and manuscript interpretation.

A highlight of the institute was the introduction and use of the handwritten text recognition (HTR) model that the LLILAS Benson Digital Scholarship Office recently launched, which was trained on 17th- and 18th-century Spanish handwriting preserved at the Benson. This innovation is expected to significantly accelerate the study of colonial-era documents and democratize access to these historical resources.

Additionally, the program provided a comprehensive list of recommended paleography resources, including books, digital collections, and online tools to support continued scholarship in Spanish manuscript studies.

Palacios is leading an online Spanish version of the institute for participants worldwide this spring and fall. He will be leading another onsite institute June 4-6, 2025.

The demand for the LLILAS Benson Spanish Paleography + Digital Humanities Institute in the Colonial Latin Americanist field underscores the growing interest in merging traditional archival research with computational methodologies. By equipping scholars with both paleographic expertise and digital tools, the institute is paving the way for innovative research on the Spanish Empire and its historical records.

Transforming Text: A Year of the Scan Tech Studio

The Scan Tech Studio (STS), located in the new PCL Scholars Lab, is a self-service facility designed to empower scholars and researchers in digitization, image processing, and text analysis projects. Equipped with advanced scanning equipment and software, the STS allows the UT community to independently digitize materials, apply optical character recognition (OCR) and handwritten text recognition (HTR), and engage in digital text analysis. From helping patrons scan historical documents to applying machine-readable techniques to modern texts, the STS has had an exciting first year guiding users in elevating their research.

The team behind this effort is the Scan Tech Studio Working Group, composed of seven librarians and digitization experts dedicated to helping scholars maximize the studio’s resources. We’re also grateful for the support of UT Libraries IT and the Scholars Lab Graduate Research Assistants, who keep everything running smoothly behind the scenes. The working group develops workshops, creates research guides, and promotes the use of digital scholarship tools related to OCR, HTR, and text analysis. Additionally, we offer guidance on copyright considerations and assist users in navigating the complexities of text recognition and analysis. Over the past year, the STS Working Group has been instrumental in fostering a dynamic learning environment within the Scholars Lab and building campus-wide connections to unlock the studio’s potential.

The working group has been dedicated to developing services that meet the evolving needs of the campus community. So far, our primary focus has been providing consultations and instruction related to digitization, OCR/HTR, and text analysis. With the diverse expertise of our team, we’ve been able to offer tailored, one-on-one consultations and small group sessions that help users think through the various stages of their digital projects, from planning to execution. Scheduling time with STS experts is simple through our user-friendly request form, ensuring patrons have easy access to specialized support.

Overall, we received 18 reservation requests, which meant that users had a consultation with one of the STS Working Group members, needed the space for digitization, and/or used our digital tool to OCR their materials. Many of these requests came from graduate students, specifically from the Department of History and the School of Information.

In addition to consultations, we’ve developed instructional tools such as a comprehensive research guide on research data management and the use of the studio’s equipment and software. The STS has also become a valuable teaching space, regularly hosting classes that integrate the studio’s technology into their curriculum, allowing students hands-on experience with advanced digitization tools and methods.

Reflecting on the past year, the STS has hosted several workshops inside and outside the studio to showcase its tools and demonstrate the possibilities to the campus community. For example, STS team members led workshops at this past summer’s Digital Scholarship Pedagogy Institute, focusing on digitization, OCR, and text analysis. Additionally, we contributed to the Digital Humanities Workshop Series, providing training in these specialized areas. 

It’s also worth noting that the working group dedicates time to internal development by hosting workshops for ourselves, allowing us to learn from one another and build up our collective skillset. As the saying goes, the best way to learn is to teach—and we’ve embraced this approach to better serve our users!

Because the Scan Tech Studio is a new service, we wanted to partner with existing programs and reach out to various centers. We invited different centers around campus, such as JapanLab and the Center for Middle Eastern Studies, and provided them with an overview of our services. This gave us great insight into the needs around campus regarding digitization and OCR.

Additionally, we provided training in using specialized OCR tools such as ABBYY FineReader, a paid program that is available exclusively at the STS. It works exceptionally well for accurately OCRing text and training. We logged about 36 uses in just our first year in the space.

As we continue to see the success of our space, we are planning to expand our services and tools. We aim to create additional resources covering various OCR tools and processes. We also plan to continue to collaborate with the Digital Humanities Workshop series to present different OCR and text analysis tools. Additionally, we intend to develop workshops tailored to researchers, including pre-research and post-research workshops. These workshops will help researchers understand what they need to do when conducting their research to ensure a successful OCR experience and facilitate the beginning of text analysis upon their return. We look forward to seeing how the groundwork we laid during the first year will impact our service in the upcoming year.

As you can see, we have a lot of promising plans to build off the Scan Tech Studio’s successful first year. We look forward to continuing to grow the space as a new hub for digitization and text analysis on campus. Scan you feel the excitement? 

Illuminating Explorations: Music and Childhood Culture

Hannah Neuhauser, 2025 PhD in Musicology, Butler School of Music


“Illuminating Explorations” – This series of digital exhibits is designed to promote and celebrate UT Libraries collections in small-scale form. The exhibits will highlight unique materials to elevate awareness of a broad range of content. “Illuminating Explorations” will be created and released over time, with the intent of encouraging use of featured and related items, both digital and analog, in support of new inquiries, discoveries, enjoyment and further exploration.

Music is a portal and can unlock a door to a fantastical sonic landscape, brimming with mystic, melodic magic. We turn a page and open ourselves to discovering an entirely different realm, full of magic and mystery. Dangers may lurk around each corner, giants may want to gobble us up for lunch, and at times, the path may be so utterly twisted that we almost lose ourselves. Suddenly, the darkness becomes light, and in silence, we find ourselves back in the safety of our childhood bedrooms. The lion’s roar – a radiator. The pitter pattering of tiny Wild Things – the rain outside. Yet within a small, singular space, we traveled to another world and returned on the other side changed. In Throw the Book Away (2013), Amie Doughty remarks that regardless of how much a child reads, it is the experience of self-reliance and youthful agency that will ensure a protagonist’s survival in an unknown labyrinth. They must hear the warnings, read the signs, and act on their own.

Cover image of the score book Songs and a Sea Interlude by Oliver Knussen and Maurice Sendak, from the opera Where the Wild Things Are.

This exhibit recreates these aural portals. However, instead of reading a book, we invite you to immerse yourself in the experience of children’s music. The Music and Childhood Culture Spotlight Exhibit seeks to inform scholars of the rich history of children’s music by highlighting hidden gems from the UT Austin library collections. Did you know A.A. Milne commissioned his own songbook for Winnie the Pooh in 1929? Or that Carole King wrote a children’s television special called Really Rosie in 1975? It was a huge hit and we have the score, which you can check out to sing to your younger friends! Selections also range from audio recordings like Danny Kaye’s narration of Tubby the Tuba (1947) to Oliver Knussen’s operatic score of Where the Wild Things Are (1982) and a wealth of interdisciplinary scholarship from Mozart’s influence on childhood labor (Mueller) to the rise of Young People’s Records (Bonner). 

Cover image of the score book Really Rosie by Carole King

Music is a psychological tool for studying emotional regulation: “without rules or limitations, it is pure assimilation,” and media can stimulate fantasy for children to pretend “as if” they are something else (Gotz et al., 2005, p. 13). Numerous scholars discuss the sentimentality and destruction of child development due to media dependency, but children will always form their own ideas about media to understand, transgress, rebel, and connect with their surroundings (Parry, 2013). Here, in this exhibit, we seek to highlight the positive attributes of musical media that allow children (and our inner child) to enact their own creative cultures through their imaginations and identify the “traces” of media that we value.

Cover image of the book Mozart and the Mediation of Childhood by Adeline Mueller

I hope you enjoy these discoveries as much as I did. 


Works Cited 

  • Doughty, Amie A. Throw the Book Away: Reading versus Experience in Children’s Fantasy. Jefferson, North Carolina: McFarland & Company, Inc., Publishers, 2013. 
  • Gotz, Maya, Dafna Lemish, Hyesung Moon, and Amy Aidman. Media and the Make-Believe Worlds of Children. Routledge, 2005. 
  • Parry, Becky. Children, Film, and Literacy. London: Palgrave Macmillan, 2013.

Haricombe Establishes Distinguished Lecture Fund

The University of Texas Libraries is happy to announce the establishment of the Lorraine J. Haricombe Distinguished Lecture Fund, adding to the transformative legacy of Vice Provost and Director Lorraine J. Haricombe, who will retire later this year. This parting generosity is emblematic of a leader who has exemplified the values of curiosity, diversity, and community engagement throughout her extraordinary tenure.

The Distinguished Lecture Fund will support an annual speaker series, bringing renowned experts to campus to inspire critical thinking, enrich dialogue, and showcase the breadth of UT Libraries’ resources. Funds distributed from the endowment will provide programming support, including speaker stipends, travel expenses, event logistics, and other related costs.

This initiative reflects Lorraine Haricombe’s vision for fostering intellectual exploration and building connections across disciplines. Her commitment to innovation and education will continue through this fund to uplift students, faculty, and the broader Austin community for years to come.

A contribution to the Lorraine J. Haricombe Distinguished Lecture Fund ensures that her legacy of inspiration lives on, creating opportunities for impactful conversations and the exchange of ideas well into the future.


If you are interested in supporting this fund, visit the UT Libraries Giving Page to make your contribution today.

Ernesto Cardenal’s Centennial

The papers of Nicaragua’s beloved poet-priest-politician reside at UT’s Benson Latin American Collection; January 20, 2025, is the centennial of his birth


Admired and controversial, Ernesto Cardenal was a towering figure in Central American culture and politics. As Nicaragua’s minister of culture under the Sandinista government, which took power in 1979, he oversaw a national program that taught poetry to Nicaraguans of all ages and all walks of life. 

A black-and-white photo shows many children in the background, facing the camera under a grassy roof open-air structure. In the foreground, several men are seated on a wooden floor. A microphone is being held above them to the left. The man on the left at the front of the photo is Ernesto Cardenal, wearing a black beret, wire-rimmed glasses and a simple collarless white shirt. He has shoulder-length white hair, and a full white beard and mustache. Another man to the right of him is speaking to an interviewer whose face is not visible.
Ernesto Cardenal (left of center) as Minister of Culture in Nicaragua. Undated photo, Benson Latin American Collection.

His relationship with the Sandinista government would eventually sour. As a result, the safety of his literary archive was in peril, leading to its eventual acquisition by the Benson in 2016.

In honor of Cardenal’s centennial, we link to previously published writings by UT Austin faculty and staff that examine various aspects of his life.


Ernesto Cardenal Papers


Ernesto Cardenal stands in profile in a black-and-white photo, on the left, at the stern of a small boat. He wears a simple white shirt and light-colored pants, a band around his forehead, glasses. He is holding a white net in his hand. He has shoulder-length white hair, beard, and mustache. The boat has the words San Juan de la Cruz painted on it, with the word Cruz symbolized by a cross.
Ernesto Cardenal, photo by Sandra Eleta

“The archive features rare editions of Cardenal’s writings, translations of his poetry, interviews, photographs, videos, newspaper clippings, documentaries about his life and work, and hundreds of letters to and from key protagonists of Nicaraguan culture and politics.”

Read more: Papers of Nicaraguan Luminary Find a Home at the Benson Latin American Collection

Ernesto Cardenal Papers on Texas Archival Resources Online


Cardenal at LLILAS Benson


Color photo of Ernesto Cardenal at age 91, reading his poetry at the Benson. He is wearing his black beret, a dark jacket, wire-rimmed glasses, and his hair is white, covering his ears. He holds a piece of paper in one hand and gestures with the other.
Cardenal reads his poetry to a packed house at the Benson. Photo: Travis Willmann.

The opening of the Ernesto Cardenal Papers is celebrated at a roundtable and bilingual poetry reading at the Benson. At the event, Cardenal reads his own poetry, which is passionately interpreted into English by poet Celeste Mendoza.

Watch video (poetry reading starts at one-hour mark): “Ernesto Cardenal in Word and Action” Reading and Roundtable

Cardenal in Hard Times


LP cover for Cardenal's libro-disco recording of Oración por Marilyn Monroe and other poems. The cover features Andy Warhol's alterations of Monroe's photo (or a copy thereof), in color, four in a square.
Warhol-inspired libro-disco cover. Caracas, 1972. Benson Latin American Collection.

“[T]he voice of Ernesto Cardenal broke with our routine of studying a limited range of literary texts, mostly focused on intimate, politically inoffensive themes,” writes Professor Luis Cárcamo-Huechante. “In the midst of times of censorship and coercion, it was Cardenal’s verses that awoke me to an unexpectedly revelatory linkage between poetry and social issues, literary writing and collective history.”

Read Cárcamo-Huechante’s essay in English or Spanish

Interview in Managua and Digital Exhibition


Ernesto Cardenal, his arms aloft and outstretched, is saying mass in this black-and-white photo. He wears a poncho. On the table in front of him is a metal wine cup. He has shoulder-length whitish hair, beard, and mustache, and dark-rimmed glasses. Behind him hangs a white sheet illustrated with drawings.
Saying mass. Ernesto Cardenal Papers, Benson Latin American Collection

In spring 2016, José Montelongo, former Benson librarian, visited Cardenal in Managua. The occasion was the Benson’s recent acquisition of Father Cardenal’s personal papers. In these excerpts from their conversation, Cardenal talks about poetry, science, and religion, about the famous poetry workshops he helped create, about the successes and failures of the Nicaraguan Revolution, and more.

Watch the video (in Spanish with English subtitles)

The digital exhibition “Remembering Ernesto Cardenal: Selections from His Archive,” organized by Latin American Archivist Dylan Joy, traces key moments in the life of the poet, priest, revolutionary, liberation theologian, sculptor, and activist.

Visit the digital exhibition

Hasta siempre . . .


Black-and-white close-up of an older Ernesto Cardenal, who is looking directly into the camera. His black beret is visible. He wears wire-rimmed glasses. His white hair is in bright relief with a black background.
Ernesto Cardenal, undated photo. Benson Latin American Collection.

“Ernesto Cardenal was a fighter: for justice, against dictatorship, for equality, for his faith, and for the power of art and beauty to shine light in a dark world. He was tireless in this lifelong struggle, striving until his final days for a better Nicaragua and true justice for all people. LLILAS Benson is proud to help to carry on his legacy.”

Virginia Garrard, Professor Emerita of History; former director, LLILAS Benson

Read the Obituary: “Ernesto Cardenal Is Dead at 95: The Nicaraguan Poet, Priest, and Revolutionary Chose the Benson Collection for His Archive”

Support Knowledge, Inspire Futures: Year-End Giving to UT Libraries

As the calendar readies its turn toward a new year, it’s the perfect moment to reflect on the causes that inspire us—those that ignite curiosity, foster innovation, and unite communities. The University of Texas Libraries stand as a pillar of these values, shaping lives and driving academic excellence.

Your year-end gift to the Libraries is more than a donation; it’s a profound investment in education, discovery, and the transformative power of knowledge. Contributions help sustain vital resources, from state-of-the-art technology to groundbreaking collections, ensuring students and scholars can achieve their potential.

Why Give?
Giving is more than generosity; it’s about creating meaningful impact. A tax-deductible year-end gift to UT Libraries aligns your philanthropic vision with your financial goals while making a lasting difference.

This year, there’s a unique opportunity to honor a leader who has exemplified these values. As UT Libraries prepares for the retirement of Director Lorraine J. Haricombe in 2025, we celebrate her extraordinary legacy with the creation of the Lorraine J. Haricombe Distinguished Lecture Fund.

A Legacy of Inspiration
The Distinguished Lecture Fund will support an annual speaker series, bringing renowned experts to campus to inspire critical thinking, enrich dialogue, and showcase the breadth of UT Libraries’ resources. This enduring initiative reflects Lorraine Haricombe’s commitment to curiosity, diversity, and community engagement.

Your contribution to this fund ensures that her vision continues to uplift students, faculty, and the broader Austin community for years to come.

Be Part of Something Bigger
Join us in supporting a legacy of learning and discovery. Whether you’re reflecting on the libraries’ impact on your life or investing in a brighter future for others, your year-end gift can help UT Libraries thrive.

Visit the UT Libraries Giving Page to make your contribution today. Together, we can honor a visionary leader and champion a future where knowledge knows no bounds.

Read, Hot and Digitized: The Library of Lost Books

Read, hot & digitized: Librarians and the digital scholarship they love – In this series, librarians from UTL’s Arts, Humanities and Global Studies Engagement Team briefly present, explore and critique existing examples of digital scholarship. Our hope is that these monthly reviews will inspire critical reflection on and future creative contributions to the growing fields of digital scholarship.


In 2019 I wrote about Footprints, a project that aims to track the circulation of printed ‘Jewish books’ around the world as it is evidenced through provenance research.[1] The Library of Lost Books is an international “citizen science” (aka crowdsourcing) project that, like Footprints, uses provenance data to trace books looted by the Nazis from the library of the Higher Institute for Jewish Studies in Berlin (Hochschule für die Wissenschaft des Judentums) and to track their journey around the world. Operating from 1872 until it was closed down by the Nazis in 1942, the Higher Institute was dedicated to the study of Jewish history and culture, as well as rabbinical studies in Liberal Judaism.[2] The original library is reported to have held around 60,000 volumes, but only around 5,000 of them have been rediscovered since the war. The Leo Baeck Institute for the Study of German-Jewish History and Culture in Jerusalem, with support from the German Foundation for Remembrance, Responsibility and Future (EVZ) and the German Ministry of Finance (BMF), has created this platform not only to trace the lost books, but also to commemorate and educate about the Higher Institute for Jewish Studies in Berlin, its scholars, and its students.

The Library of Lost Books website features a database, an interactive map, and an online exhibition detailing the history and legacy of the Higher Institute. A great place to begin is with “The story of the missing books” and its three chapters (with optional narration). Chapter one gives the background story of the Higher Institute, its founders, and its academic landscape. The Higher Institute represented a modern current within Judaism, namely Liberal Judaism, that emphasizes universal values: personal freedom, individuality, and social responsibility. Chapter two describes the last years of the Higher Institute before it was shut down. It shows a chronology of systematic discrimination driven by Nazi ideology, explains why and how the books were moved out of the library, and presents detailed biographies of students and employees, some of whom kept working at the library as forced laborers until their deportation to concentration camps, while others managed to escape. This chapter also details the efforts made before 1942 to rescue some of the books. Chapter three details the fate of the books after the war; it explains how they were transferred to what is now the Czech Republic or were scattered among various libraries in Berlin. Some books, however, were relocated to the United States or Russia.

The project owners are using “citizen science” (aka crowdsourcing) in their quest to identify the missing books. Users are encouraged to look at books in their local libraries or second-hand bookstores, cross-reference their findings with the virtual library, and share them through the platform using a “lost and found” form. To support this provenance research, the project offers ‘hunting supplies’, including a detailed checklist of the stamps, accession numbers, call numbers, and paper labels that might be found in or on the books. The goal is to virtually reunite the found books through the Library of Lost Books; the project owners specifically state that “physical copies will remain in their places where they were discovered, as that is also a part of their story.”[3]

The platform is also a teaching tool for educators. It includes learning units for students about pre-war Jewish life in Berlin, Nazi looting practices, provenance research, the importance of cultural heritage, and the role of libraries in the pre-internet era. Besides English, all content on the platform has versions in German and Czech, as it is suspected that most of the books ended up in either Germany or the Czech Republic (although some have already been located in the United States, the United Kingdom, and Israel).

From the database view, one can browse the few thousand books that have been identified so far. Browsing options are by book title, author, or publication year, by owners (past and present), and by the individuals or institutions that found the books. Clicking on a book entry brings up full tracing information: a gallery of associated images, supporting provenance evidence, and a timeline of related events showing how the book traveled through time and passed between owners. See, for example, the eventful life of the title Zekhor le-Avraham : sheʼelot u-teshuvot, traced through the interactive map. This book was published in 1837 in Istanbul, acquired by the Higher Institute (year unknown), looted in 1942, salvaged after the war, resurfaced in Jerusalem (year unknown), acquired by a private donor for UCLA in 1963, and is currently held by UCLA. Similarly, thousands of books that were looted from the Higher Institute resurfaced years after the war and are now reunited in the virtual Library of Lost Books.


Related resources:

Glickman, Mark, and Rachel Gould. Stolen Words : The Nazi Plunder of Jewish Books / Mark Glickman ; Designed by Rachel Gould. 1st ed. Philadelphia, Pennsylvania: The Jewish Publication Society, 2016. Digital. https://search.lib.utexas.edu/permalink/01UTAU_INST/9e1640/alma991058395754406011

Rydell, Anders. The Book Thieves : The Nazi Looting of Europe’s Libraries and the Race to Return a Literary Inheritance / Anders Rydell ; Translated by Henning Koch. New York, New York: Viking, 2017. Print. https://search.lib.utexas.edu/permalink/01UTAU_INST/9e1640/alma991046059239706011

Peiss, Kathy Lee. Information Hunters : When Librarians, Soldiers, and Spies Banded Together in World War II Europe / Kathy Peiss. New York, New York: Oxford University Press, 2020. Digital. https://search.lib.utexas.edu/permalink/01UTAU_INST/9e1640/alma991058667683006011


[1] https://texlibris.lib.utexas.edu/2019/04/read-hot-and-digitized-footprints-the-chronotope-of-the-jewish-book/

[2] https://wienerholocaustlibrary.org/exhibition/the-library-of-lost-books/

[3] Citizen Science – Leo Baeck Institute

Building a Bot: An Exploration of AI to Assist Librarians

Recognizing the looming impact of artificial intelligence on higher education and libraries, the University of Texas Libraries has been experimenting with an AI-driven chatbot that could eventually supplement library staff by providing assistance at times when staff are not available.

The project, which serves as a research initiative rather than a production service, explores the potential of artificial intelligence to enhance user experience.

This exploration has been spearheaded by Hannah Moutran, a recent graduate and Library Specialist, in coordination with Aaron Choate, the Director of Research & Strategy at the University of Texas Libraries.

Choate has been instrumental in organizing an AI interest group aimed at educating library, archives, and museum staff about the potential and applications of AI. Moutran, whose academic research has focused on AI implementations, was a perfect fit for leading the chatbot project, which emerged from discussions among the principals in the interest group.

The chatbot, envisioned as a backup for the “Ask a Librarian” service, is being tested to understand its capacity to provide uninterrupted assistance when human staff are unavailable. To build this system, Moutran analyzed chat logs from the fall 2022 semester, gaining insights into the types of questions users asked and the responses provided by librarians. This analysis revealed that users were often referred to other librarians, departments, or websites for more detailed information.

The chatbot’s development has been guided by a series of in-depth interviews with five librarians from the “Ask a Librarian” service. These interviews uncovered several design priorities for the AI system, including transparency about its nature and data usage, the accuracy of information provided, and alignment with the library’s mission of fostering human connections. Ethical considerations were front and center in the design, ensuring that users know when they are interacting with an AI rather than a human.

To address these concerns, the development team chose the Voiceflow platform and the Claude language model. This combination allows the chatbot to offer controlled responses by providing users with links to library resources rather than attempting to answer questions directly. The system also incorporates disclaimers, user memory, and predefined rules to ensure that the chatbot aligns with the library’s values and operates within ethical boundaries. Linking to resources rather than answering directly is a deliberate design choice meant to avoid the common pitfalls of AI-generated misinformation.

In addition to providing general library support, the team has integrated an AI flow specifically designed to assist users with research. This tool can help users brainstorm topic ideas, generate initial search links, and even provide citation and writing assistance. The AI can also connect users with specialized librarians based on the nature of their inquiry, giving a brief explanation of each librarian’s area of expertise to help guide users toward the most appropriate contact.

While the pilot project has shown promise, the developers are clear about the experimental nature of this tool. It offers new ways to assist users, but there is no immediate push to implement it in its current form. The focus is on exploration: understanding how AI might improve library services in the future, rather than rushing to deploy it as a finished product.

“This project isn’t about replacing human librarians,” explained Choate. “It’s about providing a tool that can fill gaps when staff aren’t available, allowing librarians to focus on more complex and human-centric tasks.”

One of the most significant aspects of this project is the freedom granted by the Libraries to experiment without the immediate pressure of launching a production service. This flexibility has allowed Moutran and Choate to focus on the chatbot’s ethical design and explore the long-term role AI might play in library services.

Presented by Moutran as a Capstone project for the School of Information, the chatbot has received valuable feedback from testers, highlighting the importance of trust and verification of AI-generated information. Her work on the chatbot has also yielded insights into conversational design, user experience research, and prompt engineering.

As the chatbot continues to develop, the project remains a research initiative with the potential to reshape how libraries use AI. With a focus on transparency, ethics, and user support, the chatbot may one day serve as a supplementary tool for both users and librarians, increasing access to library services and freeing staff to focus on more complex inquiries.

Access the prototype: UT Libraries Assistant Chatbot Demo

Benson Acquisition: Augusto Roa Bastos Papers

The Nettie Lee Benson Latin American Collection is thrilled to announce the acquisition of the literary archives of César Vallejo and Augusto Roa Bastos, two giants of Latin American letters. These archives augment the Benson’s already significant collection of materials that represent the region’s writers, thinkers, and intellectual leaders, making the library, and the UT campus, an invaluable resource for students, faculty, and researchers from all corners of the globe.

By MELISSA GUY and DANIEL ARBINO

Paraguay’s most significant writer, Augusto Roa Bastos (1917–2005) is known for his contributions to the Latin American Boom and the post-dictatorship novel, particularly through his works Hijo de hombre (1960) and Yo el Supremo (1974). The latter is a work of historical fiction about the nineteenth-century dictatorship of José Gaspar Rodríguez de Francia.

Cover of a commemorative edition of “Yo el Supremo,” published on the centennial of the author by the Real Academia Española.

Roa Bastos grew up in Iturbe, a provincial town where his father worked as an administrator on a sugar plantation. It was there that he was exposed to Guaraní, and developed a tremendous love for Paraguay’s most spoken Indigenous language. He later went to Asunción for his formative school years and, as a young man, served in the Chaco War as a medical auxiliary. Significant portions of his life were spent outside of Asunción, allowing the writer to have a deeper knowledge of the country at large.

Like the characters in Yo el Supremo, Roa Bastos was no stranger to the effects of dictatorship during his lifetime. In fact, he fled to Argentina in 1947 along with 500,000 other Paraguayans to escape the iron fist of President Higinio Morínigo. Roa Bastos would live over four decades in exile between Buenos Aires and Paris before returning to his homeland in 1989 after the fall of the Alfredo Stroessner dictatorship. His commitment to Paraguayan culture never wavered, however: his literary career, which produced short stories, novels, poetry, essays, screenplays, and children’s literature, remained devoted to the South American nation through its themes of collective memory, bilingualism (Guaraní/Spanish), and Indigeneity. His style, which drew on magical realist and neobaroque tendencies, blended different time periods (pre-colonial and contemporary) to interrogate Paraguayan society.

Handwritten index related to “Yo el Supremo.” Augusto Roa Bastos Papers, Benson Latin American Collection.

“In acquiring the literary archives of the great Paraguayan writer Augusto Roa Bastos, the Benson Latin American Collection is today blessed with the author’s handwritten notes relating to Yo el Supremo, one of the region’s most exorbitantly ambitious, baroquely virtuosic, groundbreaking novels of the late twentieth century,” writes Professor César A. Salgado of the Department of Spanish and Portuguese. “The novel was published in 1974 as part of an agreement among top Latin American Boom writers to produce ‘dictator novels’ that dissected authoritarian regimes in their respective countries. In a group that included Alejo Carpentier’s El recurso del método (1974), Gabriel García Márquez’s El otoño del patriarca (1975), and Carlos Fuentes’ Terra Nostra (1975), Yo el Supremo outdid these other immensely accomplished works by structuring its penetrating, thoroughly researched psychological portrait of José Gaspar Rodríguez de Francia (Paraguay’s undisputed ‘enlightened despot’ from 1811 to 1840) as it were a lively philosophical debate about how dictation, writing, literacy, orality (including Guaraní traditions), absolute power, and impermanence could be both complicit and antithetical to each other. If the ludological radicality of Rayuela galvanized the Boom in 1962, Yo el Supremo brought it to a close in 1974 by showing how deep-seated mechanisms of supremacist rule, set up at the start of nation formation in Latin America, could easily resurface across its history. With Yo el Supremo, Roa Bastos thus launched a fully postmodern critical and creative agenda for the region.”

Handwritten and typed notes titled “Themes for Paraguayan Stories,” Roa Bastos Papers. Courtesy Nettie Lee Benson Latin American Collection.

The Augusto Roa Bastos Papers is a versatile collection that spans the author’s career. It contains poetry, speeches, essays, correspondence, and manuscript drafts. Among the jewels of the collection are letters between the author and his daughter, Mirta Roa Mascheroni, and handwritten comments regarding Yo el Supremo and his novel Madama Sui. This collection provides researchers with profound insight into the writer’s life, particularly his time in exile, and into his creative process from beginning to end. It pairs well with the Miguel Ángel Asturias Papers for similar topics regarding exile and the Boom.


The Roa Bastos acquisition was made possible in part by the Drs. Fernando Macías and Adriana Pacheco Benson Centennial Endowment.


Melissa Guy is director of the Nettie Lee Benson Latin American Collection.

Daniel Arbino is former Head of Collection Development for the Benson.

UT Libraries