Machine Learning Meets the Sanborn Maps
In the digital age, historical maps hold a wealth of information, but unlocking their full potential for geospatial analysis and historical research often requires labor-intensive georeferencing. An innovative project at the University of Texas Libraries is transforming this process through the power of machine learning.
The Libraries boast a vast cartographic collection in the Perry-Castañeda Library Maps Collection, including thousands of items that have been scanned for online and digital use. Yet only a fraction of them are georeferenced, which hampers their utility for scholars and researchers. Recognizing the immense challenge of manually georeferencing tens of thousands of maps, the Libraries have turned to cutting-edge technology to automate this arduous task.
Georeferencing – the process of assigning geographic coordinates to a map image – is essential for accurately situating maps on the Earth’s surface within GIS (Geographic Information System) software. Traditionally, this has been painstaking manual work, but the emergence of machine learning offers a promising alternative.
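To make the idea concrete, here is a minimal sketch of what "assigning geographic coordinates to a map image" can look like in practice: fitting an affine transform from a handful of control points (pixel positions paired with known longitude/latitude) and using it to locate any other pixel. This is an illustration only, not the project's implementation. The function names and the control point values are hypothetical, and real GIS software typically offers more transform types than a plain affine fit.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m @ x = v by Gaussian elimination."""
    a = [row[:] + [val] for row, val in zip(m, v)]  # augmented copy
    for i in range(3):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(i, 3), key=lambda r: abs(a[r][i]))
        if abs(a[pivot][i]) < 1e-12:
            raise ValueError("control points are collinear or degenerate")
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(i + 1, 3):
            factor = a[r][i] / a[i][i]
            for c in range(i, 4):
                a[r][c] -= factor * a[i][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        rest = sum(a[i][c] * x[c] for c in range(i + 1, 3))
        x[i] = (a[i][3] - rest) / a[i][i]
    return x


def fit_affine(pixel_pts, geo_pts):
    """Least-squares affine fit from (col, row) pixels to (lon, lat)."""
    if len(pixel_pts) != len(geo_pts) or len(pixel_pts) < 3:
        raise ValueError("need at least three matched control points")
    rows = [[float(c), float(r), 1.0] for c, r in pixel_pts]
    # Normal equations: (A^T A) x = A^T b, solved once per output axis.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
           for i in range(3)]

    def solve_axis(k):
        atb = [sum(row[i] * g[k] for row, g in zip(rows, geo_pts))
               for i in range(3)]
        return solve3(ata, atb)

    return solve_axis(0), solve_axis(1)


def pixel_to_geo(transform, col, row):
    """Apply the fitted transform to one pixel position."""
    (a, b, c), (d, e, f) = transform
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical control points: image corners paired with made-up coordinates.
pixels = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
coords = [(-97.75, 30.28), (-97.74, 30.28), (-97.75, 30.27), (-97.74, 30.27)]
transform = fit_affine(pixels, coords)
print(pixel_to_geo(transform, 500, 500))
```

Three non-collinear control points are the minimum for an affine fit. Additional points are averaged in by the least-squares step, which helps absorb small errors in any single point.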
Enter the proof-of-concept project spearheaded by geospatial and data specialists at the Libraries, which focuses on automating the georeferencing of Sanborn Fire Insurance maps, a pivotal component of the collection. Sanborn maps provide invaluable insights into urban development and infrastructure from the late 19th and early 20th centuries.
To tackle this ambitious undertaking, the project team developed a custom annotation tool to mark street intersections on a small subset of maps from the collection. Machine learning object detection models trained on these annotated examples then detect intersections automatically, streamlining the georeferencing process.
Optical character recognition (OCR) technology is then employed to extract street labels associated with the intersections identified by the object detection model. This data is then cross-referenced with a modern street intersection dataset derived from OpenStreetMap, enabling the precise georeferencing of the historical maps.
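The cross-referencing step can be sketched as a fuzzy lookup: each detected intersection yields a pair of OCR'd street names, which is compared against pairs of names in a modern intersection table. The sketch below is an assumption-laden illustration, not the project's code. It uses Python's standard `difflib` for string similarity, a hypothetical `match_intersection` function, an arbitrary 0.8 threshold, and a tiny hand-made table with placeholder coordinates standing in for an OpenStreetMap-derived dataset.

```python
from difflib import SequenceMatcher

# Street-type words dropped before comparison (an illustrative, partial list).
SUFFIXES = {"st", "street", "ave", "avenue", "blvd", "boulevard",
            "rd", "road", "dr", "drive"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop street-type words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in name.lower())
    words = cleaned.split()
    kept = [w for w in words if w not in SUFFIXES]
    return " ".join(kept or words)


def similarity(a, b):
    """Similarity ratio between two normalized street names (0.0 to 1.0)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_intersection(ocr_pair, modern, threshold=0.8):
    """Return the coordinates of the best-matching modern intersection,
    or None when no candidate reaches the threshold."""
    best_score, best_coords = 0.0, None
    for (s1, s2), lonlat in modern.items():
        # Street order is arbitrary, so score both pairings. An intersection
        # only matches as well as its weaker street name.
        score = max(
            min(similarity(ocr_pair[0], s1), similarity(ocr_pair[1], s2)),
            min(similarity(ocr_pair[0], s2), similarity(ocr_pair[1], s1)),
        )
        if score > best_score:
            best_score, best_coords = score, lonlat
    return best_coords if best_score >= threshold else None


# Placeholder table; the coordinates are illustrative, not surveyed values.
modern_intersections = {
    ("Congress Avenue", "6th Street"): (-97.7431, 30.2682),
    ("Guadalupe Street", "24th Street"): (-97.7418, 30.2880),
}

# An OCR result with a dropped letter still finds its modern counterpart.
print(match_intersection(("CONGRES AVE.", "6TH ST"), modern_intersections))
```

Each successful match supplies one control point: the intersection's pixel position on the scanned sheet paired with its modern coordinates. Historical street names that have since changed would not match under this approach and would need separate handling.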
Remarkably, the automated process has already achieved a significant milestone, successfully georeferencing 14% of the Sanborn maps with a level of accuracy comparable to manual methods. This initial success paves the way for scaling up the project to encompass the entire collection of Sanborn Fire Insurance maps, as well as extending the approach to other map collections in the future.
Looking ahead, the project team is exploring enhancements to the process and ways to further refine its accuracy. Continued training of the machine learning models, improvements to the OCR process for reading street labels, and collaboration with other experts in the field are just a few of the avenues being pursued to optimize the georeferencing workflow.
In an era where data-driven insights are increasingly shaping our understanding of the past, initiatives like the Libraries’ machine learning project offer a glimpse into the transformative potential of technology in historical research. By harnessing the power of machine learning, the Libraries are discovering ways to unlock the spatial dimensions of history and illuminate new pathways for scholarship and discovery.
Read a research article about the project at: https://www.tandfonline.com/doi/full/10.1080/15420353.2025.2462737