Read, Hot, and Digitized: Adventures in Data-Sitting

Read, hot & digitized: Librarians and the digital scholarship they love — In this series, librarians from the UT Libraries Arts, Humanities and Global Studies Engagement Team briefly present, explore and critique existing examples of digital scholarship. Our hope is that these monthly reviews will inspire critical reflection on, and future creative contributions to, the growing fields of digital scholarship.

It will come as no surprise that I, the English Literature Librarian, was a nerdy little bookworm as a child. I actively participated in the Book It! reading program, a literacy initiative sponsored by Pizza Hut. The premise of Book It! was simple: After completing five books and getting the sign-off from my teacher, I would “earn” a coupon for a personal pan pizza. When I was in 5th grade, I read enough Baby-Sitters Club (BSC) books in a single week to earn three pizzas. I felt a tinge of guilt because I had skipped early chapters in each book where the text was reused, word-for-word, from previous books in the series. It was always Chapter 2!

Every devoted Baby-Sitters Club fan knows the text was reused to introduce the characters and the premise of the series. There were over 200 books published in the span of 13 years – of course some of it would be repetitive! But let’s take it a step further. What if we could quantifiably demonstrate the reuse of Chapter 2 text, while also comparing stylistic and narrative changes across multiple ghostwriters and cultural trends? And how would you do this kind of analysis of 200+ novels, spin-offs, and graphic novel adaptations? Well, a feminist collective of scholars called the Data-Sitters Club (DSC) is attempting to do just that. 
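To make the idea of "quantifiable" text reuse a little more concrete, here is a minimal Python sketch of one common approach: split each chapter into overlapping five-word phrases ("shingles") and measure how many phrases two chapters share. This is only an illustration under my own assumptions, not the Data-Sitters Club's actual method. The function names `shingles` and `reuse_score` are hypothetical, and the sample sentences are invented placeholders rather than quotations from the books.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping n-word phrases (shingles) in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_score(text_a, text_b, n=5):
    """Jaccard similarity of two texts' shingles.

    0.0 means no shared n-word phrases; 1.0 means the phrase sets are identical.
    """
    a, b = shingles(text_a, n), shingles(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented placeholder sentences (assumptions for illustration only).
chapter_2_book_a = (
    "The club meets three times a week in the bedroom of our vice president, "
    "who has her own phone line."
)
chapter_2_book_b = (
    "The club meets three times a week in the bedroom of our vice president, "
    "and we always eat snacks."
)
unrelated_chapter = (
    "On Saturday we drove to the beach and the little kids built a giant sandcastle."
)

print("Chapter 2 vs. Chapter 2:", reuse_score(chapter_2_book_a, chapter_2_book_b))
print("Chapter 2 vs. unrelated:", reuse_score(chapter_2_book_a, unrelated_chapter))
```

A score near 1 suggests heavy word-for-word reuse, while a score near 0 suggests fresh prose. Running the same comparison across every pair of chapters in a corpus would be one rough way to check whether it really was always Chapter 2.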

Cover art for the Data-Sitters Club, by artist Claire Chenette

The Data-Sitters Club describe their project as “a fun way to learn about computational text analysis for digital humanities”. They created a corpus of Ann M. Martin’s influential young adult series and have analyzed it using a variety of DH methods and tools (Python, R, TEI, Voyant, just to name a few). The Baby-Sitters Club has had a long pop culture shelf-life for Gen X and Millennial readers, with the recent Netflix reboot (which was sadly canceled after two seasons) and the podcasts Stuck in Stonybrook and the Baby-Sitters Club Club. According to the publisher Scholastic, the series has been in print since 1986 and has sold more than 190 million copies. Given the series’ immense popularity and continued pop culture influence, the books are a gold mine for researchers interested in gender, race, class, and sexuality, but, like much of girl culture, the books haven’t been the subject of serious research.

So the Data-Sitters Club saw an opportunity both for new research and for making DH more accessible, especially to women and other marginalized groups often sidelined in DH projects. The DSC does this through a series of 16 blog posts on their GitHub site, written to mimic the narrative style of the book series, including titles that riff on the originals. Each blog post covers a use case for the BSC corpus and features a different tool, programming language or technique. Two of my favorites are DSC #2: Katia and the Phantom Corpus and DSC #5: The DSC and the Impossible TEI Quandaries. (A running joke throughout the blog is that later posts refer the reader back to “Chapter 2” to explain the corpus and how it was created, an intentional reference to the Chapter 2 in the original series that reused text to explain the series’ premise.)

Cover art for DSC #2: Katia and the Phantom Corpus, which parodies an original Baby-Sitters Club book cover that I’m pretty sure I read in 3rd or 4th grade. Image courtesy of the Data-Sitters Club

One thing you won’t find on the DSC GitHub site is the corpus itself. The team scanned print books to create a legal corpus, but as of right now, it’s not available publicly online. The DSC has used the project as an advocacy tool to promote the loosening of ebook copyright restrictions so that researchers can build literary corpora for private research. In partnership with the non-profit Authors Alliance, they wrote to the Librarian of Congress asking for exemptions to the Digital Millennium Copyright Act of 1998 to access the full BSC corpus. Of all the DSC blog posts, I found DSC #7: The DSC and the Mean Copyright Law to be the most fascinating – and frustrating.

I would recommend the Data-Sitters Club blog to any emerging DH scholar or librarian looking to try a new tool or method. Much of the content is highly technical, but the fun, approachable tone of each blog post makes the content accessible. I hope they are able to get legal access to the full ebook corpus so we can see more research on the Baby-Sitters Club books and better understand their cultural impact on a generation of women and girls.

You can find print copies of the original Baby-Sitters Club series in the PCL Youth Collection, and I highly recommend the recent essay collection We Are the Baby-Sitters Club: Essays and Artwork from Grown-up Readers, available at the PCL.