Read, Hot & Digitized: Visualizing Wikipedia’s Gender Gap

Read, hot & digitized: Librarians and the digital scholarship they love. In this new series, librarians from UTL’s Arts, Humanities and Global Studies Engagement Team briefly present, explore and critique existing examples of digital scholarship. Our hope is that these monthly reviews will inspire critical reflection on, and future creative contributions to, the growing fields of digital scholarship.

Wikipedia is a website that many of us use every day (yes, even us librarians!). Wikipedia was founded on utopian ideals: a democratic approach to content creation and always-free, open knowledge. It therefore seems like an ideal platform for addressing structural inequalities in our information systems that reflect and reinforce racism, misogyny, homophobia, and transphobia, and combinations thereof.

However, Wikipedia has a long-standing problem of gender imbalance, in both article content and editor demographics. Only 18% of content across Wikimedia platforms is about women. The gaps for nonbinary and transgender people are even starker: less than 1% of editors identify as trans, and less than 1% of biographies cover trans or nonbinary individuals. When gender is combined with other factors, such as race, nationality, or ethnicity, the numbers fall even lower. This gender inequity has long been documented in the scholarly literature through editor surveys and analyses of article content (Hill and Shaw, 2013; Graells-Garrido, Lalmas, and Menczer, 2015; Bear and Collier, 2016; Wagner, Graells-Garrido, Garcia, and Menczer, 2016; Ford and Wajcman, 2017). The Humaniki tool was developed to visualize these inequalities in near real time.

Humaniki was created in 2020 by merging two earlier data visualization projects. Data scientist Maximillian Klein created the Wikidata Human Gender Indicators project in 2016, and Enzel Le Mir created the French project Denelezh for Wikimedia France in 2017. Both projects drew on the Wikidata API, and they merged because of their significant overlap and shared mission; Klein recently received a grant from the Wikimedia Foundation to continue this work. Humaniki is built using Python, and its backend code is available on GitHub.

Humaniki offers many ways to explore this data. One of the most interesting is to look at the numbers by language. Wikipedia isn’t available only in English, and Humaniki lets users examine gender representation in biographies across 529 languages! Another interesting dimension is year of birth: the trends in the Humaniki data suggest that the gender gap narrows slightly for biographies of younger people. For example, 23% of biographies of people born in 1963 are about women, whereas 29% of biographies of people born in 1983 are about women.

Humaniki also reports the number of biographies of people who identify as “other genders” (people whose gender identity is not cisgender). For each metric, you can review the “Other Genders Breakdown,” which lists all the gender identities (trans women, trans men, nonbinary, genderfluid, two-spirit, etc.) included in that particular data point. The “Other Genders” metric is important because the numbers are so stark. Returning to our examples from 1963 and 1983, only 16 biographies in the 1963 dataset and 31 in the 1983 dataset are about people who don’t identify as cisgender, out of more than 50,000 biographies! This highlights the great need to create and expand articles on people who identify outside the traditional gender binary.

Humaniki is a useful tool for building awareness of the Wikipedia gender gap, and there are many ways to act upon this knowledge and get involved. The UT Libraries sponsors multiple Wikipedia edit-a-thons focused on improving articles about women and LGBTQ+ people. Every March, we host Queering the Record, a homegrown edit-a-thon to improve queer and trans representation, and we participate in the international campaign Art + Feminism, which focuses on gender, feminism, and the arts. Additionally, we’ve hosted one-off edit-a-thons covering Latinx and Mexican women, Indigenous languages, and women and LGBTQ+ people in STEM fields. Keep an eye on the UT Libraries events page to learn about future edit-a-thons!

Scholarship and Popular Press on the Wikipedia Gender Gap

Bear, Julia B., and Benjamin Collier. “Where are the women in Wikipedia? Understanding the different psychological experiences of men and women in Wikipedia.” Sex Roles 74, no. 5-6 (2016): 254-265. 

Filipacchi, Amanda. “Wikipedia’s Sexism Toward Female Novelists.” The New York Times, April 24, 2013. 

Ford, Heather, and Judy Wajcman. “‘Anyone can edit’, not everyone does: Wikipedia’s infrastructure and the gender gap.” Social Studies of Science 47, no. 4 (2017): 511-527.

Gordon, Maggie. “Wikipedia Editing Marathons Add Women’s Voices to Online Resource.” Houston Chronicle, November 9, 2017. https://www.houstonchronicle.com/life/article/Adding-women-s-voices-to-Wikipedia-12344424.php

Graells-Garrido, Eduardo, Mounia Lalmas, and Filippo Menczer. “First women, second sex: Gender bias in Wikipedia.” In Proceedings of the 26th ACM Conference on Hypertext & Social Media, pp. 165-174. 2015.

Hill, Benjamin Mako, and Aaron Shaw. “The Wikipedia Gender Gap Revisited: Characterizing Survey Response Bias with Propensity Score Estimation.” PLoS ONE 8, no. 6 (2013): e65782.

Paling, Emma. “The Sexism of Wikipedia.” The Atlantic, October 21, 2015. https://www.theatlantic.com/technology/archive/2015/10/how-wikipedia-is-hostile-to-women/411619/

Stephenson-Goodknight, Rosie. “Viewpoint: How I Tackle Wiki Gender Gap One Article at a Time.” BBC News, December 7, 2016. https://www.bbc.com/news/world-38238312

“The Nobel Prize Winning Scientist Who Wasn’t Famous Enough for Wikipedia.” The Irish Times, October 3, 2018. https://www.irishtimes.com/life-and-style/people/the-nobel-prize-winning-scientist-who-wasn-t-famous-enough-for-wikipedia-1.3650212

Wagner, Claudia, Eduardo Graells-Garrido, David Garcia, and Filippo Menczer. “Women through the glass ceiling: gender asymmetries in Wikipedia.” EPJ Data Science 5 (2016): 1-24.

Scant Communications, Devastating Impacts

Dale J. Correa is the Middle Eastern Studies Librarian and History Coordinator for the UT Libraries, and she regularly teaches on research data/citation management for the humanities at The University of Texas at Austin.

Hannah Chapman Tripp serves as the Biosciences Librarian and has provided research help with a variety of citation management programs at The University of Texas at Austin and previous institutions.

Where Did My Data Go?

In Fall 2020, registered Mendeley users received an email titled “Improving Mendeley to Better Support Researchers” describing planned updates to Mendeley’s service model. These changes included the removal of several Mendeley library features, including the Public Groups feature, which allowed large groups to share references and notes openly. These groups particularly appealed to some scholars as a way to share resources openly, publicly, and free of cost in both invite-only and open group settings, with no limit on group membership. Both the invite-only and the open groups fell under the Public Groups umbrella, and both were included in Mendeley’s feature-removal plans. Unfortunately, Mendeley’s email did not explicitly state that Public Groups would be deleted from individual users’ accounts. When the update went into effect in March 2021, users found that their locally stored files from these groups had been deleted from their own machines as well.

Researchers who used this feature may well have missed that email or not read it thoroughly. After all, researchers receive many update notices from the services they use, and much of that information goes unread. And, of course, some email systems automatically flag messages like this one as spam or junk and route them to a folder that frequently goes unchecked.

As “announced,” Mendeley went ahead with the plan and began removing certain features, including Mendeley Feed, Mendeley Profiles and Mendeley Funding, in December 2020. In March 2021, Mendeley began retiring Public Groups. There does not appear to have been any further, specific communication about the Public Groups retirement in the lead-up to that change.

While we fully acknowledge the need for commercial companies to pivot priorities, continue development of what’s working and in some cases remove features that are less popular and see less return on investment, the awareness campaign for these changes clearly did not reach enough of the affected audience to warrant the deletion of features from an individual user’s Mendeley library. The failure of this important information to reach registered Mendeley users is evidenced by many, many, many reactions on Twitter from the scholarly community. While most scholars understand the need to make changes to a platform and continue to improve the services offered, they are also outraged at the lack of effective communication prior to deleting this feature.

Mendeley has acknowledged that its plan to remove features did not allow enough time or communication, and it has since re-enabled the invite-only groups, a subset of the Public Groups, for a brief period so users can retrieve their data. Many researchers are concerned that the content in the Open Groups (the other option under the Public Groups umbrella) will not be restored and has been lost permanently. For many academics, this is a devastating realization, as years of research and references have been erased with insufficient notice. Although Mendeley has apologized for its handling of these changes, the fact remains that some scholars, including those in the more vulnerable positions of PhD student, postdoc and non-tenured faculty, are left without vast quantities of their research.

Lessons Learned, Principles to Practice

While this is an unfortunate situation, we hope some lessons can be drawn from the experience. For researchers, the points we hope to address include the importance of backups, knowing your product, and being aware that changes are quite likely.

Backing up research data is important, regardless of the type of data or its original format. A best practice in data retention is the 3-2-1 rule: keep three copies of your research data, on two different types of storage media, with one copy offsite. Some researchers wrongly assumed that Mendeley’s storage and syncing satisfied at least part of this practice; however, they learned that when data is deleted from the Mendeley web version, that deletion can be synced down to any local copy of Mendeley connected to the web. To follow the 3-2-1 rule with Mendeley data, researchers must back up their data both to an external hard drive and to an online cloud storage service separate from Mendeley. What makes this situation trickier is that, starting in 2018, Mendeley began encrypting researchers’ local data folders, making it very difficult to access one’s own data outside the Mendeley interface (although some researchers have identified workarounds to the encryption). Rather than copying those folders, researchers should back up exports from Mendeley in open file formats, along with their PDFs and notes, so that they can access, use, and rebuild their reference libraries if their Mendeley data becomes corrupt or a change in Mendeley services affects their access.
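Part of this routine can be automated. The following Python sketch is one minimal way to do it, assuming you have already exported your library from Mendeley to a file (for example, a RIS file) and that your external drive and a locally synced cloud folder outside Mendeley both appear as ordinary directories. The function names and folder names are hypothetical, chosen for illustration. The script copies the export into each destination and compares SHA-256 checksums, so a truncated or failed copy is noticed rather than silently trusted.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(export: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy an export file into each destination folder and verify each copy.

    Returns a mapping from each copied file to True if its checksum matches
    the original export, False otherwise.
    """
    original = sha256_of(export)
    results: dict[Path, bool] = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)  # create the folder if missing
        target = folder / export.name
        shutil.copy2(export, target)               # copy contents and timestamps
        results[target] = sha256_of(target) == original
    return results
```

For example, back_up(Path("mendeley_export.ris"), [Path("/Volumes/ExternalDrive/backups"), Path.home() / "CloudSync" / "backups"]) would produce the external-drive copy and the copy that your cloud client then carries offsite (both paths are placeholders). Because these copies are plain files outside Mendeley, a deletion synced from the Mendeley web version cannot reach them.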

With RIS (Research Information Systems bibliographic citation format) files and PDFs backed up to the local machine as well as to a backup service such as UT’s Box, researchers would have the option to continue using Mendeley or to move their data to another citation manager such as Zotero or EndNote. For those who continue to use Mendeley, a backup system like the one described above is the recommended way to ensure long-term access to essential research references, notes, and files (particularly annotated PDFs).
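Before relying on a RIS backup, it is worth confirming that the export actually contains your references. RIS is a plain-text format in which each line begins with a two-character tag, two spaces, a hyphen, and a space (for example "TY  - JOUR"), and each reference ends with an "ER" line. The sketch below is a hypothetical checker written for illustration, not a feature of Mendeley, Zotero, or EndNote. It assumes a reasonably well-formed export and simply skips any line that does not follow the tag layout, such as wrapped continuation lines.

```python
from __future__ import annotations

import re

# A RIS tag line: two-character tag, two spaces, a hyphen, then an optional
# space and the value (the value is empty on "ER  -" lines).
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text: str) -> list[dict[str, list[str]]]:
    """Split RIS text into records, each a dict mapping tag -> list of values.

    Lines that do not match the tag layout are skipped, so this is a quick
    sanity check on an export, not a complete RIS parser.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] | None = None
    for raw in text.lstrip("\ufeff").splitlines():  # drop a leading byte-order mark
        match = TAG_LINE.match(raw.rstrip())
        if not match:
            continue
        tag, value = match.groups()
        if tag == "TY":            # TY opens a new reference
            current = {"TY": [value]}
        elif tag == "ER":          # ER closes the current reference
            if current is not None:
                records.append(current)
            current = None
        elif current is not None:  # repeatable tags such as AU keep every value
            current.setdefault(tag, []).append(value)
    return records
```

Comparing the length of the returned list with the number of references Mendeley reports for your library is a quick way to check that nothing was dropped during export.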

It is also important to keep abreast of changes in the software. As librarians, we are just as guilty as the next person of not reading terms of use or new update details before initiating a download. We could all make a better effort to read through the software’s terms of use.

Mendeley — owned by a for-profit company — will continue to optimize the most attractive, state-of-the-art, and revenue-generating features and functionality in their product. This process inevitably means refocusing efforts and making tough decisions about what features to no longer support. However, the realities of software changes and obsolescence are not confined to Mendeley or, for that matter, to for-profit companies. For example, the backups you made decades ago to a floppy disk are likely no longer retrievable due to hardware changes and potential software obsolescence.

So, whether you have lost your data with this change in Mendeley services or you are one of the lucky ones who were not relying heavily on the free Public Groups features, we strongly recommend that you use a sensible backup system; back up in open formats from which you can easily retrieve your data no matter what system you’re using; and keep an eye on the crucial changes that come with software updates. We are here to assist with data and citation management best practices; please see the Research Organization with Citation Managers LibGuide for more information.

Building On Black Lives Matter

At their core, library collections are intended to reflect the values of society and to represent the resources that the community most needs to advance those values. Historically, though, the lack of diversity in scholarship and publishing meant that certain voices went unpromoted, and so collections have been somewhat carelessly conceived and built without adequate attention to, or equity for, all points of view.

Part of the Libraries’ strategic focus is the concept of IDEA (Inclusion, Diversity, Equity and Accessibility) and a conscious effort to carry out the organization’s work within that framework. Libraries are by nature democratic institutions, but as we’ve come to recognize in recent years, and more poignantly in the last twelve months, there is much work to be done to improve the fairness and justice of our systems and of how we operate them. Taking a hard look at how and why we gather the resources we do is low-hanging fruit for redressing past practices and for beginning to recognize and atone for those shortcomings.

A recent effort by the Libraries’ Scholarly Resources Division to apply IDEA concepts to its work resulted in a significant project to begin diversifying the Libraries’ collections practices. The effort was holistic in approach, but work in specific subject areas deserves special notice for its early successes. One of those areas, of particular relevance to recent history, is collections related to the Black Lives Matter movement.

Social Sciences Librarian Bill Kopplin took up the project in part because of its current social relevance, but also because of its interest to campus communities.

“At its heart the BLM movement is an extended anti-government protest, so it seems like it was already by definition an integral part of my subject purview,” explains Kopplin, “but it was also obvious that there was a great deal of interest in this subject on campus.” 

Bill Kopplin

“There was both individual research interest, and classroom use going on,” says Kopplin. “And I have checked the circulation records for some of our older print books on the civil rights movement and those checkout numbers are very high. Of course, the BLM movement fits into the much larger social, political, and historical context of the civil rights movement, which is an extremely interdisciplinary subject area, so as a social sciences liaison librarian, it was all good.”

Kopplin suspected that the BLM collections needed attention, but to begin building them out, he needed to get an idea of what was “on the shelves.” “I actually have a fair amount of experience comparing collections dating back to my days as the computer science bibliographer,” he says, “and since I knew that the Black Lives Matter movement was a relatively recent phenomenon, I realized the number of entries in various library catalogs under a BLM subject heading would be both very specific and relatively low in absolute number.

“Comparing them would be doable and hopefully informative as to the relative amount of recent collection activity that was going on at various campuses by our peer institutions,” he continues. “So last summer I looked at the BLM catalog entries, and while it was a bit hard to make definitive statements, it was clear to me that we didn’t have as many titles as some of our other fellow libraries.”

That proved to be a generous characterization. Among research libraries nationwide, UT and its state peer Texas A&M were at the low end for BLM collections. The topic was still emerging, its terminology was still developing, and much work needed to be done to target resources that were useful to the field and spanned the subject’s various facets. The Libraries had a meager 11 titles that could be considered in the area; by contrast, Kopplin found that Penn State had 44.

But the comparative infancy of the subject area had the converse effect of somewhat simplifying the solution to the deficit in the collections. “If I was considering collections in a large subject area like chemistry I would obviously have to target a small subset of that to do any interesting collecting, but the BLM movement is so far a pretty small subject area when looked at as part of the overall book publishing industry, so I didn’t really do much targeting,” explains Kopplin. “Basically, if a title showed up on a published list of ‘best BLM books’ and it was available to us as an orderable ebook in GOBI (the Libraries’ main book vendor), I would try to order it. And there were scores of these ‘best books’ lists to go on.” 

“So, if someone somewhere recommended a BLM title on a published list, I treated that like a favorable book review and I would try to order it.”

Since Kopplin began work on the project, the Libraries has acquired more than 100 titles, and that collection continues to grow to support increased interest in Black Lives Matter and related subjects around social justice, systemic racism and police brutality. Scholarly Resources Division staff are reviewing approval plans (arrangements with large vendors to automatically acquire needed resources from major publishers) to improve processes and ensure that historical homogeneity in publishing doesn’t impede the Libraries’ efforts to diversify the collections.

“My upcoming summer project is to go back and re-examine our holdings in comparison to our peers to see if we have made any progress,” says Kopplin. “But I’m not too worried; the project itself has been the reward, and it is really pleasing to know that our collection is now stronger in this specific area.”

The work Kopplin is doing is just a small part of the much larger effort at collections diversification, though. As head of collection development, Carolyn Cunningham is involved in oversight of the various efforts, and views it as a new part of normal practice for the Libraries going forward.

“Of course, there are many other librarians working to make our collections relevant to our students and researchers,” says Cunningham. “All of the subject librarians use their expertise to monitor the publications coming out in their areas and make sure we get important resources.”

“The team is committed to using an IDEA lens in all of our work, beyond special projects or short-term initiatives,” she continues. “This means that we approach every request for a book, every new product offer, and every decision about how to use collection funds with the frame of mind that we will strive to include diverse voices in our collection and orient ourselves toward finding and making available resources that include the many experiences and perspectives of our campus community and beyond.”

For his part, though, Kopplin has taken away a greater appreciation for the subject. “I can’t tell you how rewarding this project has been to me personally.”

Kopplin relates a significant discovery from his research to explain.

“I’m a car guy, love everything about cars. How do cars relate to BLM, you ask? Interstate 375, the Walter P. Chrysler Freeway in downtown Detroit, is a little-known example of the little-known phenomenon of infrastructure racism. It is a 1-mile-long highway that held the distinction of being the shortest interstate in the national system. It was not needed as a transportation solution. It was built to level a historically African-American community called Black Bottom that was sort of Detroit’s answer to Harlem.”

“The BLM movement has brought increased awareness of police brutality, it has brought increased awareness of things like Confederate-era statues, it has brought increased awareness of the larger civil rights movement, and it has brought increased awareness of hidden things like infrastructure racism, which I knew very little about before this project. There are now proposals being considered to demolish I-375.”

“I have learned so much,” says Kopplin. 

Behind the Numbers: UX

This post will focus on how the Assessment Team has begun officially dipping our toes into User Experience (UX) research by conducting a usability test focused on part of the main navigation of the UT Libraries website. Why, you might ask, is this assessment-focused column talking about UX?

In many ways, my assessment practice has always incorporated a good bit of user experience work, though I haven’t typically labeled it as such. Past endeavors such as dot poster surveys (used to learn how students were using new library spaces) and a branch observation project (that was interrupted by the pandemic) employed user experience methodologies, and I see user experience and assessment as complementary and overlapping approaches to asking and answering questions aimed at improving what we do.

When the Libraries redesigned our website a few years ago (which was a huge accomplishment involving many of my talented colleagues), the site redesign process incorporated user feedback by conducting A/B tests, usability tests, focus groups, and more. Now that the site has moved out of development and into sustainment, there are fewer resources devoted to conducting user tests. My colleagues have been busy producing great new tools like portals for our digital exhibits, our digitized and born-digital items, and geospatial data, but we were not sure how to best incorporate them into our site navigation. Members of the Web Steering CFT have conducted user tests as needed/possible, and the Assessment Team decided to help in the effort and take on a UX project this spring to help answer questions we had about our navigation menu choices.

A screenshot of the "Find, Borrow, Request" menu that includes links to Library Catalog, Articles, Databases, Journals, Course Materials, Collections Showcase, Digital Collections, Digital Exhibits, Maps, and Geospatial Data.

Along with a small team of other colleagues, we designed a series of questions and tasks focused on the “Find, Borrow, Request” portion of our website and recruited 10 students to participate in brief UX tests conducted through Zoom. While the pandemic has made many aspects of user research more difficult, we were easily able to recruit students through an email invitation, and were overwhelmed by the volume of interest we garnered. We just finished conducting tests earlier this week and haven’t analyzed the results yet, but through conducting tests I have already learned that terms like “Collections Showcase” and “Digital Exhibits” are not self-explanatory to the majority of our students. Most surprisingly, the label “Maps” (which we did not expect to be confusing) was misleading to most of the students I tested or observed. Students generally expected to find a map of library locations or library floorplans at the link, but the link actually leads to our collection of digitized maps of places all over the world. This underscores the importance of conducting frequent user testing. We never would have learned that “Maps” was confusing if we hadn’t been testing adjacent links! Clearly we need to rethink our labels.

I’m excited to analyze the full results and turn them into recommendations for improving the site. I’m even more excited about expanding our team to include a librarian focused on UX so we can increase our ability to conduct tests like this. We just posted a position for a UX Librarian to join the Assessment and Communication Team to help us ensure that our spaces and services (both web and physical) are welcoming and functional for our users. The eventual end of the pandemic provides ample opportunity for rethinking how we have always done things, and we hope that a UX Librarian will help ensure that the changes we make help our users have great experiences at the UT Libraries.

WHIT’S PICKS: TAKE 9 – GEMS FROM THE HMRC

Resident poet and rock and roll star Harold Whit Williams is in the midst of a project to catalog the KUT Collection, obtained a few years ago and inhabiting a sizable portion of the Historical Music Recordings Collection (HMRC).

Since he has a refined sense of both words and music, Whit seems like a good candidate for exploring and discovering some overlooked gems in the trove, and so in this occasional series, he’ll be presenting some of his noteworthy finds.

Earlier installments: Take 1, Take 2, Take 3, Take 4, Take 5, Take 6, Take 7, Take 8

Acetone / Cindy

Available at Fine Arts Library Onsite Storage

Criminally overlooked and ultimately doomed L.A. stoner garage-roots trio Acetone droned away in near obscurity during the 1990s “alternative” heyday, but one can hear their influence on today’s wealth of indie pop and Americana music. Cindy, their first full-length, rocks hard throughout and is built upon overdriven guitars rather than the mellow Gram Parsons-esque atmospherics that would color their subsequent psych-country records. Here’s that time-travelling, road-tripping, couch-surfing soundtrack you’ve long been waiting for.

Conrad Herwig Nonet / Sketches of Spain y mas

Available at Fine Arts Library Onsite Storage

Trombonist and bandleader Conrad Herwig takes mucho Latin Jazz liberties with this classic Miles Davis work, plus three other pieces (y mas). His New York-based nonet parties hearty in an Afro-Cuban style live at the Blue Note on Davis’ Solar, Seven Steps to Heaven, and Petits Machins, but it’s the majestic 25-minute-long epic Sketches of Spain that stands out admirably here. Highlights include trumpeter Brian Lynch, saxophonist Paquito D’Rivera, and especially the shock-and-awe back-and-forth between drummer Robby Ameen and conguero/percussionist Richie Flores.

Amadou & Mariam / Wati

Available at Fine Arts Library Onsite Storage

Stretching the boundaries of traditional West African music, Amadou & Mariam unabashedly mix healthy doses of rock, blues, pop, and funk into their full band’s hypnotic groove. Having met as students at Mali’s Institute for the Young Blind, the two became a couple and musical partners. Wati leans more in a Western direction, production- and instrumentation-wise, but the heart and soul of the record comes straight from Bamako. A mesmerizing and exuberant Afro pop celebration.

Britta Phillips & Dean Wareham / Sonic Souvenirs

Available at Fine Arts Library Onsite Storage

Another musical couple here (Britta Phillips and Dean Wareham of indie rock royalty Luna) spices things up nicely with this short and sweet six-track EP. Enlisting famed Bowie producer Tony Visconti for these revamped versions from an earlier album, the duo grooves in a downtown underground style à la Velvet Underground & Nico. Wareham’s low-energy slacker vibe is baked in, but it’s Phillips’ coy and coquettish vocals icing this delightful dream pop cake.

Shirley Scott / Memorial Album

Available at Fine Arts Library Onsite Storage

Always somewhat in the shadow of other Philadelphia B3 organ legends (Jimmy Smith, Jimmy McGriff), Shirley Scott nevertheless had exquisite soul jazz chops second to none. The subtitle of this collection, “Queen of the Organ,” is no hyperbole, as any experienced hepcat listener can attest. Culled from her recordings for Prestige and other labels, these tracks showcase Scott’s solo artist virtuosity as well as her steady grooving backup session work with the likes of Eddie “Lockjaw” Davis and her husband at the time, Stanley Turrentine. Talk about Philly soul? Then you’ve got to be talking about Shirley Scott.

[Harold Whit Williams is a Content Management Specialist in Music & Multimedia Resources. He writes poetry, is guitarist for the critically acclaimed rock band Cotton Mather, and releases lo-fi guitar-heavy indie pop as DAILY WORKER.]