Homogeneity in textbooks is a new area of study with few major papers. In this study, we work to automate the analysis of textbooks using Natural Language Processing (NLP). At this stage, sex and birth location are the variables we are able to analyze. Using scanned indices, we analyzed 17 textbooks used in the Physics and Astronomy Department at Pomona College. As expected, the physicists and others mentioned in these textbooks are predominantly male and Western.
To tag names in the textbooks, we use two sources together: a Stanford Named Entity Recognition (NER) model trained on 1,393 English news articles, and name matching against lists of famous physicists and Nobel Prize winners. Because of the nature of the training set, we must keep in mind that the model is biased toward tagging the names of white men.
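A minimal sketch of how the two tagging sources might be merged. It assumes the NER step emits (token, label) pairs using a PERSON label, as Stanford NER's three-class model does. It also assumes a hypothetical lookup `known_by_surname` that maps surnames from a physicist or Nobel Prize list to full names. All function names here are illustrative and are not the study's actual code.

```python
from typing import Dict, Iterable, List, Set, Tuple


def group_person_tokens(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Join runs of consecutive PERSON-labeled tokens into full names."""
    names: List[str] = []
    current: List[str] = []
    for token, label in tagged:
        if label == "PERSON":
            current.append(token)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:  # flush a name that ends the token stream
        names.append(" ".join(current))
    return names


def match_known_names(index_entries: Iterable[str],
                      known_by_surname: Dict[str, str]) -> Set[str]:
    """Match index entries such as 'Curie, Marie, 88' against a hypothetical surname lookup."""
    matches: Set[str] = set()
    for entry in index_entries:
        surname = entry.split(",")[0].strip()
        if surname in known_by_surname:
            matches.add(known_by_surname[surname])
    return matches


def tag_names(tagged: Iterable[Tuple[str, str]],
              index_entries: Iterable[str],
              known_by_surname: Dict[str, str]) -> Set[str]:
    """Union of NER-detected names and dictionary matches."""
    return set(group_person_tokens(tagged)) | match_known_names(
        index_entries, known_by_surname)
```

For example, tagging `[("Albert", "PERSON"), ("Einstein", "PERSON"), ("and", "O"), ("Bose", "PERSON")]` with index entries `["Curie, Marie, 88", "entropy, 102"]` and the lookup `{"Curie": "Marie Curie"}` returns the set {"Albert Einstein", "Bose", "Marie Curie"}. A plain union can leave partial duplicates, such as a bare surname alongside a full name, which would still need merging.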
To tag birth country, we scraped XML data from the Wikipedia pages of physicists who have one, and hand-tagged those who do not.
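One way the Wikipedia step could work is sketched below. It assumes each page is available as MediaWiki export XML whose wikitext contains an infobox `birth_place` field, and that the country is the last comma-separated part of that field. Both assumptions are simplifications: real infoboxes often contain templates or footnotes that this does not handle. The regular expressions and the function name `birth_country` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Matches one infobox line such as "| birth_place = [[Ulm]], [[German Empire]]".
BIRTH_PLACE = re.compile(r"\|\s*birth_place\s*=\s*(.+)")
# Replaces [[Target]] or [[Target|Label]] with the displayed text.
WIKI_LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]")


def birth_country(page_xml: str) -> Optional[str]:
    """Return the last comma-separated part of the infobox birth_place, if any."""
    root = ET.fromstring(page_xml)
    for elem in root.iter():
        # Export files put <text> in a MediaWiki namespace, so match on the tag suffix.
        if elem.tag.endswith("text") and elem.text:
            match = BIRTH_PLACE.search(elem.text)
            if match:
                place = WIKI_LINK.sub(r"\1", match.group(1))
                return place.split(",")[-1].strip()
    return None  # no birth_place field found: a candidate for hand-tagging
```

On Satyendra Nath Bose's infobox line `| birth_place = [[Calcutta]], [[Bengal Presidency]], [[British India]]`, this sketch returns "British India". Historical states like this would still need to be mapped to a present-day country, for example by hand.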
Our automated tagging model has a testing accuracy of about 92%, measured against three hand-tagged textbook indices, so it may miss some names. In addition, the indices were scanned and converted to text using optical character recognition (OCR), which is also error-prone because of noise and imperfect scans. For example, OCR may drop a letter from a name, so the same person appears under two spellings and is counted twice. This happened a few times.
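The text does not say exactly how accuracy was computed. The sketch below assumes it is the share of hand-tagged names that the model also found (that is, recall), which matches the note that some names may be missed. It also shows one possible check for OCR near-duplicates: `difflib.SequenceMatcher` scores the similarity of each pair of names. The 0.9 threshold is an arbitrary, hypothetical choice, and flagged pairs still need human review, since distinct people can have similar names.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def recall(predicted: Set[str], hand_tagged: Set[str]) -> float:
    """Fraction of hand-tagged names that the automated tagger also found."""
    return len(predicted & hand_tagged) / len(hand_tagged)


def likely_ocr_duplicates(names: Iterable[str],
                          threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Flag pairs of names whose spellings differ only slightly, e.g. by a dropped letter."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            pairs.append((a, b))
    return pairs
```

With the names "Maxwell", "Maxwel", and "Newton", only the pair ("Maxwel", "Maxwell") is flagged. Its similarity ratio is 2 × 6 / 13 ≈ 0.92.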
Textbooks Analyzed:
- Phys 003: Musical Acoustics
- Phys 41, 42: University Physics; Physics for Scientists and Engineers
- Phys 70: Prof. Moore’s 6 Ideas: Units C, Q, R, T
- Phys 71, 72: Prof. Moore’s 6 Ideas: Units N, E
- Phys 101: A Fundamental Approach to Modern Physics
- Phys 125: Classical Mechanics
- Phys 128: The Art of Electronics
- Phys 170: A Modern Approach to Quantum Mechanics
- Phys 175: Intro to Thermal Physics
- Astr 51: Introduction to Galaxies and Cosmology; Introduction to the Sun and the Stars
- Astr 101: To Measure the Sky: An Introduction to Observational Astronomy
After combining all textbooks and removing duplicates, we found that 95.31% of all tagged names were men and 4.69% were women. Most physicists were born in North America or Europe, with the largest numbers coming from the United States, the United Kingdom, Germany, and France. The 20 physicists who appeared in the most textbooks were all men, and all but one were born in Europe or the United States. The only exception was Satyendra Nath Bose, who was born in India.
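The combined statistics above can be sketched as follows. The sketch assumes each textbook's tagged names are stored as a set, and that a hypothetical lookup maps every name to a label, such as "M"/"F" for sex or a country for birth location. Deduplication happens by taking the union of all books, so a person counts once no matter how many textbooks mention them. The toy data in the usage note is illustrative only and is not the study's data.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def percent_by(books: Dict[str, Set[str]],
               attribute: Dict[str, str]) -> Dict[str, float]:
    """Percentage of unique names per attribute value (e.g. sex or birth country).

    Assumes every name has an entry in `attribute`; a missing label raises KeyError.
    """
    unique = set().union(*books.values())  # a name in several books counts once
    counts = Counter(attribute[name] for name in unique)
    total = sum(counts.values())
    return {value: round(100 * n / total, 2) for value, n in counts.items()}


def most_widespread(books: Dict[str, Set[str]], n: int) -> List[Tuple[str, int]]:
    """The n names that appear in the largest number of textbooks."""
    appearances = Counter(name for names in books.values() for name in names)
    return appearances.most_common(n)
```

With two toy books, {"Newton", "Curie"} and {"Newton", "Bose"}, and the labels Newton "M", Curie "F", Bose "M", `percent_by` returns {"M": 66.67, "F": 33.33}. `most_widespread(books, 1)` returns [("Newton", 2)]. The same `percent_by` call works for birth country if it is given a country lookup instead.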
Individual textbooks were also relatively homogeneous, and some mentioned only men from the United States or Europe. The textbook with the highest percentage of women among its tagged names was Introduction to the Sun and the Stars, at 6 of 49 (12.24%). For birth country, most individual textbooks also mentioned mainly people born in the United States or Europe. Physicists born in Asia, Australia, South America, and Africa were rare, and many textbooks did not mention any physicists born in those regions.
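The per-textbook comparison can be sketched in the same hypothetical setup: one set of tagged names per book, and a sex lookup using "F" for women. Names without a label count toward a book's total but not as women. Books with no tagged names are skipped to avoid dividing by zero. The book titles and names in the test data are illustrative only.

```python
from typing import Dict, Set, Tuple


def women_share_by_book(books: Dict[str, Set[str]],
                        sex: Dict[str, str]) -> Dict[str, Tuple[int, int]]:
    """Map each textbook to (number of women, number of tagged names)."""
    return {
        title: (sum(1 for name in names if sex.get(name) == "F"), len(names))
        for title, names in books.items()
        if names  # skip books with no tagged names
    }


def highest_women_share(shares: Dict[str, Tuple[int, int]]) -> Tuple[str, int, int, float]:
    """Textbook with the largest fraction of women, plus that fraction as a percentage."""
    title, (women, total) = max(shares.items(), key=lambda kv: kv[1][0] / kv[1][1])
    return title, women, total, round(100 * women / total, 2)
```

As a check on the arithmetic, a book with 6 women among 49 tagged names gives round(100 * 6 / 49, 2) = 12.24, which matches the figure reported above.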


