-
A Statistical Analysis of the Demographics of Physics Textbooks
Homogeneity in textbooks is a new area of study with few major papers. In this study, we work to automate the analysis of textbooks using Natural Language Processing. At this stage, sex and birth location are the variables we can analyze. We analyzed 17 textbooks used in the Physics and Astronomy Department at Pomona College…
-
The History of Modern Science Fiction
Since the main thing we are taking a look at here is the correlation between science fiction literature and history, it seems fitting to look at… the history of science fiction! Although relatively new to the scene when compared to genres such as romance or realistic fiction, science fiction still features a historical development…
-
Transformers – A Better Model?
So a little while back we talked about LSTMs and how they were so much better than that mishmash of classifiers we used to create a voted classifier. We talked about how the LSTM was so much smarter than those other deep learning models and neural nets and how it was well suited for text…
-
Using HathiTrust
One of the biggest struggles throughout this whole process has been locating data for analysis. Due to American copyright practices, a ton of classics written as far back as the 1930s are still in copyright and thus cannot be legally obtained through databases such as Project Gutenberg and Internet Archive. Thus to obtain these…
-
Introductions
The bildungsroman, or coming-of-age story, is a fairly common archetype in literature. Our ragtag protagonist starts off as a young and inexperienced figure who is not entirely fit for the grueling outside world. Through a series of trials and tribulations, they rise to the occasion and grow as a character, rounding off a fairly linear…
-
Results: Attempt 2
We come back this time with our trained LSTM deep learning model and a set of cleaned books that we want some answers from. Gone is the NLTK voted classifier, for we now have the bright and shiny new toy to run our books through. Feeding our data into the deep learning model is not…
-
The Deep Learning Model(s)
Moving on from that shoddy first model of voted classifiers, we come to what is all the rage nowadays: a deep learning model! I had the plan all laid out in my head: toss a bunch of data into this guy, let him connect the dots, and come out with a trained model that…
-
Results: Bag of Words Binary Classification
And we are back! … with some slightly underwhelming results. After feeding our data to our nifty little classifier, we have gotten a set of unexpectedly positive results. Out of all the books we fed to our hungry, hungry algorithms, we learned that a whopping 49 of them were positive, with a meager 2 negatives…
-
Data-Cleaning
From what I’ve read/heard/learnt, data-cleaning is one of the most important data-set pre-processing steps out there. At the end of the day, we are feeding a mish-mash of text into a bunch of algorithms that sort through these words without any regard for any external considerations or precautions. These algorithms do their work in silence,…
-
The Dataset!!
Coding is hard, so is math, so is English: these technical bits of this research process all yielded obstacles I was prepared to overcome, to understand, to learn about. Yet I was ill-prepared for the most difficult task of them all, the most daunting plight upon which this entire process hinges: Who the heck…