Category: Digital Humanities

January 13, 2021

A Statistical Analysis of the Demographics of Physics Textbooks

Homogeneity in textbooks is a new area of study with few major papers. In this study, we work to automate the analysis of textbooks using Natural Language Processing. At this stage, sex and birth location are the variables we can analyze. We analyzed 17 textbooks used in the Physics and Astronomy Department at Pomona College…

Textbook Statistics
July 17, 2020

The History of Modern Science Fiction

Since the two main parameters we are looking at here are science fiction literature and history, and the correlation between them, it seems fitting to look at… the history of science fiction! Although relatively new to the scene when compared to genres such as romance or realistic fiction, science fiction still features a historical development…

Sentiments of Science Fiction
June 22, 2020

Transformers – A Better Model?

So a little while back we talked about LSTMs and how they were so much better than that mishmash of classifiers we used to create a voted classifier. We talked about how the LSTM was so much smarter than those other deep learning models and neural nets and how it was well suited for text…

Sentiments of Science Fiction
June 16, 2020

Using HathiTrust

One of the biggest struggles throughout this whole process has been locating data for analysis. Due to American copyright practices, a ton of classics written as early as the 1930s are still under copyright and thus cannot be legally obtained through databases such as Project Gutenberg and Internet Archive. Thus, to obtain these…

Sentiments of Science Fiction
May 26, 2020

Introductions

The bildungsroman, or coming-of-age story, is a fairly common archetype in literature. Our ragtag protagonist starts off as a young and inexperienced figure who is not entirely fit for the grueling outside world. Through a series of trials and tribulations, they rise to the occasion and grow as a character, rounding off a fairly linear…

Tracing the Development of a Bildungsroman
May 20, 2020

Results: Attempt 2

We come back this time with our trained LSTM deep learning model and a set of cleaned books that we want some answers from. Gone is the NLTK voted classifier, for we now have the bright and shiny new toy to run our books through. Feeding our data into the deep learning model is not…

Sentiments of Science Fiction
May 7, 2020

The Deep Learning Model(s)

Moving on from that shoddy first model of voted classifiers, we come to what is all the rage nowadays: a deep learning model! I had the plan all laid out in my head: toss a bunch of data into this guy, let him connect the dots, and come out with a trained model that…

Sentiments of Science Fiction
April 20, 2020

Results: Bag of Words Binary Classification

And we are back! … with some slightly underwhelming results. After feeding our data to our nifty little classifier, we have gotten a set of unexpectedly positive results. Out of all the books we fed to our hungry, hungry algorithms, we learned that a whopping 49 of them were positive, with a meager 2 negatives…

Sentiments of Science Fiction
April 13, 2020

Data-Cleaning

From what I’ve read/heard/learnt, data-cleaning is one of the most important data-set pre-processing steps out there. At the end of the day, we are feeding a mish-mash of text into a bunch of algorithms that sort through these words without any regard for any external considerations or precautions. These algorithms do their work in silence,…

Sentiments of Science Fiction
April 5, 2020

The Dataset!!

Coding is hard, so is math, so is English: these technical bits of this research process all yielded obstacles I was prepared to overcome, to understand, to learn about. Yet I was ill-prepared for the most difficult task of them all, the most daunting plight upon which this entire process hinges: Who the heck…

Sentiments of Science Fiction