-
About
This project investigates the progression in sentiment of classic science fiction works from roughly the 1900s to the 2000s. As global and domestic events unfold at various points in history, do they have an effect on the sentiment of the material released by science fiction authors?
-
The History of Modern Science Fiction
Since the two main subjects we are examining here are science fiction literature and history, it seems fitting to look at… the history of science fiction! Although relatively new to the scene compared to genres such as romance or realistic fiction, science fiction still features a historical development…
-
Transformers – A Better Model?
So a little while back we talked about LSTMs and how they were so much better than that mishmash of classifiers we used to create a voted classifier. We talked about how the LSTM was so much smarter than those other deep learning models and neural nets and how it was well suited for text…
-
Using HathiTrust
One of the biggest struggles throughout this whole process has been locating data for analysis. Due to American copyright law, many classics written as early as the 1930s are still under copyright and thus cannot be legally obtained through databases such as Project Gutenberg and the Internet Archive. Thus, to obtain these…
-
Results: Attempt 2
We come back this time with our trained LSTM deep learning model and a set of cleaned books that we want some answers from. Gone is the NLTK voted classifier, for we now have the bright and shiny new toy to run our books through. Feeding our data into the deep learning model is not…
-
The Deep Learning Model(s)
Moving on from that shoddy first model of voted classifiers, we arrive at what is all the rage nowadays: a deep learning model! I had the plan all laid out in my head: toss a bunch of data into this guy, let him connect the dots, and come out with a trained model that…
-
Results: Bag of Words Binary Classification
And we are back! … with some slightly underwhelming results. After feeding our data to our nifty little classifier, we have gotten a set of unexpectedly positive results. Out of all the books we fed to our hungry, hungry algorithms, we learned that a whopping 49 of them were positive, with a meager 2 negatives…
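The actual classifier discussed in the post was trained, but the bag-of-words binary idea itself can be sketched with a toy lexicon vote (the word lists and function name below are purely illustrative, not the project's real model):

```python
# Toy bag-of-words binary sentiment vote: label a text "positive" when it
# contains at least as many words from a (hypothetical) positive lexicon
# as from a negative one. Illustrative only; the real classifier was trained.
POSITIVE = {"hope", "bright", "wonder", "triumph"}
NEGATIVE = {"war", "ruin", "fear", "dark"}

def classify(text: str) -> str:
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "positive" if pos >= neg else "negative"

classify("the bright wonder of a new dawn")  # → 'positive'
```

A lexicon this small will call almost everything positive, which hints at why a skewed result like 49 positives to 2 negatives deserves suspicion.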
-
Data-Cleaning
From what I’ve read/heard/learnt, data-cleaning is one of the most important dataset pre-processing steps out there. At the end of the day, we are feeding a mish-mash of text into a bunch of algorithms that sort through these words without any regard for external considerations or precautions. These algorithms do their work in silence,…
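A minimal sketch of the kind of cleaning pass described here — lowercasing, stripping punctuation and digits, and splitting into tokens. The function name and exact rules are illustrative, not the project's actual pipeline:

```python
import re
import string

def clean_text(raw: str) -> list[str]:
    """Normalize raw book text into a flat list of lowercase word tokens."""
    text = raw.lower()
    # Replace punctuation and digits with spaces so only word characters remain
    text = re.sub(r"[" + re.escape(string.punctuation) + r"\d]", " ", text)
    # split() also collapses runs of whitespace (newlines, tabs) for free
    return text.split()

clean_text("It was a bright cold day in April, and the clocks were striking 13.")
# → ['it', 'was', 'a', 'bright', 'cold', 'day', 'in', 'april',
#    'and', 'the', 'clocks', 'were', 'striking']
```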
-
The Dataset!!
Coding is hard, so is math, so is English: these technical bits of the research process all yielded obstacles I was prepared to overcome, to understand, to learn about. Yet I was ill-prepared for the most difficult task of them all, the most daunting plight on which this entire process hinges: Who the heck…
-
Natural Language Toolkit
One of the most helpful and powerful libraries I’ve used thus far in this process has been NLTK, which contains a bunch of modules that are very helpful for natural language processing. Below, I’ll outline a few basic uses. In order to maximize consistency when processing a text dataset, it is vital to adequately pre-process…
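For readers who haven't used NLTK, the pre-processing steps it streamlines — tokenizing, stopword removal, and stemming — can be sketched in plain Python. The tiny stopword set and suffix-stripping rule below are crude stand-ins for NLTK's `stopwords` corpus and `PorterStemmer`, not the library's actual behavior:

```python
# Toy stand-ins for NLTK's stopword filtering and Porter stemming;
# the stopword list and suffix rules here are illustrative only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def crude_stem(word: str) -> str:
    """Strip a few common suffixes (a toy version of Porter stemming)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(sentence: str) -> list[str]:
    tokens = sentence.lower().split()
    tokens = [t.strip(".,!?;:") for t in tokens]              # crude tokenization
    tokens = [t for t in tokens if t and t not in STOPWORDS]  # stopword removal
    return [crude_stem(t) for t in tokens]

preprocess("The robots were marching to the glowing city.")
# → ['robot', 'were', 'march', 'glow', 'city']
```

NLTK replaces each of these hand-rolled steps with a well-tested module, which is exactly why consistency across a whole dataset is easier to maintain with it.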