Our DH seminar homework for tonight is to write a brief blog post considering how we might use text mining in our upcoming digital history projects. Unfortunately for me, a project about an underwater mining town doesn’t seem particularly text-mining friendly. Don’t get me wrong: I found this particular tool, for example, to be potentially really useful as a way to get control of my growing corpus of Harvey Wiley literature. However, from my perspective, text mining is probably the least useful DH strategy that I’ve encountered here in the last week and a half or so.
The one time I did do some text mining may suggest why. This is the Google Ngram for “refrigerator v. icebox” (with “ice box” thrown in just for good measure)*:
I first did this Ngram while writing Chapter Six of Refrigeration Nation in order to confirm something that I already knew from my research: that before the advent of the electric refrigerator, what we now know as “iceboxes” were called “refrigerators,” and that “icebox” is a term invented to differentiate boxes full of ice from the appliance that now runs in everybody’s kitchens. The fact that the terms “icebox” and “ice box” basically come out of nowhere precisely during the time when the first electric refrigerators were being developed confirms that.
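For anyone curious about what sits behind a chart like that, here is a minimal Python sketch of the general idea: count how often each term appears in the texts from a given year, then divide by the total number of words from that year. This is not Google's actual method, and it is only an illustration. The function name `yearly_relative_frequency` and the two sample sentences are hypothetical, made up for this example rather than drawn from any real corpus.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (year, text) pairs, invented for illustration.
SAMPLE_DOCS = [
    (1890, "The refrigerator was filled with ice each morning."),
    (1925, "The new electric refrigerator replaced the old icebox; "
           "some still said ice box."),
]


def yearly_relative_frequency(docs, terms):
    """Return {year: {term: share}}, where share is the number of exact
    matches of the term divided by the total word count for that year."""
    total_words = Counter()
    term_hits = {}
    for year, text in docs:
        # Lowercase and keep only alphabetic words, dropping punctuation.
        words = re.findall(r"[a-z]+", text.lower())
        total_words[year] += len(words)
        normalized = " ".join(words)
        hits = term_hits.setdefault(year, Counter())
        for term in terms:
            # Whole-word match, so "icebox" and "ice box" stay distinct.
            pattern = r"\b" + re.escape(term) + r"\b"
            hits[term] += len(re.findall(pattern, normalized))
    return {
        year: {
            term: (term_hits[year][term] / total_words[year]
                   if total_words[year] else 0.0)
            for term in terms
        }
        for year in sorted(total_words)
    }


if __name__ == "__main__":
    shares = yearly_relative_frequency(
        SAMPLE_DOCS, ["refrigerator", "icebox", "ice box"]
    )
    for year, by_term in shares.items():
        print(year, by_term)
```

Note that this sketch matches exact forms only, so plurals such as “iceboxes” are not counted, and a real comparison would need far more text per year than two sentences.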
Apparently, confirming things you already think you know is the best way to use text mining. I think that’s a good thing, as I’m not sure how I ever would have footnoted this in the book. In fact, how COULD you footnote this in a book if the corpus keeps changing?
But it is a pretty good trick to play with students, and one that the cultural historians probably adore.
* Click the picture if you’re interested in a clear look.