I got drawn into using Google Ngrams for my data visualization, and unfortunately found it to be a much less powerful and versatile tool than I would like.
It all started when I was messing around in Ngrams with various terms for weapons. Did writers use concrete terms like flintlock or rimfire, less concrete terms like musket, pistol or rifle, or very vague terms such as gun? I decided to expand this investigation with a better tool while still using Google’s corpus, and that is where I got stuck.
Google allows anyone to download its corpus (good), but it turns out this would be a complicated and lengthy project (bad). The corpus is actually a set of text files with word counts generated from Google Books, organized to be machine-readable and posted in many small pieces. This means it would be very laborious and confusing to download manually, and because of the format Google uses, it would be worthless to upload to Voyant or other text analysis tools. Instead, I would have to write a program to download and convert the files into a useful format, or just write my own text analysis tool. Defeated, I decided to stick with Ngrams.
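For the curious, the conversion step could look something like the sketch below. It is only an illustration under stated assumptions: it assumes each row of a downloaded file is tab-separated as ngram, year, match count, volume count (the layout of one published version of the dataset; other versions are laid out differently), the function name `yearly_counts` is made up for this example, and the sample rows are invented numbers, not real Google counts. Downloading the files themselves is left out.

```python
from collections import defaultdict

def yearly_counts(lines, words):
    """Sum match counts per word and year from tab-separated rows.

    ASSUMPTION: each row looks like "ngram<TAB>year<TAB>match_count<TAB>volume_count".
    Capitalized variants ("Gun") are folded into the lowercase word; tagged
    entries such as "gun_NOUN" do not match and are ignored.
    """
    wanted = {w.casefold() for w in words}
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip rows that do not fit the assumed layout
        word = parts[0].casefold()
        if word not in wanted:
            continue
        try:
            year = int(parts[1])
            matches = int(parts[2])
        except ValueError:
            continue  # skip rows with non-numeric year or count
        counts[word][year] += matches
    # Convert to plain dicts so the result is easy to print or save.
    return {word: dict(by_year) for word, by_year in counts.items()}

# Invented sample rows for illustration only (not real corpus data).
SAMPLE_ROWS = [
    "gun\t1850\t120\t40\n",
    "Gun\t1850\t30\t12\n",
    "gun_NOUN\t1850\t110\t38\n",
    "rifle\t1850\t80\t25\n",
    "sword\t1850\t200\t60\n",
    "not a valid row\n",
]

totals = yearly_counts(SAMPLE_ROWS, {"gun", "rifle"})
print(totals)
```

Raw totals like these would still need to be divided by the total number of words printed in each year before they could be compared across time, which is roughly what the Ngram graphs already do.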
Before I settled on a final set of words, I explored many options. I started with a generic catch-all term, gun, as my baseline. I added the primary types of firearms, musket and rifle, and then tried some melee weapons, such as sword and pike. I broadened the field with bigger weapons, adding artillery and cannon, and then added bomb, since it was a term in the 1700s which I expected would radically rise in usage when it changed meaning. Specific words, like matchlock, culverin, or howitzer, had so few mentions that they were only a line at the bottom. I also tried shell, but removed this because there’s no way to distinguish between artillery shells and animal shells.
At this point the Ngram was so crowded I had to remove some things, so I took out the specific terms, since I can note they had very few hits without clogging the graph. I also took the melee weapons out from most of the graphs to get a better view of what was going on with the others. I explored going earlier than 1700, but Google doesn’t have enough data to tell me anything useful before that. Plus, firearms tech was still in its infancy during that time, and there weren’t really standard terms yet.
Looking at my Ngrams, the first thing I noticed was the spikes in mentions of weapons during years of war. Not surprising. Right before 1850, rifle supersedes musket as the specific term of choice, just as rifles were then superseding muskets as the firearm of choice. Bomb also rises into much more common use around WWII, coming even with artillery with the advent of strategic bombing. Finally, gun is consistently the term of choice over rifle and musket, which could speak to general laziness or lack of knowledge when discussing firearms.
I was surprised to see no spikes of weapon terms around the Vietnam War, Gulf War, or Iraq/Afghanistan wars. I was also surprised to see the prominence of swords until WWI, and their resurgence after 2000.
I made additional Ngrams to compare British and American uses of weapon terms, but I’ve run out of space to discuss them.
Ngrams allows users to search Google Books for their terms in context, but it is nowhere near as helpful as similar features on other text analysis tools. It has helped me gain a general overview of the use of weapons terms, and provided one or two surprises that could be interesting to look into. However, it is much less powerful than I would like. If I wanted to use this in a paper, I would have to do much more analysis and add several caveats before I would be comfortable drawing conclusions. But isn’t that what we do with all our sources?