Dr. Cohen’s article on data mining makes several excellent points. For example, a more comprehensive, humanities-only API/data-mining tool would yield more and better results than a standard search using Google’s engine. It’s also interesting to note that free resources can provide better-quality data, because these resources can be manipulated more easily, unlike Google’s library of scanned books.

First, I would like to pause here and consider what Dr. Cohen means by this. I believe he means that using a resource such as Wikipedia could help us data mine information on a subject and more easily determine how that subject relates to a different subject. For example, Dr. Cohen talks about George W. Bush and how we might be able to distinguish him from his father. Google’s library scanning program might tell you which books contain the word “Bush,” and an exhaustive cross-referencing of terms such as “the President,” “Iraq,” etc. may help us further distinguish the younger President Bush from his father, yet a more easily manipulated source would make it easier to apply different algorithms and answer these questions more quickly. At least, I think so. Frankly, I’m not sure I understand how such a source is more easily manipulated, and I would love some clarification on that.
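To make the cross-referencing idea a little more concrete, here is a minimal sketch of how a program might guess which Bush a passage refers to by counting nearby context terms. This is only an illustration under my own assumptions, not Dr. Cohen’s method: the term lists and the function name `guess_bush` are hypothetical, and a real study would need far larger, carefully tested lists. Note that a term like “Iraq” is deliberately left out, since it applies to both presidents.

```python
from collections import Counter
from typing import Optional

# Hypothetical context terms, chosen only for illustration.
# A real project would need a much larger, validated list.
CONTEXT_TERMS = {
    "George W. Bush": ["texas governor", "9/11", "cheney", "iraq war", "2003"],
    "George H. W. Bush": ["reagan", "gulf war", "quayle", "cia director", "1991"],
}


def guess_bush(passage: str) -> Optional[str]:
    """Return the Bush whose context terms appear most often, or None on a tie."""
    text = passage.lower()
    # Count how many times each candidate's context terms occur in the passage.
    scores = Counter(
        {name: sum(text.count(term) for term in terms)
         for name, terms in CONTEXT_TERMS.items()}
    )
    (best, best_score), (_, runner_up) = scores.most_common(2)
    if best_score == runner_up:
        return None  # not enough evidence to tell the two apart
    return best


print(guess_bush("President Bush and Vice President Cheney discussed 9/11"))
```

A passage that mentions only “President Bush” and “Iraq” returns `None` here, which is exactly the ambiguity that richer, more easily manipulated sources might help resolve.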

Another issue I’d like to bring up is this: why is it better to have a large quantity of articles in your database versus a few high-quality articles? When I first read this, I assumed Dr. Cohen was referring to a question-answering (QA) model, in which a search engine such as H-bot might answer your question based on what the majority of the internet says. Or you could grasp the general scope of a particular historical event based on what the majority of websites said. But I’m not quite sure how this would come into play. I’d be concerned that the exclusion of important works of history could severely limit our access to crucial historical knowledge. For example, I am currently studying Women in the Chinese Enlightenment by Wang Zheng. In it, she makes a convincing argument that the accepted history of the Chinese feminist movement is fundamentally flawed. The accepted idea among most Western scholars is that the Chinese feminist movement was limited and stifled until the Communist Party rose to power and emancipated women. Yet Wang makes a strong case (one that has apparently shaken the academic community on this particular issue) not only that Chinese feminism thrived during the 1920s without the help of the CCP (Chinese Communist Party), but also that the CCP may itself have been the political organization that proved problematic for women’s rights. This is not the perspective most of the web seems to take; indeed, a search for “Chinese feminism” returns a multitude of results that usually discuss different aspects of the CCP’s actions toward feminism, with barely a mention of the Guomindang (the rightist, opposing political faction in China) or of the independent Chinese feminist organizations that existed before the “liberation” of China in 1949. Given this, an API-based data-mining tool could give a searcher a very narrow view of China’s feminist history, limiting our ability as historians to discover new facets of history.

But then again, much of the terminology in this article seems very vague, and I had a hard time understanding some of it, so my assumption could be quite wrong. For example, a large database like this could HELP a searcher find a hidden nugget of truth, much as history can be changed by a person meticulously going through an archive and discovering a long-forgotten letter that everyone else had overlooked.