/module2-findingdata

module 2 for hist3907b

How do we find data, anyway?

datum

'Something given'. That's a nice way of thinking about it. Of course, much of the data that we are 'given' wasn't really given willingly. When we topic model Martha Ballard's diary, did she give this to us? She couldn't have imagined what we might try to do to it. Other kinds of data - census data, for instance - were compelled: folks had to answer the questions, on pain of punishment. This is all to suggest that there is a moral dimension to what we do with big data in history. Stop for a moment and read 'the joys of big data' (if you haven't already) and then 'the third wave of computational history'.

Big data is not value-neutral; we need to think about, and talk about, what it means to collect, transform, analyze and visualize it. Who has the power here? (You might also reflect on 'the most profitable obsolete technology'.) Finally, you might also think about recent history - listen to Ian Milligan discuss how Yahoo's closure of GeoCities represented a terrible blow to social history.

Accepting that big data is out there, that there's more material than one person can usefully digest and understand, and that a big-picture, macroscopic point of view is a useful perspective, means also thinking about the digital milieu that makes this possible. But see this piece by Tim Sherratt, 'Seams and edges: Dreams of aggregation, access & discovery in a broken world'. We interact with the data we find, and in the process, we alter both it and ourselves! We'll discuss this in class, and as you do your projects, think about the ethical, moral, and legal dimensions of what you are doing. Keep track of your thoughts in your open notebook.

Ok, so?

So how can we find big data? The exercises in this module will teach you how to use wget on the command line to grab webpages; they will introduce you to the concept of APIs and what you might achieve with them as a historian; they will have you use some existing free and commercial tools for web scraping; and they will show you how to grab social media data.
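To give a flavour of the wget exercise before you get there, here is a minimal sketch of one common pattern: build a list of page URLs, then hand the list to wget and ask it to download slowly and politely. The base URL (example.org), the page range, and the file and folder names are all placeholders made up for illustration; they are not taken from the exercises. By default the script only prints the wget command rather than running it.

```shell
#!/bin/sh
# Hypothetical base URL: example.org is a placeholder, not a real archive.
BASE_URL="https://example.org/diary/page"

# Build a list of page URLs (pages 1 to 5) in urls.txt, one per line.
: > urls.txt
for n in 1 2 3 4 5; do
    printf '%s%s.html\n' "$BASE_URL" "$n" >> urls.txt
done

# Fetch politely: wait 2 seconds between requests, cap the download rate,
# and save everything into a pages/ folder.
# Set DRY_RUN=0 to actually download; otherwise we only print the command.
if [ "${DRY_RUN:-1}" = "0" ]; then
    wget --wait=2 --limit-rate=20k --directory-prefix=pages --input-file=urls.txt
else
    echo "would run: wget --wait=2 --limit-rate=20k --directory-prefix=pages --input-file=urls.txt"
fi
```

The wait and rate limit are there because somebody else is paying for the server you are downloading from; hammering it is bad manners, and may get you blocked.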

You might like to begin with this list of resources; more will be added over time.

And don't forget serendipity

Follow researchers and institutions in your field of study. Today on Twitter I saw something that struck me as an excellent find. Penn Libraries tweeted, and I retweeted, a link to a traveller's diary from the 19th century - a woman who sailed from the US to Europe and thence the Nile, which she ascended and explored. My tweet led to a flurry of activity amongst scholars, and even now, the transcription has begun...

But first... let's set up a bit of a framework.

If we're going to find data, we need to be able to access the power of our machines, to get them to do what we want. It's worth thinking about what Cory Doctorow has called the war on general purpose computing as we begin...

...and then thinking about what 'search' actually means. Check out Ted Underwood's piece on 'Theorizing Research Practices We Forgot to Theorize Twenty Years Ago'.

Finally, Cameron Blevins has some thoughts on the 'perpetual sunrise of methodology'.