Scrapism @ SFPC, Spring 2022

Instructor: Sam Lavigne | splavigne@gmail.com
Assistant Teachers: Omayeli Arenyeka and Ilona Brand
Location: Online
Time: Tuesdays 5-8pm ET (Section 1) and Thursdays 6-9pm ET (Section 2)
Office Hours: By appointment
Class Notes: (link to come)

Web scraping is the process of automatically downloading and manipulating web content. It's a common practice in Silicon Valley, where companies large and small transform open HTML pages into commodified datasets.

As an alternative, "Scrapism" is the practice of web scraping for artistic, emotional, and critical ends. By combining aspects of data journalism, conceptual art, and hoarding, it offers a methodology to make sense of a world in which everything we do is mediated by internet companies. These companies surveil us, exploit and financialize our experiences, and attempt to vacuum up every trace we leave behind. But in turn they also leave their own traces online, traces which, when collected, filtered, and sorted, can reveal or even intervene in power relations.

In this class, participants will learn how to scrape massive quantities of material from the web with Python, and then use this source material in projects that probe the politics and poetics of the internet. We will cover multiple web scraping techniques, as well as different techniques for manipulating and presenting textual content.

Schedule

1. March 15th / 17th

Introductions. Using the terminal. Reading lines.

Readings for next week

Homework

  • Create a work of computationally generated poetry using only command-line tools. These might include grep, sort, tr, cat, sed, fold, curl, say, and others. You can repurpose an existing text, or write one on your own.

2. March 22nd / 24th

Intro to Python. Manipulating text. Automating writing.

Readings for next week

Homework

  • Write a Python script that combines texts from two or more sources to create a generative poem.
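One possible starting point for this homework is sketched below. It is only an illustration, not a required approach: the two source texts are short placeholders, and the function name `interleave_poem` and its parameters are invented for this example. It alternates randomly chosen lines from each source.

```python
import random

# Placeholder sources (assumptions for this sketch). In practice these
# could be read from text files or scraped material.
SOURCE_A = """the terms of service have changed
we value your privacy
your call may be recorded"""

SOURCE_B = """the roots dangle in the dirt
rain collects in the gutter
a moth circles the porch light"""


def interleave_poem(text_a, text_b, n_lines=4, seed=None):
    """Build a poem by alternating randomly chosen lines from two texts."""
    rng = random.Random(seed)  # a seed makes the result repeatable
    lines_a = [line.strip() for line in text_a.splitlines() if line.strip()]
    lines_b = [line.strip() for line in text_b.splitlines() if line.strip()]
    poem = []
    for i in range(n_lines):
        # Even-numbered lines come from the first text, odd from the second.
        source = lines_a if i % 2 == 0 else lines_b
        poem.append(rng.choice(source))
    return "\n".join(poem)


if __name__ == "__main__":
    print(interleave_poem(SOURCE_A, SOURCE_B, n_lines=6))
```

Leaving `seed` unset gives a different poem on each run; passing a number fixes the result so a favorite output can be regenerated.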

3. March 29th / 31st

HTML and CSS basics. Web scraping basics. Making big lists. Basic HTML publications.
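As a rough preview of what "making big lists" from HTML can look like, here is a minimal sketch that uses only Python's standard library. The HTML snippet, the `LinkCollector` class, and the `collect_links` helper are all made up for this example, and the class itself may well use other libraries. The page is inlined as a string so the example runs without a network connection.

```python
from html.parser import HTMLParser

# Invented sample page (an assumption for this sketch).
SAMPLE_HTML = """
<html><body>
  <h1>Press releases</h1>
  <ul>
    <li><a href="/news/1">We are excited to announce</a></li>
    <li><a href="/news/2">Our commitment to you</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for href, text in collect_links(SAMPLE_HTML):
        print(text, href)
```

The resulting list of pairs can then be sorted, filtered, or written back out as a simple HTML page.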

Readings for next week

Project 1 (due April 12th/14th)

Theme: The Language of Power

Brief: Compile a list or an archive of text. Transform that archive into a zine or similar publication. Your publication can be printed or online. Experiment with how you sort or organize your archive, and pay special attention to how the presentation of your archive affects and manipulates the source material.


4. April 5th / 7th

Web scraping part 2: JSON and APIs. Fishing for data. Intro to NLP.
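For a sense of what working with JSON involves, the sketch below parses a small response with Python's built-in `json` module. The response body, its `results`, `title`, and `price` fields, and the `extract_field` helper are hypothetical, standing in for whatever a real API returns. The text is inlined so the example runs offline.

```python
import json

# Hypothetical API response (an assumption for this sketch). A real one
# would be downloaded first, for example with urllib.request.urlopen.
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "Luxury candle", "price": 48.0},
    {"title": "Artisanal ice", "price": 12.5},
    {"title": "Bottled air", "price": 99.99}
  ]
}
"""


def extract_field(raw_json, field):
    """Return the given field from every item in the "results" list."""
    data = json.loads(raw_json)  # JSON text becomes dicts and lists
    return [item[field] for item in data.get("results", []) if field in item]


if __name__ == "__main__":
    for title in extract_field(SAMPLE_RESPONSE, "title"):
        print(title)
```

Once parsed, JSON is ordinary Python data, so the same list and dictionary operations used for text manipulation apply.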

Optional readings for next week


5. April 12th / 14th

Project 1 crit. Scraping images. Basic image publications.

Readings for next week

Project 2 (due April 26th/28th)

Theme: The Commodification of Everything

Brief: Create an archive of images. Present the archive as a publication (in the broad sense) that enhances or underlines its content. Think about how the images are arranged and manipulated. For example, should they all be seen at once? In multiples of 10? One at a time? What determines the order of the images? Should the images be annotated? Vandalized? Should the images be modified or processed?


6. April 19th / 21st

Real browsers. Processing and analyzing images. Turning images into text.

Readings for next week


7. April 26th / 28th

Project 2 crit. Scraping video.

Readings for next week

Project 3 (due May 17th/19th)

Theme: Seeing Like a State

Brief: Collect a dataset, and transform it into a publication.


8. May 3rd / 5th

Automating video. Or, working with data (class votes).

Readings for next week

9. May 10th / 12th

Bots and running scripts over time.


10. May 17th / 19th

Project 3 crit. Wrap-up discussion.


Some inspiration

Fun and useful Python Libraries

--

Whatever you have to say, leave
The roots on, let them
Dangle

And the dirt

Just to make clear
Where they come from.