Instructor: Sam Lavigne | splavigne@gmail.com
Assistant Teacher: Ilona Brand
Location: Online
Time: Tuesdays 10am-1pm ET
Office Hours: By appointment
Class Notes: (link to come)
Web scraping is the process of automatically downloading and manipulating web content. It's a common practice in Silicon Valley, where companies large and small transform open HTML pages into commodified datasets.
As an alternative, "Scrapism" is the practice of web scraping for artistic, emotional, and critical ends. By combining aspects of data journalism, conceptual art, and hoarding, it offers a methodology to make sense of a world in which everything we do is mediated by internet companies. These companies surveil us, exploit and financialize our experiences, and attempt to vacuum up every trace we leave behind. But in turn they also leave their own traces online, traces which, when collected, filtered, and sorted, can reveal or even intervene in power relations.
In this class participants will learn how to scrape massive quantities of material from the web with Python, and then use this source material in projects that probe the politics and poetics of the internet. We will cover multiple web scraping techniques, as well as different techniques for manipulating and presenting textual content.
Introductions. Using the terminal. Reading lines.
- The Cut Up Method by William Burroughs
- URGENTCRAFT - Radical Publishing During Crisis by Paul Soulellis
- Intro to the Command Line
- Intro to Python
- Create a work of computationally generated poetry using only command-line tools. These might include grep, sort, tr, cat, sed, fold, curl, say, and others. You can repurpose an existing text, or write one on your own.
Intro to Python. Manipulating text. Automating writing.
- A User’s Guide to Détournement
- A Long History of Generated Poetics by Everest Pipkin
- Web Scraping Basics
- Write a Python script that combines texts from two or more sources to create a generative poem.
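As one possible starting point for this assignment, the sketch below pools the words of two texts, shuffles them, and breaks the result into lines, loosely in the spirit of the cut-up method. The function name `cut_up`, its parameters, and the two placeholder texts are illustrative assumptions, not part of the course materials.

```python
import random

def cut_up(text_a, text_b, lines=4, words_per_line=5, seed=None):
    """Shuffle the words of two texts together and break them into lines."""
    rng = random.Random(seed)  # pass a seed to get a repeatable poem
    # Pool the words from both sources, then shuffle them together.
    pool = text_a.split() + text_b.split()
    rng.shuffle(pool)
    poem = []
    for i in range(lines):
        chunk = pool[i * words_per_line:(i + 1) * words_per_line]
        if not chunk:
            break  # ran out of words before filling every line
        poem.append(" ".join(chunk))
    return "\n".join(poem)

# Placeholder texts (assumptions); swap in files or scraped material of your own.
source_a = "we value your privacy and collect only what we need"
source_b = "the river keeps every stone it has ever touched"

if __name__ == "__main__":
    print(cut_up(source_a, source_b))
```

Shuffling whole words is only one choice. Cutting on lines, sentences, or parts of speech will give very different results from the same sources.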
HTML and CSS basics. Web scraping basics. Making big lists. Basic HTML publications.
- Archives, Records, and Power: The Making of Modern Memory by Terry Cook and Joan Schwartz
- Scraping XHR
Theme: The Language of Power
Brief: Compile a list, or an archive of text. Transform that archive into a zine or similar publication. Your publication can be printed or online. Experiment with how you sort or organize your archive, and pay special attention to how the presentation of your archive affects and manipulates the source material.
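A minimal sketch of the list-to-publication pipeline described in this brief, assuming the archive is the text of every link on a page. To stay self-contained it parses a hard-coded HTML string with Python's standard library. In practice you would more likely fetch a live page and parse it with the requests and beautifulsoup libraries listed under tools. The names `LinkTextCollector`, `build_archive`, `to_publication`, and `SAMPLE_HTML` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.items[-1] += data

def build_archive(html):
    """Return the unique, alphabetically sorted link texts found in html."""
    parser = LinkTextCollector()
    parser.feed(html)
    # Strip whitespace, drop empties and duplicates, then sort.
    return sorted({item.strip() for item in parser.items if item.strip()})

def to_publication(items, title="Archive"):
    """Render the archive as a very basic HTML list."""
    rows = "\n".join(f"<li>{escape(item)}</li>" for item in items)
    return f"<h1>{escape(title)}</h1>\n<ul>\n{rows}\n</ul>"

# Stand-in for a scraped page (an assumption, not a real site).
SAMPLE_HTML = """
<ul>
  <li><a href="/b">Privacy Policy</a></li>
  <li><a href="/a">Acceptable Use</a></li>
  <li><a href="/c">Privacy Policy</a></li>
</ul>
"""

if __name__ == "__main__":
    print(to_publication(build_archive(SAMPLE_HTML), title="The Language of Power"))
```

Sorting alphabetically and dropping duplicates are editorial decisions. Keeping the repeats, or ordering by length or date instead, changes what the archive says.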
Web scraping part 2: JSON and APIs. Fishing for data. Intro to NLP.
Project 1 crit. Using real browsers. Scraping images. Basic image publications.
Theme: The Commodification of Everything
Brief: Create an archive of images. Present the archive as a publication (in the broad sense) that enhances or underlines its content. Think about how the images are arranged and manipulated. For example, should they all be seen at once? In multiples of 10? One at a time? What determines the order of the images? Should the images be annotated? Vandalized? Should the images be modified or processed?
Real browsers. Processing and analyzing images. Turning images into text.
Project 2 crit. Scraping video.
- Face Trace by Maryam Monalisa Gharavi
- Race after Technology (introduction) by Ruha Benjamin
- FFmpeg - The Ultimate Guide
Theme: Seeing like a state
Brief: Collect a dataset, and transform it into a publication.
Automating video. Or, working with data (class votes).
Bots and running scripts over time.
Project 3 crit. Wrap-up discussion.
- Jenny Odell
- Everest Pipkin
- Ben Grosser
- Heather Dewey-Hagborg
- Zach Blas
- Angela Washko
- Andrew Badr
- Allison Parrish
- Christian Marclay
- Golan Levin
- Mimi Onuoha
- Matthew Plummer-Fernandez
- Neta Bomani
- Josh Begley
- Angie Waller
- Morehshin Allahyari
- Jacqueline Wu
- Barbara Kruger
- Mark Hansen & Ben Rubin
- City Reliquary
- LittleSis
- Aaron Swartz
- Sophie Calle
- James Bridle
- Penelope Umbrico
- Joana Moll
- Mmuseumm
- Ramsey Nasser
- Ingrid Burrington
- Strava Hilarity
- Andrew Norman Wilson
- Jill Magid
- Salome Asega
- Library Genesis
- The Markup
- requests - easy http requests
- beautifulsoup - html parsing
- playwright - automating real browsers
- curlconverter - easily convert curl to requests
- scrapy - scraping framework
- moviepy - edit video
- vidpy - edit video (my library)
- videogrep - make supercuts (my library)
- youtube-dl - download videos
- pillow - edit images
- flask - web server
- twython - use the twitter api
- spacy - natural language processing
- envelopes - send email
- opencv - computer vision
- asciimatics - text-based interfaces and animation
- colorama - easy color in the terminal
Whatever you have to say, leave
The roots on, let them
Dangle
And the dirt
Just to make clear
Where they come from.
-Charles Olson