python-crawling-slides

"Web Crawling and Metadata with Python" is a little talk I gave at PyData NYC 2012.

It discusses:

web crawling challenges
Python libraries that help with these challenges, with a focus on Scrapy
semantic web challenges
emerging work being done on Mr. Schemato (http://schema.to)

View the slides online

Slides can be viewed in compiled form at:

http://pixelmonkey.org/pub/python-crawling-slides/

Note that the slides can be controlled as follows:

Advance forward / back with the forward and back keys, or left click / right click of the mouse
Press c to get the "controls", which also allows you to skip slides and switch to outline mode
Outline mode includes some notes that are not in the slide deck, and also lets you easily copy/paste examples into your own interpreter

I suggest you run through the slides in slide mode, and then review them in outline mode, trying the examples in your own interpreter. That's how I tended to do things when I gave the presentation in person. Of course, you can also contact me on Twitter at @amontalenti if you want to see if I might be giving the talk near you sometime soon :-)

How this was built

Using Python, of course. It's turtles all the way down.

I wrote the slides in reStructuredText (reST), using Docutils' support for S5 export. Scripts are included to compile the presentation from the index.rst file, and to support development of new slides with live recompilation using pyinotify (Linux systems only). See build.sh and monitor.sh for more information.
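
As a rough illustration of the live-recompilation idea, here is a minimal sketch that polls the source file's modification time using only the standard library. It is not the contents of monitor.sh, and it swaps pyinotify's kernel notifications for plain polling so that it runs on any platform. The names file_changed and watch, and the rebuild callback, are assumptions made up for this example.

```python
import os
import time


def file_changed(path, last_mtime_ns):
    """Return (changed, current_mtime_ns) for the file at path."""
    mtime_ns = os.stat(path).st_mtime_ns
    return mtime_ns != last_mtime_ns, mtime_ns


def watch(path, rebuild, interval=1.0, max_checks=None):
    """Call rebuild(path) whenever the file's modification time changes.

    rebuild is a hypothetical callback standing in for the real build step.
    max_checks bounds the loop; None means poll until interrupted.
    Returns the number of rebuilds that were triggered.
    """
    last_mtime_ns = os.stat(path).st_mtime_ns
    rebuilds = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        checks += 1
        changed, last_mtime_ns = file_changed(path, last_mtime_ns)
        if changed:
            rebuild(path)
            rebuilds += 1
    return rebuilds


# Example call (assumes index.rst exists in the current directory):
# watch("index.rst", lambda p: print("changed, rebuilding from", p))
```

Polling is cruder than inotify, since a change is only noticed on the next check, but for a single index.rst file a one-second interval is usually plenty.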
