Tech Talk: Using Web Crawlers
geekygirlsarah opened this issue
Tech Talk Submission
Thanks for offering to give a talk at a Tech Talks meeting! We just need a bit of information from you.
Name
Sarah Withee
What's your talk title?
Using Web Crawlers
(insert a wittier title later)
What's your talk about?
For a PA I was working on, I couldn't get access to some of the sites we were trying to revamp, so I had no direct way to get the data sets I needed. On a whim, I started looking into web archiving tools, and that research led me to the idea of using Scrapy (a Python framework) to scrape the sites for the information I needed.
I wrote several small "spiders" to crawl the partner agency's 10 websites and was able to gather massive lists of the things we had questions about. I'd like to share how Scrapy works, as well as some uses for it that you might not have thought of. I'll also cover some of the issues that came up and how to overcome them.
How long is your talk?
- Lightning talk (5-10 minutes)
- Short tech talk (20-25 minutes)
- Long tech talk (40-50 minutes)
Do you have any preferred dates for it?
No. Use this as a potential backup in case a talk falls through or there's no talk for that week.
Todo for the MC:
- Update the TTS Guilds calendar entry for this talk to add the talk details.
- Request captioning
- Announce the talk(s) in #tech-talks, #dev, #18f, #18f-dev-announce when the date is set
- Announce the talk(s) in the same channels on Slack on the morning of the talk, and follow up with a reminder just before they're about to begin.
- Upload video and transcript to Google Drive.