Idea for Rust workshop - Parser app with content from web

Question

hannelita opened this issue 8 years ago · 12 comments

Run the Rust meetup in Sao Paulo on Aug 3rd
Get people's feedback
Write a list of topics to be covered in the workshop
Describe how the tasks and concepts will be presented and taught
Describe an example app
Propose the app that will be built by the attendees

Possibly due by Aug 10

Answer 1 · 2016-08-09T14:35:53.000Z

Good news - we ran the meetup and got some feedback from the audience. I posted some suggestions here: http://slides.com/hannelitavante-hannelita/rust-html-parsing and wrote some code here (check the different branches): https://github.com/galois1/demo . The code outlines one possible way to guide the course of the workshop.
It went well, and we collected plenty of feedback.

Answer 2 · 2016-08-09T14:36:24.000Z

(It was much better than I expected!)

Answer 3 · 2016-08-09T14:59:56.000Z

Topics to cover:

Explain some use cases for parsers
Explain the basics of the structure of a web page - HTML and CSS crash course (tags, classes, etc)
Explain the logic to collect these elements from a web page
How can we establish a conversation with a web page to collect this data? Explain the basics of HTTP and GET.
Start a sample code with Rust
- Question: Should we explain the structure of a Rust project in this workshop? If yes, we can start with Cargo (assuming people will have Rust on their machines), hello world, project dependencies and packaging. Explain the main.rs file format and introduce the idea of functions.
Explain that it would be a lot of work to build an entire tool to handle the HTTP protocol - and then introduce Hyper (or any other lib that handles it). Build the basics of a client. This step would be similar to https://github.com/galois1/demo/blob/master/src/main.rs
Explain extern + imports. Type the code without the use declaration, intentionally let it break, and then explain that we need to import hyper (or any other lib).
Explicitly select a page to scrape - it can be any page, maybe the Rust docs? I gave an example with Amazon, but it may not be interesting to all the participants. Maybe we should scrape Mozilla content.
Scrape a page with Hyper (or any other lib). Get the full HTML. Print it, then save it to a file. We can explain Strings and I/O here. Something like this: https://github.com/galois1/demo/blob/improv1/src/main.rs
Show that handling a huge HTML file to collect specific pieces of data would be a lot of work. Introduce some lib to handle that, such as select (or any other lib).
Add select to Cargo.toml and import it.
Suggested by the audience - write verbose, non-inline, non-chained code that uses select to transform the scraped page.
Write a bunch of distinct functions, a Struct, an Impl and show a more robust or more organised code example, such as in https://github.com/galois1/demo/blob/improv2/src/main.rs
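
A rough, std-only sketch of the last few steps (Strings and I/O, verbose non-chained code, a Struct and an Impl) could look like the following. It is not the code from the demo repository: the names Page, save, links and sample_page, the example.org URL, and the hard-coded HTML are all made up for illustration. In the workshop the HTML would come from Hyper and the extraction would be done with select; here a naive string search stands in for both so the example runs without any dependencies.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// A downloaded page: where it came from and its raw HTML.
/// (Hypothetical type, not taken from the demo repository.)
struct Page {
    url: String,
    html: String,
}

impl Page {
    fn new(url: &str, html: &str) -> Page {
        Page {
            url: url.to_string(),
            html: html.to_string(),
        }
    }

    /// Write the raw HTML to a file (the "print it, then save it" step).
    fn save(&self, path: &Path) -> io::Result<()> {
        fs::write(path, &self.html)
    }

    /// Collect the values of double-quoted href attributes.
    /// Deliberately verbose and naive: real HTML needs a real parser,
    /// which is the motivation for a lib such as select.
    fn links(&self) -> Vec<String> {
        let marker = "href=\"";
        let mut found = Vec::new();
        let mut rest = self.html.as_str();
        loop {
            // Find where the next attribute value starts.
            let start = match rest.find(marker) {
                Some(index) => index + marker.len(),
                None => break,
            };
            let after_marker = &rest[start..];
            // The value ends at the closing quote.
            let end = match after_marker.find('"') {
                Some(index) => index,
                None => break,
            };
            let link = &after_marker[..end];
            found.push(link.to_string());
            // Continue searching after this attribute.
            rest = &after_marker[end..];
        }
        found
    }
}

/// Hard-coded stand-in for a page that Hyper would normally download.
fn sample_page() -> Page {
    let html = "<html><body>\
        <a class=\"doc\" href=\"https://doc.rust-lang.org/book/\">Book</a>\
        <a class=\"doc\" href=\"https://doc.rust-lang.org/std/\">Std</a>\
        </body></html>";
    Page::new("https://example.org/docs", html)
}

fn main() -> io::Result<()> {
    let page = sample_page();
    let path = std::env::temp_dir().join("workshop_page.html");
    page.save(&path)?;
    let saved = fs::read_to_string(&path)?;
    println!("saved {} bytes from {}", saved.len(), page.url);
    for link in page.links() {
        println!("{}", link);
    }
    Ok(())
}
```

Writing links as an explicit loop with match, instead of a chain of iterator calls, follows the audience suggestion above: each step has a name and can be explained on its own before showing a shorter version.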

Answer 4 · 2016-08-09T15:51:06.000Z

explain some use cases for parsers ... I think that is an interesting point: to make a compelling 'story' because that helps people to follow the more abstract parts (which have to be there as well).

Answer 5 · 2016-08-09T15:53:13.000Z

I have already started on the extern crate / dependency material for graphics, but only on the surface: just as much as I needed to get the prerequisite graphics example working.

Answer 6 · 2016-08-09T15:56:58.000Z

How much IT/Web/Coding experience did your audience have?

Answer 7 · 2016-08-09T15:57:58.000Z

@broesamle everyone was a software developer, but I mentioned we were willing to aim at people without experience

Answer 8 · 2016-08-09T16:05:09.000Z

Rotating @hannelita's image :)

Answer 9 · 2016-08-09T16:05:27.000Z

Extra suggestions by the audience:

rustfmt
Explain how to use docs
Make sure the place has an internet connection

Answer 10 · 2016-08-11T01:41:48.000Z

/cc @brson, given we've chatted quickly about it on #rust-community

@hannelita I've started drafting an outline for the slide content based on the example code and the notes on this PR.

I've put it in this file so GitHub could render it nicely:
https://github.com/bltavares/presentations/blob/rust-html/introdution-to-rust-parsers/presentation.org

There is a repository with the code up to the download page here:
https://github.com/bltavares/presentations/tree/rust-html/introdution-to-rust-parsers/listr

Each commit in this directory corresponds to one part of the content presented in the slide outline:
https://github.com/bltavares/presentations/commits/rust-html/introdution-to-rust-parsers/listr

Is any of this helpful for the workshop? I could move the content and the commits to a separate repo as well; this was just an easy place for me to put it.

Answer 11 · 2016-08-13T15:40:18.000Z

Migrating over from #12:

@Manishearth recommended using kuchiki for the web scraping.

Answer 12 · 2016-08-13T18:52:58.000Z

Thanks @bltavares ! A few of us at the SF meetup are going to try to continue with this material.