This project is for practicing and testing web scraping. It also generates some statistics.
Process:
- Analyze the source website, mixx.io.
- Scrape HTML content.
- Home.
- Other pages.
- Parse HTML with Beautiful Soup.
- Build job search tool.
- Additional work.
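
The scrape and parse steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it uses the standard library's `html.parser` as a stand-in for Beautiful Soup so that it runs without extra dependencies, and the names `LinkCollector`, `fetch_html` and `extract_links`, as well as the sample HTML and its paths, are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, href))


def fetch_html(url):
    """Download a page; UTF-8 is an assumed encoding, not a verified one."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_links(html, base_url):
    """Return all links found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Made-up sample markup; the real pages may look different.
sample = '<p><a href="/some-post">Post</a> <a href="https://example.com/a">A</a></p>'
print(extract_links(sample, "https://mixx.io/"))
```

For a live page, something like `extract_links(fetch_html("https://mixx.io/"), "https://mixx.io/")` would combine both steps; with Beautiful Soup installed, the parsing part would typically be replaced by `soup.find_all("a")`.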
Product:
- List of links used.
- List of sites and density.
- Some statistics.
- Text length
- Number of links
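
As a rough illustration of the products listed above, the sketch below computes a per-site count and share from a list of links, plus the text length and number of links. It assumes that "density" means the share of all links pointing to each site, which is one possible reading; the function names `site_density` and `text_stats` and the sample links are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def site_density(links):
    """Return (host, count, share) rows, most frequent host first.

    'share' is the fraction of all links that point to that host,
    which is an assumed definition of density.
    """
    hosts = Counter(urlparse(link).netloc for link in links)
    total = sum(hosts.values())
    return [(host, count, count / total) for host, count in hosts.most_common()]


def text_stats(text, links):
    """Basic statistics for one post: text length and number of links."""
    return {"text_length": len(text), "link_count": len(links)}


# Made-up links for illustration only.
links = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.org/c",
]
for host, count, share in site_density(links):
    print(f"{host}: {count} links ({share:.0%})")
print(text_stats("Some post text", links))
```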
Mixx.io is a usable website that works well on phones. Its structure, from the links perspective:
- Home: links to the latest podcasts, pagination, and other pages.
- Posts: audio and the links used in the podcast, plus a sponsor link.
- Pagination: each page has about 30 links to posts.
- Other pages.
- Podcast episode length
- Audio analysis
TODO
As of 2020-08-25, sponsors are under the first blockquote
tag. In general, all the notes are under this HTML tag.
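
Based on the note above, the links inside the first blockquote of a post could be collected along these lines. This is only a sketch under that assumption about the page layout, again using the standard library's `html.parser` rather than Beautiful Soup; the class `FirstBlockquoteLinks`, the function `first_blockquote_links` and the sample markup are invented for illustration.

```python
from html.parser import HTMLParser


class FirstBlockquoteLinks(HTMLParser):
    """Collect hrefs of <a> tags inside the first <blockquote> only."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # nesting depth inside the first blockquote
        self.done = False  # True once the first blockquote has closed
        self.links = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "blockquote":
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0 and not self.done:
            self.depth -= 1
            if self.depth == 0:
                self.done = True


def first_blockquote_links(html):
    """Return the links found under the first blockquote of an HTML string."""
    parser = FirstBlockquoteLinks()
    parser.feed(html)
    return parser.links


# Made-up post markup; the real structure may change over time.
post = (
    '<a href="https://example.com/nav">Nav</a>'
    '<blockquote><p><a href="https://example.com/sponsor">Sponsor</a></p></blockquote>'
    '<blockquote><a href="https://example.com/other">Other</a></blockquote>'
)
print(first_blockquote_links(post))
```

Because the layout was only observed on that date, a check that the returned list is not empty would help detect when the site changes.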