- Make a WG-Gesucht scraper library with Golang
- Practice my Golang skills
- Tell whether a URL is an offer, a request, a list of offers, or a list of requests
- Input: URL
- Output: what type of link it is
- Offer
- Request
- Offers list
- Requests list
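
A minimal sketch of the link classification above. The path prefixes and the `example.invalid` host are placeholders I made up for illustration, not the real WG-Gesucht URL shapes (those still need to be researched), and `LinkType` and `Classify` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// LinkType is the kind of page a URL points to.
type LinkType int

const (
	Unknown LinkType = iota
	Offer
	Request
	OffersList
	RequestsList
)

func (t LinkType) String() string {
	switch t {
	case Offer:
		return "Offer"
	case Request:
		return "Request"
	case OffersList:
		return "OffersList"
	case RequestsList:
		return "RequestsList"
	}
	return "Unknown"
}

// placeholderPrefixes maps made-up path prefixes to link types.
// ASSUMPTION: these are NOT the real site paths, only stand-ins.
var placeholderPrefixes = []struct {
	prefix string
	kind   LinkType
}{
	{"/offer/", Offer},
	{"/request/", Request},
	{"/offers/", OffersList},
	{"/requests/", RequestsList},
}

// Classify returns the LinkType of raw, or Unknown if nothing matches.
func Classify(raw string) LinkType {
	u, err := url.Parse(raw)
	if err != nil {
		return Unknown
	}
	for _, p := range placeholderPrefixes {
		if strings.HasPrefix(u.Path, p.prefix) {
			return p.kind
		}
	}
	return Unknown
}

func main() {
	fmt.Println(Classify("https://example.invalid/offer/123"))
	fmt.Println(Classify("https://example.invalid/requests/page-2"))
}
```

Returning an explicit `Unknown` value keeps "not one of the four types" separate from a real error such as a network failure.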
- Scrape a specific WG-Gesucht.de offer
- Input: URL of an offer
- Output: An "Offer" struct that contains all the relevant data, including but not limited to:
- Which type of offer
- Flatshare
- 1 room flat
- Flats
- Houses
- rent
- what's available in the room/apartment
- info about the people
- Error: when the offer is not accessible, or if too many things broke (e.g. they may have changed the site)
- Scrape a specific WG-Gesucht.de request
- Input: URL of a request
- Output: A "Request" struct that contains all the relevant data
- Scrape offers pages into a list of offers
- Input: URL of that page (list of offers)
- Output: List of "Offer" structs + links to the next/previous page
- Scrape requests pages into a list of requests
- Input: URL of that page (list of requests)
- Output: List of "Request" structs + links to the next/previous page
- Scraper
- Basically a wrapper around the next 4 things
- Classifier
- Classifies which type of data a URL should be able to return
- Basically a wrapper around validator
- Validator
- Validates whether a URL should be able to return 1 particular kind of data
- Fixer
- For URLs that are close enough, this fixes them so that the injector can work properly
- Injector
- Injects scraped data into objects
- Follow licenses of dependencies
- goquery
- Make a crawler-ish thing that uses this to turn HTML into GraphQL API
- User needs to specify entry point
- Crawler should crawl through the whole list, page after page
- User should be able to filter the list
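
A sketch of the page-after-page crawl with a user-supplied filter. `Page`, `Fetcher`, `Crawl`, and `mapFetcher` are hypothetical names, the `Offer` type is cut down to two made-up fields, and the fetcher is an in-memory fake so the example runs without the real site. The GraphQL layer is left out.

```go
package main

import "fmt"

// Offer is reduced to two illustrative fields for this sketch.
type Offer struct {
	ID      string
	RentEUR int
}

// Page is one list page plus the URL of the next page ("" on the last page).
type Page struct {
	Offers []Offer
	Next   string
}

// Fetcher turns a list-page URL into a Page (the real one would scrape HTML).
type Fetcher func(url string) (Page, error)

// Crawl starts at entry, follows Next links until they run out, and
// collects every offer accepted by keep. Visited URLs are remembered
// so a cycle in the pagination cannot loop forever.
func Crawl(entry string, fetch Fetcher, keep func(Offer) bool) ([]Offer, error) {
	var out []Offer
	seen := map[string]bool{}
	for u := entry; u != "" && !seen[u]; {
		seen[u] = true
		p, err := fetch(u)
		if err != nil {
			return out, fmt.Errorf("fetch %s: %w", u, err)
		}
		for _, o := range p.Offers {
			if keep(o) {
				out = append(out, o)
			}
		}
		u = p.Next
	}
	return out, nil
}

// mapFetcher serves pages from memory; unknown URLs are an error.
func mapFetcher(pages map[string]Page) Fetcher {
	return func(url string) (Page, error) {
		p, ok := pages[url]
		if !ok {
			return Page{}, fmt.Errorf("no such page %q", url)
		}
		return p, nil
	}
}

func main() {
	pages := map[string]Page{
		"p1": {Offers: []Offer{{"a", 400}, {"b", 900}}, Next: "p2"},
		"p2": {Offers: []Offer{{"c", 500}}},
	}
	cheap := func(o Offer) bool { return o.RentEUR <= 500 }
	offers, err := Crawl("p1", mapFetcher(pages), cheap)
	fmt.Println(offers, err)
}
```

Passing the filter as a function keeps the crawler itself unaware of what the user wants to filter on.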
- Check if they have a REST API (i.e. how does the app work)
- Wireshark
- Check what is the min id of ads
- Right now I have a pretty ugly workaround in IsAd(). It would be useful if I could just ignore any ID with fewer than 2/3/4/5/6 digits
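
If a minimum ID length can be established, the check could be as small as the sketch below. `hasMinDigits` is a hypothetical helper, not the existing IsAd(), and the threshold is left as a parameter because the real minimum is still unknown.

```go
package main

import "fmt"

// hasMinDigits reports whether id is made up only of ASCII digits
// and has at least minDigits of them.
func hasMinDigits(id string, minDigits int) bool {
	if id == "" || len(id) < minDigits {
		return false
	}
	for _, r := range id {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

func main() {
	// 5 is an arbitrary example threshold, not a researched value.
	fmt.Println(hasMinDigits("1234567", 5))
	fmt.Println(hasMinDigits("42", 5))
}
```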
- Write down the research on WG-Gesucht URLs