Blurbs are still being retrieved for filtered out jobs
bunsenmurder opened this issue · 4 comments
Description
Currently the scraper is still retrieving blurbs for jobs that have been filtered out by the pre_filter method.
Steps to Reproduce
- Run JobFunnel under any query and make sure the results are saved to a directory without a master_list.csv or duplicate_list.csv file.
- Run the scraper again and take note of the number of unique jobs found by the pre_filter, then count the number of individual jobs being scraped. You should notice that the two counts don't match.
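If you'd rather do the second step programmatically than count by eye, a small helper like the one below can record every job a blurb-fetching function is called with and then list the ones pre_filter should have removed. This is a hypothetical debugging aid: `record_calls` and `unexpected_scrapes` are illustrative names, and the function you would wrap is whatever JobFunnel uses to fetch a blurb, which this issue doesn't name.

```python
import functools
from typing import Any, Callable, Iterable, List


def record_calls(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a blurb-fetching function so every job ID it receives is recorded."""
    calls: List[Any] = []

    @functools.wraps(fn)
    def wrapper(job_id: Any, *args: Any, **kwargs: Any) -> Any:
        calls.append(job_id)  # remember which job was scraped
        return fn(job_id, *args, **kwargs)

    wrapper.calls = calls  # type: ignore[attr-defined]
    return wrapper


def unexpected_scrapes(kept_ids: Iterable[Any], scraped_ids: Iterable[Any]) -> List[Any]:
    """Return IDs whose blurbs were fetched even though pre_filter dropped them."""
    kept = set(kept_ids)
    return [job_id for job_id in scraped_ids if job_id not in kept]
```

If the bug is present, `len(fetch.calls)` will be larger than the number of jobs kept by pre_filter, and `unexpected_scrapes` will return a non-empty list.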
Expected behavior
The scraper should remove the jobs identified by the pre_filter and only obtain blurbs for the remaining jobs.
Actual behavior
The scraper retrieves blurbs for all jobs whether they were filtered out or not.
To fix the issue, the creation of the scrape_list and the call to the pre_filter method would have to be swapped. The screenshot below highlights the issue in the code and the debugger output:
Although this could have been fixed in a pull request, making this fix would break the date_filter called by the pre_filter method in the main JobFunnel class.
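A minimal sketch of the reordering described above, assuming scrape_data is a dict keyed by job ID and that pre_filter removes entries from it in place. `pre_filter`, `scrape_buggy`, `scrape_fixed` and `fetch_blurb` here are hypothetical stand-ins, not JobFunnel's actual code. The sketch also does not model date_filter, so it does not capture the breakage mentioned above.

```python
from typing import Callable, Dict, List, Set

# Hypothetical shapes: a job record, and scrape_data keyed by job ID.
Job = Dict[str, str]
ScrapeData = Dict[str, Job]


def pre_filter(data: ScrapeData, seen_ids: Set[str]) -> None:
    """Stand-in filter: drop jobs already seen (e.g. in master_list.csv), in place."""
    for job_id in list(data):  # copy the keys so we can delete while iterating
        if job_id in seen_ids:
            del data[job_id]


def scrape_buggy(data: ScrapeData, seen_ids: Set[str],
                 fetch_blurb: Callable[[str], str]) -> List[str]:
    # Current order: scrape_list is built before pre_filter runs,
    # so it still holds the jobs that pre_filter is about to remove.
    scrape_list = list(data)
    pre_filter(data, seen_ids)
    return [fetch_blurb(job_id) for job_id in scrape_list]


def scrape_fixed(data: ScrapeData, seen_ids: Set[str],
                 fetch_blurb: Callable[[str], str]) -> List[str]:
    # Proposed order: filter first, then build scrape_list from what is left.
    pre_filter(data, seen_ids)
    scrape_list = list(data)
    return [fetch_blurb(job_id) for job_id in scrape_list]
```

With three jobs `a`, `b` and `c` where `b` was already seen, `scrape_buggy` calls `fetch_blurb` for all three, while `scrape_fixed` calls it only for `a` and `c`.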
Environment
- Build: Master 0a246cb
- Operating system and version: Arch Linux
- [Linux] Desktop Environment and/or Window Manager: Gnome
Thank you for the detailed write-up!
(looks like it's time to do some more thorough code review in the codebase)
Ah oops, should have done this before I drafted a release just now. Need to fix this and some other behaviour issues and up the sub-rev.
Perfect timing actually, I was gonna make a pull request with some fixes I made.
ah nice! glad to hear it!
Feel free to up the rev to 2.1.9