Markel/metacritic-crawler

refactor response.css into function

Closed this issue · 4 comments

Currently the call response.css('.metascore_w span ::text').extract_first() is repeated throughout the parse function in analyze.py. The only thing that changes between calls is the CSS selector string. This could easily be moved into a function:

def extract_value(res, css):
    # get() returns None when nothing matches, so default to '' before stripping
    return res.css(css).get(default='').strip()

and then used like:

t = extract_value(response, '.product_title a.hover_none h1 ::text')
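Here is a minimal sketch of how such a helper could behave when a selector matches nothing. FakeResponse and FakeSelectorList are hypothetical stand-ins I made up so the snippet runs without Scrapy installed; they only imitate the .css(...).get() shape and are not part of Scrapy or this repo. The default parameter on extract_value is also my own assumption, not something analyze.py has today.

```python
class FakeSelectorList:
    """Hypothetical stand-in for Scrapy's SelectorList (illustration only)."""

    def __init__(self, matches):
        self._matches = matches

    def get(self, default=None):
        # Mirrors the documented behaviour: first match, or `default` if none.
        return self._matches[0] if self._matches else default


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response, keyed by CSS query."""

    def __init__(self, data):
        self._data = data

    def css(self, query):
        return FakeSelectorList(self._data.get(query, []))


def extract_value(res, css, default=''):
    # Fall back to `default` so .strip() is never called on None.
    value = res.css(css).get()
    return value.strip() if value is not None else default


response = FakeResponse({'.metascore_w span ::text': ['  93 \n']})
score = extract_value(response, '.metascore_w span ::text')
missing = extract_value(response, '.product_title a.hover_none h1 ::text')
```

With a real Scrapy response the helper would be called the same way; the stand-ins are only there to show that a selector with no match yields the default instead of raising AttributeError.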

I also believe scrapy encourages the use of get over the alias extract_first.

I don't completely understand what you mean, but I'm going to research it. I haven't used scrapy in a long time and this was my first project. I'll see what I can do, but yeah, a lot of the code there is probably not written the appropriate way.

Which part? I can elaborate if you want.

extract_first works the same as get, it's just that in the docs they mentioned people should use get.

If you want to extract only the first matched element, you can call the selector .get() (or its alias .extract_first() commonly used in previous Scrapy versions):

I only mentioned this because the project doesn't seem to pin a specific version of scrapy for pip. So if a user already has some version of scrapy installed, that version would be used, and if they don't, they'll most likely download the latest version. There's no need to change extract_first into get, as they are aliases for each other.
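If it helps, here is a rough sketch of how someone could check which case they are in. It only uses importlib.metadata from the standard library (Python 3.8+); installed_version is a hypothetical helper name, not something in this repo.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    # Return the installed version string, or None if the package is absent.
    try:
        return version(package)
    except PackageNotFoundError:
        return None


scrapy_version = installed_version('scrapy')
if scrapy_version is None:
    print('Scrapy is not installed; an unpinned pip install would fetch the latest release.')
else:
    print('Scrapy ' + scrapy_version + ' is already installed and would be used.')
```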

I think I understand what you mean, and yes, it would be interesting to update this to the newest methods. I'll work on it. I'm also creating an issue for using a requirements.txt (even if it is not as good as node's package.json).

Fun fact: I had a look at the release notes of Scrapy 1.6, and this new version is where they changed from the .extract_first() API to the .get() API. I started this project on Scrapy 1.5.1, so that's why 🤯 the method was outdated. It seems it's time for updates!

Because of this, I recommend that once this is updated, #20 bumps Scrapy from 1.5.1 to 1.6.