refactor response.css into function
Closed this issue · 4 comments
Currently this line, response.css('.metascore_w span ::text').extract_first(), is repeated throughout the parse function in analyze.py. The only value that changes is the CSS selector passed to css(). This could easily be moved into a function:
def extract_value(res, css):
    return res.css(css).get().strip()
and then used like:
t = extract_value(response, '.product_title a.hover_none h1 ::text')
I also believe Scrapy encourages the use of get() over the alias extract_first().
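One caveat with the extract_value helper proposed above: get() returns None when the selector matches nothing, so calling .strip() on the result would raise AttributeError. A minimal sketch of a None-safe variant follows. The FakeResponse and FakeSelectorList classes and the `default` parameter are hypothetical stand-ins added here so the snippet runs without Scrapy installed; they are not part of analyze.py or of Scrapy itself.

```python
class FakeSelectorList:
    """Hypothetical stand-in for Scrapy's SelectorList (only get() is mimicked)."""

    def __init__(self, values):
        self._values = values

    def get(self, default=None):
        # Like Scrapy's get(): first match, or the default when there is none.
        return self._values[0] if self._values else default


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response: maps CSS queries to canned text."""

    def __init__(self, matches):
        self._matches = matches

    def css(self, query):
        return FakeSelectorList(self._matches.get(query, []))


def extract_value(res, css, default=""):
    """Return the stripped first match for a CSS query, or `default` if nothing matched."""
    value = res.css(css).get()
    return value.strip() if value is not None else default


response = FakeResponse({".metascore_w span ::text": ["  87 \n"]})
score = extract_value(response, ".metascore_w span ::text")
missing = extract_value(response, ".no_such_class ::text")
```

With a real Scrapy response the helper would be called the same way; only the stand-in classes would go away.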
I don't completely understand what you mean, but I'm going to research it. I haven't used Scrapy in a long time and this was my first project. I'll see what I can do, but yeah, a lot of the code there is probably not written the appropriate way.
Which part? I can elaborate if you want.
extract_first() works the same as get(); it's just that the docs mention people should use get():
If you want to extract only the first matched element, you can call the selector .get() (or its alias .extract_first() commonly used in previous Scrapy versions):
I only mentioned this because the project doesn't pin a specific version of Scrapy for pip to install. If a user already has some version of Scrapy, that version will be used; if they don't, they will most likely download the latest one. There's no need to change extract_first() into get(), as they are aliases for each other.
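If the installed Scrapy version really is unknown, one way to stay compatible is to pick whichever method the selector list offers. This is only a sketch: first_match, OldStyleList, and NewStyleList are hypothetical names made up for illustration, and the two classes merely imitate a selector list with and without get() so the example runs without Scrapy.

```python
def first_match(selector_list, default=None):
    """Return the first match, preferring get() and falling back to extract_first()."""
    getter = getattr(selector_list, "get", None)
    if getter is None:
        # Assumed older install: only the extract_first() spelling is available.
        getter = selector_list.extract_first
    return getter(default=default)


class OldStyleList:
    """Hypothetical selector list that only offers extract_first()."""

    def __init__(self, values):
        self._values = values

    def extract_first(self, default=None):
        return self._values[0] if self._values else default


class NewStyleList(OldStyleList):
    """Hypothetical selector list that also offers get() as an alias."""

    def get(self, default=None):
        return self.extract_first(default=default)


old_result = first_match(OldStyleList(["a", "b"]))
new_result = first_match(NewStyleList([]), default="n/a")
```

Pinning the Scrapy version in the project's dependencies would make a fallback like this unnecessary.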
I think I understand what you mean, and yes, it would be worthwhile to update this to the newest methods. I'll work on it. I'm also creating an issue for using a requirements.txt (even if it is not as good as Node's package.json).
Fun fact: I have looked at the release notes of Scrapy 1.6, and that version is where the API changed from .extract_first() to .get(). I started this project on Scrapy 1.5.1, so that's why 🤯 the method was outdated. It seems it's time for updates!
Because of this, I recommend that once this is updated, #20 bumps Scrapy from 1.5.1 to 1.6.