Handling errors in process_item
botzill opened this issue · 4 comments
Hi.
This module is great, good job!
I'm wondering how to properly handle errors in the process_item callback. For example, I want to silently log duplicate entries.
Thx.
Hi, @botzill,
Duplicates are always a problem (they raise an exception) when you use MongoDB in pipelines.
In my view, this exception is a concern of your project, not of the pipeline, which is why I did not include any exception handling in my repo.
In this case, I always write the try statement myself for each project. You can point to your own function with this setting: MONGODB_PROCESS_ITEM.
For example, if you put your process_item method at the path scrapy_project.pipeline.process_item, then it will be:
MONGODB_PROCESS_ITEM = 'scrapy_project.pipeline.process_item'
And please read my code carefully and make sure your args/kwargs are identical to mine:
Note: This middleware is still under development; some settings could change in the future!
Thx @grammy-jiang !
Yes, that's how I currently do it: I implemented that method. I was just wondering about handling errors using some Twisted methods like addErrback or smth like that. Is it OK to handle it like this?
    try:
        yield pipeline.coll.insert_one(item)
        spider.logger.info("[%s] item inserted.", item['_id'])
    except Exception as e:
        spider.logger.info("[%s] item already exists, skip it.", item['_id'])
Hi, @botzill
Your code is fine with me, but if I were you, I would:
- use the pipeline's logger, not the spider's
- set the log level to debug, not info
- save the return value of insert_one, even if it may not be used afterwards
- put the success log under an else clause instead of inside the try
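For reference, the suggestions above could look roughly like the sketch below. It is only an illustration under assumptions: DuplicateKeyError and FakeCollection are local stand-ins (not the real driver classes) so that the snippet runs without MongoDB, the logger name is made up, and the process_item signature here is hypothetical rather than the one the pipeline expects. The insert is also a plain synchronous call, whereas the code in this thread yields it.

```python
import logging

# Hypothetical logger name; in a real project this would be the pipeline's logger.
logger = logging.getLogger("mongodb_pipeline")


class DuplicateKeyError(Exception):
    """Stand-in for the driver's duplicate-key exception (assumption)."""


class FakeCollection:
    """Hypothetical in-memory collection that rejects duplicate _id values."""

    def __init__(self):
        self.docs = {}

    def insert_one(self, item):
        if item["_id"] in self.docs:
            raise DuplicateKeyError(item["_id"])
        self.docs[item["_id"]] = item
        return item["_id"]


def process_item(coll, item):
    """Insert the item, logging duplicates quietly instead of failing."""
    try:
        # Keep the return value, even if it is only used for logging.
        result = coll.insert_one(item)
    except DuplicateKeyError:
        # Catch only the duplicate error, so other failures still surface.
        logger.debug("[%s] item already exists, skip it.", item["_id"])
    else:
        # The success log lives in the else clause, outside the try body.
        logger.debug("[%s] item inserted (result: %r).", item["_id"], result)
    return item
```

Catching only the duplicate-key error, rather than a bare Exception, is a deliberate choice in this sketch: connection problems and similar failures would otherwise be logged as "already exists".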
As for your other question, I have never thought about it before. But I realize there is a code example in the Scrapy documentation that could help, in the section "Take screenshot of item" (Item Pipeline page, Scrapy 1.5.0 documentation). The process_item method can return a Deferred object!
Maybe you can try it and find something interesting! And please let me know!
Thx @grammy-jiang for the tips, yes, your points make sense. I'm not really experienced with Twisted, so I need to look into this Deferred in detail; right now I don't really know what can be done with a Deferred returned from process_item. Will check more details about this.
Thx a lot.