What's the purpose of item_copy in pipelines.py?
ChengkaiYang2022 opened this issue · 2 comments
ChengkaiYang2022 commented
Hi. In crawler/crawling/pipelines.py, the process_item function of the "LoggingBeforePipeline" class has a variable called "item_copy". It simply turns the item into a dict and deletes some keys such as "body" and "links"; after that it is used only for logging.
So what is the purpose of "item_copy"?
I also have another question: if the item is not a RawResponseItem, for example a user-defined Item, process_item returns None, so the following pipelines will not receive the item and will do nothing.
I'm confused about this function.
madisonb commented
- The item copy is just that: it creates a new copy in memory, so deleting keys from it does not modify the original dictionary. Without the copy, deleting keys inside the function would delete them from your original dictionary.
# example where a python function modifies the original dictionary
$ python
Python 3.7.3 (default, Mar 27 2019, 09:23:15)
[Clang 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> d = {'key':'value'}
>>> def f(d):
...     del d['key']
...
>>> f(d)
>>> d
{}
>>>
vs
# non-modifying version
$ python
Python 3.7.3 (default, Mar 27 2019, 09:23:15)
[Clang 10.0.1 (clang-1001.0.46.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> d = {'key':'value2'}
>>> def f(d):
...     d2 = dict(d)
...     del d2['key']
...     print(d2)
...
>>> d
{'key': 'value2'}
>>> f(d)
{}
>>> d
{'key': 'value2'}
>>>
- Correct, you can do whatever item logic you want here, but this project assumes you are utilizing the RawResponseItem class. If you want to make your own modifications for your own items you certainly are welcome to fork this project.
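For anyone who does fork, here is a minimal sketch of how both points could fit together. It is not the project's actual code: it uses plain dicts instead of Scrapy items, a standalone function instead of a pipeline class method, and the names `NOISY_KEYS` and `loggable_copy` are hypothetical. It assumes you want every item, whatever its type, to reach the next pipeline stage.

```python
import logging

logger = logging.getLogger("pipeline-sketch")

# Keys assumed to be too large to log; the thread mentions "body" and "links".
NOISY_KEYS = ("body", "links")


def loggable_copy(item):
    """Return a shallow copy of item with the noisy keys removed."""
    item_copy = dict(item)        # new dict, so the original is untouched
    for key in NOISY_KEYS:
        item_copy.pop(key, None)  # pop with a default avoids KeyError
    return item_copy


def process_item(item):
    """Log a trimmed copy, then hand the original item to the next stage."""
    logger.info("processing item: %s", loggable_copy(item))
    return item                   # returning the item keeps the chain going


item = {"url": "http://example.com", "body": "<html>...</html>", "links": []}
result = process_item(item)
```

Note that `dict(item)` is a shallow copy: deleting keys from the copy is safe, but mutating a nested value (such as appending to the `links` list) would still affect the original. `copy.deepcopy` avoids that at the cost of copying everything.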
ChengkaiYang2022 commented
Thanks for the reply! I understand now :)