scrapy/itemadapter

is_item can be costly

Opened this issue · 4 comments

See scrapy/itemloaders#50

While that issue can be addressed in a different way, I am thinking that we should make some work to minimize the CPU cost of this function.

I am actually starting to consider making different item types opt-in; i.e. instead of having all supported item types enabled by default, requiring users to enable them for them to work.

I am just thinking out loud, though. I am just worried about the CPU-consumption of this function going forward, seeing as it is already able to cause a 50% additional CPU usage for a process when overused.

I remember a discussion about imports inside functions being expensive. Let's see if #60 helps in that regard.
Additionally, it might be slightly better to use itemadapter.ItemAdapter.is_item instead of itemadapter.is_item in itemloaders, since the latter actually imports the former.

I am actually starting to consider making different item types opt-in; i.e. instead of having all supported item types enabled by default, requiring users to enable them for them to work.

I just realized this is actually possible at the moment, by modifying the ItemAdapter.ADAPTER_CLASSES class attribute, e.g. removing the ones you are sure you don't need and/or adjusting the order to break early if you know the majority of your items are of a certain kind.

I am actually starting to consider making different item types opt-in; i.e. instead of having all supported item types enabled by default, requiring users to enable them for them to work.

I just realized this is actually possible at the moment, by modifying the ItemAdapter.ADAPTER_CLASSES class attribute, e.g. removing the ones you are sure you don't need and/or adjusting the order to break early if you know the majority of your items are of a certain kind.

Indeed, they are currently opt-out. Making them opt-in would mean making ItemAdapter.ADAPTER_CLASSES empty or at least less crowded by default. But it could be argued that the user experience would be worse, not to mention that making such a change in a backward-compatible way could be a pain.

Released v0.5.0 to get advantage of the performance improvements. Let's keep this issue open to investigate about further improvements.