Add some built-in save() methods.
elliotgao2 opened this issue · 3 comments
elliotgao2 commented
For examples:
class Post(Item):
id = Css('title')
async def save(self):
super.save(self.results, type='database')
class Post(Item):
id = Css('title')
async def save(self):
super.save(self.results, type='file')
Do you have any suggestions?
c1ay commented
class Post(Item):
id = Css('title')
result_cls = FileResultClass
def __init__(self, *args, **kwargs):
self.save_cls = self.result_cls(*args, **kwargs) # init result cls
pass
async def on_result(self):
"""
Called every result
"""
self.save_cls.save()
class ResultClass:
async def save(self, *args, **kwargs):
raise NotImplementedError
class DataBaseResultClass(ResultClass):
async def save(self, *args, **kwargs):
pass
class FileResultClass(ResultClass):
async def save(self):
pass
I think this will be better on expansibility
elliotgao2 commented
This solution is close to what I want.
Both the DataBaseResultClass and the FileResultClass should have their own config. DataBaseResultClass needs database_url, username, password, database, etc. FileResultClass needs file_path, file_ext, etc.
Any suggestions?
c1ay commented
No good suggestion yet.
It will be More complex if adding a new config file.
Add the config like below will be a little better.
class MySpider(Spider):
start_url = 'https://blog.scrapinghub.com/'
concurrency = 5
parsers = [Parser('https://blog.scrapinghub.com/page/\d+/'),
Parser('https://blog.scrapinghub.com/\d{4}/\d{2}/\d{2}/[a-z0-9\-]+/', Post)]
result_config = {
"file_path": "/data/tmp.data"
}