Use scraping results directly in Python
so3500 opened this issue · 4 comments
Please kindly help me with my issue.
I'm testing GoogleScraper/Examples/basic.py and successfully got results.
This is my config:
config = {
    'use_own_ip': True,
    'keyword': 'how to make blabla',
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
    'do_caching': False,
}
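For completeness, this config is consumed by scrape_with_config, following the pattern in Examples/basic.py, which returns a ScraperSearch object on success:

from GoogleScraper import scrape_with_config, GoogleSearchError

# run the scrape; on success, search is a ScraperSearch object
try:
    search = scrape_with_config(config)
except GoogleSearchError as e:
    print(e)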
I checked the results in the shell:

[figure 1: scraped results printed in the shell]
Right now, I can only use the data in Python as shown below:

[figure 2: how I currently access the data in Python]
I know that the results from figure 1 are stored in the database.
I also know that there is a way to add an 'output_filename' field to the config, save the results to a file, and then read that file back in; see the csv sketch after this config:
config = {
    'use_own_ip': True,
    'keyword': 'how to make blabla',
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
    'do_caching': False,
    'output_filename': 'output.csv',
}
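Reading the file back is then plain Python. A minimal sketch with the standard csv module (the exact column names depend on GoogleScraper's CSV writer; title and link here are assumptions that match the parser fields shown further down):

import csv

# read the scraped results back from the CSV output file
with open('output.csv', newline='') as f:
    for row in csv.DictReader(f):
        # column names are an assumption; check the header of your output.csv
        print(row.get('title'), row.get('link'))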
But I want to use the results from figure 1 directly in Python code (title, link, etc.).
Any ideas?
Give me a few days and I will show you where in the code you can grab the results and print them or do whatever you need. I have done it before, but I am busy for the next few days.
Add the lines below to the database.py file, inside set_values_from_parser:
print("PARSED LINK IS: ", link['link'])
print("PARSED TITLE IS: ", link['title'])
print("PARSED SNIPPET IS: ", link['snippet'])
Add them after these lines:
Link(
    link=link['link'],
    snippet=link['snippet'],
    title=link['title'],
    visible_link=link['visible_link'],
    domain=parsed.netloc,
    rank=link['rank'],
    serp=self,
    link_type=key,
)
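Putting the two pieces together, the edited spot inside set_values_from_parser would look roughly like this (the surrounding loop is paraphrased, so your copy of database.py may differ slightly):

# inside set_values_from_parser, in the loop over parsed result links
Link(
    link=link['link'],
    snippet=link['snippet'],
    title=link['title'],
    visible_link=link['visible_link'],
    domain=parsed.netloc,
    rank=link['rank'],
    serp=self,
    link_type=key,
)
# debug output added right after the Link(...) call
print("PARSED LINK IS: ", link['link'])
print("PARSED TITLE IS: ", link['title'])
print("PARSED SNIPPET IS: ", link['snippet'])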
You can similarly print to the log:

logger.info("PARSED LINK IS: %s", link['link'])

etc.
I don't remember exactly, but if the logger calls fail, add the following at the top of the database.py file (if it is missing, I think it is needed to enable the logger):

import logging
logger = logging.getLogger(__name__)
I guess you can now also send it to any specific log file you want, in the same way; see the sketch below.
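For example, a minimal sketch using a standard logging.FileHandler (the file name is only an illustration):

import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
# send the parsed-result messages to a dedicated log file
handler = logging.FileHandler('parsed_results.log')  # hypothetical file name
logger.addHandler(handler)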
I really appreciate your kind and quick reply.
I will test the answer you posted as soon as possible, share the results, and close the issue.
# sqlalchemy_session must be a SQLAlchemy session bound to GoogleScraper's results database
# take the most recent search from the database
search = sqlalchemy_session.query(ScraperSearch).all()[-1]
for serp in search.serps:
    for link in serp.links:
        print("KW: %s" % serp.query)
        print(link.snippet)

You can change snippet to whichever Link attribute you want to use. :)
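Alternatively, since scrape_with_config already returns the ScraperSearch (as in Examples/basic.py), you can skip the session query and collect the fields directly. A minimal sketch, assuming the attribute names visible in the Link(...) call above:

# collect query/title/link/snippet from the search returned by scrape_with_config
results = []
for serp in search.serps:
    for link in serp.links:
        results.append({
            'query': serp.query,
            'title': link.title,
            'link': link.link,
            'snippet': link.snippet,
        })

print(results[0] if results else 'no results')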