Use scraping results directly in Python
so3500 opened this issue · 4 comments
Please kindly help me with my issue.
I'm testing GoogleScraper/Examples/basic.py and successfully got results.
This is my config:
config = {
    'use_own_ip': True,
    'keyword': 'how to make blabla',
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
    'do_caching': False,
}
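For completeness, this config is consumed by scrape_with_config, following the pattern in Examples/basic.py, which returns a ScraperSearch object on success:

from GoogleScraper import scrape_with_config, GoogleSearchError

# run the scrape; on success, search is a ScraperSearch object
try:
    search = scrape_with_config(config)
except GoogleSearchError as e:
    print(e)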
I checked the results in the shell:

[figure 1: scraped results printed in the shell]
Right now, I can only use the data in Python as shown below:

[figure 2: how I currently access the data in Python]
I know that the results from figure 1 are stored in the database.
I also know that there is a way to add an 'output_filename' field to the config, save the results to a file, and then read that file back in; see the csv sketch after this config:
config = {
    'use_own_ip': True,
    'keyword': 'how to make blabla',
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
    'do_caching': False,
    'output_filename': 'output.csv',
}
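Reading the file back is then plain Python. A minimal sketch with the standard csv module (the exact column names depend on GoogleScraper's CSV writer; title and link here are assumptions that match the parser fields shown further down):

import csv

# read the scraped results back from the CSV output file
with open('output.csv', newline='') as f:
    for row in csv.DictReader(f):
        # column names are an assumption; check the header of your output.csv
        print(row.get('title'), row.get('link'))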
But I want to use the results from figure 1 directly in Python code (title, link, etc.).
Any ideas?
Give me a few days and I will show you where in the code you can grab the results and print them or do whatever you need. I have done it before, but I am busy for the next few days.
Add the lines below to the database.py file, inside set_values_from_parser:
print("PARSED LINK IS: ", link['link'])
print("PARSED TITLE IS: ", link['title'])
print("PARSED SNIPPET IS: ", link['snippet'])
Add them after these lines:
Link(
    link=link['link'],
    snippet=link['snippet'],
    title=link['title'],
    visible_link=link['visible_link'],
    domain=parsed.netloc,
    rank=link['rank'],
    serp=self,
    link_type=key,
)
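Putting the two pieces together, the edited spot inside set_values_from_parser would look roughly like this (the surrounding loop is paraphrased, so your copy of database.py may differ slightly):

# inside set_values_from_parser, in the loop over parsed result links
Link(
    link=link['link'],
    snippet=link['snippet'],
    title=link['title'],
    visible_link=link['visible_link'],
    domain=parsed.netloc,
    rank=link['rank'],
    serp=self,
    link_type=key,
)
# debug output added right after the Link(...) call
print("PARSED LINK IS: ", link['link'])
print("PARSED TITLE IS: ", link['title'])
print("PARSED SNIPPET IS: ", link['snippet'])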
You can similarly print to the log:

logger.info("PARSED LINK IS: %s", link['link'])

etc.
I don't remember exactly, but if the logger calls fail, add the following at the top of the database.py file (if it is missing, I think it is needed to enable the logger):

import logging
logger = logging.getLogger(__name__)
I guess you can now also send it to any specific log file you want, in the same way; see the sketch below.
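For example, a minimal sketch using a standard logging.FileHandler (the file name is only an illustration):

import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
# send the parsed-result messages to a dedicated log file
handler = logging.FileHandler('parsed_results.log')  # hypothetical file name
logger.addHandler(handler)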
I really appreciate your kind and quick reply.
I will test the answer you posted as soon as possible, share the results, and close the issue.
# sqlalchemy_session must be a SQLAlchemy session bound to GoogleScraper's results database
# take the most recent search from the database
search = sqlalchemy_session.query(ScraperSearch).all()[-1]
for serp in search.serps:
    for link in serp.links:
        print("KW: %s" % serp.query)
        print(link.snippet)

You can change snippet to whichever Link attribute you want to use. :)
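Alternatively, since scrape_with_config already returns the ScraperSearch (as in Examples/basic.py), you can skip the session query and collect the fields directly. A minimal sketch, assuming the attribute names visible in the Link(...) call above:

# collect query/title/link/snippet from the search returned by scrape_with_config
results = []
for serp in search.serps:
    for link in serp.links:
        results.append({
            'query': serp.query,
            'title': link.title,
            'link': link.link,
            'snippet': link.snippet,
        })

print(results[0] if results else 'no results')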