Read this in other languages: Russian, हिन्दी, 中文
It's very simple: your bot follows other accounts in bulk and, in response, people follow you back.
- Clone the repository with the following commands on the command line, or download the archive from GitHub:
$ cmd
$ git clone https://github.com/BEPb/github_bot
$ cd github_bot
- Create a Python virtual environment.
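One way to do this step from Python itself is the standard-library `venv` module. This is only a minimal sketch: the helper name `create_env` and the `.venv` directory name are assumptions made for illustration, not something the repository prescribes.

```python
import venv
from pathlib import Path


def create_env(target: str = ".venv", with_pip: bool = True) -> Path:
    """Create a virtual environment in `target` and return its path.

    `target` is a hypothetical directory name; pick any name you like.
    """
    env_dir = Path(target)
    # with_pip=True bootstraps pip inside the new environment,
    # so that `pip install -r requirements.txt` works after activation.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

Calling `create_env()` has the same effect as running `python -m venv .venv`. The environment still has to be activated from the shell before installing packages: `source .venv/bin/activate` on Linux and macOS, or `.venv\Scripts\activate` on Windows.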
- Install all the packages the code needs using the following command:
pip install -r requirements.txt
- Create a project called nameproject:
scrapy startproject nameproject
- after that, you will have a folder named after the project, containing the minimum necessary files:
scrapy.cfg            # deploy configuration file
nameproject/          # project's Python module, you'll import your code from here
    __init__.py
    items.py          # project items definition file
    middlewares.py    # project middlewares file
    pipelines.py      # project pipelines file
    settings.py       # project settings file
    spiders/          # a directory where you'll later put your spiders
        __init__.py
- go to our project folder
cd nameproject
- create a quotes_spider.py file in the spiders/ folder and define in it which pages we crawl and how we process them
- launch our crawler
scrapy crawl quotes
- as a result, two new files are created: quotes-1.html and quotes-2.html, holding the content of the corresponding URLs, as our parse method specifies.
- try out selectors in the Scrapy shell
scrapy shell 'https://quotes.toscrape.com/page/1/'
- select all title elements using CSS. The result of response.css('title') is a list-like object called SelectorList: a list of Selector objects that wrap XML/HTML elements and let you run further queries to refine the selection or extract data.
response.css('title')
- to extract the text of every match as a list of strings, call the getall() method
response.css('title::text').getall()
- the same can be done with XPath; get() returns only the first match
response.xpath('//title/text()').get()
- and now take div tags with class quote
response.css("div.quote")
- take only the first element in the list and store it in the quote variable
quote = response.css("div.quote")[0]
- to get the quote's text and author from the nested tags, use the following commands:
quote.css("span.text::text").get()
quote.css("small.author::text").get()
- and this is how we get the text of every tag link (a.tag inside div.tags) for all quotes on the page
response.css("div.quote").css("div.tags a.tag::text").getall()
- this is how we save the result in JSON format; the -O command-line switch overwrites any existing file:
scrapy crawl quotes -O quotes.json
- and this is how we save the result in csv format
scrapy crawl quotes -O quotes.csv
- the following command writes the result line by line in the JSON Lines (.jl) format; the -o switch appends to an existing file instead of overwriting it
scrapy crawl quotes -o quotes.jl
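To see why appending suits the .jl format, the sketch below imitates both behaviours in plain Python with the standard `json` module. It is not Scrapy's exporter code; the helper names and the sample items are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical items, shaped like dictionaries a quotes spider might yield.
items = [
    {"text": "First sample quote", "author": "Author A"},
    {"text": "Second sample quote", "author": "Author B"},
]


def write_json(path, records):
    """Write all records as one JSON array, replacing any existing file."""
    Path(path).write_text(json.dumps(records), encoding="utf-8")


def append_json_lines(path, records):
    """Append one JSON object per line, keeping the existing content."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_json_lines(path):
    """Parse a JSON Lines file one line at a time."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Appending a second JSON array to an existing .json file would leave it unparseable, whereas a .jl file stays valid after every append because each line is an independent JSON object. That is also why JSON Lines files can be processed one record at a time without loading everything into memory.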