Scrapes all packages from the Op1.fun site. At the time I wrote this script there was no API, so I used Google Chrome's headless mode to download the files.
The script scans each page of the packages listing for the package links. In the next step, the collected links are downloaded.
Google Chrome's headless mode makes it very easy to log in to the site and download the files.
- install Python
- install Google Chrome
- install the Google Chrome ChromeDriver
- install Selenium
pip install selenium
Create an account at the op1.fun site and change the settings in the script:
line 15
replace YOUR_EMAIL with your email
line 16
replace YOUR_PASSWORD with your password
UPDATE_CSV = True
enables CSV tracking.
It keeps a list of all the packs available on the op1.fun website and records each pack's download status. This prevents you from downloading the same files twice.
It also lets the script resume with the files that have not been downloaded yet if the program ever stops.
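The CSV tracking described above can be sketched roughly as follows. This is not the script's actual code: the column names `url` and `downloaded`, the `yes`/`no` values, and the helper names `load_status`, `save_status`, `merge_links` and `pending` are all assumptions made for illustration.

```python
import csv
import os


def load_status(csv_path):
    """Return {pack_url: downloaded_flag}; an empty dict if the CSV does not exist yet."""
    if not os.path.exists(csv_path):
        return {}
    with open(csv_path, newline="") as f:
        return {row["url"]: row["downloaded"] == "yes" for row in csv.DictReader(f)}


def save_status(csv_path, status):
    """Write the full status table back to disk (hypothetical two-column layout)."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "downloaded"])
        writer.writeheader()
        for url, done in status.items():
            writer.writerow({"url": url, "downloaded": "yes" if done else "no"})


def merge_links(status, scraped_links):
    """Add newly scraped packs as not downloaded; keep the flags of known packs."""
    for url in scraped_links:
        status.setdefault(url, False)
    return status


def pending(status):
    """Packs that still need to be downloaded, in insertion order."""
    return [url for url, done in status.items() if not done]
```

Saving the table after every successful download is what would make a restart cheap: on the next run, only the entries returned by `pending` need to be fetched.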
Put the script and the ChromeDriver in the same directory
- run the ChromeDriver in another terminal
./chromedriver
- run
python ./src/op1fun_package_scraper.py
to start downloading
Unpacks all the downloaded zip files into a more OP1 friendly folder structure
Set the path of the folder that contains all the zip files and your destination path
line 6
ZIP_FILES_PATH = "/Users/path/to/the/zip/folders/...."
line 7
UNPACK_TO = "/Users/paths/to/unpack/folder/...."
run python unpacker.py
to start unpacking
when prompted, enter Y
to confirm and continue
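As a rough idea of what the unpacking step does, here is a minimal sketch using only the standard library. It assumes one output folder per archive, named after the zip file; the real unpacker.py may arrange the folders differently, and the function name `unpack_all` is made up for this example.

```python
import zipfile
from pathlib import Path


def unpack_all(zip_dir, dest_dir):
    """Extract every .zip in zip_dir into dest_dir/<archive name>/.

    Returns the list of folders that were written to.
    """
    dest = Path(dest_dir)
    created = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        target = dest / archive.stem  # e.g. "drums.zip" goes to dest/drums/
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        created.append(target)
    return created
```

It would be called with the two paths from the settings, for example `unpack_all(ZIP_FILES_PATH, UNPACK_TO)`.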
Op1.fun hosts its files on AWS, so the script waits a bit after each download to avoid triggering the spam/DDoS protection of AWS. After 12 downloads there is another delay, too. This helps to prevent access errors. A complete download of all packages (140 pages as of 03.06.2019) takes about two days with my delay settings. You may be able to decrease the wait time on a public network.
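The throttling idea can be sketched like this. The delay values below are placeholders, not the script's real settings, and the sketch assumes the longer pause repeats after every batch of 12 downloads; `wait_after_download` is a hypothetical helper name.

```python
import time


def wait_after_download(count, short_delay=30.0, long_delay=600.0,
                        batch_size=12, sleep=time.sleep):
    """Pause after download number `count` (1-based) and return the seconds waited.

    short_delay and long_delay are assumed values for illustration only.
    The sleep function is injectable so the logic can be tried without waiting.
    """
    delay = short_delay
    if count % batch_size == 0:
        # Assumption: an extra, longer pause after every full batch of downloads.
        delay += long_delay
    sleep(delay)
    return delay
```

Lowering `short_delay` and `long_delay` is the knob to turn if your network tolerates faster downloads.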
The following regex statements and Selenium call are used to find the links in the page:
- \?page\=(\d)*\">Last"
  to find the last page index (line 55)
- <a class=\"pack-name parent-link\" href=\"(.)*\">(.)*<\/a
  to get the links to the packs on each packs page (line 93)
- elem = driver.find_element_by_class_name('download').click()
  to find and click the download button on a pack page
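To show how such patterns behave, here is a small standalone sketch that applies similar regexes to a page source string. It is a variation, not the script's code: it uses `(\d+)` and `[^"]*` instead of `(\d)*` and `(.)*`, because a repeated group like `(\d)*` only keeps the last digit it matched if the group is read back. The helper names, the fallback value of 1 and any sample URLs you feed it are assumptions.

```python
import re

# Page number in the "Last" pagination link, e.g. ...?page=140">Last
LAST_PAGE_RE = re.compile(r'\?page=(\d+)">Last')

# Pack links on a listing page: captures the href and the visible pack name
PACK_LINK_RE = re.compile(
    r'<a class="pack-name parent-link" href="([^"]*)">([^<]*)</a'
)


def last_page_index(html):
    """Return the last page number, or 1 if no "Last" link is found (assumed fallback)."""
    match = LAST_PAGE_RE.search(html)
    return int(match.group(1)) if match else 1


def pack_links(html):
    """Return the href of every pack link found in the page source."""
    return [href for href, _name in PACK_LINK_RE.findall(html)]
```

With Selenium, the string passed to these helpers would typically be `driver.page_source`.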