/scribd-downloader-3

A small python script that downloads PDF from a scribd url.

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

Scribd Downloader 3

/!\ Important: **The script is not working anymore.** I don’t have time to maintain this script, and scribd’s interface changes quite often and I can’t update the script every month. I saw that now some alternative exists, so I don’t think I’ll continue to maintain it, but if someone wants to send some PR, feel free! The current code is not complicated to modify ;)

This script is a very short python script whose aim is to download scribd document into a PDF file.

Installation

Depending on what you need, you have several ways to install this script. Or you can use the online docker image (slower, but you are sure to have the good firefox/selenium/geckodriver version), or you can just install the python deps yourself and run it.

Method 1: classic install with pip

To use this script you first need to make sure that firefox, python3 and the python libraries selenium and fpdf are installed. Note that it may be better to setup all of these library inside a virtualenv to avoid version clash.

On debian-like systems, you can proceed using something like this:

sudo apt install firefox python3 python3-pip
sudo pip3 install selenium
sudo pip3 install fpdf
sudo pip3 install Pillow

And if you prefer the virtualenv version (make sure you are in an empty folder):

sudo apt install firefox python3 python3-pip
pip3 install --user virtualenv
pip3 install --upgrade virtualenv
virtualenv -p python3 venv
source venv/bin/activate
pip3 install selenium
pip3 install fpdf
pip3 install Pillow

Then, download this script :

git clone https://github.com/tobiasBora/scribd-downloader-3.git

Make sure it’s executable

chmod +x scribd_downloader_3.py

as well as the last driver geckodriver available at https://github.com/mozilla/geckodriver/releases (make sure it’s the version corresponding to your hardware):

wget https://github.com/mozilla/geckodriver/releases/download/v0.19.0/geckodriver-v0.19.0-linux64.tar.gz
tar zxvf geckodriver-v0.19.0-linux64.tar.gz 

Great, you can now use the script !

Method 2: docker (only on 64bit systems)

First, make sure you have a recent docker installed (the docker in the repository are outdated most of the time). Then, just run:

sudo docker run -it --shm-size 2g -v $(pwd):/host -w /host tobiasbora/scribd-downloader:18.01 bash

It will download the online docker image, mount your local folder in /host, and run a bash in this folder (the --shm-size is very important if you don’t want firefox to crash). Then, you can simply run this command (don’t forget the xvfb-run):

xvfb-run scribd_downloader_3.py <your url> <your pdf.pdf> 

Ex:

xvfb-run scribd_downloader_3.py "https://www.scribd.com/doc/63942746/chopin-nocturne-n-20-partition" out.pdf

Features

  • convert into a pdf file an online scribd document
  • deal with blured pages that would require an account
  • deal with document with different page size

Usage

To use the script it’s pretty easy. Here is the general usage:

./scribd_downloader_3.py -p <PATH GECKODRIVER> <URL> <PDF OUTPUT>

(NB : if geckodriver is in your global path, you can remove the flag p)

For example if geckodriver is in the same folder as the script, you can run something like:

./scribd_downloader_3.py -p . "https://www.scribd.com/document/31698781/Constitution-of-the-Mexican-Mafia-in-Texas" out.pdf

If you are on a headless server, or if the firefox window that is open bother you, you may want to install xvfb and then run the same command prefixed with xvfb-run:

xvfb-run ./scribd_downloader_3.py -p <PATH GECKODRIVER> <URL> <PDF OUTPUT>

the script will now run blindly !

FAQ

It’s a bit slow/quick, can I increase/decrease the speed?

By default, this script waits 1s between each “screenshot” to be sure the page is loaded. If you have a slow connection, this may be too short, creating some blank pages, and if you have a good connection this may be to big, and you may want to decrease this time to download the page quicker.

For example if you want to wait 0.2 seconds (I even tried 0s on a 77 pages document and it worked without any trouble) between each screenshot, just use the option “-w 0.2”:

Example:

./scribd_downloader_3.py -p . https://www.scribd.com/document/359303091/A-Repository-of-Unix-History-and-Evolution unix_history.pdf -w 0.2

Troubleshootings

I spent only a few hours to make this script, so it’s not supposed to be perfect, but for what I tried it works pretty nicely. However, because scribd is changing often, it may not be working, so if you have any trouble, please fill an issue here or edit the code (it should not be too complicated, the code is pretty straight forward and simple) and do a pull request.