Facebook Gallery Downloader

A scraper that opens an image on Facebook, downloads it in full size, and steps through the gallery until it arrives back at the starting image.
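The traversal idea can be sketched as a loop that keeps following the "next image" link and stops when it returns to the image it started from. This is a minimal illustration, not the project's code. `traverse` and its `next_url` callback are hypothetical names; in the real scraper the next URL would come from Selenium reading the opened page.

```python
from typing import Callable, List


def traverse(start_url: str, next_url: Callable[[str], str], limit: int = 10_000) -> List[str]:
    """Collect image URLs in gallery order until the walk returns to start_url.

    next_url is a hypothetical callback: given an image URL, it returns the
    URL of the following image (a real scraper would read it from the page).
    """
    visited = [start_url]
    current = next_url(start_url)
    # Stop as soon as we arrive back at the image we started from.
    while current != start_url:
        if len(visited) >= limit:
            # Safety net in case the gallery never loops back to the start.
            raise RuntimeError("gallery did not loop back to the start image")
        visited.append(current)
        current = next_url(current)
    return visited
```

For example, with a three-image gallery modelled as a dict, `traverse("a", {"a": "b", "b": "c", "c": "a"}.__getitem__)` returns `['a', 'b', 'c']`. The `limit` guard is an added assumption: if the starting image is never reached again, the loop fails loudly instead of running forever.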

It assumes that:

the user-provided link points to an opened image, not just a gallery.
certain CSS classes are present in these Facebook pages.
the user provides valid login information.

Disclaimer: The password entered at the hidden prompt is only used to pass it to Selenium on the login page.

Dependencies

Uses Python 3.6
Uses Selenium ChromeDriver. The Windows version is included in the repository; see this link for other versions. More information here
Google Chrome must be installed on your computer
pip

Instructions (Windows)

create the virtual environment: virtualenv -p c:\Python36\python.exe .ve
run .ve.bat to initialize the working directory by typing .ve:
- the webdriver will be added to the path
- the virtualenv will be loaded
install the requirements: pip install -r requirements.txt
create an options.json file from options_template.json. It is ignored by git.
to leave the virtualenv, type deactivate
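The options.json step can also be done from Python on any platform. A minimal sketch; `ensure_options_file` is a hypothetical helper, not part of the repository. It copies the template only when options.json does not exist yet, so previously saved settings and cookies are never overwritten.

```python
import shutil
from pathlib import Path


def ensure_options_file(template: str = "options_template.json",
                        target: str = "options.json") -> bool:
    """Copy the template to the options file if it does not exist yet.

    Returns True when a new file was created, False when one was already there.
    """
    target_path = Path(target)
    if target_path.exists():
        # Never overwrite options (and cookies) the user has already filled in.
        return False
    shutil.copyfile(template, str(target_path))
    return True
```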

Usage

Update the options.json appropriately
Run python traverse_gallery.py
Enter your password at the prompt

After the first successful login, you can save the printed cookie value to the options file and set the force_login field to false. You then do not have to provide your credentials again as long as your tokens remain valid.
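Saving the cookies can be scripted. A minimal sketch, assuming the printed value is the list of cookie dicts that Selenium's `driver.get_cookies()` returns; the exact format the program prints is not specified here, and `store_cookies` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def store_cookies(options_path: str, cookies: List[Dict[str, Any]]) -> None:
    """Write the cookies into options.json and turn off forced login."""
    with open(options_path, encoding="utf-8") as f:
        options = json.load(f)
    options["cookies"] = cookies
    # With cookies stored, later runs can skip the login form.
    options["force_login"] = False
    with open(options_path, "w", encoding="utf-8") as f:
        json.dump(options, f, indent=2)
```

On a later run, such cookies would typically be restored with Selenium's `driver.add_cookie(cookie)` for each entry, after the browser has opened a page on the cookie's domain (Selenium rejects cookies for a domain the browser is not on).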

Options

Name	Description
loginURL	URL of the domain used for the session cookies; this page also contains the login fields.
start_images	Array of full image URLs (including query parameters), one image from each gallery to download.
max_workers	Number of images saved in parallel.
username	The login email address. If it is not present, it will be prompted for. It is not stored anywhere, only sent to Selenium.
password	The login password. If it is not present, it will be prompted for. It is not stored anywhere, only sent to Selenium.
cookies	The cookies Facebook uses to authenticate the user. After login, the program prints the current cookies to the console. I recommend filling this field with that value; you can take it from another source, but it might not work as intended.
force_login	If `true`, the login data is requested and the password is entered at a hidden prompt. If `false`, the provided cookies are used to authenticate the requests; if no cookies are provided, the user is still forced to log in.
save_image_index	If `true`, prefix each image file name with a number giving its position in the gallery.
destination_dir	The output directory. It should be empty. A trailing slash is not required.
unique_galleries	If `true`, prefix the gallery folder name with a timestamp. Without it, two galleries with the same name are not guaranteed to be saved to different folders.
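The rules in the table above can be sketched in code. This is a hedged illustration of the documented behaviour, not the project's implementation: every function name is hypothetical, a missing force_login is assumed to mean `false`, and the timestamp format and the four-digit index prefix are assumptions.

```python
import getpass
import os
import time
from typing import Any, Callable, Dict, Tuple


def needs_login(options: Dict[str, Any]) -> bool:
    """Log in when force_login is true, or when no cookies are stored."""
    return bool(options.get("force_login", False)) or not options.get("cookies")


def credentials(options: Dict[str, Any],
                ask: Callable[[str], str] = input,
                ask_secret: Callable[[str], str] = getpass.getpass) -> Tuple[str, str]:
    """Use username/password from the options, prompting for any missing one."""
    username = options.get("username") or ask("Email: ")
    # getpass reads the password without echoing it to the terminal.
    password = options.get("password") or ask_secret("Password: ")
    return username, password


def gallery_dir(options: Dict[str, Any], album_name: str) -> str:
    """Folder for one gallery, optionally prefixed with a timestamp."""
    name = album_name
    if options.get("unique_galleries"):
        # Hypothetical timestamp format.
        name = time.strftime("%Y%m%d-%H%M%S") + "_" + name
    # os.path.join adds the separator, so destination_dir needs no trailing slash.
    return os.path.join(options["destination_dir"], name)


def image_file_name(options: Dict[str, Any], index: int, base_name: str) -> str:
    """Optionally prefix the file name with its position in the gallery."""
    if options.get("save_image_index"):
        # Hypothetical zero-padded prefix, so files sort in gallery order.
        return "{:04d}_{}".format(index, base_name)
    return base_name
```

Passing the prompt functions as parameters keeps the password handling testable without a terminal, while the defaults match the interactive behaviour described above.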

Result

Each gallery is saved to a directory named after its album. Each directory contains the gallery's images and three additional files:

captions.txt: The captions of the images
data.json: All of the extracted data for further usage
urls.txt: URLs of the saved images, useful for recovering corrupted or missing downloads.
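urls.txt can be used to retry failed downloads. A minimal sketch, assuming one URL per line and that each image is saved under the last path segment of its URL; that naming is an assumption and will not match if save_image_index is enabled. The thread pool mirrors the idea of the max_workers option; the helper names are hypothetical.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple
from urllib.parse import urlparse


def read_urls(path: str) -> List[str]:
    """Read one URL per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def file_name_for(url: str) -> str:
    """Assumed naming: last path segment of the URL, query string dropped."""
    return os.path.basename(urlparse(url).path)


def redownload_missing(gallery_dir: str, max_workers: int = 4) -> List[str]:
    """Fetch every URL from urls.txt whose file is missing or empty."""
    missing: List[Tuple[str, str]] = []
    for url in read_urls(os.path.join(gallery_dir, "urls.txt")):
        target = os.path.join(gallery_dir, file_name_for(url))
        if not os.path.exists(target) or os.path.getsize(target) == 0:
            missing.append((url, target))
    # Download the missing files in parallel threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda job: urllib.request.urlretrieve(*job), missing))
    return [target for _, target in missing]
```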

If errors occur, check the log generated by the program; it may contain details about them.

TODO

See the todo.md file.

History

This project came to life because I needed to collect the images uploaded to our group to fill up the galleries on our public site, with the original image captions included. I searched for existing solutions, but I didn't find exactly what I was looking for. I found seeya's Facebook Gallery Downloader, but I couldn't make it work because Facebook had changed since its latest commits, so I tried a generalized solution, close to what I would do if I had to do it manually. It inspired me to use Selenium; I updated the code to Python 3 and added my own tweaks.

lynxrv21/traverse_facebook_galleries