

Primary language: Python. License: MIT.

Facebook Gallery Downloader

It is a Selenium scraper that opens an image on Facebook, downloads it at full size, and then steps through the gallery, downloading each image, until it arrives back at the starting image.

It assumes that:

  • the link provided by the user points to an opened image, not just to a gallery;
  • certain CSS style classes appear on these Facebook pages;
  • the user provides valid login information.
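The traversal described above can be sketched as a loop that saves the current image, moves to the next one, and stops once it is back at the start image. This is a minimal sketch, not the project's code: `walk_gallery` is a hypothetical name, and the `next_url` and `save` callables stand in for the Selenium steps (moving to the next image, downloading the full-size image) so the loop can run without a browser. Because Facebook image URLs carry parameters, a real implementation may need to compare a normalized photo identifier rather than the raw URL; that is an assumption, not something this README states.

```python
from typing import Callable, List


def walk_gallery(start_url: str,
                 next_url: Callable[[str], str],
                 save: Callable[[str], None],
                 max_steps: int = 10_000) -> List[str]:
    """Save every image of a circular gallery once, starting from start_url.

    next_url and save are hypothetical hooks standing in for the Selenium
    steps (go to the next image, download the full-size image).
    """
    visited: List[str] = []
    url = start_url
    for _ in range(max_steps):
        save(url)                # download the image that is currently open
        visited.append(url)
        url = next_url(url)      # step to the next image in the gallery
        if url == start_url:     # back at the first image: every image was seen
            return visited
    # Safety net (an addition of this sketch): stop if the gallery never loops back.
    raise RuntimeError("the gallery never led back to the start image")
```

For a three-image gallery where "a" leads to "b", "b" to "c" and "c" back to "a", starting at "b" saves "b", "c" and "a" and then stops. The `max_steps` guard keeps a gallery that never returns to the start image from running forever.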

Disclaimer: the password, which is typed in without being displayed, is used only to fill in the login page through Selenium.

Dependencies

  1. Python 3.6
  2. Selenium with ChromeDriver. The Windows version of ChromeDriver is included in the repository; other versions are available from the ChromeDriver download page.
  3. Google Chrome installed on your computer
  4. pip

Instructions (Windows)

  1. create the virtual environment: virtualenv -p c:\Python36\python.exe .ve
  2. initialize the working directory by typing .ve, which runs .ve.bat:
    • the webdriver is added to the PATH
    • the virtualenv is activated
  3. install the requirements: pip install -r requirements.txt
  4. create an options.json file from options_template.json (options.json is ignored by git)
  5. to leave the virtualenv, type deactivate

Usage

  1. Update options.json as needed (see Options below)
  2. Run python traverse_gallery.py
  3. Enter your password at the prompt

After the first successful login, you can save the printed cookie value to the options file and set force_login to false; you then do not have to provide your credentials again as long as your tokens remain valid.
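Saving and reusing cookies can be sketched with Selenium's `WebDriver.get_cookies()` and `WebDriver.add_cookie()`. The helper names are hypothetical, and the sketch assumes the cookies are stored as the list of dicts that `get_cookies()` returns; the exact format the project prints is not shown in this README. Selenium only accepts a cookie for the domain that is currently loaded, which is why the login page is opened before the cookies are added.

```python
import json
from typing import Any, Dict, List


def dump_cookies(driver: Any) -> str:
    """Return the browser's current cookies as a JSON string (e.g. to print after login).

    `driver` is assumed to be a Selenium WebDriver (or any object with the
    same methods); get_cookies() returns a list of cookie dicts.
    """
    return json.dumps(driver.get_cookies())


def restore_cookies(driver: Any, login_url: str, cookies: List[Dict[str, Any]]) -> None:
    """Reuse saved cookies instead of logging in again."""
    # Selenium rejects cookies for a domain that is not currently loaded,
    # so open a page on that domain (the loginURL option) first.
    driver.get(login_url)
    for cookie in cookies:
        driver.add_cookie(cookie)
    # Reload so the page is requested again with the restored session cookies.
    driver.refresh()
```

If the tokens have expired, the reloaded page will still ask for credentials, which is when force_login should be set back to true.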

Options

| Name | Description |
|---|---|
| loginURL | URL of a page on the domain the session cookies belong to; this page also contains the login fields. |
| start_images | Array holding the full URL (including query parameters) of one image from each gallery to download. |
| max_workers | Number of parallel image-saving processes. |
| username | The login email address. If it is not present, you will be asked for it. It is not stored anywhere, only sent to Selenium. |
| password | The login password. If it is not present, you will be asked for it. It is not stored anywhere, only sent to Selenium. |
| cookies | The cookies Facebook uses to authenticate users. After login, the program prints the current cookies to the console; I recommend filling this field with that value. You can also obtain the cookies from another source, but then it might not work as intended. |
| force_login | If true, the login data is requested and the password must be typed in at a hidden prompt. If false, the provided cookies are used to authenticate the requests; if no cookies are provided, the user is forced to sign in anyway. |
| save_image_index | If true, images are saved in their order of appearance, with a sequence number prepended to their file names. |
| destination_dir | The destination directory for the results. It should be empty. The path does not need a trailing slash. |
| unique_galleries | If true, a timestamp is added to the start of each gallery folder name. Without it, there is no guarantee that two galleries are saved under different names. |
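The way `force_login`, `cookies`, `username` and `password` interact can be sketched as follows. The function names are hypothetical, and the decision rule is read from the table above (log in when `force_login` is true or no cookies are stored). The prompts are injectable so the logic can be tested without typing; by default the password is read with `getpass.getpass`, which does not echo the input, matching the hidden prompt described for `force_login`.

```python
import getpass
import json
from typing import Any, Callable, Dict, Tuple


def load_options(path: str = "options.json") -> Dict[str, Any]:
    """Read the JSON options file created from options_template.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def should_log_in(options: Dict[str, Any]) -> bool:
    """Log in if forced to, or if there are no stored cookies to reuse."""
    return bool(options.get("force_login")) or not options.get("cookies")


def get_credentials(options: Dict[str, Any],
                    ask: Callable[[str], str] = input,
                    ask_secret: Callable[[str], str] = getpass.getpass) -> Tuple[str, str]:
    """Take username/password from the options, prompting for whichever is missing.

    The password prompt does not echo what is typed; the values are only
    returned to the caller (e.g. to be typed into the login form via Selenium).
    """
    username = options.get("username") or ask("Email: ")
    password = options.get("password") or ask_secret("Password: ")
    return username, password
```

A caller would run `load_options()`, call `get_credentials()` only when `should_log_in()` returns true, and otherwise restore the stored cookies.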

Result

Each gallery is saved to a directory named after its album. Each directory contains the gallery's images and three additional files:

  • captions.txt: the captions of the images
  • data.json: all of the extracted data, for further use
  • urls.txt: the URLs of the saved images, in case of corrupted or missing downloads
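The folder and file naming implied by `destination_dir`, `unique_galleries` and `save_image_index`, together with the three metadata files, can be sketched like this. The function names, the character-sanitizing rule, the timestamp format, the zero-padded index and the one-line-per-image layout of captions.txt are all assumptions for illustration; the project's actual formats may differ. Using `os.path.join` also shows why `destination_dir` does not need a trailing slash: it works with or without one.

```python
import json
import os
import re
import time
from typing import Any, Dict, List


def gallery_dir(destination_dir: str, album_name: str, unique_galleries: bool) -> str:
    """Build the folder path for one gallery, named after its album."""
    # Replace characters Windows does not allow in folder names (assumed rule).
    name = re.sub(r'[<>:"/\\|?*]', "_", album_name).strip() or "gallery"
    if unique_galleries:
        # Prefix a timestamp so two albums with the same name do not collide.
        name = time.strftime("%Y%m%d-%H%M%S") + "_" + name
    # os.path.join works whether or not destination_dir ends with a slash.
    return os.path.join(destination_dir, name)


def image_name(index: int, original_name: str, save_image_index: bool) -> str:
    """Optionally prefix the file name with its position in the gallery."""
    return f"{index:04d}_{original_name}" if save_image_index else original_name


def write_metadata(folder: str, images: List[Dict[str, Any]]) -> None:
    """Write captions.txt, urls.txt and data.json for one gallery.

    Each entry in images is assumed to have "file", "caption" and "url" keys.
    """
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "captions.txt"), "w", encoding="utf-8") as f:
        for img in images:
            f.write(f"{img['file']}: {img['caption']}\n")
    with open(os.path.join(folder, "urls.txt"), "w", encoding="utf-8") as f:
        for img in images:
            f.write(img["url"] + "\n")
    with open(os.path.join(folder, "data.json"), "w", encoding="utf-8") as f:
        json.dump(images, f, ensure_ascii=False, indent=2)
```

Writing data.json with `ensure_ascii=False` keeps non-English captions readable in the file.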

If errors occur, check the log generated by the program; it may contain information about what went wrong.
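Parallel saving controlled by `max_workers`, with failures written to the log instead of stopping the run, can be sketched with `concurrent.futures`. This sketch uses a thread pool because downloads are I/O-bound; the options table calls them processes, and the project's actual mechanism is not shown here. `save_all` and the `download(url, path)` callable are hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

log = logging.getLogger("gallery")


def save_all(jobs: List[Tuple[str, str]],
             download: Callable[[str, str], None],
             max_workers: int = 4) -> Dict[str, str]:
    """Run download(url, path) for every job in parallel.

    Returns a dict mapping each failed URL to its error message.
    """
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(download, url, path): url for url, path in jobs}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
            except Exception as exc:
                # Keep going with the other images; record the problem in the log.
                log.error("failed to save %s: %s", url, exc)
                failures[url] = str(exc)
    return failures
```

Configuring logging, for example with `logging.basicConfig(filename=..., level=logging.INFO)`, sends these errors to a file; the returned dict lists the URLs worth retrying, which is the same purpose urls.txt serves.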

TODO

See the todo.md file.

History

This project came to life because I needed to collect the images uploaded to our group, including their original captions, to fill the galleries on our public site. I searched for existing solutions but didn't find exactly what I was looking for. I found seeya's Facebook Gallery Downloader, but I couldn't make it work because Facebook had changed since its latest commits, so I tried a generalized approach, close to what I would do if I had to do it manually. That project inspired me to use Selenium; I updated the code to Python 3 and added my own tweaks.