/facebook-data-extraction

Experiences extracting data from Facebook with three methods: the Facebook Graph API, automation tools, and the DevTools Console

Primary language: JavaScript. License: MIT.

Summary of Facebook data extraction methods

I. General Comparison

| # | Method | Sign-in required | Risk when signed in | Risk when not signed in | Difficulty | Speed |
|---|--------|------------------|---------------------|-------------------------|------------|-------|
| 1️⃣ | Personal account Access Token + Graph API | Yes | Access Token leaked, Rate Limits | Not working | Medium | Fast |
| 2️⃣ | Automation tools + IP hiding techniques | Depends (*) | Checkpoint, but fewer load-more failures | Safest | Hard | Slow (**) |
| 3️⃣ | Run JS code directly in the DevTools Console | Depends (*) | Checkpoint, but fewer load-more failures | Can be banned if abused | Easy | Slow (**) |
| 4️⃣ | Mbasic Facebook + IP hiding techniques | Depends (*) | - | - | Hard | - |

(*) Depends on whether the task requires signing in. Example: tasks that need access to private groups or private posts.

(**) Depends on how much data you want to extract: the more items you need, the more times the page has to be scrolled to load the content.
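As an illustration of method 1️⃣, here is a minimal sketch of reading an edge through the Graph API and following its `paging.next` links with a pause between requests, which helps stay under the Rate Limits. The names `GRAPH_VERSION`, `buildFeedUrl` and `fetchAllPages` are hypothetical, the version string and the `feed` edge are only examples, and the default `fetchJson` assumes a runtime with a global `fetch` (a browser or Node 18+).

```javascript
// Sketch only: GRAPH_VERSION, buildFeedUrl and fetchAllPages are illustrative names.
const GRAPH_VERSION = "v19.0"; // assumption: replace with the version your app targets

// Build the URL of the `feed` edge for a node (page, group, user).
function buildFeedUrl(nodeId, accessToken, limit = 25) {
  const url = new URL(`https://graph.facebook.com/${GRAPH_VERSION}/${nodeId}/feed`);
  url.searchParams.set("limit", String(limit));
  url.searchParams.set("access_token", accessToken);
  return url.toString();
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Follow `paging.next` until there is no next page or maxPages is reached.
// fetchJson is injectable so the loop can be exercised without network access.
async function fetchAllPages(
  firstUrl,
  { fetchJson = (u) => fetch(u).then((r) => r.json()), maxPages = 5, delayMs = 1000 } = {}
) {
  const items = [];
  let url = firstUrl;
  for (let page = 0; url && page < maxPages; page++) {
    const body = await fetchJson(url);
    if (body.error) throw new Error(body.error.message); // Graph API error envelope
    items.push(...(body.data ?? []));
    url = body.paging?.next ?? null;
    if (url) await sleep(delayMs); // pause between requests
  }
  return items;
}
```

A call might look like `fetchAllPages(buildFeedUrl(nodeId, process.env.FB_ACCESS_TOKEN))`, where `FB_ACCESS_TOKEN` is an assumed environment variable. Reading the token from the environment instead of hard-coding it reduces the chance of leaking it in a commit.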

II. My general conclusions after many tries with different methods

  • When you are not signed in, Facebook usually redirects to the login page or prevents you from loading more comments / replies.
  • No matter which method you use, fast or irregular activity sustained for a long time while signed in is likely to get the account blocked at any time.
  • If you need to be signed in, for safety I recommend creating a fake account (you can use a Temporary Email Address to create one) and using it for the extraction.
  • When signed in, another technique to reduce the chance of a Checkpoint is to sign in with different Cookies.
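The first two points can be sketched as a loading loop that pauses for a random interval between steps and gives up once the number of loaded items stops growing (for example, because Facebook has stopped serving more content). This is a minimal sketch, not the repo's actual code: `loadUntilStable`, its option names and the default delays are all hypothetical.

```javascript
// Sketch only: loadUntilStable and its options are illustrative names.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Random number in [min, max), used to avoid a perfectly regular rhythm.
const randomBetween = (min, max) => min + Math.random() * (max - min);

// Call loadMore() repeatedly, waiting a random delay each time, and stop when
// getCount() has not grown for maxIdleRounds rounds or maxRounds is reached.
async function loadUntilStable({
  getCount,
  loadMore,
  minDelayMs = 1500,
  maxDelayMs = 4000,
  maxRounds = 50,
  maxIdleRounds = 3,
  sleep = defaultSleep,
}) {
  let lastCount = getCount();
  let idle = 0;
  for (let round = 0; round < maxRounds && idle < maxIdleRounds; round++) {
    loadMore();
    await sleep(randomBetween(minDelayMs, maxDelayMs));
    const count = getCount();
    idle = count > lastCount ? 0 : idle + 1; // reset when new items appeared
    lastCount = count;
  }
  return lastCount;
}

// Possible use in the DevTools Console (the selector is an assumption and
// may need adjusting to the page's current markup):
// loadUntilStable({
//   getCount: () => document.querySelectorAll('[role="article"]').length,
//   loadMore: () => window.scrollTo(0, document.body.scrollHeight),
// }).then((n) => console.log(`loaded ${n} items`));
```

Passing `getCount` and `loadMore` as functions keeps the loop independent of the page, so the same helper could be reused from an automation tool by supplying callbacks that act on the controlled page.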

III. DISCLAIMER

All information provided in this repo and related articles is for educational purposes only. Use it at your own risk. I do not guarantee anything and am not responsible for any situation, including:

  • Your Facebook account getting a Checkpoint because of repeated or rapid actions.
  • Problems that may occur from, or any abuse of, the information or the code provided.
  • Problems with your privacy while using IP hiding techniques or any malicious scripts.