scrapedin

LinkedIn Scraper (working with the new 2019 website)


Scraper for LinkedIn full profile data.
Unlike other scrapers, it works with LinkedIn's new website (2019).

Install via the npm package manager: npm i scrapedin

Check your version!

We need to release an update whenever LinkedIn changes its website, so please check that you have the latest version.

  • Latest release: v1.0.8 (16 Jul 2019)

Usage Example:

const scrapedin = require('scrapedin')

// scrapedin returns a Promise, so the calls need to run inside an async function
;(async () => {
  const profileScraper = await scrapedin({ email: 'login@mail.com', password: 'pass' })
  const profile = await profileScraper('https://www.linkedin.com/in/some-profile/')
})()

Documentation:

  • scrapedin(options)

    • options Object:
      • email: LinkedIn login e-mail (required)
      • password: LinkedIn login password (required)
      • isHeadless: run the browser in headless mode, i.e. without displaying a window (default false)
      • hasToLog: print logs on stdout (default false)
      • puppeteerArgs: puppeteer launch options Object. Useful for passing Chromium flags through its args property, e.g. { args: ['--no-sandbox'] } (default undefined); see the example after the profile Object below
    • returns: Promise of profileScraper function
  • profileScraper(url, waitTimeMs = 500)

    • url string: A LinkedIn profile URL
    • waitTimeMs integer: milliseconds to wait for the page to load before scraping (default 500)
    • returns: Promise of profile Object
  • profile Object:

    {
      profile: {
        name, headline, location, summary, connections, followers
      },
      positions:[
        { title, company, description, date1, date2,
          roles: [{ title, description, date1, date2 }]
        }
      ],
      educations: [
        { title, degree, date1, date2 }
      ],
      skills: [
        { title, count }
      ],
      recommendations: [
        { user, text }
      ],
      recommendationsCount: {
        received, given
      },
      recommendationsReceived: [
        { user, text }
      ],
      recommendationsGiven: [
        { user, text }
      ],
      accomplishments: [
       { count, title, items }
      ],
      volunteerExperience: {
        title, experience, location, description, date1, date2
      },
      peopleAlsoViewed: [
        { user, text }
      ]
    }
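
Putting the documented options together, a fuller example might look like the sketch below. This is only illustrative: the e-mail, password, profile URL, and option values are placeholders, and the fields read at the end follow the profile Object structure documented above.

const scrapedin = require('scrapedin')

;(async () => {
  // isHeadless: true runs Chromium without a visible window;
  // puppeteerArgs is forwarded to puppeteer's launch options
  const profileScraper = await scrapedin({
    email: 'login@mail.com',
    password: 'pass',
    isHeadless: true,
    hasToLog: true,
    puppeteerArgs: { args: ['--no-sandbox'] }
  })

  // give the profile page 1000ms to load before scraping
  const profile = await profileScraper('https://www.linkedin.com/in/some-profile/', 1000)

  // the result follows the profile Object structure documented above
  console.log(profile.profile.name)
  console.log(profile.positions.map((position) => position.title))
})()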

Tips

  • We already built a crawler to automatically collect multiple profiles, so check it out: scrapedin-linkedin-crawler

  • Usually on the first run LinkedIn asks for a manual security check. To get past it, you should (see the sketch at the end of this tip):

    • set isHeadless to false on scrapedin so you can solve the manual check in the browser.
    • set waitTimeMs to a large value (such as 10000) so you have time to solve the manual check.

    After doing the manual check once, you can go back to your previous isHeadless and waitTimeMs values and start scraping.

    We still don't have a solution for this on remote servers without a GUI; if you have any idea, please tell us!
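
    For example, a first run to get past the manual check might look like this sketch (the credentials and profile URL are placeholders):

    const scrapedin = require('scrapedin')

    ;(async () => {
      // show the browser window so the manual check can be solved by hand
      const profileScraper = await scrapedin({
        email: 'login@mail.com',
        password: 'pass',
        isHeadless: false
      })

      // a large waitTimeMs leaves time to finish the check before scraping starts
      const profile = await profileScraper('https://www.linkedin.com/in/some-profile/', 10000)
      console.log(profile.profile.name)
    })()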

Contribution

Feel free to contribute. Just open an issue to discuss something before creating a PR.

License

Apache 2.0