LinkedinProfileParser

LinkedIn public profile pages scraper

DESCRIPTION

This simple parser is aimed at extracting data from LinkedIn public profiles. The simple REST API is written in Python 2.7.2, Bottle 0.10.9 and the "Swiss army knife" Scrapy 0.14.1.
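As a rough sketch only (not the project's actual code), the Bottle side of the service could look roughly like this; parse_profile is a hypothetical placeholder for the Scrapy-based parsing logic:

    # Sketch of the REST layer, assuming Bottle 0.10 / Python 2.7.
    # parse_profile is a hypothetical placeholder for the Scrapy-based parser.
    from bottle import route, request, run

    def parse_profile(url):
        # Hypothetical: crawl the public profile page and return a dict with
        # "educations", "tags", "experiences" and "html" keys.
        raise NotImplementedError

    @route('/doparse')
    def doparse():
        address = request.query.address
        if not address:
            return {"error": {"message": "missing address parameter", "code": 3}}
        try:
            return parse_profile(address)  # Bottle serializes dicts to JSON
        except Exception, e:
            return {"error": {"message": str(e), "code": 1}}

    if __name__ == '__main__':
        run(host='localhost', port=8080)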

API SPECIFICATION

Parse a URL for skills and education: localhost:8080/doparse?address="public profile url"

Output is JSON with the following format:

    {
      "educations": [{"school": "XXX", "year_last": "YYY", "year_first": "ZZZ"}, ...],
      "tags": ["MYSQL Database design", "PYTHON", ...],
      "experiences": [{"title": "XXX", "company": "YYY", "year_last": "ZZZ", "year_first": "XXX", "description": "YYY"}, ...],
      "html": "XXX"
    }

Sample request

localhost:8080/doparse?address=http://fr.linkedin.com/in/vasylvaskul/

Sample output

{"educations": [{"school": "Science Po, Coll\u00e8ge des Ing\u00e9nieurs, \u00c9cole des Mines de Paris", "year_last": "2011", "year_first": "2010"}, {"school": "Kyiv National Taras Shevchenko University", "year_last": "2007", "year_first": "2001"}, {"school": "Drohobych Lyceum at Drohobych State 'Ivan Franko' University", "year_last": "2001", "year_first": "1999"}], "tags": []} In case of parsing problems error is returned: ex. {"error": {"message": "HTTP Response 404", "code":X}}

where the code X can be one of the following:

1 - network problem
2 - page not found (404)
3 - bad format
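A minimal client-side sketch (Python 2, standard library only) of calling the endpoint and handling these error codes; the host, port and profile URL are simply the values from the sample request above:

    # Sketch of a client call against a locally running instance.
    import json
    import urllib
    import urllib2

    profile = "http://fr.linkedin.com/in/vasylvaskul/"
    url = "http://localhost:8080/doparse?" + urllib.urlencode({"address": profile})

    data = json.loads(urllib2.urlopen(url).read())
    if "error" in data:
        # code 1: network problem, 2: page not found, 3: bad format
        print "parse failed: %s (code %s)" % (data["error"]["message"], data["error"]["code"])
    else:
        print "schools:", [e["school"] for e in data.get("educations", [])]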

CACHING

Currently the system is stateless; every new request re-parses the page.
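If caching were added, one minimal option (a sketch only, reusing the hypothetical parse_profile from the sketch above) is an in-memory dict keyed by the profile URL:

    # In-memory cache keyed by profile URL; lost on restart, so a database
    # (see TODO) would be needed for persistence.
    _cache = {}

    def parse_profile_cached(url):
        if url not in _cache:
            _cache[url] = parse_profile(url)  # parse_profile: hypothetical parser
        return _cache[url]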

TODO

Parse experiences
Store results in a database

OPEN ISSUES

Use the LinkedIn API to parse the provider's profile, but using the users' access_token?
Cache the requests or not?

HOWTO RUN

python main.py to start the server