DESCRIPTION
This simple parser extracts data from LinkedIn public profiles. It exposes a simple REST API written with Python 2.7.2, bottle 0.10.9 and the "swiss army knife" scrapy 0.14.1.
To parse a public profile for competences and education, request:
localhost:8080/doparse?address=<public profile url>
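A minimal client-side sketch of building that request URL. The helper name build_doparse_url and the host default are hypothetical and not part of this project. Percent-encoding the address is an assumption: the example in this README passes the profile URL unencoded, and encoding only matters when the profile URL itself contains characters such as & or ?.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, the version the server targets


def build_doparse_url(profile_url, host="localhost:8080"):
    """Return the /doparse URL for a public profile (hypothetical helper).

    The profile URL is percent-encoded so that it travels as a single
    'address' query parameter.
    """
    query = urlencode({"address": profile_url})
    return "http://%s/doparse?%s" % (host, query)


if __name__ == "__main__":
    print(build_doparse_url("http://fr.linkedin.com/in/vasylvaskul/"))
```

The resulting URL can be opened with any HTTP client once the server is running.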
The output is JSON in the following format:
{
  "educations": [{"school": "XXX", "year_last": "YYY", "year_first": "ZZZ"}, ...],
  "tags": ["MYSQL Database design", "PYTHON", ...],
  "experiences": [{"title": "XXX", "company": "YYY", "year_last": "ZZZ", "year_first": "XXX", "description": "YYY"}, ...],
  "html": "XXX"
}
Example request:
localhost:8080/doparse?address=http://fr.linkedin.com/in/vasylvaskul/
Example response:
{"educations": [{"school": "Science Po, Coll\u00e8ge des Ing\u00e9nieurs, \u00c9cole des Mines de Paris", "year_last": "2011", "year_first": "2010"}, {"school": "Kyiv National Taras Shevchenko University", "year_last": "2007", "year_first": "2001"}, {"school": "Drohobych Lyceum at Drohobych State 'Ivan Franko' University", "year_last": "2001", "year_first": "1999"}], "tags": []}
If parsing fails, an error is returned instead, e.g.
{"error": {"message": "HTTP Response 404", "code": X}}
where code X can be one of the following:
1 - network problem
2 - page is not found (404)
3 - bad format
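A client might tell a successful result from an error along these lines. This is only a sketch: ProfileParseError and read_response are hypothetical names that do not exist in this project, and the code-to-meaning table simply repeats the three codes listed above.

```python
import json

# Meanings of the error codes, copied from the list above.
ERROR_CODES = {
    1: "network problem",
    2: "page is not found (404)",
    3: "bad format",
}


class ProfileParseError(Exception):
    """Raised when the service answers with an "error" object (hypothetical)."""

    def __init__(self, code, message):
        self.code = code
        self.message = message
        # Fall back to a generic label for codes this README does not list.
        self.reason = ERROR_CODES.get(code, "unknown error")
        Exception.__init__(self, "%s (code %s): %s" % (self.reason, code, message))


def read_response(body):
    """Decode a /doparse response body; raise ProfileParseError on failure."""
    data = json.loads(body)
    error = data.get("error")
    if error is not None:
        raise ProfileParseError(error.get("code"), error.get("message"))
    return data


if __name__ == "__main__":
    profile = read_response('{"educations": [], "tags": []}')
    print(profile["tags"])
```

Raising an exception keeps the success path free of error checks; returning the error dictionary unchanged would work just as well.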
Currently the system is stateless: every new request re-parses the page.
Still to do: parse experiences and store results in a database.
Open questions: should the LinkedIn API be used to parse a provider's profile, using the users' access_token? Should requests be cached or not?
Run python main.py to start the server.