Although there are existing repositories for obtaining Fortune 1000 companies' data, such as LegendL3n's fortune-grabber, they no longer appear to be maintained. Moreover, the Fortune 1000 API has changed since 2015. Roysoumya modified LegendL3n's project, but that version can only obtain the data for 2017. This repository is a modified version of fortune-grabber that can crawl the annual data from 2002 to 2018. It also describes a method for obtaining the Fortune 1000 API code for future years, such as 2019 and 2020.
This repository needs Python 3.5.
To obtain Fortune 1000 data from 2002 to 2014:
python fortune_1000.py
To obtain Fortune 1000 data from 2015 to 2018:
python fortune_1000_15_18.py
The newest version of the Fortune 1000 API endpoint looks like this:
http://fortune.com/api/v2/list/{year_code}/expand/item/ranking/asc/{start_from}/{num_limit}/
{start_from} is the rank in the list from which crawling starts.
{num_limit} is the number of companies returned by a single request; the maximum is 100.
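A minimal sketch of how the endpoint can be paged through so that a full list of 1,000 companies is covered in batches of at most 100. The `page_urls` helper is hypothetical (not part of this repository), and treating `{start_from}` as a 0-based offset is an assumption to verify against the real API. The sketch only builds URLs and sends no requests; `{year_code}` is passed in as an opaque string.

```python
from typing import Iterator

# Endpoint template quoted from this README.
API_TEMPLATE = ("http://fortune.com/api/v2/list/{year_code}/expand/item/ranking/asc/"
                "{start_from}/{num_limit}/")
MAX_PER_REQUEST = 100  # documented maximum for {num_limit}


def page_urls(year_code: str, total: int = 1000, first: int = 0,
              num_limit: int = MAX_PER_REQUEST) -> Iterator[str]:
    """Yield one URL per batch so that `total` companies are covered.

    Assumption: {start_from} is a 0-based offset; pass first=1 if the API is 1-based.
    """
    if not 1 <= num_limit <= MAX_PER_REQUEST:
        raise ValueError(f"num_limit must be between 1 and {MAX_PER_REQUEST}")
    end = first + total
    for start in range(first, end, num_limit):
        # The final batch is shorter when total is not a multiple of num_limit.
        batch = min(num_limit, end - start)
        yield API_TEMPLATE.format(year_code=year_code, start_from=start, num_limit=batch)


# Example with the 2018 code listed in this README and a shortened list of 250.
for url in page_urls("2358051", total=250):
    print(url)
```

With the defaults, `page_urls("2358051")` yields ten URLs, starting at offset 0 and ending at offset 900, each requesting 100 companies.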
{year_code} can be found in an HTTP request. An example of obtaining this parameter with Chrome is shown below.
[Chrome screenshots of the request for 2015, 2016, 2017, and 2018]
In this way, the code corresponding to each year can be obtained.
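Instead of reading the code off the request by eye, a small helper could pull `{year_code}` out of a request URL copied from Chrome's Network tab. This assumes the copied URL follows the `/api/v2/list/{year_code}/...` pattern of the endpoint above; `extract_year_code` is a hypothetical helper, not part of this repository.

```python
import re
from typing import Optional

# Assumption: the URL copied from Chrome's Network tab has the form
# http://fortune.com/api/v2/list/{year_code}/expand/item/ranking/asc/{start_from}/{num_limit}/
_YEAR_CODE_RE = re.compile(r"/api/v2/list/(\d+)/")


def extract_year_code(request_url: str) -> Optional[str]:
    """Return the numeric {year_code} segment of a captured API URL, or None."""
    match = _YEAR_CODE_RE.search(request_url)
    return match.group(1) if match else None


# Example with the 2018 code listed in this README.
print(extract_year_code(
    "http://fortune.com/api/v2/list/2358051/expand/item/ranking/asc/0/100/"))  # prints 2358051
```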
To support future years, add new entries to this dict:
dict_api_code = {'2015': '1141696', '2016': '1666518', '2017': '2013055', '2018': '2358051'}
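A sketch of how a script might look up a year's code in this dict and fail with a clear message when a year has not been added yet. `year_code_for` is a hypothetical helper, not the repository's own code; because the dict keys are strings, an integer year is converted first.

```python
# Year -> year_code mapping, copied from this README.
dict_api_code = {'2015': '1141696', '2016': '1666518', '2017': '2013055', '2018': '2358051'}


def year_code_for(year) -> str:
    """Return the API code for `year` (int or str); raise KeyError if it is missing."""
    key = str(year)  # the dict keys are strings such as '2018'
    if key not in dict_api_code:
        known = ", ".join(sorted(dict_api_code))
        raise KeyError(f"no API code for {key}; known years: {known}")
    return dict_api_code[key]


# Hypothetical: once a future year's code has been found in Chrome, register it:
# dict_api_code['2019'] = '<code copied from the request URL>'

print(year_code_for(2018))  # prints 2358051
```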
As with fortune-grabber, the crawled data is saved in the output folder.