

Primary language: Python. License: MIT.

receita-tools


A set of tools for automated information retrieval from the Secretary of the Federal Revenue of Brazil website. The tools use the receitaws.com.br web service to retrieve information about any Brazilian companies you choose.

The easiest way to install the tools is with pip:

pip install receita-tools

These tools let you easily retrieve data from Receita's website. You can get information about multiple companies at once. They also let you create a few CSV files so you can easily import the retrieved data into your system.

The data retriever program works from a CSV file listing the CNPJs it should look up. This file must have at least one column, and the first column must contain the CNPJ of each company you want information about.
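As a minimal sketch of preparing such an input file, the Python snippet below writes one CNPJ per row in the first column. The helper name `write_cnpj_file` and the choice to strip punctuation and keep only 14-digit values are assumptions for illustration; this README does not state whether formatted CNPJs or a header row are accepted by the tool.

```python
import csv
import re


def write_cnpj_file(cnpjs, path):
    """Write one CNPJ per row, in the first column of a CSV file.

    Hypothetical helper: stripping punctuation and skipping entries that
    are not 14 digits long are assumptions, not documented tool behavior.
    """
    written = []
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for raw in cnpjs:
            digits = re.sub(r"\D", "", raw)  # keep digits only
            if len(digits) != 14:            # a CNPJ has 14 digits
                continue
            writer.writerow([digits])
            written.append(digits)
    return written
```

For example, `write_cnpj_file(["11.222.333/0001-81"], "cnpj.csv")` would produce a one-line file that can then be passed to the get command. The value shown is a placeholder.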

You can run receita get cnpj.csv to get information for the CNPJs in that CSV file. By default, the retrieved data is saved in a data directory inside the directory where you ran the command. You can change the directory with the --output option.

With the data saved locally, you can run the receita build command to build the CSV files you need. By default, it creates two CSV files: companies.csv, which contains general information about each company, and activities.csv, which contains information about the activities of each company.
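The column names of the generated files are not listed in this README, so a quick way to see what they contain is to read the first row. The sketch below assumes only that the outputs are ordinary UTF-8 CSV files whose first row is a header; both points are assumptions, and the helper name `summarize_csv` is illustrative.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # first row, assumed to be a header
        row_count = sum(1 for _ in reader)  # remaining rows are data
    return header, row_count
```

Calling `summarize_csv` on the path to companies.csv or activities.csv returns the column names and the number of data rows, without assuming what those columns are.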

To get data and save it to the cnpj_data folder:

receita get list.csv --output cnpj_data

Keep in mind that you can use absolute or relative paths. You can now run the build command. If you did not use the default directory to save the data, you need to specify it with --input. You can also specify the directory where the generated files will be stored.

receita build --input cnpj_data --output results
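The two steps can also be scripted. The sketch below builds the same command lines shown above and runs them with Python's `subprocess` module. It uses only the `get`, `build`, `--input`, and `--output` arguments that appear in this README; the function names are hypothetical, and it assumes the `receita` executable is on your PATH.

```python
import subprocess


def pipeline_commands(cnpj_file, data_dir, results_dir):
    """Return the argument lists for the get and build steps."""
    return [
        ["receita", "get", cnpj_file, "--output", data_dir],
        ["receita", "build", "--input", data_dir, "--output", results_dir],
    ]


def run_pipeline(cnpj_file, data_dir="cnpj_data", results_dir="results"):
    """Run both steps in order, stopping if either command fails."""
    for command in pipeline_commands(cnpj_file, data_dir, results_dir):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(command, check=True)
```

Keeping the command construction separate from execution makes it easy to print or log the commands before running them.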

You can always use the --help option to get help about a command. You can also use it with the subcommands, like receita build --help.

  • #2: Fixed error when handling invalid company data

The performance of the web service is very limited, because Receita's website actively blocks access when a single IP makes a large number of requests. We try to run workers on multiple IPs to allow faster responses, but in any case your code must be prepared to wait a long time for a response (5 minutes or more). Results are cached, so if you prefer, you can trigger many requests and check their results later.
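Because a response can take minutes, client code generally needs a generous deadline and some form of polling. The sketch below shows one possible approach: it calls a caller-supplied `fetch` function repeatedly, waiting between attempts, until a result arrives or the deadline passes. Everything here is an assumption for illustration (the helper name `wait_for_result`, the convention that `fetch` returns `None` while the result is not ready, and the default timings); it does not describe how receita-tools or the web service work internally.

```python
import time


def wait_for_result(fetch, timeout=600.0, interval=15.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it returns a non-None value or timeout elapses.

    fetch is any zero-argument callable; by the convention assumed here it
    returns None while the result is not ready yet.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        # Give up if another wait would take us past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError("no result within %.0f seconds" % timeout)
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the waiting behavior can be replaced in tests; in normal use the defaults are enough.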

No information will be provided about how the web service decodes captchas. The last time this information was available, there were changes that broke the service. Basic company information should be easily available so that business in the country can be more transparent.

Decoding success rates will be made available in the future, but they are not very good. If you know a way to achieve more than 70% decoding success, please let me know. There is a tool to download sample captchas to help with any development in that area.