
Equity Scrap

Equity Scrap is a project whose main goal is to collect data from different sources to use as input in the equity valuation process.

This first version covers the full process of scraping and storing rental car prices from Movida (MOVI3), a Brazilian equity listed on B3.

New sources (via APIs and web scraping) are expected in further versions.

For analytics, check my FinStrats module.

Examples

Movida

We have two options to get prices from Movida.

Massive Scrap

When you want to scrape a range of dates/places/times, this is the best way to do it. To tune your parameters, go to movida_mass_request.yaml and replace the values of the following variables:

  • date_range (int): the rolling number of days to be retrieved.
  • start_intervals (int): number of days from now at which to start the scrape.
  • end_intervals (int): number of days counted from start_intervals.
  • time (str): pick-up and delivery time (e.g. "08:00").
  • places (str): Movida store codes. The codes are available in json or csv.

Let's try an example:

date_range: 10  # Will look 10 periods forward from start_intervals

date_interval:
  start_intervals:
    - 1  # Start consulting 1 day from now
  end_intervals:
    - 10 # Will create a combination 1-day/11-day, 2-day/12-day, and so on...
    - 15 # Same but with 15 days interval
    - 20 # Same but with 20 days interval

time:
  - "08:00"  # Pick up and delivery time

places:
  - "CGH"  # Congonhas Airport
  - "GRU"  # Guarulhos Airport

Now we'll run movida.py. Just to illustrate, here are the last lines of the code:

if __name__ == '__main__':
    params_ = get_mass_params()
    df_list = set_up_threads(params_)
    df = pd.concat(df_list)
    # Connecting to sql
    conn = sqlite3.connect('db/data.db3')
    # Exporting to sql
    df.to_sql('movida', con=conn, if_exists='append', index=False)
    # Closing connection
    conn.close()
    end = timer()
    print(timedelta(seconds=end - start))

The code above reads the YAML file with the inputs, accesses Movida's website (4 threads at a time), and stores the output by appending the values to data.db3, specifically in the movida table.
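
The set_up_threads call itself is not shown above. Below is a minimal sketch of how it could fan the combinations out over 4 workers; scrap_prices is a hypothetical placeholder for the function that scrapes a single place/date combination, and only the 4-threads-at-a-time behaviour comes from the description above:

# Hedged sketch of the threaded fan-out. scrap_prices is a hypothetical
# placeholder; only the 4-worker pool reflects the behaviour described above.
from concurrent.futures import ThreadPoolExecutor


def set_up_threads(params):
    # Each call scrapes one combination and returns a DataFrame
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(scrap_prices, params))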

The log is printed to the screen, but it can also be checked in data.log. The snippet below is an example of the logging output:

28/10/2021 12:25:19 - __main__ - INFO - [CLEANING HTML] Cleaning html object and creating Data Frame Place:CGH, Start:06/11/2021 - 08:00, End:26/11/2021 - 08:00, 
28/10/2021 12:25:19 - __main__ - INFO - [CLEANING PASSED] Cleaning Passed Place:CGH, Start:06/11/2021 - 08:00, End:26/11/2021 - 08:00, 
28/10/2021 12:25:19 - __main__ - INFO - [CREATING DATAFRAME] Cleaning Passed Place:CGH, Start:06/11/2021 - 08:00, End:26/11/2021 - 08:00, 
28/10/2021 12:25:19 - __main__ - INFO - [DATAFRAME PASSED] Cleaning Passed Place:CGH, Start:06/11/2021 - 08:00, End:26/11/2021 - 08:00, 
28/10/2021 12:25:23 - __main__ - INFO - [DRIVER CLOSED] Place:GRU, Start:02/11/2021 - 08:00, End:12/11/2021 - 08:00, 
28/10/2021 12:25:23 - __main__ - INFO - [SCRAP PASSED] Place:GRU, Start:02/11/2021 - 08:00, End:12/11/2021 - 08:00, 
28/10/2021 12:25:23 - __main__ - INFO - [CLEANING HTML] Cleaning html object and creating Data Frame Place:GRU, Start:02/11/2021 - 08:00, End:12/11/2021 - 08:00, 
28/10/2021 12:25:23 - __main__ - INFO - [CLEANING PASSED] Cleaning Passed Place:GRU, Start:02/11/2021 - 08:00, End:12/11/2021 - 08:00, 
28/10/2021 12:25:23 - __main__ - INFO - [CREATING DATAFRAME] Cleaning Passed Place:GRU, Start:02/11/2021 - 08:00, End:12/11/2021 - 08:00, 
28/10/2021 12:25:23 - __main__ - INFO - [DATAFRAME PASSED] Cleaning Passed Place:GRU, Start:02/11/2021 - 08:00, End:12/11/2021 - 08:00, 
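
A logging configuration along these lines would produce that output both on screen and in data.log; this is inferred from the format above, not necessarily the exact setup used in movida.py:

import logging

# Sketch of a logger writing to the console and to data.log in the
# "dd/mm/YYYY HH:MM:SS - name - LEVEL - message" format seen above.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    datefmt='%d/%m/%Y %H:%M:%S',
    handlers=[logging.StreamHandler(), logging.FileHandler('data.log')],
)

logger = logging.getLogger(__name__)
logger.info('[SCRAP PASSED] Place:GRU, Start:02/11/2021 - 08:00, End:12/11/2021 - 08:00')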

This example generates 60 unique combinations (1 start interval × 3 end intervals × 10 days in the date range × 1 time × 2 places), which took about 9 minutes to run entirely.

In the Consulting Data section, we'll show the output of this run.

Regular Scrap

TBD

Consulting Data

Here we retrieve the values stored in data.db3 with a simple query.

The example can be found in read_movida.py; the code is pretty simple. It creates a connection to the SQLite database and reads all the values from the movida table:

import pandas as pd
import sqlite3

# Creating connection
conn = sqlite3.connect('db/data.db3')
movida = pd.read_sql('SELECT * FROM movida', con=conn)
conn.close()

The output head will be:

   place  extraction_dt               car_group                   start_date  end_date    start_time  end_time  cars                                 prices
0  CGH    2021-10-27 20:21:29.284320  GRUPO AX - MOVIDA ON        01/11/2021  20/11/2021  08:00       08:00     Mobi Like ou similar                 211.56
1  CGH    2021-10-27 20:21:29.284320  GRUPO B - ECONÔMICO         01/11/2021  20/11/2021  08:00       08:00     Uno, Onix, Gol, Argo, ou similar     233.43
2  CGH    2021-10-27 20:21:29.284320  GRUPO BS - ECONÔMICO SEDAN  01/11/2021  20/11/2021  08:00       08:00     Logan, Versa, Onix Plus ou Similar   253.13
3  CGH    2021-10-27 20:21:29.284320  GRUPO BX - ECONÔMICO PLUS   01/11/2021  20/11/2021  08:00       08:00     HB20 ou Similar                      235.17
4  CGH    2021-10-27 20:21:29.284320  GRUPO C - COMPACTO          01/11/2021  20/11/2021  08:00       08:00     Gol, Sandero, 208, HB20, ou similar  251.68
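
From here the table can be sliced with regular SQL. As an example (not part of read_movida.py), the query below pulls the average price per car group for Congonhas, using the column names shown above:

import pandas as pd
import sqlite3

# Average price per car group for a single store (CGH), illustrative query
conn = sqlite3.connect('db/data.db3')
cgh_avg = pd.read_sql(
    'SELECT car_group, AVG(prices) AS avg_price '
    'FROM movida WHERE place = ? GROUP BY car_group',
    con=conn,
    params=('CGH',),
)
conn.close()
print(cgh_avg)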