🧹
Quantclean "Make it cleaner, make it leaner"
A program that reformats any financial dataset to the US Equity TradeBar layout (QuantConnect format)
We have all faced the problem of reformatting our data to a standard. Manual data cleaning is boring and takes time. Quantclean is here to help you and to make your life easier as a quant.
Works great with data from Quandl, Algoseek, Alpha Vantage, yfinance, and many more...
🍉
A few things you may want to know before getting started:
- If your dataset is missing an open, close, volume, high, low, or date column, quantclean will create a blank column for it. No problem!
- If you have a date and a time column (or both in the same column), the generated dataframe will look like this:
Date | Open | High | Low | Close | Volume |
---|---|---|---|---|---|
20131001 09:00 | 6448000 | 6448000 | 6448000 | 6448000 | 90 |
- Date - String date "YYYYMMDD HH:MM" in the timezone of the data format.
- Open - Deci-cents Open Price for TradeBar.
- High - Deci-cents High Price for TradeBar.
- Low - Deci-cents Low Price for TradeBar.
- Close - Deci-cents Close Price for TradeBar.
- Volume - Number of shares traded in this TradeBar.
- You can also get something like this if you use the `sweeper_dash` function instead of `sweeper`:
Date | Open | High | Low | Close | Volume |
---|---|---|---|---|---|
2013-10-01 09:00:00 | 6448000 | 6448000 | 6448000 | 6448000 | 90 |
As you can see, the date is now formatted as YYYY-MM-DD rather than YYYYMMDD.
- If you only have a date column (e.g. something like YYYY-MM-DD), it will look like this:
Date | Open | High | Low | Close | Volume |
---|---|---|---|---|---|
20131001 | 6448000 | 6448000 | 6448000 | 6448000 | 90 |
You can also use the `sweeper_dash` function here.
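The layout rules listed above (six fixed columns, blank columns for anything missing, prices in deci-cents) can be sketched with plain pandas. This is only an illustration, not quantclean's actual implementation: `TRADEBAR_COLUMNS`, `add_missing_columns`, and `to_deci_cents` are hypothetical names, the input frame is made up, and the factor of 10,000 assumes QuantConnect's convention of storing prices as the dollar price multiplied by 10,000.

```python
import pandas as pd

# Column order of the TradeBar-style layout described above.
TRADEBAR_COLUMNS = ["Date", "Open", "High", "Low", "Close", "Volume"]

# Assumption: stored price = dollar price * 10,000 (QuantConnect's "deci-cents").
PRICE_SCALE = 10_000


def add_missing_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy restricted to the six columns; absent ones are created as NaN."""
    # reindex keeps the listed columns that exist and adds the missing ones as NaN.
    # In this sketch, any column not in the list is dropped.
    return df.reindex(columns=TRADEBAR_COLUMNS)


def to_deci_cents(price: float) -> int:
    """Scale a dollar price to an integer using the assumed factor."""
    return int(round(price * PRICE_SCALE))


# Hypothetical input that only has a date and a close price.
raw = pd.DataFrame({"Date": ["20131001"], "Close": [644.80]})

bars = add_missing_columns(raw)
bars["Close"] = bars["Close"].map(to_deci_cents)

print(bars)
```

Here `Open`, `High`, `Low`, and `Volume` come out as blank (NaN) columns, and the close price of 644.80 becomes 6448000, matching the values in the tables above.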
🚀
How to use it? First, download the quantclean.py file into the folder where you are working.
Note: I took this data from Quandl. Your dataset doesn't necessarily have to look like this one; quantclean adapts to your dataset as well as possible.
import pandas as pd
from quantclean import sweeper
df = pd.read_csv('AS-N100.csv')
df
_df = sweeper(df)
_df
Output:
Now, you may not be happy with this date column, which is presented in the YYYYMMDD format, and may prefer YYYY-MM-DD.
In that case, do:
from quantclean import sweeper_dash
df_dash = sweeper_dash(df)
df_dash
Output:
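For comparison, the change of date format described above (YYYYMMDD to YYYY-MM-DD) can be reproduced with plain pandas. This is a minimal sketch, not how `sweeper_dash` is implemented; the variable names and sample values are illustrative. The dashed form of a date-only column is an assumption, since only the date-and-time case is shown in the tables above.

```python
import pandas as pd

# Dates as they appear in the default layout, with and without a time part.
with_time = pd.Series(["20131001 09:00"])
date_only = pd.Series(["20131001"])

# Parse with an explicit format, then render with dashes.
dashed_with_time = pd.to_datetime(with_time, format="%Y%m%d %H:%M").dt.strftime(
    "%Y-%m-%d %H:%M:%S"
)
dashed_date_only = pd.to_datetime(date_only, format="%Y%m%d").dt.strftime("%Y-%m-%d")

print(dashed_with_time[0])  # 2013-10-01 09:00:00
print(dashed_date_only[0])  # 2013-10-01
```

Passing an explicit `format` avoids any ambiguity when parsing compact date strings such as `20131001`.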
Contribution
If you have suggestions or improvements, don't hesitate to open an issue or make a pull request. Any help is welcome!