tushare
stock_df = tushare.get_k_data(code='600519', start='2010-01-01', end='2018-05-09',ktype='60')
- to_datetime
stock_df['date'] = pd.to_datetime(stock_df['date'])
date open close high low volume code
0 2017-11-08 14:00 649.31 652.50 654.45 648.10 11824.0 600519
1 2017-11-08 15:00 652.50 650.38 652.50 647.17 9766.0 600519
2 2017-11-09 10:30 648.00 652.23 654.15 644.90 13409.0 600519
3 2017-11-09 11:30 652.00 648.25 652.39 647.55 6765.0 600519
4 2017-11-09 14:00 648.25 647.17 649.10 646.00 5350.0 600519
- set datetime to index
stock_df.set_index('date',inplace=True)
- resample
# resampled by day and set the last figure as this day's value
resampled_df = stock_df.resample('D').last()
resampled_df.dropna(inplace=True)
- ONLY yyyy-mm-dd hh:mm could be resampled
- Because of Index column being set already, do NOT need to stock_df['date'].resample('D').last()
- rolling(window=).func()
resampled_df['MA 5'] = resampled_df['close'].rolling(window=5).mean()
- plot
resampled_df[ ['close','MA 5','MA 30','MA 60'] ].plot()
combine multi-columns to datetime and set to index, rolling
year month day hour season PM_China PM_US
2013 3 5 8 1 166.0 150.0
2013 3 5 9 1 165.0 163.0
2013 3 5 10 1 173.0 172.0
2013 3 5 11 1 182.0 192.0
2013 3 5 12 1 182.0 181.0
- data_df
data_df = pd.read_csv('./csv')
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20248 entries, 1520 to 26278
Data columns (total 7 columns):
year 20248 non-null int64
month 20248 non-null int64
day 20248 non-null int64
hour 20248 non-null int64
season 20248 non-null int64
PM_China 20248 non-null float64
PM_US 20248 non-null float64
dtypes: float64(2), int64(5)
memory usage: 1.2 MB
- data format transform
data_df['hour'] = data_df['hour'].astype('str') + ':00'
data_df['year'] = data_df['year'].astype('str')
data_df['month'] = data_df['month'].astype('str')
data_df['day'] = data_df['day'].astype('str')
year month day hour season PM_China PM_US
2013 3 5 8:00 1 166.0 150.0
2013 3 5 9:00 1 165.0 163.0
2013 3 5 10:00 1 173.0 172.0
2013 3 5 11:00 1 182.0 192.0
2013 3 5 12:00 1 182.0 181.0
<class 'pandas.core.frame.DataFrame'>
Int64Index: 20248 entries, 1520 to 26278
Data columns (total 9 columns):
year 20248 non-null object
month 20248 non-null object
day 20248 non-null object
hour 20248 non-null object
season 20248 non-null int64
PM_China 20248 non-null float64
PM_US 20248 non-null float64
dtypes: category(2), float64(2), int64(1), object(4)
memory usage: 1.3+ MB
- combine datetime
data_df['date'] = data_df['year'].str.cat( [ data_df['month'],data_df['day'] ], sep='-')
data_df['timestamp'] = data_df['date'].str.cat(data_df['hour'],sep=' ')
Multi-columns combine: new = A.str.cat( [ B, C ], sep='-' ) => A-B-C
date timestamp
2013-3-5 2013-3-5 8:00
2013-3-5 2013-3-5 9:00
2013-3-5 2013-3-5 10:00
2013-3-5 2013-3-5 11:00
2013-3-5 2013-3-5 12:00
- to_datetime
data_df['timesatmp'] = pd.to_datetime(data_df['timesatmp'])
date timestamp
2013-3-5 2013-03-05 08:00:00
2013-3-5 2013-03-05 09:00:00
2013-3-5 2013-03-05 10:00:00
2013-3-5 2013-03-05 11:00:00
2013-3-5 2013-03-05 12:00:00
- set index
data_df.set_index('timestamp', inplace=True)
index column has become to the 1st column.
year month day hour season PM_China PM_US
timestamp
2013-03-05 08:00:00 2013 3 5 8:00 1 166.0 150.0
2013-03-05 09:00:00 2013 3 5 9:00 1 165.0 163.0
2013-03-05 10:00:00 2013 3 5 10:00 1 173.0 172.0
2013-03-05 11:00:00 2013 3 5 11:00 1 182.0 192.0
2013-03-05 12:00:00 2013 3 5 12:00 1 182.0 181.0
- resample
resampled_df = data_df.resample('D').mean()
all int/float columns will return resampled mean value
season PM_China PM_US
timestamp
2013-03-05 1.0 206.000000 216.937500
2013-03-06 1.0 222.217391 226.043478
2013-03-07 1.0 312.434783 323.826087
2013-03-08 1.0 231.863636 221.909091
2013-03-09 1.0 61.478261 61.695652
- rolling & plot
resampled_df['MA 7'] = resampled_df['PM_China'].rolling(window=7).mean()
resampled_df['MA 30'] = resampled_df['PM_China'].rolling(window=30).mean()
resampled_df[ ['MA 7'], ['MA 30'] ].plot()