
self.quote.get_data(stock_id, start_time, end_time, "$close") fails with a different pd.Timestamp

jimrok opened this issue · 1 comment

🐛 Bug Description

I encountered a bug while running a backtest with qlib. I'm not sure how to resolve it, so I'll first report where the issue arises. Querying data from the quote object requires a start_time parameter. During a backtest, start_time comes from the trade_calendar, whose data is read from a list of calendar files. The calendar constructs its times with pd.Timestamp(x), which causes a problem: that pd.Timestamp does not carry nanosecond information, and using it as a query parameter returns incorrect data. The simplified code snippet below illustrates the issue:

To Reproduce

Steps to reproduce the behavior:
The following code reproduces the behavior described above.

import qlib
from qlib.constant import REG_CN # region in [REG_CN, REG_US]
from qlib.data import D
from qlib.backtest.high_performance_ds import BaseQuote, NumpyQuote
import pandas as pd

freq = 'day'
start_time = '2023-01-03'
end_date = '2023-01-30'
provider_uri = 'd:/qlib_data/cn_data'
qlib.init(provider_uri=provider_uri, region=REG_CN)
codes = D.instruments()
all_fields = ["$close","$factor","$change","$volume"]
end_time = '2023-08-30'
quote_df = D.features(codes, all_fields, start_time=start_time, end_time=end_time, freq=freq)

quote = NumpyQuote(quote_df,freq)

quote.get_data('000610.SZ', pd.to_datetime('2023-01-04'), pd.to_datetime('2023-01-04'), "$close")

# Print the index map; its keys are numpy.datetime64 values with nanosecond precision.
# print(qdata['000610.SZ'].loc._indices[0].index_map)
# {numpy.datetime64('2023-01-03T00:00:00.000000000'): 0,
# numpy.datetime64('2023-01-04T00:00:00.000000000'): 1,
# numpy.datetime64('2023-01-05T00:00:00.000000000'): 2,...

# This call returns a value.
quote.get_data('000610.SZ', pd.to_datetime('2023-01-04'), pd.to_datetime('2023-01-04'), "$close")

from qlib.data.data import Cal
_calendar = Cal.calendar(freq='day', future=True)
print(_calendar[-181]) # Timestamp('2023-01-04 00:00:00')

# This call returns None.
stime = _calendar[-181]
quote.get_data('000610.SZ', stime,stime, "$close")

Expected Behavior

qlib's finest supported frequency is the minute level, so whether a pd.Timestamp includes nanosecond information should not matter. However, NumpyQuote retains nanosecond information in its index when it is constructed, which makes it inconsistent with the times generated by the calendar. Both sides should use a unified method for time conversion when handling pd.Timestamp.
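
As a rough illustration of what a unified conversion could look like, the sketch below normalizes every query timestamp to numpy.datetime64[ns] before looking it up in a dict keyed by nanosecond-precision values. The names to_ns_key, lookup, and index_map are hypothetical stand-ins and are not part of qlib; the dict only imitates the index map printed in the reproduction code.

```python
import numpy as np
import pandas as pd


def to_ns_key(ts):
    """Hypothetical helper: coerce a timestamp-like value to numpy.datetime64[ns]."""
    return pd.Timestamp(ts).to_numpy().astype("datetime64[ns]")


# Stand-in for the quote's index map (an assumption, not qlib's actual structure):
# keys are nanosecond-precision numpy.datetime64 values, values are row positions.
dates = np.array(["2023-01-03", "2023-01-04", "2023-01-05"], dtype="datetime64[ns]")
index_map = {key: pos for pos, key in enumerate(dates)}


def lookup(index_map, ts):
    """Return the row position for ts, or None if the date is absent."""
    return index_map.get(to_ns_key(ts))


# Inputs of different origin and resolution are normalized to the same key type.
pos_from_to_datetime = lookup(index_map, pd.to_datetime("2023-01-04"))
pos_from_seconds = lookup(index_map, np.datetime64("2023-01-04", "s"))
pos_missing = lookup(index_map, pd.Timestamp("2023-02-01"))
print(pos_from_to_datetime, pos_from_seconds, pos_missing)
```

Normalizing at the lookup boundary keeps the stored keys untouched, so the same approach would apply regardless of how the calendar or the caller built its timestamps.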


Note: Users can run cd scripts && python collect_info.py all under the project directory to get system information and paste it here directly.

  • Qlib version: 0.93
  • Python version: 3.9.13
  • OS (Windows, Linux, MacOS): Windows
  • Commit number (optional, please provide it if you are using the dev version):

Additional Notes

The operation Cal.calendar(freq='day', future=True) yields a List[pd.Timestamp], aligning well with the output format of pd.to_datetime() which produces a pd.Timestamp. Consequently, there's no discrepancy between the following two code snippets. Through personal testing, both methods successfully retrieve identical values:

# Method 1: Utilizing pd.to_datetime for date conversion
data1 = quote.get_data('000610.SZ', pd.to_datetime('2023-01-04'), pd.to_datetime('2023-01-04'), "$close")

# Method 2: Leveraging the calendar list for date specification
data2 = quote.get_data('000610.SZ', _calendar[-181], _calendar[-181], "$close")

The numpy.datetime64 used within NumpyQuote is the direct output of invoking pd.Timestamp.to_numpy(). Hence, the current time formats are consistent.
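
For anyone who wants to check this in their own environment, a small diagnostic along the following lines may help. It is only a sketch using pandas and NumPy calls (no qlib), and the variable names are illustrative. Because newer pandas releases support non-nanosecond resolutions, the unit reported here can differ between installations even when the values compare equal.

```python
import numpy as np
import pandas as pd

# The same date as in the report, built the way the query code builds it.
ts = pd.to_datetime("2023-01-04")
as_np = ts.to_numpy()  # numpy.datetime64 scalar

# Resolution of the converted value, e.g. "ns" or "s" depending on the pandas version.
unit, _ = np.datetime_data(as_np.dtype)

# The nanosecond-precision key as it appears in the printed index map.
ns_key = np.datetime64("2023-01-04T00:00:00.000000000")

values_equal = bool(as_np == ns_key)  # compares the instants, ignoring the unit
same_unit = unit == "ns"              # whether the resolutions also match
print(type(as_np).__name__, unit, values_equal, same_unit)
```

If values_equal is True but same_unit is False, comparing the two values works while a dict lookup keyed on the nanosecond value may still behave differently, which would be worth ruling out before closing the report.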