polygon-io/client-python

Incorrect Forex Price Spikes

Closed this issue · 6 comments

Forex WebSocket prices ("CAS.*") and Agg data show occasional price spikes. These spikes don't appear on the TradingView chart or on Forex.com.

I found the reason why:
Polygon.io is using exchange bid price instead of trade price to populate websocket and agg data.

Here I am printing prices of CADJPY. The price 112.735 is an outlier that never happened. That price came from a stink bid on the exchange.

Printouts of Aggs and Quotes for the surrounding timestamps:

agg Agg(open=113.7, high=113.7, low=113.7, close=113.7, volume=1, vwap=113.7, timestamp=1715288494000, transactions=1, otc=None) dt 8999 ms
agg Agg(open=112.735, high=112.735, low=112.735, close=112.735, volume=1, vwap=112.735, timestamp=1715288495000, transactions=1, otc=None) dt 7999 ms
agg Agg(open=112.735, high=112.735, low=112.735, close=112.735, volume=1, vwap=112.735, timestamp=1715288497000, transactions=1, otc=None) dt 5999 ms
agg Agg(open=113.53, high=113.53, low=113.53, close=113.53, volume=1, vwap=113.53, timestamp=1715288500000, transactions=1, otc=None) dt 2999 ms
agg Agg(open=113.682, high=113.682, low=113.682, close=113.682, volume=1, vwap=113.682, timestamp=1715288503000, transactions=1, otc=None) dt -1 ms
agg Agg(open=112.735, high=112.735, low=112.735, close=112.735, volume=1, vwap=112.735, timestamp=1715288504000, transactions=1, otc=None) dt -1001 ms
agg Agg(open=113.62, high=113.62, low=113.62, close=113.62, volume=1, vwap=113.62, timestamp=1715288505000, transactions=1, otc=None) dt -2001 ms
agg Agg(open=113.7, high=113.7, low=113.7, close=113.7, volume=1, vwap=113.7, timestamp=1715288509000, transactions=1, otc=None) dt -6001 ms
agg Agg(open=113.63, high=113.63, low=113.63, close=113.63, volume=1, vwap=113.63, timestamp=1715288510000, transactions=1, otc=None) dt -7001 ms

quote Quote(ask_exchange=48, ask_price=113.66, ask_size=None, bid_exchange=48, bid_price=113.62, bid_size=None, conditions=None, indicators=None, participant_timestamp=1715288505000000000, sequence_number=None, sip_timestamp=None, tape=None, trf_timestamp=None)
quote Quote(ask_exchange=48, ask_price=114.358, ask_size=None, bid_exchange=48, bid_price=112.735, bid_size=None, conditions=None, indicators=None, participant_timestamp=1715288504000000000, sequence_number=None, sip_timestamp=None, tape=None, trf_timestamp=None)
quote Quote(ask_exchange=48, ask_price=113.805, ask_size=None, bid_exchange=48, bid_price=113.682, bid_size=None, conditions=None, indicators=None, participant_timestamp=1715288503000000000, sequence_number=None, sip_timestamp=None, tape=None, trf_timestamp=None)
quote Quote(ask_exchange=48, ask_price=113.57, ask_size=None, bid_exchange=48, bid_price=113.53, bid_size=None, conditions=None, indicators=None, participant_timestamp=1715288500000000000, sequence_number=None, sip_timestamp=None, tape=None, trf_timestamp=None)
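The printouts above can be cross-checked with a short script. This is only a sketch: the values are copied by hand from the Agg and Quote lines in this issue, and the variable names (`aggs`, `quotes`, `bid_by_ms`) are illustrative, not part of the client library. It matches each agg to the quote with the same timestamp and compares the agg close with the quote bid.

```python
# (timestamp_ms, close) copied from the Agg printouts above
aggs = [
    (1715288500000, 113.53),
    (1715288503000, 113.682),
    (1715288504000, 112.735),
    (1715288505000, 113.62),
]

# (participant_timestamp_ns, bid_price, ask_price) copied from the Quote printouts
quotes = [
    (1715288505000000000, 113.62, 113.66),
    (1715288504000000000, 112.735, 114.358),
    (1715288503000000000, 113.682, 113.805),
    (1715288500000000000, 113.53, 113.57),
]

# Quote timestamps are in nanoseconds, agg timestamps in milliseconds.
bid_by_ms = {ts_ns // 1_000_000: bid for ts_ns, bid, _ask in quotes}

# For every agg, look up the bid of the quote with the same timestamp.
matches = [(ts, close, bid_by_ms.get(ts)) for ts, close in aggs]
for ts, close, bid in matches:
    print(ts, "close:", close, "bid:", bid, "equal:", close == bid)
```

For these four timestamps the agg close equals the quote bid in every case, including the 112.735 outlier.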


Attaching images of these crazy spikes when monitoring websocket data:

(three images attached)

Possible solutions:

  1. Get actual trade data from data providers and use this to populate websocket and agg data instead of faking it.

  2. If trade data is not available, I recommend averaging the bid and ask prices instead of using only the bid price. If the delta between bid and ask is too high, discard the price point. This seems to produce reasonable results.
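A minimal sketch of solution 2, not the code from the branch mentioned in this issue. The function name `mid_price`, the parameter `max_rel_spread`, its default value, and the choice to measure the spread relative to the midpoint are all assumptions made for illustration.

```python
from typing import Optional


def mid_price(bid: float, ask: float, max_rel_spread: float = 0.005) -> Optional[float]:
    """Return the bid/ask midpoint, or None if the quote should be discarded.

    max_rel_spread is a hypothetical threshold (fraction of the midpoint);
    tune it per currency pair.
    """
    # Reject non-positive or crossed quotes (a design choice, not from the issue).
    if bid <= 0 or ask <= 0 or ask < bid:
        return None
    mid = (bid + ask) / 2.0
    # Discard the price point when bid and ask are too far apart.
    if (ask - bid) / mid > max_rel_spread:
        return None
    return mid


# Two quotes from the printouts above: a normal one and the one with the stink bid.
print(mid_price(113.62, 113.66))
print(mid_price(112.735, 114.358))
```

Returning `None` rather than a clamped price keeps the decision about gaps (skip the point, carry the last value forward) with the caller.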

Without a fix, there seems to be little point in having the "CAS.*" and Agg data when they merely repeat the bid price already in the Quote data. I have already implemented solution #2 in my branch if you want to reference it. Solution #1 is the proper and preferred fix, though.

Hey @jbonilla-tao, sorry for the delay here. I've pinged the backend team to check this out and I'll let you know. Thanks for the detailed write-up and images too. This really helps.

Hi @justinpolygon, thanks for responding. I found that discarding quotes whose bid and ask are more than 0.5% apart removes these spikes.
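To see how far apart the quotes in this issue actually are, here is a small sketch that prints the relative spread of each one. The bid/ask pairs are copied from the Quote printouts earlier in the thread. Measuring the spread relative to the midpoint is an assumption, since the comment above does not say what the 0.5% is relative to, and the names `rel_spread` and `THRESHOLD` are illustrative.

```python
# (bid_price, ask_price) copied from the Quote printouts in this issue
quotes = [
    (113.62, 113.66),
    (112.735, 114.358),
    (113.682, 113.805),
    (113.53, 113.57),
]

THRESHOLD = 0.005  # 0.5%, the cutoff mentioned in the comment above


def rel_spread(bid: float, ask: float) -> float:
    # Assumption: spread is expressed as a fraction of the midpoint.
    return (ask - bid) / ((ask + bid) / 2.0)


# Keep only quotes whose spread is within the threshold.
kept = [(bid, ask) for bid, ask in quotes if rel_spread(bid, ask) <= THRESHOLD]

for bid, ask in quotes:
    status = "keep" if (bid, ask) in kept else "drop"
    print(f"bid={bid} ask={ask} spread={rel_spread(bid, ask):.3%} -> {status}")
```

On this sample only the quote carrying the 112.735 bid exceeds the cutoff; the other three are well below it.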

Thanks @jbonilla-tao. I was able to confirm with the engineering and data teams that we are indeed building aggs for forex off of BBO (Quote) data. The aggs are generated using the bid side of these quotes. I'll get the documentation updated to make sure this is clear. There is also active research/engineering work happening on our end to look at this. Again, sorry this wasn't clearer, and thank you for such a detailed report; it really helped to track things down.

Thanks @justinpolygon
EDIT: Sorry, disregard. All quotes were used.

I see the aggs are built from the quotes, apparently with some kind of filtering.
For example, this agg:

{"ticker":"C:USDCAD","queryCount":1,"resultsCount":1,"adjusted":true,"results":[{"v":3,"vw":1.3692,"o":1.36922,"c":1.3692,"h":1.36922,"l":1.36918,"t":1715092830000,"n":3}],"status":"OK","request_id":"96d9cb37dc237a63c1dd601d65ade068","count":1}

is built from these quotes:
{"results":[{"ask_exchange":48,"ask_price":1.3697,"bid_exchange":48,"bid_price":1.3692,"participant_timestamp":1715092830000000000},{"ask_exchange":48,"ask_price":1.36931,"bid_exchange":48,"bid_price":1.36918,"participant_timestamp":1715092830000000000},{"ask_exchange":48,"ask_price":1.36926,"bid_exchange":48,"bid_price":1.36922,"participant_timestamp":1715092830000000000}],"status":"OK","request_id":"7544ca2a6d35f6f6ab277082e5db7090"}

What is the algorithm to filter out quotes? Some kind of statistical analysis? One of the three quotes is not reflected in the agg.
Thanks!
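In line with the EDIT above, the bar can be rebuilt from the bid prices of all three quotes. This is only a sketch of a bid-side construction that happens to reproduce this one bar, not a confirmed description of how the aggs are computed. It assumes the quotes response lists the newest quote first (all three share one timestamp, so the order cannot be checked from the thread) and that each quote gets equal weight in `vw`.

```python
# Bid prices of the three quotes above, in the order the response lists them.
bids_as_listed = [1.3692, 1.36918, 1.36922]

# Assumption: the response is newest first, so reverse it to get arrival order.
bids = list(reversed(bids_as_listed))

bar = {
    "o": bids[0],    # first bid in the window
    "h": max(bids),
    "l": min(bids),
    "c": bids[-1],   # last bid in the window
    "n": len(bids),  # one "transaction" per quote
    # Assumption: every quote is weighted equally, rounded to 5 decimals.
    "vw": round(sum(bids) / len(bids), 5),
}
print(bar)
```

Under these assumptions the open, high, low, close, count, and `vw` all match the agg shown above, with no quote left out.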

Hey @jbonilla-tao, I think we can close this, right? Did you connect with support on it?

Yes, we can close it, as you mentioned your team is looking into fixing the outliers.