USAGE:
./bin/main run <INPUT> <SYMBOL> <TIME>
INPUT:
- file path (e.g. ./*.pcap)
SYMBOL:
- symbol of at most 8 characters (e.g. AAPL), the "Quoted security represented in Nasdaq Integrated symbology"
TIME:
%H:%M:%S.%N
(i.e. HOUR:MINUTE:SECOND.NANOS)
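As a rough illustration of the TIME format, here is a minimal sketch that turns `HOUR:MINUTE:SECOND.NANOS` into nanoseconds since midnight. The function name `parse_time_of_day` is hypothetical, and it assumes the fractional part is a decimal fraction of a second (so `.398847` means 398,847,000 ns); the tool itself may interpret the field differently.

```rust
/// Parse `HOUR:MINUTE:SECOND.NANOS` into nanoseconds since midnight.
/// Assumption: the fraction is a decimal fraction of a second, so it is
/// right-padded with zeros to 9 digits. Returns None on malformed input.
fn parse_time_of_day(s: &str) -> Option<u64> {
    let mut parts = s.split(':');
    let hours: u64 = parts.next()?.parse().ok()?;
    let minutes: u64 = parts.next()?.parse().ok()?;
    let seconds_field = parts.next()?;
    if parts.next().is_some() {
        return None; // more than three ':'-separated fields
    }
    // Split "7.398847" into whole seconds and an optional fraction.
    let (whole, frac) = match seconds_field.split_once('.') {
        Some((w, f)) => (w, f),
        None => (seconds_field, ""),
    };
    let seconds: u64 = whole.parse().ok()?;
    if hours > 23 || minutes > 59 || seconds > 59 || frac.len() > 9 {
        return None;
    }
    if !frac.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    // Right-pad the fraction to 9 digits to get nanoseconds.
    let nanos: u64 = if frac.is_empty() {
        0
    } else {
        format!("{:0<9}", frac).parse().ok()?
    };
    Some((hours * 3600 + minutes * 60 + seconds) * 1_000_000_000 + nanos)
}
```

Under that assumption, `parse_time_of_day("9:31:7.398847")` yields the offset for 09:31:07.398847, and inputs such as `25:00:00` or `9:31` are rejected.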
Download a DEEP file from Market Data | IEX.
I personally used:
Expand it from .gz to .pcap in order to use it with this tool.
HEADS UP: The filename must keep the exact format that IEX uses:
data_feeds_20210712_20210712_IEXTP1_DEEP1.0.pcap
$ ./bin/main ./data/data_feeds_20210712_20210712_IEXTP1_DEEP1.0.pcap NET 9:31:7.398847
$ cargo run ./data/data_feeds_20210712_20210712_IEXTP1_DEEP1.0.pcap NET 9:31:7.398847
Optionally, you may set environment variables for debug logging, such as RUST_LOG=info RUST_BACKTRACE=1.
IEX makes depth of book packet captures available for free from their website (https://iextrading.com/trading/market-data/). Make a program which will read this depth of book data, and use it to reconstruct full order books for any listed stock. Use the program to output the top of book quote for any stock at any point in time. For the purpose of this exercise, you can just use the sample data available from the IEX website.
Let's parse this out:
- IEX DEEP files are what we want
- we don't care about trades, only price level updates
- "full order books for any listed stock"
- "output top of book quote for any stock at any point in time"
Two deliverables here?
IEX Transport Specification.pdf
See page 5 of 15.
| Field | Offset | Length | Type | Description |
|---|---|---|---|---|
| Version | 0 | 1 | Byte | 1 (0x1) Version of Transport specification |
| (Reserved) | 1 | 1 | N/A | Reserved byte |
| Message Protocol ID | 2 | 2 | Short | Unique identifier of the higher-layer protocol |
| Channel ID | 4 | 4 | Integer | Identifies the stream of bytes/sequenced messages |
| Session ID | 8 | 4 | Integer | Identifies the session |
| Payload Length | 12 | 2 | Short | Byte length of the payload |
| Message Count | 14 | 2 | Short | Number of messages in the payload |
| Stream Offset | 16 | 8 | Long | Byte offset of the data stream |
| First Message Sequence Number | 24 | 8 | Long | Sequence of the first message in the segment |
| Send Time | 32 | 8 | Timestamp | Send time of segment |
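To make the header layout concrete, here is a minimal sketch of decoding those 40 bytes. The struct and function names are hypothetical. Offsets and lengths come from the table; the little-endian byte order and the signedness of each type are my reading of the IEX spec and are worth double-checking against the PDF.

```rust
/// The 40-byte IEX-TP segment header, one field per table row.
/// Assumption: Short/Integer are unsigned, Long/Timestamp are signed.
#[derive(Debug, PartialEq)]
struct TransportHeader {
    version: u8,
    message_protocol_id: u16,
    channel_id: u32,
    session_id: u32,
    payload_length: u16,
    message_count: u16,
    stream_offset: i64,
    first_message_sequence_number: i64,
    send_time: i64, // assumed: nanoseconds since the POSIX epoch
}

// Small helpers that read little-endian integers at a byte offset.
fn u16_le(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn u32_le(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

fn i64_le(b: &[u8], off: usize) -> i64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[off..off + 8]);
    i64::from_le_bytes(a)
}

/// Decode the header from the start of a UDP payload.
/// Returns None if fewer than 40 bytes are available.
fn parse_header(buf: &[u8]) -> Option<TransportHeader> {
    if buf.len() < 40 {
        return None;
    }
    Some(TransportHeader {
        version: buf[0],
        // buf[1] is the reserved byte and is skipped.
        message_protocol_id: u16_le(buf, 2),
        channel_id: u32_le(buf, 4),
        session_id: u32_le(buf, 8),
        payload_length: u16_le(buf, 12),
        message_count: u16_le(buf, 14),
        stream_offset: i64_le(buf, 16),
        first_message_sequence_number: i64_le(buf, 24),
        send_time: i64_le(buf, 32),
    })
}
```

The messages start right after the header, so a caller would continue reading at byte 40 and expect `message_count` length-prefixed messages within `payload_length` bytes.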
See page 35 of 44.
Price Level Update Messages in a Single Segment:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
| Transport Header | B 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 28-31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 32-35
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transport Header | B 36-39
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Length | Message Type | (Event Flags) | B 40-43
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timestamp | B 44-47
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timestamp | B 48-51
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Symbol | B 52-55
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Symbol | B 56-59
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size | B 60-63
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Price | B 64-67
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Price | B 68-71
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
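The message part of the diagram (bytes 40-71) can be decoded with something like the sketch below. The names are hypothetical. Field positions follow the diagram, relative to the start of the message block; the message type values (`8` for buy side, `5` for sell side), the little-endian byte order, and the fixed-point price with 4 implied decimal places are assumptions based on my reading of the DEEP spec.

```rust
#[derive(Debug, PartialEq)]
enum Side {
    Buy,
    Sell,
}

#[derive(Debug, PartialEq)]
struct PriceLevelUpdate {
    side: Side,
    event_flags: u8,
    timestamp: i64, // assumed: nanoseconds since the POSIX epoch
    symbol: String, // trailing space padding removed
    size: u32,      // aggregate size at this price level
    price: i64,     // assumed: fixed point, 4 implied decimals
}

/// Decode one length-prefixed Price Level Update message block:
/// 2 bytes length, 1 byte type, 1 byte flags, 8 bytes timestamp,
/// 8 bytes symbol, 4 bytes size, 8 bytes price (32 bytes in total).
fn parse_price_level_update(block: &[u8]) -> Option<PriceLevelUpdate> {
    if block.len() < 32 {
        return None;
    }
    // The length field counts the 30 bytes that follow it.
    if u16::from_le_bytes([block[0], block[1]]) != 30 {
        return None;
    }
    let side = match block[2] {
        b'8' => Side::Buy,
        b'5' => Side::Sell,
        _ => return None, // some other message type
    };
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&block[4..12]);
    let symbol = std::str::from_utf8(&block[12..20])
        .ok()?
        .trim_end()
        .to_string();
    let mut size = [0u8; 4];
    size.copy_from_slice(&block[20..24]);
    let mut price = [0u8; 8];
    price.copy_from_slice(&block[24..32]);
    Some(PriceLevelUpdate {
        side,
        event_flags: block[3],
        timestamp: i64::from_le_bytes(ts),
        symbol,
        size: u32::from_le_bytes(size),
        price: i64::from_le_bytes(price),
    })
}
```

With a 4-decimal fixed-point price, a bid of 111.2 would arrive as the integer 1112000. Keeping it as an integer also makes it usable as a sorted map key.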
HashMap (no wait, I need min/max/sorted keys) BTreeMap
- One BTreeMap per symbol, per side
Symbol: String -> {
Buys: BTreeMap<Price, Size>,
Sells: BTreeMap<Price, Size>,
}
- For testing purposes, let's first focus on a single symbol (my favorite stock, $NET)
- new packet -> parse messages -> check symbol -> add/remove to/from maps
- keep looping while the message timestamp is less than or equal to the requested time
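The plan above can be sketched for a single symbol as follows. This is not the tool's actual code: `Book`, `Update`, and `replay_until` are hypothetical names, prices are assumed to be integers (fixed point), and a size of zero is assumed to mean "remove this price level".

```rust
use std::collections::BTreeMap;

/// One book for one symbol: price -> aggregate size, per side.
#[derive(Default)]
struct Book {
    buys: BTreeMap<i64, u32>,
    sells: BTreeMap<i64, u32>,
}

impl Book {
    /// Apply one price level update. Assumption: size 0 removes the level.
    fn apply(&mut self, is_buy: bool, price: i64, size: u32) {
        let side = if is_buy { &mut self.buys } else { &mut self.sells };
        if size == 0 {
            side.remove(&price);
        } else {
            side.insert(price, size);
        }
    }

    /// Top of book as (best bid, best ask), each a (price, size) pair.
    /// The best bid is the largest buy key, the best ask the smallest sell key.
    fn top(&self) -> (Option<(i64, u32)>, Option<(i64, u32)>) {
        let bid = self.buys.iter().next_back().map(|(p, s)| (*p, *s));
        let ask = self.sells.iter().next().map(|(p, s)| (*p, *s));
        (bid, ask)
    }
}

/// A parsed update for the symbol of interest (hypothetical shape).
struct Update {
    timestamp: i64,
    is_buy: bool,
    price: i64,
    size: u32,
}

/// Replay updates in file order, stopping at the first one that is
/// later than `target`. Assumes updates are sorted by timestamp.
fn replay_until(updates: &[Update], target: i64) -> Book {
    let mut book = Book::default();
    for u in updates {
        if u.timestamp > target {
            break;
        }
        book.apply(u.is_buy, u.price, u.size);
    }
    book
}
```

The sorted keys are the reason for picking BTreeMap over HashMap: both ends of the map are cheap to reach, so the top of book needs no scan. Supporting every symbol would mean wrapping this in a `HashMap<String, Book>`.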
Advantages:
- quick dev time
- yay for data structure usage *smile*
Disadvantages:
- slow
- need to start from top of pcap file each time
Let's review our goals:
- "full order books for any listed stock"
- "output top of book quote for any stock at any point in time"
Well?
- Yes, but slow
- Yes, but slow
I believe Polygon does not offer book data for stocks, only for crypto:
- saves aggregates
- candles/bars
- snapshot book real-time
TOPS is the highest bid and lowest ask at a given timestamp, and it looks like it is updated atomically on each price level action. Meanwhile, Polygon's smallest time window is 1 minute for crypto.
The only other choice is coming up with a crazy cluster mem/disk DB with a finite/consistent latency and a warning for traders. I guess this tool could be used to check how your trades stacked up relative to the book and your brokerage/clearing house partnership.
So I now understand why TOPS is more or less built this way... My first leap was for a BTreeMap on each side, which would mean 2 BTreeMaps per symbol. This would for sure need redundancy, and I'm curious what IEX uses as an architecture/data structure/etc.
If a data provider wanted to offer this as a historical feature, it would make sense to store individual messages/ticks inside a disk data structure and query individual symbols. This disk storage can then have an import side and a query side.
The query side is where this gets difficult, because it implies always needing a window and never knowing the exact top of book.
My idea of query side:
- in parallel:
  - 1 query for the buy side, searching backwards by timestamp until window size for valid price/size
  - 1 query for the sell side, searching backwards by timestamp until window size for valid price/size
- return top of book after parallel queries finish
- optionally add depth/level for this crawler to return after top 3 valid
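A rough sketch of that query side, using in-memory slices as a stand-in for the disk store. Everything here is hypothetical: the `Tick` shape, the function names, and the rule that the newest tick for a price wins and a size of zero means the level is gone. Each side is scanned backwards from the requested time until the window runs out, and the two scans run in parallel with `std::thread::scope`.

```rust
use std::collections::HashSet;
use std::thread;

/// One stored price level update for one side of one symbol.
#[derive(Clone, Copy)]
struct Tick {
    timestamp: i64,
    price: i64,
    size: u32,
}

/// Walk backwards from `at` for at most `window` time units and return
/// the best still-valid level. `ticks` must be sorted by timestamp.
/// For the buy side pass `want_max = true`, for the sell side `false`.
fn best_in_window(ticks: &[Tick], at: i64, window: i64, want_max: bool) -> Option<(i64, u32)> {
    let mut seen = HashSet::new();
    let mut best: Option<(i64, u32)> = None;
    for t in ticks.iter().rev() {
        if t.timestamp > at {
            continue; // newer than the requested time
        }
        if t.timestamp < at - window {
            break; // left the window, stop searching
        }
        if !seen.insert(t.price) {
            continue; // a newer tick for this price was already handled
        }
        if t.size == 0 {
            continue; // the latest state of this level is "removed"
        }
        let better = match best {
            None => true,
            Some((p, _)) => {
                if want_max {
                    t.price > p
                } else {
                    t.price < p
                }
            }
        };
        if better {
            best = Some((t.price, t.size));
        }
    }
    best
}

/// Run the buy and sell searches in parallel and return (bid, ask).
fn top_of_book(
    buys: &[Tick],
    sells: &[Tick],
    at: i64,
    window: i64,
) -> (Option<(i64, u32)>, Option<(i64, u32)>) {
    thread::scope(|s| {
        let bid = s.spawn(|| best_in_window(buys, at, window, true));
        let ask = s.spawn(|| best_in_window(sells, at, window, false));
        (bid.join().unwrap(), ask.join().unwrap())
    })
}
```

The weakness is visible in the code: a level whose last update is older than the window is never seen, so the answer is only the best level within the window, not the guaranteed top of book.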
for each price status change:
save (symbol, min, max) to disk/db/store by timestamp
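The pseudocode above could look like this with an in-memory map standing in for the disk/db/store. `TopStore` and `TopOfBook` are hypothetical names, and prices are assumed to be integers. A snapshot is saved whenever the top changes, and a lookup returns the latest snapshot at or before the requested timestamp.

```rust
use std::collections::{BTreeMap, HashMap};

/// Snapshot of the top of book: highest bid and lowest ask, if any.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TopOfBook {
    bid: Option<i64>,
    ask: Option<i64>,
}

/// symbol -> (timestamp -> snapshot); a stand-in for a time series store.
#[derive(Default)]
struct TopStore {
    by_symbol: HashMap<String, BTreeMap<i64, TopOfBook>>,
}

impl TopStore {
    /// Record the top of book for `symbol` as of `timestamp`.
    fn save(&mut self, symbol: &str, timestamp: i64, top: TopOfBook) {
        self.by_symbol
            .entry(symbol.to_string())
            .or_default()
            .insert(timestamp, top);
    }

    /// Latest snapshot at or before `timestamp`, if one exists.
    fn at(&self, symbol: &str, timestamp: i64) -> Option<TopOfBook> {
        self.by_symbol
            .get(symbol)?
            .range(..=timestamp)
            .next_back()
            .map(|(_, top)| *top)
    }
}
```

Because every change is stored, a lookup is a single range query and needs no window, at the cost of writing a record on each top-of-book change.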
This would be in line with TOPS but would require a time series DB/disk/mem solution.
I checked my work on an early intraday "TOP of book" quote_update from TOPS using https://pypi.org/project/iex/.
{
'type': 'quote_update',
'flags': 0,
'timestamp': datetime.datetime(2021, 7, 12, 13, 31, 7, 398847, tzinfo=datetime.timezone.utc),
'symbol': b'NET',
'bid_size': 300,
'bid_price': Decimal('111.2'),
'ask_size': 100,
'ask_price': Decimal('116.87'),
}
- iex · PyPI
- Order Book Definition
- timpalpant/go-iex: A Go library for accessing the IEX Developer API.
- market microstructure - What is an efficient data structure to model order book? - Quantitative Finance Stack Exchange
- How to Build a Fast Limit Order Book « WK's High Frequency Trading Blog
- Show HN: A first project in Rust – in-memory order book | Hacker News