Query Recognition in Incremental Search (QRIS) is a passive attack system that uses the size and timing of incremental search packets to infer the user's query.
QRIS exploits the information leakage contributed by four factors at different data layers:
- Exact data sizes exposed by TLS ciphers.
- Static compression lengths of HTTP/2 headers.
- Accurate keystroke timings exposed by AJAX models.
- Chinese query details revealed by Pinyin IME.
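As a rough illustration of the second factor, the sketch below estimates how many bytes a lowercase query prefix occupies once it is Huffman-coded with the HPACK static table (RFC 7541, Appendix B). It is not taken from the QRIS code base: the function names are hypothetical, and only the 26 lowercase letters are covered, which is an assumption made to keep the table short.

```python
import math

# Code lengths in bits for lowercase letters in the HPACK static Huffman
# table (RFC 7541, Appendix B). Other characters are deliberately omitted.
HPACK_BITS = {}
for letters, bits in (("aceiost", 5), ("bdfghlmnpru", 6), ("jkqvwxyz", 7)):
    for ch in letters:
        HPACK_BITS[ch] = bits


def huffman_encoded_size(text):
    """Return the Huffman-coded size of `text` in bytes (padded to a whole byte)."""
    total_bits = sum(HPACK_BITS[ch] for ch in text)
    return math.ceil(total_bits / 8)


def prefix_sizes(query):
    """Return the encoded size of every prefix, one entry per keystroke."""
    return [huffman_encoded_size(query[:i]) for i in range(1, len(query) + 1)]


print(prefix_sizes("apple"))
print(prefix_sizes("jazz"))
```

Because the code lengths are fixed and differ between letters, two queries of the same length can produce different size sequences, and an observer who sees exact packet sizes can use that difference to rule out candidates.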
QRIS consists of three stages applied in a pipeline architecture:
- State correlation aims to relate keystroke packets emitted by the incremental search website to specific DFA states.
- Ambiguity reduction utilizes query size distinguishability to reduce the scale of the query set.
- Query inference leverages the user's typing rhythm to infer queries from the filtered query set.
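The last two stages can be pictured with a toy sketch. It is not the QRIS implementation: state correlation is skipped entirely, word lengths stand in for the size signature, and a per-bigram Gaussian over inter-key intervals stands in for the timing model. All function names, the default mean and deviation, and the model values are assumptions made for illustration.

```python
import math


def word_lengths(query):
    """Stand-in size signature: the length of each space-separated word."""
    return tuple(len(word) for word in query.split(" "))


def reduce_ambiguity(candidates, observed_lengths):
    """Keep only the candidates whose signature matches the observed one."""
    target = tuple(observed_lengths)
    return [q for q in candidates if word_lengths(q) == target]


def log_likelihood(query, intervals, bigram_model, default=(0.2, 0.1)):
    """Sum Gaussian log-densities of the intervals under a (mean, std) bigram model."""
    total = 0.0
    for (first, second), dt in zip(zip(query, query[1:]), intervals):
        mean, std = bigram_model.get(first + second, default)
        total += -((dt - mean) ** 2) / (2 * std ** 2)
        total -= math.log(std * math.sqrt(2 * math.pi))
    return total


def infer(candidates, observed_lengths, intervals, bigram_model, topk=10):
    """Filter by signature, then rank the survivors by timing likelihood."""
    filtered = reduce_ambiguity(candidates, observed_lengths)
    ranked = sorted(
        filtered,
        key=lambda q: log_likelihood(q, intervals, bigram_model),
        reverse=True,
    )
    return ranked[:topk]


# Toy data: seconds between consecutive keystrokes, and a made-up model.
toy_model = {"re": (0.1, 0.05), "bi": (0.3, 0.05)}
toy_intervals = [0.1, 0.2, 0.2, 0.2, 0.2, 0.2]
print(infer(["red car", "big cat", "hello"], (3, 3), toy_intervals, toy_model))
```

The filter runs first because it is cheap and shrinks the candidate set before the more expensive timing comparison.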
QRIS currently supports English and Chinese queries. Supported websites include Google, Tmall, Facebook, Baidu, Yahoo, Wikipedia, CSDN, Twitch, and Bing.
By default, the AOL search dataset is used as the English query set, and the THU Open Chinese Lexicon (THUOCL) is used as the Chinese query set. The default keystroke timing model is trained on the 136M keystroke dataset.
- Use pip with Python 3.x to install the QRIS package:
> pip install https://github.com/ld258166011/QRIS/archive/main.zip
- (Optional) Download the preloaded metadata for the AOL and THUOCL query sets and unzip it into the QRIS Python installation directory.
  - metadata (312.05 MB, SHA1: F72D2C22E38BCC0BFBBDA94DDD0697E2B9745E05)
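Before unzipping, the download can be checked against the SHA1 listed above. The following is a minimal sketch using only the standard library; the archive name `metadata.zip` is an assumption, since the actual file name is not stated here.

```python
import hashlib

# SHA1 of the metadata archive as listed above.
EXPECTED_SHA1 = "F72D2C22E38BCC0BFBBDA94DDD0697E2B9745E05"


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large archive is never loaded at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Compare case-insensitively, since hex digests may be upper or lower case."""
    return sha1_of_file(path).lower() == expected.lower()


if __name__ == "__main__":
    import sys

    # "metadata.zip" is a hypothetical default file name.
    target = sys.argv[1] if len(sys.argv) > 1 else "metadata.zip"
    print("OK" if matches_expected(target) else "SHA1 mismatch")
```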
The QRIS Python package provides a command, `qris`, that infers the entered search query from a `pcap` file containing incremental search network traffic.
Use the command `qris -h` to get the help message:
    usage: qris [-h] [--website NAME] [--chinese] [--queryset PATH]
                [--bigrams PATH] [--trident] [--topk K] [--verbose]
                pcap

    Query Recognition in Incremental Search

    positional arguments:
      pcap             filename of the pcap.

    optional arguments:
      -h, --help       show this help message and exit
      --website NAME   name of the website. If not specified, try to identify.
      --chinese        Chinese query entered using Pinyin IME.
      --queryset PATH  filename of the query set (csv format).
      --bigrams PATH   filename of the bigram timing model (csv format).
      --trident        broswer engine is Trident, including broswer IE and old
                       version of Edge.
      --topk K         list the top K inferred queries.
      --verbose        show inference details.
Use the following command to run QRIS with default optional arguments:
> qris [xx].pcap
Some traffic samples can be found in the `samples` directory. More samples are available from the ISTD traffic dataset.
> qris "apple bee restaurant.pcap" --website bing
laser eye correction
bound and determined
south par accounting
camel toe definition
death and depression
laser for cigarettes
apple bee restaurant
cures for depression
funds for relocation
inner bay restaurant
> qris 左氧氟沙星片.pcap --chinese
Detected website: tmall
北京地坛公园
北京日坛公园
宝岗大道总站
罗望子多糖胶
住房部分产权
左氧氟沙星片
非全日制用工
辛芳鼻炎胶囊
北方凹指招潮
翻动扶摇羊角
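Invocations like the ones above can also be scripted when many captures need to be processed. The sketch below is one possible wrapper, not part of QRIS: it uses only flags listed in the help message, assumes `qris` is on the `PATH`, and assumes the command prints one line per result as in the examples above. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(pcap, website=None, chinese=False, topk=None):
    """Assemble a qris command line from the documented options."""
    cmd = ["qris", str(pcap)]
    if website:
        cmd += ["--website", website]
    if chinese:
        cmd.append("--chinese")
    if topk is not None:
        cmd += ["--topk", str(topk)]
    return cmd


def run_on_directory(directory, **options):
    """Run qris on every .pcap file in a directory and collect its output lines."""
    results = {}
    for pcap in sorted(Path(directory).glob("*.pcap")):
        proc = subprocess.run(
            build_command(pcap, **options),
            capture_output=True,  # requires Python 3.7+
            text=True,
        )
        # Lines may include status output such as "Detected website: ...".
        results[pcap.name] = proc.stdout.splitlines()
    return results


if __name__ == "__main__":
    for name, lines in run_on_directory("samples", topk=5).items():
        print(name, lines)
```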