Query Automation in Incremental Search (QAIS) is a data collection system that captures network traffic while an English or Chinese query is typed into an incremental search website.
QAIS consists of a keystroke replayer based on PyAutoGUI, browser automation with Selenium WebDriver, and a packet sniffer driven by Scapy.
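The following is only a rough sketch of how the three components named above could fit together; it is not the QAIS source. The function name `capture_query`, its parameters, the fixed per-key delay, and the two-second waits are all assumptions made for illustration. It also assumes that ChromeDriver is installed, that the search box has keyboard focus once the page loads, and that the script has the privileges Scapy needs to sniff packets.

```python
import time


def capture_query(url, query, iface=None, pcap_path="pkts.pcap", key_delay=0.2):
    """Type `query` into the page at `url` while sniffing; return packet count."""
    # Third-party imports are local so this file loads even if they are missing.
    import pyautogui
    from scapy.all import AsyncSniffer, wrpcap
    from selenium import webdriver

    sniffer = AsyncSniffer(iface=iface)  # iface=None uses Scapy's default interface
    sniffer.start()
    driver = None
    try:
        driver = webdriver.Chrome()
        driver.get(url)
        time.sleep(2)  # crude wait; assumes the search box then has focus
        for ch in query:
            pyautogui.write(ch)  # OS-level keystroke sent to the focused window
            time.sleep(key_delay)  # hypothetical constant inter-key delay
        time.sleep(2)  # give the last suggestion responses time to arrive
    finally:
        if driver is not None:
            driver.quit()
        packets = sniffer.stop()
    wrpcap(pcap_path, packets)  # save everything captured during the session
    return len(packets)
```

A call such as `capture_query("https://www.google.com", "apple bee restaurant")` would type the query one character at a time, so each keystroke can trigger its own suggestion request in the captured traffic.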
Download and install the right version of WebDriver for your browser:
- ChromeDriver for Google Chrome
- GeckoDriver for Mozilla Firefox
- WebDriver for Microsoft Edge
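Before running QAIS, it can help to confirm that the driver is reachable from your PATH. The check below is a minimal sketch using only the standard library; the helper name `find_driver` is hypothetical, and the executable names are the ones these drivers normally ship with (`chromedriver`, `geckodriver`, `msedgedriver`).

```python
import shutil

# Usual executable name of each browser's WebDriver.
DRIVER_EXECUTABLES = {
    "Chrome": "chromedriver",
    "Firefox": "geckodriver",
    "Edge": "msedgedriver",
}


def find_driver(browser):
    """Return the full path of the driver for `browser`, or None if not on PATH."""
    return shutil.which(DRIVER_EXECUTABLES[browser])


if __name__ == "__main__":
    for browser in DRIVER_EXECUTABLES:
        path = find_driver(browser)
        print(f"{browser}: {path if path else 'driver not found on PATH'}")
```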
Use pip with Python 3.x to install the QAIS package:
> pip install https://github.com/ld258166011/QAIS/archive/main.zip
The QAIS Python package provides the command qais to perform the automation and capture process. Run qais -h to print the help message:
    usage: qais [-h] [--chinese] [--bigrams PATH] [--broswer NAME] [--click]
                [-i IFACE] [-f FILE]
                website query

    Query Automation in Incremental Search

    positional arguments:
      website         currently support Google, Tmall, Facebook, Baidu, Yahoo,
                      Wikipedia, Csdn, Twitch, Bing.
      query           search query to be entered. Currently support English and
                      Chinese.

    optional arguments:
      -h, --help      show this help message and exit
      --chinese       Chinese query entered using Pinyin IME.
      --bigrams PATH  filename of the bigram timing model (csv format).
      --broswer NAME  currently support Chrome, Firefox, and Edge, default is
                      Chrome.
      --click         click the search box once before entering the query.
      -i IFACE        the interface to capture the packets on.
      -f FILE         filename of the captured traffic, default is pkts.pcap.
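To make the option semantics concrete, here is a sketch of an argparse parser that accepts the same interface as the help message above. It is a reconstruction from that help text, not the QAIS source: the function name `build_parser` and the attribute names `iface` and `file` are assumptions. The flag spelling `--broswer` is copied as printed by the tool.

```python
import argparse


def build_parser():
    """Build a parser mirroring the interface shown in the qais help message."""
    parser = argparse.ArgumentParser(
        prog="qais", description="Query Automation in Incremental Search")
    parser.add_argument("website", help="target incremental search website")
    parser.add_argument("query", help="search query to be entered")
    parser.add_argument("--chinese", action="store_true",
                        help="Chinese query entered using Pinyin IME")
    parser.add_argument("--bigrams", metavar="PATH",
                        help="filename of the bigram timing model (csv format)")
    parser.add_argument("--broswer", metavar="NAME", default="Chrome",
                        help="Chrome, Firefox, or Edge")
    parser.add_argument("--click", action="store_true",
                        help="click the search box once before entering the query")
    # dest names below are illustrative; the help text only shows the flags.
    parser.add_argument("-i", metavar="IFACE", dest="iface",
                        help="the interface to capture the packets on")
    parser.add_argument("-f", metavar="FILE", dest="file", default="pkts.pcap",
                        help="filename of the captured traffic")
    return parser


args = build_parser().parse_args(
    ["Baidu", "拼音输入法", "--chinese", "-i", "eth0", "-f", "sample.pcap"])
print(args.website, args.chinese, args.broswer, args.iface, args.file)
```

Options that are not given fall back to the defaults stated in the help text: Chrome for the browser and pkts.pcap for the output file.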
Use the following command to run QAIS with default optional arguments:
> qais website query
- Search for the English query "apple bee restaurant" in Google:
> qais Google "apple bee restaurant"
- Search for the Chinese query 拼音输入法 ("Pinyin input method") in Baidu, capture the packets on eth0, and save the traffic as sample.pcap:
> qais Baidu 拼音输入法 --chinese -i eth0 -f sample.pcap
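When collecting many samples, the same command line can be generated from a script. The sketch below builds the argument list for each query and runs qais once per query with `subprocess.run`. The helper names `qais_command` and `collect` and the output file naming scheme are hypothetical; only the flags shown in the help message are used.

```python
import subprocess


def qais_command(website, query, chinese=False, iface=None, outfile=None):
    """Return the argv list for one qais run, using only documented flags."""
    cmd = ["qais", website, query]
    if chinese:
        cmd.append("--chinese")
    if iface:
        cmd += ["-i", iface]
    if outfile:
        cmd += ["-f", outfile]
    return cmd


def collect(website, queries, chinese=False, iface=None):
    """Run qais once per query, saving each capture to its own pcap file."""
    for n, query in enumerate(queries):
        outfile = f"{website.lower()}_{n:04d}.pcap"  # hypothetical naming scheme
        cmd = qais_command(website, query, chinese, iface, outfile)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    collect("Google", ["apple bee restaurant"], iface="eth0")
```

Passing the query as a single list element avoids shell quoting problems with spaces and non-ASCII characters.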
- ISTD: the traffic dataset that contains 32.4k samples of English and Chinese queries captured on 9 incremental search websites.