Query Automation in Incremental Search

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Query Automation in Incremental Search (QAIS) is a data collection system that captures network traffic while an English or Chinese query is typed into an incremental search website.

QAIS consists of a keystroke replayer based on PyAutoGUI, browser automation with Selenium WebDriver, and a packet sniffer driven by Scapy.
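As a rough illustration of how these three components could cooperate, the sketch below starts a Scapy sniffer, opens a page with Selenium, and replays keystrokes with PyAutoGUI. It is not taken from the QAIS source: the function names `keystroke_delays` and `capture_query`, the default delay, and the one-second trailing wait are all hypothetical, and the code assumes the search box already has keyboard focus once the page has loaded.

```python
import time


def keystroke_delays(query, bigram_delays, default=0.2):
    """Return the pause in seconds to wait before each key of `query`.

    `bigram_delays` maps two-character strings to seconds (a hypothetical
    in-memory form of a bigram timing model). The first key has no
    preceding key, so its delay is 0.
    """
    delays = [0.0]
    for prev, cur in zip(query, query[1:]):
        delays.append(bigram_delays.get(prev + cur, default))
    return delays


def capture_query(url, query, iface=None, pcap_path="pkts.pcap"):
    """Type `query` into the focused page while sniffing packets."""
    # Third-party imports stay local so keystroke_delays works without them.
    import pyautogui
    from scapy.all import AsyncSniffer, wrpcap
    from selenium import webdriver

    sniffer = AsyncSniffer(iface=iface)  # iface=None uses Scapy's default interface
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        sniffer.start()
        for key, delay in zip(query, keystroke_delays(query, {})):
            time.sleep(delay)
            pyautogui.press(key)  # one real key press per character
        time.sleep(1.0)  # assumed grace period for trailing responses
        packets = sniffer.stop()
    finally:
        driver.quit()
    wrpcap(pcap_path, packets)
```

Sniffing usually requires administrator or root privileges, and PyAutoGUI sends keys to whichever window is in the foreground, so the browser window must stay focused during the run.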

Installation

  1. Download and install the WebDriver version that matches your browser (Chrome, Firefox, or Edge).

  2. Use pip with Python 3.x to install the QAIS package:

> pip install https://github.com/ld258166011/QAIS/archive/main.zip

Usage

The QAIS Python package provides a qais command that performs the automation and capture process. Run qais -h to print the help message:

usage: qais [-h] [--chinese] [--bigrams PATH] [--broswer NAME] [--click]
            [-i IFACE] [-f FILE]
            website query

Query Automation in Incremental Search

positional arguments:
  website         currently support Google, Tmall, Facebook, Baidu, Yahoo,
                  Wikipedia, Csdn, Twitch, Bing.
  query           search query to be entered. Currently support English and
                  Chinese.

optional arguments:
  -h, --help      show this help message and exit
  --chinese       Chinese query entered using Pinyin IME.
  --bigrams PATH  filename of the bigram timing model (csv format).
  --broswer NAME  currently support Chrome, Firefox, and Edge, default is
                  Chrome.
  --click         click the search box once before entering the query.
  -i IFACE        the interface to capture the packets on.
  -f FILE         filename of the captured traffic, default is pkts.pcap.
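For readers who want to script against this interface, the following is a minimal sketch of an argparse parser that accepts the same arguments as the help message above. It is a reconstruction from the help text only, not the project's actual source; the attribute names `iface` and `file` are assumptions, and the spelling of `--broswer` is kept exactly as the help message prints it.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the qais help text."""
    parser = argparse.ArgumentParser(
        prog="qais",
        description="Query Automation in Incremental Search",
    )
    parser.add_argument("website", help="incremental search website to query")
    parser.add_argument("query", help="search query to be entered")
    parser.add_argument("--chinese", action="store_true",
                        help="Chinese query entered using Pinyin IME")
    parser.add_argument("--bigrams", metavar="PATH",
                        help="filename of the bigram timing model (csv format)")
    # Flag spelling follows the help message verbatim.
    parser.add_argument("--broswer", metavar="NAME", default="Chrome",
                        help="Chrome, Firefox, or Edge")
    parser.add_argument("--click", action="store_true",
                        help="click the search box once before entering the query")
    parser.add_argument("-i", metavar="IFACE", dest="iface",
                        help="interface to capture the packets on")
    parser.add_argument("-f", metavar="FILE", dest="file", default="pkts.pcap",
                        help="filename of the captured traffic")
    return parser


args = build_parser().parse_args(
    ["Baidu", "拼音输入法", "--chinese", "-i", "eth0", "-f", "sample.pcap"]
)
print(args.website, args.chinese, args.iface, args.file)
```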

Use the following command to run QAIS with default optional arguments:

> qais website query

Examples

  1. Search for the English query apple bee restaurant in Google:
> qais Google "apple bee restaurant"
  2. Search for the Chinese query 拼音输入法 in Baidu, capturing the packets on eth0 and saving the traffic as sample.pcap:
> qais Baidu 拼音输入法 --chinese -i eth0 -f sample.pcap
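When many queries need to be captured, the commands above can be generated and run from a script. The sketch below is one possible way to do that, using only the flags documented in the help message; the helper names `qais_command` and `run_batch` and the `sample_NNN.pcap` naming scheme are hypothetical. By default it only prints the commands (a dry run) so they can be checked before anything is executed.

```python
import shlex
import subprocess


def qais_command(website, query, chinese=False, iface=None, pcap=None):
    """Build the argument list for one qais invocation."""
    cmd = ["qais", website, query]
    if chinese:
        cmd.append("--chinese")
    if iface:
        cmd += ["-i", iface]
    if pcap:
        cmd += ["-f", pcap]
    return cmd


def run_batch(jobs, iface=None, dry_run=True):
    """Run (or just print) one qais command per (website, query, chinese) job."""
    for n, (website, query, chinese) in enumerate(jobs):
        cmd = qais_command(website, query, chinese, iface,
                           pcap=f"sample_{n:03d}.pcap")
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the batch if qais exits with an error.
            subprocess.run(cmd, check=True)


run_batch([
    ("Google", "apple bee restaurant", False),
    ("Baidu", "拼音输入法", True),
])
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or non-ASCII characters.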

Related repository

  • ISTD: the traffic dataset that contains 32.4k samples of English and Chinese queries captured on 9 incremental search websites.