/RFMLS-NEU

Primary language: Python. License: MIT.

Acknowledgement

This repository contains the source code and tests developed under the DARPA Radio Frequency Machine Learning Systems (RFMLS) program contract N00164-18-R-WQ80. All the code released here is unclassified and the Government has unlimited rights to the code.

Citing This Paper

Please cite the following paper if you intend to use this code for your research.

J. Tong, R. Bruno, O. Emmanuel, S. Nasim, W. Zifeng, S. Kunal, G. Andrey, D. Jennifer, C. Kaushik, I. Stratis, "Deep Learning for RF Fingerprinting: A Massive Experimental Study", Internet of Things (IoT) Magazine, 2020.

Environment Setup

Please install the python dependencies found in requirements.txt with:

pip install -r requirements.txt

GNURadio is used in preprocessing the dataset, and thus it must be installed. For installation instructions please follow the official GNURadio guide.

Preprocessing a Dataset

Preprocessing consists of filtering, which re-centers the signals to baseband based on their metadata, and equalization. Equalization applies only to WiFi signals and is optional. Preprocessing also generates the reference pickle files needed for training and testing a model; hence all data must be preprocessed through this script, whether it is WiFi, ADS-B, or a new type of dataset.

Note that if a task contains both WiFi and ADS-B data, both will be preprocessed simultaneously (i.e., in one execution of the code with --preprocess True invoked).

The following arguments can be specified during preprocessing:

    --train_tsv         path to the train tsv file
    --test_tsv          path to the test tsv file
    --task              set task name
    --root_wifi         path to the raw wifi signals
    --root_adsb         path to the raw adsb signals
    --out_root_data     output path for permanently storing preprocessed data
    --out_root_list     output path for permanently storing pickled data lists
    --wifi_eq           specify whether wifi signals need to be equalized or not {True | False}.
                        default: False
    --newtype_process   [New-Type Signal] process a dataset with New-Type Signal or not {True | False}.
                        default: False
    --root_newtype      [New-Type Signal] path to the raw New-Type Signal.
                        default: empty string
    --newtype_filter    [New-Type Signal] filter New-Type Signal or not {True | False}.
                        default: False
    --signal_BW_useful  [New-Type Signal] the bandwidth (measured in Hz) used to transmit the actual
                        information. Most wireless standards reserve portions of the bandwidth for so-called
                        guard bands, which are used to reduce interference across multiple bands. Guard bands are
                        located on the sides of the overall bandwidth, and signal_BW_useful is used to
                        properly extract the useful information from the recorded signal, thus removing any
                        interference generated by transmissions occurring in other bands. As an example, for
                        WiFi signals at 2.4GHz, signal_BW_useful=16.5MHz.
                        default: {16.5e6 | 17e6} depending on signal frequency
    --num_guard_samp    [New-Type Signal] the number of guard samples before and after each packet
                        transmission. Each packet transmission is preceded and followed by num_guard_samp
                        samples that do not contain any data transmission.
                        default: 2e-6

Preprocessed data are stored under /$out_root_data/$task/{wifi, ADS-B, newtype}/. There is no need to specify the datatype, which will be detected automatically according to .tsv files.

Other generated files are stored under /$out_root_list/$task/{wifi, ADS-B, newtype, mixed}/. This folder will contain five files needed by both training and testing:

  • a file_list.pkl file containing the paths of all preprocessed examples needed for training and testing,
  • a label.pkl file containing a dictionary {path of preprocessed example: device name (e.g., 'wifi_100_crane-gfi_1_dataset-8965')},
  • a device_ids.pkl file containing a dictionary {device name (e.g., 'wifi_100_crane-gfi_1_dataset-8965'): device id (an integer)},
  • a partition.pkl file containing the separation into training and testing sets,
  • a stats.pkl file containing the computed data statistics needed for training.
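To inspect these files after preprocessing, a small loader along the following lines may help. This is a minimal sketch: the helper name `load_data_lists` is hypothetical, and nothing is assumed about the internal layout of each pickle beyond the descriptions above.

```python
import pickle
from pathlib import Path

# File names as listed above; the helper itself is illustrative only.
LIST_FILES = ("file_list", "label", "device_ids", "partition", "stats")


def load_data_lists(list_dir):
    """Load the five pickled files from one data-list folder into a dict."""
    list_dir = Path(list_dir)
    loaded = {}
    for name in LIST_FILES:
        with open(list_dir / f"{name}.pkl", "rb") as handle:
            loaded[name] = pickle.load(handle)
    return loaded
```

For example, `load_data_lists("./data/test_list/MyTask/mixed")["device_ids"]` would return the device-name-to-id dictionary for the mixed dataset. If the files were written under Python 2, passing `encoding="latin1"` to `pickle.load` may be necessary when reading them under Python 3.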

Specifically, the folder /$out_root_list/$task/mixed/ contains the generated files for all datatypes listed in the .tsv files. For example, if the .tsv files contain two datatypes, e.g., wifi and ADS-B, it will contain files for both filtered wifi and raw ADS-B signals; if they contain three datatypes (wifi, ADS-B, and newtype), it will contain files for filtered wifi, raw ADS-B, and newtype signals (raw or filtered, depending on $newtype_filter) together. Because equalized and raw signals are in different domains and cannot be trained or tested together, the folder ~/mixed/ contains generated files only for filtered wifi, regardless of whether --wifi_eq is set to True.

An Example: Preprocessing a General Task

An example of running the preprocessing would be:

$ python ./preprocessing/main.py \
    --task MyTask \
    --train_tsv /scratch/MyTask.train.tsv \
    --test_tsv /scratch/MyTask.test.tsv \
    --root_wifi /mnt/disk1/wifi/ \
    --root_adsb /mnt/disk2/adsb/ \
    --out_root_data ./data/test \
    --out_root_list ./data/test_list

This prints the following output:

Extracting data...
*************** Extraction .meta/.data according to .tsv ***************
Initialize 40 workers.
Processing tsv: /scratch/MyTask.train.tsv
Processing tsv: /scratch/MyTask.test.tsv
*************** Filtering WiFi signals ***************
There are 50 devices to filter
Processing folder: [...]
*************** Create partitions, labels and device ids for training. Compute stats also.***************
generating files for WiFi, ADS-B and mixed dataset
***************creating labels for dataset:wifi raw_samples***************
Created auxiliary files:
('Number of devices', 50)
('Number of examples', 13650)
Save files to: ./data/test_list/MyTask/wifi/raw_samples
creating partition files for dataset:wifi raw_samples
13650/13650 [00:00<00:00, 149579.36it/s]
compute stats for dataset:wifi raw_samples
10900/10900 [00:25<00:00, 422.02it/s]
***************creating labels for dataset:ADS-B***************
Created auxiliary files:
('Number of devices', 50)
('Number of examples', 13650)
Save files to:./data/test_list/MyTask/ADS-B/
creating partition files for dataset:ADS-B
13650/13650 [00:00<00:00, 155440.33it/s]
('Num Ex in Train', 10900)
('Num Ex in Test', 2750)
compute stats for dataset:ADS-B 
10900/10900 [00:20<00:00, 527.07it/s]
***************creating labels for dataset:mixed***************
Created auxiliary files:
('Number of devices', 100)
('Number of examples', 27300)
Save files to:./data/test_list/MyTask/mixed/
creating partition files for dataset:mixed
27300/27300 [00:00<00:00, 158303.85it/s]
('Num Ex in Train', 21800)
('Num Ex in Test', 5500)
compute stats for dataset:mixed 
21800/21800 [00:46<00:00, 470.44it/s]
Skipping model framework.

Note that the above command reads its inputs from /scratch and /mnt and persists its outputs under ./data, so these folders must be available (for example, mounted) in the environment where it runs.

To preprocess data using equalization, an additional --wifi_eq True argument needs to be passed.
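When preprocessing is launched from another Python script, the command can be assembled programmatically. The sketch below uses only flags documented in this README; the helper name `build_preprocess_cmd` is hypothetical, and the paths are the placeholder paths from the example above.

```python
import shlex


def build_preprocess_cmd(task, train_tsv, test_tsv, out_root_data,
                         out_root_list, root_wifi=None, root_adsb=None,
                         wifi_eq=False):
    """Return the argument list for ./preprocessing/main.py."""
    cmd = ["python", "./preprocessing/main.py",
           "--task", task,
           "--train_tsv", train_tsv,
           "--test_tsv", test_tsv,
           "--out_root_data", out_root_data,
           "--out_root_list", out_root_list]
    if root_wifi:
        cmd += ["--root_wifi", root_wifi]
    if root_adsb:
        cmd += ["--root_adsb", root_adsb]
    if wifi_eq:
        # Equalization is requested as "--wifi_eq True", as described above.
        cmd += ["--wifi_eq", "True"]
    return cmd


cmd = build_preprocess_cmd(
    task="MyTask",
    train_tsv="/scratch/MyTask.train.tsv",
    test_tsv="/scratch/MyTask.test.tsv",
    out_root_data="./data/test",
    out_root_list="./data/test_list",
    root_wifi="/mnt/disk1/wifi/",
    wifi_eq=True,
)
print(shlex.join(cmd))  # shell-quoted form of the command, for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.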

An Example: Preprocessing a New Dataset

An example of running the preprocessing for a new dataset with a new signal type would be:

$ python ./preprocessing/main.py \
    --task MyTask \
    --train_tsv /scratch/MyTask.train.tsv \
    --test_tsv /scratch/MyTask.test.tsv \
    --root_wifi /mnt/disk1/wifi/ \
    --root_adsb /mnt/disk2/adsb/ \
    --root_newtype /mnt/disk2/adsb/NewType/ \
    --newtype_process True \
    --newtype_filter True \
    --signal_BW_useful 16.5e6 \
    --num_guard_samp 2e-6 \
    --out_root_data ./data/test \
    --out_root_list ./data/test_list

If the new dataset contains only newtype signals, there is no need to specify --root_wifi and --root_adsb. Otherwise, please specify these two arguments accordingly.

The code presumes that the content of the meta/data files follows the syntax of WiFi signals.

Note that the arguments --signal_BW_useful and --num_guard_samp need to be set for filtering a new signal type. There is no need to specify these two arguments for WiFi or ADS-B signals.
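To make the role of --signal_BW_useful concrete, the following sketch marks which FFT bins of a baseband recording lie inside the useful band and which fall in the guard bands. It illustrates an ideal frequency mask only and is not the filter this repository implements; the function name, the 8-bin FFT, and the 20 MHz sample rate are assumptions chosen for illustration.

```python
def useful_bin_mask(num_bins, sample_rate_hz, signal_bw_useful_hz):
    """Mark which FFT bins fall inside the useful band centred on 0 Hz."""
    half_bw = signal_bw_useful_hz / 2.0
    mask = []
    for k in range(num_bins):
        # Standard FFT ordering: the upper half of the bins holds negative frequencies.
        signed_k = k if k < (num_bins + 1) // 2 else k - num_bins
        freq_hz = signed_k * sample_rate_hz / num_bins
        mask.append(abs(freq_hz) <= half_bw)
    return mask


# Assumed values: 8 bins at 20 MHz sampling, 16.5 MHz useful bandwidth.
mask = useful_bin_mask(num_bins=8, sample_rate_hz=20e6, signal_bw_useful_hz=16.5e6)
print(mask)
```

With these assumed values the bins are spaced 2.5 MHz apart, so only the bin at -10 MHz lies beyond the 8.25 MHz half-bandwidth and is masked out as guard band.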

Training and Testing

Running a task consists of a train_phase and a test_phase.

Both training and testing require that data is first pre-processed (so that, beyond filtering and/or equalizing, it is in a format understandable by our neural network), as described in the previous section.

Command line arguments for training and testing include:

    --data_path         path containing training, validation, testing, stats dictionaries;
                        the directory used for --out_root_list.
    --data_type         the data type, either 'wifi', 'ADS-B', 'newtype', or 'mixed'. 'mixed' trains
                        the model jointly on all datatypes present. default: wifi
    --model_flag        set the model architecture {baseline | resnet1d}. default: baseline
    --exp_name          create an exp_name folder to save logs, models etc.

The --data_path folder corresponds to the --out_root_list folder where the output from the preprocessing stage was stored. Running tests is a silent execution (it only prints errors to stderr). Trained model weights as well as classification outcomes are stored under /results/$task/$data_type/$exp_name. This folder will contain four files:

  • a json file containing the model structure,
  • an hdf5 file containing the model weights,
  • a config file containing the parameter configuration used during training and testing, and
  • a log.out file that contains a detailed output log, including the final test accuracy. The last line contains the per slice and per example/signal accuracy on the test set.
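The per-example accuracy is obtained by combining the predictions made on the individual slices of an example; the --per_example_strategy option in the help output below names majority, prob_sum, and log_prob_sum. The sketch below shows one plausible reading of these three names. It is an assumption about their meaning, not code taken from this repository, and the function name is hypothetical.

```python
import math
from collections import Counter


def per_example_prediction(slice_probs, strategy="prob_sum"):
    """Combine per-slice class probabilities into one predicted class index."""
    num_classes = len(slice_probs[0])
    if strategy == "majority":
        # Each slice votes for its most likely class.
        votes = Counter(max(range(num_classes), key=p.__getitem__)
                        for p in slice_probs)
        return votes.most_common(1)[0][0]
    if strategy == "prob_sum":
        scores = [sum(p[c] for p in slice_probs) for c in range(num_classes)]
    elif strategy == "log_prob_sum":
        eps = 1e-12  # guards against log(0)
        scores = [sum(math.log(p[c] + eps) for p in slice_probs)
                  for c in range(num_classes)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(range(num_classes), key=scores.__getitem__)


# Three slices of one example, three candidate devices (made-up numbers).
slices = [[0.55, 0.40, 0.05], [0.55, 0.40, 0.05], [0.02, 0.95, 0.03]]
for name in ("majority", "prob_sum", "log_prob_sum"):
    print(name, per_example_prediction(slices, name))
```

In this made-up example the majority vote picks device 0, while both probability-based strategies pick device 1, because the third slice is far more confident than the first two.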

For a full list of arguments of preprocessing or building a model, you may use --help.

Commands for preprocessing:

Processing usage, optional arguments:
  -h, --help            show this help message and exit
  --task TASK           Specify the task name (default: 1Cv2)
  --multiburst          Specify the task is a multiburst testing task or not.
                        If it is, we assume the corresponding general task has
                        alredy processed. For example, if the task is
                        specified as 1MC, we will look for corresponding
                        processed data, labels of 1Cv2. (default: False)
  --new_device          Specify the task is a novel device testing task.
                        (default: False)
  --train_tsv TRAIN_TSV
                        Specify the path of .tsv for training (default: /scrat
                        ch/RFMLS/RFML_Test_Specs_Delivered_v3/test1/1Cv2.train
                        .tsv)
  --test_tsv TEST_TSV   Specify the path of .tsv for testing (default: /scratc
                        h/RFMLS/RFML_Test_Specs_Delivered_v3/test1/1Cv2.test.t
                        sv)
  --root_wifi ROOT_WIFI
                        Specify the root path of WiFi signals (default:
                        /mnt/rfmls_data/disk1/wifi_sigmf_dataset_gfi_1/)
  --root_adsb ROOT_ADSB
                        Specify the root path of ADS-B signals (default:
                        /mnt/rfmls_data/disk2/adsb_gfi_3_dataset/)
  --out_root_data OUT_ROOT_DATA
                        Specify the root path of preprocessed data (default:
                        ./data/v3)
  --out_root_list OUT_ROOT_LIST
                        Specify the root path of data lists for training
                        (default: ./data/v3_list)
  --wifi_eq             Specify wifi signals need to be equalized or not.
                        (default: False)
  --newtype_process     [New Type Signal]Specify process new type signals or
                        not (default: False)
  --root_newtype ROOT_NEWTYPE
                        [New Type Signal]Specify the root path of new type
                        signals (default: )
  --newtype_filter      [New Type Signal]Specify if new type signals need to
                        be filtered. (default: False)
  --signal_BW_useful SIGNAL_BW_USEFUL
                        [New Type Signal]Specify Band width for new type
                        signal. (default: None)
  --num_guard_samp NUM_GUARD_SAMP
                        [New Type Signal]Specify number of guard samples.
                        (default: 2e-06)
  --time_analysis       Enable to report time preprocessing takes (default:
                        False)

Commands for building, training, and testing a model:

Model usage, optional arguments:
  -h, --help            show this help message and exit
  --exp_name            Experiment name. (default: experiment_1)
  --pickle_files        Path containing pickle files with data information.
                        (default: None)
  --save_path           Path to save experiment weights and logs. (default:
                        None)
  --save_predictions    Disable saving model predictions. (default: True)
  --task                Set experiment task. (default: 1Cv2)
  --equalize            Enable to use equalized WiFi data. (default: False)
  --data_type           Set the data type {wifi | adsb}. (default: wifi)
  --file_type           Set data file format {mat | pickle}. (default: mat)
  --decimated           Enable if the data in the files is decimated.
                        (default: False)
  --val_from_train      If validation not present in partition file, generate
                        one from the training set. (If called, use test set as
                        validation). (default: True)
  -m , --model_flag     Define model architecture {baseline | vgg16 | resnet50
                        | resnet1d}. (default: baseline)
  -ss , --slice_size    Set slice size. (default: 198)
  -d , --devices        Set number of total devices. (default: 50)
  --cnn_stack           [Baseline Model] Set number of cnn layers. (default:
                        5)
  --fc_stack            [Baseline Model] Set number of fc layers. (default: 2)
  --channels            [Baseline Model] Set number of channels of cnn.
                        (default: 128)
  --fc1                 [Baseline Model] Set number of neurons in the first fc
                        layer. (default: 256)
  --fc2                 [Baseline Model] Set number of neurons in the
                        penultimate fc layer. (default: 128)
  --dropout_flag        Enable to use dropout layers. (default: False)
  --batchnorm           Enable to use batch normalization. (default: False)
  --restore_model_from 
                        Path from where to load model structure. (default:
                        None)
  --restore_weight_from 
                        Path from where to load model weights. (default: None)
  --restore_params_from 
                        Path from where to load model parameters. (default:
                        None)
  --load_by_name        Enable to only load weights by name. (default: False)
  --padding             Disable adding zero-padding if examples are smaller
                        than slice size. (default: True)
  --try_concat          Enable if examples are smaller than slice size and
                        using demodulated data, try and concat them. (default:
                        False)
  --preprocessor        Set preprocessor type to use. (default: no)
  --K                   Set batch down sampling factor K. (default: 16)
  --files_per_IO        Set files loaded to memory per IO. (default: 500000)
  --normalize           Specify if you do not want to normalize the data using
                        mean and std in stats files (if stats does not have
                        this info, it is ignored). (default: True)
  --crop                Set to keep first "crop" samples. (default: 0)
  --training_strategy   Set training strategy to use {big | fft | tensor}.
                        (default: big)
  --sampling            Set sampling strategy to use. (default: model)
  --epochs              Set epochs to train. (default: 25)
  -bs , --batch_size    Set batch size. (default: 512)
  --lr                  Set optimizer learning rate. (default: 0.0001)
  --decay               Set optimizer weight decay. (default: 0.0)
  -mg, --multigpu       Enable multiple distributed GPUs. (default: False)
  -ng , --num_gpu       Set number of distributed GPUs if --multigpu enabled.
                        (default: 8)
  --id_gpu              Set GPU ID to use. (default: 0)
  --shrink              Set down sampling factor. (default: 1)
  --early_stopping      Disable early stopping. (default: True)
  --patience            Set number of epochs for early stopping patience.
                        (default: 1)
  --train               Enable to train model. (default: False)
  -t, --test            Enable to test model. (default: False)
  --test_stride         Set stride to use for testing. (default: 16)
  --per_example_strategy 
                        Set the strategy used to compute the per example
                        accuracy {majority | prob_sum | log_prob_sum, all}.
                        (default: prob_sum)
  --get_device_acc      Report and save number of top class candidates for
                        each example. (default: 5)
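The --slice_size, --test_stride, and --padding options suggest that each example is cut into fixed-length slices before being fed to the model. The following sketch shows a generic sliding-window slicer with zero-padding for short examples. The default values are taken from the help output above, but the function names and the exact slicing behaviour are assumptions, not this repository's implementation.

```python
def slice_starts(example_len, slice_size=198, stride=16):
    """Start indices of the sliding windows; a single window for short examples."""
    if example_len <= slice_size:
        return [0]
    return list(range(0, example_len - slice_size + 1, stride))


def make_slices(samples, slice_size=198, stride=16, pad_value=0):
    """Cut a sequence of samples into windows, zero-padding if it is too short."""
    padded = list(samples) + [pad_value] * max(0, slice_size - len(samples))
    starts = slice_starts(len(samples), slice_size, stride)
    return [padded[s:s + slice_size] for s in starts]


# Small numbers are used here so that the result is easy to read.
print(make_slices(list(range(10)), slice_size=4, stride=3))
print(make_slices([1, 2], slice_size=4, stride=3))
```

A smaller stride yields more, heavily overlapping slices per example, which increases the number of per-slice predictions available for the per-example decision at the cost of more computation.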