DeepSQA repo for the paper "DeepSQA: Understanding Sensor Data via Question Answering"

DeepSQA

Code for the paper "DeepSQA: Understanding Sensor Data via Question Answering" (IoTDI 2021).

Raw sensory data "OPPORTUNITY" can be found here

Environment:

  • Python 3.7.7
  • NVIDIA Titan RTX GPU
  • Related packages: see "requirements.txt"

Create folders to perform simulation:

mkdir sqa_data trained_models result source_dataset
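
On systems where the shell command above is inconvenient, the same folders could be created from Python. This is only a minimal sketch: the folder names come from the command above, while the helper name `create_folders` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Folder names taken from the mkdir command above.
FOLDERS = ["sqa_data", "trained_models", "result", "source_dataset"]


def create_folders(root="."):
    """Create the working folders under `root`; existing folders are left untouched."""
    created = []
    for name in FOLDERS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_folders()` from the repository root should be equivalent to the `mkdir` command, and it is safe to call again because of `exist_ok=True`.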

File descriptions:

  • source_dataset:

    • opportunity: put raw sensory data "OPPORTUNITY" here.
  • sqa_data_gen:

    • question_family.ipynb: specifies all the question family templates used in generation
    • question_family.json: stores the question family info in a json file.
    • data_extraction.py: functions for extracting/splitting source data; generating scene_list; and visualizing data.
    • dataset_analysis.py: class for the SQA dataset, used for analyzing its statistics.
    • function_catalog.py: atomic function catalog
    • functional_program.py: function programs associated with all question families
    • question_generation.py: question generation function; given a scene, generates all questions of the different families.
    • sqa_gen_engine.py: question generation engine (main program).
    • synonym_change.py: changes words in generated questions to increase linguistic variation.
    • train_opp_model-single.ipynb: trains DL models natively on the opp dataset. The trained model is used in the Neural Symbolic method.
  • preprocess_data:

    • create folders: mkdir embeddings glove
    • embeddings: folder storing embedding matrix and word index
    • glove: folder storing pre-trained GloVe word vectors (glove.6B.zip), downloaded from here.
    • embedding.py: create embedding matrix
    • prepare_data.py: get sensory, question, and answer data in matrix form
    • preprocessing.py: main function for converting a SQA dataset in json into processed .npz format.
  • sqa_models:

    • mac_model: code for the DeepSQA-CA model (MAC)
    • baselines.py: code for all other baseline models (prior, prior_q, SAN, conv-lstm, etc.)
    • run_baselines.py: training and testing baseline models
    • run_mac.py: training and testing MAC models
  • result_analysis:

    • utils.py: utility function for computing the confusion matrix
    • analyze_result.py: class for analyzing the generated .pkl results.
  • create folders: mkdir sqa_data trained_models result

  • sqa_data: stores all the generated SQA data in JSON format, and also the preprocessed data in .pkl and .npz format.

  • trained_models: stores all the trained models. Models trained from a single simulation are stored in a single folder. e.g. "opp_sim1".

  • result: stores simulation results in .pkl format, e.g. "opp_sim1.pkl"
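
As an illustration of what "create embedding matrix" in the preprocess_data entry typically involves, here is a minimal sketch. It assumes the standard GloVe text format (one word per line followed by its space-separated vector components) and a word index that maps each word to a positive integer. The function names `load_glove` and `build_embedding_matrix` are hypothetical and may not match what embedding.py actually does.

```python
import numpy as np


def load_glove(lines):
    """Parse GloVe text lines ("word v1 v2 ...") into a dict of word -> vector."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i.

    Row 0 is reserved for padding; words missing from GloVe keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix


# Tiny in-memory example (made-up vectors); a real run would iterate over
# a text file extracted from glove.6B.zip instead of this list.
sample = ["open 0.1 0.2 0.3", "door 0.4 0.5 0.6"]
word_index = {"open": 1, "door": 2, "fridge": 3}
matrix = build_embedding_matrix(word_index, load_glove(sample), dim=3)
print(matrix.shape)  # (4, 3)
```

Leaving out-of-vocabulary words as zero rows is one common choice; random initialization is another, and the repository may use either.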

Running experiments:

  • sqa_generation.py: script for generating the original SQA dataset.
  • data_preprocessing.py: script for preprocessing the original SQA dataset for training.
  • run_baselines&mac.py: code for training either MAC or baseline models.
  • Modify the parameters in the scripts for different simulation settings.
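
To make the preprocessing stage more concrete, the sketch below turns question and answer strings into integer matrices and stores them in .npz form. The JSON field names (`question`, `answer`), the whitespace tokenizer, and the function `encode_questions` are assumptions made for illustration; the real dataset schema and preprocessing code may differ.

```python
import io
import json

import numpy as np


def encode_questions(records, max_len):
    """Turn questions into a zero-padded integer matrix and answers into class ids."""
    word_index, answer_index = {}, {}
    q_matrix = np.zeros((len(records), max_len), dtype=np.int64)
    a_vector = np.zeros(len(records), dtype=np.int64)
    for row, rec in enumerate(records):
        tokens = rec["question"].lower().rstrip("?").split()
        for col, tok in enumerate(tokens[:max_len]):
            # Index 0 is reserved for padding, so word ids start at 1.
            q_matrix[row, col] = word_index.setdefault(tok, len(word_index) + 1)
        a_vector[row] = answer_index.setdefault(rec["answer"], len(answer_index))
    return q_matrix, a_vector, word_index, answer_index


# Hypothetical two-record dataset; the real JSON schema may differ.
raw = json.loads(
    '[{"question": "Did the user open the door?", "answer": "yes"},'
    ' {"question": "What did the user open?", "answer": "door"}]'
)
q, a, word_index, answer_index = encode_questions(raw, max_len=8)

buffer = io.BytesIO()  # stands in for a file under sqa_data/
np.savez(buffer, questions=q, answers=a)
buffer.seek(0)
loaded = np.load(buffer)
print(loaded["questions"].shape, loaded["answers"].tolist())  # (2, 8) [0, 1]
```

Sensor data would be stored alongside the questions and answers in the same archive; it is omitted here to keep the example short.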

Acknowledgement:

This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement # W911NF-16-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

For more information, contact Tianwei Xing at: twxing@ucla.edu