Code and data for (adjective, concept)-CSK

File Description

  • crawl_data.py: downloads the Google Syntactic N-grams corpus automatically.
  • clean_data.py: data preprocessing and filtering module.
  • cluster_data.py: clustering module.
  • concept_data.py: conceptualization and evaluation module.

Steps

  • Python packages

    • tensorflow-1.14.0
    • bert4keras-0.9.1
    • Keras-2.3.1
    • Install any other missing packages with pip install XXX
  • Download the Google Syntactic N-grams corpus

    python crawl_data.py
    • A proxy is required if you cannot connect to Google's servers directly.
  • Data preprocessing and filtering module

    python clean_data.py
  • Clustering module

    python cluster_data.py
  • Conceptualization and evaluation module

    python concept_data.py
    • Before running this step:
      • download the BERT model from here and place it in the semantic/bert/models directory.
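
The four steps above can also be chained from a single script. The following is a minimal sketch, not part of this repository: the names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical, and it assumes the scripts take no command-line arguments and are launched from the repository root. It also checks that the BERT model directory is non-empty before the last step, since concept_data.py depends on it.

```python
import subprocess
import sys
from pathlib import Path

# Script order taken from the Steps list above.
PIPELINE = ["crawl_data.py", "clean_data.py", "cluster_data.py", "concept_data.py"]

# Directory where the README says the BERT model must be placed.
BERT_DIR = Path("semantic/bert/models")


def build_commands(python=sys.executable):
    """Return one command (argument list) per pipeline script, in order."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(runner=subprocess.run, bert_dir=BERT_DIR):
    """Run each step in order; check=True stops at the first failing script."""
    for cmd in build_commands():
        if cmd[1] == "concept_data.py":
            # Fail early with a clear message if the BERT model is missing.
            if not (bert_dir.is_dir() and any(bert_dir.iterdir())):
                raise FileNotFoundError(f"No BERT model found in {bert_dir}")
        runner(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would then run the download, filtering, clustering, and conceptualization steps in sequence. The `runner` parameter exists only so the command order can be inspected without launching the scripts.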

Data Resources

Information