Learning to predict statutes for a given case

Primary Language: Python

CJDClassification

Learning to automatically classify Chinese judgment documents (CJD) according to the industry involved in their factual content.

Running order

extract_data.py --> extracts the inputs (x) and labels (y) from the original txt files.

sentences2words.py --> segments the sentences into words.

build_vocab.py --> builds the vocabularies.

data_loader.py --> converts the original texts to ids. For example: 我爱** -> character ids: 9, 89, 344, 1244
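As a rough illustration of the last two steps, the sketch below builds a character vocabulary and maps a text to padded character ids. It is not the code in build_vocab.py or data_loader.py: the function names, the `<PAD>`/`<UNK>` tokens, the size limits and the sample sentences are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical special tokens; the repository may use different ones.
PAD, UNK = "<PAD>", "<UNK>"


def build_char_vocab(texts, max_size=5000):
    """Map the most frequent characters to integer ids (0 = PAD, 1 = UNK)."""
    counts = Counter(ch for text in texts for ch in text)
    chars = [ch for ch, _ in counts.most_common(max_size - 2)]
    return {ch: i for i, ch in enumerate([PAD, UNK] + chars)}


def encode(text, vocab, max_len=600):
    """Convert a text to a fixed-length list of character ids."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:max_len]]
    # Right-pad with the PAD id so every sample has the same length.
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Made-up sample texts ("I love law", "legal document").
texts = ["我爱法律", "法律文书"]
vocab = build_char_vocab(texts)
ids = encode("我爱法律", vocab, max_len=6)
print(ids)
```

Characters that were not seen while building the vocabulary fall back to the `<UNK>` id, so texts from the validation and test sets can still be encoded.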

Training and Testing

train.py --> trains/tests character-level CNNs on CJDClassification/THUCNews/IMDb.

train_bert.py --> trains/tests CWSB-CNN on CJDClassification/THUCNews/IMDb. Please note that this requires BERT for token-level and sentence-level encoding; see project: fine-tuning BERT.

  • legal_data

  • SubIMDb (the file must be manually split into train.txt, val.txt and test.txt)

  • SubTHUCNews (the file must be manually split into train.txt, val.txt and test.txt)
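One possible way to do the manual split is sketched below. It assumes that each line of the source file holds one sample and uses an 80/10/10 ratio; neither the file layout nor the ratio is specified by this project, and `split_lines` and `write_splits` are hypothetical helper names.

```python
import random
from pathlib import Path


def split_lines(lines, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle the samples and return (train, val, test) lists."""
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # fixed seed keeps the split reproducible
    n_val = int(len(lines) * val_frac)
    n_test = int(len(lines) * test_frac)
    n_train = len(lines) - n_val - n_test
    train = lines[:n_train]
    val = lines[n_train:n_train + n_val]
    test = lines[n_train + n_val:]
    return train, val, test


def write_splits(src_path, out_dir, **kwargs):
    """Read one sample per line from src_path and write the three split files."""
    lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = ("train.txt", "val.txt", "test.txt")
    for name, part in zip(names, split_lines(lines, **kwargs)):
        (out / name).write_text("\n".join(part) + "\n", encoding="utf-8")


# In-memory demonstration with placeholder samples.
train, val, test = split_lines([f"sample {i}" for i in range(20)],
                               val_frac=0.25, test_frac=0.25)
print(len(train), len(val), len(test))
```

If the classes are imbalanced, a stratified split per label may be preferable to the plain shuffle used here.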