hebrew_ULMFiT

Universal Language Model Fine-tuning for Text Classification in Hebrew, plus a bonus

I am happy to share the weights of the Hebrew ULMFiT model.
ULMFiT was published by Jeremy Howard and Sebastian Ruder here, and is covered in the fast.ai course.
This model is very strong because it can easily be transferred to any kind of classification task you want.
The Hebrew Wikipedia corpus was downloaded from Professor Yoav Goldberg's website.
Download the Hebrew models: here

  • wiki training

    1. hebrew_wiki_part_1.ipynb
      Training on the wiki corpus from scratch.

    2. hebrew_wiki_part_2.ipynb
      Remove unwanted characters and retrain.
  • Bonus - Amit Segal models.

    1. amit_segal_data.ipynb
      Collect the Amit Segal data.

    2. amit_segal_language_model.ipynb
      Train a language model on this corpus.

    3. amit_classification.ipynb
      Transfer the model to classify sentences as correct or wrong.
      The model achieves 0.68 accuracy, which is impressive given the small data size and the possibility that a wrong
      sentence looks real (because the prediction is good), and vice versa.

  • Load pre-trained models and word map
    load_models_word_map.ipynb
    After all the models are created, simply load the models and create a word map (with the embeddings) from the wiki and Amit models.
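
As a rough illustration of what a word map with embeddings can look like, here is a minimal sketch in plain Python. It is not the code from load_models_word_map.ipynb: the names `build_word_map`, `cosine` and `nearest` are hypothetical, and the toy vocabulary and two-dimensional vectors only stand in for the real vocabulary list and encoder embedding weights, which you would take from the loaded models.

```python
import math


def build_word_map(itos, embedding_rows):
    """Map each vocabulary token to its embedding vector (hypothetical helper)."""
    if len(itos) != len(embedding_rows):
        raise ValueError("vocabulary and embedding matrix sizes differ")
    return {token: list(row) for token, row in zip(itos, embedding_rows)}


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word_map, token, k=3):
    """Return the k tokens whose embeddings are most similar to `token`."""
    query = word_map[token]
    scored = [
        (cosine(query, vec), other)
        for other, vec in word_map.items()
        if other != token
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy data (assumption): stands in for the real vocabulary and embedding weights.
itos = ["שלום", "היי", "ספר", "מחברת"]
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

word_map = build_word_map(itos, embeddings)
print(nearest(word_map, "שלום", k=1))
```

With real weights, the same lookup could be run once against the wiki embeddings and once against the Amit Segal embeddings to compare which neighbours each model assigns to a word.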