1. Fetch HTML pages from the Internet.

    • Status - open.
  2. Extract text from the HTML pages.

    • Status - open.
  3. NOTE: Instead of steps 1 and 2, we use already existing data from the site.

  4. Convert text to normalized text.

    • Status - done.
    • Package/class/method - prepare_train_data.py.
  5. Convert normalized text to a vector.

    • Bag of words
      • Status - done.
      • Package/class/method - prepare_train_data.py.
    • TF-IDF
      • Status - open.
  6. Build a neural network on the vectors.

    • Keras library, softmax
      • Status - done.
      • Package/class/method -
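
Steps 4 and 5 can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not the code in prepare_train_data.py: the function names (normalize, build_vocabulary, bag_of_words, tf_idf), the tokenization rule, and the TF-IDF formula (term frequency times log(N / df)) are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and split it into word-character tokens."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(docs):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Return a count vector with one slot per vocabulary entry."""
    vec = [0] * len(vocab)
    for tok, n in Counter(doc).items():
        if tok in vocab:  # tokens outside the vocabulary are ignored
            vec[vocab[tok]] = n
    return vec


def tf_idf(bow_rows):
    """Reweight count vectors as tf * log(N / df), one common TF-IDF variant."""
    if not bow_rows:
        return []
    n_docs = len(bow_rows)
    n_terms = len(bow_rows[0])
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for row in bow_rows if row[j] > 0) for j in range(n_terms)]
    result = []
    for row in bow_rows:
        total = sum(row) or 1  # avoid division by zero for empty documents
        result.append([
            (row[j] / total) * math.log(n_docs / df[j]) if df[j] else 0.0
            for j in range(n_terms)
        ])
    return result


# Toy input, invented for the example.
raw = ["The cat sat.", "The dog sat!"]
docs = [normalize(t) for t in raw]
vocab = build_vocabulary(docs)
bow = [bag_of_words(d, vocab) for d in docs]
weights = tf_idf(bow)
```

With this toy input, words that occur in every document ("the", "sat") get a TF-IDF weight of zero, while "cat" and "dog" keep a positive weight. That is the main practical difference from raw bag-of-words counts.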

TODO:

  1. Reduce the size of bag_of_words_full.npy (12.3 GB O_o).
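
One way to reason about this TODO is to compare storage layouts before changing any code. The sketch below estimates the size of a dense matrix for two element sizes and of a CSR-style sparse layout. The shape, the number of non-zero cells, and the helper names (dense_bytes, csr_bytes) are assumptions made for illustration; the actual shape and dtype of bag_of_words_full.npy are not stated here.

```python
def dense_bytes(n_rows, n_cols, itemsize):
    """Bytes needed to store every cell of a dense matrix (file header ignored)."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize, index_size=4):
    """Bytes for a CSR-style sparse layout: values, column indices, row pointers."""
    return nnz * itemsize + nnz * index_size + (n_rows + 1) * index_size


# Hypothetical figures, chosen only for illustration.
n_rows, n_cols = 10_000, 50_000
nnz = n_rows * 100  # assume about 100 distinct words per document

as_float64 = dense_bytes(n_rows, n_cols, 8)
as_uint8 = dense_bytes(n_rows, n_cols, 1)
as_sparse = csr_bytes(n_rows, nnz, 1)

for label, size in [("dense float64", as_float64),
                    ("dense uint8", as_uint8),
                    ("sparse uint8", as_sparse)]:
    print(f"{label}: {size / 1e9:.3f} GB")
```

Under these assumed numbers, switching to a one-byte integer type shrinks the dense matrix eightfold, and a sparse layout shrinks it much further because most bag-of-words cells are zero. A one-byte type caps each count at 255, so it only fits if no word occurs more often than that in a single document.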