/vacancy-splitter

Splits vacancy text into requirements, responsibilities, and conditions

Vacancy Splitter

Vacancy Splitter is a library that helps train and use a classifier that splits vacancy texts into meaningful blocks.
The current implementation uses a plain TF-IDF vectorizer and a linear SVM classifier.
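
To illustrate what a plain TF-IDF vectorizer computes, here is a minimal sketch. It is not the library's code: the class name TfIdfSketch, the whitespace-free pre-tokenized input, and the exact weighting (tf = count / document length, idf = ln(N / df), no smoothing or normalization) are assumptions chosen for illustration, and the library's vectorizer may differ in all of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Vacancy Splitter.
final class TfIdfSketch {

    /** Returns one term -> weight map per document, with tf = count/length and idf = ln(N/df). */
    static List<Map<String, Double>> tfIdf(List<List<String>> documents) {
        // Document frequency: the number of documents in which each term occurs at least once.
        Map<String, Integer> documentFrequency = new HashMap<>();
        for (List<String> document : documents) {
            for (String term : new HashSet<>(document)) {
                documentFrequency.merge(term, 1, Integer::sum);
            }
        }

        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> document : documents) {
            // Raw term counts within this document.
            Map<String, Integer> counts = new HashMap<>();
            for (String term : document) {
                counts.merge(term, 1, Integer::sum);
            }
            Map<String, Double> vector = new HashMap<>();
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                double tf = (double) entry.getValue() / document.size();
                double idf = Math.log((double) documents.size() / documentFrequency.get(entry.getKey()));
                vector.put(entry.getKey(), tf * idf);
            }
            vectors.add(vector);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<List<String>> documents = Arrays.asList(
            Arrays.asList("java", "experience", "required"),
            Arrays.asList("flexible", "schedule", "java")
        );
        // Terms that occur in every document (here "java") get weight 0 under this weighting.
        System.out.println(tfIdf(documents));
    }
}
```

A linear SVM would then learn one weight per term and per label from such vectors; that training step is outside the scope of this sketch.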

Library usage example:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.Map;
import com.google.common.collect.ImmutableMap;

Map<String, VacancyBlock> splitVacancyIntoParts(String vacancyXml) throws IOException, ClassNotFoundException {
    // Load the serialized classifiers; the stream sources are elided here.
    ObjectInputStream englishClassifierStream = new ObjectInputStream(...);
    Classifier englishClassifier = (Classifier) englishClassifierStream.readObject();

    ObjectInputStream russianClassifierStream = new ObjectInputStream(...);
    Classifier russianClassifier = (Classifier) russianClassifierStream.readObject();

    // Map classifier labels to vacancy block types.
    Map<String, VacancyBlock> labelToClassMapping = ImmutableMap.of(
        "req", REQUIREMENTS, "res", RESPONSIBILITIES, "con", CONDITIONS
    );

    VacancySplitter splitter = new VacancySplitter(englishClassifier, russianClassifier, labelToClassMapping);
    return splitter.split(vacancyXml);
}
  • Note that the split method of VacancySplitter accepts only valid XML as input.
  • VacancySplitter is not thread-safe, and its creation can be fairly expensive. Reuse each instance within a single thread for good performance.
  • Models in the model/ directory are valid, ready-to-use Java-serialized Classifier instances.
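
The first two notes above can be handled with standard JDK facilities. The sketch below is one possible approach, not part of the library: the class SplitterUsageSketch and its helper methods are hypothetical names, the well-formedness check uses the JDK's DOM parser, and the per-thread holder is a plain ThreadLocal. A StringBuilder stands in for VacancySplitter so that the example compiles on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; not part of Vacancy Splitter.
final class SplitterUsageSketch {

    /** Returns true if the text parses as well-formed XML, false otherwise. */
    static boolean isWellFormedXml(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    /** Creates a holder that lazily builds one instance per thread and reuses it afterwards. */
    static <T> ThreadLocal<T> perThread(Supplier<T> factory) {
        return ThreadLocal.withInitial(factory);
    }

    public static void main(String[] args) {
        // Stand-in for an expensive, non-thread-safe object such as VacancySplitter.
        // With the library on the classpath this might look like (assumption):
        //   perThread(() -> new VacancySplitter(englishClassifier, russianClassifier, labelToClassMapping))
        ThreadLocal<StringBuilder> instances = perThread(StringBuilder::new);

        String vacancyXml = "<p>Requirements: <b>Java</b></p>";
        if (isWellFormedXml(vacancyXml)) {
            // The same instance is returned on every call from this thread.
            StringBuilder instance = instances.get();
            System.out.println("Reused instance: " + (instance == instances.get()));
        } else {
            System.out.println("Input is not well-formed XML; skipping split.");
        }
    }
}
```

In a thread pool, each worker thread would build its own instance on first use and keep it for later tasks, which avoids both sharing an instance across threads and paying the creation cost per call.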

This project is being developed for hh.ru online hiring services.