{\rtf1\ansi\ansicpg1252\cocoartf2580
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;\f1\froman\fcharset0 Times-Roman;}
{\colortbl;\red255\green255\blue255;\red0\green0\blue0;}
{\*\expandedcolortbl;;\cssrgb\c0\c0\c0;}
\paperw11900\paperh16840\margl1440\margr1440\vieww11520\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs24 \cf0 Title: \f1 \cf2 \expnd0\expndtw0\kerning0
\outl0\strokewidth0 \strokec2 CompanyName2Vec: Company Entity Matching Based on Job Ads\
Project: CEM\
Author: Ran Ziv\
Date: September 2021\
\
Contains:\
\'97\'97\'97\'97\'97\'97\
- /code/ - directory with an archive of the CEM project\
  - The project contains the following:\
    - This Readme file\
    - jobAdsProcessing - contains the fingerprinting and job ads corpus processing jobs\
    - model - contains the method building blocks\
      - model_emb.py - primary executable; includes the environment settings. Loads the input data, generates the model, and calls the evaluation process\
      - evaluate_emb.py - contains the evaluation process. Calls index_emb.py\
      - index_emb.py - builds an index for evaluation purposes\
      - datasetFiltering.py - prepares the job ads corpus and saves it to the input directory. Includes filtering capabilities for testing purposes\
    - utils - contains several utilities, such as a t-test calculator, a fuzzy distance test function, and data export/import functions
\
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\cf2 \expnd0\expndtw0\kerning0
\outl0\strokewidth0 - /Data/cem-input-export.zip - compressed directory with the input directories and files for the CEM project\
- /Data/cem-output-export.zip - compressed directory with the output directories and files for the CEM project\
\
\
Instructions:\
\'97\'97\'97\'97\'97\'97\
- Download and extract the files from both the /code/ and /Data/ directories\
- Make sure Python 3.6.8 (or a compatible version) and all required packages are installed\
- Edit the environment variables in model_emb.py\
- Run model_emb.py\
- Results are printed to stdout\
}