
Primary language: Jupyter Notebook

arXiv-project

An AI project that applies natural language processing to a dataset of arXiv papers.

Research question:

How can machine learning predict the publication trend of a paper, using its abstract as the feature and its citation count as the label?

Research method and steps:

Step 1: Build a pipeline that extracts and saves datasets using the arXiv API.
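A minimal sketch of such a pipeline in Python, using only the standard library. It assumes the public arXiv query endpoint (http://export.arxiv.org/api/query), which returns an Atom feed in which each entry carries the paper's id, title, summary (the abstract) and published date. The function names, the CSV layout and the 3-second pause between requests are illustrative assumptions, not the project's actual code.

```python
import csv
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint and feed namespace of the public arXiv query API.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}
FIELDS = ["id", "title", "abstract", "published"]  # hypothetical CSV layout


def build_query_url(search_query, start=0, max_results=100):
    """Build a query URL, e.g. for search_query='cat:cs.CL'."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def _text(entry, tag):
    """Read a child element's text and collapse line breaks and repeated spaces."""
    raw = entry.findtext(f"atom:{tag}", default="", namespaces=ATOM)
    return " ".join(raw.split())


def parse_feed(xml_text):
    """Turn an Atom feed (str or bytes) into a list of paper dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": _text(entry, "id"),
            "title": _text(entry, "title"),
            "abstract": _text(entry, "summary"),
            "published": _text(entry, "published"),
        }
        for entry in root.findall("atom:entry", ATOM)
    ]


def fetch_page(search_query, start, max_results):
    """Download and parse one page of results (needs network access)."""
    url = build_query_url(search_query, start, max_results)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_feed(resp.read())


def fetch_all(search_query, total, page_size=100, delay_s=3.0):
    """Page through results, pausing between successive API calls."""
    papers = []
    for start in range(0, total, page_size):
        if start:
            time.sleep(delay_s)
        batch = fetch_page(search_query, start, min(page_size, total - start))
        if not batch:  # the query has no more results
            break
        papers.extend(batch)
    return papers


def save_csv(papers, path):
    """Store the collected papers as a CSV dataset."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(papers)
```

For example, papers = fetch_all("cat:cs.CL", 500) followed by save_csv(papers, "arxiv_cs_cl.csv") would collect and store up to 500 abstracts from the cs.CL category. Keeping parsing separate from downloading lets the parser be tested on saved feeds without hitting the API.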

Step 2: Clean the data, apply word-embedding methods, and store the citation data for each paper.
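One possible sketch of this step, assuming numpy and pre-trained word vectors stored in the plain-text GloVe format (a word followed by its vector components on each line). Each abstract is cleaned into tokens and represented by the average of its known word vectors, a simple and common document embedding. The arXiv API does not return citation counts, so attach_citations assumes a {arxiv_id: count} mapping obtained from some external citation source; that mapping, the small stop-word list and all function names are hypothetical.

```python
import re

import numpy as np

# Illustrative stop-word list (an assumption; a fuller list could be used instead).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on",
             "that", "the", "this", "to", "we", "with"}


def clean_abstract(text):
    """Lowercase, drop inline LaTeX math ($...$), keep alphabetic tokens, remove stop words."""
    text = re.sub(r"\$[^$]*\$", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS and len(t) > 1]


def load_word_vectors(path):
    """Read vectors in the plain-text GloVe format: 'word v1 v2 ... vd' per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.array([float(v) for v in values], dtype=np.float32)
    return vectors


def embed_abstract(tokens, vectors, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim, dtype=np.float32)


def arxiv_id(entry_id):
    """'http://arxiv.org/abs/2101.00001v2' -> '2101.00001' (drop URL prefix and version)."""
    return re.sub(r"v\d+$", "", entry_id.rsplit("/abs/", 1)[-1])


def attach_citations(papers, citation_counts):
    """Add a 'citations' field from a {arxiv_id: count} mapping; papers without data are skipped."""
    out = []
    for paper in papers:
        key = arxiv_id(paper["id"])
        if key in citation_counts:
            out.append({**paper, "citations": int(citation_counts[key])})
    return out
```

Papers without a citation count are dropped rather than given zero, so that missing data is not mistaken for an uncited paper.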

Step 3: Use classic machine learning methods to classify papers based on their abstracts, with their citation counts as labels.
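A sketch of this step under stated assumptions: raw citation counts are binned into ordered classes (the thresholds 10 and 100 are arbitrary placeholders), and a multinomial logistic regression, one classic method, is trained with full-batch gradient descent in numpy on the abstract embeddings. In practice a library implementation such as scikit-learn's LogisticRegression would usually be preferred; this version only shows the mechanics, and its names and hyperparameters are hypothetical.

```python
import numpy as np


def citation_labels(citations, thresholds=(10, 100)):
    """Bin citation counts into ordered classes: 0 (<10), 1 (10-99), 2 (>=100).
    The thresholds are placeholder assumptions, not values from the project."""
    return np.digitize(np.asarray(citations), thresholds)


class SoftmaxRegression:
    """Multinomial logistic regression trained with full-batch gradient descent."""

    def __init__(self, lr=0.1, epochs=500, l2=1e-3):
        self.lr, self.epochs, self.l2 = lr, epochs, l2

    @staticmethod
    def _softmax(z):
        z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y)
        n, d = X.shape
        self.classes_ = np.unique(y)
        Y = (y[:, None] == self.classes_[None, :]).astype(np.float64)  # one-hot targets
        self.W = np.zeros((d, len(self.classes_)))
        self.b = np.zeros(len(self.classes_))
        for _ in range(self.epochs):
            P = self._softmax(X @ self.W + self.b)
            G = (P - Y) / n  # gradient of the mean cross-entropy w.r.t. the logits
            self.W -= self.lr * (X.T @ G + self.l2 * self.W)  # includes L2 penalty
            self.b -= self.lr * G.sum(axis=0)
        return self

    def predict_proba(self, X):
        return self._softmax(np.asarray(X, dtype=np.float64) @ self.W + self.b)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```

Here X would hold one embedding per abstract and y = citation_labels(counts) for the same papers. Binning turns the citation count into a class label, which matches the classification framing of this step.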

Step 4: Test the model's predictions.
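A minimal evaluation sketch in numpy, assuming X holds the abstract embeddings and y the citation classes: hold out a random test split, then compare the model's predictions with the true classes through accuracy, a confusion matrix and macro-averaged F1. The function names and the 20 % test fraction are illustrative assumptions.

```python
import numpy as np


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the papers and hold out test_fraction of them for testing."""
    X, y = np.asarray(X), np.asarray(y)
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = max(1, int(round(len(y) * test_fraction)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]


def evaluate(y_true, y_pred, n_classes):
    """Return accuracy, macro-averaged F1 and the confusion matrix (rows = true class)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)
    return {"accuracy": np.trace(cm) / cm.sum(), "macro_f1": f1.mean(), "confusion": cm}


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most frequent training class."""
    values, counts = np.unique(y_train, return_counts=True)
    return float(np.mean(np.asarray(y_test) == values[counts.argmax()]))
```

Citation counts tend to be highly skewed, so the classes are likely imbalanced; comparing the model's accuracy with majority_baseline and reporting macro F1 shows whether the abstract carries signal beyond the class frequencies.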