An AI project for natural language processing on a dataset of arXiv papers.
Research question:
How can we use machine learning to predict the publication trend of a paper, with its abstract as the feature and its citation count as the label?
Research method and steps:
Step 1: Build a pipeline that extracts datasets through the arXiv API and saves them.
Step 2: Clean the data and apply word-embedding methods. Store the citation data for each paper.
Step 3: Use classic machine learning methods to classify papers based on their abstracts, with their citation counts as labels.
Step 4: Test the model's predictive performance.
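
The extraction pipeline in step 1 could look roughly like the sketch below. It is not the project's actual code: the function names (build_query_url, parse_feed, fetch_papers, save_papers), the JSON Lines output format, and the choice of fields are assumptions made for illustration. It relies only on the public arXiv query endpoint, which returns an Atom feed. That feed carries no citation counts, so the citation data mentioned in step 2 is assumed to come from a separate source.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}  # namespace of the Atom feed


def build_query_url(search_query, start=0, max_results=100):
    """Return the arXiv API URL for one page of results."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def _text(entry, tag):
    """Read one child element and collapse its internal whitespace."""
    value = entry.findtext("atom:" + tag, default="", namespaces=ATOM)
    return " ".join(value.split())


def parse_feed(xml_data):
    """Turn an Atom feed (bytes or str) into a list of paper records."""
    root = ET.fromstring(xml_data)
    return [
        {
            "id": _text(entry, "id"),
            "title": _text(entry, "title"),
            "abstract": _text(entry, "summary"),
            "published": _text(entry, "published"),
        }
        for entry in root.findall("atom:entry", ATOM)
    ]


def fetch_papers(search_query, start=0, max_results=100):
    """Download one page of results and parse it (needs network access)."""
    url = build_query_url(search_query, start, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read())


def save_papers(papers, path):
    """Write the records to disk, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for paper in papers:
            handle.write(json.dumps(paper) + "\n")
```

A call such as save_papers(fetch_papers("cat:cs.CL", max_results=50), "papers.jsonl") would store one page of results; the category and file name here are only examples. Larger datasets would need paging through the start parameter with a pause between requests.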
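
Steps 2 to 4 could be sketched as follows, assuming scikit-learn is installed. Several choices here are assumptions rather than the project's method: TF-IDF vectors stand in for the word-embedding step, logistic regression stands in for "classic machine learning methods", and citation counts are turned into two classes with a caller-supplied threshold. The function names (clean_abstract, citation_label, train_and_test) are hypothetical.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def clean_abstract(text):
    """Lowercase, replace non-letter characters, and collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def citation_label(count, threshold):
    """Map a citation count to a class: 1 if at or above the threshold."""
    return int(count >= threshold)


def train_and_test(abstracts, citations, threshold, test_size=0.25, seed=0):
    """Fit a text classifier and report accuracy on a held-out split."""
    texts = [clean_abstract(a) for a in abstracts]
    labels = [citation_label(c, threshold) for c in citations]

    # Hold out part of the data so step 4 tests on unseen papers.
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )

    # Assumption: TF-IDF features replace the embedding step for brevity.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(x_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(x_test))
    return model, accuracy
```

The returned model accepts cleaned abstracts through model.predict, so new papers can be classified the same way. Swapping TfidfVectorizer for averaged pretrained word vectors would bring the sketch closer to the embedding approach named in step 2.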