/journal_finder

Python script for a journal recommendation program based on user-provided manuscript abstracts. The script uses web-scraping and BERT encoding.

Primary language: Jupyter Notebook · License: Apache-2.0

A journal recommender for the submission of your scientific manuscript

Based on the cosine similarity between BERT-encoded journal scopes and a user-provided abstract

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Contact

About The Project

The goal of this project is to build a journal recommender for the submission of a scientific manuscript. The recommendations are based on the similarity between the scope of each journal and the user-provided abstract of the manuscript. To achieve this, three steps have been taken:

1. scimagojr_scrape.ipynb script: I scraped scimagojr.com to extract the scope of each journal from its dedicated webpage and stored these scopes in a separate dataset for each subject category: Biochemistry, Genetics and Molecular Biology / Immunology and Microbiology / Medicine / Neuroscience / Pharmacology, Toxicology, and Pharmaceutics.

The scraped scopes can be viewed in the scraped_from_scimago directory.

2. journal_finder.ipynb script: I used a BERT model pretrained on MEDLINE/PubMed texts, available on TensorFlow Hub, to convert the journal scopes and the user-provided abstract into feature vectors. I then computed the cosine similarity between the abstract vector and each scope vector to find the scopes most similar to the abstract.

The encodings of the journal scopes produced by the TensorFlow Hub BERT expert (BERT pooled outputs) are available in the journal_scope_encodings directory as .pkl files.

3. finetuned_BERT_journal_recommender.ipynb script: I took a PubMedBERT sentence similarity model pretrained on MNLI, SNLI, SCINLI, SCITAIL, MEDNLI, and STSB texts, available on Hugging Face, and fine-tuned it with [abstract, journal_scope] pairs scraped from PubMed. The fine-tuned model was then used to convert the journal scopes and the user-provided abstract into feature vectors, which were compared using cosine similarity to find the scopes most similar to the abstract.

Abstracts scraped from PubMed can be downloaded using this link.
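
The ranking step shared by steps 2 and 3 can be sketched in a few lines. This is a minimal illustration and not the code from the notebooks: the function names cosine_similarity and rank_journals, the top_k parameter, and the toy three-dimensional vectors are all assumptions made for the example. Real BERT encodings are much longer vectors, but the arithmetic is the same.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_journals(abstract_vec, scope_vecs, top_k=5):
    """Return the top_k (journal, score) pairs, most similar first.

    scope_vecs is assumed to map a journal name to its scope encoding.
    """
    scores = [
        (name, cosine_similarity(abstract_vec, vec))
        for name, vec in scope_vecs.items()
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy stand-ins for encoded journal scopes (hypothetical journals).
toy_scopes = {
    "Journal A": [1.0, 0.0, 0.0],
    "Journal B": [0.0, 1.0, 0.0],
    "Journal C": [1.0, 1.0, 0.0],
}
toy_abstract = [2.0, 0.1, 0.0]  # toy stand-in for an encoded abstract

top = rank_journals(toy_abstract, toy_scopes, top_k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity normalizes both vectors, the ranking depends only on the direction of the encodings and not on their magnitude.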

Built With

  • Python v3.7.15
  • TensorFlow v2.8.0

Getting Started

journal_finder.ipynb script: To test the final functionality (the journal recommendation system), import the .pkl dictionaries into the journal_finder.ipynb script and skip to the "Define function that computes similarity ..." section.

finetuned_BERT_journal_recommender.ipynb: You can skip the scraping section in this notebook as well. Upload the PubMed data to your Drive or Colab and then skip to the "Alternatively, use the presaved ..." section.
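
For the journal_finder.ipynb shortcut, loading the presaved encodings might look like the sketch below. It assumes that each .pkl file holds a Python dictionary mapping journal names to their encodings; the actual layout of the files is not described here, so check it before relying on this. The helper name load_scope_encodings is hypothetical.

```python
import pickle
from pathlib import Path


def load_scope_encodings(directory):
    """Merge every .pkl dictionary found in `directory` into one dict.

    Assumption: each file contains a dict of {journal_name: encoding}.
    """
    encodings = {}
    for path in sorted(Path(directory).glob("*.pkl")):
        with path.open("rb") as handle:
            loaded = pickle.load(handle)
        if not isinstance(loaded, dict):
            raise TypeError(f"{path.name} does not contain a dictionary")
        # Later files overwrite earlier ones if a journal name repeats.
        encodings.update(loaded)
    return encodings
```

A call such as load_scope_encodings("journal_scope_encodings") would then return a single dictionary covering all subject categories. Unpickling can execute arbitrary code, so only load .pkl files from a source you trust.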

Installation

No installation is required. Just open the scripts in Google Colab and you're good to go.

Usage

Finding suitable journals for your manuscript can be a tedious and time-consuming process, especially if you're new to a field or your manuscript was rejected by your first- and second-choice journals. A recommender that matches your abstract against journal scopes can save a lot of time.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the Apache-2.0 License. See LICENSE.txt for more information.

Contact

Amin Sadeghi - masadeghi6@gmail.com

Project Link: https://github.com/masadeghi/journal_finder
