This repository contains the code and resources for the Data Integration for Bibliographic Articles project. The project focuses on building data integration skills, using Python to extract valuable details from bibliographic articles published as open data by CAPES.
The goal of this project is to integrate multiple databases and extract additional details, such as titles and authors, from bibliographic articles. Combining and processing these datasets supports strategic decision-making for bibliographic collections.
The project is divided into several main components:
- 📥 Loading CAPES data: We use the CAPES API to extract essential information about bibliographic articles.
- 🔄 Processing the CAPES dataset: The CAPES dataset is processed to prepare it for the subsequent steps.
- 🌐 Integrating with the Crossref database: We use the Crossref API to obtain Digital Object Identifiers (DOIs) for the articles. A Crossref record is accepted as a match only if its title is at least 90% similar to the CAPES article title.
- 💾 Storing the results: The integrated results are stored for further analysis and decision-making.
- 📊 Exploratory data analysis: A comprehensive analysis explores elements such as titles and authors to support strategic decision-making for bibliographic collections.
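
The Crossref matching step can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the helper names (`normalize`, `title_similarity`, `best_doi_match`, `fetch_candidates`) are hypothetical, and the similarity metric is assumed to be `difflib.SequenceMatcher` from the standard library, since this README does not say which metric the project uses. The sketch relies on the public Crossref `/works` endpoint, which returns candidate records under `message.items`, each with a `DOI` string and a `title` list.

```python
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def normalize(title):
    """Lowercase and collapse whitespace so formatting differences do not affect the score."""
    return " ".join(title.lower().split())


def title_similarity(a, b):
    """Return a similarity ratio in [0, 1] (assumed metric: difflib.SequenceMatcher)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_doi_match(capes_title, candidates, threshold=0.90):
    """Pick the candidate whose title is most similar to the CAPES title.

    Returns (doi, score); doi is None when no candidate reaches the threshold.
    """
    best_doi, best_score = None, 0.0
    for item in candidates:
        titles = item.get("title") or []
        doi = item.get("DOI")
        if not titles or not doi:
            continue  # skip records without a usable title or DOI
        score = title_similarity(capes_title, titles[0])
        if score > best_score:
            best_doi, best_score = doi, score
    if best_score >= threshold:
        return best_doi, best_score
    return None, best_score


def fetch_candidates(capes_title, rows=5):
    """Query Crossref for candidate records (hypothetical helper; needs network access)."""
    query = urllib.parse.urlencode({"query.bibliographic": capes_title, "rows": rows})
    with urllib.request.urlopen(f"{CROSSREF_WORKS_URL}?{query}", timeout=30) as response:
        payload = json.load(response)
    return payload["message"]["items"]
```

With these helpers, `best_doi_match(title, fetch_candidates(title))` would return a DOI only when the closest Crossref title clears the 90% threshold. Keeping the scoring separate from the HTTP call makes the threshold logic easy to check offline with hand-built candidate lists.
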
Contributions to this project are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.