🚀 In Progress 🚀
Please note that this project is still under active development. Check back soon for updates and new features! 🔄📈
This project develops a model to analyze and classify sentiments from various sources, including patient reviews, social media posts, and electronic health records. The goal is to categorize sentiments as positive, negative, or neutral.
Key objectives include:
- Processing Diverse Inputs: Handling medical jargon, slang, and informal language.
- Using Advanced Techniques: Applying Natural Language Processing (NLP) and Machine Learning (ML) to improve sentiment analysis in healthcare contexts.
The project aims to evaluate how effectively sentiment analysis can manage complex and varied linguistic inputs in the medical field.
- Python 3.x
- pip (Python package installer)
- Clone the repository:
  git clone https://github.com/RiyaaChauhan/Sentimental-Analysis-in-Healthcare.git
  cd Sentimental-Analysis-in-Healthcare
- Create a virtual environment:
  python -m venv venv
- Activate the virtual environment:
  - Windows:
    venv\Scripts\activate
  - macOS/Linux:
    source venv/bin/activate
- Install the required dependencies listed in requirements.txt:
  pip install -r requirements.txt
- After installing dependencies, run the script:
  python sentiment.py
- Follow the instructions in the script or consult the documentation for specific usage details.
- dataset_info.json:
  - Number of Rows: 5,000
  - Number of Columns: 2
  - Column Names: 'Tweet', 'Sarcasm (yes/no)'
  - Description: This dataset features tweets labeled for sarcasm. Each tweet is accompanied by a label ('yes' or 'no') indicating whether the tweet is sarcastic.
- sentiment_dataset.csv:
- McGill-NLP:
  - Columns:
    - abstract_id: A unique identifier for each abstract.
    - text: The main content or text of the abstract.
    - location: The location or entity associated with the abstract.
    - label: A label or classification associated with the abstract.
  - https://huggingface.co/datasets/McGill-NLP/medal

## Files
- data/training_data.csv: Contains the training data.
- data/testing_data.csv: Contains the testing data.
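As a rough illustration of how the sarcasm-labeled tweets described above could be read, here is a minimal sketch using only the standard library. It assumes the tweets are available as CSV text with the two column names listed earlier; the function name `load_sarcasm_rows` and the sample rows are hypothetical and are not taken from the repository or the real dataset.

```python
import csv
import io

# Column names as given in the dataset description.
TEXT_COLUMN = "Tweet"
LABEL_COLUMN = "Sarcasm (yes/no)"


def load_sarcasm_rows(handle):
    """Read rows from a CSV file handle and return (tweet, is_sarcastic) pairs."""
    rows = []
    for record in csv.DictReader(handle):
        label = (record[LABEL_COLUMN] or "").strip().lower()
        if label not in {"yes", "no"}:
            # Skip rows whose label is missing or malformed.
            continue
        rows.append((record[TEXT_COLUMN].strip(), label == "yes"))
    return rows


# Made-up sample rows for illustration only; not real dataset content.
SAMPLE_CSV = """Tweet,Sarcasm (yes/no)
"Oh great, another three-hour wait at the clinic.",yes
The nurse explained my medication clearly.,no
No label on this one,
"""

sample_rows = load_sarcasm_rows(io.StringIO(SAMPLE_CSV))
```

To read a real file instead, the same function could be called with `open(path, newline="", encoding="utf-8")`, where `path` points at wherever the tweets are actually stored.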
The model is trained on a diverse dataset, including:
- Patient Reviews: Text data from patient feedback across various healthcare platforms.
- Social Media Posts: Comments and posts related to health and wellness.
- Electronic Health Records: Structured data from patient health records, converted into textual descriptions.
The data includes various sentiment categories to enhance the model's ability to handle different types of input.
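The conversion of structured health-record fields into text, mentioned above, might look something like the following sketch. The field names and the helper `record_to_text` are hypothetical; the repository may use a different schema and a different rendering strategy.

```python
def record_to_text(record):
    """Render a flat dict of structured fields as one descriptive line of text."""
    parts = []
    for field, value in record.items():
        if value is None or value == "":
            # Skip empty fields rather than emitting "None" into the text.
            continue
        label = field.replace("_", " ")
        parts.append(f"{label}: {value}")
    if not parts:
        return ""
    return "; ".join(parts) + "."


# Hypothetical record, invented for illustration.
example_record = {
    "age": 54,
    "diagnosis": "type 2 diabetes",
    "patient_note": "feeling much better after the dosage change",
    "follow_up": None,
}

example_text = record_to_text(example_record)
```

The resulting string can then be passed through the same text pipeline as reviews and social media posts.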
The model is evaluated using:
- Validation Set: A subset of the training data, kept aside for tuning model parameters and preventing overfitting.
- Test Set: A separate dataset that was not used during training, ensuring an unbiased assessment of the model's performance.
These datasets are designed to reflect real-world scenarios and help assess the model's accuracy in classifying sentiments across different contexts.
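The evaluation setup above can be sketched as follows: a validation subset is carved out of the training data, and accuracy is computed against held-out labels. This is a minimal standard-library sketch; the helper names `hold_out_validation` and `accuracy`, the default fraction, and the seed are assumptions, not the project's actual code.

```python
import random


def hold_out_validation(training_examples, val_fraction=0.2, seed=0):
    """Shuffle the training data and set aside a fraction for validation."""
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(training_examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


train_part, val_part = hold_out_validation(range(20), val_fraction=0.25)
```

The test set stays untouched by this split; it would be loaded separately (for example from data/testing_data.csv) and scored once with `accuracy` after tuning is finished.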
- Sentiment Classification: Categorizes sentiments into positive, negative, or neutral.
- Diverse Input Handling: Processes and analyzes data from patient reviews, social media posts, and electronic health records.
- Medical Jargon Recognition: Identifies and understands medical terminology and jargon.
- Slang and Informal Language Processing: Handles slang and informal language commonly found in social media and patient feedback.
- NLP and ML Integration: Utilizes Natural Language Processing (NLP) and Machine Learning (ML) techniques for accurate sentiment analysis.
- Real-time Analysis: Provides real-time insights and feedback based on analyzed data.
- Customizable Models: Allows for adjustments and improvements based on specific healthcare needs and data types.
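To make the three-way classification and slang handling concrete, here is a deliberately simple lexicon-based baseline. It is not the project's model: the word lists, the slang map, and the function names `normalize` and `classify` are all hypothetical, and a trained ML model would replace the counting step.

```python
import re

# Tiny illustrative vocabularies; a real system would use far larger resources.
SLANG = {"meds": "medication", "doc": "doctor", "gr8": "great"}
POSITIVE = {"great", "helpful", "better", "kind", "relieved"}
NEGATIVE = {"worse", "rude", "painful", "terrible", "nauseous"}


def normalize(text):
    """Lowercase the text, split it into tokens, and expand known slang."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [SLANG.get(token, token) for token in tokens]


def classify(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = normalize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `classify("The doc was gr8 and very helpful")` counts two positive hits after slang expansion, while a sentence with no lexicon words falls through to neutral. A baseline like this is mainly useful as a sanity check against which a learned model can be compared.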
We welcome contributions from the community to enhance the functionality and quality of this project. To contribute, please follow these guidelines:
- Fork the Repository: Create your own fork of the repository to work on your changes.
- Create a Feature Branch: Develop your changes in a separate branch. Use a descriptive branch name that indicates the purpose of the changes.
  git checkout -b feature-branch-name
- Make your changes.
- Commit your changes:
  git commit -m 'Add new feature'
- Push to the branch:
  git push origin feature-branch-name
- Open a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.