TalkingStage is a conversational bot that uses machine learning to predict responses to various questions about personal preferences and characteristics. The bot is designed to assist users in gathering and responding to personal information during the "talking stage" of a relationship.
-
Predefined responses for direct keyword matches.
-
Machine learning-based predictions for non-direct keyword matches.
-
Model training and saving functionality.
-
Uses a saved model for making predictions in production.
-
Supports multiple platforms, including macOS and Windows, would support Android and iOS in future updates through API model access or if microsoft makes ML.NET compatible for mobile devices.
YouTube explaination of the brain structure of Talking Stage
We are excited to announce the release of TalkingStageBot v2.0! This new version comes with significant enhancements and improvements to make the chatbot smarter and more efficient. Here’s what’s new:
- Upgraded ML Model: The machine learning model has been upgraded to use the
OneVersusAll
trainer with theLbfgsLogisticRegression
binary classifier. This provides better accuracy and performance for multiclass classification. - Regularization Parameters: Fine-tuned L1 and L2 regularization parameters for optimal model performance.
- L1 Regularization: 0.031347524F
- L2 Regularization: 0.03125F
- Text Normalization: Text data is now preprocessed to ensure all text is in lowercase and trimmed, which helps in reducing the complexity and improving the model's ability to generalize.
- Data Pipeline: Enhanced data pipeline includes text featurization, feature concatenation, and key-to-value mapping for labels.
- Platform Compatibility: Added specific handling for
PlatformNotSupportedException
to ensure smooth operation across different platforms. - Type Initialization: Improved error handling for type initialization issues to make debugging easier and enhance stability.
- Cross-Platform Compatibility: TalkingStageBot is now compatible with multiple platforms, including macOS and Windows.
- Streamlined Training and Saving: The process of training and saving the model has been streamlined with comprehensive debug logs to track the progress and catch errors effectively.
TalkingStageBot.cs
: Main class for the bot functionality.TrainingModel.cs
: Class responsible for training and saving the machine learning model.
-
Clone the repository:
git clone https://github.com/Brainydaps/TalkingStage.git cd TalkingStage
-
Ensure you have .NET SDK installed:
- You can download it from the official Microsoft .NET website.
-
Restore the required packages:
dotnet restore
-
Build the project:
dotnet build
-
Run the project:
dotnet run
- Place your training data in a CSV file named
training_data.csv
in the project directory. - Ensure the CSV file has two columns:
Text
andLabel
.
When you run the project for the first time, the bot will check for a pre-existing model file (model.zip
) in the project directory:
- If the model file is not found, it will train a new model using the provided
training_data.csv
and save it asmodel.zip
. - If the model file is found, it will load the existing model and create a
PredictionEngine
from it.
You can get responses by calling the GetResponse
method with a question string:
var bot = new TalkingStageBot();
var response = bot.GetResponse("What is your favorite color?");
Console.WriteLine(response);
- TalkingStageBot.cs: Contains the bot logic, predefined responses, and model prediction methods.
- TrainingModel.cs: Contains the logic for training the machine learning model and saving it to a file.
- Added logging to confirm paths for the model and training data.
- Text data is featurized using the
FeaturizeText
method, converting text into numerical vectors. - The labels are mapped to keys using
MapValueToKey
.
- The
SdcaMaximumEntropy
trainer from the ML.NET library is used for multiclass classification. - Regularization parameters and the number of iterations were adjusted to improve model performance:
- L2 Regularization: 0.1
- L1 Regularization: 0.01
- Maximum Number of Iterations: 1000
- The trained model is used to create a
PredictionEngine
that takes user input, processes it, and predicts the most appropriate label.
The training data consists of user queries and corresponding labels. The labels are aligned with predefined responses in the bot. The training data was populated by mapping various ways users might ask a question to a specific label. Below is a sample of the training_data.csv
format also found in this repo:
Text,Label
What is your name,name
Where do you live,location
How old are you,age
What is your job,occupation
...
- TalkingStageBot.cs: The core file containing the
TalkingStageBot
class, responsible for initializing the ML context, training the model, and providing responses. - training_data.csv: The CSV file containing the training data used to train the machine learning model.
- Responses Dictionary: A predefined dictionary mapping keywords to responses, used for quick responses to known queries. Before compiling the code, edit the placeholders in the responses dictionary values to your actual information. For example, change "your name" to "My name is Adedapo Adeniran", "your age" to "I am 29 years old, born in October 1994", etc.
- ML Model: The ML.NET model trained using the
SdcaMaximumEntropy
trainer for multiclass classification.
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. For more details, please refer to the LICENSE file.
Contributions are welcome! Please feel free to submit a pull request or open an issue on GitHub.
For any questions or inquiries, please contact Brainydaps via GitHub.