A chatbot is a computer program or artificial intelligence (AI) application designed to simulate human conversation through text or speech. It is also commonly referred to as a conversational agent or virtual assistant.
Chatbots are programmed to understand and respond to user queries and provide relevant information or assistance. They can be integrated into various platforms such as websites, messaging apps, or voice assistants, allowing users to engage in conversations and receive automated responses.
There are different types of chatbots, ranging from rule-based chatbots to more advanced AI-powered chatbots. Rule-based chatbots follow a predefined set of rules and responses, while AI-powered chatbots use natural language processing (NLP) and machine learning techniques to understand and generate more contextually appropriate responses.
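To make the distinction concrete, a rule-based chatbot can be as simple as a keyword-to-response lookup. The sketch below is purely illustrative; the keywords and replies are placeholders, not part of this project:

```python
# A minimal rule-based chatbot: each rule maps a keyword to a canned
# response. The keywords and replies here are illustrative only.
RULES = {
    "hello": "Hi! How can I help you?",
    "price": "You can find pricing details on the program page.",
    "bye": "Goodbye! Feel free to ask again anytime.",
}

def rule_based_reply(message: str) -> str:
    text = message.lower()
    for keyword, response in RULES.items():
        if keyword in text:
            return response
    return "Sorry, I don't understand. Could you rephrase?"

print(rule_based_reply("Hello there"))  # -> Hi! How can I help you?
```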
Chatbots can be used for a variety of purposes, such as customer support, information retrieval, task automation, and entertainment. They can improve efficiency by handling repetitive or common queries, provide 24/7 availability, and enhance user experiences by delivering personalized interactions.
The necessary data was collected from the education paths on the Miuul website and from the FAQ sections of the bootcamps on the VBO website. Question-answer pairs were gathered by web scraping with the BeautifulSoup library. However, the scraped data alone was not enough to train a chatbot, so we needed more question-answer pairs. For this purpose, we added some questions and answers manually, then saved the result as an .xlsx file.
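A minimal sketch of that scraping step is shown below. The URL, the CSS selectors, and the sample manual pair are placeholders, not the real Miuul/VBO page structure, which has to be inspected in the browser first:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Placeholder URL; the real education-path and FAQ pages differ.
URL = "https://example.com/bootcamp-faq"

response = requests.get(URL, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Assume each FAQ entry is a <div class="faq-item"> holding the
# question in an <h3> and the answer in a <p>.
qa_pairs = []
for item in soup.find_all("div", class_="faq-item"):
    question = item.find("h3").get_text(strip=True)
    answer = item.find("p").get_text(strip=True)
    qa_pairs.append({"question": question, "answer": answer})

# Merge with the manually written pairs and save everything as .xlsx.
manual_pairs = [{"question": "Sample manual question?",
                 "answer": "Sample manual answer."}]
pd.DataFrame(qa_pairs + manual_pairs).to_excel("qa_dataset.xlsx", index=False)
```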
This dataset covers subjects such as the bootcamp name, program location, price, duration, start and end dates, description, curriculum, requirements, etc.
The data was then converted from the .xlsx file to a JSON file so that the chatbot can return the answer to the question the user asks.
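One way to do this conversion with pandas, assuming hypothetical file and column names ("question" and "answer"):

```python
import json
import pandas as pd

# Assumed column names: "question" and "answer".
df = pd.read_excel("qa_dataset.xlsx")

# Key the answers by question so the bot can look up a reply directly.
qa_dict = dict(zip(df["question"], df["answer"]))

with open("qa_dataset.json", "w", encoding="utf-8") as f:
    json.dump(qa_dict, f, ensure_ascii=False, indent=2)
```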
First, we removed HTML tags from the data. Then we converted all letters to lowercase. After that, we removed punctuation marks and stopwords (words that add little meaning to a sentence) from the dataset.
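A sketch of that cleaning pipeline, using NLTK's English stopword list for illustration (the actual dataset may call for a different stopword list):

```python
import re
import string
from nltk.corpus import stopwords  # needs nltk.download("stopwords") once

STOPWORDS = set(stopwords.words("english"))

def clean_text(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)  # delete HTML tags
    text = text.lower()                   # lowercase everything
    # remove punctuation marks
    text = text.translate(str.maketrans("", "", string.punctuation))
    # drop stopwords
    return " ".join(w for w in text.split() if w not in STOPWORDS)

print(clean_text("<p>What is the price of the bootcamp?</p>"))
# -> "price bootcamp"
```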
Finally, we tokenized the data and lemmatized the tokens, reducing each word to its dictionary form. We also generated n-grams, which are sequences of n consecutive tokens, to capture combinations of words that are used together according to the n value we chose.
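A minimal sketch of the lemmatization and n-gram steps with NLTK (bigrams are shown here; the n value and the example sentence are assumptions):

```python
from nltk import word_tokenize           # needs nltk.download("punkt")
from nltk.stem import WordNetLemmatizer  # needs nltk.download("wordnet")
from nltk.util import ngrams

lemmatizer = WordNetLemmatizer()

text = "questions about the programs"
tokens = [lemmatizer.lemmatize(t) for t in word_tokenize(text)]
# -> ['question', 'about', 'the', 'program']

# n = 2: consecutive pairs of tokens (bigrams)
bigrams = list(ngrams(tokens, 2))
# -> [('question', 'about'), ('about', 'the'), ('the', 'program')]
```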