/mun-ching

ML, WebParser, Mental Health, all in one repo (Aug-2022)

Primary language: Jupyter Notebook
License: GNU General Public License v3.0 (GPL-3.0)

Project Run Down

Finetuning GPT-3 to talk the way a therapist or conversational partner would

Steps

Data Collection

  • 8.26.22 Attempted a webscrape to get valuable data
    Failed miserably
  • 8.26.22 Realized I could use GPT-3 itself to create the conversations. They would be just general enough and well written enough
    to pass as real prompts and responses
  • 8.26.22 Used the file JSONLprep.py to turn the text file transcripts.txt into a suitable JSONL file
  • 8.27.22 Created a script to generate therapist sessions with GPT-3 itself
  • 8.27.22 Data is saved and parsed with new methods to comply with the JSONL format
  • 3:14 AM 9/5/2022 The big data has been caught. Alhamdulilah.
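
The transcript-to-JSONL step above could look roughly like the sketch below. This is not the contents of JSONLprep.py; the layout of transcripts.txt is not shown here, so the `Patient:` and `Therapist:` speaker tags and the function names are assumptions made for illustration. Each record uses the `prompt` / `completion` keys that prompt-completion fine-tuning data typically uses, with a leading space on the completion.

```python
import json

def transcript_to_records(text, user_tag="Patient:", bot_tag="Therapist:"):
    """Pair each user line with the therapist line that follows it.

    The speaker tags are assumed; adjust them to match the real transcript.
    """
    records = []
    pending_prompt = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(user_tag):
            pending_prompt = line[len(user_tag):].strip()
        elif line.startswith(bot_tag) and pending_prompt is not None:
            completion = line[len(bot_tag):].strip()
            # Leading space on the completion is a common convention
            records.append({"prompt": pending_prompt, "completion": " " + completion})
            pending_prompt = None
    return records

def to_jsonl(records):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)

# Made-up sample text, standing in for transcripts.txt
sample = (
    "Patient: I feel tired.\n"
    "Therapist: Tell me more about that.\n"
    "\n"
    "Patient: Work is hard.\n"
    "Therapist: What makes it hard?"
)
records = transcript_to_records(sample)
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

Lines that match neither tag are skipped, so blank lines and session headers in the transcript do not end up in the training file.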

Fine tuning

  • 8.27.22 Opened the project folder in WSL, set my OPENAI_API_KEY environment variable, then ran the finetuning script
  • 8.28.22 trained once in the beginning, trained again with many more examples
  • 3:14 AM 9/5/2022 Now that the BEEG data has been caught, new finetuning is in order. But the question is: is finetuning really the way to go about this? Prompt engineering is already such a powerful tool. Further inquiry and research are required.
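
Before kicking off a finetuning run, a small pre-flight check can catch the two easy mistakes: a missing OPENAI_API_KEY and malformed training lines. The sketch below is only an illustration and is not the finetuning script from this repo; the `preflight` function name and the required `prompt` / `completion` keys are assumptions.

```python
import json
import os

REQUIRED_KEYS = {"prompt", "completion"}  # assumed record layout

def preflight(jsonl_text, env=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: not a JSON object")
            continue
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems.append(f"line {number}: missing {sorted(missing)}")
    return problems

# Example with a fake environment mapping, so no real key is needed
good = '{"prompt": "I feel tired.", "completion": " Tell me more."}'
print(preflight(good, env={"OPENAI_API_KEY": "placeholder"}))
```

Passing `env` as a parameter keeps the check testable without touching the real environment.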

Model Access

  • 8.28.22 Currently, the model is accessed through a rudimentary script
  • 8.29.22 The model is now accessible through a bare tkinter GUI in the tkGUI branch

Adding model memory

  • 2:53 PM 8/30/2022 When you talk to most people, they tend to remember the last thing you said.
    As such, I am adding memory to this model by way of contextual prompting.
  • 3:14 AM 9/5/2022 Memory has been added by way of prompt engineering and human ingenuity.
  • 12:01 PM 9/8/2022 Memory and prompt have been optimized. Time for webapp
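
Memory by contextual prompting can be sketched as a rolling buffer of recent turns that gets prepended to every new prompt. The class below is a minimal illustration under assumptions, not the code in this repo: the `ConversationMemory` name, the `User:` / `Listener:` labels, and the preamble text are all made up for the example.

```python
from collections import deque

class ConversationMemory:
    """Keep the last few exchanges and replay them as prompt context."""

    def __init__(self, max_turns=3,
                 preamble="The following is a conversation with a supportive listener."):
        self.preamble = preamble
        # deque with maxlen drops the oldest turn automatically when full
        self.turns = deque(maxlen=max_turns)

    def remember(self, user_text, reply_text):
        self.turns.append((user_text, reply_text))

    def build_prompt(self, new_user_text):
        lines = [self.preamble, ""]
        for user_text, reply_text in self.turns:
            lines.append(f"User: {user_text}")
            lines.append(f"Listener: {reply_text}")
        lines.append(f"User: {new_user_text}")
        lines.append("Listener:")  # the model continues from here
        return "\n".join(lines)

memory = ConversationMemory(max_turns=2)
memory.remember("Hi", "Hello, how are you feeling?")
memory.remember("A bit stressed", "What is weighing on you?")
memory.remember("Exams", "That sounds like a lot of pressure.")
prompt = memory.build_prompt("I can't sleep")
print(prompt)
```

With `max_turns=2` the first exchange has already been dropped by the time the prompt is built, which keeps the prompt length bounded as the conversation grows.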

Future Ideas

from 12:01 PM 9/8/2022

  • Make webapp
  • Figure out a way to finetune and move past full prompt engineering

from 2:29 AM 9/10/2022

  • Create a conversation search function to optimize memory and return only the relevant lines of conversation for summary and context.
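
One possible shape for the conversation search idea is plain keyword overlap: score every past line by how many words it shares with the new message and keep the top few. This is only a sketch of the idea, not something implemented in this repo; the `search_conversation` name and the scoring rule are assumptions, and a real version would likely want stop-word filtering or embeddings.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def search_conversation(history, query, top_k=2):
    """Return up to top_k past lines sharing the most words with the query.

    Results come back in their original conversation order.
    """
    query_words = tokenize(query)
    scored = []
    for index, line in enumerate(history):
        overlap = len(query_words & tokenize(line))
        if overlap:
            scored.append((overlap, index))
    # Highest overlap first; earlier lines win ties
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:top_k]
    return [history[index] for _, index in sorted(best, key=lambda pair: pair[1])]

history = [
    "User: My sister and I had a fight.",
    "Listener: What was the fight about?",
    "User: I have an exam next week.",
    "Listener: How is studying going?",
]
relevant = search_conversation(history, "I keep thinking about the fight with my sister")
print(relevant)
```

Only the lines about the fight are returned here, so the exam exchange would stay out of the summary and context for this message.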