myGPTReader

myGPTReader is a Slack bot that can read any web page, ebook, or document and summarize it with ChatGPT. It can also talk to you by voice, using the content in the channel.

Primary language: Python. License: MIT.

It is still in development, but you can try it out by joining this channel.

The exciting part is that this project is itself developed in pairing with ChatGPT. I document the development process in this CDDR file.

Features

  • Integrated with Slack as a bot
    • The bot replies to messages in the same thread
  • Support web page reading with ChatGPT
  • Support RSS reading with ChatGPT
    • An RSS feed is a list of links, so reading it is equivalent to reading each linked web page.
  • Support newsletter reading with ChatGPT
    • Most newsletters are public and can be accessed online, so we can just give the URL to the Slack bot.
  • Prompt fine-tuning
    • Support custom prompts
    • Show prompt templates via Slack app slash commands
    • Automatically collect good prompts in the #gpt-prompt channel via a message shortcut
  • Cost saving
    • Cache the llama index of each web page
      • Consider using sqlite-vss to store and search the text embeddings
      • Use chromadb to store and search the text embeddings
      • Use the llama index file to restore the index
    • Consider using sentence-transformers or txtai to generate embeddings (vectors)
      • The results were not as good as OpenAI's embeddings, so this was rolled back to the OpenAI embeddings. Enabling custom embeddings also requires a server with at least 2 GB of memory, which still increases the cost.
    • Consider fine-tuning the chunk size of the index nodes and the prompt to save cost
      • If the chunk size is too big, the index nodes become too large and the cost is high.
  • The bot can read earlier messages in the same thread and pass them to ChatGPT as context
  • Index fine-tuning
    • Use the GPTListIndex to summarize multiple URLs
    • Use the GPTTreeIndex with summarize mode to summarize a single web page
  • The bot regularly posts summaries of hot news (expensive) to the Slack channel #daily-news
    • Refer to this approach
      • World News
        • Zhihu daily hot answers
        • V2EX daily hot topics
        • 1point3acres daily hot topics
        • Reddit world hot news
      • Dev News
        • Hacker News daily hot topics
        • Product Hunt daily hot topics
      • Invest News
        • Xueqiu daily hot topics
        • Jisilu daily hot topics
  • Support file reading and analysis 💥
    • Because of the high cost, access to this feature needs to be restricted by a whitelist of Slack user IDs
    • Need to cache the extracted file Documents to save on extraction cost
    • EPUB
    • DOCX
    • MD
    • TEXT
    • PDF
    • Image
      • May use GPT-4
  • Support voice reading with self-hosted Whisper
    • Pipeline (Whisper -> ChatGPT -> Azure text-to-speech) for language speaking practice 💥
    • Supported languages
      • Chinese
      • English
        • 🇺🇸
        • 🇬🇧
        • 🇦🇺
        • 🇮🇳
      • Japanese
      • German
  • Integrated with Azure OpenAI Service
  • User access limit
    • Limit the number of requests to the bot per user per day to save cost
  • Support a Discord bot ❓
  • Rewrite the code in TypeScript ❓
  • Upgrade the chat model from gpt-3.5-turbo to GPT-4 (gpt-4-0314) 💥
  • Documentation
  • Publish the bot so that it can be used in other workspaces
    • Slack marketplace
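
The RSS feature treats a feed as a list of links to web pages. A minimal sketch of that idea, using only the Python standard library, might look like the following. The function name `extract_rss_links` and the sample feed are illustrative assumptions, not code from this project, and the sketch only handles plain RSS 2.0 `<item><link>` elements.

```python
import xml.etree.ElementTree as ET


def extract_rss_links(rss_xml: str) -> list[str]:
    """Return the <link> URL of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <item><title>First</title><link>https://example.com/a</link></item>
    <item><title>Second</title><link>https://example.com/b</link></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for url in extract_rss_links(SAMPLE_FEED):
        print(url)
```

Each returned URL could then be handed to the same code path that reads a single web page.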
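
The cost-saving item about caching the index of each web page can be illustrated with a small on-disk cache keyed by a hash of the URL. This is a sketch under stated assumptions: `build_index` is a hypothetical callable standing in for whatever actually builds the index, and the index is assumed to be JSON-serializable. It is not the project's real storage layer.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_path_for(url: str, cache_dir: Path) -> Path:
    """Map a URL to a stable file name inside cache_dir."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def get_or_build_index(
    url: str,
    cache_dir: Path,
    build_index: Callable[[str], dict],  # hypothetical index builder
) -> dict:
    """Load the cached index for url, or build and store it on a miss."""
    path = cache_path_for(url, cache_dir)
    if path.exists():
        # Cache hit: no embedding or LLM call is needed.
        return json.loads(path.read_text(encoding="utf-8"))
    index = build_index(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(index), encoding="utf-8")
    return index
```

With this shape, the expensive builder runs once per URL, and later requests for the same page read the stored file instead.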
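
The note on chunk size can be made concrete with a simple splitter. This sketch counts whitespace-separated words as a rough stand-in for tokens, which is an assumption made for illustration. The project's actual chunking is done by its index library and is not shown here.

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most chunk_size words each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


if __name__ == "__main__":
    sample = "one two three four five"
    # A larger chunk_size yields fewer but bigger chunks, so every chunk
    # sent to the model carries more text.
    print(split_into_chunks(sample, 2))
    print(split_into_chunks(sample, 4))
```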
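
For the thread-context feature, one possible way to turn earlier thread messages into chat context is sketched below. It assumes each Slack message is a dict with `user` and `text` keys and that the bot's own user ID is known. The function name and the `max_messages` cap are hypothetical. The output uses the `role`/`content` message shape of the OpenAI chat API.

```python
def thread_to_chat_messages(
    thread_messages: list[dict],
    bot_user_id: str,
    max_messages: int = 10,
) -> list[dict]:
    """Convert the most recent thread messages into chat-style messages."""
    if max_messages <= 0:
        return []
    chat = []
    # Keep only the tail of the thread so the prompt stays small.
    for message in thread_messages[-max_messages:]:
        text = (message.get("text") or "").strip()
        if not text:
            continue  # skip empty messages
        # Messages written by the bot become "assistant" turns.
        role = "assistant" if message.get("user") == bot_user_id else "user"
        chat.append({"role": role, "content": text})
    return chat
```

Capping the number of messages is one simple way to bound the prompt size, which ties back to the cost-saving goal.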
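
The two access controls in the list (a whitelist of Slack user IDs for file reading, and a per-user daily request limit) could be sketched as follows. The class and function names are hypothetical, and the counter lives in memory only, so it resets when the process restarts. A real deployment would likely need persistent storage.

```python
from datetime import date
from typing import Optional


def is_whitelisted(user_id: str, whitelist: set) -> bool:
    """Return True if the Slack user ID may use a restricted feature."""
    return user_id in whitelist


class DailyRequestLimiter:
    """Allow at most max_per_day requests per user per calendar day."""

    def __init__(self, max_per_day: int):
        self.max_per_day = max_per_day
        # user_id -> (day of last request, request count on that day)
        self._usage = {}

    def allow(self, user_id: str, today: Optional[date] = None) -> bool:
        today = today or date.today()
        day, count = self._usage.get(user_id, (today, 0))
        if day != today:
            count = 0  # a new day starts a fresh quota
        if count >= self.max_per_day:
            return False
        self._usage[user_id] = (today, count + 1)
        return True
```

Passing `today` explicitly makes the limiter easy to exercise in tests without depending on the system clock.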