Learning LLM with LangChain, OpenAI and spaCy

Learning LangChain

(Work in progress)

OpenAI API Usage

https://platform.openai.com/organization/usage

My bible

Marketing 5.0: Technology for Humanity

As "Marketing 5.0" says, NLP (with LLM) is a core part of Data Driven Marketing.

Background and Motivation

Over the past six months, I have been learning NLP with spaCy and SQLite, and I now use these NLP skills in my marketing work.

Project Goal

I am a fan of SQLite, so in this project I study how to use SQLite as part of RAG.

My final goal is to build a Data Driven Marketing framework with NLP and LLMs. The framework will not be included in this project.

Code

I want to be able to use LangChain even in environments that provide their own LLM API service, for example when using an LLM API offered internally within a company.

The introductory book I bought does not match the latest LangChain API specification, so I updated its code to match the latest specification.

Test LangChain's RAG capabilities with OpenAI.

I conclude that the document sources are much more important than RAG.

Test spaCy's built-in embedding capabilities.

I conclude that the built-in embedding capabilities are not useful in my work.

Use ChromaDB for keyphrase similarity search with textacy. This code uses neither LangChain nor OpenAI.

I conclude that Sentence Transformers are useful in my work.

Since the APIs change frequently, I started learning LangChain on this site in Aug 2024.

OpenAI API

https://platform.openai.com

Embeddings

I test OpenAI, spaCy and Sentence Transformers to generate embeddings.

Note: spaCy's "en_core_web_lg" and "ja_core_news_lg" seem to output 300-dimensional embeddings. On the other hand, "en_core_web_trf" does not seem to support embeddings because of an interoperability problem with the other packages.

VectorDB

Chroma

I use Chroma and my original GraphDB to achieve my goal for Data Driven Marketing.

I am also interested in sqlite-vec, which is more suitable for my goal. sqlite-vec is still in an alpha version, so I use Chroma for the time being.
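The idea of keeping vectors inside SQLite can be sketched with the standard library alone. This is only an illustration under stated assumptions: it does not use sqlite-vec or its API, the `chunks` table and the helper names are hypothetical, and the three-dimensional toy vectors stand in for real embeddings from OpenAI or Sentence Transformers. The search is a brute-force scan, which is only reasonable for small tables.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Deserialize a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_store(rows):
    """Create an in-memory table of (text, embedding) rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE chunks ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding BLOB NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
        [(text, pack(vec)) for text, vec in rows],
    )
    return conn


def search(conn, query_vec, k=2):
    """Return the k texts whose stored vectors are most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, unpack(blob)), text)
        for text, blob in conn.execute("SELECT text, embedding FROM chunks")
    ]
    scored.sort(reverse=True)
    return [text for _score, text in scored[:k]]


# Toy data: made-up vectors, not real embeddings.
store = build_store([
    ("pricing survey", [1.0, 0.0, 0.0]),
    ("brand guideline", [0.0, 1.0, 0.0]),
    ("price list", [0.9, 0.1, 0.0]),
])
print(search(store, [1.0, 0.0, 0.0], k=2))
```

In a RAG setup, the texts returned by `search` would be the context passed to the LLM. An extension such as sqlite-vec would replace the Python-side scan with a query inside SQLite.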

My original GraphDB (private project)

In the real world, a SQL DB needs to coexist with a GraphDB and a VectorDB to meet various demands from marketing teams.

I have already developed a GraphDB with SQLite and networkx on my own:

  • My original schema to store graph entities (nodes).
  • My original SQL to dynamically generate triplets on a certain condition (i.e., edges between nodes with dependencies).
  • Graph algorithms run on the generated network to extract a subgraph.
     Network Graph A    Network Graph C
               |           |   <- - - Connect networks where similarity distance is smaller than the threshold
              Network Graph B
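The connection rule in the diagram above can be sketched as follows. This is a minimal illustration, not the code of the private GraphDB: it assumes each network is summarized by a single embedding vector and that "similarity distance" means cosine distance. The function names, the toy vectors and the threshold value are all hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def link_networks(embeddings, threshold):
    """Return pairs of network names whose distance is below the threshold."""
    links = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        if cosine_distance(vec_a, vec_b) < threshold:
            links.append((name_a, name_b))
    return links


# Toy vectors chosen so that A and C are each close to B but far from each other.
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.8, 0.6],
    "C": [0.0, 1.0],
}
print(link_networks(embeddings, threshold=0.5))
```

With these toy vectors, A and C are not linked directly but both are linked to B, which matches the shape of the diagram.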

          Database stack

[            NetworkX            ]  ==> Graph theory for knowledge graph
[           Shim Layer           ]  ==> Dynamic knowledge graph generation
[SQLite database][Chroma database]  ==> SQL and Semantic Search
[            SQLite3             ]  ==> Base
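The three steps listed earlier (a node schema, SQL that generates triplets on a condition, and a graph step that extracts a subgraph) can be sketched as below. This is only an illustration under assumptions: the `nodes` table, the `shares_topic` predicate and the sample rows are made up, and the real schema is not shown in this project. To keep the sketch free of dependencies, a plain breadth-first search stands in for networkx.

```python
import sqlite3
from collections import deque


def build_demo_db():
    """Create an in-memory node table (hypothetical schema and sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (
            id    INTEGER PRIMARY KEY,
            label TEXT NOT NULL,
            topic TEXT NOT NULL
        );
        INSERT INTO nodes (id, label, topic) VALUES
            (1, 'campaign A', 'pricing'),
            (2, 'survey B',   'pricing'),
            (3, 'blog C',     'branding'),
            (4, 'ad D',       'branding'),
            (5, 'memo E',     'logistics');
    """)
    return conn


def generate_triplets(conn):
    """Derive edges at query time: two nodes are linked when they share a topic.

    Edges are not stored; the self-join produces (subject, predicate, object).
    """
    rows = conn.execute("""
        SELECT a.id, 'shares_topic', b.id
        FROM nodes AS a
        JOIN nodes AS b ON a.topic = b.topic AND a.id < b.id
        ORDER BY a.id, b.id
    """)
    return rows.fetchall()


def subgraph_from(start, triplets):
    """Return the set of node ids reachable from start (undirected BFS)."""
    adjacency = {}
    for subject, _predicate, obj in triplets:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


conn = build_demo_db()
triplets = generate_triplets(conn)
print(triplets)
print(subgraph_from(1, triplets))
```

Because the edges come from a query rather than a stored table, changing the join condition changes the graph without touching the node data. With networkx, the triplets could be loaded into a graph object instead of the adjacency dictionary used here.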

The GraphDB is not included in this project.
