- Q&A : 1 to 1
- Chat : With history in session
- Chunks size
- Vector db
- Model changes, increases context size - would you change chunk size? : tradeoff throughput and cost ofcourse! a very long context, loss of info?
- data-freshness problems
- CONTEXTUAL DATA : formats, loaders,
- EMBEDDINGS : text-embedding-ada-002, cohere, opensource - Sentence Transformers from Huggingface
- CHOICE of DB :
- opensource - Weaviate, Vespa, and Qdrant
- Local vector management libraries - Chroma and Faiss
- OLTP extensions - pgvector (Postgres)
- AWS - S3 with AWS Kendra or AWS OpenSearch
Challenge in
- specific prompt text (domain/task driven)
- few shot examples
- any external api (e.g. web search, maths calculation etc)
- corresponding context (internal organisational data, code base)
- additional logging
- caching, and
- validation of response
- Few shots no less than 12 examples
- Prmpt strategies : chain-of-thought, self-consistency, generated knowledge, tree of thoughts, directional stimulus
- ORCHESTRATIONS :
- Langchain / Llamaindex
- Chioce of Model
- Proprietary Models - e.g. Claude offers fast inference, GPT-3.5-level accuracy
- open source models - API interfaces from Hugging Face and Replicate, LLaMa model from Meta
- Operational tooling
- Cache - redis
- Tools like Weights & Biases and MLflow (ported from traditional machine learning) or PromptLayer and Helicone (purpose-built for LLMs)
- Validate LLM outputs - e.g., Guardrails
- Detect prompt injection attacks - e.g., Rebuff
- Hosting - cloud providers, Vercel
- Steamship provide end-to-end hosting for LLM apps, including orchestration (LangChain), multi-tenant data contexts, async tasks, vector storage, and key management
- Anyscale and Modal allow developers to host models and Python code in one place
Combination of
- reasoning/planning
- tool usage, and
- memory / recursion / self-reflection
- “Chain of Thought” (CoT) (Huang et al., 2022; Wei et al., 2022),
- "Describe, Explain, Plan, and Select” (DEPS) (Wang et al., 2023),
- “Reason+Act” (ReAct) (Yao et al., 2023) and
- self-reflection (Reflexon) (Shinn et al., 2023)
- require the creation of task-specific prompt templates, and
- are limited to a single train of thought within the LLM which precludes the use of multiple attack angles for creative problem solving
Ask the “right questions”
This removes the need for fixed-format task-specific prompts and enables multiple instantiations of LLMs to engage in free-form, self-proposed inquiry, thereby fostering the self-discovery of knowledge.
Questions What vector DB are you using? What is the data structure that you're vectorizing? What is your chunk size? Have you implemented memory? What prompt or technique are you using (ReACT, CoT, few-shot, etc)? Are you only using vector DBs? Do you use sequential chains? Does it need tools? Depending on your data, business case and what output you expect from the agent, there is no one-size-fits-them-all.
Innovations in embedding models ( as we context grows ) rather than choice of vector db, embeddings specific to llm?
Ref :