Implementing this architecture and learning about AI in the process. I did not finish the frontend, as I had already got what I wanted from this exercise.
- Download the `faiss-node` release packaged for AWS from here. Look for the x86 architecture.
- Put the `nodejs` folder from the package into `layers/faiss-node`.
- Bootstrap the backend.
- Deploy the backend.
- The middy library does not work super well with TypeScript. The basic support is there, but the types are not correctly augmented, especially when performing input/output validation.
- I could not find any generic middleware engine for TypeScript.
  - Maybe it does not exist because it is very hard to write one in a generic manner?
- I had to use the `@middy/util` package to have the AWS Lambda function output correct headers when I threw an error in the middleware chain.
  - Using packages like `http-errors` or similar did not work. The `content-type` was always `text/plain`.
- The order of middlewares matters, a lot.
  - I had an issue where the request/response validator tried to validate a string instead of an object.
  - This was because the response parser had not run yet.
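The validator-before-parser issue can be illustrated with a hand-rolled, toy middleware engine. None of this is middy's actual API — the engine and both middlewares are made up for illustration:

```typescript
// Toy middleware engine, made up for illustration (this is not middy's API).
type Request = { body: unknown };
type Middleware = { before: (request: Request) => void };

const applyMiddlewares = (middlewares: Middleware[], request: Request): Request => {
  // Each `before` hook runs in the order the middlewares were attached.
  for (const middleware of middlewares) {
    middleware.before(request);
  }
  return request;
};

// Parser: turns a raw JSON string body into an object.
const jsonBodyParser: Middleware = {
  before: (request) => {
    if (typeof request.body === "string") {
      request.body = JSON.parse(request.body);
    }
  },
};

// Validator: expects the body to already be an object.
const bodyValidator: Middleware = {
  before: (request) => {
    if (typeof request.body !== "object" || request.body === null) {
      throw new Error(`Expected an object body, got ${typeof request.body}`);
    }
  },
};

// Correct order: parse first, then validate.
const parsed = applyMiddlewares([jsonBodyParser, bodyValidator], { body: '{"name":"cat"}' });

// Wrong order: the validator sees the raw string and throws.
// applyMiddlewares([bodyValidator, jsonBodyParser], { body: '{"name":"cat"}' });
```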
- Just like when hitting any other API, you must configure CORS for the S3 bucket you want to upload to.
  - Of course, this only needs to be done if CORS applies, for example when uploading files from the browser.
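For reference, an S3 CORS configuration for browser uploads might look like the sketch below. The origin and exposed headers are placeholders — adjust them to your setup:

```json
[
  {
    "AllowedOrigins": ["https://my-app.example.com"],
    "AllowedMethods": ["PUT", "POST"],
    "AllowedHeaders": ["*"],
    "ExposeHeaders": ["ETag"]
  }
]
```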
- An LLM is an instance of a foundation model which applies to text and text-related tasks.
- LLMs are super flexible. While this makes them relevant to almost every field or problem, it also makes them less accurate within a specific problem domain.
- One way to improve the output of an LLM is to fine-tune your prompts. A good heuristic is to first give the LLM a description of the task, and then a couple of examples which you know are valid.
- Supplying the examples manually is called "hard prompting". There is also a way for the model to generate this prompt itself; in that case, it is called "soft prompting".
- Since the benefit of augmenting the user prompt is quite large, you probably do not want to pass the user input "as is" into the LLM. You most likely want to use a prompt template augmented with the user input.
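A minimal sketch of such a prompt template, with a hand-written task description and examples ("hard prompting"). The sentiment task is a made-up illustration:

```typescript
// Hypothetical prompt template: task description first, then hand-picked
// ("hard prompted") examples, then the user input spliced in at the end.
const buildPrompt = (userInput: string): string =>
  [
    "Classify the sentiment of the review as positive or negative.",
    "",
    'Review: "I loved this product." -> positive',
    'Review: "It broke after one day." -> negative',
    "",
    `Review: "${userInput}" ->`,
  ].join("\n");

const prompt = buildPrompt("Great battery life.");
```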
- Retrieval Augmented Generation (RAG) means adding an additional set of data to the LLM's "knowledge base". Here is a great video about this topic.
- A good example would be asking the LLM about the planet with the most moons. The data the LLM has might be outdated, and as such it might give you a wrong answer. Now, if you augment the data the LLM has with sources from, let us say, NASA, the LLM would be able to give a correct answer.
  - The LLM would first ask the "content store" for the answer. If the answer is there, the LLM uses that as a data source. Otherwise, it uses the knowledge it already has.
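The "content store" idea can be sketched with a toy keyword-based retriever. A real system would use embeddings and a vector index; the documents and scoring here are made up:

```typescript
// Made-up document store; a real RAG setup would use embeddings and a vector index.
const documents = [
  "NASA: Saturn has 146 confirmed moons, the most of any planet.",
  "Jupiter has 95 confirmed moons.",
];

// Toy retriever: scores documents by how many query words they contain.
const retrieve = (query: string, docs: string[]): string => {
  const words = query.toLowerCase().replace(/[?.,]/g, "").split(/\s+/);
  const scored = docs.map((doc) => ({
    doc,
    score: words.filter((word) => doc.toLowerCase().includes(word)).length,
  }));
  scored.sort((a, b) => b.score - a.score);
  return scored[0].doc;
};

const question = "Which planet has the most moons?";
const context = retrieve(question, documents);

// The retrieved context is injected into the prompt, so the model answers
// from the fresh data instead of its possibly outdated training data.
const augmentedPrompt = `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
```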
- In the context of AI, word embeddings are representations of words as arrays of numbers called vectors.
- You might think of embeddings as "classifications". The model will classify some word to a given set of numbers.
- The numbers in the vector represent how similar each word is to other words. For example, the vector for "I took my cat for a walk" would be numerically similar to the vector for "I took my dog for a walk".
- The embeddings are then fed into some kind of similarity search engine which the LLM uses to retrieve the final answer.
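Similarity between vectors is typically measured with cosine similarity. A sketch with made-up three-dimensional "embeddings" (real ones have hundreds of dimensions):

```typescript
// Cosine similarity: 1 means the vectors point the same way, 0 means unrelated.
const cosineSimilarity = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
};

// Made-up 3-dimensional "embeddings" for illustration only.
const catWalk = [0.9, 0.1, 0.3]; // "I took my cat for a walk"
const dogWalk = [0.8, 0.2, 0.3]; // "I took my dog for a walk"
const stockNews = [0.1, 0.9, 0.0]; // "The stock market fell today"

const petSimilarity = cosineSimilarity(catWalk, dogWalk); // close to 1
const unrelatedSimilarity = cosineSimilarity(catWalk, stockNews); // much lower
```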
- Amazon Bedrock exposes various models. Some of them have an "embedding" modality while others have a "text" modality. What is the difference?
  - The "text" modality means the model takes a prompt and generates text.
  - The "embedding" modality means the model takes text and returns its vector representation (an embedding).
- The name LangChain comes from the fact that, to work with LLMs, one creates "chains" of inputs.
- The simplest chain would be the following: input variables -> prompt template -> prompt -> model.
  - In this example, we are "hard prompting" because it is up to us to craft the prompt template.
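That chain can be sketched in plain TypeScript with a stub model standing in for a real LLM call. All names here are hypothetical, not LangChain's API:

```typescript
// All names here are hypothetical; this is not LangChain's API.
type Model = (prompt: string) => string;

// input variables -> prompt template -> prompt
const promptTemplate = (variables: { question: string }): string =>
  `You are a helpful assistant.\nQuestion: ${variables.question}\nAnswer:`;

// Stub model standing in for a real LLM call.
const model: Model = (prompt) => `(model response to: ${prompt})`;

// prompt -> model: the chain simply pipes one step into the next.
const chain = (variables: { question: string }): string =>
  model(promptTemplate(variables));

const answer = chain({ question: "What is LangChain?" });
```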
- Since we are bound by the input token limit, some libraries (like LangChain) opt to recursively provide prompts by splitting the context into multiple documents.
- So far, the way I've seen RAG implemented is by providing the relevant context in the LLM prompt. What if the relevant context is very large? Would not asking the AI for the answer cost a lot of money (since we are billed by input/output tokens)?
  - It seems like the answer is to split the context into small chunks, rank them, and retrieve only the top "k" results. This way we keep the context relatively small. While this might help, it still seems off to me. This blog post describes the filtering & ranking processes as well as the general idea behind RAG.
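The chunking step can be sketched as a naive fixed-size splitter with overlap. The sizes are arbitrary, and real splitters also try to cut on natural boundaries such as sentences or paragraphs:

```typescript
// Naive fixed-size splitter with overlap; chunk sizes are arbitrary here.
const splitText = (text: string, chunkSize: number, overlap: number): string[] => {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
};

// Each chunk shares its first two characters with the end of the previous one.
const chunks = splitText("abcdefghijklmnopqrstuvwxyz", 10, 2);
```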
- The Bedrock model for embeddings does not seem to support streaming. When trying to use the `InvokeModelWithResponseStreamCommand`, I got the following response: `ValidationException: The model is unsupported for streaming`.
  - I guess the closest we can get to streaming is the batch inference job.
- You can add metadata to the S3 presigned POST URL via the `Fields` property.
  - This is super handy for adding more information regarding the file. The consumer can then use `headObject` to retrieve it.
- Extracting text from PDF in ESM land is quite hard.
  - There are various libraries, but they are incompatible with ESM. The main problem is dynamic `require` calls.
- There is the `pdfjs-dist` package, which mostly works.
  - It requires you to have `node-gyp` working, which in most cases does not work out of the box, and you have to install or reinstall some packages.
  - I had to use a "side-effect" import for the worker.
- Of course, one could use AWS Textract.
  - The sync command does not work well for larger files. If the file is bigger than X, it rejects with an error saying that the "format of the file is incompatible", which is not the case.
- If you see the `dynamic require of XX is not supported` error, try creating the `require` variable before your code runs.
  - This error occurs when a CommonJS-style dynamic `require` call survives into an ESM bundle.
  - In `esbuild` you can add this banner: `banner: "import { createRequire } from 'module';const require = createRequire(import.meta.url);"`
- In some cases, when the AWS Lambda runtime fails to parse an error, you will get the following message.

  ```json
  { "errorType": "handled", "errorMessage": "callback called with Error argument, but there was a problem while retrieving one or more of its message, name, and stack" }
  ```

  I hit this error while creating a vector store via the `faiss-node` library.

  ```ts
  const vectorStore = await FaissStore.fromDocuments(await loader.load(), embeddings);
  ```

  For some reason, the AWS Lambda runtime could not parse the error and print it. Using `console.log(error)` caused this weird runtime message to appear instead of the original error!

  ```ts
  try {
    const vectorStore = await FaissStore.fromDocuments(await loader.load(), embeddings);
  } catch (error) {
    if (error instanceof Error) {
      console.log(error.name);
      console.log(error.message);
      console.log(error.stack); // <-- This line causes the weird runtime error
    }
  }
  ```

  Printing the `stack` is what made the weird runtime error show up. I wonder if this is some kind of security mechanism?
- I just spent a couple of hours debugging an issue with the `faiss-node` AWS Lambda layer, only to discover I used the layer for the wrong architecture...
- When getting an object from S3, the SDK returns `StreamingBlobPayloadOutputTypes`. This in turn exposes methods allowing you to create either a buffer, a string, or a web stream from the body.
  - I found it very hard to pipe the web stream into `fs.createWriteStream`. It seems like the types are not compatible.
  - Also, examples on the internet use the `reader` of the web stream. This means they read chunks separately rather than piping the stream into a writable.
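One way I found to bridge the two stream worlds is Node's `Readable.fromWeb` (available since Node 17). A sketch, assuming the web stream came from the body's `transformToWebStream()`; the stream here is built by hand, and the cast works around the DOM vs Node web-stream type mismatch:

```typescript
import { createWriteStream } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// A web ReadableStream, like the one `Body.transformToWebStream()` returns.
const webStream = new ReadableStream({
  start(controller) {
    controller.enqueue(new TextEncoder().encode("hello from s3"));
    controller.close();
  },
});

const target = join(tmpdir(), "downloaded-object.txt");

// Bridge the web stream into a Node stream, then pipe it into the file.
await pipeline(
  Readable.fromWeb(webStream as unknown as import("node:stream/web").ReadableStream),
  createWriteStream(target),
);
```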
- I got hit again by the "limitations" of AWS Lambda Destinations.
  - Sadly, the `onSuccess` destination will not be invoked when the AWS Lambda is triggered by DynamoDB streams.
    - It does make sense, given the fact that such an AWS Lambda is invoked synchronously.
    - Having said that, it would be awesome for AWS Lambda Destinations to work for sync invokes as well.
- If you need to conditionally apply TailwindCSS styles, consider the following approaches.
  - Use some kind of library to apply them conditionally. One such example is the `clsx` package.
  - Use the `data-` attributes and style based on them.

    ```jsx
    <li data-loading={isPending} className="data-[loading=true]:opacity-50"></li>
    ```
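The library approach boils down to conditionally joining class names. A toy stand-in for what `clsx` does (not the real library):

```typescript
// Toy stand-in for clsx: joins class names, keeping only truthy conditionals.
const classNames = (...args: Array<string | Record<string, boolean>>): string =>
  args
    .flatMap((arg) =>
      typeof arg === "string"
        ? [arg]
        : Object.entries(arg)
            .filter(([, enabled]) => enabled)
            .map(([name]) => name),
    )
    .join(" ");

const listItemClasses = classNames("list-item", { "opacity-50": true, hidden: false });
// "list-item opacity-50"
```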
- I was trying to add an upload indicator for the document. The `reader` would always yield `{done: true, value: undefined}` and I could not figure out why that was the case.
  - S3 was not returning anything, and as such there was nothing to "download" back from the server.
  - Looking back, it makes sense.
- According to my research, it is not possible to track the "upload" progress if your request body is not a `ReadableStream`.
  - I could not find any way to read the upload progress out of the `formData`.
- Axios has the "upload progress" feature, but it uses XHR requests rather than the `fetch` API. The XHR request has the `addEventListener` API where you can listen to upload progress events.
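A sketch of that XHR approach. The wiring is browser-only (XHR does not exist in Node), so it is shown as a comment; `toPercent` is a made-up helper, and `presignedUrl`, `setProgress`, and `file` are placeholders:

```typescript
// Small helper converting progress numbers into a percentage.
const toPercent = (loaded: number, total: number): number =>
  Math.round((loaded / total) * 100);

// Browser-only wiring (placeholders: presignedUrl, setProgress, file):
//
// const xhr = new XMLHttpRequest();
// xhr.upload.addEventListener("progress", (event) => {
//   if (event.lengthComputable) {
//     setProgress(toPercent(event.loaded, event.total));
//   }
// });
// xhr.open("PUT", presignedUrl);
// xhr.send(file);

const halfway = toPercent(512, 1024); // 50
```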