llama-index: a data framework for your LLM application

Question

Closed this issue 2 months ago · 16 comments

Hello, I am a PhD student working on a project in the DAS space.

Describe the solution you'd like

I would like to know whether the llama-index package could be checked for possible vulnerabilities. I was not able to find this package on the JFrog page.

Additional context

If it helps, the following link leads to the documentation: https://docs.llamaindex.ai/en/stable/

Answer 1 · 2024-10-16T15:11:50.000Z

Hi @P13tr092

Which namespace are you working from?

Can you provide us with more details on your use case for this package?

Answer 2 · 2024-10-16T23:12:24.000Z

Hello Souheil,
I think the namespace could be 4pointwayback. In case I am wrong, where can I find the namespace?
Regarding LlamaIndex, this package serves as a powerful framework for working with Large Language Models. In my project, I will utilize this framework to explore and analyze documents through the Retrieval-Augmented Generation (RAG) process.
Let me know whether you need more details.

Answer 3 · 2024-10-17T14:19:16.000Z

Hello, I'm currently looking into the issue. A vulnerability was identified in one of the dependencies of llama-index, which is why pip times out when trying to install it. I will try to find a workaround and let you know.

Answer 4 · 2024-10-17T14:59:32.000Z

Hello @jacek-dudek,
Thank you for your help!

Answer 5 · 2024-10-22T13:30:40.000Z

Some initial discovery:

Tried to manually install the llama-index package using pip. Some dependencies are not getting downloaded onto the notebook. I identified these two packages: requests-toolbelt and typing-inspect.

pip output:
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='jfrog-platform-artifactory.jfrog-system', port=8081): Read timed out. (read timeout=300.0)")': /artifactory/api/pypi/pypi-remote/packages/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl

ERROR: Could not install packages due to an OSError: HTTPConnectionPool(host='jfrog-platform-artifactory.jfrog-system', port=8081): Max retries exceeded with url: /artifactory/api/pypi/pypi-remote/packages/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl (Caused by ReadTimeoutError("HTTPConnectionPool(host='jfrog-platform-artifactory.jfrog-system', port=8081): Read timed out. (read timeout=300.0)"))
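
When a long pip log contains many retries, it can help to list which artifact files actually hit the read timeout. The following is a minimal sketch, assuming the log lines look like the pip output above; failing_artifacts is a hypothetical helper name, not part of pip.

```python
import re

# Matches an artifact file name at the end of a URL path segment.
ARTIFACT_RE = re.compile(r"/([^/\s]+\.(?:whl|tar\.gz|zip))")


def failing_artifacts(log_text: str) -> list[str]:
    """Return unique artifact file names from lines that mention a read timeout."""
    seen: list[str] = []
    for line in log_text.splitlines():
        # Only consider lines that report the timeout seen in the pip output.
        if "Read timed out" not in line:
            continue
        for name in ARTIFACT_RE.findall(line):
            if name not in seen:
                seen.append(name)
    return seen
```

Feeding it the two pip lines above would list typing_inspect-0.9.0-py3-none-any.whl once, since duplicates are dropped.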

Answer 6 · 2024-10-22T13:50:04.000Z

Looked into the JFrog REST APIs to see if I can query the JFrog server directly. I can. It appears that those two packages are there on the server. Here's the JFrog output:

curl http://jfrog-platform-artifactory.jfrog-system:8081/artifactory/api/search/artifact?name=requests-toolbelt
{
"results" : [ {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/conda-forge-remote-cache/noarch/requests-toolbelt-0.9.1-py_0.tar.bz2"
}, {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/conda-forge-remote-cache/noarch/requests-toolbelt-0.10.1-pyhd8ed1ab_0.tar.bz2"
}, {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/conda-forge-remote-cache/noarch/requests-toolbelt-1.0.0-pyhd8ed1ab_0.conda"
}, {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/pypi-remote-cache/.pypi/requests-toolbelt.html"
} ]
}
curl http://jfrog-platform-artifactory.jfrog-system:8081/artifactory/api/search/artifact?name=typing-inspect
{
"results" : [ {
"uri" : "https://jfrog.aaw.cloud.statcan.ca/artifactory/api/storage/pypi-remote-cache/.pypi/typing-inspect.html"
} ]
}
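
To read search responses like these more easily, the results can be grouped by repository. In the output above, the only pypi-remote-cache hits are .pypi/*.html entries and no wheel files are listed. The sketch below assumes the response shape shown above (a "results" list of objects with a "uri" field); group_by_repo is a hypothetical helper name.

```python
import json
from collections import defaultdict
from urllib.parse import urlparse


def group_by_repo(search_response: str) -> dict[str, list[str]]:
    """Map each repository key to the artifact paths found in it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    marker = "/api/storage/"
    for item in json.loads(search_response)["results"]:
        path = urlparse(item["uri"]).path
        # Everything after /api/storage/ is "<repo-key>/<path inside the repo>".
        _, _, rest = path.partition(marker)
        repo, _, artifact = rest.partition("/")
        grouped[repo].append(artifact)
    return dict(grouped)
```

This only parses text that has already been fetched, so it can be run on a saved copy of the curl output.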

Answer 7 · 2024-10-22T13:54:13.000Z

Trying to install the latest versions of either of these two packages still errors out. The earliest thing that appears to go wrong is something related to cache entries.

pip -vv install typing-inspect==0.9.0
. . .
Looking up "http://jfrog-platform-artifactory.jfrog-system:8081/artifactory/api/pypi/pypi-remote/packages/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl" in the cache
No cache entry available
No cache entry available
Incremented Retry for (url='/artifactory/api/pypi/pypi-remote/packages/packages/65/f3/107a22063bf27bdccf2024833d3445f4eea42b2e598abfbd46f6a63b6cb0/typing_inspect-0.9.0-py3-none-any.whl'): Retry(total=4, connect=None, read=None, redirect=None, status=None)
. . .
Will keep investigating.

Answer 8 · 2024-10-23T17:19:43.000Z

@P13tr092 can I ask how critical the llama-index and spaCy packages are to your work?

If they are blockers/critical then I will look at whitelisting while we work on eval/mitigation.

Answer 9 · 2024-10-23T17:47:21.000Z

Hi @Souheil-Yazji ,
These two packages are essential to my research project.

Answer 10 · 2024-10-23T17:57:42.000Z

Okay, sorry about the wait. We'll look into whitelisting this asap.

Answer 11 · 2024-10-23T19:07:52.000Z

Great! Thank you for your help @Souheil-Yazji

Answer 12 · 2024-10-25T19:51:08.000Z

cc @jacek-dudek

Workaround for non-pro-b notebooks:

conda deactivate  # exit the base env
# I believe your requirements.txt file would work, but the Python version has to be pinned: the default here is 3.12.7, which is not compatible with numpy 1.24.4.
conda create -n <your-env-name> python=3.11 numpy=1.24.4
conda activate <your-env-name>
conda install llama-index
conda install spacy=3.7.5 && python -m spacy download en_core_web_sm

This will all work for a pro-b notebook as well, except for python -m spacy download en_core_web_sm. Let me know if that is needed, as we can host those files locally.

One thing I noticed is the use of pip in the notebook you're using. Honestly, conda does such a better job of resolving deps and giving accurate info that I default to it, even though it's not as lightweight.

Your requirements file seems good, except that the Python version isn't pinned. Otherwise you can build that env and activate it, then change the notebook kernel to use that env instead :)

This should resolve all your problems.
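
After switching the notebook kernel, a quick check from a cell can confirm that the interpreter really comes from the new env and has the pinned version. This is a minimal sketch, assuming the env was created with python=3.11 as in the commands above; kernel_matches is a hypothetical helper name.

```python
import sys


def kernel_matches(required=(3, 11), version_info=None) -> bool:
    """True if the interpreter's major.minor version equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    # The executable path should point inside the new conda env.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("matches pinned 3.11:", kernel_matches())
```

If the printed path still points at the base env, the kernel has not been switched yet.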

Answer 13 · 2024-10-28T17:10:04.000Z

@P13tr092 let me know if this worked for you :)

Answer 14 · 2024-10-28T18:37:45.000Z

Hello Souheil,
Thank you for finding a workaround! Right now I am outside Canada and cannot try it to verify that it works properly. I will be back on Friday and will test it over the weekend. The only problem I see with your workaround is that spaCy needs 'en_core_web_sm' or an alternative model to work. What is the issue when downloading it?
Pietro

Answer 15 · 2024-10-29T13:57:37.000Z

Confirmed that Souheil's conda workaround works on my notebook. (In fact, only the Python version needed to be pinned when creating the new environment.) If you don't insist on that specific version of spaCy, you can also get a successful install of both llama-index and spaCy with the default Python version (3.12).

You'll need to set an environment variable (OPENAI_API_KEY) holding an API key from your OpenAI account when running your applications. I don't have a paid subscription so I'm not getting any output, but the llama-index sample code seems to work otherwise.
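
One way to fail early with a clear message is to check for the variable before running any llama-index code. This is a minimal sketch using only the standard library; require_openai_key is a hypothetical helper name, and only the variable name OPENAI_API_KEY comes from the comment above.

```python
import os


def require_openai_key(env=None) -> str:
    """Return the OpenAI API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running "
            "llama-index code that calls OpenAI."
        )
    return key
```

Calling require_openai_key() at the top of a notebook turns a missing key into an immediate, readable error.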

Answer 16 · 2024-10-30T14:45:19.000Z

Closing for now. If the issue occurs again, please reopen.