Create tutorials

Question

gabrielchua opened this issue 10 months ago · 8 comments

gabrielchua commented 10 months ago

Jupyter Notebook style

Answer 1 · 2024-01-28T03:28:47.000Z

Hi @gabrielchua, can you guide me a little on this? I would love to work on it.

Answer 2 · 2024-01-28T03:55:11.000Z

Hey @alhridoy,

Thanks for offering.

How about starting with this code snippet? You can make a starter notebook with some examples.

from ragxplorer import RAGxplorer

client = RAGxplorer(embedding_model="thenlper/gte-large") 

client.load_pdf("presentation.pdf", verbose=True)

client.visualize_query("What are the top revenue drivers for Microsoft?")

Then you can try changing the following:

When initializing the RAGxplorer object, you can change the embedding_model argument to a different embedding model from HuggingFace (e.g. BAAI/bge-large-en) or OpenAI (e.g. the new text-embedding-3-small).
The visualize_query method has the following arguments:

retrieval_method, which can be naive (default), HyDE or multi_qns
top_k, which is an int (3 to 10 recommended) and defaults to 5
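
As a starting point for the notebook, the settings above can be swept in a loop. This is a minimal sketch: the helper names build_query_grid and run_grid are made up for illustration, and the keyword names retrieval_method and top_k are taken from the description above. Whether visualize_query returns a figure or displays it directly is not stated in this thread, so the sketch simply collects whatever comes back.

```python
from itertools import product

# Values taken from the comment above; the grid itself is only an illustration.
RETRIEVAL_METHODS = ("naive", "HyDE", "multi_qns")
TOP_K_VALUES = (3, 5, 10)


def build_query_grid(methods=RETRIEVAL_METHODS, top_ks=TOP_K_VALUES):
    """Return one keyword-argument dict per (retrieval_method, top_k) pair."""
    return [{"retrieval_method": m, "top_k": k} for m, k in product(methods, top_ks)]


def run_grid(client, query, grid):
    """Call client.visualize_query once per setting and collect what it returns."""
    results = []
    for kwargs in grid:
        results.append(client.visualize_query(query, **kwargs))
    return results


# In a notebook, with ragxplorer installed and a PDF at hand:
# from ragxplorer import RAGxplorer
# client = RAGxplorer(embedding_model="BAAI/bge-large-en")
# client.load_pdf("presentation.pdf", verbose=True)
# run_grid(client, "What are the top revenue drivers for Microsoft?", build_query_grid())
```

Keeping the grid separate from the call makes it easy to show one setting per notebook cell instead of running all nine at once.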

Feel free to ping here if you run into any issues.

Answer 3 · 2024-02-01T03:41:07.000Z

Hi @gabrielchua, when I write import plotly.graph_objs as go, the following error appears. I also installed plotly manually, but that did not fix the problem.

ModuleNotFoundError Traceback (most recent call last)
Cell In[20], line 1
----> 1 from ragxplorer import RAGxplorer

File ~/Desktop/projects/RAGxplorer/ragxplorer/__init__.py:7
1 """
2 __init__.py
3
4 Initializes the ragxplorer package and exposes the main classes and functions.
5 """
----> 7 from .ragxplorer import RAGxplorer
9 __all__ = ['RAGxplorer']

File ~/Desktop/projects/RAGxplorer/ragxplorer/ragxplorer.py:19
11 import pandas as pd
13 from chromadb.utils.embedding_functions import (
14 SentenceTransformerEmbeddingFunction,
15 OpenAIEmbeddingFunction,
16 HuggingFaceEmbeddingFunction
17 )
---> 19 import plotly.graph_objs as go
21 from .rag import (
22 build_vector_database,
23 get_doc_embeddings,
24 get_docs,
25 query_chroma
26 )
28 from .projections import (
29 set_up_umap,
30 get_projections,
31 prepare_projections_df,
32 plot_embeddings
33 )

ModuleNotFoundError: No module named 'plotly'

Any idea?

Answer 4 · 2024-02-01T06:12:59.000Z

Hi @alhridoy

Are you using a virtual environment? Could you run pip freeze and provide the results?
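
One common cause of this kind of error is a notebook kernel that runs a different Python interpreter from the one pip installed into. A minimal sketch for checking this from inside the notebook, using only the standard library (the helper name environment_report is made up for illustration):

```python
import importlib.util
import sys


def environment_report(module_names=("plotly", "ragxplorer")):
    """Describe the running interpreter and whether each module can be found by it."""
    report = {
        "executable": sys.executable,  # the Python binary this kernel is running
        "prefix": sys.prefix,  # root of the active environment
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }
    for name in module_names:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If the executable path points outside the virtual environment, the kernel is probably not the one the packages were installed into.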

Answer 5 · 2024-02-02T18:35:21.000Z

Hi @gabrielchua, thanks. Yes, I used a virtual environment. Here is the result of pip freeze.

absl-py==2.0.0
abstract_singleton==1.0.1
affine==2.4.0
aiofiles==23.2.1
aiohttp==3.8.4
aiosignal==1.3.1
anthropic==0.3.6
anyio==3.7.1
appdirs==1.4.4
appnope==0.1.3
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asciitree==0.3.3
asgiref==3.7.2
astor==0.8.1
asttokens==2.4.1
astunparse==1.6.3
async-generator==1.10
async-lru==2.0.4
async-timeout==4.0.2
asynctest==0.13.0
attr==0.3.2
attrs==23.1.0
auto_gpt_plugin_template==0.0.3
autoflake==2.1.1
autopep8==2.0.2
Babel==2.14.0
backcall==0.2.0
backoff==2.2.1
bcrypt==4.0.1
beautifulsoup4==4.12.2
bert4keras==0.11.4
black==23.3.0
bleach==6.1.0
blessed==1.20.0
blis==0.7.9
boltons==21.0.0
bracex==2.4
cachetools==5.3.0
camel-converter==3.0.3
catalogue==2.0.8
certifi==2023.11.17
cffi==1.15.1
cfgv==3.3.1
channels==4.0.0
chardet==5.1.0
charset-normalizer==3.1.0
chroma-hnswlib==0.7.3
chromadb==0.4.15
click==8.1.3
click-option-group==0.5.6
click-plugins==1.1.1
cligj==0.7.2
cloudpickle==2.2.1
colorama==0.4.6
coloredlogs==15.0.1
comm==0.2.1
confection==0.0.4
contourpy==1.1.1
coverage==7.2.3
crcmod==1.7
cryptography==40.0.2
cssselect==1.2.0
cymem==2.0.7
dataclasses-json==0.5.14
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.14
dill==0.3.1.1
distlib==0.3.6
distro==1.8.0
Django==4.2.2
dnspython==2.3.0
docker==6.0.1
docopt==0.6.2
docutils==0.18.1
duckduckgo-search==2.8.6
earthengine-api==0.1.374
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl
et-xmlfile==1.1.0
exceptiongroup==1.1.1
executing==2.0.0
expecttest==0.1.6
face==22.0.0
faiss-cpu==1.7.4
fastapi==0.104.1
fastavro==1.8.4
fasteners==0.19
fastjsonschema==2.18.1
filelock==3.12.0
fiona==1.9.5
fire==0.4.0
flake8==6.0.0
flatbuffers==23.5.26
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2023.9.2
geopandas==0.14.0
ghp-import==2.1.0
git-python==1.0.3
gitdb==4.0.10
GitPython==3.1.31
glom==22.1.0
google-api-core==2.11.0
google-api-python-client==2.86.0
google-auth==2.17.3
google-auth-httplib2==0.1.0
google-cloud-core==2.3.3
google-cloud-storage==2.11.0
google-crc32c==1.5.0
google-resumable-media==2.6.0
googleapis-common-protos==1.59.0
greenlet==3.0.0
grpcio==1.59.0
grpcio-tools==1.59.0
gTTS==2.3.1
h11==0.14.0
h2==4.1.0
h5py==3.8.0
hpack==4.0.0
httpcore==0.17.0
httplib2==0.22.0
httptools==0.6.1
httpx==0.24.0
huggingface-hub==0.16.4
humanfriendly==10.0
hyperframe==6.0.1
hypothesis==6.88.1
identify==2.5.22
idna==3.4
imagesize==1.4.1
immutabledict==3.0.0
importlib-metadata==6.8.0
importlib-resources==6.1.0
iniconfig==2.0.0
inquirer==3.1.3
ipykernel==6.29.0
ipython==8.20.0
ipywidgets==8.1.1
isodate==0.6.1
isoduration==20.11.0
isort==5.12.0
jedi==0.19.1
Jinja2==3.1.2
joblib==1.3.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.19.1
jsonschema-spec==0.2.4
jsonschema-specifications==2023.7.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.2
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.12.5
jupyter_server_terminals==0.5.2
jupyterlab==4.0.11
jupyterlab-widgets==3.0.9
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kubernetes==28.1.0
lancedb==0.1.16
langchain==0.0.231
langchainplus-sdk==0.0.20
langcodes==3.3.0
lazy-object-proxy==1.9.0
libcst==1.0.1
litellm==0.1.824
locket==1.0.0
loguru==0.6.0
lxml==4.9.2
Markdown==3.3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.2
marshmallow==3.20.1
matplotlib-inline==0.1.6
mccabe==0.7.0
mdurl==0.1.2
meilisearch==0.21.0
mergedeep==1.3.4
-e git+https://github.com/geekan/metagpt@ee4d59cd396813be5e5fb674f9c7a40184ad86c9#egg=metagpt
mistune==3.0.2
mkdocs==1.4.2
monotonic==1.6
more-itertools==10.1.0
mpmath==1.3.0
multidict==6.0.4
murmurhash==1.0.9
mypy-extensions==1.0.0
nbclient==0.9.0
nbconvert==7.14.2
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.1
nltk==3.8.1
nodeenv==1.7.0
notebook==7.0.7
notebook_shim==0.2.3
numcodecs==0.12.0
numexpr==2.8.7
numpy==1.24.3
oauthlib==3.2.2
objsize==0.6.1
onnxruntime==1.16.1
open-interpreter==0.1.7
openai==0.28.1
openapi-core==0.18.1
openapi-python-client==0.13.4
openapi-schema-pydantic==1.2.4
openapi-schema-validator==0.6.2
openapi-spec-validator==0.6.0
openpyxl==3.1.2
opentelemetry-api==1.20.0
opentelemetry-exporter-otlp-proto-common==1.20.0
opentelemetry-exporter-otlp-proto-grpc==1.20.0
opentelemetry-proto==1.20.0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
optree==0.9.2
orjson==3.9.8
outcome==1.2.0
overrides==7.4.0
packaging==23.1
pandas==2.0.3
pandocfilters==1.5.1
parse==1.19.1
parso==0.8.3
pathable==0.4.3
pathspec==0.11.1
pathy==0.10.1
peewee==3.17.0
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.5.0
pinecone-client==2.2.1
platformdirs==3.2.0
playsound==1.2.2
plotly==5.18.0
pluggy==1.0.0
portalocker==2.8.2
posthog==3.0.2
prance==23.6.21.0
pre-commit==3.2.2
preshed==3.0.8
prometheus-client==0.19.0
prompt-toolkit==3.0.43
proto-plus==1.22.3
protobuf==4.22.3
psutil==5.9.5
ptyprocess==0.7.0
pulsar-client==3.3.0
pure-eval==0.2.2
py==1.11.0

Answer 6 · 2024-02-03T03:11:45.000Z

Do you mind creating a new virtualenv and just running pip install ragxplorer and pip install jupyterlab?
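
The same steps can be scripted. The sketch below uses the standard-library venv module rather than the virtualenv tool; the directory name ragx-env and the helper name fresh_env_commands are placeholders chosen for illustration. It only builds the commands, so nothing is created or downloaded until they are passed to subprocess.

```python
import os
import sys
from pathlib import Path


def fresh_env_commands(env_dir="ragx-env", packages=("ragxplorer", "jupyterlab")):
    """Build the commands that create a clean environment and install into it."""
    env_dir = Path(env_dir)
    # The interpreter inside a venv lives in Scripts/ on Windows and bin/ elsewhere.
    env_python = env_dir / ("Scripts/python.exe" if os.name == "nt" else "bin/python")
    commands = [[sys.executable, "-m", "venv", str(env_dir)]]
    for package in packages:
        # Calling pip through the new interpreter installs into the new environment.
        commands.append([str(env_python), "-m", "pip", "install", package])
    return commands


# To actually run them (creates the environment and downloads packages):
# import subprocess
# for command in fresh_env_commands():
#     subprocess.run(command, check=True)
```

Jupyter then has to be started from that environment so the notebook kernel uses the same interpreter.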

Answer 7 · 2024-02-06T14:12:42.000Z

I'd like to help too. Could somebody who's been working on this (@alhridoy ?) already share their ipynb so that we don't have to 'relearn' what they have already learned?

Answer 8 · 2024-02-08T01:11:27.000Z

Thanks for adding the ipynb! Just a few points/requests:

Is there a way to specify that we want to use a GPU? (It's VERY slow now.)
Could you also add a code snippet showing how to feed it a text column from, for instance, a pandas DataFrame that contains multiple rows representing multiple documents? (E.g. I use GROBID to parse out things like headers, footers and footnotes, and then get a 'text' column with the full text of a PDF.) Or elements from a database?
We now see 'retrieved', 'chunks' and 'original query' in the viz, and can hover over these. Could we also make the 'fill' of those text boxes transparent, so that we can still see where the other nodes are that we may want to explore? And could the box also show the file name (for instance) that the displayed text comes from?
Could we also add an explanation of what exactly these 'retrieved' and 'chunks' categories are?
Finally, is there a way to ALSO use the LLM to provide an actual answer to the query (as opposed to just the viz)?
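
To make the DataFrame request concrete, here is a minimal sketch of the preparation step only. It does not use any RAGxplorer API, since this thread only mentions load_pdf. It assumes rows shaped like the output of pandas DataFrame.to_dict("records"), with the 'text' column mentioned above and a hypothetical 'file_name' column; the helper name chunk_rows and the character-based chunk sizes are also assumptions.

```python
def chunk_rows(rows, text_key="text", name_key="file_name", chunk_size=1000, overlap=200):
    """Split each row's text into overlapping character chunks, keeping its source name."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    records = []
    for row in rows:
        text = row[text_key]
        if not text:
            continue  # skip documents with no extracted text
        start = 0
        while True:
            records.append(
                {"source": row[name_key], "start": start, "text": text[start:start + chunk_size]}
            )
            if start + chunk_size >= len(text):
                break  # this chunk already reaches the end of the document
            start += step
    return records


# With pandas, the rows could come from: rows = df.to_dict("records")
rows = [{"file_name": "report.pdf", "text": "abcdefghij"}]
for record in chunk_rows(rows, chunk_size=4, overlap=1):
    print(record)
```

Carrying the source name with every chunk is what would let a hover box show which file the displayed text came from.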

Just suggestions!