gabrielchua/RAGxplorer

Data Privacy

chintanckg opened this issue · 3 comments

Details

Please help us understand: are the PDFs uploaded to the hosted web app stored somewhere, or are they deleted as soon as the session ends?

Sweeping

Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Here are the code snippets I think are relevant, in decreasing order of relevance. If a file is missing from this list, you can mention its path in the ticket description.

def load_pdf(self, document_path: str, chunk_size: int = 1000, chunk_overlap: int = 0, verbose: bool = False, umap_params: dict = None):
    """
    Load data from a PDF file and prepare it for exploration.

    Args:
        document_path: Path to the PDF document to load.
        chunk_size: Size of the chunks to split the document into.
        chunk_overlap: Number of tokens to overlap between chunks.
        verbose: If True, print progress messages.
        umap_params: Optional parameters passed on to set_up_umap.
    """
    # Split the PDF into chunks and embed them into a vector database.
    if verbose:
        print(" ~ Building the vector database...")
    self._vectordb = build_vector_database(document_path, chunk_size, chunk_overlap, self._chosen_embedding_model)
    if verbose:
        print("Completed Building Vector Database ✓")

    # Keep the embeddings, chunk text, and ids on the instance.
    self._documents.embeddings = get_doc_embeddings(self._vectordb)
    self._documents.text = get_docs(self._vectordb)
    self._documents.ids = self._vectordb.get()['ids']

    # Fit UMAP and project the embeddings for visualisation.
    if verbose:
        print(" ~ Reducing the dimensionality of embeddings...")
    self._projector = set_up_umap(embeddings=self._documents.embeddings, umap_params=umap_params)
    self._documents.projections = get_projections(embedding=self._documents.embeddings,
                                                  umap_transform=self._projector)
    self._VizData.base_df = prepare_projections_df(document_ids=self._documents.ids,
                                                   document_projections=self._documents.projections,
                                                   document_text=self._documents.text)
    if verbose:
        print("Completed reducing dimensionality of embeddings ✓")
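To make the `chunk_size` and `chunk_overlap` parameters of `load_pdf` concrete, here is a minimal sketch of sliding-window chunking. It is an illustration only: the function name `split_into_chunks` is hypothetical, it counts characters rather than tokens, and it is not the splitter that `build_vector_database` actually uses.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Hypothetical helper: split text into windows of chunk_size characters,
    where each window repeats the last chunk_overlap characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

With a non-zero overlap, neighbouring chunks share some text, which can help a retriever when a relevant passage straddles a chunk boundary, at the cost of storing more chunks.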


Step 2: ⌨️ Coding

Working on it...


🎉 Latest improvements to Sweep:
  • New dashboard launched for real-time tracking of Sweep issues, covering all stages from search to coding.
  • Integration of OpenAI's latest Assistant API for more efficient and reliable code planning and editing, improving speed by 3x.
  • Use the GitHub issues extension for creating Sweep issues directly from your editor.

💡 To recreate the pull request edit the issue title or description.
Something wrong? Let us know.

This is an automated message generated by Sweep AI.

❌ Unable to Complete PR

I'm sorry, but it looks like an error has occurred due to a planning failure. Feel free to add more details to the issue description so Sweep can better address it. Alternatively, reach out to Kevin or William for help at https://discord.gg/sweep.

For bonus GPT-4 tickets, please report this bug on Discord (tracking ID: dcc60b0f50).

Hi @chintanckg, the demo does not persist the PDFs!

It's hosted on Streamlit Community Cloud, and the uploaded files are only used in memory.

The demo's repo is here: https://github.com/gabrielchua/RAGxplorer-demo
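
For readers who want to handle uploads transiently in their own deployment, here is a minimal sketch. It is not the demo's actual code (see the repo linked above for that). It assumes the upload arrives as raw bytes and that the processing function needs a file path, as `load_pdf` does; the names `process_upload_transiently` and `process` are hypothetical. The bytes go into a temporary file that is deleted as soon as processing finishes, whether or not it succeeds.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def process_upload_transiently(pdf_bytes: bytes, process: Callable[[str], T]) -> T:
    """Hypothetical helper: expose uploaded bytes as a short-lived file path.

    `process` stands in for any path-based loader (for example a bound
    `load_pdf` method); the temporary file is removed before returning.
    """
    # delete=False so the file can be reopened by path on every platform;
    # it is removed explicitly in the finally block below.
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        tmp.write(pdf_bytes)
        tmp.close()
        return process(tmp.name)
    finally:
        tmp.close()  # harmless if the file is already closed
        if os.path.exists(tmp.name):
            os.remove(tmp.name)


if __name__ == "__main__":
    size = process_upload_transiently(b"%PDF-1.4 example", os.path.getsize)
    print(f"Processed {size} bytes; no copy of the upload remains on disk.")
```

Note that this only covers the uploaded file itself. Anything derived from it, such as embeddings held in a vector database, has its own lifetime and would need to be cleared separately.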