weaviate/weaviate-python-client

[DX] Create objects from other objects

CShorten opened this issue · 2 comments

What

Say one Weaviate collection has a List[str] property such as chunks, or a JSON property. We want an API that populates another collection with one object per string value, optionally inheriting other properties from the source collection as well.

Why

We believe one of the killer use cases of GFLs is having an LLM split long documents such as PDFs into chunks with metadata descriptions. That leaves us with a JSON property that stores the list of chunk and metadata strings per entry. It would be great to have an API that flows this from, say, "WeaviateBlogPosts" --> "WeaviateBlogChunks".

How

weaviate_blog_posts.data.transfer(
  to_collection="WeaviateBlogChunks",
  split_properties="ChunkAndMetadataJSON",
  inherit_properties=["title", "author", "date_published"],
  add_cref=True,
  uuids=uuids
)

Assuming ChunkAndMetadataJSON is populated with a GFL such as:

weaviate_blog_posts.data.gfl.update(
  instruction="Please break up this markdown file into semantic chunks with metadata further describing their context in the original document",
  view_properties=["content"],
  on_property=["ChunkAndMetadataJSON"],
  uuids=uuids
)

^ We still need to figure out how to interface composite types like this with the GFL. Alternatively, this could be a List[ChunkWithMetadata] type.
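
To make the intended semantics of the proposed transfer call concrete, here is a minimal client-side sketch in plain Python. It is not an existing client API: the function name split_objects, the dict layout ("uuid", "properties", "references"), the "chunk" and "chunk_index" property names, and the "source" reference name are all assumptions made for illustration. It also assumes the split property holds either a JSON-encoded string or an already-parsed list whose items are bare strings or dicts.

```python
import json
import uuid
from typing import Any


def split_objects(
    source_objects: list[dict[str, Any]],
    split_property: str,
    inherit_properties: list[str],
    add_cref: bool = True,
) -> list[dict[str, Any]]:
    """Fan each source object out into one new object per chunk (hypothetical helper)."""
    chunk_objects = []
    for source in source_objects:
        props = source["properties"]
        raw = props.get(split_property) or []
        # The property may be stored as a JSON string or as an already-parsed list.
        chunks = json.loads(raw) if isinstance(raw, str) else raw
        # Copy only the requested properties that the source object actually has.
        inherited = {name: props[name] for name in inherit_properties if name in props}
        for index, chunk in enumerate(chunks):
            # A chunk may be a bare string or a dict such as {"chunk": ..., "metadata": ...}.
            chunk_props = dict(chunk) if isinstance(chunk, dict) else {"chunk": chunk}
            new_obj = {
                # Deterministic UUID, so re-running the split yields the same ids.
                "uuid": str(uuid.uuid5(uuid.UUID(source["uuid"]), str(index))),
                "properties": {**inherited, **chunk_props, "chunk_index": index},
            }
            if add_cref:
                # Assumed reference name pointing back at the source object.
                new_obj["references"] = {"source": source["uuid"]}
            chunk_objects.append(new_obj)
    return chunk_objects
```

The returned dicts would still need to be inserted into the target collection with whatever batch insert the client provides. Deriving each chunk's UUID from the source UUID and the chunk index is one possible design choice; it would keep repeated transfers from creating duplicates.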

@jfrancoa also brought this use case to me, since migrating collections either within or between instances is a frequent journey. Developing this during the next sprint would be a good idea, I think!

Awesome!! Super happy to hear it, thanks @tsmith023!