/demo-generativeai-with-java

Shows how to build a generative AI project with Java


Demo of Generative AI with Java


📋 Table of contents

Week 1

Week 2

WEEK 1

1 - Create your DataStax Astra account

ℹ️ Account creation tutorial is available in awesome astra

Go to https://astra.datastax.com


2 - Create an Astra Token

ℹ️ Token creation tutorial is available in awesome astra

  • Locate Settings (#1) in the menu on the left, then Token Management (#2)

  • Select the role Organization Administrator before clicking [Generate Token]

The Token is in fact three separate strings: a Client ID, a Client Secret and the token proper. You will need some of these strings to access the database, depending on the type of access you plan. Although the Client ID, strictly speaking, is not a secret, you should regard this whole object as a secret and make sure not to share it inadvertently (e.g. committing it to a Git repository) as it grants access to your databases.

{
  "ClientId": "ROkiiDZdvPOvHRSgoZtyAapp",
  "ClientSecret": "fakedfaked",
  "Token":"AstraCS:fake"
}

3 - Copy the token value to your clipboard

You can also leave the window open and copy the value later.

4 - Open Gitpod

↗️ Right Click and select open as a new Tab...

Open in Gitpod

5 - Set up the CLI with your token

In Gitpod, open a terminal window:

  • Login
astra login --token AstraCS:fake
  • Validate your setup
astra org

Output

gitpod /workspace/workshop-beam (main) $ astra org
+----------------+-----------------------------------------+
| Attribute      | Value                                   |
+----------------+-----------------------------------------+
| Name           | cedrick.lunven@datastax.com             |
| id             | f9460f14-9879-4ebe-83f2-48d3f3dce13c    |
+----------------+-----------------------------------------+

6 - Create destination Database and a keyspace

ℹ️ Note that the Vector Search capability is enabled with the --vector flag.

  • Create the database demo-genai and wait for it to become active
astra db create demo-genai -k genai --vector --if-not-exists

💻 Output

[INFO]  Database 'demo-genai' does not exist. Creating database 'demo-genai' with keyspace 'genai'
[INFO]  Enabling vector search for database demo-genai
[INFO]  Database 'demo-genai' and keyspace 'genai' are being created.
[INFO]  Database 'demo-genai' has status 'PENDING' waiting to be 'ACTIVE' ...
[INFO]  Database 'demo-genai' has status 'ACTIVE' (took 112341 millis)
[OK]    Database 'demo-genai' is ready.
  • List databases
astra db list

💻 Output

+--------------------------+--------------------------------------+-----------+-------+---+-----------+
| Name                     | id                                   | Regions   | Cloud | V | Status    |
+--------------------------+--------------------------------------+-----------+-------+---+-----------+
| demo-genai               | 9e54ff00-57e2-47ed-8699-f94d5dd11b6f | us-east1  | gcp   | ■ | ACTIVE    |
+--------------------------+--------------------------------------+-----------+-------+---+-----------+
  • Describe your db
astra db describe demo-genai

💻 Output

+------------------+-----------------------------------------+
| Attribute        | Value                                   |
+------------------+-----------------------------------------+
| Name             | demo-genai                              |
| id               | 9e54ff00-57e2-47ed-8699-f94d5dd11b6f    |
| Status           | ACTIVE                                  |
| Cloud            | GCP                                     |
| Regions          | us-east1                                |
| Default Keyspace | genai                                   |
| Creation Time    | 2023-09-12T08:55:36Z                    |
|                  |                                         |
| Keyspaces        | [0] genai                               |
|                  |                                         |
|                  |                                         |
| Regions          | [0] us-east1                            |
|                  |                                         |
+------------------+-----------------------------------------+

7 - Set up env variables

  • Create .env file with variables
astra db create-dotenv demo-genai 
  • Display the file
cat .env
  • Load env variables
set -a
source .env
set +a
env | grep ASTRA

8 - Register to OpenAI

  • In your profile, go to View API KEYS, create a new key and copy the value to your clipboard. You have a free trial for a month or so.

export OPENAI_API_KEY=<key>
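
Before running the tests, it can help to confirm that both variables are visible to the JVM. The sketch below is only an illustration: the class and method names are hypothetical, and the variable names and patterns are taken from the @EnabledIfEnvironmentVariable annotations shown in the Week 2 tests of this README. It reports whether each variable is usable without ever printing its value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: checks that the variables the tests rely on are set. */
final class EnvCheckSketch {

    // Names and regex patterns copied from the test annotations in this README.
    static final Map<String, String> EXPECTED = new LinkedHashMap<>();
    static {
        EXPECTED.put("ASTRA_DB_APPLICATION_TOKEN", "Astra.*");
        EXPECTED.put("OPENAI_API_KEY", "sk.*");
    }

    /** True when the variable is present and fully matches the given regex. */
    static boolean isValid(Map<String, String> env, String name, String regex) {
        String value = env.get(name);
        return value != null && value.matches(regex);
    }

    /** One line per expected variable; secret values are never included. */
    static List<String> report(Map<String, String> env) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : EXPECTED.entrySet()) {
            boolean ok = isValid(env, e.getKey(), e.getValue());
            lines.add((ok ? "OK " : "MISSING_OR_INVALID ") + e.getKey());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Reads the real process environment, e.g. after `source .env`.
        report(System.getenv()).forEach(System.out::println);
    }
}
```

If a variable is reported as missing or invalid, the tests gated on it are skipped by JUnit rather than failing, so a quick check like this avoids a silent no-op run.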

9 - Set up project

The test commands below validate that Java, Maven and Lombok work as expected and that you can connect to the database.

Note: To create the project, I simply used the Astra SDK archetype as follows:

mvn archetype:generate \
-DarchetypeGroupId=com.datastax.astra \
-DarchetypeArtifactId=spring-boot-3x-archetype \
-DarchetypeVersion=0.6.9 \
-DinteractiveMode=false \
-DgroupId=com.datastax.demo \
-DartifactId=genai-demo \
-Dversion=1.0-SNAPSHOT

and added the vector dependency:

<dependency>
  <groupId>com.datastax.astra</groupId>
  <artifactId>astra-sdk-vector</artifactId>
  <version>${astra-sdk-starter.version}</version>
</dependency>
  • Run connection test:
mvn test -Dtest=ConnectionTest#shouldBeConnectedTest
  • Run OpenAI Test:

mvn test -Dtest=OpenAiTest#shouldTestOpenAICreateEmbeddings

10 - Vector Search

  • Ingest data

mvn test -Dtest=GenerativeAITest#shouldIngestDocuments
  • Open a cqlsh (in a new terminal)
astra db cqlsh demo-genai -k genai
select row_id, metadata_s, blob_text, vector from philosophers;
  • Similarity Search
mvn test -Dtest=GenerativeAITest#shouldSimilaritySearchQuotes
  • Similarity Search + MetaData (by Author)
mvn test -Dtest=GenerativeAITest#shouldSimilaritySearchQuotesFilteredByAuthor
  • Similarity Search + MetaData (by Tags)
mvn test -Dtest=GenerativeAITest#shouldSimilaritySearchQuotesFilteredByTags
  • Similarity Search with a threshold
mvn test -Dtest=GenerativeAITest#shouldSimilaritySearchQuotesWithThreshold
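
In the tests above the search runs inside Astra. To make the three ideas concrete (similarity ranking, metadata filtering, score threshold), here is a minimal in-memory sketch in plain Java. It is not the code used by the tests or by Astra: the class names, the single author field and the brute-force scan are assumptions made for illustration, and cosine similarity is used as the metric.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory illustration of filtered similarity search. */
final class SimilaritySearchSketch {

    /** A stored quote with one metadata field and its embedding vector. */
    static final class Quote {
        final String text;
        final String author;
        final double[] vector;
        Quote(String text, String author, double[] vector) {
            this.text = text;
            this.author = author;
            this.vector = vector;
        }
    }

    /** A quote paired with its similarity score against the query. */
    static final class Match {
        final Quote quote;
        final double score;
        Match(Quote quote, double score) {
            this.quote = quote;
            this.score = score;
        }
    }

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Scans all quotes, keeps those by the given author (null means any author)
     * whose score is at least minScore, and returns the best `limit` matches.
     */
    static List<Match> search(List<Quote> quotes, double[] query,
                              String author, double minScore, int limit) {
        List<Match> matches = new ArrayList<>();
        for (Quote q : quotes) {
            if (author != null && !author.equals(q.author)) {
                continue; // metadata filter
            }
            double score = cosine(query, q.vector);
            if (score >= minScore) { // threshold
                matches.add(new Match(q, score));
            }
        }
        matches.sort((x, y) -> Double.compare(y.score, x.score)); // best first
        return matches.size() > limit
                ? new ArrayList<>(matches.subList(0, limit))
                : matches;
    }
}
```

A real vector store avoids the full scan by using an index, but the order of operations (filter, score, threshold, rank, limit) is the same idea.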

11 - RAG: Retrieval-Augmented Generation

The Full Monty.....

mvn test -Dtest=GenerativeAITest#shouldGenerateQuotesWithRag
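
RAG combines the two previous steps: retrieve the most similar quotes, then hand them to the model as context for generation. The sketch below shows only the prompt-assembly part in plain Java. The class name, method name and prompt wording are assumptions for illustration and are not taken from GenerativeAITest; the call to the model itself is left out.

```java
import java.util.List;

/** Hypothetical illustration of the "augmented" step of RAG. */
final class RagPromptSketch {

    /**
     * Builds a prompt that places the retrieved passages before the question,
     * so the model can ground its answer in them.
     */
    static String buildPrompt(String question, List<String> retrieved) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Answer the question using only the context below.\n");
        prompt.append("Context:\n");
        for (int i = 0; i < retrieved.size(); i++) {
            // Number each passage so the answer can refer back to it.
            prompt.append(i + 1).append(". ").append(retrieved.get(i)).append('\n');
        }
        prompt.append("Question: ").append(question).append('\n');
        return prompt.toString();
    }

    public static void main(String[] args) {
        // In a real flow the list would come from the similarity search.
        System.out.print(buildPrompt("What is virtue?",
                List.of("First retrieved quote", "Second retrieved quote")));
    }
}
```

The resulting string would then be sent to the chat model as the user message.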

WEEK 2

12 - Set up project

  • Check list of running db
astra db list
  • Resume the db if needed (or create a new one)
astra db resume langchain4j
astra db create langchain4j --if-not-exists
  • Make sure the env variables are set ($ASTRA_DB_APPLICATION_TOKEN)
astra db create-dotenv langchain4j
set -a
source .env
set +a
env | grep ASTRA

Go to application.yaml and check that the values are correct for your database:

astra:
  database:
    name: langchain4j
    keyspace: langchain4j
    table: langchain4j

13 - Ingest Document

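@Test
@DisplayName("02. Should Ingest a document")
@EnabledIfEnvironmentVariable(named = "ASTRA_DB_APPLICATION_TOKEN", matches = "Astra.*")
@EnabledIfEnvironmentVariable(named = "OPENAI_API_KEY", matches = "sk.*")
void should_Ingest_Document() {
  // path, embeddingModel and embeddingStore are not shown here;
  // they are assumed to be defined elsewhere in the test class.

  // Load the source file as a plain-text document
  Document document = FileSystemDocumentLoader.loadDocument(path, DocumentType.TXT);

  // Split into segments of at most 100 tokens, with up to 10 tokens of overlap
  DocumentSplitter splitter = DocumentSplitters
        .recursive(100, 10, new OpenAiTokenizer(GPT_3_5_TURBO));

  // Embed each segment and write it to the embedding store
  EmbeddingStoreIngestor.builder()
     .documentSplitter(splitter)
     .embeddingModel(embeddingModel)
     .embeddingStore(embeddingStore)
     .build()
     .ingest(document);
}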
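
To see what the two numbers passed to the splitter control (maximum segment size and overlap), here is a deliberately simplified sketch in plain Java. It is not the LangChain4j algorithm: it counts whitespace-separated words instead of tokenizer tokens and uses a fixed sliding window rather than recursive splitting. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical word-based chunker illustrating segment size and overlap. */
final class ChunkingSketch {

    /**
     * Splits text into chunks of at most maxWords words, where each chunk
     * repeats the last `overlap` words of the previous one.
     */
    static List<String> split(String text, int maxWords, int overlap) {
        if (maxWords <= 0 || overlap < 0 || overlap >= maxWords) {
            throw new IllegalArgumentException("need 0 <= overlap < maxWords");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null || text.isBlank()) {
            return chunks;
        }
        String[] words = text.trim().split("\\s+");
        int step = maxWords - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Small numbers so the overlap is visible; the test above uses 100 and 10.
        split("a b c d e f g", 3, 1).forEach(System.out::println);
    }
}
```

The overlap means a sentence cut at a segment boundary still appears intact in at least one neighbouring segment, which tends to help retrieval at the cost of storing some text twice.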

14 - Chat Completion

@Test
@DisplayName("03. Should Chat Completion")
@EnabledIfEnvironmentVariable(named = "ASTRA_DB_APPLICATION_TOKEN", matches = "Astra.*")
@EnabledIfEnvironmentVariable(named = "OPENAI_API_KEY", matches = "sk.*")
void should_chat_completion() {
  // ... see the test class for the full code
}