Android Semantic Search Sample App

This sample app demonstrates how to implement vector search with embeddings for English keywords in an Android application using Java.

Features

Transforms text queries into vector embeddings
Performs semantic search using vector similarity
Displays search results in a sorted list based on relevance
Uses MiniLM as the embedding model
Uses ObjectBox or Apache Lucene as the vector search backend
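
The app delegates similarity scoring to ObjectBox or Lucene, and this README does not state which metric they are configured with. Cosine similarity is a common choice for sentence embeddings, so the following is a minimal sketch of that metric only. The class and method names are hypothetical and do not come from the sample app.

```java
// Illustrative helper; names are hypothetical, not taken from the sample app.
final class CosineSimilarity {
    private CosineSimilarity() {}

    /**
     * Returns the cosine of the angle between a and b, in [-1, 1].
     * Returns 0 if either vector has zero length.
     */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing in the same direction score close to 1 regardless of their length, which is why the metric suits comparing embeddings of texts of different sizes.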

Requirements

Android Studio 2024
Android SDK 30+ (Android 11.0 or higher)
JDK 8+
Tested on a Samsung Galaxy S24

Setup Instructions

Clone the repository or download the source code
Download the required model files:
- Create an assets folder in your Android project's src/main directory
- Download the MiniLM model file and save it as minilm_l6_v2.tflite in the assets folder
- Create a bert_vocab.txt file in the assets folder with BERT's vocabulary
Sync the project with Gradle files
Build and run the app on your device or emulator

Model Information

This sample uses the MiniLM-L6-v2 model, which produces 384-dimensional embeddings. It is smaller and faster than larger language models while still performing well on semantic search tasks.
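
This README does not describe how the model's per-token outputs become a single 384-dimensional vector, and the .tflite file may already do this internally. A common approach for sentence-embedding models of this family is masked mean pooling followed by L2 normalization. The sketch below assumes that approach; the class and method names are hypothetical.

```java
// Illustrative post-processing; assumes the model returns one vector per token.
final class Pooling {
    private Pooling() {}

    /**
     * Averages the token vectors whose mask entry is 1, ignoring padding.
     * tokenVectors has shape [sequenceLength][dimension].
     */
    static float[] meanPool(float[][] tokenVectors, int[] mask) {
        int dim = tokenVectors[0].length;
        float[] pooled = new float[dim];
        int count = 0;
        for (int i = 0; i < tokenVectors.length; i++) {
            if (mask[i] == 0) {
                continue; // skip padding positions
            }
            for (int d = 0; d < dim; d++) {
                pooled[d] += tokenVectors[i][d];
            }
            count++;
        }
        if (count > 0) {
            for (int d = 0; d < dim; d++) {
                pooled[d] /= count;
            }
        }
        return pooled;
    }

    /** Scales v in place to unit length and returns it; a zero vector is left unchanged. */
    static float[] l2Normalize(float[] v) {
        double sumSquares = 0.0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumSquares);
        if (norm > 0.0) {
            for (int i = 0; i < v.length; i++) {
                v[i] = (float) (v[i] / norm);
            }
        }
        return v;
    }
}
```

Once vectors are unit length, their dot product equals their cosine similarity, which lets a vector index use the cheaper dot product.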

Implementation Details

Components

EmbeddingModel: Interface defining the embedding generation functionality
MiniLMEmbeddingModel: TensorFlow Lite implementation of the embedding model
BertTokenizer: Simple tokenizer for processing text input
VectorDatabase: Interface for vector database operations
LuceneVectorDatabase: Implementation using Apache Lucene's KNN vector search
ObjectBoxVectorDatabase: Implementation using ObjectBox vector search
MainActivity: Main UI for the application
ResultsAdapter: Adapter for displaying search results
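
BertTokenizer is described above only as a simple tokenizer. BERT vocabularies are normally used with greedy longest-match WordPiece tokenization, in which continuation pieces carry a ## prefix. The sketch below assumes that scheme. The class name, the lowercasing, and the whitespace splitting are illustrative assumptions rather than the app's actual code, and special tokens such as [CLS] and [SEP] are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical WordPiece sketch; not the app's BertTokenizer.
final class WordPieceSketch {
    private final Set<String> vocab;

    WordPieceSketch(Set<String> vocab) {
        this.vocab = vocab;
    }

    /** Lowercases, splits on whitespace, then breaks each word into vocabulary pieces. */
    List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.addAll(splitWord(word));
            }
        }
        return out;
    }

    private List<String> splitWord(String word) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            // Try the longest remaining substring first and shrink until the vocabulary matches.
            while (end > start) {
                String candidate = (start > 0 ? "##" : "") + word.substring(start, end);
                if (vocab.contains(candidate)) {
                    match = candidate;
                    break;
                }
                end--;
            }
            if (match == null) {
                // No piece matched at this position: the whole word becomes the unknown token.
                return Collections.singletonList("[UNK]");
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }
}
```

In a real pipeline each piece would then be mapped to its line index in bert_vocab.txt to produce the integer IDs the model expects.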

How It Works

The app initializes the embedding model and vector database on startup
Sample text documents are added to the database with their vector embeddings
When a user enters a search query, it's converted to an embedding vector
The app runs a k-nearest-neighbor (KNN) vector search with ObjectBox or Lucene to find the most semantically similar documents
Results are displayed in a RecyclerView sorted by similarity score
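
The five steps above can be sketched in plain Java. The interface names EmbeddingModel and VectorDatabase come from this README, but their method signatures here are assumptions. LetterCountModel and BruteForceDatabase are hypothetical stand-ins for MiniLM and for ObjectBox or Lucene, so that the flow runs without Android or native dependencies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical signatures: the README names these interfaces but does not show their methods.
interface EmbeddingModel {
    float[] embed(String text);
}

interface VectorDatabase {
    void add(String text, float[] vector);
    List<SearchHit> search(float[] query, int k);
}

final class SearchHit {
    final String text;
    final double score;

    SearchHit(String text, double score) {
        this.text = text;
        this.score = score;
    }
}

/** Stand-in for MiniLM: counts letters a to z into a 26-dimensional vector. Not semantic. */
final class LetterCountModel implements EmbeddingModel {
    @Override
    public float[] embed(String text) {
        float[] v = new float[26];
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                v[c - 'a']++;
            }
        }
        return v;
    }
}

/** Stand-in for ObjectBox or Lucene: exhaustive cosine scan over an in-memory list. */
final class BruteForceDatabase implements VectorDatabase {
    private final List<String> texts = new ArrayList<>();
    private final List<float[]> vectors = new ArrayList<>();

    @Override
    public void add(String text, float[] vector) {
        texts.add(text);
        vectors.add(vector);
    }

    @Override
    public List<SearchHit> search(float[] query, int k) {
        List<SearchHit> hits = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            hits.add(new SearchHit(texts.get(i), cosine(query, vectors.get(i))));
        }
        // Highest similarity first, matching the order the results list is shown in.
        hits.sort(Comparator.comparingDouble((SearchHit h) -> h.score).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            na += (double) a[i] * a[i];
            nb += (double) b[i] * b[i];
        }
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / Math.sqrt(na * nb);
    }
}

final class SearchFlow {
    private SearchFlow() {}

    static List<SearchHit> run(List<String> documents, String query, int k) {
        EmbeddingModel model = new LetterCountModel();   // step 1: initialize model and database
        VectorDatabase db = new BruteForceDatabase();
        for (String doc : documents) {
            db.add(doc, model.embed(doc));               // step 2: index documents with embeddings
        }
        float[] queryVector = model.embed(query);        // step 3: embed the query
        return db.search(queryVector, k);                // steps 4 and 5: top-k hits, best first
    }
}
```

Because the calling code depends only on the two interfaces, swapping the brute-force scan for an ObjectBox or Lucene implementation would not change SearchFlow.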

Dependencies

TensorFlow Lite: For running the embedding model
Apache Lucene: For vector search capabilities
ObjectBox: For alternative vector search capabilities
AndroidX: For UI components and RecyclerView

Notes

This is a simplified implementation for demonstration purposes
For production use, consider:
- Using a more sophisticated tokenizer
- Implementing asynchronous data loading
- Adding caching mechanisms for embeddings
- Storing the vector database persistently
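
As one possible take on the caching suggestion above, the sketch below keeps recently used embeddings in a small least-recently-used cache built on LinkedHashMap, so repeated queries skip model inference. EmbeddingCache and its constructor parameters are hypothetical; the embedder function stands in for whatever call produces an embedding in the app.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache for query embeddings; not part of the sample app.
final class EmbeddingCache {
    private final Function<String, float[]> embedder;
    private final Map<String, float[]> cache;

    EmbeddingCache(final int maxEntries, Function<String, float[]> embedder) {
        this.embedder = embedder;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > maxEntries; // evict the least recently used entry when full
            }
        };
    }

    /** Returns the cached embedding for text, computing and storing it on a miss. */
    synchronized float[] get(String text) {
        float[] vector = cache.get(text);
        if (vector == null) {
            vector = embedder.apply(text);
            cache.put(text, vector);
        }
        return vector;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

The methods are synchronized because a search typed on the UI thread and a background indexing job could otherwise touch the map at the same time.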

Contact

Contributions are welcome via pull request. Author's email: hissain.khan@gmail.com

Thanks

hissain/AndroidSemanticSearch