A Go application for testing and comparing different search approaches using Manticore Search, including full-text search, vector search, and hybrid search methods.
- HTTP API: RESTful API for search operations
- Document Processing: Parse markdown files and extract structured data
- Multiple Search Modes:
  - Basic text search (simple string matching)
  - Full-text search using Manticore Search with BM25 scoring
  - Vector search using TF-IDF vectors and cosine similarity
  - Hybrid search combining full-text and vector approaches
- TF-IDF Vectorization: Custom implementation for semantic search
- Manticore Integration: HTTP JSON API interface with Manticore Search
- Docker Support: Easy setup with Docker Compose
- Go 1.23 or higher
- Docker and Docker Compose (for Manticore Search)
- Make (optional, for using Makefile commands)
- Clone the repository:
  git clone <repository-url>
  cd manticore-search-tester
- Start Manticore Search:
  make docker-up # or docker-compose up -d
- Build and run the application:
  make run # or make build && ./bin/manticore-search-tester
- Test the API:
  curl "http://localhost:8080/api/search?query=сайт&mode=basic"
  curl "http://localhost:8080/api/status"
.
├── cmd/
│ └── server/ # Application entry point
│ └── main.go
├── internal/ # Private application code
│ ├── document/ # Document parsing and processing
│ ├── handlers/ # HTTP request handlers
│ ├── manticore/ # Manticore Search client
│ ├── models/ # Data models and types
│ ├── search/ # Search engine implementations
│ └── vectorizer/ # TF-IDF vectorization
├── pkg/ # Public API types
│ └── api/
├── data/ # Sample markdown documents
├── bin/ # Built binaries
├── docker-compose.yml # Docker setup for Manticore Search
├── Dockerfile # Application container
├── Makefile # Build and development commands
├── .air.toml # Hot reload configuration
└── go.mod # Go module dependencies
Search across indexed documents with different modes.
Parameters:
- query (required): Search query string
- mode (optional): basic, fulltext, vector, or hybrid (default: basic)
- page (optional): Page number (default: 1)
- limit (optional): Results per page, 1-100 (default: 10)
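The parameter rules above can be sketched as a small validation function. This is a minimal illustration, not the project's handler code: `SearchParams` and `ParseSearchParams` are hypothetical names, and rejecting out-of-range values with an error (rather than clamping them) is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// SearchParams holds validated query parameters (hypothetical type).
type SearchParams struct {
	Query string
	Mode  string
	Page  int
	Limit int
}

// validModes lists the four modes named in this README.
var validModes = map[string]bool{"basic": true, "fulltext": true, "vector": true, "hybrid": true}

// ParseSearchParams applies the documented defaults (mode=basic, page=1,
// limit=10) and bounds (limit 1-100). get is any lookup function, for
// example url.Values.Get. Returning errors for bad input is an assumption.
func ParseSearchParams(get func(key string) string) (SearchParams, error) {
	p := SearchParams{Query: get("query"), Mode: "basic", Page: 1, Limit: 10}
	if p.Query == "" {
		return p, fmt.Errorf("query is required")
	}
	if m := get("mode"); m != "" {
		if !validModes[m] {
			return p, fmt.Errorf("unknown mode %q", m)
		}
		p.Mode = m
	}
	if s := get("page"); s != "" {
		n, err := strconv.Atoi(s)
		if err != nil || n < 1 {
			return p, fmt.Errorf("page must be a positive integer")
		}
		p.Page = n
	}
	if s := get("limit"); s != "" {
		n, err := strconv.Atoi(s)
		if err != nil || n < 1 || n > 100 {
			return p, fmt.Errorf("limit must be between 1 and 100")
		}
		p.Limit = n
	}
	return p, nil
}

func main() {
	v, _ := url.ParseQuery("query=сайт&mode=hybrid&limit=5")
	p, err := ParseSearchParams(v.Get)
	fmt.Println(p, err)
}
```

Taking a lookup function instead of an `*http.Request` keeps the validation easy to test without starting a server.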
Example:
curl -G "http://localhost:8080/api/search" --data-urlencode "query=добавить блок" -d "mode=fulltext" -d "page=1" -d "limit=5"
Get service health and status information.
Example:
curl "http://localhost:8080/api/status"
Manually trigger document reindexing.
Example:
curl -X POST "http://localhost:8080/api/reindex"
# Build the application
make build
# Run the application
make run
# Run in development mode with auto-restart
make dev
# Start full development environment (Docker + dev server)
make dev-full
# Run tests
make test
# Test API endpoints
make test-api
# Docker commands
make docker-up # Start Manticore Search
make docker-down # Stop Manticore Search
make docker-logs # View Docker logs
# Code quality
make fmt # Format code
make lint # Lint code (requires golangci-lint)
# Build for multiple platforms
make build-all
# Create release archives
make release
# Install development tools
make install-tools
# Show all available commands
make help
# Install dependencies
go mod download && go mod tidy
# Build
go build -o bin/manticore-search-tester ./cmd/server
# Run
./bin/manticore-search-tester
# Test API
./bin/manticore-search-tester test-api
Simple string matching with scoring based on:
- Title matches (higher weight)
- Content matches
- Word frequency
- Document length normalization
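As a rough illustration of how these four factors could combine, the sketch below counts query-term occurrences, weights title hits more heavily, and divides by document length. `BasicScore` and the weight of 3.0 are assumptions made for the example; the project's actual formula may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// titleWeight is an assumed multiplier for title hits; the real value is not documented.
const titleWeight = 3.0

// count returns how many times term occurs in words.
func count(words []string, term string) int {
	n := 0
	for _, w := range words {
		if w == term {
			n++
		}
	}
	return n
}

// BasicScore sums weighted term frequencies over title and content,
// then normalizes by total word count so long documents do not win on size alone.
func BasicScore(query, title, content string) float64 {
	terms := strings.Fields(strings.ToLower(query))
	titleWords := strings.Fields(strings.ToLower(title))
	contentWords := strings.Fields(strings.ToLower(content))
	total := len(titleWords) + len(contentWords)
	if len(terms) == 0 || total == 0 {
		return 0
	}
	var score float64
	for _, t := range terms {
		score += titleWeight * float64(count(titleWords, t))
		score += float64(count(contentWords, t))
	}
	return score / float64(total)
}

func main() {
	fmt.Println(BasicScore("block", "Add a block", "How to add a block to a page"))
	fmt.Println(BasicScore("block", "Site settings", "Settings include block order"))
}
```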
Uses Manticore Search's built-in full-text capabilities:
- BM25 scoring algorithm
- Advanced query syntax support
- Optimized for large document collections
Semantic search using TF-IDF vectors:
- Custom TF-IDF implementation
- Cosine similarity scoring
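- Ranks documents by overall weighted term overlap rather than exact phrase matches (plain TF-IDF is lexical, so synonyms only help when they co-occur with shared terms)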
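The following is a minimal sketch of TF-IDF vectorization with cosine similarity, not the code in internal/vectorizer. The whitespace tokenizer, the smoothed IDF formula `ln((1+N)/(1+df)) + 1`, and all function names are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Vector is a sparse TF-IDF vector: term -> weight.
type Vector map[string]float64

// tokenize lowercases and splits on whitespace (an assumed, simplistic tokenizer).
func tokenize(s string) []string { return strings.Fields(strings.ToLower(s)) }

// BuildIDF computes a smoothed inverse document frequency for every term in docs.
func BuildIDF(docs []string) map[string]float64 {
	df := map[string]int{}
	for _, d := range docs {
		seen := map[string]bool{}
		for _, t := range tokenize(d) {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	n := float64(len(docs))
	idf := make(map[string]float64, len(df))
	for t, c := range df {
		idf[t] = math.Log((1+n)/(1+float64(c))) + 1
	}
	return idf
}

// Vectorize returns (term count / text length) * IDF per term.
// Terms missing from the IDF table are dropped.
func Vectorize(text string, idf map[string]float64) Vector {
	toks := tokenize(text)
	v := Vector{}
	for _, t := range toks {
		if w, ok := idf[t]; ok {
			v[t] += w / float64(len(toks))
		}
	}
	return v
}

// Cosine returns the cosine similarity of two sparse vectors, or 0 if either is empty.
func Cosine(a, b Vector) float64 {
	var dot, na, nb float64
	for t, x := range a {
		dot += x * b[t]
		na += x * x
	}
	for _, y := range b {
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	docs := []string{"add a block", "remove a page", "site settings"}
	idf := BuildIDF(docs)
	q := Vectorize("add a page", idf)
	for _, d := range docs {
		fmt.Printf("%-15s %.3f\n", d, Cosine(q, Vectorize(d, idf)))
	}
}
```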
Combines full-text and vector search:
- Weighted combination (70% full-text, 30% vector)
- Re-ranking of combined results
- Best of both approaches
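One way to realize the 70/30 weighting is sketched below. Because BM25 scores are unbounded while cosine similarity is at most 1, the sketch first divides each score set by its maximum; that normalization step, like the `Hybrid` and `Scored` names, is an assumption rather than a documented detail of this project.

```go
package main

import (
	"fmt"
	"sort"
)

// Scored pairs a document ID with its combined score.
type Scored struct {
	ID    string
	Score float64
}

// Weights taken from the README: 70% full-text, 30% vector.
const (
	fullTextWeight = 0.7
	vectorWeight   = 0.3
)

// normalize scales scores into [0, 1] by dividing by the largest value (assumed step).
func normalize(scores map[string]float64) map[string]float64 {
	top := 0.0
	for _, s := range scores {
		if s > top {
			top = s
		}
	}
	out := make(map[string]float64, len(scores))
	for id, s := range scores {
		if top > 0 {
			out[id] = s / top
		} else {
			out[id] = 0
		}
	}
	return out
}

// Hybrid merges both result sets and re-ranks by the weighted sum.
// A document found by only one method keeps just that method's share.
func Hybrid(fullText, vector map[string]float64) []Scored {
	combined := map[string]float64{}
	for id, s := range normalize(fullText) {
		combined[id] += fullTextWeight * s
	}
	for id, s := range normalize(vector) {
		combined[id] += vectorWeight * s
	}
	out := make([]Scored, 0, len(combined))
	for id, s := range combined {
		out = append(out, Scored{ID: id, Score: s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].ID < out[j].ID // stable order for ties
	})
	return out
}

func main() {
	bm25 := map[string]float64{"doc-a": 10, "doc-b": 5}
	cosine := map[string]float64{"doc-b": 0.9, "doc-c": 0.45}
	for _, r := range Hybrid(bm25, cosine) {
		fmt.Printf("%s %.2f\n", r.ID, r.Score)
	}
}
```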
- MANTICORE_HOST: Manticore Search host (default: localhost:9308)
- DATA_DIR: Directory containing markdown files (default: ./data)
- PORT: HTTP server port (default: 8080)
- MANTICORE_HTTP_TIMEOUT: HTTP request timeout (default: 60s)
- MANTICORE_HTTP_MAX_IDLE_CONNS: Maximum idle connections (default: 20)
- MANTICORE_HTTP_MAX_IDLE_CONNS_PER_HOST: Maximum idle connections per host (default: 10)
- MANTICORE_HTTP_IDLE_CONN_TIMEOUT: Idle connection timeout (default: 90s)
- MANTICORE_HTTP_RETRY_MAX_ATTEMPTS: Maximum retry attempts (default: 5)
- MANTICORE_HTTP_RETRY_BASE_DELAY: Base retry delay (default: 500ms)
- MANTICORE_HTTP_RETRY_MAX_DELAY: Maximum retry delay (default: 30s)
- MANTICORE_HTTP_RETRY_JITTER_PERCENT: Retry jitter percentage (default: 0.1)
- MANTICORE_HTTP_CB_FAILURE_THRESHOLD: Circuit breaker failure threshold (default: 5)
- MANTICORE_HTTP_CB_RECOVERY_TIMEOUT: Circuit breaker recovery timeout (default: 30s)
- MANTICORE_HTTP_CB_HALF_OPEN_MAX_CALLS: Half-open state max calls (default: 3)
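A sketch of how a subset of these variables might be read with their documented defaults follows. The `Config` struct and `LoadConfig` function are hypothetical, and parsing values such as 60s or 500ms with Go's `time.ParseDuration` is an assumption based on their format. Invalid or empty values fall back to the default.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Config mirrors a subset of the documented variables (hypothetical struct).
type Config struct {
	ManticoreHost    string
	DataDir          string
	Port             string
	HTTPTimeout      time.Duration
	RetryMaxAttempts int
	RetryBaseDelay   time.Duration
	RetryMaxDelay    time.Duration
	RetryJitter      float64
}

// LoadConfig reads settings through getenv (normally os.Getenv) and
// falls back to the README defaults when a value is empty or unparsable.
func LoadConfig(getenv func(string) string) Config {
	str := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	dur := func(key string, def time.Duration) time.Duration {
		if d, err := time.ParseDuration(getenv(key)); err == nil {
			return d
		}
		return def
	}
	num := func(key string, def int) int {
		if n, err := strconv.Atoi(getenv(key)); err == nil {
			return n
		}
		return def
	}
	flt := func(key string, def float64) float64 {
		if f, err := strconv.ParseFloat(getenv(key), 64); err == nil {
			return f
		}
		return def
	}
	return Config{
		ManticoreHost:    str("MANTICORE_HOST", "localhost:9308"),
		DataDir:          str("DATA_DIR", "./data"),
		Port:             str("PORT", "8080"),
		HTTPTimeout:      dur("MANTICORE_HTTP_TIMEOUT", 60*time.Second),
		RetryMaxAttempts: num("MANTICORE_HTTP_RETRY_MAX_ATTEMPTS", 5),
		RetryBaseDelay:   dur("MANTICORE_HTTP_RETRY_BASE_DELAY", 500*time.Millisecond),
		RetryMaxDelay:    dur("MANTICORE_HTTP_RETRY_MAX_DELAY", 30*time.Second),
		RetryJitter:      flt("MANTICORE_HTTP_RETRY_JITTER_PERCENT", 0.1),
	}
}

func main() {
	cfg := LoadConfig(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```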
Documents should be markdown files with this structure:
# Document Title
**URL:** https://example.com/document-url
Document content goes here...
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
docker build -t manticore-search-tester .
For improved Auto Embeddings performance, you can enable GPU acceleration:
Prerequisites:
- NVIDIA GPU with CUDA support
- NVIDIA Docker runtime installed (nvidia-container-toolkit or nvidia-docker2)
Automatic GPU Setup:
# Auto-detect and enable GPU if available
./scripts/gpu-setup.sh
# Force GPU mode
./scripts/gpu-setup.sh --force-gpu
# Check GPU requirements
./scripts/gpu-setup.sh --check
Manual GPU Setup:
# With GPU acceleration
docker-compose -f docker-compose.yml -f docker-compose.gpu.yml up
# Without GPU (default)
docker-compose up
Performance Benefits:
- 2-10x faster vector generation for Auto Embeddings
- Reduced CPU load during document indexing
- Better performance for large document sets (500+ documents)
Install Air for hot reload during development:
make install-tools # Installs air and other dev tools
make dev # Start with hot reload
- Place markdown files in the ./data directory
- Call the reindex API:
curl -X POST "http://localhost:8080/api/reindex"
The modular architecture makes it easy to extend:
- Add new search modes: Extend internal/search/engine.go
- Modify scoring: Update scoring algorithms in search implementations
- Add new vectorizers: Implement new vectorization methods in internal/vectorizer/
- Add new document formats: Extend internal/document/parser.go
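To make the last extension point more concrete, here is a minimal sketch of a parser for the markdown layout this README describes: a `# ` title line, a `**URL:**` line, then free-form content. It does not reproduce internal/document/parser.go; the `Document` struct and `ParseMarkdown` function are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Document holds the fields extracted from one markdown file (hypothetical type).
type Document struct {
	Title   string
	URL     string
	Content string
}

// ParseMarkdown takes the first "# " line as the title, the first
// "**URL:**" line as the URL, and everything else as content.
func ParseMarkdown(src string) Document {
	var doc Document
	var body []string
	for _, line := range strings.Split(src, "\n") {
		trimmed := strings.TrimSpace(line)
		switch {
		case doc.Title == "" && strings.HasPrefix(trimmed, "# "):
			doc.Title = strings.TrimSpace(strings.TrimPrefix(trimmed, "# "))
		case doc.URL == "" && strings.HasPrefix(trimmed, "**URL:**"):
			doc.URL = strings.TrimSpace(strings.TrimPrefix(trimmed, "**URL:**"))
		default:
			body = append(body, line)
		}
	}
	doc.Content = strings.TrimSpace(strings.Join(body, "\n"))
	return doc
}

func main() {
	src := "# Document Title\n**URL:** https://example.com/document-url\n\nDocument content goes here..."
	fmt.Printf("%+v\n", ParseMarkdown(src))
}
```

A new format would follow the same shape: a function from raw bytes or text to the shared document type.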
This application uses a custom HTTP JSON API client implementation for Manticore Search. This approach provides several benefits over using third-party libraries:
- Better Control: Direct control over HTTP requests, timeouts, and error handling
- Reduced Dependencies: No longer depends on infrequently updated third-party libraries
- Simplified Architecture: Removed factory pattern complexity, direct client creation
- Enhanced Resilience: Built-in circuit breaker and retry mechanisms
- Improved Performance: Optimized connection pooling and bulk operations
- Better Debugging: Comprehensive logging of all HTTP operations
- Circuit Breaker Pattern: Automatic failure detection and recovery
- Exponential Backoff: Smart retry logic with jitter for network resilience
- Connection Pooling: Efficient HTTP connection management
- Bulk Operations: Optimized batch processing using NDJSON format
- Comprehensive Logging: Detailed request/response logging for troubleshooting
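As an illustration of the NDJSON bulk format mentioned above, the sketch below builds a request body with one JSON object per line. The `insert`/`index`/`id`/`doc` layout reflects my understanding of Manticore's `/bulk` endpoint; the `title` and `content` field names and the `Doc` type are assumptions, and the code only builds the payload without sending it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Doc is a hypothetical document shape; the field names are assumptions.
type Doc struct {
	ID      int64
	Title   string
	Content string
}

// BuildBulkBody encodes one insert command per line (NDJSON).
// json.Encoder.Encode writes a trailing newline after each value,
// which is exactly the line separator NDJSON needs.
func BuildBulkBody(index string, docs []Doc) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, d := range docs {
		line := map[string]any{
			"insert": map[string]any{
				"index": index,
				"id":    d.ID,
				"doc":   map[string]any{"title": d.Title, "content": d.Content},
			},
		}
		if err := enc.Encode(line); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	body, err := BuildBulkBody("documents", []Doc{
		{ID: 1, Title: "First", Content: "Hello"},
		{ID: 2, Title: "Second", Content: "World"},
	})
	if err != nil {
		panic(err)
	}
	// The body would be POSTed to /bulk with Content-Type: application/x-ndjson.
	fmt.Print(string(body))
}
```

Sending many documents in one request avoids a round trip per document, which is where most of the indexing time goes on small documents.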
The HTTP client provides robust and efficient operations:
- All search modes work reliably with comprehensive error handling
- Optimized document indexing with intelligent bulk operations
- Consistent API responses with detailed logging
- Built-in resilience patterns (circuit breaker, exponential backoff retry)
- Connection pooling and HTTP keep-alive for optimal performance
- cmd/server: Application entry point and initialization
- internal/handlers: HTTP request handlers and routing
- internal/search: Search engine implementations
- internal/document: Document parsing and processing
- internal/manticore: Manticore Search client and operations
- internal/vectorizer: TF-IDF vectorization implementation
- internal/models: Shared data models and types
- pkg/api: Public API response types
HTTP Request → Handlers → Search Engine → Manticore/Vector Search → JSON Response
↓
Markdown Files → Document Parser → TF-IDF Vectorizer → Manticore Indexing
- Indexing: Batch operations for better performance
- Memory: TF-IDF vectors are kept in memory for fast access
- Caching: Connection pooling and prepared statements
- Scaling: Manticore Search handles large document collections efficiently
- API: CORS support for web applications
- Connection refused: Ensure Manticore Search is running (make docker-up)
- No documents found: Check the ./data directory exists and contains .md files
- Build errors: Run go mod download && go mod tidy to install dependencies
- Port conflicts: Change the PORT environment variable
Enable verbose logging:
export MANTICORE_DEBUG=1
export SEARCH_DEBUG=1
Check service status:
curl "http://localhost:8080/api/status"
If you're experiencing connection problems with Manticore Search:
- Check Manticore is running on the correct port:
  docker-compose ps
  curl -X GET "http://localhost:9308/"
- Verify HTTP JSON API is enabled: The application uses Manticore's HTTP JSON API on port 9308 (not the MySQL protocol on port 9306).
- Test API endpoints manually:
  # Health check
  curl -X GET "http://localhost:9308/"
  # Test search endpoint
  curl -X POST "http://localhost:9308/search" \
    -H "Content-Type: application/json" \
    -d '{"index": "documents", "query": {"match_all": {}}}'
If the circuit breaker is frequently opening:
- Check failure threshold: Raise MANTICORE_HTTP_CB_FAILURE_THRESHOLD if the breaker trips on transient errors (a lower value makes it open sooner)
- Increase recovery timeout: Set MANTICORE_HTTP_CB_RECOVERY_TIMEOUT to a higher value
- Monitor logs: Look for repeated connection failures
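To show how these three settings interact, here is a simplified circuit breaker sketch. It is not the client's implementation: the `Breaker` type is hypothetical, and closing the circuit after a single successful half-open probe is a simplifying assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

// ErrOpen is returned when the breaker rejects a call without running it.
var ErrOpen = errors.New("circuit breaker is open")

// errDown simulates a failing backend in the demo below.
var errDown = errors.New("backend down")

// Breaker is a hypothetical, simplified circuit breaker.
type Breaker struct {
	mu               sync.Mutex
	state            state
	failures         int
	halfOpenCalls    int
	openedAt         time.Time
	failureThreshold int
	recoveryTimeout  time.Duration
	halfOpenMaxCalls int
}

func NewBreaker(threshold int, recovery time.Duration, halfOpenMax int) *Breaker {
	return &Breaker{failureThreshold: threshold, recoveryTimeout: recovery, halfOpenMaxCalls: halfOpenMax}
}

// Do runs fn unless the breaker is open. After recoveryTimeout the breaker
// lets up to halfOpenMaxCalls probes through; a success closes it again.
func (b *Breaker) Do(fn func() error) error {
	b.mu.Lock()
	if b.state == open {
		if time.Since(b.openedAt) < b.recoveryTimeout {
			b.mu.Unlock()
			return ErrOpen
		}
		b.state, b.halfOpenCalls = halfOpen, 0
	}
	if b.state == halfOpen {
		if b.halfOpenCalls >= b.halfOpenMaxCalls {
			b.mu.Unlock()
			return ErrOpen
		}
		b.halfOpenCalls++
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		// A failed probe, or too many failures while closed, opens the circuit.
		if b.state == halfOpen || b.failures >= b.failureThreshold {
			b.state, b.openedAt = open, time.Now()
		}
		return err
	}
	b.state, b.failures = closed, 0
	return nil
}

func main() {
	b := NewBreaker(5, 30*time.Second, 3) // README defaults
	for i := 1; i <= 6; i++ {
		fmt.Println(i, b.Do(func() error { return errDown }))
	}
}
```

With a threshold of 5, the first five failing calls reach the backend and the sixth is rejected immediately, which is why raising the threshold makes the breaker more tolerant of brief outages.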
If requests are timing out or failing:
- Increase timeout: Set MANTICORE_HTTP_TIMEOUT to a higher value
- Adjust retry attempts: Increase MANTICORE_HTTP_RETRY_MAX_ATTEMPTS
- Modify retry delays: Adjust MANTICORE_HTTP_RETRY_BASE_DELAY and MANTICORE_HTTP_RETRY_MAX_DELAY
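The retry settings can be read as inputs to an exponential backoff schedule. The sketch below doubles the base delay per attempt, caps it at the maximum, and then shifts it by a random fraction within plus or minus the jitter value. `BackoffDelay` is a hypothetical function, and applying jitter after the cap is an assumption about the client's behavior.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// BackoffDelay returns the wait before retry number attempt (0-based).
// rnd must return a value in [0, 1); pass rand.Float64 in real use.
// Jitter is applied after capping, so the result may exceed maxDelay
// by at most jitter*maxDelay (an assumed design choice).
func BackoffDelay(attempt int, base, maxDelay time.Duration, jitter float64, rnd func() float64) time.Duration {
	d := base
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	// Map rnd's [0, 1) onto [-jitter, +jitter) and scale by the delay.
	offset := (rnd()*2 - 1) * jitter * float64(d)
	return d + time.Duration(offset)
}

func main() {
	// README defaults: base 500ms, max 30s, jitter 0.1, up to 5 attempts.
	for attempt := 0; attempt < 5; attempt++ {
		d := BackoffDelay(attempt, 500*time.Millisecond, 30*time.Second, 0.1, rand.Float64)
		fmt.Printf("attempt %d: wait about %v\n", attempt, d)
	}
}
```

Jitter spreads retries from many clients over time so they do not all hit a recovering server at the same instant.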
For better performance:
- Connection pooling: Increase MANTICORE_HTTP_MAX_IDLE_CONNS and MANTICORE_HTTP_MAX_IDLE_CONNS_PER_HOST
- Keep-alive: Increase MANTICORE_HTTP_IDLE_CONN_TIMEOUT
- Bulk operations: The client automatically uses bulk operations for better throughput
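For orientation, the pooling and keep-alive variables correspond naturally to fields on Go's standard `http.Transport`. The sketch below shows one plausible mapping; `NewHTTPClient` is a hypothetical constructor, and whether the project wires the variables exactly this way is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// NewHTTPClient builds a client whose connection pool follows the given limits.
// It clones http.DefaultTransport to keep the standard proxy and dial settings.
func NewHTTPClient(timeout time.Duration, maxIdle, maxIdlePerHost int, idleTimeout time.Duration) *http.Client {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	tr.MaxIdleConns = maxIdle               // MANTICORE_HTTP_MAX_IDLE_CONNS
	tr.MaxIdleConnsPerHost = maxIdlePerHost // MANTICORE_HTTP_MAX_IDLE_CONNS_PER_HOST
	tr.IdleConnTimeout = idleTimeout        // MANTICORE_HTTP_IDLE_CONN_TIMEOUT
	return &http.Client{
		Timeout:   timeout, // MANTICORE_HTTP_TIMEOUT
		Transport: tr,
	}
}

func main() {
	// README defaults: 60s timeout, 20 idle conns, 10 per host, 90s idle timeout.
	c := NewHTTPClient(60*time.Second, 20, 10, 90*time.Second)
	tr := c.Transport.(*http.Transport)
	fmt.Println(c.Timeout, tr.MaxIdleConns, tr.MaxIdleConnsPerHost, tr.IdleConnTimeout)
}
```

Since the application talks to a single Manticore host, the per-host limit is usually the one that matters; Go's default for it is 2, which is low for concurrent indexing.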
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: make test
- Format code: make fmt
- Submit a pull request
This project is provided as-is for educational and testing purposes.