A distributed search engine with etcd-based consensus, object store persistence, and a RESTful API.
- Full-text search using Tantivy
- Distributed consensus using etcd for leader election
- Object store persistence (local filesystem, with AWS S3 support)
- RESTful API with basic health checks and cluster status
- Configurable index buffer size and search result limits
- Leader election with automatic failover
- Rust 1.75 or later
- etcd (single node or cluster)
- Storage (local filesystem or S3)
- Clone the repository:
  git clone https://github.com/yourusername/eureka.git
  cd eureka
- Build the project:
  cargo build --release
Create a config.json file with the following structure:
{
  "index_buffer_size": 50000000,
  "storage_path": "object_store",
  "index_prefix": "index_data",
  "max_search_results": 100,
  "etcd_endpoints": ["http://localhost:2379"],
  "heartbeat_interval_secs": 5,
  "lock_ttl_secs": 30,
  "retry_interval_secs": 1
}
Option | Description | Default |
---|---|---|
index_buffer_size | Size of the index buffer in bytes | 50000000 (50 MB) |
storage_path | Path to the object store | "object_store" |
index_prefix | Prefix for index files in the object store | "index_data" |
max_search_results | Maximum number of results to return | 100 |
etcd_endpoints | List of etcd endpoints | ["http://localhost:2379"] |
heartbeat_interval_secs | Interval between heartbeats, in seconds | 5 |
lock_ttl_secs | Time-to-live for the leader lock, in seconds | 30 |
retry_interval_secs | Interval between leader election retries, in seconds | 1 |
auth_enabled | Enable API authentication | false |
auth_token | API authentication token | null |
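
As a rough illustration of how these options and defaults fit together, the sketch below mirrors the table in a plain Rust struct. The `Config` type, its field types, and the `validate` checks are assumptions made for this example and are not taken from the project source. In particular, requiring the heartbeat interval to be shorter than the lock TTL is a common convention for lease-based leader election, not a documented rule of this project.

```rust
/// Hypothetical in-memory mirror of config.json; field names follow the options table.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub index_buffer_size: usize,
    pub storage_path: String,
    pub index_prefix: String,
    pub max_search_results: usize,
    pub etcd_endpoints: Vec<String>,
    pub heartbeat_interval_secs: u64,
    pub lock_ttl_secs: u64,
    pub retry_interval_secs: u64,
    pub auth_enabled: bool,
    pub auth_token: Option<String>,
}

impl Default for Config {
    /// Defaults copied from the "Default" column of the options table.
    fn default() -> Self {
        Config {
            index_buffer_size: 50_000_000,
            storage_path: "object_store".to_string(),
            index_prefix: "index_data".to_string(),
            max_search_results: 100,
            etcd_endpoints: vec!["http://localhost:2379".to_string()],
            heartbeat_interval_secs: 5,
            lock_ttl_secs: 30,
            retry_interval_secs: 1,
            auth_enabled: false,
            auth_token: None,
        }
    }
}

impl Config {
    /// Assumed sanity checks (not from the project): at least one etcd endpoint,
    /// a heartbeat that fires before the leader lock expires, and a token
    /// whenever authentication is switched on.
    pub fn validate(&self) -> Result<(), String> {
        if self.etcd_endpoints.is_empty() {
            return Err("etcd_endpoints must not be empty".to_string());
        }
        if self.heartbeat_interval_secs == 0
            || self.heartbeat_interval_secs >= self.lock_ttl_secs
        {
            return Err("heartbeat_interval_secs must be > 0 and < lock_ttl_secs".to_string());
        }
        if self.auth_enabled && self.auth_token.is_none() {
            return Err("auth_token is required when auth_enabled is true".to_string());
        }
        Ok(())
    }
}
```

With the defaults, `Config::default().validate()` returns `Ok(())`, while setting `heartbeat_interval_secs` to 30 (equal to the lock TTL) is rejected by this sketch.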
cargo run --release -- --config-path config.json serve --port 3000
The server will:
- Connect to the etcd cluster
- Participate in leader election
- Load the index from the object store
- Start the REST API server
cargo run --release -- --config-path config.json index
cargo run --release -- --config-path config.json search "your query"
Endpoint | Method | Description |
---|---|---|
/search?q=query&limit=10 | GET | Search for documents |
/index | POST | Index new documents |
/health | GET | Health check endpoint |
/cluster/status | GET | Get cluster status information |
# Search
curl "http://localhost:3000/search?q=your+query&limit=10"
# Index documents
curl -X POST http://localhost:3000/index \
-H "Content-Type: application/json" \
-d '{
"documents": [
{
"title": "Example Document",
"body": "This is an example document to index."
}
]
}'
# Health check
curl http://localhost:3000/health
# Cluster status
curl http://localhost:3000/cluster/status
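
For clients written in Rust, the search URL used in the curl example can be assembled with the standard library alone. The helper names `encode_query` and `search_url` below are hypothetical, and the encoding is a minimal form-style sketch (spaces become `+`, other reserved bytes become `%XX`) rather than a full URL library.

```rust
/// Form-encode a query value: unreserved ASCII is kept, a space becomes '+',
/// and every other byte is written as %XX (uppercase hex).
pub fn encode_query(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            b' ' => out.push('+'),
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Build the /search URL for a server base address, query string, and result limit.
pub fn search_url(base: &str, query: &str, limit: usize) -> String {
    format!(
        "{}/search?q={}&limit={}",
        base.trim_end_matches('/'),
        encode_query(query),
        limit
    )
}
```

Calling `search_url("http://localhost:3000", "your query", 10)` yields the same URL as the curl example, `http://localhost:3000/search?q=your+query&limit=10`.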
For a single-node etcd instance (development):
etcd --listen-client-urls http://localhost:2379 \
--advertise-client-urls http://localhost:2379
For production, a multi-node etcd cluster is recommended.
- Deploy multiple instances of the search engine pointing to the same etcd cluster
- Use a simple load balancer (like Nginx or HAProxy) to distribute requests
- Only the leader node will handle write operations (indexing)
- All nodes can handle read operations (searching)
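
The read/write split described above can be pictured as a small routing rule. The `Role` and `Operation` types and the `classify` and `can_serve` functions are illustrative names invented for this sketch. How the project actually rejects or forwards writes on non-leader nodes is not stated here, so that behavior is left out.

```rust
/// Role a node holds after leader election (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Leader,
    Follower,
}

/// Kind of request, following the API endpoints table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Search,
    Index,
}

/// Map an HTTP method and path to an operation; other routes return None.
pub fn classify(method: &str, path: &str) -> Option<Operation> {
    match (method, path) {
        ("GET", "/search") => Some(Operation::Search),
        ("POST", "/index") => Some(Operation::Index),
        _ => None,
    }
}

/// Reads are served by every node; writes only by the current leader.
pub fn can_serve(role: Role, op: Operation) -> bool {
    match op {
        Operation::Search => true,
        Operation::Index => role == Role::Leader,
    }
}
```

A load balancer or request handler built on this rule would send `POST /index` only to the node whose role is `Leader`, while spreading `GET /search` across all nodes.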
The following features are planned for future releases:
- TLS support for secure etcd communication
- Full AWS S3, Google Cloud Storage, and Azure Blob Storage support
- Scatter-gather search across cluster nodes
- Metrics and monitoring integration
- Index optimization commands
- Enhanced security features
- Comprehensive backup and recovery procedures
# Debug build
cargo build
# Release build
cargo build --release
# Run all tests
cargo test
# Run specific tests
cargo test search
This project is open source and available under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.