A high-performance Go application for migrating large SQLite databases into multiple shards based on mobile phone numbers. This tool is designed to handle billions of records efficiently using parallel processing and optimized SQLite operations.
- Parallel Processing: Utilizes multiple readers to extract data from the source database
- Automatic Sharding: Distributes data across multiple SQLite databases based on SHA256 hashing
- Progress Tracking: Saves migration progress to allow resuming interrupted processes
- Memory Management: Triggers garbage collection on a configurable interval (see GARB_COL_TICKER_MINUTES) to keep memory usage in check during long runs
- Structured Logging: Comprehensive JSON logging with performance metrics
- Index Management: Automatically creates necessary indexes on shard databases
- Reserved Shard: Special shard for handling null or empty values
- Go 1.23+ (tested with Go 1.24.1)
- SQLite3
- Sufficient disk space for both source database and shards
git clone https://github.com/sarff/shard_migrate.git
cd shard_migrate
go mod download

The application uses environment variables for configuration. Copy the `.env.example` file to `.env` and adjust the values:

cp .env.example .env

Configuration options:

| Variable | Description | Default Value |
|---|---|---|
| SOURCE_DB | Path to source SQLite database | clients.db |
| TABLE_NAME | Name of the table to migrate | clients |
| SHARD_DIR | Directory to store shard databases | /shards |
| LOG_DIR | Directory for log files | ./tmp |
| NUM_SHARDS | Number of shards to create | 10 |
| RESERVED_SHARD | Index of reserved shard for null values | 10 |
| BATCH_SIZE | Number of records per batch | 6000 |
| READERS | Number of parallel readers | 6 |
| TOTAL_ROWS | Total number of rows to migrate | 1217065012 |
| GARB_COL_TICKER_MINUTES | Garbage collection interval (minutes) | 5 |
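
As an illustration of how the variables in the table could be read with their documented defaults, here is a minimal sketch. The `Config` struct and the `envString`, `envInt`, and `LoadConfig` helpers are hypothetical names, not the project's actual `internal/config` code, and the sketch assumes the values from `.env` have already been exported into the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors a subset of the variables in the table above.
// The struct and its field names are hypothetical.
type Config struct {
	SourceDB      string
	TableName     string
	ShardDir      string
	NumShards     int
	ReservedShard int
	BatchSize     int
	Readers       int
}

// envString returns the value of key, or def when it is unset or empty.
func envString(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// envInt parses key as an integer and falls back to def when the
// variable is unset, empty, or not a valid number.
func envInt(key string, def int) int {
	n, err := strconv.Atoi(os.Getenv(key))
	if err != nil {
		return def
	}
	return n
}

// LoadConfig reads the environment using the defaults documented above.
func LoadConfig() Config {
	return Config{
		SourceDB:      envString("SOURCE_DB", "clients.db"),
		TableName:     envString("TABLE_NAME", "clients"),
		ShardDir:      envString("SHARD_DIR", "/shards"),
		NumShards:     envInt("NUM_SHARDS", 10),
		ReservedShard: envInt("RESERVED_SHARD", 10),
		BatchSize:     envInt("BATCH_SIZE", 6000),
		Readers:       envInt("READERS", 6),
	}
}

func main() {
	cfg := LoadConfig()
	fmt.Printf("%+v\n", cfg)
}
```

Falling back silently on a malformed number keeps the sketch short; a real loader may prefer to report the bad value and exit.
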
Run the migration:
go run cmd/main.go

The application will:
- Load configuration from environment variables
- Create shard databases if they don't exist
- Set up indexes on INN, MOBILE_NUMBER, and SNILS columns
- Start parallel readers and workers
- Begin migrating data with progress logging
.
├── cmd/
│   └── main.go                 # Main application entry point
├── internal/
│   ├── config/
│   │   └── config.go           # Configuration management
│   ├── database/
│   │   └── database.go         # Database operations
│   ├── progress/
│   │   └── progress.go         # Progress tracking
│   └── shard/
│       ├── shard.go            # Sharding logic
│       └── shard_test.go       # Sharding tests
├── tmp/
│   └── logs/                   # Log files directory
├── .env.example                # Example configuration
├── go.mod                      # Go module file
└── go.sum                      # Go dependencies
- Data Extraction: Multiple readers extract data from the source database in parallel
- Sharding Logic: Each record is assigned to a shard based on:
- SHA256 hash of the MOBILE_NUMBER field
- Modulo operation to determine shard index
- Records with null/empty values go to the reserved shard
- Data Loading: Workers write batches of data to their respective shard databases
- Progress Saving: Periodically saves progress to resume from last checkpoint if interrupted
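
The shard assignment described above can be sketched roughly as follows. The README only states that a SHA256 hash of MOBILE_NUMBER is reduced with a modulo; how the digest is turned into an integer is not specified, so reading the first 8 bytes as a big-endian number is an assumption here, as is the `ShardIndex` name. The actual logic lives in `internal/shard/shard.go` and may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// ShardIndex returns the shard for a mobile number.
// Empty values (a NULL column is assumed to be scanned as "") go to
// reservedShard; everything else is hashed and reduced modulo numShards.
func ShardIndex(mobile string, numShards, reservedShard int) int {
	if mobile == "" || numShards <= 0 {
		return reservedShard
	}
	sum := sha256.Sum256([]byte(mobile))
	// Assumption: use the first 8 bytes of the digest as a big-endian uint64.
	v := binary.BigEndian.Uint64(sum[:8])
	return int(v % uint64(numShards))
}

func main() {
	// Example inputs are made up for illustration.
	for _, m := range []string{"+79001234567", "+79007654321", ""} {
		fmt.Printf("%q -> shard %d\n", m, ShardIndex(m, 10, 10))
	}
}
```

With the default configuration (NUM_SHARDS=10, RESERVED_SHARD=10), regular records land in shards 0 through 9 and the reserved shard sits just past that range, so it never collides with a hashed index.
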
- WAL Mode: Write-Ahead Logging for better concurrency
- MMAP: Memory-mapped I/O for faster reads (1GB)
- Batch Processing: Reduces transaction overhead
- Memory Management: Periodic garbage collection to limit memory growth during long runs
- Concurrent Processing: Multiple readers and workers for parallel execution
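
To make the WAL and MMAP settings above concrete, here is a small sketch of the corresponding SQLite pragmas. The helper names `shardPragmas` and `applyPragmas` are hypothetical, and the project may set these options differently (for example through driver-specific connection string parameters). The code uses only `database/sql`; opening a real database additionally requires importing a driver such as `modernc.org/sqlite`.

```go
package main

import (
	"database/sql"
	"fmt"
)

// mmapBytes is the 1GB memory-map size mentioned above (taken here as 1 GiB).
const mmapBytes = 1 << 30

// shardPragmas lists the statements for WAL journaling and memory-mapped I/O.
func shardPragmas() []string {
	return []string{
		"PRAGMA journal_mode = WAL",
		fmt.Sprintf("PRAGMA mmap_size = %d", mmapBytes),
	}
}

// applyPragmas runs each pragma on an already opened database handle.
func applyPragmas(db *sql.DB) error {
	for _, stmt := range shardPragmas() {
		if _, err := db.Exec(stmt); err != nil {
			return fmt.Errorf("%s: %w", stmt, err)
		}
	}
	return nil
}

func main() {
	// Print the statements; no database is opened in this sketch.
	for _, stmt := range shardPragmas() {
		fmt.Println(stmt + ";")
	}
}
```

Note that `journal_mode = WAL` is stored in the database file, while `mmap_size` applies per connection, so with a `database/sql` connection pool it may need to be set on every connection rather than once.
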
The application provides:
- Real-time progress updates with rows/second metrics
- Memory usage statistics
- Per-shard performance logging
- Structured JSON logs for easy parsing
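
A progress line carrying the metrics listed above could be produced along these lines. This is a sketch using the standard `log/slog` and `runtime` packages; the field names and the `logProgress` and `rowsPerSecond` helpers are assumptions, and the project's real log format may look different.

```go
package main

import (
	"log/slog"
	"os"
	"runtime"
	"time"
)

// rowsPerSecond returns average throughput, or 0 when elapsed is not positive.
func rowsPerSecond(rows int64, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0
	}
	return float64(rows) / elapsed.Seconds()
}

// logProgress emits one JSON record with throughput and heap usage.
func logProgress(logger *slog.Logger, shard int, rows int64, elapsed time.Duration) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	logger.Info("progress",
		slog.Int("shard", shard),
		slog.Int64("rows", rows),
		slog.Float64("rows_per_sec", rowsPerSecond(rows, elapsed)),
		slog.Uint64("heap_alloc_bytes", m.HeapAlloc),
	)
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	// Illustrative numbers only.
	logProgress(logger, 3, 12000, 2*time.Second)
}
```
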
- Graceful shutdown on interruption
- Automatic progress recovery on restart
- Detailed error logging with context
- Transaction rollback on failures
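
One way to support recovery on restart is to persist a small checkpoint file and read it back at startup. The sketch below is an assumption about how that might look, not the project's `internal/progress` implementation: the `Progress` fields, the JSON layout, and the file name are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Progress is a hypothetical checkpoint format.
type Progress struct {
	LastRowID    int64 `json:"last_row_id"`
	RowsMigrated int64 `json:"rows_migrated"`
}

// SaveProgress writes the checkpoint to a temporary file and renames it
// into place, so an interruption does not leave a half-written file.
func SaveProgress(path string, p Progress) error {
	data, err := json.Marshal(p)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// LoadProgress returns the saved checkpoint, or a zero Progress when no
// checkpoint file exists yet (a fresh migration).
func LoadProgress(path string) (Progress, error) {
	var p Progress
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return p, nil
	}
	if err != nil {
		return p, err
	}
	err = json.Unmarshal(data, &p)
	return p, err
}

func main() {
	path := filepath.Join(os.TempDir(), "shard_migrate_progress.json")
	if err := SaveProgress(path, Progress{LastRowID: 6000, RowsMigrated: 6000}); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	p, err := LoadProgress(path)
	fmt.Println(p, err)
}
```

Saving the checkpoint only after a batch has been committed keeps the file from pointing past data that was rolled back.
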
- Designed specifically for SQLite databases
- Requires sufficient memory for parallel operations
- Sharding is based on MOBILE_NUMBER field (must exist in source table)
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is open source and available under the MIT License.
- Uses the `modernc.org/sqlite` pure Go SQLite driver
- Built with the power of Go's concurrency model