A simple data generator with export capability to various destinations
Requires Go version 1.24+.
- Clone the repository (alternatively, download the latest release).
- Copy `config.sample.yaml` and edit as needed.
- Start the data generator with the config file as the only parameter:

```sh
go run cmd/main.go --config ./config.yaml
```
> **Tip:** Use the `--debug` flag for debug-level logs.
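For example, to run with verbose logging:

```sh
go run cmd/main.go --config ./config.yaml --debug
```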
Given below are the supported configuration options. Check `config.sample.yaml` for reference.
Given below are the supported input configuration properties and their environment variable overrides.

| YAML Property | Environment Variable | Description |
|---|---|---|
| `type` | `ENV_INPUT_TYPE` | Specifies the input data type (e.g., `LOGS`, `METRICS`, `ALB`, `VPC`, `CLOUDTRAIL`, `WAF`). |
| `delay` | `ENV_INPUT_DELAY` | Delay between data points. Accepts values like `5s` (5 seconds) or `10ms` (10 milliseconds). |
| `batching` | `ENV_INPUT_BATCHING` | Time delay between data batches. Accepts a time value like `delay`. Defaults to `0` (no batching). |
| `max_batch_size` | `ENV_INPUT_MAX_BATCH_SIZE` | Maximum byte size of a batch. By default there is no maximum size. |
| `max_data_points` | `ENV_INPUT_MAX_DATA_POINTS` | Maximum number of data points to generate. By default there is no limit. |
| `max_runtime` | `ENV_INPUT_MAX_RUNTIME` | Maximum duration the load generator will run. By default there is no limit. |
**Note** about input types:

- `LOGS`: ECS (Elastic Common Schema) formatted logs based on zap
- `METRICS`: Generate metrics similar to a CloudWatch metrics entry
- `ALB`: Generate AWS ALB formatted logs with some random content
- `VPC`: Generate AWS VPC formatted logs with randomized content
- `CLOUDTRAIL`: Generate AWS CloudTrail formatted logs with randomized content. Data is generated for AWS S3 data events.
- `WAF`: Generate AWS WAF formatted logs with randomized content
Example:

```yaml
input:
  type: LOGS              # Input type LOGS
  delay: 500ms            # 500 milliseconds between each data point
  batching: 10s           # Emit generated data batched within 10 seconds
  max_batch_size: 10000   # Limit maximum batch size to 10,000 bytes; output is capped at ~1,000 bytes/second
  max_data_points: 10000  # Exit input after generating 10,000 data points
```
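The same settings can also be supplied through the environment variable overrides listed in the table above. A minimal sketch, assuming environment values take precedence over the config file:

```sh
# Override input settings via environment variables (names from the table above)
export ENV_INPUT_TYPE=ALB        # switch the input type to ALB logs
export ENV_INPUT_DELAY=10ms      # 10 milliseconds between data points
export ENV_INPUT_MAX_RUNTIME=1m  # stop the generator after one minute
go run cmd/main.go --config ./config.yaml
```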
> **Tip:** When `max_batch_size` is reached, the elapsed batching time is taken into account before new data is generated.
Given below are the supported output types (environment variable `ENV_OUT_TYPE`):

- `FILE`: Output to a file
- `FIREHOSE`: Output to a Firehose stream
- `CLOUDWATCH_LOG`: Output to a CloudWatch log group
- `S3`: Output to an S3 bucket
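The output type can likewise be switched without editing the config file; a sketch, assuming the override behaves like the input variables:

```sh
export ENV_OUT_TYPE=FILE  # write generated data to a file instead of the configured output
go run cmd/main.go --config ./config.yaml
```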
The sections below provide output-specific configurations.
**FILE**

| YAML Property | Environment Variable | Description |
|---|---|---|
| `location` | `ENV_OUT_LOCATION` | Output file location. Defaults to `./out`. When batching, the file suffix increments with numbers (e.g., `out_0`, `out_2`). |
Example:

```yaml
output:
  type: FILE
  config:
    location: "./data"
```
**S3**

| YAML Property | Environment Variable | Description |
|---|---|---|
| `s3_bucket` | `ENV_OUT_S3_BUCKET` | S3 bucket name (required). |
| `compression` | `ENV_OUT_COMPRESSION` | Whether to compress the output. Currently supports `gzip`. |
| `path_prefix` | `ENV_OUT_PATH_PREFIX` | Optional prefix for the bucket entry. Defaults to `logFile-`. |
Example:

```yaml
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: gzip
    path_prefix: "datagen"
```
**FIREHOSE**

| YAML Property | Environment Variable | Description |
|---|---|---|
| `stream_name` | `ENV_OUT_STREAM_NAME` | Firehose stream name (required). |
Example:

```yaml
output:
  type: FIREHOSE
  config:
    stream_name: "my-firehose-stream"
```
**CLOUDWATCH_LOG**

| YAML Property | Environment Variable | Description |
|---|---|---|
| `log_group` | `ENV_OUT_LOG_GROUP` | CloudWatch log group name. |
| `log_stream` | `ENV_OUT_LOG_STREAM` | Log group stream name. |
Example:

```yaml
output:
  type: CLOUDWATCH_LOG
  config:
    log_group: "MyGroup"
    log_stream: "data"
```
Currently, this project only supports the AWS Cloud Service Provider (CSP). Given below are the available configurations:
| YAML Property | Environment Variable | Description |
|---|---|---|
| `region` | `AWS_REGION` | Region used by exporters. Defaults to `us-east-1`. |
| `profile` | `AWS_PROFILE` | Credential profile used by exporters. Defaults to `default`. |
Example:

```yaml
aws:
  region: "us-east-1"
  profile: "default"
```
Generate ECS-formatted logs every 2 seconds, batch them every 10 seconds, and forward them to an S3 bucket.
```yaml
input:
  type: LOGS
  delay: 2s
  batching: 10s
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
```
Generate ALB logs with no delay between data points (continuous data generation). Batching is limited to 10 seconds and the maximum batch size is set to 10 MB, which translates to a data load of ~1 MB/second. S3 files will be in `gzip` format.
```yaml
input:
  type: ALB
  delay: 0s
  batching: 10s
  max_batch_size: 10000000
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```
Generate VPC logs, limited to 2 data points, then upload them to S3 in `gzip` format.
```yaml
input:
  type: VPC
  delay: 1s
  max_data_points: 2
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```
Generate CLOUDTRAIL logs and limit the generator runtime to 5 minutes.
```yaml
input:
  type: CLOUDTRAIL
  delay: 10us   # 10 microseconds between data points
  batching: 10s
  max_runtime: 5m  # 5 minutes
output:
  type: S3
  config:
    s3_bucket: "testing-bucket"
    compression: "gzip"
```