Commit 73ee3ba

pepperkick authored and DougManton committed
[exporter/awss3] Add compression option (27872) (open-telemetry#31622)
**Description:** Add a `compression` option to compress files using the `compress/gzip` library before uploading to S3.

**Link to tracking Issue:** Fixes open-telemetry#27872

**Testing:** Sent n traces through the S3 exporter using k6 to compare sizes. Used Minio as the S3 backend.

| Marshaler | Compression | k6 Requests | k6 Data Sent | S3 Objects | S3 Total Size |
| --- | --- | --- | --- | --- | --- |
| otlp_json | No | 101 | 118 KB | 101 | 36 KB |
| otlp_proto | No | 101 | 118 KB | 101 | 11 KB |
| otlp_json | Yes | 101 | 118 KB | 101 | 21 KB |
| otlp_proto | Yes | 101 | 118 KB | 101 | 9.9 KB |

Additionally, added a new unit test to check the file name.

**Documentation:**

- Updated README.md file
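To put the testing numbers in perspective, the relative size reduction can be worked out from the S3 Total Size column. The following is only a small arithmetic sketch; the helper name `reductionPercent` is made up for illustration and is not part of the exporter.

```go
package main

import "fmt"

// reductionPercent reports how much smaller the compressed size is
// than the original size, as a percentage of the original.
// Hypothetical helper, used only to illustrate the arithmetic.
func reductionPercent(originalKB, compressedKB float64) float64 {
	return (originalKB - compressedKB) / originalKB * 100
}

func main() {
	// Sizes in KB are copied from the testing table in the description.
	fmt.Printf("otlp_json:  %.1f%% smaller with gzip\n", reductionPercent(36, 21))
	fmt.Printf("otlp_proto: %.1f%% smaller with gzip\n", reductionPercent(11, 9.9))
}
```

With these figures, gzip saves roughly 42% for `otlp_json` and roughly 10% for `otlp_proto`, which is consistent with protobuf already being a compact binary encoding.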
1 parent 2b19afd commit 73ee3ba

9 files changed: +185 -30 lines changed

.chloggen/add-compression-option.yaml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# Use this changelog template to create an entry for release notes.
+
+# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
+change_type: enhancement
+
+# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
+component: awss3exporter
+
+# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
+note: "add `compression` option to enable file compression on S3"
+
+# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
+issues: [ 27872 ]
+
+# (Optional) One or more lines of additional information to render under the primary note.
+# These lines will be padded with 2 spaces and then inserted directly into the document.
+# Use pipe (|) for multiline entries.
+subtext: |
+  Add `compression` option to compress files using `compress/gzip` library before uploading to S3.
+# If your change doesn't affect end users or the exported elements of any package,
+# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
+# Optional: The change log or logs in which this entry should be included.
+# e.g. '[user]' or '[user, api]'
+# Include 'user' if the change is relevant to end users.
+# Include 'api' if there is a change to a library API.
+# Default: '[user]'
+change_logs: [ user ]

exporter/awss3exporter/README.md

Lines changed: 17 additions & 12 deletions
@@ -22,18 +22,19 @@ This exporter targets to support proto/json format.

 The following exporter configuration parameters are supported.

-| Name | Description | Default |
-|:----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------|-------------|
-| `region` | AWS region. | "us-east-1" |
-| `s3_bucket` | S3 bucket | |
-| `s3_prefix` | prefix for the S3 key (root directory inside bucket). | |
-| `s3_partition` | time granularity of S3 key: hour or minute | "minute" |
-| `role_arn` | the Role ARN to be assumed | |
-| `file_prefix` | file prefix defined by user | |
-| `marshaler` | marshaler used to produce output data | `otlp_json` |
-| `endpoint` | overrides the endpoint used by the exporter instead of constructing it from `region` and `s3_bucket` | |
-| `s3_force_path_style` | [set this to `true` to force the request to use path-style addressing](http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html) | false |
-| `disable_ssl` | set this to `true` to disable SSL when sending requests | false |
+| Name | Description | Default |
+|:----------------------|:-------------------------------------------------------------------------------------------------------------------------------------------|-------------|
+| `region` | AWS region. | "us-east-1" |
+| `s3_bucket` | S3 bucket | |
+| `s3_prefix` | prefix for the S3 key (root directory inside bucket). | |
+| `s3_partition` | time granularity of S3 key: hour or minute | "minute" |
+| `role_arn` | the Role ARN to be assumed | |
+| `file_prefix` | file prefix defined by user | |
+| `marshaler` | marshaler used to produce output data | `otlp_json` |
+| `endpoint` | overrides the endpoint used by the exporter instead of constructing it from `region` and `s3_bucket` | |
+| `s3_force_path_style` | [set this to `true` to force the request to use path-style addressing](http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html) | false |
+| `disable_ssl` | set this to `true` to disable SSL when sending requests | false |
+| `compression` | should the file be compressed | none |

 ### Marshaler

@@ -46,6 +47,10 @@ Marshaler determines the format of data sent to AWS S3. Currently, the following
 - `body`: export the log body as string.
 **This format is supported only for logs.**

+### Compression
+- `none` (default): No compression will be applied
+- `gzip`: Files will be compressed with gzip. **This does not support `sumo_ic`marshaler.**
+
 # Example Configuration

 Following example configuration defines to store output in 'eu-central' region and bucket named 'databucket'.

exporter/awss3exporter/config.go

Lines changed: 21 additions & 9 deletions
@@ -6,21 +6,23 @@ package awss3exporter // import "github.com/open-telemetry/opentelemetry-collect
 import (
 	"errors"

+	"go.opentelemetry.io/collector/config/configcompression"
 	"go.uber.org/multierr"
 )

 // S3UploaderConfig contains aws s3 uploader related config to controls things
 // like bucket, prefix, batching, connections, retries, etc.
 type S3UploaderConfig struct {
-	Region           string `mapstructure:"region"`
-	S3Bucket         string `mapstructure:"s3_bucket"`
-	S3Prefix         string `mapstructure:"s3_prefix"`
-	S3Partition      string `mapstructure:"s3_partition"`
-	FilePrefix       string `mapstructure:"file_prefix"`
-	Endpoint         string `mapstructure:"endpoint"`
-	RoleArn          string `mapstructure:"role_arn"`
-	S3ForcePathStyle bool   `mapstructure:"s3_force_path_style"`
-	DisableSSL       bool   `mapstructure:"disable_ssl"`
+	Region           string                 `mapstructure:"region"`
+	S3Bucket         string                 `mapstructure:"s3_bucket"`
+	S3Prefix         string                 `mapstructure:"s3_prefix"`
+	S3Partition      string                 `mapstructure:"s3_partition"`
+	FilePrefix       string                 `mapstructure:"file_prefix"`
+	Endpoint         string                 `mapstructure:"endpoint"`
+	RoleArn          string                 `mapstructure:"role_arn"`
+	S3ForcePathStyle bool                   `mapstructure:"s3_force_path_style"`
+	DisableSSL       bool                   `mapstructure:"disable_ssl"`
+	Compression      configcompression.Type `mapstructure:"compression"`
 }

 type MarshalerType string
@@ -48,5 +50,15 @@ func (c *Config) Validate() error {
 	if c.S3Uploader.S3Bucket == "" {
 		errs = multierr.Append(errs, errors.New("bucket is required"))
 	}
+	compression := c.S3Uploader.Compression
+	if compression.IsCompressed() {
+		if compression != configcompression.TypeGzip {
+			errs = multierr.Append(errs, errors.New("unknown compression type"))
+		}
+
+		if c.MarshalerName == SumoIC {
+			errs = multierr.Append(errs, errors.New("marshaler does not support compression"))
+		}
+	}
 	return errs
 }
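
The validation rule added to `Validate` above can be tried in isolation. The sketch below uses only the standard library: `compressionType` and `isCompressed` are hypothetical stand-ins for `configcompression.Type` and its `IsCompressed` method (assumed here to treat both the empty string and `"none"` as uncompressed), and `errors.Join` replaces `multierr.Append` purely to avoid the external dependency.

```go
package main

import (
	"errors"
	"fmt"
)

// compressionType is a hypothetical stand-in for configcompression.Type.
type compressionType string

const (
	compressionNone compressionType = "none"
	compressionGzip compressionType = "gzip"
)

// isCompressed is an assumption about how IsCompressed behaves:
// the empty string and "none" both mean "no compression".
func (c compressionType) isCompressed() bool {
	return c != "" && c != compressionNone
}

// validateCompression mirrors the two checks in the diff: only gzip is
// accepted, and the sumo_ic marshaler cannot be combined with compression.
func validateCompression(c compressionType, marshaler string) error {
	if !c.isCompressed() {
		return nil
	}
	var errs []error
	if c != compressionGzip {
		errs = append(errs, errors.New("unknown compression type"))
	}
	if marshaler == "sumo_ic" {
		errs = append(errs, errors.New("marshaler does not support compression"))
	}
	// errors.Join returns nil when no errors were collected.
	return errors.Join(errs...)
}

func main() {
	fmt.Println(validateCompression("gzip", "otlp_json"))
	fmt.Println(validateCompression("zstd", "otlp_json"))
	fmt.Println(validateCompression("gzip", "sumo_ic"))
}
```

Note that both checks sit inside the `isCompressed` branch, so an uncompressed `sumo_ic` configuration still validates.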

exporter/awss3exporter/config_test.go

Lines changed: 42 additions & 0 deletions
@@ -196,3 +196,45 @@ func TestMarshallerName(t *testing.T) {
 	)

 }
+
+func TestCompressionName(t *testing.T) {
+	factories, err := otelcoltest.NopFactories()
+	assert.NoError(t, err)
+
+	factory := NewFactory()
+	factories.Exporters[factory.Type()] = factory
+	cfg, err := otelcoltest.LoadConfigAndValidate(
+		filepath.Join("testdata", "compression.yaml"), factories)
+
+	require.NoError(t, err)
+	require.NotNil(t, cfg)
+
+	e := cfg.Exporters[component.MustNewID("awss3")].(*Config)
+
+	assert.Equal(t, e,
+		&Config{
+			S3Uploader: S3UploaderConfig{
+				Region:      "us-east-1",
+				S3Bucket:    "foo",
+				S3Partition: "minute",
+				Compression: "gzip",
+			},
+			MarshalerName: "otlp_json",
+		},
+	)
+
+	e = cfg.Exporters[component.MustNewIDWithName("awss3", "proto")].(*Config)
+
+	assert.Equal(t, e,
+		&Config{
+			S3Uploader: S3UploaderConfig{
+				Region:      "us-east-1",
+				S3Bucket:    "bar",
+				S3Partition: "minute",
+				Compression: "none",
+			},
+			MarshalerName: "otlp_proto",
+		},
+	)
+
+}

exporter/awss3exporter/go.mod

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ require (
 	github.com/aws/aws-sdk-go v1.50.27
 	github.com/stretchr/testify v1.9.0
 	go.opentelemetry.io/collector/component v0.96.1-0.20240306115632-b2693620eff6
+	go.opentelemetry.io/collector/config/configcompression v0.96.1-0.20240306115632-b2693620eff6
 	go.opentelemetry.io/collector/confmap v0.96.1-0.20240306115632-b2693620eff6
 	go.opentelemetry.io/collector/consumer v0.96.1-0.20240306115632-b2693620eff6
 	go.opentelemetry.io/collector/exporter v0.96.1-0.20240306115632-b2693620eff6

exporter/awss3exporter/go.sum

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

exporter/awss3exporter/s3_writer.go

Lines changed: 34 additions & 8 deletions
@@ -5,6 +5,7 @@ package awss3exporter // import "github.com/open-telemetry/opentelemetry-collect

 import (
 	"bytes"
+	"compress/gzip"
 	"context"
 	"fmt"
 	"math/rand"
@@ -15,6 +16,7 @@ import (
 	"github.com/aws/aws-sdk-go/aws/credentials/stscreds"
 	"github.com/aws/aws-sdk-go/aws/session"
 	"github.com/aws/aws-sdk-go/service/s3/s3manager"
+	"go.opentelemetry.io/collector/config/configcompression"
 )

 type s3Writer struct {
@@ -38,12 +40,17 @@ func randomInRange(low, hi int) int {
 	return low + rand.Intn(hi-low)
 }

-func getS3Key(time time.Time, keyPrefix string, partition string, filePrefix string, metadata string, fileformat string) string {
+func getS3Key(time time.Time, keyPrefix string, partition string, filePrefix string, metadata string, fileformat string, compression configcompression.Type) string {
 	timeKey := getTimeKey(time, partition)
 	randomID := randomInRange(100000000, 999999999)

 	s3Key := keyPrefix + "/" + timeKey + "/" + filePrefix + metadata + "_" + strconv.Itoa(randomID) + "." + fileformat

+	// add ".gz" extension to files if compression is enabled
+	if compression == configcompression.TypeGzip {
+		s3Key += ".gz"
+	}
+
 	return s3Key
 }

@@ -77,10 +84,28 @@ func (s3writer *s3Writer) writeBuffer(_ context.Context, buf []byte, config *Con
 	now := time.Now()
 	key := getS3Key(now,
 		config.S3Uploader.S3Prefix, config.S3Uploader.S3Partition,
-		config.S3Uploader.FilePrefix, metadata, format)
-
-	// create a reader from data data in memory
-	reader := bytes.NewReader(buf)
+		config.S3Uploader.FilePrefix, metadata, format, config.S3Uploader.Compression)
+
+	encoding := ""
+	var reader *bytes.Reader
+	if config.S3Uploader.Compression == configcompression.TypeGzip {
+		// set s3 uploader content encoding to "gzip"
+		encoding = "gzip"
+		var gzipContents bytes.Buffer
+
+		// create a gzip from data
+		gzipWriter := gzip.NewWriter(&gzipContents)
+		_, err := gzipWriter.Write(buf)
+		if err != nil {
+			return err
+		}
+		gzipWriter.Close()
+
+		reader = bytes.NewReader(gzipContents.Bytes())
+	} else {
+		// create a reader from data in memory
+		reader = bytes.NewReader(buf)
+	}

 	sessionConfig := getSessionConfig(config)
 	sess, err := getSession(config, sessionConfig)
@@ -92,9 +117,10 @@ func (s3writer *s3Writer) writeBuffer(_ context.Context, buf []byte, config *Con
 	uploader := s3manager.NewUploader(sess)

 	_, err = uploader.Upload(&s3manager.UploadInput{
-		Bucket: aws.String(config.S3Uploader.S3Bucket),
-		Key:    aws.String(key),
-		Body:   reader,
+		Bucket:          aws.String(config.S3Uploader.S3Bucket),
+		Key:             aws.String(key),
+		Body:            reader,
+		ContentEncoding: &encoding,
 	})
 	if err != nil {
 		return err
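
The in-memory gzip step in `writeBuffer` can be exercised on its own. The following is a minimal standalone sketch, not the exporter's code: `gzipBytes` and `gunzipBytes` are hypothetical helper names. Unlike the diff, it also checks the error returned by `Close`, since `Close` is what flushes buffered data and writes the gzip footer.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses buf in memory and returns the complete gzip stream.
// Hypothetical helper illustrating the compression step only.
func gzipBytes(buf []byte) ([]byte, error) {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(buf); err != nil {
		return nil, err
	}
	// Close flushes pending data and writes the gzip footer, so its
	// error is checked before the buffer contents are used.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// gunzipBytes reverses gzipBytes; it is only here to verify the round trip.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// A repetitive JSON-like payload, chosen only so compression is visible.
	payload := bytes.Repeat([]byte(`{"resourceSpans":[]}`), 100)

	compressed, err := gzipBytes(payload)
	if err != nil {
		panic(err)
	}
	restored, err := gunzipBytes(compressed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("original=%d bytes, compressed=%d bytes, round trip ok=%v\n",
		len(payload), len(compressed), bytes.Equal(payload, restored))
}
```

The resulting byte slice can be wrapped with `bytes.NewReader` in the same way the diff builds `reader` for the upload body.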

exporter/awss3exporter/s3_writer_test.go

Lines changed: 15 additions & 1 deletion
@@ -36,7 +36,21 @@ func TestS3Key(t *testing.T) {
 	require.NotNil(t, tm)

 	re := regexp.MustCompile(`keyprefix/year=2022/month=06/day=05/hour=00/minute=00/fileprefixlogs_([0-9]+).json`)
-	s3Key := getS3Key(tm, "keyprefix", "minute", "fileprefix", "logs", "json")
+	s3Key := getS3Key(tm, "keyprefix", "minute", "fileprefix", "logs", "json", "")
+	matched := re.MatchString(s3Key)
+	assert.Equal(t, true, matched)
+}
+
+func TestS3KeyOfCompressedFile(t *testing.T) {
+	const layout = "2006-01-02"
+
+	tm, err := time.Parse(layout, "2022-06-05")
+
+	assert.NoError(t, err)
+	require.NotNil(t, tm)
+
+	re := regexp.MustCompile(`keyprefix/year=2022/month=06/day=05/hour=00/minute=00/fileprefixlogs_([0-9]+).json.gz`)
+	s3Key := getS3Key(tm, "keyprefix", "minute", "fileprefix", "logs", "json", "gzip")
 	matched := re.MatchString(s3Key)
 	assert.Equal(t, true, matched)
 }
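
One observation on the key tests above: the patterns are unanchored and use an unescaped `.`, so `MatchString` on the first pattern would also accept a key ending in `.json.gz`. A stricter check might anchor the pattern and escape the dots. The sketch below assumes a hypothetical helper `withCompressionSuffix` that reproduces only the `.gz` suffix rule from `getS3Key`; the key value is a made-up example.

```go
package main

import (
	"fmt"
	"regexp"
)

// Anchored patterns with literal dots escaped, so ".json" and ".json.gz"
// keys cannot be confused with each other.
var (
	plainKey = regexp.MustCompile(`^keyprefix/year=2022/month=06/day=05/hour=00/minute=00/fileprefixlogs_[0-9]+\.json$`)
	gzipKey  = regexp.MustCompile(`^keyprefix/year=2022/month=06/day=05/hour=00/minute=00/fileprefixlogs_[0-9]+\.json\.gz$`)
)

// withCompressionSuffix is a hypothetical helper that applies the same
// rule as the diff: append ".gz" only when compression is "gzip".
func withCompressionSuffix(key, compression string) string {
	if compression == "gzip" {
		return key + ".gz"
	}
	return key
}

func main() {
	// Example key with a fixed ID in place of the random one.
	base := "keyprefix/year=2022/month=06/day=05/hour=00/minute=00/fileprefixlogs_123456789.json"

	plain := withCompressionSuffix(base, "")
	zipped := withCompressionSuffix(base, "gzip")

	fmt.Println(plainKey.MatchString(plain), plainKey.MatchString(zipped))
	fmt.Println(gzipKey.MatchString(plain), gzipKey.MatchString(zipped))
}
```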

exporter/awss3exporter/testdata/compression.yaml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+receivers:
+  nop:
+
+exporters:
+  awss3:
+    s3uploader:
+      s3_bucket: "foo"
+      compression: "gzip"
+    marshaler: otlp_json
+
+  awss3/proto:
+    s3uploader:
+      s3_bucket: "bar"
+      compression: "none"
+    marshaler: otlp_proto
+
+
+processors:
+  nop:
+
+service:
+  pipelines:
+    traces:
+      receivers: [nop]
+      processors: [nop]
+      exporters: [awss3, awss3/proto]
