Description
- Version: Metricbeat 8.17 and above
Problem
As of 8.17, Beats can access AWS instance tags with the add_cloud_metadata processor (#41477).
The AWS tags from EC2 hosts are added under aws.tags.
AWS does not restrict tag keys from containing dots (.), but Metricbeat's default dynamic template for aws.tags is defined as:
{
"aws.tags.*": {
"path_match": "aws.tags.*",
"mapping": {
"type": "keyword"
},
"match_mapping_type": "*"
}
}
This mapping applies to all types (match_mapping_type: "*"), including objects.
If an AWS tag key contains a dot, Elasticsearch interprets it as a nested object field.
For example, the tag key csg.support with the value something results in:
{
"aws": {
"tags": {
"csg": {
"support": "something"
}
}
}
}
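To make the expansion concrete, here is a minimal Python sketch of how a dotted key is read as an object path. The helper name expand_dots is hypothetical and only mimics the behavior described above; it is not Elasticsearch or Beats code, and it does not handle a key that is both a leaf and a parent.

```python
def expand_dots(flat):
    """Expand dotted keys into nested dicts, treating each dot as an object boundary."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Every intermediate segment becomes an object, e.g. aws.tags.csg.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# The tag key csg.support stored under aws.tags:
print(expand_dots({"aws.tags.csg.support": "something"}))
# {'aws': {'tags': {'csg': {'support': 'something'}}}}
```

Note that csg ends up as an object rather than a string, which is what the template below it in the path has to cope with.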
Because of the Metricbeat template, Elasticsearch tries to map aws.tags.csg (an object) as a keyword, which results in a parsing error:
{
"error": {
"root_cause": [
{
"type": "document_parsing_exception",
"reason": "[1:28] failed to parse field [aws.tags.csg] of type [keyword] in document with id '1'. Preview of field's value: '{support=something}'"
}
],
"type": "document_parsing_exception",
"reason": "[1:28] failed to parse field [aws.tags.csg] of type [keyword] in document with id '1'. Preview of field's value: '{support=something}'",
"caused_by": {
"type": "illegal_state_exception",
"reason": "Can't get text on a START_OBJECT at 1:4"
}
},
"status": 400
}
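The failure can be pictured with a toy model of the template match. This is only a sketch under assumptions: template_applies is a hypothetical helper, and Python's fnmatch stands in for Elasticsearch's wildcard matching on path_match. It is meant to show why the object at aws.tags.csg gets claimed by a keyword mapping, not to reproduce the real mapper.

```python
from fnmatch import fnmatchcase

# Toy copy of the template from this issue (only the two matching conditions).
CURRENT_TEMPLATE = {"path_match": "aws.tags.*", "match_mapping_type": "*"}


def template_applies(template, path, detected_type):
    """Return True if this toy template would claim the field at `path`."""
    # "*" accepts every detected type, including "object".
    type_ok = template["match_mapping_type"] in ("*", detected_type)
    return type_ok and fnmatchcase(path, template["path_match"])


# The intermediate object created by the dotted tag key is claimed as keyword:
print(template_applies(CURRENT_TEMPLATE, "aws.tags.csg", "object"))  # True
# A plain string tag is claimed too, which is the intended case:
print(template_applies(CURRENT_TEMPLATE, "aws.tags.Name", "string"))  # True
# Fields outside aws.tags are not affected:
print(template_applies(CURRENT_TEMPLATE, "host.name", "string"))  # False
```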
Impact
Ingest will break in any environment that uses dots in its AWS tagging convention and has the add_cloud_metadata processor enabled. This can happen unexpectedly on an upgrade from an earlier version to 8.17, since the tags suddenly become accessible to the processor.
Incidentally, this has been observed in ECE 3.8.0 (where 8.18 Beats are used and users do not have control over the templates), where it has partially broken ingest into the system logging and metrics cluster. That in turn causes some other downstream UI issues.
Suggestion
I'm not sure what the best approach is, to be honest, but the current approach doesn't seem to fit within the ECS guidelines: https://www.elastic.co/docs/reference/ecs/ecs-guidelines#_guidelines_for_field_names
The guidelines reserve dots to indicate object hierarchy and require that a field path resolve to a single consistent type. The current template, however, allows dotted tag keys to be mapped inconsistently as keywords or objects (though this results in indexing errors).
If this doesn't end up meriting a change, then perhaps we can add a special note about this consideration to the add_cloud_metadata documentation: https://www.elastic.co/docs/reference/beats/filebeat/add-cloud-metadata
Some ideas...
- Change the default dynamic template for aws.tags to apply only to string fields instead of all types, with "match_mapping_type": "string". This avoids attempting to force objects into keyword and keeps the existing behavior for string values while preventing mapping conflicts and ingestion failures. But that would still be susceptible to path type conflicts if a tag key evolves from a leaf string to an object (e.g., first csg: "something", later csg.support: "something").
- So perhaps mapping aws.tags as flattened instead would be better, to tolerate arbitrary depth without mapping explosions and conflicts. The problem with this is that it changes the query semantics.
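The path type conflict mentioned in the first idea can be checked for ahead of time. Below is a minimal Python sketch, assuming the set of tag keys in use is known; find_path_conflicts is a hypothetical helper, not part of Beats. It reports every key that is used both as a leaf value and as the parent of another dotted key.

```python
def find_path_conflicts(tag_keys):
    """Return tag keys that are both a leaf and a dotted prefix of another key."""
    keys = set(tag_keys)
    conflicts = set()
    for key in keys:
        parts = key.split(".")
        # Check every proper prefix, e.g. "csg" for "csg.support".
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in keys:
                conflicts.add(prefix)
    return sorted(conflicts)


# csg is a string in one tag and an object parent in another:
print(find_path_conflicts(["csg", "csg.support", "Name"]))  # ['csg']
# Dotted keys alone do not conflict with each other:
print(find_path_conflicts(["csg.support", "csg.owner"]))  # []
```

A non-empty result means a string-only template would still run into a mapping conflict for those keys, whichever shape is indexed second.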
Steps to Reproduce:
Install Metricbeat on an EC2 instance that has tags allowed in instance metadata and a tag with a dot in its key.
Or we can simulate this with these APIs:
Create an index with the mapping from Metricbeat's default template:
PUT aws-tag-test/
{
"mappings": {
"dynamic_templates": [
{
"aws.tags.*": {
"path_match": "aws.tags.*",
"mapping": {
"type": "keyword"
},
"match_mapping_type": "*"
}
}
]
}
}
Add a document mimicking a dotted tag key. This will error.
PUT aws-tag-test/_doc/1
{
"aws.tags.csg.support": "something"
}