This repository was archived by the owner on Oct 23, 2024. It is now read-only.

Commit 3d1c91b

add infra correlation for spans (#1250)
* add infra correlation for spans
* Add test methods to handle infra correlation properties
* Add test methods to verify infra correlation property removal
* Add test for environment infrastructure correlation
* add globalSpanTags and extraSpanTags configurations
* refactoring active service tracker locking
* Add performance test for infra correlation
* move cache purging in active service tracker to a dedicated routine
* don't keep spans, datapoints, and events in fake backend during trace api performance test
* separate ingest and api servers into dedicated threads in fake backend

Co-authored-by: Scott Stewart <[email protected]>
1 parent dea7098 commit 3d1c91b

32 files changed, +1655 -173 lines changed

docs/config-schema.md

Lines changed: 9 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -48,6 +48,7 @@ if not set.
4848
| `disableHostDimensions` | no | bool | Our standard agent model is to collect metrics for services running on the same host as the agent. Therefore, host-specific dimensions (e.g. `host`, `AWSUniqueId`, etc) are automatically added to every datapoint that is emitted from the agent by default. Set this to true if you are using the agent primarily to monitor things on other hosts. You can set this option at the monitor level as well. (**default:** `false`) |
4949
| `intervalSeconds` | no | integer | How often to send metrics to SignalFx. Monitors can override this individually. (**default:** `10`) |
5050
| `globalDimensions` | no | map of strings | Dimensions (key:value pairs) that will be added to every datapoint emitted by the agent. To specify that all metrics should be high-resolution, add the dimension `sf_hires: 1` |
51+
| `globalSpanTags` | no | map of strings | Tags (key:value pairs) that will be added to every span emitted by the agent. |
5152
| `cluster` | no | string | The logical environment/cluster that this agent instance is running in. All of the services that this instance monitors should be in the same environment as well. This value, if provided, will be synced as a property onto the `host` dimension, or onto any cloud-provided specific dimensions (`AWSUniqueId`, `gcp_id`, and `azure_resource_id`) when available. Example values: "prod-usa", "dev" |
5253
| `syncClusterOnHostDimension` | no | bool | If true, force syncing of the `cluster` property on the `host` dimension, even when cloud-specific dimensions are present. (**default:** `false`) |
5354
| `validateDiscoveryRules` | no | bool | If true, a warning will be emitted if a discovery rule contains variables that will never possibly match a rule. If using multiple observers, it is convenient to set this to false to suppress spurious errors. (**default:** `false`) |
@@ -97,6 +98,8 @@ The following are generic options that apply to all monitors. Each monitor type
9798
| `discoveryRule` | no | string | The rule used to match up this configuration with a discovered endpoint. If blank, the configuration will be run immediately when the agent is started. If multiple endpoints match this rule, multiple instances of the monitor type will be created with the same configuration (except different host/port). |
9899
| `validateDiscoveryRule` | no | bool | If true, a warning will be emitted if a discovery rule contains variables that will never possibly match a rule. If using multiple observers, it is convenient to set this to false to suppress spurious errors. The top-level setting `validateDiscoveryRules` acts as a default if this isn't set. (**default:** `"false"`) |
99100
| `extraDimensions` | no | map of strings | A set of extra dimensions (key:value pairs) to include on datapoints emitted by the monitor(s) created from this configuration. To specify metrics from this monitor should be high-resolution, add the dimension `sf_hires: 1` |
101+
| `extraSpanTags` | no | map of strings | A set of extra span tags (key:value pairs) to include on spans emitted by the monitor(s) created from this configuration. |
102+
| `extraSpanTagFromEndpoint` | no | map of strings | A mapping of extra span tag names to a [discovery rule expression](https://docs.signalfx.com/en/latest/integrations/agent/auto-discovery.html) that is used to derive the value of the span tag. For example, to use a certain container label as a span tag, you could use something like this in your monitor config block: `extraSpanTagsFromEndpoint: {env: 'Get(container_labels, "myapp.com/environment")'}` |
100103
| `extraDimensionsFromEndpoint` | no | map of strings | A mapping of extra dimension names to a [discovery rule expression](https://docs.signalfx.com/en/latest/integrations/agent/auto-discovery.html) that is used to derive the value of the dimension. For example, to use a certain container label as a dimension, you could use something like this in your monitor config block: `extraDimensionsFromEndpoint: {env: 'Get(container_labels, "myapp.com/environment")'}` |
101104
| `configEndpointMappings` | no | map of strings | A set of mappings from a configuration option on this monitor to attributes of a discovered endpoint. The keys are the config option on this monitor and the value can be any valid expression used in discovery rules. |
102105
| `intervalSeconds` | no | integer | The interval (in seconds) at which to emit datapoints from the monitor(s) created by this configuration. If not set (or set to 0), the global agent intervalSeconds config option will be used instead. (**default:** `0`) |
@@ -149,9 +152,10 @@ The **nested** `writer` config object has the following fields:
149152
| `logDimensionUpdates` | no | bool | If `true`, dimension updates will be logged at the INFO level. (**default:** `false`) |
150153
| `logDroppedDatapoints` | no | bool | If true, and the log level is `debug`, filtered out datapoints will be logged. (**default:** `false`) |
151154
| `addGlobalDimensionsAsSpanTags` | no | bool | If true, the dimensions specified in the top-level `globalDimensions` configuration will be added to the tag set of all spans that are emitted by the writer. If this is false, only the "host id" dimensions such as `host`, `AwsUniqueId`, etc. are added to the span tags. (**default:** `false`) |
152-
| `sendTraceHostCorrelationMetrics` | no | bool | Whether to send host correlation metrics to correlation traced services with the underlying host (**default:** `true`) |
153-
| `staleServiceTimeout` | no | int64 | How long to wait after a trace span's service name is last seen to continue sending the correlation datapoints for that service. This should be a duration string that is accepted by https://golang.org/pkg/time/#ParseDuration. This option is irrelvant if `sendTraceHostCorrelationMetrics` is false. (**default:** `"5m"`) |
154-
| `traceHostCorrelationMetricsInterval` | no | int64 | How frequently to send host correlation metrics that are generated from the service name seen in trace spans sent through or by the agent. This should be a duration string that is accepted by https://golang.org/pkg/time/#ParseDuration. This option is irrelvant if `sendTraceHostCorrelationMetrics` is false. (**default:** `"1m"`) |
155+
| `sendTraceHostCorrelationMetrics` | no | bool | Whether to send host correlation metrics to correlate traced services with the underlying host (**default:** `true`) |
156+
| `staleServiceTimeout` | no | int64 | How long to wait after a trace span's service name is last seen to continue sending the correlation datapoints for that service. This should be a duration string that is accepted by https://golang.org/pkg/time/#ParseDuration. This option is irrelevant if `sendTraceHostCorrelationMetrics` is false. (**default:** `"5m"`) |
157+
| `traceHostCorrelationPurgeInterval` | no | int64 | How frequently to purge host correlation caches that are generated from the service and environment names seen in trace spans sent through or by the agent. This should be a duration string that is accepted by https://golang.org/pkg/time/#ParseDuration. (**default:** `"1m"`) |
158+
| `traceHostCorrelationMetricsInterval` | no | int64 | How frequently to send host correlation metrics that are generated from the service name seen in trace spans sent through or by the agent. This should be a duration string that is accepted by https://golang.org/pkg/time/#ParseDuration. This option is irrelevant if `sendTraceHostCorrelationMetrics` is false. (**default:** `"1m"`) |
155159
| `maxTraceSpansInFlight` | no | unsigned integer | How many trace spans are allowed to be in the process of sending. While this number is exceeded, the oldest spans will be discarded to accommodate new spans generated to avoid memory exhaustion. If you see log messages about "Aborting pending trace requests..." or "Dropping new trace spans..." it means that the downstream target for traces is not able to accept them fast enough. Usually if the downstream is offline you will get connection refused errors and most likely spans will not build up in the agent (there is no retry mechanism). In the case of slow downstreams, you might be able to increase `maxRequests` to increase the concurrent stream of spans downstream (if the target can make efficient use of additional connections) or, less likely, increase `traceSpanMaxBatchSize` if your batches are maxing out (turn on debug logging to see the batch sizes being sent) and being split up too much. If neither of those options helps, your downstream is likely too slow to handle the volume of trace spans and should be upgraded to more powerful hardware/networking. (**default:** `100000`) |
156160

157161

@@ -368,6 +372,7 @@ where applicable:
368372
disableHostDimensions: false
369373
intervalSeconds: 10
370374
globalDimensions:
375+
globalSpanTags:
371376
cluster:
372377
syncClusterOnHostDimension: false
373378
validateDiscoveryRules: false
@@ -393,6 +398,7 @@ where applicable:
393398
addGlobalDimensionsAsSpanTags: false
394399
sendTraceHostCorrelationMetrics: true
395400
staleServiceTimeout: "5m"
401+
traceHostCorrelationPurgeInterval: "1m"
396402
traceHostCorrelationMetricsInterval: "1m"
397403
maxTraceSpansInFlight: 100000
398404
logging:

docs/monitor-config.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,8 @@ The following config options are common to all monitors:
2727
| `discoveryRule` | | no | `string` | The rule used to match up this configuration with a discovered endpoint. If blank, the configuration will be run immediately when the agent is started. If multiple endpoints match this rule, multiple instances of the monitor type will be created with the same configuration (except different host/port). |
2828
| `validateDiscoveryRule` | `false` | no | `bool` | If true, a warning will be emitted if a discovery rule contains variables that will never possibly match a rule. If using multiple observers, it is convenient to set this to false to suppress spurious errors. The top-level setting `validateDiscoveryRules` acts as a default if this isn't set. |
2929
| `extraDimensions` | | no | `map of strings` | A set of extra dimensions (key:value pairs) to include on datapoints emitted by the monitor(s) created from this configuration. To specify metrics from this monitor should be high-resolution, add the dimension `sf_hires: 1` |
30+
| `extraSpanTags` | | no | `map of strings` | A set of extra span tags (key:value pairs) to include on spans emitted by the monitor(s) created from this configuration. |
31+
| `extraSpanTagFromEndpoint` | | no | `map of strings` | A mapping of extra span tag names to a [discovery rule expression](https://docs.signalfx.com/en/latest/integrations/agent/auto-discovery.html) that is used to derive the value of the span tag. For example, to use a certain container label as a span tag, you could use something like this in your monitor config block: `extraSpanTagsFromEndpoint: {env: 'Get(container_labels, "myapp.com/environment")'}` |
3032
| `extraDimensionsFromEndpoint` | | no | `map of strings` | A mapping of extra dimension names to a [discovery rule expression](https://docs.signalfx.com/en/latest/integrations/agent/auto-discovery.html) that is used to derive the value of the dimension. For example, to use a certain container label as a dimension, you could use something like this in your monitor config block: `extraDimensionsFromEndpoint: {env: 'Get(container_labels, "myapp.com/environment")'}` |
3133
| `configEndpointMappings` | | no | `map of strings` | A set of mappings from a configuration option on this monitor to attributes of a discovered endpoint. The keys are the config option on this monitor and the value can be any valid expression used in discovery rules. |
3234
| `intervalSeconds` | `0` | no | `integer` | The interval (in seconds) at which to emit datapoints from the monitor(s) created by this configuration. If not set (or set to 0), the global agent intervalSeconds config option will be used instead. |

pkg/core/config/config.go

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,8 @@ type Config struct {
7676
// Dimensions (key:value pairs) that will be added to every datapoint emitted by the agent.
7777
// To specify that all metrics should be high-resolution, add the dimension `sf_hires: 1`
7878
GlobalDimensions map[string]string `yaml:"globalDimensions" default:"{}"`
79+
// Tags (key:value pairs) that will be added to every span emitted by the agent.
80+
GlobalSpanTags map[string]string `yaml:"globalSpanTags" default:"{}"`
7981
// The logical environment/cluster that this agent instance is running in.
8082
// All of the services that this instance monitors should be in the same
8183
// environment as well. This value, if provided, will be synced as a
@@ -277,6 +279,7 @@ func (c *Config) propagateValuesDown() error {
277279
c.Writer.TraceEndpointURL = c.TraceEndpointURL
278280
c.Writer.SignalFxAccessToken = c.SignalFxAccessToken
279281
c.Writer.GlobalDimensions = c.GlobalDimensions
282+
c.Writer.GlobalSpanTags = c.GlobalSpanTags
280283

281284
return nil
282285
}
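
The hunk above only copies `GlobalSpanTags` from the top-level config into the writer config; how the writer applies them to spans is not shown in this section. As a minimal sketch of one plausible approach, the following fills in missing tags on a span from a global tag map. The function name `mergeSpanTags` is hypothetical, and the choice to let tags already present on the span take precedence is an assumption, not behavior confirmed by this commit.

```go
package main

import "fmt"

// mergeSpanTags returns a new map containing every global tag plus every tag
// already on the span. When a key appears in both, the span's own value wins.
// That precedence is an assumption made for this sketch.
func mergeSpanTags(spanTags, globalTags map[string]string) map[string]string {
	merged := make(map[string]string, len(spanTags)+len(globalTags))
	for k, v := range globalTags {
		merged[k] = v
	}
	for k, v := range spanTags {
		merged[k] = v // overwrite any global value with the span's own
	}
	return merged
}

func main() {
	// Hypothetical values standing in for a globalSpanTags config block.
	global := map[string]string{"environment": "prod-usa", "team": "payments"}
	span := map[string]string{"team": "checkout"}
	fmt.Println(mergeSpanTags(span, global))
}
```

Returning a copy rather than mutating the span's map keeps the sketch safe to call from several goroutines that share the same global tag map.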

pkg/core/config/monitor.go

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,15 @@ type MonitorConfig struct {
3535
// monitor(s) created from this configuration. To specify metrics from this
3636
// monitor should be high-resolution, add the dimension `sf_hires: 1`
3737
ExtraDimensions map[string]string `yaml:"extraDimensions" json:"extraDimensions"`
38+
// A set of extra span tags (key:value pairs) to include on spans emitted by the
39+
// monitor(s) created from this configuration.
40+
ExtraSpanTags map[string]string `yaml:"extraSpanTags" json:"extraSpanTags"`
41+
// A mapping of extra span tag names to a [discovery rule
42+
// expression](https://docs.signalfx.com/en/latest/integrations/agent/auto-discovery.html)
43+
// that is used to derive the value of the span tag. For example, to use
44+
// a certain container label as a span tag, you could use something like this
45+
// in your monitor config block: `extraSpanTagsFromEndpoint: {env: 'Get(container_labels, "myapp.com/environment")'}`
46+
ExtraSpanTagsFromEndpoint map[string]string `yaml:"extraSpanTagFromEndpoint" json:"extraSpanTagFromEndpoint"`
3847
// A mapping of extra dimension names to a [discovery rule
3948
// expression](https://docs.signalfx.com/en/latest/integrations/agent/auto-discovery.html)
4049
// that is used to derive the value of the dimension. For example, to use
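
Note that the Go field added above is named `ExtraSpanTagsFromEndpoint`, while its struct tags use the singular key `extraSpanTagFromEndpoint`, and the example in the doc comment uses the plural `extraSpanTagsFromEndpoint`. The sketch below reads the `yaml` struct tag with the standard `reflect` package to show which key a tag-driven parser would presumably look for. The type `monitorConfig` reproduces only the two fields added in this hunk, and `yamlKey` is a hypothetical helper, not part of the agent.

```go
package main

import (
	"fmt"
	"reflect"
)

// monitorConfig copies only the two fields added to MonitorConfig in this
// commit, with their struct tags exactly as they appear in the diff.
type monitorConfig struct {
	ExtraSpanTags             map[string]string `yaml:"extraSpanTags" json:"extraSpanTags"`
	ExtraSpanTagsFromEndpoint map[string]string `yaml:"extraSpanTagFromEndpoint" json:"extraSpanTagFromEndpoint"`
}

// yamlKey returns the value of the yaml struct tag on the named field of the
// struct value v, or an empty string if the field does not exist.
func yamlKey(v interface{}, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("yaml")
}

func main() {
	cfg := monitorConfig{}
	fmt.Println(yamlKey(cfg, "ExtraSpanTags"))
	fmt.Println(yamlKey(cfg, "ExtraSpanTagsFromEndpoint"))
}
```

If the agent's YAML loading follows the struct tag, a config block written with the plural key from the doc example would likely not populate this field as of this commit.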

pkg/core/config/writer.go

Lines changed: 10 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -4,11 +4,10 @@ import (
44
"net/url"
55
"strings"
66

7-
"github.com/signalfx/signalfx-agent/pkg/utils/timeutil"
8-
97
"github.com/mitchellh/hashstructure"
108
"github.com/signalfx/signalfx-agent/pkg/core/dpfilters"
119
"github.com/signalfx/signalfx-agent/pkg/core/propfilters"
10+
"github.com/signalfx/signalfx-agent/pkg/utils/timeutil"
1211
log "github.com/sirupsen/logrus"
1312
)
1413

@@ -74,19 +73,24 @@ type WriterConfig struct {
7473
// by the writer. If this is false, only the "host id" dimensions such as
7574
// `host`, `AwsUniqueId`, etc. are added to the span tags.
7675
AddGlobalDimensionsAsSpanTags bool `yaml:"addGlobalDimensionsAsSpanTags"`
77-
// Whether to send host correlation metrics to correlation traced services
76+
// Whether to send host correlation metrics to correlate traced services
7877
// with the underlying host
7978
SendTraceHostCorrelationMetrics *bool `yaml:"sendTraceHostCorrelationMetrics" default:"true"`
8079
// How long to wait after a trace span's service name is last seen to
8180
// continue sending the correlation datapoints for that service. This
8281
// should be a duration string that is accepted by
83-
// https://golang.org/pkg/time/#ParseDuration. This option is irrelvant if
82+
// https://golang.org/pkg/time/#ParseDuration. This option is irrelevant if
8483
// `sendTraceHostCorrelationMetrics` is false.
8584
StaleServiceTimeout timeutil.Duration `yaml:"staleServiceTimeout" default:"5m"`
85+
// How frequently to purge host correlation caches that are generated from
86+
// the service and environment names seen in trace spans sent through or by
87+
// the agent. This should be a duration string that is accepted by
88+
// https://golang.org/pkg/time/#ParseDuration.
89+
TraceHostCorrelationPurgeInterval timeutil.Duration `yaml:"traceHostCorrelationPurgeInterval" default:"1m"`
8690
// How frequently to send host correlation metrics that are generated from
8791
// the service name seen in trace spans sent through or by the agent. This
8892
// should be a duration string that is accepted by
89-
// https://golang.org/pkg/time/#ParseDuration. This option is irrelvant if
93+
// https://golang.org/pkg/time/#ParseDuration. This option is irrelevant if
9094
// `sendTraceHostCorrelationMetrics` is false.
9195
TraceHostCorrelationMetricsInterval timeutil.Duration `yaml:"traceHostCorrelationMetricsInterval" default:"1m"`
9296
// How many trace spans are allowed to be in the process of sending. While
@@ -114,6 +118,7 @@ type WriterConfig struct {
114118
TraceEndpointURL string `yaml:"-"`
115119
SignalFxAccessToken string `yaml:"-"`
116120
GlobalDimensions map[string]string `yaml:"-"`
121+
GlobalSpanTags map[string]string `yaml:"-"`
117122
MetricsToInclude []MetricFilter `yaml:"-"`
118123
MetricsToExclude []MetricFilter `yaml:"-"`
119124
PropertiesToExclude []PropertyFilterConfig `yaml:"-"`
