
filestorageextension should allow for a maximum storage limit #38620


Open
mattsains opened this issue Mar 13, 2025 · 25 comments

@mattsains
Contributor

mattsains commented Mar 13, 2025

Component(s)

extension/storage/filestorage

Is your feature request related to a problem? Please describe.

See also: open-telemetry/opentelemetry-collector#12634

Although the file storage extension has fields to garbage-collect data on disk, there is no way to set a limit on how much space the extension can use on disk. Limits can be set elsewhere, for example in the exporter helper, but only in terms of the number of items stored, not in megabytes, and not in total for a single extension.

I think this is an important feature because filling up a disk can be a reliability and performance issue.

Describe the solution you'd like

The filestorage extension should allow for a new field to set the absolute maximum amount of storage used by the extension. If the extension reaches this maximum during a call to Set(), it should automatically compact if appropriate, and, if still at the limit, should return a storage contract-defined error to communicate that it is full and cannot accept new data.

Additionally, the ReboundNeededThresholdMiB and ReboundTriggerThresholdMiB fields should be validated to be at or below this maximum size.
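
For illustration, here is a rough sketch of how such a setting might look in the extension's Go config. The MaxSizeMiB field, its mapstructure tag, and the validation below are assumptions for this proposal, not a merged API; the compaction threshold names come from the existing extension.

```go
package filestorage

import "errors"

// CompactionConfig holds the existing rebound thresholds referenced above
// (other compaction fields elided for brevity).
type CompactionConfig struct {
	ReboundNeededThresholdMiB  int64 `mapstructure:"rebound_needed_threshold_mib"`
	ReboundTriggerThresholdMiB int64 `mapstructure:"rebound_trigger_threshold_mib"`
}

type Config struct {
	Directory  string           `mapstructure:"directory"`
	Compaction CompactionConfig `mapstructure:"compaction"`

	// MaxSizeMiB is the hypothetical new field: the absolute maximum on-disk
	// size the extension may use, in MiB. 0 means no limit.
	MaxSizeMiB int64 `mapstructure:"max_size_mib"`
}

func (cfg *Config) Validate() error {
	if cfg.MaxSizeMiB == 0 {
		return nil // no limit configured
	}
	if cfg.Compaction.ReboundNeededThresholdMiB > cfg.MaxSizeMiB ||
		cfg.Compaction.ReboundTriggerThresholdMiB > cfg.MaxSizeMiB {
		return errors.New("rebound thresholds must be at or below max_size_mib")
	}
	return nil
}
```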

Describe alternatives you've considered

As mentioned, there is exporterhelper limit on queue size, but pdata entries can vary wildly in size, and also a single storage extension can be used for multiple purposes.

Additional context

No response

mattsains added the enhancement and needs triage labels on Mar 13, 2025
@github-actions github-actions bot

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@swiatekm
Contributor

swiatekm commented Mar 14, 2025

Sounds like a useful feature, and I don't see any immediate problems with the proposed implementation. If you want to define a standard error for this, then https://github.com/open-telemetry/opentelemetry-collector/blob/main/extension/xextension/storage/storage.go would be the place, but in practice it'd probably be easier to first submit a PR to the filestorage extension and get that reviewed.
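
If a contract-level error is added there, a minimal sketch could look like the following; the name ErrStorageFull and its message are placeholders, not an agreed-upon API:

```go
// Hypothetical addition to extension/xextension/storage.
package storage

import "errors"

// ErrStorageFull would be returned by Set (and batch operations) when the
// backing store cannot accept new data because a configured size limit has
// been reached. Callers could detect it with errors.Is(err, ErrStorageFull).
var ErrStorageFull = errors.New("storage client: maximum storage size reached")
```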

@andrzej-stencel I recall discussing this in the past. Do you recall any potential problems with this feature?

@swiatekm
Contributor

/label -needs-triage

github-actions bot removed the needs triage label on Mar 14, 2025
@VihasMakwana
Contributor

https://github.com/open-telemetry/opentelemetry-collector/blob/main/extension/xextension/storage/storage.go would be the place, but in practice it'd probably be easier to first submit a PR to the filestorage extension and get that reviewed.

I agree. It's a good starting point.

@mattsains will you be able to submit a PR for this?

@mattsains
Contributor Author

Absolutely, I’ll get on it.

@mattsains
Contributor Author

One thing I've noticed here is that compaction is not done in place: it creates a compacted copy of the database alongside the original. It seems to me that either the maximum size configuration should not include the space taken up by compaction (which could temporarily consume as much space as the original database), or we should effectively divide the max size by two when compaction is enabled. Thoughts?

@VihasMakwana
Contributor

It seems to me that either the maximum size configuration should not include the space taken up by compaction (which could temporarily consume as much space as the original database), or we should effectively divide the max size by two when compaction is enabled

We should not include the size taken up by compaction. Compaction creates a "temporary" copy of the database, and we delete it once compaction is done.

@mattsains
Contributor Author

@VihasMakwana I'm thinking of otel users setting a max storage size and then getting confused when we're using potentially double the storage they've allocated to otel. That would be very surprising, and it could kill the collector if the user is running otel on a volume not much larger than the configured maximum.

@VihasMakwana
Contributor

VihasMakwana commented Mar 17, 2025

But that would be temporary, right?

Documenting this behaviour looks like a way forward. Thoughts?

That would be very surprising, and it could kill the collector if the user is running otel on a volume not much larger than the configured maximum.

I guess this could happen right now as well, i.e. without the "maximum storage limit" option, if compaction is enabled. On a side note, no new data will be written or read while compaction is taking place.

@mattsains
Contributor Author

Yes, it would be temporary. But we would temporarily be violating the "don't exceed this storage size" constraint from the user. Maybe that's fine; I'm not sure.

@mattsains
Contributor Author

I determined we'll need new functionality in bbolt to achieve this feature, so I opened an issue there: etcd-io/bbolt#928

@swiatekm
Contributor

I determined we'll need new functionality in bbolt to achieve this feature, so I opened an issue there: etcd-io/bbolt#928

Can't we track the total size ourselves? We can definitely know the size of all incoming operations, so the only thing we'd need would be the size of the bbolt database on disk.

@mattsains
Contributor Author

My admittedly tenuous understanding of bbolt is that the size on disk does not increase until the B-tree runs out of space, at which point the file grows by an amount unrelated to the size required for the new entry, leaving some empty space for efficient insertion of new items.

@swiatekm
Contributor

As far as I'm aware, your understanding is correct. So it's true that we can't exactly predict how the bbolt db on-disk size will change as a result of writing X bytes into it. But bbolt does provide stats about how much free space it has available internally, so I thought we could try checking that. It probably won't be 100% accurate, but I don't think we need that level of precision. We can be conservative and set our actual limit to slightly below the configured one.
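
For reference, a rough sketch of reading those numbers from bbolt (go.etcd.io/bbolt); this only shows which stats are available, not the extension's actual logic:

```go
package main

import (
	"fmt"
	"log"
	"os"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("example.db", 0o600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	fi, err := os.Stat(db.Path())
	if err != nil {
		log.Fatal(err)
	}

	s := db.Stats()
	// FreeAlloc is the total number of bytes allocated in free pages, i.e.
	// space inside the file that can be reused without growing it on disk.
	fmt.Printf("file size on disk: %d bytes\n", fi.Size())
	fmt.Printf("free bytes inside file: %d (free pages: %d, pending pages: %d)\n",
		s.FreeAlloc, s.FreePageN, s.PendingPageN)
}
```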

@mattsains
Contributor Author

mattsains commented Mar 20, 2025

could you talk more about your idea? I am aware of stats but as far as I can tell we can only figure out how many bytes are "allocated" in the BTree and also probably the size of the file it has through the io/fs package.

I kind of get the idea that you're thinking of stopping allocations when we get close to the free size stat in bbolt in some situations, but what I'm not understanding is how to tell whether or not it's okay to allow bbolt to expand the file. For example, if the max limit is 10GB and the current size of the bbolt db file is 5kb, then obviously we can allow allocations past the free size, because there's no way bbolt will expand to a million times the current file size. But if the bbolt db is currently 9GB, can we allow it to expand? How big will it be after it expands? That's the reason I created the issue, to have a hook in the expansion logic of bbolt and compare the target size to a maximum size

@andrzej-stencel
Member

@andrzej-stencel I recall discussing this in the past. Do you recall any potential problems with this feature?

I think it's a very good idea in general.

If the extension reaches this maximum during a call to Set(), it should automatically compact if appropriate

I'm not sure if this is a good idea. I'm worried about the time it may take to perform the compaction operation. If compaction takes a significant amount of time (> seconds), this might cause a timeout in the collector pipeline. Is this OK or is it better to respond immediately "I don't have space"? @swiatekm what do you think?

@mattsains
Contributor Author

I'm happy not to implement that part, it was just an idea I had

@swiatekm
Contributor

could you talk more about your idea? I am aware of stats but as far as I can tell we can only figure out how many bytes are "allocated" in the BTree and also probably the size of the file it has through the io/fs package.

I kind of get the idea that you're thinking of stopping allocations when we get close to the free size stat in bbolt in some situations, but what I'm not understanding is how to tell whether or not it's okay to allow bbolt to expand the file. For example, if the max limit is 10GB and the current size of the bbolt db file is 5kb, then obviously we can allow allocations past the free size, because there's no way bbolt will expand to a million times the current file size. But if the bbolt db is currently 9GB, can we allow it to expand? How big will it be after it expands? That's the reason I created the issue, to have a hook in the expansion logic of bbolt and compare the target size to a maximum size

I don't think bbolt makes any guarantees about this logic. However, if you look at the source code, there's only one circumstance in which it grows the on-disk file. The magnitude of that growth is either the size of the written value or the allocation size (16 MB by default), whichever is larger.

As such, I think that if we set a buffer zone of 32 MB, we should be fine. If a transaction comes in where the total added bytes + current size (as reported by Stats) would go above limit - 32 MB, then we reject it. We'll have some dead space left, but we should never go over the limit this way.
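
A minimal sketch of that check, assuming we track the configured limit and the current on-disk size ourselves (all names here are illustrative):

```go
// safetyMarginBytes covers two of bbolt's default 16 MiB file-growth steps,
// as described above, so the file should never grow past the configured limit.
const safetyMarginBytes = 32 * 1024 * 1024

// wouldExceedLimit reports whether a transaction adding incomingBytes should be
// rejected, given the current on-disk file size and the configured maximum.
// It is deliberately conservative: some space near the limit stays unused.
func wouldExceedLimit(currentFileBytes, incomingBytes, maxBytes int64) bool {
	return currentFileBytes+incomingBytes > maxBytes-safetyMarginBytes
}
```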

@andrzej-stencel I recall discussing this in the past. Do you recall any potential problems with this feature?

I think it's a very good idea in general.

If the extension reaches this maximum during a call to Set(), it should automatically compact if appropriate

I'm not sure if this is a good idea. I'm worried about the time it may take to perform the compaction operation. If compaction takes a significant amount of time (> seconds), this might cause a timeout in the collector pipeline. Is this OK or is it better to respond immediately "I don't have space"? @swiatekm what do you think?

I don't think we should ever compact outside the current conditions. Even if the on-disk file size is close to the limit, we can still accept writes as long as the database has enough free space internally. If there's not enough free space, compaction won't do anything for us either way.

@mattsains
Contributor Author

I don't think we should ever compact outside the current conditions. Even if the on-disk file size is close to the limit, we can still accept writes as long as the database has enough free space internally. If there's not enough free space, compaction won't do anything for us either way.

I think this is a great point and a good argument against compacting

@mattsains
Contributor Author

In terms of the bbolt max size, I'm already most of the way through a PR to add a maximum size feature to bbolt, so I'll try that, and if I don't get traction I can consider the idea you've suggested.

@swiatekm
Contributor

Adding it to bbolt is definitely the right way to go! But I also wouldn't be surprised if it wasn't accepted, depending on how complex the implementation ends up being. So it's good to have a relatively simple fallback.

@mattsains
Contributor Author

Update: I have submitted a PR to bbolt to add this feature, and there has been engagement on it so far: etcd-io/bbolt#929. I'll keep working on it.

@mattsains
Contributor Author

I've merged the feature into bbolt, so as soon as the next bbolt release happens, we can implement this in the filestorage extension.

@mattsains
Contributor Author

Here's a draft PR that would complete this feature request: #39667

and the dependent PR in the xextension package: open-telemetry/opentelemetry-collector#12925

@mattsains
Contributor Author

mattsains commented Apr 29, 2025

I'm not able to get the PR to build because the newest version of bbolt requires go1.24, which I think we will only be moving to when go1.25 is released (probably around August).
