Skip to content

Attempt to improve CI caching, v2 #6544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

max-sixty
Copy link
Collaborator

Reverts #6543 as discussed.

@keewis
Copy link
Collaborator

keewis commented Apr 30, 2022

I wonder if we should switch from conda-incubator/setup-miniconda to mamba-org/provision-with-micromamba, which uses micromamba and thus should by a lot faster than miniconda with mamba? It also has some kind of built-in caching, which means that we could get rid of a few steps here, and maybe even avoid the tricky caching part?

@max-sixty
Copy link
Collaborator Author

max-sixty commented Apr 30, 2022

That sounds like a great idea @keewis

Edit: I can hopefully have a go at this next week

- name: Install conda dependencies
run: |
mamba env update -f $CONDA_ENV_FILE
if: steps.cache-env.outputs.cache-hit != 'true'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be removed. A dependency's version can change without any change in the cache key.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dcherian

My understanding of conda (which I only use when developing xarray, so don't know) is that running the update step updates all the minor versions. And so that would cause cache misses on any patch version increase.

So my understanding of this if is that to avoid updating patch versions, so we still get a hit, at the cost of being out-of-date for a few days (my understanding is that the cache has 1 week lifetime, so it doesn't last forever).

Is your point that this is bad, because we do want to update the patch version in these cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your point that this is bad, because we do want to update the patch version in these cases?

Yes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the cost of being out-of-date for a few days (my understanding is that the cache has 1 week lifetime, so it doesn't last forever).

Actually they last a week if they haven't been accessed, so this doesn't work.


I don't think it will helpful if we remove the if, because GHA caches are write-once, so as soon as a new version comes out, we'll have to reinstall it until the cache key changes.

In other projects I've used a poetry lock file, which is useful to cache on, and then updated at some frequency. But I'm not sure if that approach is supported by conda.
I can't seem to figure out whether we can key on a file that's generated as part of the run rather than in the repo — that would let us make the minimum changes to the process while getting the cache to work. We could try it if people agree it would be a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove the if, then we'll spend time "solving" the environment every time but will save time because we aren't downloading + installing every single package, just those that have updated since the cache was created (IIUC). Please correct me if I am wrong here.

This should still be a good improvement, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just those that have updated since the cache was created

Yes but it's only created the first time the key hit. Caches are never overwritten. So this would decay a lot.

The docs suggest we add the date onto the key, which seems not that ambitious, but maybe better than nothing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a thought that we could add the merge's base ref onto the key (i.e. the commit on main that it branches off), so it updated every time we merged. I need to think through this more though.

@keewis keewis mentioned this pull request Jun 8, 2022
1 task
@keewis keewis closed this in #6674 Jun 10, 2022
@max-sixty max-sixty deleted the revert-6543-revert-6534-gha-cache branch June 10, 2022 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants