-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Attempt to improve CI caching, v2 #6544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I wonder if we should switch from |
That sounds like a great idea @keewis Edit: I can hopefully have a go at this next week |
- name: Install conda dependencies | ||
run: | | ||
mamba env update -f $CONDA_ENV_FILE | ||
if: steps.cache-env.outputs.cache-hit != 'true' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be removed. A dependency's version can change without any change in the cache key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dcherian
My understanding of conda (which I only use when developing xarray, so don't know) is that running the update
step updates all the minor versions. And so that would cause cache misses on any patch version increase.
So my understanding of this if
is that to avoid updating patch versions, so we still get a hit, at the cost of being out-of-date for a few days (my understanding is that the cache has 1 week lifetime, so it doesn't last forever).
Is your point that this is bad, because we do want to update the patch version in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is your point that this is bad, because we do want to update the patch version in these cases?
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the cost of being out-of-date for a few days (my understanding is that the cache has 1 week lifetime, so it doesn't last forever).
Actually they last a week if they haven't been accessed, so this doesn't work.
I don't think it will helpful if we remove the if
, because GHA caches are write-once, so as soon as a new version comes out, we'll have to reinstall it until the cache key changes.
In other projects I've used a poetry lock file, which is useful to cache on, and then updated at some frequency. But I'm not sure if that approach is supported by conda.
I can't seem to figure out whether we can key on a file that's generated as part of the run rather than in the repo — that would let us make the minimum changes to the process while getting the cache to work. We could try it if people agree it would be a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove the if
, then we'll spend time "solving" the environment every time but will save time because we aren't downloading + installing every single package, just those that have updated since the cache was created (IIUC). Please correct me if I am wrong here.
This should still be a good improvement, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just those that have updated since the cache was created
Yes but it's only created the first time the key hit. Caches are never overwritten. So this would decay a lot.
The docs suggest we add the date onto the key, which seems not that ambitious, but maybe better than nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a thought that we could add the merge's base ref onto the key (i.e. the commit on main that it branches off), so it updated every time we merged. I need to think through this more though.
Reverts #6543 as discussed.