Skip to content

Attempt to improve CI caching, v2 #6544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,9 @@ jobs:

echo "PYTHON_VERSION=${{ matrix.python-version }}" >> $GITHUB_ENV

# This and the next few are based on https://github.com/conda-incubator/setup-miniconda#caching-environments
- name: Cache conda
id: cache-conda
uses: actions/cache@v3
with:
path: ~/conda_pkgs_dir
Expand All @@ -99,9 +101,17 @@ jobs:
python-version: ${{ matrix.python-version }}
use-only-tar-bz2: true

- name: Cache conda env
id: cache-env
uses: actions/cache@v3
with:
path: /usr/share/miniconda/envs/xarray-tests
key: ${{ runner.os }}-conda-py${{ matrix.python-version }}-${{ hashFiles('ci/requirements/**.yml') }}-${{ matrix.env }}

- name: Install conda dependencies
run: |
mamba env update -f $CONDA_ENV_FILE
if: steps.cache-env.outputs.cache-hit != 'true'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be removed. A dependency's version can change without any change in the cache key.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dcherian

My understanding of conda (which I only use when developing xarray, so don't know) is that running the update step updates all the minor versions. And so that would cause cache misses on any patch version increase.

So my understanding of this if is that to avoid updating patch versions, so we still get a hit, at the cost of being out-of-date for a few days (my understanding is that the cache has 1 week lifetime, so it doesn't last forever).

Is your point that this is bad, because we do want to update the patch version in these cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your point that this is bad, because we do want to update the patch version in these cases?

Yes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the cost of being out-of-date for a few days (my understanding is that the cache has 1 week lifetime, so it doesn't last forever).

Actually they last a week if they haven't been accessed, so this doesn't work.


I don't think it will helpful if we remove the if, because GHA caches are write-once, so as soon as a new version comes out, we'll have to reinstall it until the cache key changes.

In other projects I've used a poetry lock file, which is useful to cache on, and then updated at some frequency. But I'm not sure if that approach is supported by conda.
I can't seem to figure out whether we can key on a file that's generated as part of the run rather than in the repo — that would let us make the minimum changes to the process while getting the cache to work. We could try it if people agree it would be a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove the if, then we'll spend time "solving" the environment every time but will save time because we aren't downloading + installing every single package, just those that have updated since the cache was created (IIUC). Please correct me if I am wrong here.

This should still be a good improvement, no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just those that have updated since the cache was created

Yes but it's only created the first time the key hit. Caches are never overwritten. So this would decay a lot.

The docs suggest we add the date onto the key, which seems not that ambitious, but maybe better than nothing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a thought that we could add the merge's base ref onto the key (i.e. the commit on main that it branches off), so it updated every time we merged. I need to think through this more though.


# We only want to install this on one run, because otherwise we'll have
# duplicate annotations.
Expand Down