Add a GRIB backend via ECMWF cfgrib / ecCodes #2476

Merged (29 commits) on Oct 17, 2018

Changes from 18 commits

Commits (29):
72606f7  Integration of ECMWF cfgrib driver to read GRIB files into xarray. (alexamici, Oct 9, 2018)
71fcbe7  Remove all coordinate renaming from the cfgrib backend. (alexamici, Oct 9, 2018)
6faa7b9  Move flavour selection to `cfgrib.Dataset.from_path`. (alexamici, Oct 9, 2018)
1469a0e  Sync xarray backend import style with xarray. (alexamici, Oct 9, 2018)
12811e8  Make use of the new xarray.backends.FileCachingManager. (alexamici, Oct 9, 2018)
a4409b6  Add just-in-case locking for ecCodes. (alexamici, Oct 9, 2018)
80b8788  Explicitly assign attributes to CfGribArrayWrapper (alexamici, Oct 10, 2018)
9dfd660  Add missing locking in CfGribArrayWrapper and use explicit_indexing_a… (alexamici, Oct 10, 2018)
edc4e85  Add a comment about the ugly work-around needed for filter_by_keys. (alexamici, Oct 10, 2018)
9b5335a  Declare correct indexing support. (alexamici, Oct 10, 2018)
186a504  Merge branch 'upstream' into feature/grib-support-via-cfgrib (alexamici, Oct 14, 2018)
485a409  Add TestCfGrib test class. (alexamici, Oct 14, 2018)
81f18c2  cfgrib doesn't store a file reference so no need for CachingFileManager. (alexamici, Oct 14, 2018)
5dedb3f  Add cfgrib testing to Travis-CI. (alexamici, Oct 14, 2018)
831ae4f  Naming. (alexamici, Oct 14, 2018)
6372e6e  Fix line lengths and get to 100% coverage. (alexamici, Oct 14, 2018)
8e9b2e3  Add reference to *cfgrib* engine in inline docs. (alexamici, Oct 14, 2018)
07b9469  First cut of the documentation. (alexamici, Oct 14, 2018)
340720a  Tentative test cfgrib under dask.distributed. (alexamici, Oct 14, 2018)
4d84f70  Better integration test. (alexamici, Oct 14, 2018)
0b027db  Remove explicit copyright and license boilerplate to harmonise with o… (alexamici, Oct 15, 2018)
a4ead54  Add a usage example. (alexamici, Oct 15, 2018)
ec80d86  Fix code style. (alexamici, Oct 15, 2018)
f30b7d0  Fix doc style. (alexamici, Oct 16, 2018)
223d25c  Fix docs testing. The example.grib file is not accessible. (alexamici, Oct 17, 2018)
2ef993f  Merge remote-tracking branch 'upstream/master' into feature/grib-supp… (alexamici, Oct 17, 2018)
bbf01e3  Fix merge in docs. (alexamici, Oct 17, 2018)
da2b9dd  Fix merge in docs. (alexamici, Oct 17, 2018)
eda96a4  Fix doc style. (alexamici, Oct 17, 2018)

2 changes: 2 additions & 0 deletions ci/requirements-py36.yml
@@ -21,8 +21,10 @@ dependencies:
- bottleneck
- zarr
- pseudonetcdf>=3.0.1
- eccodes
- pip:
- coveralls
- pytest-cov
- pydap
- lxml
- cfgrib
4 changes: 3 additions & 1 deletion doc/installing.rst
@@ -34,7 +34,9 @@ For netCDF and IO
- `rasterio <https://github.com/mapbox/rasterio>`__: for reading GeoTiffs and
other gridded raster datasets.
- `iris <https://github.com/scitools/iris>`__: for conversion to and from iris'
Cube objects.
Cube objects
- `cfgrib <https://github.com/ecmwf/cfgrib>`__: for reading GRIB files via the
*ECMWF ecCodes* library.

For accelerating xarray
~~~~~~~~~~~~~~~~~~~~~~~
17 changes: 17 additions & 0 deletions doc/io.rst
@@ -635,6 +635,23 @@ For example:
Not all native zarr compression and filtering options have been tested with
xarray.

.. _io.cfgrib:

GRIB format via cfgrib
----------------------

xarray supports reading GRIB files via the ECMWF cfgrib_ Python driver and the
ecCodes_ C library, if they are installed. To open a GRIB file, supply
``engine='cfgrib'`` to :py:func:`~xarray.open_dataset`, as in the example
below.

We recommend installing ecCodes via conda and cfgrib via pip::

conda install -c conda-forge eccodes
pip install cfgrib
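
A minimal usage sketch, assuming a local GRIB file named ``example.grib``::

    import xarray as xr

    ds = xr.open_dataset('example.grib', engine='cfgrib')
    print(ds)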

.. _cfgrib: https://github.com/ecmwf/cfgrib
.. _ecCodes: https://confluence.ecmwf.int/display/ECC/ecCodes+Home

.. _io.pynio:

Formats supported by PyNIO
6 changes: 5 additions & 1 deletion doc/whats-new.rst
@@ -72,7 +72,11 @@ Enhancements
:py:meth:`~xarray.DataArray.interp`, and
:py:meth:`~xarray.Dataset.interp`.
By `Spencer Clark <https://github.com/spencerkclark>`_

- Added a new backend for the GRIB file format based on the ECMWF *cfgrib*
Python driver and the *ecCodes* C library. (:issue:`2475`)
By `Alessandro Amici <https://github.com/alexamici>`_,
sponsored by `ECMWF <https://github.com/ecmwf>`_.

Bug fixes
~~~~~~~~~

2 changes: 2 additions & 0 deletions xarray/backends/__init__.py
@@ -5,6 +5,7 @@
"""
from .common import AbstractDataStore
from .file_manager import FileManager, CachingFileManager, DummyFileManager
from .cfgrib_ import CfGribDataStore
from .memory import InMemoryDataStore
from .netCDF4_ import NetCDF4DataStore
from .pydap_ import PydapDataStore
@@ -18,6 +19,7 @@
'AbstractDataStore',
'FileManager',
'CachingFileManager',
'CfGribDataStore',
'DummyFileManager',
'InMemoryDataStore',
'NetCDF4DataStore',
12 changes: 9 additions & 3 deletions xarray/backends/api.py
@@ -162,7 +162,8 @@ def open_dataset(filename_or_obj, group=None, decode_cf=True,
decode_coords : bool, optional
If True, decode the 'coordinates' attribute to identify coordinates in
the resulting dataset.
engine : {'netcdf4', 'scipy', 'pydap', 'h5netcdf', 'pynio', 'pseudonetcdf'}, optional
engine : {'netcdf4', 'scipy', 'pydap', 'h5netcdf', 'pynio', 'cfgrib',
'pseudonetcdf'}, optional
Engine to use when reading files. If not provided, the default engine
is chosen based on available dependencies, with a preference for
'netcdf4'.
@@ -296,6 +297,9 @@ def maybe_decode_store(store, lock=False):
elif engine == 'pseudonetcdf':
store = backends.PseudoNetCDFDataStore.open(
filename_or_obj, lock=lock, **backend_kwargs)
elif engine == 'cfgrib':
store = backends.CfGribDataStore(
filename_or_obj, lock=lock, **backend_kwargs)
else:
raise ValueError('unrecognized engine for open_dataset: %r'
% engine)
@@ -356,7 +360,8 @@ def open_dataarray(filename_or_obj, group=None, decode_cf=True,
decode_coords : bool, optional
If True, decode the 'coordinates' attribute to identify coordinates in
the resulting dataset.
engine : {'netcdf4', 'scipy', 'pydap', 'h5netcdf', 'pynio'}, optional
engine : {'netcdf4', 'scipy', 'pydap', 'h5netcdf', 'pynio', 'cfgrib'},
optional
Engine to use when reading files. If not provided, the default engine
is chosen based on available dependencies, with a preference for
'netcdf4'.
@@ -486,7 +491,8 @@ def open_mfdataset(paths, chunks=None, concat_dim=_CONCAT_DIM_DEFAULT,
of all non-null values.
preprocess : callable, optional
If provided, call this function on each dataset prior to concatenation.
engine : {'netcdf4', 'scipy', 'pydap', 'h5netcdf', 'pynio'}, optional
engine : {'netcdf4', 'scipy', 'pydap', 'h5netcdf', 'pynio', 'cfgrib'},
optional
Engine to use when reading files. If not provided, the default engine
is chosen based on available dependencies, with a preference for
'netcdf4'.
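
The docstrings above add 'cfgrib' to the accepted engines of open_dataset,
open_dataarray and open_mfdataset alike. A multi-file sketch, assuming a
placeholder glob data/*.grib of GRIB files that open_mfdataset can combine:

    import xarray as xr

    # Each matching file is opened with the cfgrib engine; the pieces are
    # then combined by the usual open_mfdataset rules.
    ds = xr.open_mfdataset('data/*.grib', engine='cfgrib')
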
97 changes: 97 additions & 0 deletions xarray/backends/cfgrib_.py
@@ -0,0 +1,97 @@
#
# Copyright 2017-2018 European Centre for Medium-Range Weather Forecasts.

[Review thread on this copyright/license header]

Member:
I don't really object to this notice, but we don't include it in other source code files for xarray, so it looks a little out of place (perhaps we should?). Everything is Apache 2 licensed already, and owned by contributors (or whoever they assign it to, such as an employer).

Collaborator Author (@alexamici, Oct 14, 2018):
I'm not sure how to handle this one. ECMWF is quite sensitive to license and IPR matters, so I added the licence boilerplate to all cfgrib files and later simply copied the existing backend code as part of the PR.

I'm the material author of the code and my name will appear in the contributors, but I've been working fully funded by ECMWF as an external contractor, so it looks like proper attribution would be lost in this case.

I need to ask @StephanSiemen whether they object to removing the copyright notice, or what else they propose. To my knowledge this is the first contribution to an external open-source project funded by ECMWF in this way, and we are learning how to handle these kinds of details as we go.

Member:
It says "Copyright 2014-2018, xarray Developers" in our README.rst file, which was modeled on projects like NumPy: https://github.com/numpy/numpy/blob/master/LICENSE.txt

I see now that our LICENSE file just has the original Apache license text. Perhaps we should add the more specific "Copyright xarray developers" line, as a project like TensorFlow does: https://github.com/tensorflow/tensorflow/blob/master/LICENSE

Member:
@alexamici (and ECMWF by proxy, I suppose): first, I want to make sure you understand that I'm very encouraged by your recent development of cfgrib as an open-source package and by your contributions to xarray. They are much appreciated, and they stand to benefit a wide swath of the geoscience community.

That said, at this point I'm somewhat against adding this copyright/license header, for these reasons:

  • Xarray already has a clear copyright statement that is permissive enough to allow you (and ECMWF) to maintain copyright over your contributions.
  • We don't do this anywhere else in the xarray code base. Though many of the contributors who have developed xarray have done so as employees or contractors of various organizations, we've not adopted this level of documentation with regard to the original author or subsequent editors.

Now, I understand that it can be important for organizations to make their open-source contributions visible, so we want to make sure this sort of engagement can continue to happen. We've recently joined NumFOCUS, in part to give us access to proper legal advice when necessary. We can certainly solicit their advice here if we have technical questions.

It's probably worth pinging the rest of @pydata/xarray to get their thoughts here.

Comment:
@alexamici, @shoyer, @jhamman: we at ECMWF are very happy for the copyright to be adjusted to match other contributions in xarray. Any acknowledgement of the contribution is much appreciated. Thanks.

Collaborator Author:
Looks like I was being overzealous :)

I removed the copyright notice and licence boilerplate.

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Authors:
# Alessandro Amici - B-Open - https://bopen.eu
#

from __future__ import absolute_import, division, print_function

import numpy as np

from .. import Variable
from ..core import indexing
from ..core.utils import Frozen, FrozenOrderedDict
from .common import AbstractDataStore, BackendArray
from .locks import ensure_lock, SerializableLock

# FIXME: Add a dedicated lock, even if ecCodes is supposed to be thread-safe
# in most circumstances. See:
# https://confluence.ecmwf.int/display/ECC/Frequently+Asked+Questions
ECCODES_LOCK = SerializableLock()


class CfGribArrayWrapper(BackendArray):
def __init__(self, datastore, array):
self.datastore = datastore
self.shape = array.shape
self.dtype = array.dtype
self.array = array

def __getitem__(self, key):
return indexing.explicit_indexing_adapter(
key, self.shape, indexing.IndexingSupport.OUTER, self._getitem)

def _getitem(self, key):
with self.datastore.lock:
return self.array[key]


class CfGribDataStore(AbstractDataStore):
"""
Implements the ``xr.AbstractDataStore`` read-only API for a GRIB file.
"""
def __init__(self, filename, lock=None, **backend_kwargs):
import cfgrib
if lock is None:
lock = ECCODES_LOCK
self.lock = ensure_lock(lock)

# NOTE: filter_by_keys is a dict, but CachingFileManager only accepts
# hashable types.
if 'filter_by_keys' in backend_kwargs:
filter_by_keys_items = backend_kwargs['filter_by_keys'].items()
backend_kwargs['filter_by_keys'] = tuple(filter_by_keys_items)

self.ds = cfgrib.open_file(filename, mode='r', **backend_kwargs)

def open_store_variable(self, name, var):
if isinstance(var.data, np.ndarray):
data = var.data
else:
wrapped_array = CfGribArrayWrapper(self, var.data)
data = indexing.LazilyOuterIndexedArray(wrapped_array)

encoding = self.ds.encoding.copy()
encoding['original_shape'] = var.data.shape

return Variable(var.dimensions, data, var.attributes, encoding)

def get_variables(self):
return FrozenOrderedDict((k, self.open_store_variable(k, v))
for k, v in self.ds.variables.items())

def get_attrs(self):
return Frozen(self.ds.attributes)

def get_dimensions(self):
return Frozen(self.ds.dimensions)

def get_encoding(self):
dims = self.get_dimensions()
encoding = {
'unlimited_dims': {k for k, v in dims.items() if v is None},
}
return encoding
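
As the work-around above suggests, cfgrib-specific options such as
filter_by_keys are passed through backend_kwargs. A minimal sketch, assuming
the example.grib file that the tests below rely on:

    import xarray as xr

    # Keep only the temperature messages; CfGribDataStore forwards the
    # filter (as a tuple of items) to cfgrib.open_file().
    ds = xr.open_dataset(
        'example.grib', engine='cfgrib',
        backend_kwargs={'filter_by_keys': {'shortName': 't'}})
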
1 change: 1 addition & 0 deletions xarray/tests/__init__.py
@@ -77,6 +77,7 @@ def LooseVersion(vstring):
has_zarr, requires_zarr = _importorskip('zarr', minversion='2.2')
has_np113, requires_np113 = _importorskip('numpy', minversion='1.13.0')
has_iris, requires_iris = _importorskip('iris')
has_cfgrib, requires_cfgrib = _importorskip('cfgrib')

# some special cases
has_scipy_or_netCDF4 = has_scipy or has_netCDF4
Binary file added xarray/tests/data/example.grib
24 changes: 23 additions & 1 deletion xarray/tests/test_backends.py
@@ -33,7 +33,7 @@
has_dask, has_netCDF4, has_scipy, network, raises_regex, requires_cftime,
requires_dask, requires_h5netcdf, requires_netCDF4, requires_pathlib,
requires_pseudonetcdf, requires_pydap, requires_pynio, requires_rasterio,
requires_scipy, requires_scipy_or_netCDF4, requires_zarr)
requires_scipy, requires_scipy_or_netCDF4, requires_zarr, requires_cfgrib)
from .test_dataset import create_test_data

try:
@@ -2463,6 +2463,28 @@ def test_weakrefs(self):
assert_identical(actual, expected)


@requires_cfgrib
class TestCfGrib(object):

def test_read(self):
expected = {'number': 2, 'time': 3, 'air_pressure': 2, 'latitude': 3,
'longitude': 4}
with open_example_dataset('example.grib', engine='cfgrib') as ds:
assert ds.dims == expected
assert list(ds.data_vars) == ['z', 't']
assert ds['z'].min() == 12660.

def test_read_filter_by_keys(self):
kwargs = {'filter_by_keys': {'shortName': 't'}}
expected = {'number': 2, 'time': 3, 'air_pressure': 2, 'latitude': 3,
'longitude': 4}
with open_example_dataset('example.grib', engine='cfgrib',
backend_kwargs=kwargs) as ds:
assert ds.dims == expected
assert list(ds.data_vars) == ['t']
assert ds['t'].min() == 231.


@requires_pseudonetcdf
@pytest.mark.filterwarnings('ignore:IOAPI_ISPH is assumed to be 6370000')
class TestPseudoNetCDFFormat(object):