Performance problem when doing computation between two arrays with discontinuous indexes 

#### MCVE Code Sample

```python
import xarray as xr 
import numpy as np 

# Creating array
ds = xr.Dataset()
ds["longitude"] = np.arange(0.1,2000.1,1)
ds["latitude"] = range(3000)
ds["step"] = range(50)
ds["field"] = (("step","latitude","longitude"),np.random.randn(50,3000,2000))
ds.to_netcdf("big_array.nc")

# Create another array 
ds = xr.Dataset()
ds["longitude"] = np.arange(500.1,600.1,1) # Coordinate are a continuous subset of the first array
ds["latitude"] = np.arange(510,660)
ds["id"] = range(50)
ds["field"] = (("longitude","latitude","id"),np.random.randn(100,150,50))
ds.to_netcdf("slicing.nc")

# Create another array with "discontinuity" in longitude dimension 
ds = xr.Dataset()
ds["longitude"] = list(np.arange(500.1,598.1,1)) +[622.1,640.1]
ds["latitude"] = range(510,660)
ds["id"] = range(10)
ds["mask"] = (("longitude","latitude","id"),np.random.randn(100,150,10))
ds.to_netcdf("no_slicing.nc")

# Load the Three arrays 

da = xr.open_dataset("big_array.nc")
db = xr.open_dataset("slicing.nc").isel(id=0)
dc = xr.open_dataarray("no_slicing.nc").isel(id=0)

%timeit da*db
# 32.3 ms ± 5.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit da*dc
#2.13 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

def slicing_operation(dc,da):
    """
    Slicing when knowing that dc is a subpart of da
    """
    import operator
    min_lat = np.max([dc.latitude.values.min(),da.latitude.values.min()])
    max_lat = np.min([dc.latitude.values.max(),da.latitude.values.max()])
    index_lat_field = operator.and_(da.latitude >= min_lat,da.latitude <= max_lat)
    min_lon = np.max([dc.longitude.values.min(),da.longitude.values.min()])
    max_lon = np.min([dc.longitude.values.max(),da.longitude.values.max()])
    index_lon_field = operator.and_(da.longitude >= min_lon,da.longitude <= max_lon)
    
    # Extending dc such that it covers all longitude of da 
    # Creating an empty array. 
    dempty = xr.Dataset()
    dempty["longitude"] = da.longitude.isel(longitude=index_lon_field).values
    dempty["latitude"] = da.latitude.isel(latitude=index_lat_field).values
    dc_ext = dc.broadcast_like(dempty)
    # Performing multiplication and align 
    res,_ = xr.align((dc_ext *da),dc)
    return res

%timeit slicing_operation(dc,da)
#43.9 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

#Checking if we get the same results
res_0 = dc*da
res = slicing_operation(dc,da)
print(abs(res_0 - res).max())
```


#### Problem Description

A performance problem occured when performing operation between da and dc. 
Computing ```da *db``` takes ~32 ms 
While db and dc have the same shape, opeartion  ```dc * da``` takes ~2.13s (so x 60 in cpu times). 
This seems linked to the fact tha dc includes discontinuous indexes of da. 
 
Indeed, when extending dc (such that its longitude are continuous when compared at da) the computation time take ~44 ms. 
For me it seems that when doing ```da * dc```  da dataset is fully loaded in memory first, while only the  part involved in computation is loaded when doing ``` da *db ```.
Indeed when doing : 
```python 
%time da.load()
```
we get the following result:
```
CPU times: user 0 ns, sys: 1.88 s, total: 1.88 s

Wall time: 1.89 s
```
On top of that, ```da *db``` scales with the number of ```id ``` selected while it is not the case for ```da *dc ```.
#### Output of ``xr.show_versions()``
<details>
INSTALLED VERSIONS
------------------
commit: None

python: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:33:48) 

[GCC 7.3.0]

python-bits: 64

OS: Linux

OS-release: 3.10.0-957.27.2.el7.x86_64

machine: x86_64

processor: x86_64

byteorder: little

LC_ALL: None

LANG: en_US.UTF-8

LOCALE: en_US.UTF-8

libhdf5: 1.10.5

libnetcdf: 4.7.3


xarray: 0.14.1

pandas: 0.25.3

numpy: 1.17.4

scipy: 1.4.1

netCDF4: 1.5.3

pydap: None

h5netcdf: None

h5py: 2.10.0

Nio: None

zarr: None

cftime: 1.0.4.2

nc_time_axis: None

PseudoNetCDF: None

rasterio: None

cfgrib: 0.9.7.6

iris: None

bottleneck: 1.3.1

dask: 2.9.1

distributed: 2.9.1

matplotlib: 3.1.2

cartopy: None

seaborn: 0.9.0

numbagg: None

setuptools: 44.0.0.post20200106

pip: 19.3.1

conda: None

pytest: 5.3.2

IPython: 7.11.1

sphinx: 2.3.1
</details>


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Performance problem when doing computation between two arrays with discontinuous indexes #3755

MCVE Code Sample

Problem Description

Output of `xr.show_versions()`

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Performance problem when doing computation between two arrays with discontinuous indexes #3755

Description

MCVE Code Sample

Problem Description

Output of xr.show_versions()

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Output of `xr.show_versions()`