Description
It would be awesome to add coords while concatenating. Basically, combining this into one line:
DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients"); DA_data.coords["Patients"] = list(D_patient_DA.keys())
For this made-up dataset, imagine 100 patients, 12 months, and 10000 attributes, which would be a typical 3D dataset. I end up with a bunch of 2D DataArrays (rows = months, columns = attributes). Each DataArray is a value in my dictionary, and the patient it came from is the key (i.e. {patient_x: DataArray_x}).
I'm trying to do DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients")
but it's not working and I need to split it up like DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients"); DA_data.coords["Patients"] = list(D_patient_DA.keys())
Am I not writing the one-liner in the right format?
The docs say coords : {‘minimal’, ‘different’, ‘all’ or list of str}
so it seems like it should work.
Here is my code for generating fake data for this problem:
import xarray as xr
import numpy as np
from collections import OrderedDict
np.random.seed(1618033)
#Set dimensions
a,b,c = 100,12,10000 #100 patients, 12 months, 10000 attributes
#Create labels
patients = ["patient_%d" % i for i in range(a)]
months = [j for j in range(b)]
attributes = ["attr_%d" % k for k in range(c)]
#Dict of DataArrays
D_patient_DA = OrderedDict()
for i, patient in enumerate(patients):
    A_placeholder = np.zeros((b,c))
    for j, month in enumerate(months):
        #Random attribute values for this month
        V_attrExp = np.random.random(c)
        #Fill array with row
        A_placeholder[j,:] = V_attrExp
    #Assign DataArray for every patient
    D_patient_DA[patient] = xr.DataArray(A_placeholder, coords = [months, attributes], dims = ["Months","Attributes"])
#I'd like to do this:
#DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients")
#Traceback (most recent call last):
# File "Untitled.py", line 29, in <module>
# DA_data = xr.concat(list(D_patient_DA.values()), coords = list(D_patient_DA.keys()), dim="Patients")
# File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 114, in concat
# return f(objs, dim, data_vars, coords, compat, positions)
# File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 301, in _dataarray_concat
# positions)
# File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 207, in _dataset_concat
# concat_over = _calc_concat_over(datasets, dim, data_vars, coords)
# File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 186, in _calc_concat_over
# concat_over.update(process_subset_opt(coords, 'coords'))
# File "/Users/Mu/Dropbox/anaconda/lib/python3.5/site-packages/xarray/core/combine.py", line 177, in process_subset_opt
# % (subset, subset_long_name, invalid_vars))
#ValueError: some variables in coords are not coordinates on the first dataset: ['patient_0', 'patient_1', 'patient_2', 'patient_3', 'patient_4', 'patient_5', 'patient_6', 'patient_7', 'patient_8', 'patient_9', 'patient_10', 'patient_11', 'patient_12', 'patient_13', 'patient_14', 'patient_15', 'patient_16', 'patient_17', 'patient_18', 'patient_19', 'patient_20', 'patient_21', 'patient_22', 'patient_23', 'patient_24', 'patient_25', 'patient_26', 'patient_27', 'patient_28', 'patient_29', 'patient_30', 'patient_31', 'patient_32', 'patient_33', 'patient_34', 'patient_35', 'patient_36', 'patient_37', 'patient_38', 'patient_39', 'patient_40', 'patient_41', 'patient_42', 'patient_43', 'patient_44', 'patient_45', 'patient_46', 'patient_47', 'patient_48', 'patient_49', 'patient_50', 'patient_51', 'patient_52', 'patient_53', 'patient_54', 'patient_55', 'patient_56', 'patient_57', 'patient_58', 'patient_59', 'patient_60', 'patient_61', 'patient_62', 'patient_63', 'patient_64', 'patient_65', 'patient_66', 'patient_67', 'patient_68', 'patient_69', 'patient_70', 'patient_71', 'patient_72', 'patient_73', 'patient_74', 'patient_75', 'patient_76', 'patient_77', 'patient_78', 'patient_79', 'patient_80', 'patient_81', 'patient_82', 'patient_83', 'patient_84', 'patient_85', 'patient_86', 'patient_87', 'patient_88', 'patient_89', 'patient_90', 'patient_91', 'patient_92', 'patient_93', 'patient_94', 'patient_95', 'patient_96', 'patient_97', 'patient_98', 'patient_99']
#But I have to do this instead
DA_data = xr.concat(list(D_patient_DA.values()), dim="Patients")
DA_data.coords["Patients"] = list(D_patient_DA.keys())
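
For reference, the error message above suggests that `coords` selects which existing coordinate variables get concatenated, rather than supplying labels for the new dimension. One possible single-call form, assuming the `dim` argument of `xr.concat` accepts a named `pandas.Index` (as its docstring describes), is sketched below. The small array sizes and the names `D_small` and `DA_small` are made up for illustration and are not part of the code above.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in for the real data: 3 patients, 2 months, 4 attributes.
months = list(range(2))
attributes = ["attr_%d" % k for k in range(4)]
D_small = OrderedDict(
    (
        "patient_%d" % i,
        xr.DataArray(
            np.full((2, 4), float(i)),  # each patient's array is filled with its own index
            coords=[months, attributes],
            dims=["Months", "Attributes"],
        ),
    )
    for i in range(3)
)

# A named pandas.Index passed as `dim` names the new dimension
# and attaches the index values as its coordinate in the same call.
DA_small = xr.concat(
    list(D_small.values()),
    dim=pd.Index(list(D_small.keys()), name="Patients"),
)

print(DA_small.dims)
print(list(DA_small.coords["Patients"].values))
```

If this behaves as assumed, the same pattern applied to `D_patient_DA` would replace the two statements above with one `xr.concat` call.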