Description
Hello,
I have 6 hourly data (ERA Interim) for around 10 years. I want to calculate the annual 6 hourly climatology, i.e., 366*4 values, with each value corresponding to one 6 hourly interval of the year. I am chunking the data along longitude.
I'm using xarray 0.9.1 with Python 3.6 (Anaconda).
For a daily climatology on this data, I do the usual:
mean = data.groupby('time.dayofyear').mean(dim='time').compute()
For the 6 hourly version, I am trying the following:
test = (data['time.hour']/24 + data['time.dayofyear'])
test.name = 'dayHourly'
new_test = data.groupby(test).mean(dim='time').compute()
The first one (daily climatology) takes around 15 minutes for my data, whereas the second one ran for almost 30 minutes, after which I gave up and killed the process.
Is there some obvious reason why the first is much faster than the second? data in both cases is the 6 hourly dataset. And is there an alternative way of expressing this computation which would make it faster?
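In case it helps to reproduce this, here is a minimal sketch of the two groupings on synthetic data. The array shape, the variable name t2m, the longitude values and the two-year time range are all made up for illustration; my real data is much larger and chunked along longitude, so this only shows how the grouping keys are built, not the timing difference.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real dataset (assumption, not ERA Interim):
# two years of 6 hourly values, 2003 plus the leap year 2004,
# on a tiny 3-point longitude axis.
time = pd.date_range("2003-01-01", "2004-12-31 18:00", freq="6h")
data = xr.DataArray(
    np.random.default_rng(0).normal(size=(time.size, 3)),
    coords={"time": time, "lon": [0.0, 1.5, 3.0]},
    dims=("time", "lon"),
    name="t2m",  # hypothetical variable name
)

# Daily climatology: one group per day of year.
daily = data.groupby("time.dayofyear").mean(dim="time")

# 6 hourly climatology: fractional day of year as the grouping key,
# e.g. 1.0, 1.25, 1.5, 1.75 for the four time steps of 1 January.
day_hourly = data["time.dayofyear"] + data["time.hour"] / 24
day_hourly.name = "dayHourly"
six_hourly = data.groupby(day_hourly).mean(dim="time")

print(daily.sizes)       # number of daily groups
print(six_hourly.sizes)  # number of 6 hourly groups
```

The second grouping has four times as many groups as the first (366*4 instead of 366), which may be part of the slowdown, but I have not confirmed that.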
TIA,
Joy