Using groupby with custom index #1308

Closed
@JoyMonteiro

Hello,

I have 6-hourly data (ERA-Interim) covering around 10 years. I want to calculate the annual 6-hourly climatology, i.e., 366*4 values, each corresponding to one 6-hourly interval of the year. I am chunking the data along longitude.
I'm using xarray 0.9.1 with Python 3.6 (Anaconda).

For a daily climatology on this data, I do the usual:

mean = data.groupby('time.dayofyear').mean(dim='time').compute()

For the 6-hourly version, I am trying the following:

test = (data['time.hour']/24 + data['time.dayofyear'])
test.name = 'dayHourly'
new_test = data.groupby(test).mean(dim='time').compute()
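
For anyone wanting to reproduce the grouping on a small scale, here is a minimal sketch on synthetic data. It is not the original ERA-Interim setup: the array `data`, the variable name `t2m`, the three-point `lon` axis, and the key name `day_hourly` are all made up for illustration, and it uses the `.dt` accessor, which may not be available in xarray 0.9.1. It builds an integer key (one label per day-of-year and 6-hourly slot) instead of the fractional `dayofyear + hour/24` key. It only shows that the grouping yields 366*4 groups; it does not establish anything about speed on dask-chunked data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the 6-hourly data: 2000 (leap year) and 2001.
time = pd.date_range(
    "2000-01-01 00:00", "2001-12-31 18:00", freq=pd.Timedelta(hours=6)
)
data = xr.DataArray(
    np.random.default_rng(0).normal(size=(time.size, 3)),
    coords={"time": time, "lon": [0.0, 1.5, 3.0]},  # hypothetical grid
    dims=("time", "lon"),
    name="t2m",  # hypothetical variable name
)

# Integer group label: (day of year - 1) * 4 + 6-hourly slot (0..3).
key = (data["time"].dt.dayofyear - 1) * 4 + data["time"].dt.hour // 6
key.name = "day_hourly"  # groupby needs a named grouping array

# Mean over all years for each (day of year, slot) pair.
clim = data.groupby(key).mean(dim="time")
```

The result has a `day_hourly` dimension with one entry per 6-hourly slot of the year, in place of the original `time` dimension.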

The first one (daily climatology) takes around 15 minutes on my data, whereas the second one ran for almost 30 minutes, after which I gave up and killed the process.

Is there some obvious reason why the first is much faster than the second? data in both cases is the 6 hourly dataset. And is there an alternative way of expressing this computation which would make it faster?

TIA,
Joy
