Unexpected transform behavior on grouped dataset #3740

Closed
@fonnesbeck

I have a simple longitudinal biomedical dataset that I am grouping by the patient on whom the measurements were taken. Here are the first couple of groups:

1
   patient  obs  week  site  id  treat  age sex  twstrs  treatment
0        1    1     0     1   1  5000U   65   F      32          1
1        1    2     2     1   1  5000U   65   F      30          1
2        1    3     4     1   1  5000U   65   F      24          1
3        1    4     8     1   1  5000U   65   F      37          1
4        1    5    12     1   1  5000U   65   F      39          1
5        1    6    16     1   1  5000U   65   F      36          1

2
    patient  obs  week  site  id   treat  age sex  twstrs  treatment
6         2    1     0     1   2  10000U   70   F      60          2
7         2    2     2     1   2  10000U   70   F      26          2
8         2    3     4     1   2  10000U   70   F      27          2
9         2    4     8     1   2  10000U   70   F      41          2
10        2    5    12     1   2  10000U   70   F      65          2
11        2    6    16     1   2  10000U   70   F      67          2

However, when I try to transform these data, say by normalization, I get nonsensical results:

normalize = lambda x: (x - x.mean())/x.std()
normed = cdystonia_grouped.transform(normalize)
normed.head(10)

               patient  obs  week                 site                   id  \
0 -9223372036854775808   -1    -1 -9223372036854775808 -9223372036854775808   
1 -9223372036854775808    0     0 -9223372036854775808 -9223372036854775808   
2 -9223372036854775808    0     0 -9223372036854775808 -9223372036854775808   
3 -9223372036854775808    0     0 -9223372036854775808 -9223372036854775808   
4 -9223372036854775808    0     0 -9223372036854775808 -9223372036854775808   

                   age  twstrs            treatment  
0 -9223372036854775808       0 -9223372036854775808  
1 -9223372036854775808       0 -9223372036854775808  
2 -9223372036854775808      -1 -9223372036854775808  
3 -9223372036854775808       0 -9223372036854775808  
4 -9223372036854775808       1 -9223372036854775808  
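Reading this output against the input tables suggests a likely cause. This is an inference from the numbers above, not something confirmed in this report: the float results seem to have been cast back to the columns' original int64 dtype. Columns that are constant within a patient (patient, site, id, age, treatment) have a standard deviation of 0, so every normalized value is 0/0 = NaN. -9223372036854775808 is the int64 minimum, which is what NaN typically becomes in such a cast on common platforms. The remaining columns look like the correct z-scores truncated toward zero. The sketch below checks this against patient 1's rows from the first table, using a subset of the columns. Using `np.trunc` to mimic the float-to-int cast is an assumption of the sketch.

```python
import numpy as np
import pandas as pd

# Patient 1's rows from the first table above (a subset of the numeric columns).
patient1 = pd.DataFrame({
    "obs": [1, 2, 3, 4, 5, 6],
    "week": [0, 2, 4, 8, 12, 16],
    "site": [1] * 6,
    "twstrs": [32, 30, 24, 37, 39, 36],
})

def normalize(x):
    # z-score; Series.std() is the sample standard deviation (ddof=1)
    return (x - x.mean()) / x.std()

# Correct float z-scores, computed column by column.
z = patient1.astype(float).apply(normalize)

# "site" is constant, so its std is 0 and every entry is 0/0 = NaN.
print(z["site"].isna().all())

# Truncating the finite z-scores toward zero (what a float -> int64 cast does)
# gives integer columns that can be compared with the transform output above.
truncated = np.trunc(z[["obs", "week", "twstrs"]]).astype("int64")
print(truncated)
```

The first five rows of `truncated` match the `obs`, `week` and `twstrs` columns of the output above (-1, 0, 0, 0, 0 for `obs` and `week`; 0, 0, -1, 0, 1 for `twstrs`).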

The normalize function is straightforward and works fine when applied to a manually subset group:

normalize(cdystonia.twstrs[cdystonia.patient==1])

0   -0.181369
1   -0.544107
2   -1.632322
3    0.725476
4    1.088214
5    0.544107
Name: twstrs, dtype: float64
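Because the manual computation is correct, one workaround sketch is to convert the measurement to float before grouping and to transform a single selected column. This assumes the int64 dtype is the culprit. Selecting one column also avoids the non-numeric `treat` and `sex` columns, which are missing from the transformed output above. The data are the `twstrs` values for patients 1 and 2 from the tables above. How this behaves on the pandas version used in this report has not been verified.

```python
import pandas as pd

# Patients 1 and 2 from the tables above (only the columns needed here).
cdystonia = pd.DataFrame({
    "patient": [1] * 6 + [2] * 6,
    "twstrs": [32, 30, 24, 37, 39, 36, 60, 26, 27, 41, 65, 67],
})

def normalize(x):
    # z-score within one group; Series.std() defaults to the sample std (ddof=1)
    return (x - x.mean()) / x.std()

# Cast the int64 measurements to float *before* grouping, so there is no
# integer dtype for the transformed values to be coerced back to.
normed = (
    cdystonia["twstrs"]
    .astype(float)
    .groupby(cdystonia["patient"])
    .transform(normalize)
)

# Patient 1's rows should agree with the manual computation in the report.
manual = normalize(cdystonia.twstrs[cdystonia.patient == 1])
print(normed[cdystonia.patient == 1])
print((normed[cdystonia.patient == 1] - manual).abs().max())
```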

Any guidance here is much appreciated. I'm hoping it's something obvious.
