ENH: Keep original stack when exception happened in dataframe.apply #4254

Wuvist · 2013-07-16T10:48:28Z

when using raise e, the error stack app user received will begin in _apply_standard method.

Use raise instead, so that user could received error stack in funv(v) call. Easier for user to debug.

jtratner · 2013-07-16T12:06:07Z

Can you show stack traces before and after your change?

Wuvist · 2013-07-16T12:10:28Z

Before:
Traceback (most recent call last):
File "zrs.py", line 137, in
main()
File "zrs.py", line 131, in main
update_rec()
File "zrs.py", line 119, in update_rec
runFormula(formula, rec)
File "zrs.py", line 74, in runFormula
df.apply(pyformulas.process_similar, axis=1)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4118, in apply
return self._apply_standard(f, axis)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4193, in _apply_standard
raise e
TypeError: ('float() argument must be a string or a number', u'occurred at index 122')

After
Traceback (most recent call last):
File "zrs.py", line 137, in
main()
File "zrs.py", line 131, in main
update_rec()
File "zrs.py", line 119, in update_rec
runFormula(formula, rec)
File "zrs.py", line 74, in runFormula
df.apply(pyformulas.process_similar, axis=1)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4118, in apply
return self._apply_standard(f, axis)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4182, in _apply_standard
results[i] = func(v)
File "/Users/wuvist/Dropbox/zalora/zrs/scripts/pyformulas.py", line 25, in process_similar
cr = util.getCR(it)
File "/Users/wuvist/Dropbox/zalora/zrs/scripts/formula_util.py", line 65, in getCR
return product.Cart / float(product.Views)
TypeError: ('float() argument must be a string or a number', u'occurred at index 122')

This change will help me to location issue in formula_util.py, line 65, in getCR function quickly.

PS: pyformulas.py process_similar function is the function being applied to dataframe.

alvorithm · 2013-07-16T12:38:55Z

+1 this is essential for everyday work and the opaque behaviour was a major deterrent for apply/map/transform/groupby operations.

jreback · 2013-07-16T12:43:21Z

@Wuvist

can you post a complete example (and cut down the traceback to the a couple of the last lines)

ideally if you produce 2 (or more) examples that show different exceptions from the function (e.g. say NameError, TypeError...etc)....

I am thinking we might have a couple of tests that verify this

jreback · 2013-07-16T12:47:16Z

A trivial example (so post the existing and new errors)

In [1]: df = DataFrame(randn(10,2),columns=list('AB'))

In [3]: df['grp'] = 'foo'

In [4]: df.ix[5:,'grp'] = 'bar'

In [6]: df.groupby('grp').apply(lambda x: x.sumdf)
AttributeError: 'DataFrame' object has no attribute 'sumdf'

Wuvist · 2013-07-16T12:56:30Z

@jreback

I don't think there is need for tests, this is the standard way of re-raise exception:
http://docs.python.org/2/tutorial/errors.html#raising-exceptions

jreback · 2013-07-16T12:58:01Z

the reason for tests are to verify standard pandas behavior, how did you know that a future change preserves this behavior? that is the reason for tests (as the current behavior is wrong)

Wuvist · 2013-07-16T13:13:41Z

@jreback

Oh, you mean create a new test case in
pandas/tests/test_frame.py
?

But that would be a new pull request, right?

And, I can't figure out a way to write a test case for asserting error stack at the moment, as the stack calling the test itself maybe changed. >_<

There is also no point to checking the exception type being raise, and raise/ raise e give the same type of exception.

jreback · 2013-07-16T13:18:32Z

I mean add onto this PR (add another commit)

You don't need to assert the stack, just assert that the exception raises (AND the error message) is what you except

the example I gave above you give a very specific error message (which is different from what I posted).

It is very important to test this as certain higher level operations catch various exceptions and pass them on or raise as needed. This change even though looks very trivial could cause real changes if say you have an embedded apply and something is (maybe erroreously) relying upon a raised exception

here's how to do this (this is testing that the constructors raise correctly)

    def test_constructor_error_msgs(self):

        # mix dict and array, wrong size
        def testit():
            DataFrame({'A': {'a': 'a', 'b': 'b'},
                       'B': ['a', 'b', 'c']})
        assertRaisesRegexp(ValueError, "Mixing dicts with non-Series may lead to ambiguous ordering.", testit)

        # wrong size ndarray, GH 3105
        def testit():
            DataFrame(np.arange(12).reshape((4, 3)), columns=['foo', 'bar', 'baz'],
                      index=date_range('2000-01-01', periods=3))
        assertRaisesRegexp(ValueError, "Shape of passed values is \(3, 4\), indices imply \(3, 3\)", testit)

Wuvist · 2013-07-16T13:26:39Z

@jreback

Sorry, but I don't understand.

My PR is about preserving the call stack, not about the exception being raise nor its error message.

The exception raised and the error message is exactly the same before and after my patch.

jreback · 2013-07-16T13:31:14Z

is that ALWAYS the case?

are the error messages different?

Wuvist · 2013-07-16T13:32:19Z

@jreback
Yes, it's ALWAYS the case.

The errors messages are the same before & after this patch.

jreback · 2013-07-16T13:34:44Z

ahhh...I thought the applied function error was not bubbling up....so ok then...

Wuvist · 2013-07-16T13:36:40Z

^_^

cpcloud · 2013-07-21T02:28:20Z

@Wuvist can you set up travis? if u need help doing that i can show u how. thanks

Wuvist · 2013-07-21T10:40:46Z

@cpcloud

travis? The CI tools? I'm willing to try it. ^_^

but, how does travis related to this PR?

jreback · 2013-07-21T14:31:28Z

travis is important for error checking on all distributions / configurations, e.g. something that works on py2.7 might not work with 3.3, or a different numpy
it gives others a quick read on whether the PR has been error tested

in this case it is probably trivial. but it is good practice. so pls hook up. see contributing.md for a howto

thanks!

jtratner · 2013-08-10T01:51:06Z

@Wuvist if you have time to rebase this and setup travis - it'd be nice to have this fix.

jreback · 2013-08-21T13:32:23Z

closing in favor of #4549

Keep original stack when exception happened in dataframe.apply

f915dc6

jtratner mentioned this pull request Aug 13, 2013

ENH/API: Keep original traceback in DataFrame.apply #4549

Merged

jreback closed this Aug 21, 2013

Uh oh!

ENH: Keep original stack when exception happened in dataframe.apply #4254

ENH: Keep original stack when exception happened in dataframe.apply #4254

Uh oh!

Conversation

Wuvist commented Jul 16, 2013

Uh oh!

jtratner commented Jul 16, 2013

Uh oh!

Wuvist commented Jul 16, 2013

Uh oh!

alvorithm commented Jul 16, 2013

Uh oh!

jreback commented Jul 16, 2013

Uh oh!

jreback commented Jul 16, 2013

Uh oh!

Wuvist commented Jul 16, 2013

Uh oh!

jreback commented Jul 16, 2013

Uh oh!

Wuvist commented Jul 16, 2013

Uh oh!

jreback commented Jul 16, 2013

Uh oh!

Wuvist commented Jul 16, 2013

Uh oh!

jreback commented Jul 16, 2013

Uh oh!

Wuvist commented Jul 16, 2013

Uh oh!

jreback commented Jul 16, 2013

Uh oh!

Wuvist commented Jul 16, 2013

Uh oh!

cpcloud commented Jul 21, 2013

Uh oh!

Wuvist commented Jul 21, 2013

Uh oh!

jreback commented Jul 21, 2013

Uh oh!

jtratner commented Aug 10, 2013

Uh oh!

jreback commented Aug 21, 2013

Uh oh!

Uh oh!