Skip to content

ENH: Keep original stack when exception happened in dataframe.apply #4254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Wuvist
Copy link

@Wuvist Wuvist commented Jul 16, 2013

when using raise e, the error stack app user received will begin in _apply_standard method.

Use raise instead, so that user could received error stack in funv(v) call. Easier for user to debug.

@jtratner
Copy link
Contributor

Can you show stack traces before and after your change?

@Wuvist
Copy link
Author

Wuvist commented Jul 16, 2013

@jtratner

Before:
Traceback (most recent call last):
File "zrs.py", line 137, in
main()
File "zrs.py", line 131, in main
update_rec()
File "zrs.py", line 119, in update_rec
runFormula(formula, rec)
File "zrs.py", line 74, in runFormula
df.apply(pyformulas.process_similar, axis=1)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4118, in apply
return self._apply_standard(f, axis)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4193, in _apply_standard
raise e
TypeError: ('float() argument must be a string or a number', u'occurred at index 122')

After
Traceback (most recent call last):
File "zrs.py", line 137, in
main()
File "zrs.py", line 131, in main
update_rec()
File "zrs.py", line 119, in update_rec
runFormula(formula, rec)
File "zrs.py", line 74, in runFormula
df.apply(pyformulas.process_similar, axis=1)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4118, in apply
return self._apply_standard(f, axis)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 4182, in _apply_standard
results[i] = func(v)
File "/Users/wuvist/Dropbox/zalora/zrs/scripts/pyformulas.py", line 25, in process_similar
cr = util.getCR(it)
File "/Users/wuvist/Dropbox/zalora/zrs/scripts/formula_util.py", line 65, in getCR
return product.Cart / float(product.Views)
TypeError: ('float() argument must be a string or a number', u'occurred at index 122')

This change will help me to location issue in formula_util.py, line 65, in getCR function quickly.

PS: pyformulas.py process_similar function is the function being applied to dataframe.

@alvorithm
Copy link

+1 this is essential for everyday work and the opaque behaviour was a major deterrent for apply/map/transform/groupby operations.

@jreback
Copy link
Contributor

jreback commented Jul 16, 2013

@Wuvist

can you post a complete example (and cut down the traceback to the a couple of the last lines)

ideally if you produce 2 (or more) examples that show different exceptions from the function (e.g. say NameError, TypeError...etc)....

I am thinking we might have a couple of tests that verify this

@jreback
Copy link
Contributor

jreback commented Jul 16, 2013

A trivial example (so post the existing and new errors)

In [1]: df = DataFrame(randn(10,2),columns=list('AB'))

In [3]: df['grp'] = 'foo'

In [4]: df.ix[5:,'grp'] = 'bar'

In [6]: df.groupby('grp').apply(lambda x: x.sumdf)
AttributeError: 'DataFrame' object has no attribute 'sumdf'

@Wuvist
Copy link
Author

Wuvist commented Jul 16, 2013

@jreback

I don't think there is need for tests, this is the standard way of re-raise exception:
http://docs.python.org/2/tutorial/errors.html#raising-exceptions

@jreback
Copy link
Contributor

jreback commented Jul 16, 2013

the reason for tests are to verify standard pandas behavior, how did you know that a future change preserves this behavior? that is the reason for tests (as the current behavior is wrong)

@Wuvist
Copy link
Author

Wuvist commented Jul 16, 2013

@jreback

Oh, you mean create a new test case in
pandas/tests/test_frame.py
?

But that would be a new pull request, right?

And, I can't figure out a way to write a test case for asserting error stack at the moment, as the stack calling the test itself maybe changed. >_<

There is also no point to checking the exception type being raise, and raise/ raise e give the same type of exception.

@jreback
Copy link
Contributor

jreback commented Jul 16, 2013

I mean add onto this PR (add another commit)

You don't need to assert the stack, just assert that the exception raises (AND the error message) is what you except

the example I gave above you give a very specific error message (which is different from what I posted).

It is very important to test this as certain higher level operations catch various exceptions and pass them on or raise as needed. This change even though looks very trivial could cause real changes if say you have an embedded apply and something is (maybe erroreously) relying upon a raised exception

here's how to do this (this is testing that the constructors raise correctly)

    def test_constructor_error_msgs(self):

        # mix dict and array, wrong size
        def testit():
            DataFrame({'A': {'a': 'a', 'b': 'b'},
                       'B': ['a', 'b', 'c']})
        assertRaisesRegexp(ValueError, "Mixing dicts with non-Series may lead to ambiguous ordering.", testit)

        # wrong size ndarray, GH 3105
        def testit():
            DataFrame(np.arange(12).reshape((4, 3)), columns=['foo', 'bar', 'baz'],
                      index=date_range('2000-01-01', periods=3))
        assertRaisesRegexp(ValueError, "Shape of passed values is \(3, 4\), indices imply \(3, 3\)", testit)

@Wuvist
Copy link
Author

Wuvist commented Jul 16, 2013

@jreback

Sorry, but I don't understand.

My PR is about preserving the call stack, not about the exception being raise nor its error message.

The exception raised and the error message is exactly the same before and after my patch.

@jreback
Copy link
Contributor

jreback commented Jul 16, 2013

is that ALWAYS the case?

are the error messages different?

@Wuvist
Copy link
Author

Wuvist commented Jul 16, 2013

@jreback
Yes, it's ALWAYS the case.

The errors messages are the same before & after this patch.

@jreback
Copy link
Contributor

jreback commented Jul 16, 2013

ahhh...I thought the applied function error was not bubbling up....so ok then...

@Wuvist
Copy link
Author

Wuvist commented Jul 16, 2013

^_^

@cpcloud
Copy link
Member

cpcloud commented Jul 21, 2013

@Wuvist can you set up travis? if u need help doing that i can show u how. thanks

@Wuvist
Copy link
Author

Wuvist commented Jul 21, 2013

@cpcloud

travis? The CI tools? I'm willing to try it. ^_^

but, how does travis related to this PR?

@jreback
Copy link
Contributor

jreback commented Jul 21, 2013

  • travis is important for error checking on all distributions / configurations, e.g. something that works on py2.7 might not work with 3.3, or a different numpy
  • it gives others a quick read on whether the PR has been error tested

in this case it is probably trivial. but it is good practice. so pls hook up. see contributing.md for a howto

thanks!

@jtratner
Copy link
Contributor

@Wuvist if you have time to rebase this and setup travis - it'd be nice to have this fix.

@jreback
Copy link
Contributor

jreback commented Aug 21, 2013

closing in favor of #4549

@jreback jreback closed this Aug 21, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants