Empty dataframe creation #25887

Closed
0x0L opened this issue Mar 26, 2019 · 4 comments · Fixed by #57205
Labels
Constructors (Series/DataFrame/Index/pd.array Constructors), DataFrame (DataFrame data structure), Performance (Memory or execution speed performance)

Comments

@0x0L
Contributor

0x0L commented Mar 26, 2019

Hello,

Creating an empty dataframe with specified columns

pd.__version__
# '0.24.2'

%timeit pd.DataFrame([], columns=['a'])
# 1.57 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

is much slower than it should be. Both of these comparable calls are several times faster:

%timeit pd.DataFrame([])
# 184 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit pd.DataFrame([[1]], columns=['a'])
# 343 µs ± 3.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
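
For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using the standard-library `timeit` module. The helper name `time_call` and the repeat counts are my own choices, not part of the original report, and absolute numbers will vary with the pandas version and machine.

```python
import timeit

import pandas as pd


def time_call(stmt, number=100):
    """Return the best average time per call, in seconds, over 5 repeats."""
    runs = timeit.repeat(stmt, globals={"pd": pd}, repeat=5, number=number)
    return min(runs) / number


# The three constructor calls compared in the report above.
cases = {
    "empty, with columns": "pd.DataFrame([], columns=['a'])",
    "empty, no columns": "pd.DataFrame([])",
    "one row, with columns": "pd.DataFrame([[1]], columns=['a'])",
}

timings = {label: time_call(stmt) for label, stmt in cases.items()}
for label, seconds in timings.items():
    print(f"{label:>22}: {seconds * 1e6:8.1f} µs per call")
```

Taking the minimum over repeats reduces noise from other processes, so the figures are not directly comparable to the mean ± std. dev. that `%timeit` prints.
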
@jreback
Contributor

jreback commented Mar 26, 2019

this is premature optimization

why does this matter in the real world?

@0x0L
Contributor Author

0x0L commented Mar 26, 2019

I'm reading batches of records from an API, and batches can be empty.
It's just surprising behavior. Once you've noticed it, it's nothing that can't be worked around one way or another...
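
One possible workaround, sketched under assumptions: skip empty batches before constructing anything, and build a single empty frame only if every batch was empty. The schema `COLUMNS`, the helper `frames_from_batches`, and the sample batches are hypothetical and only illustrate the idea. This is not the fix that eventually landed in pandas.

```python
import pandas as pd

COLUMNS = ["a"]  # hypothetical schema, standing in for the real API's fields


def frames_from_batches(batches, columns=COLUMNS):
    """Combine record batches into one DataFrame, skipping empty batches."""
    # Only non-empty batches go through the DataFrame constructor.
    frames = [pd.DataFrame(batch, columns=columns) for batch in batches if batch]
    if not frames:
        # Pay for the slow empty-with-columns path at most once.
        return pd.DataFrame([], columns=columns)
    return pd.concat(frames, ignore_index=True)


# Hypothetical batches as they might arrive from an API; some are empty.
batches = [[[1]], [], [[2], [3]], []]
result = frames_from_batches(batches)
print(result)
```

This keeps the column layout stable for downstream code while avoiding one constructor call per empty batch.
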

@jreback
Contributor

jreback commented Mar 26, 2019

If you think this can be fixed without adding complexity, then please submit a PR.

mroeschke added the DataFrame and Performance labels on Mar 27, 2019
jbrockmendel added the Constructors label on Jul 23, 2019
@simonjayhawkins
Member

We can also use the examples from the OP of #28188 as additional benchmarks.

5 participants