feat: export the data from a table to parquet files#1000

Merged
v0y4g3r merged 15 commits into GreptimeTeam:develop from fengjiachun:feat/copy-table
Feb 20, 2023

@fengjiachun
Collaborator

@fengjiachun fengjiachun commented Feb 14, 2023

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

To export the data from a table to a Parquet file, use the COPY statement.

COPY tbl TO '/xxx/xxx/output.parquet' WITH (FORMAT = 'parquet');

If the table is large, the export is automatically split into multiple files, each holding up to 5 million rows.
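
The splitting rule above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `split_into_segments` and the constant name are hypothetical, and the empty-table behavior (one empty segment) is an assumption based on the commit message about ending up with an empty file for an empty table.

```rust
/// Rows per output file, taken from the PR description (5 million).
/// The constant name is hypothetical.
const DEFAULT_ROWS_PER_FILE: usize = 5_000_000;

/// Returns half-open row ranges `[start, end)`, one per output file.
///
/// Assumption: an empty table still produces a single empty range,
/// so that one (empty) file would be written.
fn split_into_segments(total_rows: usize, rows_per_file: usize) -> Vec<(usize, usize)> {
    assert!(rows_per_file > 0, "rows_per_file must be positive");
    if total_rows == 0 {
        return vec![(0, 0)];
    }
    (0..total_rows)
        .step_by(rows_per_file)
        // The last segment is clamped to the table size, so it may be shorter.
        .map(|start| (start, start.saturating_add(rows_per_file).min(total_rows)))
        .collect()
}
```

Under these assumptions, a table of 12 million rows yields three segments: two full ones of 5 million rows and a final one of 2 million rows.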

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

@fengjiachun fengjiachun changed the title from "feat: copy table" to "feat: export the data from a table to a parquet file" Feb 14, 2023
@fengjiachun fengjiachun marked this pull request as draft February 14, 2023 13:36
@fengjiachun fengjiachun force-pushed the feat/copy-table branch 5 times, most recently from 9a54859 to 3fe9f2f, February 15, 2023 03:44
@fengjiachun fengjiachun marked this pull request as ready for review February 15, 2023 03:45
@codecov

codecov bot commented Feb 15, 2023

Codecov Report

Merging #1000 (11692f2) into develop (a9c8584) will decrease coverage by 0.37%.
The diff coverage is 48.24%.

@@             Coverage Diff             @@
##           develop    #1000      +/-   ##
===========================================
- Coverage    86.18%   85.82%   -0.37%     
===========================================
  Files          439      444       +5     
  Lines        63866    64459     +593     
===========================================
+ Hits         55044    55321     +277     
- Misses        8822     9138     +316     
| Flag | Coverage Δ |
| --- | --- |
| rust | 85.82% <48.24%> (-0.37%) ⬇️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| src/datanode/src/sql/copy_table.rs | 0.00% <0.00%> (ø) |
| src/query/src/datafusion/planner.rs | 68.85% <0.00%> (ø) |
| src/sql/src/statements.rs | 90.54% <ø> (ø) |
| src/sql/src/statements/statement.rs | 50.00% <ø> (ø) |
| src/table/src/requests.rs | 61.53% <0.00%> (-5.13%) ⬇️ |
| src/frontend/src/instance.rs | 79.36% <33.33%> (-0.21%) ⬇️ |
| src/datanode/src/sql.rs | 79.12% <41.66%> (ø) |
| src/datanode/src/instance/sql.rs | 76.39% <45.94%> (-3.96%) ⬇️ |
| src/sql/src/parsers/copy_parser.rs | 85.33% <85.33%> (ø) |
| src/sql/src/statements/copy.rs | 95.65% <95.65%> (ø) |

... and 33 more

@fengjiachun fengjiachun force-pushed the feat/copy-table branch 3 times, most recently from ed893da to ab8258e, February 15, 2023 07:23
@fengjiachun fengjiachun changed the title from "feat: export the data from a table to a parquet file" to "feat: export the data from a table to parquet files" Feb 15, 2023
@fengjiachun fengjiachun added this to the v0.1 milestone Feb 15, 2023
@sunng87 sunng87 added the "docs-required" (This change requires docs update.) and "cloud followup required" labels Feb 15, 2023
Comment thread src/datanode/src/sql/copy_table.rs
Comment thread src/datanode/src/sql/copy_table.rs Outdated
Comment thread src/sql/src/parsers/copy_parser.rs
@fengjiachun
Collaborator Author

PTAL @MichaelScofield

Collaborator

@MichaelScofield MichaelScofield left a comment

There are conflicts.

Comment thread src/sql/src/error.rs Outdated
@killme2008
Member

@fengjiachun Maybe we can implement COPY tbl FROM file too, just like DuckDB: https://duckdb.org/docs/sql/statements/copy.html

@fengjiachun
Collaborator Author

> @fengjiachun Maybe we can implement COPY tbl FROM file too, just like DuckDB: https://duckdb.org/docs/sql/statements/copy.html

Good suggestion. I can implement it in a new PR.

@fengjiachun
Collaborator Author

PTAL @MichaelScofield @killme2008

@evenyag
Contributor

evenyag commented Feb 20, 2023

> one file per million rows

This might be too small. 10 to 50 million rows would be a better choice.

@fengjiachun
Collaborator Author

> > one file per million rows
>
> This might be too small. 10 to 50 million rows would be a better choice.

I was worried about high memory usage, so I adjusted the default value to 5M rows. In the future, I think we should add an option to configure it.
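
A configurable value of that kind might look like the sketch below. It is purely illustrative: the option key `rows_per_file`, the function name, and the error type are all hypothetical, and nothing here is taken from GreptimeDB's actual `WITH (...)` option handling. Only the 5M default comes from this thread.

```rust
use std::collections::HashMap;

/// Default from this PR discussion: 5 million rows per file.
const DEFAULT_ROWS_PER_FILE: usize = 5_000_000;

/// Resolves the rows-per-file setting from parsed `WITH (...)` options.
///
/// Assumption: options arrive as string key/value pairs and the key is
/// named `rows_per_file` (a hypothetical name, not a real GreptimeDB option).
fn rows_per_file(with_options: &HashMap<String, String>) -> Result<usize, String> {
    match with_options.get("rows_per_file") {
        // Option not given: fall back to the default.
        None => Ok(DEFAULT_ROWS_PER_FILE),
        Some(raw) => match raw.parse::<usize>() {
            // Zero would make the split loop meaningless, so reject it.
            Ok(n) if n > 0 => Ok(n),
            _ => Err(format!("invalid rows_per_file: {}", raw)),
        },
    }
}
```

With this shape, a statement that omits the option keeps the 5M default, while a larger value trades higher memory usage for fewer output files.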

Comment thread src/datanode/src/sql/copy_table.rs Outdated
Comment thread src/table/src/requests.rs
Comment thread src/datanode/src/sql/copy_table.rs
Comment thread src/datanode/src/sql/copy_table.rs Outdated
Comment thread src/datanode/src/sql/copy_table.rs Outdated
@v0y4g3r
Contributor

v0y4g3r commented Feb 20, 2023

Maybe it's time for us to find a way to support stream writes of SST files. Let me create an issue for this.

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
Comment thread src/datanode/src/error.rs Outdated
fengjiachun and others added 2 commits February 20, 2023 11:50
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
Co-authored-by: fys <40801205+Fengys123@users.noreply.github.com>
Comment thread src/datanode/src/sql/copy_table.rs
Contributor

@v0y4g3r v0y4g3r left a comment

LGTM

@v0y4g3r v0y4g3r merged commit 9161796 into GreptimeTeam:develop Feb 20, 2023
@fengjiachun fengjiachun deleted the feat/copy-table branch February 20, 2023 08:45
@killme2008 killme2008 mentioned this pull request May 8, 2023
paomian pushed a commit to paomian/greptimedb that referenced this pull request Oct 19, 2023
* feat: copy table parser

* feat: coopy table

* chore: minor fix

* chore: give stmt a more clearer name

* chore: unified naming

* chore: minor change

* chore: add a todo

* chore: end up with an empty file when occur an empty table

* feat: format with copy table

* feat: with options

* chore: by cr

* chore: default 5M rows per segment

* Update src/datanode/src/sql/copy_table.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* Update src/datanode/src/sql/copy_table.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* Update src/datanode/src/error.rs

Co-authored-by: fys <40801205+Fengys123@users.noreply.github.com>

---------

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
Co-authored-by: fys <40801205+Fengys123@users.noreply.github.com>
Labels

docs-required This change requires docs update.

7 participants