
Functions

By default, functions are installed into the public schema. You can choose an alternate location by running CREATE EXTENSION pg_duckdb WITH SCHEMA your_schema_name.

Note: ALTER EXTENSION is not currently supported for moving the extension to a different schema.
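For example, to install the extension into a dedicated schema (the schema name ddb here is illustrative):

CREATE SCHEMA ddb;
CREATE EXTENSION pg_duckdb WITH SCHEMA ddb;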

Data Lake Functions

Name Description
read_parquet Read a Parquet file
read_csv Read a CSV file
read_json Read a JSON file
iceberg_scan Read an Iceberg dataset
iceberg_metadata Read Iceberg metadata
iceberg_snapshots Read Iceberg snapshot information
delta_scan Read a Delta dataset

JSON Functions

All of the DuckDB JSON functions and aggregates are available. Postgres JSON/JSONB functions are not supported.
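As a sketch, a DuckDB JSON function such as json_extract can be exercised through duckdb.query (the JSON literal here is illustrative):

SELECT * FROM duckdb.query($$ SELECT json_extract('{"duck": 42}', '$.duck') AS v $$);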

Union Type Functions

Name Description
union_extract Extracts a value from a union type by tag name.
union_tag Gets the tag name of the active member in a union type.

MAP Functions

All of the DuckDB map functions.

Name Description
cardinality Return the size of the map
element_at Return the value for a given key as a list
map_concat Merge multiple maps
map_contains Check if a map contains a given key
map_contains_entry Check if a map contains a given key-value pair
map_contains_value Check if a map contains a given value
map_entries Return a list of struct(k, v) for each key-value pair
map_extract Extract a value from a map using a key
map_extract_value Return the value for a given key or NULL
map_from_entries Create a map from an array of struct(k, v)
map_keys Get all keys from a map as a list
map_values Get all values from a map as a list

Aggregates

Name Description
approx_count_distinct Approximates the count of distinct elements using HyperLogLog.

Sampling Functions

Name Description
TABLESAMPLE Samples a subset of rows from a table or query result.

Time Functions

Name Description
time_bucket Buckets timestamps into time intervals for time-series analysis.
strftime Formats timestamps as strings using format codes.
strptime Parses strings into timestamps using format codes.
epoch Converts timestamps to Unix epoch seconds.
epoch_ms Converts timestamps to Unix epoch milliseconds.
epoch_us Converts timestamps to Unix epoch microseconds.
epoch_ns Converts timestamps to Unix epoch nanoseconds.
make_timestamp Creates a timestamp from microseconds since epoch.
make_timestamptz Creates a timestamp with timezone from microseconds since epoch.

DuckDB Administration Functions

Name Description
duckdb.install_extension Installs a DuckDB extension.
duckdb.load_extension Loads a DuckDB extension for the current session.
duckdb.autoload_extension Configures whether an extension should be auto-loaded.
duckdb.query Runs a SELECT query directly against DuckDB.
duckdb.raw_query Runs any query directly against DuckDB (for debugging).
duckdb.recycle_ddb Resets the DuckDB instance in the current connection (for debugging).

Secrets Management Functions

Name Description
duckdb.create_simple_secret Creates a simple secret for cloud storage access.
duckdb.create_azure_secret Creates an Azure secret using a connection string.

MotherDuck Functions

Name Description
duckdb.enable_motherduck Enables MotherDuck integration with a token.
duckdb.is_motherduck_enabled Checks if MotherDuck integration is enabled.
duckdb.force_motherduck_sync Forces a full resync of MotherDuck databases and schemas to Postgres (for debugging).

Detailed Descriptions

read_parquet(path TEXT or TEXT[], ...) -> SETOF duckdb.row

Reads a Parquet file, either from a remote location (via httpfs) or a local file.

This returns DuckDB rows. You can expand them using * or select specific columns with the r['mycol'] syntax. If you select specific columns, give the function call a short alias, such as r. For example:

SELECT * FROM read_parquet('file.parquet');
SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_parquet('file.parquet');

Further information:

Required Arguments
Name Type Description
path text or text[] The path, either to a remote httpfs file or a local file (if enabled), of the parquet file(s) to read. The path can be a glob or array of files to read.
Optional Parameters

Optional parameters mirror DuckDB's read_parquet function. To specify optional parameters, use parameter := 'value'.
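For example, a hypothetical Hive-partitioned layout could be read with DuckDB's hive_partitioning parameter:

SELECT * FROM read_parquet('data/*/*.parquet', hive_partitioning := true);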

read_csv(path TEXT or TEXT[], ...) -> SETOF duckdb.row

Reads a CSV file, either from a remote location (via httpfs) or a local file.

This returns DuckDB rows. You can expand them using * or select specific columns with the r['mycol'] syntax. If you select specific columns, give the function call a short alias, such as r. For example:

SELECT * FROM read_csv('file.csv');
SELECT r['id'], r['name'] FROM read_csv('file.csv') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_csv('file.csv');

Further information:

Required Arguments
Name Type Description
path text or text[] The path, either to a remote httpfs file or a local file (if enabled), of the CSV file(s) to read. The path can be a glob or array of files to read.
Optional Parameters

Optional parameters mirror DuckDB's read_csv function. To specify optional parameters, use parameter := 'value'.

Compatibility notes:

  • columns is not currently supported.
  • nullstr must be an array (TEXT[]).
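A sketch combining DuckDB's header and delim parameters with the array form of nullstr noted above (the file name and values are illustrative):

SELECT * FROM read_csv('file.csv', header := true, delim := ';', nullstr := ARRAY['NULL', 'n/a']);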

read_json(path TEXT or TEXT[], ...) -> SETOF duckdb.row

Reads a JSON file, either from a remote location (via httpfs) or a local file.

This returns DuckDB rows. You can expand them using * or select specific columns with the r['mycol'] syntax. If you select specific columns, give the function call a short alias, such as r. For example:

SELECT * FROM read_json('file.json');
SELECT r['id'], r['name'] FROM read_json('file.json') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_json('file.json');

Further information:

Required Arguments
Name Type Description
path text or text[] The path, either to a remote httpfs file or a local file (if enabled), of the JSON file(s) to read. The path can be a glob or array of files to read.
Optional Parameters

Optional parameters mirror DuckDB's read_json function. To specify optional parameters, use parameter := 'value'.

Compatibility notes:

  • columns is not currently supported.
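For example, DuckDB's ignore_errors parameter can be passed to skip malformed records (a sketch; the file name is illustrative):

SELECT * FROM read_json('file.json', ignore_errors := true);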

iceberg_scan(path TEXT, ...) -> SETOF duckdb.row

Reads an Iceberg table, either from a remote location (via httpfs) or a local directory.

To use iceberg_scan, you must enable the iceberg extension:

SELECT duckdb.install_extension('iceberg');

This returns DuckDB rows. You can expand them using * or select specific columns with the r['mycol'] syntax. If you select specific columns, give the function call a short alias, such as r. For example:

SELECT * FROM iceberg_scan('data/iceberg/table');
SELECT r['id'], r['name'] FROM iceberg_scan('data/iceberg/table') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM iceberg_scan('data/iceberg/table');

Further information:

Required Arguments
Name Type Description
path text The path, either to a remote httpfs location or a local location (if enabled), of the Iceberg table to read.
Optional Arguments

Optional parameters mirror DuckDB's iceberg_scan function based on the DuckDB source code. However, documentation on these parameters is limited. To specify optional parameters, use parameter := 'value'.

Name Type Default Description
allowed_moved_paths boolean false Ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved.
mode text ''
metadata_compression_codec text 'none'
skip_schema_inference boolean false
version text 'version-hint.text'
version_name_format text 'v%s%s.metadata.json,%s%s.metadata.json'
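For example, a table that has been moved could be scanned by setting the parameter from the table above (a sketch):

SELECT COUNT(*) FROM iceberg_scan('data/iceberg/table', allowed_moved_paths := true);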

iceberg_metadata(path TEXT, ...) -> SETOF iceberg_metadata_record

To use iceberg_metadata, you must enable the iceberg extension:

SELECT duckdb.install_extension('iceberg');

Returns metadata about an Iceberg table. Data is returned as a set of duckdb.iceberg_metadata_record, which is defined as:

CREATE TYPE duckdb.iceberg_metadata_record AS (
  manifest_path TEXT,
  manifest_sequence_number NUMERIC,
  manifest_content TEXT,
  status TEXT,
  content TEXT,
  file_path TEXT
);

Further information:

Required Arguments
Name Type Description
path text The path, either to a remote httpfs location or a local location (if enabled), of the Iceberg table to read.
Optional Arguments

Optional parameters mirror DuckDB's iceberg_metadata function based on the DuckDB source code. However, documentation on these parameters is limited. To specify optional parameters, use parameter := 'value'.

Name Type Default Description
allowed_moved_paths boolean false Ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved.
metadata_compression_codec text 'none'
skip_schema_inference boolean false
version text 'version-hint.text'
version_name_format text 'v%s%s.metadata.json,%s%s.metadata.json'
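A minimal call, mirroring the iceberg_scan examples above:

SELECT * FROM iceberg_metadata('data/iceberg/table');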

iceberg_snapshots(path TEXT, ...) -> SETOF iceberg_snapshot_record

Reads Iceberg snapshot information from an Iceberg table.

To use iceberg_snapshots, you must enable the iceberg extension:

SELECT duckdb.install_extension('iceberg');

This function returns snapshot metadata for an Iceberg table, which can be useful for time travel queries and understanding table history.

SELECT * FROM iceberg_snapshots('data/iceberg/table');

Further information:

Required Arguments
Name Type Description
path text The path, either to a remote httpfs location or a local location (if enabled), of the Iceberg table to read.
Optional Arguments

Optional parameters mirror DuckDB's iceberg_snapshots function. To specify optional parameters, use parameter := 'value'.

Name Type Default Description
metadata_compression_codec text 'none'
skip_schema_inference boolean false
version text 'version-hint.text'
version_name_format text 'v%s%s.metadata.json,%s%s.metadata.json'

delta_scan(path TEXT) -> SETOF duckdb.row

Reads a Delta dataset, either from a remote location (via httpfs) or a local one.

To use delta_scan, you must enable the delta extension:

SELECT duckdb.install_extension('delta');

This returns DuckDB rows. You can expand them using * or select specific columns with the r['mycol'] syntax. If you select specific columns, give the function call a short alias, such as r. For example:

SELECT * FROM delta_scan('/path/to/delta/dataset');
SELECT r['id'], r['name'] FROM delta_scan('/path/to/delta/dataset') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM delta_scan('/path/to/delta/dataset');

Further information:

Required Arguments
Name Type Description
path text The path, either to a remote httpfs location or a local location (if enabled) of the delta dataset to read.

duckdb.install_extension(extension_name TEXT, repository TEXT DEFAULT 'core') -> bool

Installs a DuckDB extension and configures it to be loaded automatically in every session that uses pg_duckdb.

SELECT duckdb.install_extension('iceberg');
SELECT duckdb.install_extension('avro', 'community');
Security

Since this function can be used to download and install any extension, by default it can only be executed by a superuser. To allow execution by another admin user, such as my_admin, grant that user the following permissions:

GRANT ALL ON FUNCTION duckdb.install_extension(TEXT, TEXT) TO my_admin;
Required Arguments
Name Type Description
extension_name text The name of the extension to install
Optional Arguments
Name Type Description
repository text The repository to install the extension from (default: 'core')

duckdb.load_extension(extension_name TEXT) -> void

Loads a DuckDB extension for the current session only. Unlike install_extension, this doesn't configure the extension to be loaded automatically in future sessions.

SELECT duckdb.load_extension('iceberg');
Required Arguments
Name Type Description
extension_name text The name of the extension to load

duckdb.autoload_extension(extension_name TEXT, autoload BOOLEAN) -> void

Configures whether an installed extension should be automatically loaded in new sessions.

-- Disable auto-loading for an extension
SELECT duckdb.autoload_extension('iceberg', false);

-- Enable auto-loading for an extension
SELECT duckdb.autoload_extension('iceberg', true);
Required Arguments
Name Type Description
extension_name text The name of the extension to configure
autoload boolean Whether the extension should be auto-loaded

duckdb.query(query TEXT) -> SETOF duckdb.row

Executes the given SELECT query directly against DuckDB. This can be useful if DuckDB syntax makes the query easier to write, or if you want to use a function that pg_duckdb does not expose yet. If a missing function is the reason, please also open an issue on the GitHub repository so that we can add support. For example, the query below puts FROM before SELECT and uses a list comprehension; neither feature is supported in Postgres.

SELECT * FROM duckdb.query('FROM range(10) as a(a) SELECT [a for i in generate_series(0, a)] as arr');

duckdb.raw_query(query TEXT) -> void

Runs an arbitrary query directly against DuckDB. Compared to duckdb.query, this function can execute any query, not just SELECT queries. The main downside is that it doesn't return its result as rows; instead it sends the query result to the logs. So prefer duckdb.query when possible, and fall back to this function when you need to run something else, such as DDL.
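A sketch of running DDL through it (the table name is illustrative; any output goes to the logs, not to the client):

SELECT duckdb.raw_query('CREATE TABLE duck_scratch (id INT)');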

duckdb.recycle_ddb() -> void

pg_duckdb keeps the DuckDB instance open between transactions. This is done to preserve session-level state, such as manually executed SET commands. If you want to clear this session-level state, you can close the currently open DuckDB instance using:

CALL duckdb.recycle_ddb();

duckdb.enable_motherduck(token TEXT, database_name TEXT) -> void

Enables MotherDuck integration with the provided authentication token.

-- Enable MotherDuck with default database
SELECT duckdb.enable_motherduck('your_token_here');

-- Enable MotherDuck with specific database
SELECT duckdb.enable_motherduck('your_token_here', 'my_database');
Required Arguments
Name Type Description
token text Your MotherDuck authentication token
Optional Arguments
Name Type Description
database_name text Specific MotherDuck database to connect to

duckdb.is_motherduck_enabled() -> boolean

Checks whether MotherDuck integration is currently enabled for this session.

SELECT duckdb.is_motherduck_enabled();

duckdb.create_simple_secret(type TEXT, key_id TEXT, secret TEXT, region TEXT, ...) -> void

Creates a simple secret for accessing cloud storage services like S3, GCS, or R2.

-- Create an S3 secret
SELECT duckdb.create_simple_secret(
    type := 'S3',
    key_id := 'your_access_key',
    secret := 'your_secret_key',
    region := 'us-east-1'
);

-- Create an S3 secret with session token
SELECT duckdb.create_simple_secret(
    type := 'S3',
    key_id := 'your_access_key',
    secret := 'your_secret_key',
    region := 'us-east-1',
    session_token := 'your_session_token'
);
Required Arguments
Name Type Description
type text The type of secret ('S3', 'GCS', 'R2', etc.)
key_id text The access key ID or equivalent
secret text The secret key or equivalent
region text The region for the service
Optional Arguments
Name Type Description
session_token text Session token for temporary credentials
endpoint text Custom endpoint URL
url_style text URL style ('vhost' or 'path')
use_ssl text Whether to use SSL ('true' or 'false')
scope text Scope for the secret (default: '')

duckdb.create_azure_secret(connection_string TEXT, scope TEXT DEFAULT '') -> TEXT

Creates an Azure secret using an Azure Blob Storage connection string.

-- Create an Azure secret
SELECT duckdb.create_azure_secret(
    'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net'
);

-- Create an Azure secret with specific scope
SELECT duckdb.create_azure_secret(
    'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net',
    'my_scope'
);
Required Arguments
Name Type Description
connection_string text The Azure Blob Storage connection string
Optional Arguments
Name Type Description
scope text Scope for the secret (default: '')

duckdb.force_motherduck_sync(drop_with_cascade BOOLEAN DEFAULT false)

Warning: There are known issues with this function. To re-trigger a sync, it is recommended to use the following command instead:

SELECT * FROM pg_terminate_backend((
  SELECT pid FROM pg_stat_activity WHERE backend_type = 'pg_duckdb sync worker'
));

pg_duckdb normally synchronizes your MotherDuck tables with Postgres automatically using a Postgres background worker. Sometimes this synchronization fails, often due to permission issues or because users created dependencies on MotherDuck tables that need to be updated. In those cases this function can be helpful for a few reasons:

  1. To show the ERRORs that happen during syncing
  2. To retrigger a sync after fixing the issue
  3. To drop the MotherDuck tables with CASCADE, dropping all objects that depend on them.

For the first two use cases, you can simply call this procedure as follows:

CALL duckdb.force_motherduck_sync();

For the third use case, you need to pass it the drop_with_cascade parameter:

CALL duckdb.force_motherduck_sync(drop_with_cascade := true);

NOTE: Dropping with cascade will drop all objects that depend on the MotherDuck tables, including all views, functions, and tables that reference them. This can be a destructive operation, so use it with caution.

time_bucket(bucket_width INTERVAL, timestamp_col TIMESTAMP, origin TIMESTAMP) -> TIMESTAMP

Buckets timestamps into time intervals for time-series analysis. This function is compatible with TimescaleDB's time_bucket function, allowing for easier migration and interoperability.

-- Group events by hour
SELECT time_bucket(INTERVAL '1 hour', created_at) as hour_bucket, COUNT(*)
FROM events
GROUP BY hour_bucket
ORDER BY hour_bucket;

-- Group by 15-minute intervals
SELECT time_bucket(INTERVAL '15 minutes', timestamp_col), AVG(value)
FROM sensor_data
WHERE timestamp_col >= '2024-01-01'
GROUP BY 1
ORDER BY 1;

Further information:

Required Arguments
Name Type Description
bucket_width interval The interval size for bucketing (e.g., '1 hour', '15 minutes')
timestamp_col timestamp The timestamp column to bucket
Optional Arguments
Name Type Description
origin timestamp The origin point for bucketing. Buckets are aligned to this timestamp.

Note: The time_bucket function also supports timezone and time offset parameters for more advanced time bucketing scenarios.
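For example, weekly buckets can be aligned to a chosen origin (2024-01-01 is a Monday; the table and column names are illustrative):

SELECT time_bucket(INTERVAL '1 week', created_at, '2024-01-01'::timestamp) AS week_bucket, COUNT(*)
FROM events
GROUP BY week_bucket
ORDER BY week_bucket;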

strftime(timestamp_expr, format_string) -> TEXT

Formats timestamps as strings using standard format codes. This function provides flexible timestamp formatting for display and export purposes.

-- Format current timestamp
SELECT strftime(NOW(), '%Y-%m-%d %H:%M:%S') AS formatted_time;

-- Format timestamps in different formats
SELECT
    order_id,
    strftime(created_at, '%Y-%m-%d') AS order_date,
    strftime(created_at, '%H:%M') AS order_time,
    strftime(created_at, '%A, %B %d, %Y') AS readable_date
FROM orders;

-- Build a date-based path for partitioned file exports
SELECT 's3://bucket/events/' || strftime('2024-01-01'::timestamp, '%Y/%m/%d') || '/events.parquet'
    AS export_path;

Common format codes:

  • %Y - 4-digit year (2024)
  • %m - Month as number (01-12)
  • %d - Day of month (01-31)
  • %H - Hour (00-23)
  • %M - Minute (00-59)
  • %S - Second (00-59)
  • %A - Full weekday name (Monday)
  • %B - Full month name (January)

Further information:

Required Arguments
Name Type Description
timestamp_expr timestamp The timestamp value to format
format_string text The format string with format codes

strptime(string_expr, format_string) -> TIMESTAMP

Parses strings into timestamps using format codes. This is the inverse of strftime and is useful for parsing timestamps from various string formats.

-- Parse date strings
SELECT strptime('2024-01-15 14:30:00', '%Y-%m-%d %H:%M:%S') AS parsed_timestamp;

-- Parse different formats
SELECT
    strptime('Jan 15, 2024', '%b %d, %Y') AS date1,
    strptime('15/01/2024', '%d/%m/%Y') AS date2,
    strptime('2024-01-15T14:30:00Z', '%Y-%m-%dT%H:%M:%SZ') AS iso_date;

-- Parse log timestamps
SELECT
    log_id,
    strptime(timestamp_string, '%Y-%m-%d %H:%M:%S') AS parsed_time,
    message
FROM raw_logs;

Further information:

Required Arguments
Name Type Description
string_expr text The string to parse as a timestamp
format_string text The format string describing the input format

epoch(timestamp_expr) -> BIGINT

Converts timestamps to Unix epoch seconds (seconds since 1970-01-01 00:00:00 UTC).

-- Get current epoch time
SELECT epoch(NOW()) AS current_epoch;

-- Convert timestamps for API usage
SELECT
    event_id,
    epoch(event_timestamp) AS epoch_seconds
FROM events;

-- Filter using epoch time
SELECT * FROM events
WHERE epoch(created_at) > 1640995200; -- After 2022-01-01
Required Arguments
Name Type Description
timestamp_expr timestamp The timestamp to convert to epoch seconds

map_extract(map_col duckdb.map, key duckdb.unresolved_type) -> duckdb.unresolved_type

Extracts a value from a map using the specified key. If the key doesn't exist, returns an empty array.

-- Extract value from a map
SELECT map_extract(r['map_col'], 'a') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {1}

-- Extract non-existent key
SELECT map_extract(r['map_col'], 'c') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {}
Required Arguments
Name Type Description
map_col duckdb.map The map to extract from
key duckdb.unresolved_type The key to look up in the map

map_keys(map_col duckdb.map) -> duckdb.unresolved_type

Returns all keys from a map as an array.

-- Get all keys from a map
SELECT map_keys(r['map_col']) as keys 
FROM duckdb.query($$ SELECT MAP(['a', 'b', 'c'], [1, 2, 3]) as map_col $$) r;
-- Returns: {a,b,c}

-- Empty map
SELECT map_keys(r['map_col']) as keys 
FROM duckdb.query($$ SELECT MAP([], []) as map_col $$) r;
-- Returns: {}
Required Arguments
Name Type Description
map_col duckdb.map The map to extract keys from

map_values(map_col duckdb.map) -> duckdb.unresolved_type

Returns all values from a map as an array.

-- Get all values from a map
SELECT map_values(r['map_col']) as values 
FROM duckdb.query($$ SELECT MAP(['a', 'b', 'c'], [1, 2, 3]) as map_col $$) r;
-- Returns: {1,2,3}

-- Empty map
SELECT map_values(r['map_col']) as values 
FROM duckdb.query($$ SELECT MAP([], []) as map_col $$) r;
-- Returns: {}
Required Arguments
Name Type Description
map_col duckdb.map The map to extract values from

cardinality(map_col duckdb.map) -> numeric

Returns the size of the map (number of key-value pairs).

-- Get the number of entries in a map
SELECT cardinality(r['map_col']) as size 
FROM duckdb.query($$ SELECT MAP(['a', 'b', 'c'], [1, 2, 3]) as map_col $$) r;
-- Returns: 3

-- Empty map
SELECT cardinality(r['map_col']) as size 
FROM duckdb.query($$ SELECT MAP([], []) as map_col $$) r;
-- Returns: 0
Required Arguments
Name Type Description
map_col duckdb.map The map to get the size of

element_at(map_col duckdb.map, key duckdb.unresolved_type) -> duckdb.unresolved_type

Returns the value for a given key as an array.

-- Get value for a specific key
SELECT element_at(r['map_col'], 'a') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {1}

-- Non-existent key
SELECT element_at(r['map_col'], 'c') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {}
Required Arguments
Name Type Description
map_col duckdb.map The map to extract from
key duckdb.unresolved_type The key to look up in the map

map_concat(map_col duckdb.map, map_col2 duckdb.map) -> duckdb.map

Merges multiple maps. On key collision, the value is taken from the last map.

-- Merge two maps
SELECT map_concat(r1['map1'], r2['map2']) as merged 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map1 $$) r1, 
     duckdb.query($$ SELECT MAP(['b', 'c'], [3, 4]) as map2 $$) r2;
-- Returns: {a=1, b=3, c=4}

-- Note: 'b' value from map2 (3) overwrites map1's value (2)
Required Arguments
Name Type Description
map_col duckdb.map The first map
map_col2 duckdb.map The second map to merge

map_contains(map_col duckdb.map, key duckdb.unresolved_type) -> boolean

Checks if a map contains a given key.

-- Check if key exists
SELECT map_contains(r['map_col'], 'a') as has_key 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: t (true)

-- Check for non-existent key
SELECT map_contains(r['map_col'], 'c') as has_key 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: f (false)
Required Arguments
Name Type Description
map_col duckdb.map The map to check
key duckdb.unresolved_type The key to search for

map_contains_entry(map_col duckdb.map, key duckdb.unresolved_type, value duckdb.unresolved_type) -> boolean

Checks if a map contains a given key-value pair.

-- Check if key-value pair exists
SELECT map_contains_entry(r['map_col'], 'a', 1) as has_entry 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: t (true)

-- Check with wrong value for existing key
SELECT map_contains_entry(r['map_col'], 'a', 2) as has_entry 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: f (false)
Required Arguments
Name Type Description
map_col duckdb.map The map to check
key duckdb.unresolved_type The key to search for
value duckdb.unresolved_type The value to match with the key

map_contains_value(map_col duckdb.map, value duckdb.unresolved_type) -> boolean

Checks if a map contains a given value.

-- Check if value exists
SELECT map_contains_value(r['map_col'], 1) as has_value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: t (true)

-- Check for non-existent value
SELECT map_contains_value(r['map_col'], 3) as has_value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: f (false)
Required Arguments
Name Type Description
map_col duckdb.map The map to check
value duckdb.unresolved_type The value to search for

map_entries(map_col duckdb.map) -> duckdb.struct[]

Returns an array of struct(key, value) for each key-value pair in the map.

-- Get all key-value pairs as structs
SELECT map_entries(r['map_col']) as entries 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {"(a,1)","(b,2)"}

-- Access individual struct fields
SELECT unnest(map_entries(r['map_col'])) as entry 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
Required Arguments
Name Type Description
map_col duckdb.map The map to extract entries from

map_extract_value(map_col duckdb.map, key duckdb.unresolved_type) -> duckdb.unresolved_type

Returns the value for a given key or NULL if the key is not contained in the map.

-- Extract single value (not as array)
SELECT map_extract_value(r['map_col'], 'a') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: 1

-- Non-existent key returns NULL
SELECT map_extract_value(r['map_col'], 'c') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: NULL
Required Arguments
Name Type Description
map_col duckdb.map The map to extract from
key duckdb.unresolved_type The key to look up in the map

map_from_entries(entries duckdb.struct[]) -> duckdb.map

Creates a map from an array of struct(k, v).

-- Create map from array of structs
SELECT map_from_entries(r['entries']) as new_map 
FROM duckdb.query($$ 
    SELECT [{'k': 'a', 'v': 1}, {'k': 'b', 'v': 2}] as entries 
$$) r;
-- Returns: {a=1, b=2}

-- This is the inverse operation of map_entries
SELECT map_from_entries(map_entries(r['map_col'])) as reconstructed 
FROM duckdb.query($$ SELECT MAP(['x', 'y'], [10, 20]) as map_col $$) r;
-- Returns: {x=10, y=20}
Required Arguments
Name Type Description
entries duckdb.struct[] Array of structs with 'k' (key) and 'v' (value) fields

epoch_ms(timestamp_expr) -> BIGINT

Converts timestamps to Unix epoch milliseconds.

-- High-precision timestamp for JavaScript
SELECT epoch_ms(NOW()) AS timestamp_ms;

-- For time-series data
SELECT
    sensor_id,
    epoch_ms(reading_time) AS timestamp_ms,
    value
FROM sensor_readings;
Required Arguments
Name Type Description
timestamp_expr timestamp The timestamp to convert to epoch milliseconds

epoch_ms(milliseconds) -> TIMESTAMP

Converts Unix epoch milliseconds to a timestamp. This is the inverse of the above function.

-- Convert epoch milliseconds to timestamp
SELECT epoch_ms(1640995200000) AS timestamp_from_ms; -- 2022-01-01 00:00:00

-- Convert stored milliseconds back to timestamps
SELECT
    event_id,
    epoch_ms(timestamp_ms) AS event_time
FROM events;
Required Arguments
Name Type Description
milliseconds bigint Milliseconds since Unix epoch

epoch_us(timestamp_expr) -> BIGINT

Converts timestamps to Unix epoch microseconds.

-- Microsecond precision timestamps
SELECT epoch_us(NOW()) AS timestamp_us;
Required Arguments
Name Type Description
timestamp_expr timestamp The timestamp to convert to epoch microseconds

epoch_ns(timestamp_expr) -> BIGINT

Converts timestamps to Unix epoch nanoseconds.

-- Nanosecond precision timestamps
SELECT epoch_ns(NOW()) AS timestamp_ns;
Required Arguments
Name Type Description
timestamp_expr timestamp The timestamp to convert to epoch nanoseconds

make_timestamp(microseconds) -> TIMESTAMP

Creates a timestamp from microseconds since Unix epoch (1970-01-01 00:00:00 UTC).

-- Create timestamp from current epoch microseconds
SELECT make_timestamp(epoch_us(NOW())) AS reconstructed_timestamp;

-- Create specific timestamps
SELECT make_timestamp(1640995200000000) AS new_years_2022; -- 2022-01-01 00:00:00
Required Arguments
Name Type Description
microseconds bigint Microseconds since Unix epoch

make_timestamptz(microseconds) -> TIMESTAMPTZ

Creates a timestamp with timezone from microseconds since Unix epoch.

-- Create timestamptz from current epoch microseconds
SELECT make_timestamptz(epoch_us(NOW())) AS reconstructed_timestamptz;

-- Create specific timestamptz
SELECT make_timestamptz(1640995200000000) AS new_years_2022_tz;
Required Arguments
Name Type Description
microseconds bigint Microseconds since Unix epoch

TABLESAMPLE (sampling_method(percentage | rows))

Samples a subset of rows from a table or query result. This is useful for analyzing large datasets by working with representative samples, improving query performance for exploratory data analysis.

-- Sample 10% of rows from a table
SELECT * FROM large_table TABLESAMPLE SYSTEM(10);

-- Sample approximately 1000 rows
SELECT * FROM events TABLESAMPLE SYSTEM(1000 ROWS);

-- Sample from data lake files
SELECT * FROM read_parquet('s3://datalake/**/*.parquet') TABLESAMPLE SYSTEM(5);

-- Use sampling for quick data profiling
SELECT
    region,
    COUNT(*) as sample_count,
    AVG(revenue) as avg_revenue
FROM sales_data TABLESAMPLE SYSTEM(2)
GROUP BY region;

-- Sample from joins for performance
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
JOIN orders o TABLESAMPLE SYSTEM(10) ON c.id = o.customer_id
GROUP BY c.name;

Sampling Methods:

  • SYSTEM: Random sampling at the storage level (faster, approximate percentage)
  • BERNOULLI: Row-by-row random sampling (slower, exact percentage)

-- System sampling (recommended for large tables)
SELECT * FROM huge_table TABLESAMPLE SYSTEM(1);

-- Bernoulli sampling (exact percentage)
SELECT * FROM medium_table TABLESAMPLE BERNOULLI(5);

Use Cases:

  • Data exploration: Quick analysis of large datasets
  • Performance testing: Test queries on sample data
  • Data profiling: Understand data distribution patterns
  • ETL development: Develop pipelines on sample data
  • Quality checks: Validate data quality on samples

Further information:

Required Arguments
Name Type Description
sampling_method keyword Either SYSTEM or BERNOULLI
percentage numeric Percentage of rows to sample (0-100)
Optional Arguments
Name Type Description
rows integer Approximate number of rows to sample (use with ROWS keyword)

union_extract(union_col, tag) -> duckdb.unresolved_type

Extracts a value from a union type by specifying the tag name of the member you want to access.

-- Extract the string value if the union contains a string
SELECT union_extract(my_union_column, 'string') FROM my_table;

-- Extract integer value from union
SELECT union_extract(data_field, 'integer') AS extracted_int FROM mixed_data;
Required Arguments
Name Type Description
union_col duckdb.union or duckdb.unresolved_type The union column to extract from
tag text The tag name of the union member to extract

union_tag(union_col) -> duckdb.unresolved_type

Returns the tag name of the currently active member in a union type.

-- Get the active tag for each row
SELECT union_tag(my_union_column) AS active_type FROM my_table;

-- Filter rows based on union tag
SELECT * FROM my_table WHERE union_tag(data_field) = 'string';
Required Arguments
Name Type Description
union_col duckdb.union or duckdb.unresolved_type The union column to get the tag from

approx_count_distinct(expression) -> BIGINT

Approximates the count of distinct elements using the HyperLogLog algorithm. This is much faster than COUNT(DISTINCT ...) for large datasets, with a small error rate.

-- Approximate distinct count of customer IDs
SELECT approx_count_distinct(customer_id) FROM orders;

-- Compare with exact count
SELECT
    approx_count_distinct(customer_id) AS approx_distinct,
    COUNT(DISTINCT customer_id) AS exact_distinct
FROM orders;
Required Arguments
Name Type Description
expression any The expression to count distinct values for