Skip to content

[Bug] hash_collissions in dbt snapshot #154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
1 task done
espenhoh opened this issue Sep 3, 2024 · 3 comments
Open
1 task done

[Bug] hash_collissions in dbt snapshot #154

espenhoh opened this issue Sep 3, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@espenhoh
Copy link

espenhoh commented Sep 3, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

snapshot is implemented with the ora_hash() function.
This gives a lot of collissions in a large data set and duplicate ids, causing errors in the merge statement for data volumes of millions of rows.
dbt snapshot will fail once a hash collission appears, and the snapshot tables becomes impossible to update.

Expected Behavior

No collissions,

Steps To Reproduce

run dbt snapshot command on data with a few million updates.
First dbt snapshot command succeeds as there are only inserts, but duplicates in dbt_scd_id exists causing trouble on subsequest runs.

Relevant log output using --debug flag enabled

06:08:52    Database Error in snapshot table_name (snapshots\table_name.sql)
  ORA-30926: unable to get a stable set of rows in the source tables
  Help: https://docs.oracle.com/error-help/db/ora-30926/
  compiled Code at target\run\dbt_project\snapshots\table_name.sql

Environment

- OS: Windows
- Python: 3.11
- dbt: 1.8.1

What Oracle database version are you using dbt with?

19c

Additional Context

No response

@espenhoh espenhoh added the bug Something isn't working label Sep 3, 2024
@aosingh aosingh self-assigned this Sep 3, 2024
@aosingh
Copy link
Member

aosingh commented Sep 4, 2024

Hi @espenhoh

I understand the issue because we have seen these in the past.

#52
#102

And yes, we decided not to update ORA_HASH function because it would break existing snapshots, Users who are starting fresh snapshots should include the following macro in their dbt project. This uses a much stronger hash function - SHA256

{% macro oracle__snapshot_hash_arguments(args) -%}
    STANDARD_HASH({%- for arg in args -%}
        coalesce(cast({{ arg }} as varchar(4000) ), '')
        {% if not loop.last %} || '|' || {% endif %}
    {%- endfor -%}, 'SHA256')
{%- endmacro %}

For existing snapshots, to migrate, there is an option.

dbt snapshot uses MERGE INTO statement.

MERGE INTO target t
USING staging s
ON (s.dbt_scd_id = d.dbt_scd_id)

So, before using the new hash function you need to UPDATE the column dbt_scd_id in the snapshot table using the STANDARD_HASH function.

UPDATE <SNAPSHOT_TABLE> 
SET dbt_scd_id = STANDARD_HASH(<args>, 'SHA256')

@aosingh
Copy link
Member

aosingh commented Mar 31, 2025

@espenhoh

dbt-oracle==1.9.1rc1 release candidate is out where we have changed the hash algorithm to SHA256

@itgkarl
Copy link

itgkarl commented Apr 23, 2025

For anybody who is trying to migrate existing dbt-oracle snapshots with the old ora_hash() to be 1.9.1 compatible:

With dbt-oracle 1.9.1 in place run the following operation on your snapshot table with PROMOTION_COSTS_SNAPSHOT being your snapshot u want to migrate and promo_id being your unique Identifier of the snapshot.

dbt --debug run-operation update_legacy_dbt_scd_id \
    --args '{snapshot_table: PROMOTION_COSTS_SNAPSHOT, cols: ["promo_id", "dbt_updated_at"]}'

This way is found in the code comments here:

1. Create a macro oracle__snapshot_hash_arguments(args) in your dbt project

The Oracle error indicating that you have to migrate is an ORA-01790 like:
oracle adapter: Oracle error: ORA-01790: expression must have same datatype as corresponding expression
With the new Hash function the dbt_scd_id of the snapshot had to change from NUMBER to RAW(32) causing the datatype collision.

Hope that helps anybody with the same Problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants