# getting-started
s
Hi all! Clueless question that I don't know how to search for: can I run a post-transform extract-load sequence? Specifically, I have a `run tap-mysql target-duckdb dbt-duckdb:build` pipeline set up, but I'd like to then create another `.duckdb` file with a small subset of the tables (notably excluding most of those created in the original `target-duckdb` loader step). Is this possible, or do I need to define another pipeline?
Follow-up: for now, I added a separate `tap-duckdb duckdb-subsetter` pipeline run, where `duckdb-subsetter` is set to `inherit_from: target-duckdb`. This seems workable, but it gets stuck very early on the following error:
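For reference, the inherited-loader setup described above might look roughly like this in `meltano.yml` (a sketch using the plugin names from this thread; your extractor/loader definitions will differ):

```yaml
# meltano.yml sketch — plugin names taken from this thread, details assumed
plugins:
  extractors:
    - name: tap-duckdb
  loaders:
    - name: duckdb-subsetter
      inherit_from: target-duckdb
      # config overrides for the subset database would go here
```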
```
2024-04-29T05:06:28.309258Z [info     ] time=2024-04-29 05:06:28 name=target_duckdb level=CRITICAL message=Primary key is set to mandatory but not defined in the [main-final__all_achievements] stream cmd_type=elb consumer=True job_name=dev:tap-duckdb-to-duckdb-subsetter name=duckdb-subsetter producer=False run_id=4e28d909-bd6c-45a7-8ce3-2456a43bf1fb stdio=stderr string_id=duckdb-subsetter
```
The issue is that the table being replicated has no primary-key requirement (though perhaps I'm misunderstanding the error message, and it is in fact the target's settings that require the primary key?):
```sql
-- adk_wrapped.main.final__all_achievements definition

CREATE TABLE final__all_achievements (
    clovek_id BIGINT,
    school_year BIGINT,
    achievement_id VARCHAR,
    achievement_name VARCHAR,
    achievement_description VARCHAR,
    achievement_data JSON,
    achievement_type VARCHAR,
    achievement_priority INTEGER,
    achievement_image VARCHAR
);
```
I suppose that to test the earlier hypothesis, I could define a primary key on `final__all_achievements` via a dbt constraint, but that seems like a pretty roundabout way of going about it. Any thoughts? Searching for the error in this Slack turns up nothing.
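(For the record, the dbt-constraint route would look roughly like this in a `schema.yml` — a sketch assuming dbt >= 1.5 model contracts, which require every column to be declared with a `data_type` when the contract is enforced; whether dbt-duckdb actually enforces the `primary_key` constraint is worth verifying.)

```yaml
# models/schema.yml sketch — column names/types copied from the CREATE TABLE above
models:
  - name: final__all_achievements
    config:
      contract:
        enforced: true   # required for dbt to apply column constraints
    columns:
      - name: clovek_id
        data_type: bigint
        constraints:
          - type: primary_key
      - name: school_year
        data_type: bigint
      - name: achievement_id
        data_type: varchar
      - name: achievement_name
        data_type: varchar
      - name: achievement_description
        data_type: varchar
      - name: achievement_data
        data_type: json
      - name: achievement_type
        data_type: varchar
      - name: achievement_priority
        data_type: integer
      - name: achievement_image
        data_type: varchar
```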
e
If it doesn't make much sense for your transformed table to have a primary key, you can set `primary_key_required: false` on the inherited target. That said, I would try a different approach that doesn't require re-exporting a subset of the tables, and instead use ATTACH to materialize those tables in a different database. Maybe use `path` for the desired subset of tables, and `attach` for the database generated by the EL pipeline.
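In plain DuckDB SQL, the ATTACH approach amounts to something like the following (a sketch; the file and table names here are assumptions based on this thread):

```sql
-- Working inside the database produced by the EL pipeline (e.g. adk_wrapped.duckdb),
-- attach a new file that will hold only the subset of tables.
ATTACH 'subset.duckdb' AS subset;

-- Materialize each desired table into the new file.
CREATE TABLE subset.final__all_achievements AS
    SELECT * FROM main.final__all_achievements;

DETACH subset;
```

The same idea can be driven from dbt-duckdb's profile configuration, where `path` points at the new subset database and `attach` lists the database produced by the EL pipeline, so that models can select from the attached database and materialize into the smaller one.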
s
Thank you so much! `primary_key_required: false` takes me to a different error (target-duckdb tries to create a table with no columns), so I'll investigate the ATTACH option first - thanks for pointing it out!