# getting-started
s
Hi all! Clueless question that I don't know how to search for: can I run a post-transform extract-load sequence? Specifically, I have a `run tap-mysql target-duckdb dbt-duckdb:build` pipeline set up, but I'd like to then create another `.duckdb` file with a small subset of the tables (notably excluding most of those created in the original `target-duckdb` loader step). Is this possible, or do I need to define another pipeline?
Follow-up: for now, I added a separate `tap-duckdb duckdb-subsetter` pipeline run, where `duckdb-subsetter` is set to `inherit_from: target-duckdb`. This seems workable, but it gets stuck very early on the following error:
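For reference, the inherited-loader setup described above might look roughly like this in `meltano.yml` (a sketch using the plugin names from this thread; your extractor/loader definitions will differ):

```yaml
# meltano.yml sketch — plugin names taken from this thread, details assumed
plugins:
  extractors:
    - name: tap-duckdb
  loaders:
    - name: duckdb-subsetter
      inherit_from: target-duckdb
      # config overrides for the subset database would go here
```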
```
2024-04-29T05:06:28.309258Z [info     ] time=2024-04-29 05:06:28 name=target_duckdb level=CRITICAL message=Primary key is set to mandatory but not defined in the [main-final__all_achievements] stream cmd_type=elb consumer=True job_name=dev:tap-duckdb-to-duckdb-subsetter name=duckdb-subsetter producer=False run_id=4e28d909-bd6c-45a7-8ce3-2456a43bf1fb stdio=stderr string_id=duckdb-subsetter
```
The issue is that the table being replicated has no primary-key requirement (though perhaps I'm misunderstanding the error message, and it is in fact the target's settings that require the primary key?):
```sql
-- adk_wrapped.main.final__all_achievements definition

CREATE TABLE final__all_achievements (
    clovek_id BIGINT,
    school_year BIGINT,
    achievement_id VARCHAR,
    achievement_name VARCHAR,
    achievement_description VARCHAR,
    achievement_data JSON,
    achievement_type VARCHAR,
    achievement_priority INTEGER,
    achievement_image VARCHAR
);
```
I suppose that to test the earlier hypothesis, I could define a primary key on `final__all_achievements` via a dbt constraint, but that seems like a pretty roundabout way of going about it. Any thoughts? Searching for the error in this Slack turns up nothing.
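(For the record, the dbt-constraint route would look roughly like this in a `schema.yml` — a sketch assuming dbt >= 1.5 model contracts, which require every column to be declared with a `data_type` when the contract is enforced; whether dbt-duckdb actually enforces the `primary_key` constraint is worth verifying.)

```yaml
# models/schema.yml sketch — column names/types copied from the CREATE TABLE above
models:
  - name: final__all_achievements
    config:
      contract:
        enforced: true   # required for dbt to apply column constraints
    columns:
      - name: clovek_id
        data_type: bigint
        constraints:
          - type: primary_key
      - name: school_year
        data_type: bigint
      - name: achievement_id
        data_type: varchar
      - name: achievement_name
        data_type: varchar
      - name: achievement_description
        data_type: varchar
      - name: achievement_data
        data_type: json
      - name: achievement_type
        data_type: varchar
      - name: achievement_priority
        data_type: integer
      - name: achievement_image
        data_type: varchar
```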
e
If it doesn't make much sense for your transformed table to have a primary key, you can set `primary_key_required: false` on the inherited target. That said, I would try a different approach that doesn't require re-exporting a subset of the tables, and instead use ATTACH to materialize those tables in a different database. Maybe use `path` for the desired subset of tables, and `attach` for the database generated by the EL pipeline.
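In plain DuckDB SQL, the ATTACH approach amounts to something like the following (a sketch; the file and table names here are assumptions based on this thread):

```sql
-- Working inside the database produced by the EL pipeline (e.g. adk_wrapped.duckdb),
-- attach a new file that will hold only the subset of tables.
ATTACH 'subset.duckdb' AS subset;

-- Materialize each desired table into the new file.
CREATE TABLE subset.final__all_achievements AS
    SELECT * FROM main.final__all_achievements;

DETACH subset;
```

The same idea can be driven from dbt-duckdb's profile configuration, where `path` points at the new subset database and `attach` lists the database produced by the EL pipeline, so that models can select from the attached database and materialize into the smaller one.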
s
Thank you so much! `primary_key_required: false` takes me to a different error (target-duckdb tries to create a table with no columns), so I'll investigate the ATTACH option first - thanks for pointing it out!