Are primary keys automatically migrated along with...
# getting-started
j
Are primary keys automatically migrated along with the rest of the data when creating a table for the first time in the target?
a
Hi, @josh_lloyd. This depends on the target you are using. In general, yes, many/most of the popular database targets enforce the primary key in one fashion or another. Which target are you using?
j
snowflake
Maybe this is turning into a bit more troubleshooting then, because I’m using a custom redshift tap, but when I have the primary_key_required setting set to True my pipeline just fails. I’ve even tried to list a replication-key manually in the extractor metadata, but to no avail
a
@josh_lloyd - I’ve run into the issue as well. Can you confirm if there are any tables in your redshift source which do not have a detectable primary key? It sounds like the snowflake target is failing as a precaution due to the
primary_key_required
. That setting is not related to needing primary keys on the target (Snowflake doesn’t actually have “real” primary keys) but it is complaining that without a primary key it cannot ensure duplicates would be removed for you.
The recommended practice is generally: 1. For tables with primary keys logic, ensuring the primary key is specified gives Snowflake the logic it needs to merge upsert the data (which ensures there are no duplicates). 2. For tables with valid replication keys but no primary key, snowflake will append just new records. Duplicates will be rare but are still possible. 3. For tables with no replication key and no primary key, snowflake will likely append a full copy of the table during each pipeline execution. 4. In either #2 or #3 scenario, there may be a need to dedupe the output. For this reason, the Snowflake target wants you to confirm that you are okay with dealing with potential duplicates by specifying
primary_key_required=false
. Does this help at all?
j
most of these redshift source tables appear to have primary keys.
I guess I was hoping that those columns would also be labeled as primary keys in Snowflake as well. It’s good to know the behavior in either case. I mostly just wanted to confirm that there wasn’t some step I was missing in order to get those primary keys in the target. Sounds like providing a
replication-key
may or may not even solve that.
a
Yes, this is not clearly documented in the Snowflake tap, but you'll need to disable the
primary_key_required
setting if any tables at all do not have a primary key. In my own implementation, I resisted this for a while until I realized the setting wasn't needed or especially helpful.
To your other point, yes, the Redshift tap should detect primary keys and pass the information along to the target. However, because Snowflake doesn't enforce primary key uniqueness, I don't recall if the target chooses to actually define those as primary keys or just treat them as such during the upsert.