A while back I posted a discussion about an issue ...
# troubleshooting
a
A while back I posted a discussion about an issue where I could not load data into postgres. pipelinewise-tap-postgres was complaining about a duplicate key
(duplicate key value violates unique constraint)
, even though the key it was reporting did not actually exist in the db. I discovered this possibly related issue with psycopg2. https://github.com/psycopg/psycopg2/issues/1036 In summary, if there are parallel statements executing
CREATE TABLE IF NOT EXISTS
then it can raise this error. I don't know the internals too well but is it possible to cause the db to execute`create table if not exists` messages in parallel when running meltano (from within dagster if that's relevant)? That might explain my issue. Edit: prior thread: https://meltano.slack.com/archives/C069CQNHDNF/p1722527619910199 Going to give
meltanolabs
variant a go...
đź‘€ 1
c
this looks like its a problem when the table is named the same and multiple
create table if not exists
are ran in parallel - I assume you mean target-postgres, since that is where
create table if not exists
are executing. This would be an extremely common problem for anyone using target-postgres if the issue was with parallelism when creating tables [of different names]. But in your original thread your unique key violation is coming from a table you’ve defined, not any of the PGSQL internal structure tables used for user-defined tables.
e
Thanks for sharing the link to the psycopg2 issue! I'll try to come up with a MRE, unless someone beats me to it. Note to self to link this back to the target-postgres issue, and to add an issue template that asks for the version of postgres and the target.
a
@Edgar RamĂ­rez (Arch.dev) I'm almost glad to see someone else with the same issue, I thought I was going mad! @Charles Feduke yes sorry it's target. the issue I am describing is loading into a table called
media_insights
which does already exist. I only have a surface understanding of postgres, afraid I don't really understand your last sentence and the implications
c
so the thread from psycopg2 is discussing a situation where someone is trying to issue a ton of data definition language statements in parallel to create the same table (why… who knows). PostgreSQL uses itself to store table definitions, to include unique keys on constraint names, object names, etc. So the issue on the psycopg2 thread is because one of the threads is in the process of creating the table and its associated index while another thread has issued the
create table if not exists
and the table in question doesn’t technically exist in the PostgreSQL system tables because its still being created, so it also tries to create the table and its associated objects and thus the result is a unique key violation because some of the objects do exist (but in this scenario its from the PostgreSQL system tables which store the definitions for all of the other tables in the database). It looks like Edgar has successfully identified your problem though which is good.
a
Thanks, appreciate the explanation
@Edgar RamĂ­rez (Arch.dev) I tried the approach of using a jsonl file as a middle man and got the data loaded into the target without error. Edit: Actually they ended up a different schema, because it didn't have the meltano namespace for the extractor. But it did manage to load the file, albeit by creating new tables in an empty schema. More edits: I had used a mapper to insert a date field, and this date field was one of the primary keys defined in
metadata:
, overriding the key definition in the tap repo. I wonder if that was affecting it too. I've just decided to edit my fork to include those keys and add the missing fields via
post_process
. That with a full delete of the target schema in postgres got me back up and running.
đź‘€ 1