# troubleshooting
f
Hi, I have a quick question about an issue that I'm running into. I'm using meltano to load and transform (with dbt) data from an external database into a local postgres database. This runs on a schedule with the integrated airflow scheduler. Then I have an independent program that uses the data synced into this db, and that also runs on a schedule. Sometimes those schedules overlap, and the independent program seems to see an empty table while dbt is running. Is there a way to prevent this, i.e. have dbt replace all data in the target table at once, in a transaction or similar, without using hand-rolled locks? Or am I misunderstanding something here?
v
Hmm, what would cause the table to become empty? Is it during the meltano run?
f
Yes, that's what it looks like.
v
Could you share your meltano.yml? And the command you're using to run it?
p
Have you been able to isolate whether the table that's empty is a raw table written by a meltano loader or a table generated by dbt? If it's generated by dbt, is it a table or a view? I have a few ideas depending on the situation.
But also +1 to Derek's ask to see the meltano.yml config. For me, I'm curious which loader variant you're using.
f
Hi, here's my (redacted) config:
```yaml
version: 1
default_environment: dev
project_id: REDACTED
environments:
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-REDACTED_TAP_1
        config:
          host: REDACTED
          port: REDACTED
          user: REDACTED
          dbname: REDACTED
          ssl: false
        metadata:
          "*":
            replication-method: FULL_TABLE
        select:
        - REDACTED.*
        - REDACTED.*
      - name: tap-REDACTED_TAP_2
        config:
          host: REDACTED
          user: REDACTED
          database: REDACTED
          ssl: true
        metadata:
          "*":
            replication-method: FULL_TABLE
        select:
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
      - name: tap-REDACTED_TAP_3
        config:
          host: REDACTED
          port: REDACTED
          user: REDACTED
          database: REDACTED
          ssl: true
        metadata:
          "*":
            replication-method: FULL_TABLE
        select:
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
      loaders:
      - name: target-postgres
        config:
          host: REDACTED
          port: REDACTED
          user: REDACTED
          dbname: REDACTED
          ssl: false
          add_metadata_columns: true
          hard_delete: true
      transformers:
      - name: dbt-postgres
        config:
          host: REDACTED
          user: REDACTED
          port: REDACTED
          dbname: REDACTED
          schema: REDACTED
          ssl: false
- name: prod
  config:
    plugins:
      extractors:
      - name: tap-REDACTED_TAP_1
        config:
          host: REDACTED
          port: REDACTED
          user: REDACTED
          dbname: REDACTED
          ssl: false
        metadata:
          "*":
            replication-method: FULL_TABLE
        select:
        - REDACTED.*
        - REDACTED.*
      - name: tap-REDACTED_TAP_2
        config:
          host: REDACTED
          user: REDACTED
          database: REDACTED
          ssl: true
        metadata:
          "*":
            replication-method: FULL_TABLE
        select:
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
      - name: tap-REDACTED_TAP_3
        config:
          host: REDACTED
          user: REDACTED
          database: REDACTED
          ssl: true
        metadata:
          "*":
            replication-method: FULL_TABLE
        select:
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
        - REDACTED.*
      loaders:
      - name: target-postgres
        config:
          host: REDACTED
          port: REDACTED
          user: REDACTED
          dbname: REDACTED
          ssl: false
          add_metadata_columns: true
          hard_delete: true
      transformers:
      - name: dbt-postgres
        config:
          host: REDACTED
          user: REDACTED
          port: REDACTED
          dbname: REDACTED
          schema: REDACTED
          ssl: false
plugins:
  extractors:
  - name: tap-REDACTED_TAP_1
    inherit_from: tap-postgres
    variant: transferwise
    pip_url: pipelinewise-tap-postgres
  - name: tap-REDACTED_TAP_2
    inherit_from: tap-mysql
    variant: transferwise
    pip_url: pipelinewise-tap-mysql
  - …
```
Also, I'm currently using the default variants for everything in meltano.
v
Based on:
> the table that I'm seeing the issue with is a result of dbt, so I think it is created by dbt
It's likely not the tap/target that's the issue. From your initial description it sounded like it could be an activate-version thing from the default target-postgres.

The actual error from dbt would probably have cleared up most of my confusion above, just FYI for next time! https://stackoverflow.com/questions/70383118/using-dbt-and-meltano-how-can-i-prevent-multiple-dbt-job-runs-from-conflicting is one possible solution: basically, don't allow a dbt job to run at the same time as another job. Locks on the DB can help you here.
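For reference, that Stack Overflow approach boils down to a Postgres advisory lock shared between the two jobs. A minimal sketch, assuming you run dbt via the `dbt-postgres` transformer against the same database your independent program reads from; the lock key `724528` is an arbitrary illustrative value that both sides must agree on, not anything from this thread:
```yaml
# dbt_project.yml (sketch): serialize dbt runs against other readers.
# Advisory locks are session-scoped, so even if the unlock hook happens to run
# on a different connection, the lock is released when dbt's session closes.
on-run-start:
  - "select pg_advisory_lock(724528)"
on-run-end:
  - "select pg_advisory_unlock(724528)"
```
Your independent program would then wrap its reads in the same key (`SELECT pg_advisory_lock(724528)` before reading, `SELECT pg_advisory_unlock(724528)` after), so it blocks until dbt finishes instead of seeing a half-built table.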
f
Thanks for your reply. I found the linked Stack Overflow post, but that uses a hand-rolled lock on the entire DB, hence my question whether this is possible without that; since I couldn't find any other solutions and it seems like quite a simple thing, I figured I was doing something wrong. Maybe I should ask that question in the dbt community?

Also, while I'm here, another question: can you tell from my meltano.yml why the deletion of records in the target db does not work? I'm using `FULL_TABLE`, `add_metadata_columns` as well as `hard_delete`, but when I delete a record in the source (tap) db and re-run meltano, those records are neither marked as deleted (`_sdc_deleted_at`) nor hard-deleted; they just stay there until I delete them manually or drop the table before running meltano. I tried finding existing posts about this issue but couldn't find anything, so I guess I'm doing something wrong here?

And sorry for so many questions; I'm quite new to data science, and given that meltano combines a lot of different tools and techniques it can get a bit confusing 😄
v
> but that uses a hand-rolled lock on the entire DB, hence my question whether this is possible without that; maybe I should ask that question in the dbt community?

What do you think a "hand-rolled" lock means / does that's negative here?
> it seems like quite a simple thing, I figured I was doing something wrong

It's not simple, just like this isn't a quick question 😄. dbt is really designed to be run all together a few times a day.
> Also, while I'm here, another question: can you tell from my meltano.yml why the deletion of records in the target db does not work?

Assuming you're talking about `REDACTED_JOB_1`, which refers to `tap-REDACTED_TAP_3`, which is `pipelinewise-tap-mysql`. I think what you want here is for the target table to match the source table 1:1. For records to be deleted properly using purely the tap/target combo, we recommend using `activate_version`. Note that another simple solution for full-table syncs is to just drop the target table before you load your new tap data, but that requires you to be OK with the target table being in a broken state for a period of time.

So I'm going to look at the question: "why doesn't `pipelinewise-tap-mysql` with `pipelinewise-target-postgres` properly use activate version to allow my source/target tables to match?" I believe `pipelinewise-tap-mysql` supports activate version, and I also believe `pipelinewise-target-postgres` supports it, so it "should" be working given that `hard_delete` is set to true.

What I think is that you should debug this by running something like `meltano invoke tap-REDACTED_TAP_3 > out` and looking at the output file for some activate-version messages. If they are there, cool! Then run your target with that data (`cat out | meltano invoke target-postgres`), and then it's a matter of debugging why it's not happening properly. You should open a separate thread if you're not sure whether it's a tap/target issue at this point.
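To make that debugging loop concrete, here's a sketch of the shell session; the tap name follows the redacted config above, and the grep pattern assumes the standard Singer JSON message format that pipelinewise taps emit:
```sh
# Dump the tap's Singer output to a file, with no target involved yet.
meltano invoke tap-REDACTED_TAP_3 > out

# Count activate-version messages; a non-zero count means the tap emits them.
grep -c '"type": "ACTIVATE_VERSION"' out

# Replay the same stream into the loader and watch its logs to see whether
# the new version is activated and stale rows actually get removed.
cat out | meltano invoke target-postgres
```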
f
> What do you think a "hand-rolled" lock means / does that's negative here?

Sorry if that came across as negative, I didn't mean it that way. What I meant by "hand-rolled" is doing it via a hook and macros and then writing my own locking logic, rather than using dbt or meltano "native" functionality. But I will look into this 🙂

About `ACTIVATE_VERSION`: yes, those messages are indeed present in the tap output. I will continue debugging this and open a separate issue if needed. Thanks a lot for your help, that cleared things up for me!
v
> Sorry if that came across as negative, I didn't mean it that way. What I meant by "hand-rolled" is doing it via a hook and macros and then writing my own locking logic, rather than using dbt or meltano "native" functionality. But I will look into this 🙂

Totally OK, no offense taken at all; I was just curious whether there was a technical reason you didn't like it, and that reason makes sense to me. Good luck with activate version! It "should" work 😄
f
Thanks! Yes, the "technical" reason is that I'm lazy and didn't want to maintain my own locking logic 😜
v
If you're being lazy, I recommend not running dbt constantly; just run it once or twice a day and call it good!
f
Yeah, the problem is that the resulting data is synced into business analytics tools where we need relatively up-to-date data, so I guess I'll have to work around the issues (I'm aware that I could completely forgo dbt and do it differently, but I quite like dbt by now...). The cleanest solution would probably be to incorporate my "after-dbt" processing into the Airflow jobs, so that I have a single schedule instead of two independent schedules that clash.
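For what it's worth, one way to get that single schedule is to chain the EL step, dbt, and the post-processing into one Meltano job that the bundled Airflow scheduler triggers as a unit. A rough sketch, assuming Meltano's `jobs`/`schedules` syntax; `post-processing` is a hypothetical utility plugin wrapping the independent program, and the names and interval are illustrative:
```yaml
# meltano.yml (sketch): one pipeline, one schedule, no clashing runs.
jobs:
- name: sync-transform-publish
  tasks:
  - tap-REDACTED_TAP_1 target-postgres  # EL step
  - dbt-postgres:run                    # transform
  - post-processing:run                 # hypothetical utility plugin command
schedules:
- name: sync-transform-publish-hourly
  interval: '@hourly'
  job: sync-transform-publish
```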