# troubleshooting
o
Hello all 👋 I have a question about extracting data from two accounts into a single destination using inheritance. I configured two extractors, like so:
extractors:
  - name: tap-mytap-account1
    namespace: tap_mytap
    executable: tap-mytap
    config:
      username: 'abc'
      password: '123'
  - name: tap-mytap-account2
    namespace: tap_mytap
    inherit_from: tap-mytap-account1
    config:
      username: 'xyz'
      password: '789'
And I'm trying to load them into the same table using the BigQuery loader, like so:
schedules:
  - name: mytap_account1_to_bigquery
    extractor: tap-mytap-account1
    loader: target-bigquery
    transform: skip
    ...
  - name: mytap_account2_to_bigquery
    extractor: tap-mytap-account2
    loader: target-bigquery
    ...
    transform: skip
My question is: is there a way to use the target-bigquery-truncate loader instead, to get these to truncate the destination table on each run? Currently, one job runs and then truncates the results of the other. Can inherited jobs "run together", essentially, and truncate previous runs together rather than truncating each other? My current workaround is to append to the table and dedupe later on, but this is causing my destination table to grow unnecessarily large, when all I need is the latest results from both runs, combined.
s
I'm not getting all the details, but why don't you separate out the truncate into an extra pipeline step?
o
Hmm that makes sense, like truncating prior to running these loaders? I guess I hadn't thought of that. But what if the truncation succeeds and the loaders fail for some reason?
I guess I didn't even think about an isolated truncation step. I'm pretty new to Meltano, still getting a feel for how I'd do that.
I guess I could load the first job into the target-bigquery-truncate loader and the next into the target-bigquery loader, and offset them by a few minutes to make sure the first runs first.
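A rough sketch of that offset idea, using the target-bigquery-truncate loader from this thread and Meltano's interval field on schedules; the cron values are only placeholders:

schedules:
  # Runs first: truncates the destination table, then loads account 1.
  - name: mytap_account1_to_bigquery
    extractor: tap-mytap-account1
    loader: target-bigquery-truncate
    transform: skip
    interval: '0 * * * *'
  # Runs a few minutes later: appends account 2 to the freshly truncated table.
  - name: mytap_account2_to_bigquery
    extractor: tap-mytap-account2
    loader: target-bigquery
    transform: skip
    interval: '10 * * * *'

The fragile part is that nothing guarantees the first run has actually finished before the second starts, which is what the staging-table suggestion below avoids.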
s
Well, this doesn't sound like an actual Meltano issue, to be honest. Here's what I'd do (and this is pretty applicable to most databases):
1. Ingest the data into a new table called "bla_tmp".
2. If that succeeds, swap the tables (or truncate and rename, ...).
3. Enjoy.
(All other orderings produce data downtime, or what you just described.)
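A minimal sketch of what step 1 could look like on the Meltano side: an inherited loader that writes into a staging dataset, which both schedules would then point at. The loader name target-bigquery-staging, the staging dataset name, and the dataset_id setting are assumptions here (the setting name differs between target-bigquery variants); the step 2 swap itself would be done in BigQuery, outside these pipelines.

loaders:
  - name: target-bigquery-staging
    inherit_from: target-bigquery
    config:
      # Assumed setting name; some target-bigquery variants call this
      # `dataset` instead of `dataset_id`. Point it at a staging dataset.
      dataset_id: raw_staging

Once both loads into the staging table succeed, swap or rename it over the production table, as described in the steps above.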
o
Okay, that makes sense to me. And I'm just now realizing that the target-bigquery-truncate loader I'm talking about is a custom inherited loader which just sets replication_method to truncate on the target-bigquery loader. So... never mind. 🤦 Thanks for walking me through it though, this makes a lot of sense.
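For reference, the inherited loader being described there would look roughly like this; both the loader name and the replication_method: truncate setting come straight from the message above:

loaders:
  - name: target-bigquery-truncate
    inherit_from: target-bigquery
    config:
      # Truncates the destination table before each load.
      replication_method: truncate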
s
I figured that, but your problem would still be there 🙂 Loading first and then renaming is probably the default recommendation. Let me know what you decide to go with!