# troubleshooting
o
Hello all 👋 I have a question about extracting data from two accounts into a single destination using inheritance. I configured two extractors, like so:
extractors:
  - name: tap-mytap-account1
    namespace: tap_mytap
    executable: tap-mytap
    config:
      username: 'abc'
      password: '123'
  - name: tap-mytap-account2
    namespace: tap_mytap
    inherit_from: tap-mytap-account1
    config:
      username: 'xyz'
      password: '789'
And I'm trying to load them into the same table using the BigQuery loader, like so:
schedules:
  - name: mytap_account1_to_bigquery
    extractor: tap-mytap-account1
    loader: target-bigquery
    transform: skip
    ...
  - name: mytap_account2_to_bigquery
    extractor: tap-mytap-account2
    loader: target-bigquery
    ...
    transform: skip
My question is: is there a way to use the target-bigquery-truncate loader instead, to get these to truncate the destination table on each run? Currently, one job runs and then truncates the results of the other. Can inherited jobs "run together", essentially, and truncate previous runs together rather than truncating each other? My current workaround is to append to the table and dedupe later on, but this is causing my destination table to grow unnecessarily large, when all I need is the latest results from both runs, combined.
s
I'm not getting all the details, but why don't you separate out the truncate into an extra pipeline step?
o
Hmm that makes sense, like truncating prior to running these loaders? I guess I hadn't thought of that. But what if the truncation succeeds and the loaders fail for some reason?
I guess I didn't even think about an isolated truncation step. I'm pretty new to Meltano, still getting a feel for how I'd do that.
I guess I could load the first job into the target-bigquery-truncate loader and the next into the target-bigquery loader, and offset them by a few minutes to make sure the first runs first.
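A rough sketch of that offset idea, using the target-bigquery-truncate loader from this thread and Meltano's interval field on schedules; the cron values are only placeholders:

schedules:
  # Runs first: truncates the destination table, then loads account 1.
  - name: mytap_account1_to_bigquery
    extractor: tap-mytap-account1
    loader: target-bigquery-truncate
    transform: skip
    interval: '0 * * * *'
  # Runs a few minutes later: appends account 2 to the freshly truncated table.
  - name: mytap_account2_to_bigquery
    extractor: tap-mytap-account2
    loader: target-bigquery
    transform: skip
    interval: '10 * * * *'

The fragile part is that nothing guarantees the first run has actually finished before the second starts, which is what the staging-table suggestion below avoids.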
s
Well, this doesn't sound like an actual Meltano issue, to be honest. Here's what I'd do (and this is pretty applicable to most databases):
1. Ingest the data into a new table called "bla_tmp".
2. If that succeeds, swap the tables (or truncate and rename, ...).
3. Enjoy.
(All other orderings produce data downtime, or what you just described.)
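A minimal sketch of what step 1 could look like on the Meltano side: an inherited loader that writes into a staging dataset, which both schedules would then point at. The loader name target-bigquery-staging, the staging dataset name, and the dataset_id setting are assumptions here (the setting name differs between target-bigquery variants); the step 2 swap itself would be done in BigQuery, outside these pipelines.

loaders:
  - name: target-bigquery-staging
    inherit_from: target-bigquery
    config:
      # Assumed setting name; some target-bigquery variants call this
      # `dataset` instead of `dataset_id`. Point it at a staging dataset.
      dataset_id: raw_staging

Once both loads into the staging table succeed, swap or rename it over the production table, as described in the steps above.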
o
Okay, that makes sense to me. And I'm just now realizing that the target-bigquery-truncate loader I'm talking about is a custom inherited loader which just sets replication_method to truncate on the target-bigquery loader. So... never mind. 🤦 Thanks for walking me through it though, this makes a lot of sense.
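For reference, the inherited loader being described there would look roughly like this; both the loader name and the replication_method: truncate setting come straight from the message above:

loaders:
  - name: target-bigquery-truncate
    inherit_from: target-bigquery
    config:
      # Truncates the destination table before each load.
      replication_method: truncate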
s
I figured that, but your problem would still be there 🙂 Loading first and then renaming is probably the default recommendation. Let me know what you decide to go with!