# plugins-general
j
Hi, I'm running tap-shopify and I'm facing a weird behavior that I haven't found a solution for yet. The elt run doesn't seem to be respecting the bookmark, so every time I run the elt I get duplicated records in BQ. I'm running the elt using:
Copy code
meltano elt tap-shopify target-bigquery --full-refresh --job_id=shopify_to_bq
On my
.env
, I'm specifying the state and the catalog like:
Copy code
TAP_SHOPIFY__CATALOG=extract/tap-shopify.catalog.json
TAP_SHOPIFY__STATE=extract/tap-shopify.state.json
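(For what it's worth, I think the same extras could also live in meltano.yml instead of .env; a rough sketch, using the same paths as above:)
Copy code
plugins:
  extractors:
    - name: tap-shopify
      catalog: extract/tap-shopify.catalog.json
      state: extract/tap-shopify.state.json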
In my catalog, I have the replication method and key set as:
Copy code
"metadata": {
  "selected": true,
  "table-key-properties": ["id"],
  "forced-replication-method": "INCREMENTAL",
  "valid-replication-keys": ["updated_at"]
}
I can see the bookmark line in the console, but the state file remains empty:
Copy code
{
  "bookmarks": {
    "currently_sync_stream": "transactions",
    "orders": {
      "since_id": 123,
      "updated_at": "2021-05-24T22:18:48.000000Z"
    },
    "products": {
      "since_id": 456,
      "updated_at": "2021-05-24T22:18:55.000000Z"
    },
    "transaction_orders": {
      "since_id": 789,
      "updated_at": "2021-05-24T22:19:20.000000Z"
    },
    "transactions": {
      "created_at": "2021-05-24T22:17:37.000000Z"
    }
  }
}
Also, if I write the state file with the content above, it ignores that too and starts over again from
since_id=0
. Any idea how to fix this, or what I'm doing wrong?
a
In the metadata, can you try explicitly specifying
replication-method
and
replication-key
as documented here: https://meltano.com/docs/singer-spec.html#metadata
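For example, something roughly like this in the stream's metadata (using the updated_at key from your snippet):
Copy code
"metadata": {
  "selected": true,
  "table-key-properties": ["id"],
  "replication-method": "INCREMENTAL",
  "replication-key": "updated_at"
}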
c
I thought it always gets the most recent row twice, because it uses
>=
to guarantee it doesn’t miss anything. And then BigQuery rows are immutable, right?
j
Did you include the
--full-refresh
on purpose? I think that makes it ignore the state from your previous runs.
j
good catch @jules_huisman! I used that to try to overwrite the duplicated data, but it seems to be appending everything to the existing data
j
If you are using the
adswerve
variant of
target-bigquery
you can set
replication_method
in the config of the target to
truncate
in order to overwrite existing data. (Default is
append
, off the top of my head.)
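Roughly something like this in meltano.yml, assuming the setting exists in the version you have installed:
Copy code
plugins:
  loaders:
    - name: target-bigquery
      variant: adswerve
      config:
        replication_method: truncate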
j
so, what's the difference between using
truncate
and the
--full-refresh
argument? Should I use both?
I just did a small test re-importing everything and now I have more duplicates in the table
j
--full-refresh
relates mainly to the extraction of the data, whether to use the previous state of the job or to ignore the state and pull all the data (handled by Meltano). The
replication_method
config is used specifically for
target-bigquery
and specifies whether each table should be truncated on each load or the data simply appended to the existing table.
Oh sorry, I see now that the
replication_method
functionality is not present on the main branch of target-bigquery; I was probably working with another branch.
j
that's why it's always using the append!
any idea how to make this run?
j
Now I can see "WRITE_TRUNCATE" being sent to BQ, but I still don't understand why it's always duplicating the data. I deleted all records from my table, and after running twice:
Copy code
meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq
I started getting duplicated records:
Copy code
SELECT COUNT(*) AS TT,
       ID
  FROM `<shopify-order-table>`
 GROUP BY ID
HAVING TT > 1
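While debugging, one way to collapse those duplicates is to keep only the newest row per ID; a rough sketch, assuming the orders table has the updated_at column from the bookmark:
Copy code
CREATE OR REPLACE TABLE `<shopify-order-table>` AS
SELECT * EXCEPT(rn)
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY updated_at DESC) AS rn
      FROM `<shopify-order-table>`
  )
 WHERE rn = 1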
j
Ah, apparently it is a bug in target-bigquery. https://github.com/adswerve/target-bigquery/issues/2. Someone fixed it in this branch: https://github.com/adswerve/target-bigquery/tree/hotfix/issue2
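If you want to try that branch, you could probably point the loader's pip_url at it (branch name taken from the link above), something like:
Copy code
plugins:
  loaders:
    - name: target-bigquery
      variant: adswerve
      pip_url: git+https://github.com/adswerve/target-bigquery.git@hotfix/issue2
and then reinstall the loader (meltano install loader target-bigquery, I believe) to pick it up.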
m
Thanks for the info, I'm running into the same issue. It appears
adswerve
is the only variant that offers the truncate option, if I'm not mistaken.
j
yes! So I tested it by changing the pip URL, and then I started getting a different error, which seems to be related to the schema. I think this branch has the fix, but it might be outdated.
thanks for helping me on this, Jules! I'll open a new thread to discuss that different issue, now that we know about the truncate.
j
No problem!