jose_ribeiro
05/24/2021, 10:28 PM
meltano elt tap-shopify target-bigquery --full-refresh --job_id=shopify_to_bq
On my .env, I'm specifying the state and the catalog like this:
TAP_SHOPIFY__CATALOG=extract/tap-shopify.catalog.json
TAP_SHOPIFY__STATE=extract/tap-shopify.state.json
In my catalog, I have the replication method and key set as:
"metadata": {
"selected": true,
"table-key-properties": ["id"],
"forced-replication-method": "INCREMENTAL",
"valid-replication-keys": ["updated_at"]
}
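aaronsteers' reply further down points to the Singer spec's replication-method and replication-key keys rather than the forced-replication-method / valid-replication-keys pair shown above. A minimal Python sketch of setting them on a catalog, assuming the standard Singer catalog layout (a "streams" list whose stream-level metadata sits in the entry with an empty breadcrumb). The helper name set_incremental and the sample "orders" stream are hypothetical, and whether tap-shopify honors these keys is not confirmed in this thread.

```python
import json


def set_incremental(catalog: dict, stream_id: str, replication_key: str) -> dict:
    """Hypothetical helper: mark one stream as INCREMENTAL on replication_key.

    Assumes the Singer catalog layout, where stream-level metadata lives in
    the metadata entry whose breadcrumb is the empty list.
    """
    for stream in catalog.get("streams", []):
        if stream.get("tap_stream_id") != stream_id:
            continue
        for entry in stream.get("metadata", []):
            if entry.get("breadcrumb") == []:
                md = entry.setdefault("metadata", {})
                # Per the Singer spec, these two keys are set by the user;
                # forced-replication-method and valid-replication-keys are
                # written by the tap during discovery.
                md["replication-method"] = "INCREMENTAL"
                md["replication-key"] = replication_key
                return catalog
        raise KeyError(f"no stream-level metadata for {stream_id!r}")
    raise KeyError(f"stream {stream_id!r} not in catalog")


# Minimal catalog mirroring the metadata posted above (hypothetical stream).
catalog = {
    "streams": [
        {
            "tap_stream_id": "orders",
            "metadata": [
                {
                    "breadcrumb": [],
                    "metadata": {
                        "selected": True,
                        "table-key-properties": ["id"],
                        "forced-replication-method": "INCREMENTAL",
                        "valid-replication-keys": ["updated_at"],
                    },
                }
            ],
        }
    ]
}

set_incremental(catalog, "orders", "updated_at")
print(json.dumps(catalog["streams"][0]["metadata"][0]["metadata"], indent=2))
```

The tap-written keys are left in place; the sketch only adds the user-side selection next to them.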
I can see the bookmark line in the console, but the state file remains empty:
{
"bookmarks": {
"currently_sync_stream": "transactions",
"orders": {
"since_id": 123,
"updated_at": "2021-05-24T22:18:48.000000Z"
},
"products": {
"since_id": 456,
"updated_at": "2021-05-24T22:18:55.000000Z"
},
"transaction_orders": {
"since_id": 789,
"updated_at": "2021-05-24T22:19:20.000000Z"
},
"transactions": {
"created_at": "2021-05-24T22:17:37.000000Z"
}
}
}
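To check what a state file actually contains (here it stays empty), a small Python sketch that lists the per-stream bookmarks from a state shaped like the JSON above. The function name load_bookmarks is hypothetical. It treats "currently_sync_stream" as a progress marker rather than a stream, since its value is a string, and it returns an empty dict for an empty file, meaning there are no bookmarks to resume from.

```python
import json


def load_bookmarks(text: str) -> dict:
    """Hypothetical helper: map each stream name to its bookmark values.

    An empty (or whitespace-only) file yields {}.
    Non-dict entries such as "currently_sync_stream" are skipped because
    they record sync progress rather than a stream's bookmark.
    """
    if not text.strip():
        return {}
    bookmarks = json.loads(text).get("bookmarks", {})
    return {name: value for name, value in bookmarks.items() if isinstance(value, dict)}


state_text = """{
  "bookmarks": {
    "currently_sync_stream": "transactions",
    "orders": {"since_id": 123, "updated_at": "2021-05-24T22:18:48.000000Z"},
    "transactions": {"created_at": "2021-05-24T22:17:37.000000Z"}
  }
}"""

for stream, marks in load_bookmarks(state_text).items():
    print(stream, marks)
```

Note that the streams do not share a bookmark field: orders tracks since_id and updated_at, while transactions tracks only created_at.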
Also, if I write the state file myself using the content above, it ignores that too and starts over again with since_id=0.
Any idea how to fix this, or what I'm doing wrong?
aaronsteers
05/24/2021, 11:04 PM
replication-method, replication-key as documented here: https://meltano.com/docs/singer-spec.html#metadata
chris_kings-lynne
05/25/2021, 1:57 AM
>= to guarantee it doesn’t miss anything. And then BigQuery rows are immutable, right?
jules_huisman
05/25/2021, 8:55 AM
--full-refresh on purpose? I think that makes it ignore the state from your previous runs.
jose_ribeiro
05/25/2021, 1:54 PM
jules_huisman
05/25/2021, 1:58 PM
adswerve variant of target-bigquery, you can set replication_method in the config of the target to truncate in order to overwrite existing data. (Default is append, off the top of my head.)
jose_ribeiro
05/25/2021, 2:12 PM
truncate over the --full-refresh argument? Should I use both?
jose_ribeiro
05/25/2021, 2:15 PM
jules_huisman
05/25/2021, 2:17 PM
--full-refresh relates mainly to extracting the data: whether to use the job's previous state or to ignore the state and pull all the data (handled by Meltano). The replication_method config is specific to target-bigquery and specifies whether each table should be truncated on each load or the data simply appended to the existing table.
jules_huisman
05/25/2021, 2:21 PM
replication_method functionality is not present on the main branch of target-bigquery; I was probably working with another branch.
jose_ribeiro
05/25/2021, 2:22 PM
jose_ribeiro
05/25/2021, 2:24 PM
jules_huisman
05/25/2021, 2:25 PM
jose_ribeiro
05/25/2021, 2:44 PM
meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq
I started getting duplicated records:
SELECT COUNT(*) TT,
ID
FROM `<shopify-order-table>`
GROUP BY ID
HAVING TT > 1
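These duplicates are consistent with what chris_kings-lynne and jules_huisman describe earlier in the thread: an inclusive >= bookmark filter re-emits the boundary record on each incremental run, and a target that appends keeps both copies. A toy Python simulation under those assumptions, with hypothetical rows and simplified extract/load functions (not tap-shopify's or target-bigquery's actual code). duplicate_ids mirrors the GROUP BY / HAVING query above.

```python
from collections import Counter

# Hypothetical source rows as (id, updated_at); not real Shopify data.
SOURCE = [
    (1, "2021-05-24T22:10:00Z"),
    (2, "2021-05-24T22:18:48Z"),
]


def extract(rows, bookmark=None):
    """Assumed incremental extract: keep rows at or after the bookmark (>=).

    ISO-8601 strings in the same format compare correctly as plain strings.
    bookmark=None behaves like a full refresh and returns every row.
    """
    return [r for r in rows if bookmark is None or r[1] >= bookmark]


def load(table, batch, method="append"):
    """'append' adds the batch to existing rows; 'truncate' replaces them."""
    if method == "truncate":
        return list(batch)
    return table + list(batch)


def duplicate_ids(table):
    """Python equivalent of GROUP BY id HAVING COUNT(*) > 1."""
    counts = Counter(row_id for row_id, _ in table)
    return {row_id: n for row_id, n in counts.items() if n > 1}


# Run 1: no state yet, so everything is extracted and appended.
table = load([], extract(SOURCE))
bookmark = max(updated_at for _, updated_at in table)

# Run 2: incremental run, source unchanged; the boundary row comes back.
appended = load(table, extract(SOURCE, bookmark), method="append")
print(duplicate_ids(appended))  # {2: 2}

# Full refresh into a truncating target: the table is rebuilt, no duplicates.
truncated = load(appended, extract(SOURCE), method="truncate")
print(duplicate_ids(truncated))  # {}
```

The design point is that the inclusive filter trades duplicates for completeness: a strict > could skip records sharing the bookmark timestamp, so an appending target needs downstream deduplication, while truncate only makes sense paired with a full extract.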
jules_huisman
05/25/2021, 2:50 PM
matt_hardner
05/25/2021, 3:01 PM
adswerve is the only provider that offers the truncate option, if I'm not mistaken.
jose_ribeiro
05/25/2021, 3:18 PM
jose_ribeiro
05/25/2021, 3:19 PM
jules_huisman
05/25/2021, 6:07 PM