jose_ribeiro
05/24/2021, 10:28 PM
meltano elt tap-shopify target-bigquery --full-refresh --job_id=shopify_to_bq
In my .env, I'm specifying the state and the catalog like:
TAP_SHOPIFY__CATALOG=extract/tap-shopify.catalog.json
TAP_SHOPIFY__STATE=extract/tap-shopify.state.json
In my catalog, I have the replication method and key set as:
"metadata": {
  "selected": true,
  "table-key-properties": ["id"],
  "forced-replication-method": "INCREMENTAL",
  "valid-replication-keys": ["updated_at"]
}
I can see the bookmark line on the console, but the state file remains empty.
{
  "bookmarks": {
    "currently_sync_stream": "transactions",
    "orders": {
      "since_id": 123,
      "updated_at": "2021-05-24T22:18:48.000000Z"
    },
    "products": {
      "since_id": 456,
      "updated_at": "2021-05-24T22:18:55.000000Z"
    },
    "transaction_orders": {
      "since_id": 789,
      "updated_at": "2021-05-24T22:19:20.000000Z"
    },
    "transactions": {
      "created_at": "2021-05-24T22:17:37.000000Z"
    }
  }
}
Also, if I write the state file myself using the content above, it ignores that too and starts over again using since_id=0.
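For illustration only, here is a minimal Python sketch of how a Singer-style tap might look up a bookmark in the state it is handed. The helper names `load_state` and `get_bookmark` are hypothetical and this is not tap-shopify's actual code; it only shows why a state that is empty, or never passed to the tap, would produce a restart from since_id=0.

```python
import json


def load_state(text):
    """Parse state-file text; an empty file is treated as an empty state."""
    return json.loads(text) if text.strip() else {}


def get_bookmark(state, stream, key, default=None):
    """Return the saved bookmark for a stream, or `default` if there is none.

    Hypothetical helper, written for this example only.
    """
    return state.get("bookmarks", {}).get(stream, {}).get(key, default)


# A state holding one bookmark, and the state read from an empty file.
saved = load_state('{"bookmarks": {"orders": {"since_id": 123}}}')
empty = load_state("")

print(get_bookmark(saved, "orders", "since_id", default=0))  # 123
print(get_bookmark(empty, "orders", "since_id", default=0))  # 0
```

With a lookup like this, a missing bookmark and an ignored state file look the same from the tap's side: both fall back to the default.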
Any idea how to fix this, or what I'm doing wrong?
aaronsteers
05/24/2021, 11:04 PM
replication-method, replication-key as documented here: https://meltano.com/docs/singer-spec.html#metadata
chris_kings-lynne
05/25/2021, 1:57 AM
>= to guarantee it doesn’t miss anything. And then BigQuery rows are immutable, right?
jules_huisman
05/25/2021, 8:55 AM
--full-refresh on purpose? I think that makes it ignore the state from your previous runs.
jose_ribeiro
05/25/2021, 1:54 PM
jules_huisman
05/25/2021, 1:58 PM
adswerve variant of target-bigquery you can set replication_method in the config of the target to truncate in order to overwrite existing data. (The default is append, off the top of my head.)
jose_ribeiro
05/25/2021, 2:12 PM
truncate over the --full-refresh argument? Should I use both?
jose_ribeiro
05/25/2021, 2:15 PM
jules_huisman
05/25/2021, 2:17 PM
--full-refresh relates mainly to the extraction of the data: whether to use the previous state of the job or to ignore the state and pull all the data (handled by Meltano). The replication_method config is specific to target-bigquery and specifies whether each table should be truncated on each load or the data simply appended to the existing table.
jules_huisman
05/25/2021, 2:21 PM
replication_method functionality is not present on the main branch of target-bigquery; I was probably working with another branch.
jose_ribeiro
05/25/2021, 2:22 PM
jose_ribeiro
05/25/2021, 2:24 PM
jules_huisman
05/25/2021, 2:25 PM
jose_ribeiro
05/25/2021, 2:44 PM
meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq
I started getting duplicated records:
SELECT COUNT(*) TT,
ID
FROM `<shopify-order-table>`
GROUP BY ID
HAVING TT > 1
jules_huisman
05/25/2021, 2:50 PM
matt_hardner
05/25/2021, 3:01 PM
adswerve is the only provider who offers the truncate option if I'm not mistaken.
jose_ribeiro
05/25/2021, 3:18 PM
jose_ribeiro
05/25/2021, 3:19 PM
jules_huisman
05/25/2021, 6:07 PM