jose_ribeiro
05/25/2021, 4:51 PM
INCREMENTAL pull, but it doesn't seem to be working as I expect.
This is my first run:
➜ meltano-test git:(master) ✗ meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq
meltano | Running extract & load...
meltano | Found state in extract/tap-shopify.state.json
meltano | Found catalog in /home/zek/projects/meltano-test/extract/tap-shopify.catalog.json
tap-shopify | INFO Syncing stream: orders
tap-shopify | INFO GET https://shop.myshopify.com/admin/api/2021-04/orders.json?since_id=1&updated_at_min=2021-05-25 12:30:00+00:00&updated_at_max=2021-05-25 16:11:57+00:00&limit=175&status=any
target-bigquery | INFO Pushing state: {}
meltano | Incremental state has been updated at 2021-05-25 16:11:58.921386.
...
target-bigquery | INFO Copy t_meltano_orders_a0856b4bf18c43d0848ec18eac403c49 to meltano_orders
target-bigquery | INFO Pushing state: {'bookmarks': {'currently_sync_stream': 'orders', 'orders': {'since_id': 3789635780666, 'updated_at': '2021-05-25T16:11:57.000000Z'}}}
meltano | Incremental state has been updated at 2021-05-25 16:12:32.573075.
I was expecting the last state above to be the initial state for the second run below, but it starts over from ?since_id=1 when I expected ?since_id=3789635780666.
Also note the line Pushing state: {}, which is a bit strange since Meltano found the state.
➜ meltano-test git:(master) ✗ meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq
meltano | Running extract & load...
meltano | Found state in extract/tap-shopify.state.json
meltano | Found catalog in /home/zek/projects/meltano-test/extract/tap-shopify.catalog.json
tap-shopify | INFO Syncing stream: orders
tap-shopify | INFO GET https://shop.myshopify.com/admin/api/2021-04/orders.json?since_id=1&updated_at_min=2021-05-25 12:30:00+00:00&updated_at_max=2021-05-25 16:14:10+00:00&limit=175&status=any
tap-shopify | INFO --> 200 OK 2736839b
target-bigquery | INFO Pushing state: {}
meltano | Incremental state has been updated at 2021-05-25 16:14:11.784550.
...
target-bigquery | INFO Pushing state: {'bookmarks': {'currently_sync_stream': 'orders', 'orders': {'since_id': 3789628440634, 'updated_at': '2021-05-25T16:14:10.000000Z'}}}
meltano | Incremental state has been updated at 2021-05-25 16:14:44.962819.
meltano | Extract & load complete!
meltano | Transformation skipped.
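For reference, a tap that resumes from a since_id bookmark would read the value pushed at the end of the first run roughly like this. This is a minimal sketch, not tap-shopify's code: get_bookmark mirrors the shape of singer-python's singer.bookmarks.get_bookmark, and starting_since_id, with its fallback to 1, is a hypothetical helper chosen to match the since_id=1 in both GET requests above.

```python
# State pushed at the end of the first run, copied from the log above.
FIRST_RUN_STATE = {
    "bookmarks": {
        "currently_sync_stream": "orders",
        "orders": {
            "since_id": 3789635780666,
            "updated_at": "2021-05-25T16:11:57.000000Z",
        },
    }
}


def get_bookmark(state, tap_stream_id, key, default=None):
    """Return state["bookmarks"][tap_stream_id][key], or default if any level is missing."""
    return state.get("bookmarks", {}).get(tap_stream_id, {}).get(key, default)


def starting_since_id(state, stream="orders"):
    """Hypothetical helper: resume from the stored since_id, else start at 1."""
    return get_bookmark(state, stream, "since_id", default=1)


print(starting_since_id(FIRST_RUN_STATE))  # 3789635780666: state was passed in
print(starting_since_id({}))               # 1: no state was passed in
```

With logic like this, a second run that still requests since_id=1 suggests the tap was started without the stored bookmark.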
In the catalog, I've already set the replication method:
"streams": [{
"stream": "orders",
"tap_stream_id": "orders",
"schema": {
...
},
"metadata": [
...
],
"key_properties": ["id"],
"replication_key": "updated_at",
"replication_method": "INCREMENTAL"
}]
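To see which replication settings a catalog entry actually carries, a small helper can check both the top-level fields used above and Singer's stream-level metadata (the metadata entry whose breadcrumb is [], with replication-method and replication-key keys). Which of the two a particular tap honors is tap-specific; giving metadata precedence here is an assumption of this sketch, and replication_settings is a hypothetical helper.

```python
def replication_settings(stream_entry):
    """Return (method, key) declared for one Singer catalog stream entry.

    Starts from the top-level replication_method / replication_key fields,
    then lets stream-level metadata (breadcrumb []) override them.
    """
    method = stream_entry.get("replication_method")
    key = stream_entry.get("replication_key")
    for item in stream_entry.get("metadata", []):
        if item.get("breadcrumb") == []:
            md = item.get("metadata", {})
            method = md.get("replication-method", method)
            key = md.get("replication-key", key)
    return method, key


# The orders entry above, with its elided schema and metadata left out.
orders = {
    "stream": "orders",
    "tap_stream_id": "orders",
    "key_properties": ["id"],
    "replication_key": "updated_at",
    "replication_method": "INCREMENTAL",
    "metadata": [],
}
print(replication_settings(orders))  # ('INCREMENTAL', 'updated_at')
```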
Any idea what I'm doing wrong?
douwe_maan
05/25/2021, 5:03 PM
Can you run meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq --dump=state to verify Meltano has stored the correct state?
douwe_maan
05/25/2021, 5:04 PM
Pushing state: {} is a target-bigquery bug: https://github.com/adswerve/target-bigquery/issues/9, but I don't think that should affect you here unless the second job fails.
jose_ribeiro
05/25/2021, 5:10 PM
➜ meltano-test git:(master) ✗ meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq --dump=state
[2021-05-25 14:08:58,270] [18585|MainThread|meltano.core.plugin.singer.tap] [INFO] Found state in extract/tap-shopify.state.json
[2021-05-25 14:08:58,271] [18585|MainThread|meltano.core.plugin.singer.tap] [INFO] Found catalog in /home/zek/projects/meltano-test/extract/tap-shopify.catalog.json
➜ meltano-test git:(master) ✗
jose_ribeiro
05/25/2021, 5:11 PM
Should I expect extract/tap-shopify.state.json to be filled with bookmark content after the first run?
douwe_maan
05/25/2021, 5:11 PM
It looks like you've configured a state file instead of letting Meltano handle the state transparently.
douwe_maan
05/25/2021, 5:11 PM
douwe_maan
05/25/2021, 5:11 PM
Remove state: ... from meltano.yml
douwe_maan
05/25/2021, 5:12 PM
And catalog: ... as well, if you want the automatic behavior: https://meltano.com/docs/integration.html#extractor-catalog-generation
jose_ribeiro
05/25/2021, 5:25 PM
CRITICAL 'type' or 'anyOf' are required fields in property: {}
As far as I understand, I got this error because of attributes like this one: https://github.com/singer-io/tap-shopify/blob/master/tap_shopify/schemas/orders.json#L9
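Assuming the target rejects any property schema that has neither a type nor an anyOf key (which is what the error text suggests), the offending properties can be listed before running the pipeline. This is a minimal sketch that walks properties, items and anyOf alternatives; untyped_properties and the sample schema are hypothetical and not taken from tap-shopify's orders.json.

```python
def untyped_properties(schema, path="$"):
    """Yield paths of nested schemas that declare neither 'type' nor 'anyOf'."""
    children = [(f"{path}.{name}", sub) for name, sub in schema.get("properties", {}).items()]
    if isinstance(schema.get("items"), dict):
        children.append((f"{path}[]", schema["items"]))
    for child_path, sub in children:
        if "type" not in sub and "anyOf" not in sub:
            yield child_path
        yield from untyped_properties(sub, child_path)
    # Also look inside anyOf alternatives such as [{"type": "null"}, {...object...}].
    for option in schema.get("anyOf", []):
        yield from untyped_properties(option, path)


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "untyped_field": {},  # the kind of property the error complains about
        "nested": {
            "type": ["null", "object"],
            "properties": {"inner": {}},
        },
    },
}
print(list(untyped_properties(schema)))  # ['$.untyped_field', '$.nested.inner']
```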
It seems the target requires all nested attributes to be described before it can insert the data into BQ. Is that right?
douwe_maan
05/25/2021, 5:25 PM
jose_ribeiro
05/25/2021, 8:35 PM
charley_guillaume
06/28/2021, 2:15 PM
When I run the
docker run -v $(pwd):/project \
-w /project \
-p 5000:5000 \
meltano/meltano elt tap-whise target-postgres --job_id=whisepipeline --dump=state
command, I am left with No state was found, complete import.
I have no clue what I need to do to save the state.
douwe_maan
06/28/2021, 3:07 PM
charley_guillaume
06/28/2021, 3:30 PM
charley_guillaume
06/28/2021, 3:32 PM
douwe_maan
06/28/2021, 3:35 PM
> I shouldn’t specify a state file in meltano.yml nor create one?
Correct!
> How can I browse the state then?
If you want to view the state currently stored in the system DB, you can use the --dump=state trick you already found, but that won’t work as expected if you have state: state.json in meltano.yml, since that file will be used instead of the system DB.
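Once the state file setting is gone, the output of --dump=state can be checked for bookmarks with a few lines of Python. This sketch assumes the dumped output is the Singer state as a JSON object; summarize_state is a hypothetical helper, and the sample string reuses the bookmark from the Shopify run earlier in this thread.

```python
import json


def summarize_state(dumped):
    """Return {stream: bookmark_dict} from a dumped Singer state JSON string."""
    state = json.loads(dumped) if dumped.strip() else {}
    bookmarks = state.get("bookmarks", {})
    # Skip non-dict entries such as "currently_sync_stream": "orders".
    return {name: value for name, value in bookmarks.items() if isinstance(value, dict)}


dumped = '{"bookmarks": {"currently_sync_stream": "orders", "orders": {"since_id": 3789635780666}}}'
print(summarize_state(dumped))  # {'orders': {'since_id': 3789635780666}}
print(summarize_state("{}"))    # {}: no bookmarks stored
```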
> and how can I make sure the state is being used properly?
If the tap and target behave correctly, and you’re using the same --job_id on meltano elt each time, you can generally trust that it’s being used properly 🙂 The --dump=state option allows you to double-check that behavior.
charley_guillaume
06/28/2021, 3:41 PM
No state was found, complete import. with message Could not find state file for this pipeline.
Do you have any suggestions for what I could investigate?
douwe_maan
06/28/2021, 3:47 PM
Have you run the meltano elt pipeline without --dump=state so that state could be collected?
douwe_maan
06/28/2021, 3:48 PM
Does your tap declare the state capability under capabilities in meltano.yml?
charley_guillaume
06/28/2021, 3:56 PM
douwe_maan
06/28/2021, 3:57 PM
Can you open <project>/.meltano/meltano.db and see if it has any rows in the job table for your whisepipeline job ID?
charley_guillaume
06/28/2021, 4:03 PM
53 whisepipeline SUCCESS 2021-06-28 15:31:49.683501 2021-06-28 15:39:00.518569 {} 0 e6b3ab238d4b4dfbb0976486f61f393f cli 2021-06-28 15:38:59.962122
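A row like the one above can also be pulled out with Python's built-in sqlite3 module. This sketch assumes the job table has columns named id, job_id, state, ended_at and payload; that is Meltano's internal schema, not a documented interface. In the row above the payload column is {}, which suggests no state was captured for that run.

```python
import sqlite3


def recent_jobs(conn, job_id, limit=5):
    """Return (id, state, ended_at, payload) rows for job_id, newest first.

    Column names are an assumption about Meltano's internal job table.
    """
    return conn.execute(
        "SELECT id, state, ended_at, payload FROM job "
        "WHERE job_id = ? ORDER BY id DESC LIMIT ?",
        (job_id, limit),
    ).fetchall()


# Demo against an in-memory copy of the row shown above.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, job_id TEXT, state TEXT, ended_at TEXT, payload TEXT)"
)
demo.execute(
    "INSERT INTO job VALUES (53, 'whisepipeline', 'SUCCESS', '2021-06-28 15:39:00.518569', '{}')"
)
print(recent_jobs(demo, "whisepipeline"))
# [(53, 'SUCCESS', '2021-06-28 15:39:00.518569', '{}')]
```

Against a real project, pass a connection opened on <project>/.meltano/meltano.db instead of the in-memory demo database.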
charley_guillaume
06/28/2021, 4:07 PM
self.state = singer.bookmarks.write_bookmark(self.state, self.name+"_"+str(id), self.replication_key, lastupdatetime)
singer.write_state(self.state)
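For comparison, the bookmark and STATE message that code like the above produces can be reproduced without the singer library. This sketch mirrors the shape of singer-python's write_bookmark and write_state (a {"type": "STATE", "value": ...} JSON line on stdout); the stream name "listings", the id 42 and the timestamp are hypothetical stand-ins for self.name, id and lastupdatetime.

```python
import json
import sys


def write_bookmark(state, tap_stream_id, key, val):
    """Set state["bookmarks"][tap_stream_id][key] = val and return the state."""
    state.setdefault("bookmarks", {}).setdefault(tap_stream_id, {})[key] = val
    return state


def state_message(state):
    """Serialize a Singer STATE message as a single JSON line."""
    return json.dumps({"type": "STATE", "value": state})


state = {}
# Hypothetical stand-ins for self.name, id and lastupdatetime.
state = write_bookmark(state, "listings" + "_" + str(42), "updated_at", "2021-06-28T15:38:59Z")
sys.stdout.write(state_message(state) + "\n")
# {"type": "STATE", "value": {"bookmarks": {"listings_42": {"updated_at": "2021-06-28T15:38:59Z"}}}}
```

Because the key includes the id, each id gets its own bookmark entry (listings_42 here), so the code that reads state on the next run has to look up the same composite key. Meltano records the state that the target emits after processing these messages, so a target that never re-emits it is one possible explanation for an empty payload like the one in the job row above.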