Hi guys, I'm trying to configure the `INCREMENTAL`...
# plugins-general
j
Hi guys, I'm trying to configure the `INCREMENTAL` pull, but it seems to not be working as I'm expecting. This is my first run:
➜  meltano-test git:(master) ✗ meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq
meltano         | Running extract & load...
meltano         | Found state in extract/tap-shopify.state.json
meltano         | Found catalog in /home/zek/projects/meltano-test/extract/tap-shopify.catalog.json
tap-shopify     | INFO Syncing stream: orders
tap-shopify     | INFO GET https://shop.myshopify.com/admin/api/2021-04/orders.json?since_id=1&updated_at_min=2021-05-25 12:30:00+00:00&updated_at_max=2021-05-25 16:11:57+00:00&limit=175&status=any
target-bigquery | INFO Pushing state: {}
meltano         | Incremental state has been updated at 2021-05-25 16:11:58.921386.
...
target-bigquery | INFO Copy t_meltano_orders_a0856b4bf18c43d0848ec18eac403c49 to meltano_orders
target-bigquery | INFO Pushing state: {'bookmarks': {'currently_sync_stream': 'orders', 'orders': {'since_id': 3789635780666, 'updated_at': '2021-05-25T16:11:57.000000Z'}}}
meltano         | Incremental state has been updated at 2021-05-25 16:12:32.573075.
I was expecting the last state above to be the initial state on the second run below. But it's starting over from `?since_id=1` when I was expecting `?since_id=3789635780666`. Also, we can see the line `Pushing state: {}`, which is a bit strange since Meltano found the state.
➜  meltano-test git:(master) ✗ meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq
meltano         | Running extract & load...
meltano         | Found state in extract/tap-shopify.state.json
meltano         | Found catalog in /home/zek/projects/meltano-test/extract/tap-shopify.catalog.json
tap-shopify     | INFO Syncing stream: orders
tap-shopify     | INFO GET https://shop.myshopify.com/admin/api/2021-04/orders.json?since_id=1&updated_at_min=2021-05-25 12:30:00+00:00&updated_at_max=2021-05-25 16:14:10+00:00&limit=175&status=any
tap-shopify     | INFO --> 200 OK 2736839b
target-bigquery | INFO Pushing state: {}
meltano         | Incremental state has been updated at 2021-05-25 16:14:11.784550.
...
target-bigquery | INFO Pushing state: {'bookmarks': {'currently_sync_stream': 'orders', 'orders': {'since_id': 3789628440634, 'updated_at': '2021-05-25T16:14:10.000000Z'}}}
meltano         | Incremental state has been updated at 2021-05-25 16:14:44.962819.
meltano         | Extract & load complete!
meltano         | Transformation skipped.
In the catalog, I've already set the replication method:
"streams": [{
  "stream": "orders",
  "tap_stream_id": "orders",
  "schema": {
    ...
  },
  "metadata": [
    ...
  ],
  "key_properties": ["id"],
  "replication_key": "updated_at",
  "replication_method": "INCREMENTAL"
}]
Any idea of what I'm doing wrong?
d
Can you run `meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq --dump=state` to verify Meltano has stored the correct state?
The `Pushing state: {}` is a target-bigquery bug: https://github.com/adswerve/target-bigquery/issues/9, but I don't think that should affect you here unless the second job fails
j
hey @douwe_maan! So, it seems to be empty:
➜  meltano-test git:(master) ✗ meltano elt tap-shopify target-bigquery --job_id=shopify_to_bq --dump=state
[2021-05-25 14:08:58,270] [18585|MainThread|meltano.core.plugin.singer.tap] [INFO] Found state in extract/tap-shopify.state.json
[2021-05-25 14:08:58,271] [18585|MainThread|meltano.core.plugin.singer.tap] [INFO] Found catalog in /home/zek/projects/meltano-test/extract/tap-shopify.catalog.json

➜  meltano-test git:(master) ✗
Should I expect the file `extract/tap-shopify.state.json` to be filled with bookmark content after the first run?
d
Ah, I think that's the issue! You've set an explicit `state` file instead of letting Meltano handle the state transparently.
If you provide a path to a state file, Meltano will assume you're managing it yourself.
So you'll want to remove `state: ...` from `meltano.yml`.
You also probably don't need `catalog: ...` if you want the automatic behavior: https://meltano.com/docs/integration.html#extractor-catalog-generation
j
Awesome @douwe_maan! The incremental pull worked! thanks for that! Regarding the catalog, I tried to make it automatic last week, but I was forced to make it explicit because I started getting this error:
CRITICAL 'type' or 'anyOf' are required fields in property: {}
As far as I understood, I got this error because of some attributes like this one: https://github.com/singer-io/tap-shopify/blob/master/tap_shopify/schemas/orders.json#L9 It seems the target requires all nested attributes to be described in order to insert the data into BQ. Is that right?
d
@jose_ribeiro Glad the state stuff works now! As for that schema, you're right, but you can use Meltano's built-in https://meltano.com/docs/plugins.html#schema-extra instead of overriding the entire catalog 🙂
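To see which properties would need a `schema` override, one option is to walk a stream's JSON schema and list every property that declares neither `type` nor `anyOf`, which is what the error message above complains about. The following is only a sketch: the function name `find_untyped_properties` and the example schema (including all field names in it) are hypothetical and not taken from tap-shopify.

```python
def find_untyped_properties(schema, path=""):
    """Return dotted paths of properties that declare neither 'type' nor 'anyOf'."""
    missing = []
    for name, prop in schema.get("properties", {}).items():
        prop_path = f"{path}.{name}" if path else name
        if not isinstance(prop, dict) or ("type" not in prop and "anyOf" not in prop):
            # An empty schema such as {} ends up here.
            missing.append(prop_path)
            continue
        # Recurse into nested object properties.
        missing.extend(find_untyped_properties(prop, prop_path))
        # Recurse into array item schemas, marking the path with [].
        items = prop.get("items")
        if isinstance(items, dict):
            if "type" not in items and "anyOf" not in items:
                missing.append(prop_path + "[]")
            else:
                missing.extend(find_untyped_properties(items, prop_path + "[]"))
    return missing


# Hypothetical schema, for illustration only.
example_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "untyped_field": {},
        "customer": {
            "type": "object",
            "properties": {"email": {"type": "string"}, "extra": {}},
        },
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"sku": {}}},
        },
    },
}

untyped = find_untyped_properties(example_schema)
print(untyped)
```

Each reported path is a candidate for an explicit type in the extractor's `schema` extra, rather than a reason to maintain a full custom catalog.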
j
Hey @douwe_maan, sorry for my delay. I got a bit busy. (I've deleted my previous comment) I think I got the right path! thanks for helping me out!!
c
Hello @douwe_maan, can you explain how the state file is then created? I am saving the state with bookmarks. When running the
docker run -v $(pwd):/project \
-w /project \
-p 5000:5000 \
meltano/meltano elt tap-whise target-postgres --job_id=whisepipeline --dump=state
command, I am left with `No state was found, complete import.`
I have no clue what I must do to save the state.
d
Hi @charley_guillaume! State is typically stored automatically in the system database: https://meltano.com/docs/integration.html#incremental-replication-state. Do you have a specific need to store it separately?
c
Oh! Excellent. So I shouldn't specify a state file in meltano.yml nor create one?
How can I browse the state then? and how can I make sure the state is being used properly?
d
> I shouldn’t specify a state file in meltano.yml nor create one?
Correct!
> How can I browse the state then?
If you want to view the state currently stored in the system DB, you can use the `--dump=state` trick you already found, but that won’t work as expected if you have `state: state.json` in `meltano.yml`, since that’ll be used instead of the system DB.
> and how can I make sure the state is being used properly?
If the tap and target behave correctly, and you’re using the same `--job_id` on `meltano elt` each time, you can generally trust that it’s being used properly 🙂 The `--dump=state` option allows you to double check that behavior.
c
I've tried this method, however, I keep getting the error `No state was found, complete import.` with message `Could not find state file for this pipeline`. Do you have some options I could investigate?
d
Did you already run a `meltano elt` pipeline without `--dump=state` so that state could be collected?
Does tap-whise support state? Did you add the `state` capability under `capabilities` in `meltano.yml`?
c
I did both correctly, yes!
d
All right. To debug this, can you look directly in the system DB in `<project>/.meltano/meltano.db` and see if it has any rows in the `job` table for your `whisepipeline` job ID?
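For anyone following along, the lookup can be done with Python's built-in `sqlite3` module. This is a minimal sketch that assumes the system DB is the default SQLite file and that the `job` table has a `job_id` column; the column name is an assumption about this Meltano version, and the helper name `find_job_rows` is made up for illustration.

```python
import sqlite3


def find_job_rows(db_path, job_id):
    """Return every row of the `job` table for one job ID, as a list of dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets us convert rows to dicts by column name
    try:
        # Assumption: the table is named `job` and has a `job_id` column.
        rows = conn.execute(
            "SELECT * FROM job WHERE job_id = ?", (job_id,)
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()


# Example usage from the project root (path and job ID as used in this thread):
#   for row in find_job_rows(".meltano/meltano.db", "whisepipeline"):
#       print(row)
```

Printing the rows as dicts shows the column names next to the values, which makes it easier to tell which column holds the stored state.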
c
I do have a row:
53	whisepipeline	SUCCESS	2021-06-28 15:31:49.683501	2021-06-28 15:39:00.518569	{}	0	e6b3ab238d4b4dfbb0976486f61f393f	cli	2021-06-28 15:38:59.962122
Here is how I save the state:
self.state = singer.bookmarks.write_bookmark(self.state, self.name+"_"+str(id), self.replication_key, lastupdatetime)
singer.write_state(self.state)
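For reference, here is a stdlib-only sketch of what those two calls are expected to produce: a nested `bookmarks` dictionary, emitted on stdout as a single JSON `STATE` message. The two functions are simplified stand-ins for `singer.bookmarks.write_bookmark` and `singer.write_state`, not the library code itself, and the stream name and timestamp are hypothetical.

```python
import json


def write_bookmark(state, tap_stream_id, key, value):
    """Stand-in for singer.bookmarks.write_bookmark: set state['bookmarks'][stream][key]."""
    state.setdefault("bookmarks", {}).setdefault(tap_stream_id, {})[key] = value
    return state


def format_state_message(state):
    """Stand-in for singer.write_state: build the JSON line a tap prints to stdout."""
    return json.dumps({"type": "STATE", "value": state})


# Hypothetical stream name and replication value, for illustration only.
state = write_bookmark({}, "estates_1", "updated_at", "2021-06-28T15:38:00Z")
message = format_state_message(state)
print(message)
```

Running the tap on its own and checking its stdout for a line of this shape is one way to confirm that state is being emitted before it reaches the target.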