peter_pezon
12/06/2022, 8:19 PMselect
and select_filter
?
• Is there a way to filter records by date range? Some taps have a start_date
config, is that general attribute or only supported with some taps?
Thank youtaylor
12/06/2022, 8:22 PMstart_date
config then yes, but not all taps (apis / files) would support incremental replication in that waypeter_pezon
12/06/2022, 8:27 PMpeter_pezon
12/06/2022, 8:30 PMstart_date
? - it's not listed as a plugin setting. Otherwise, is the approach to use inline stream maps with ___filter___
?taylor
12/06/2022, 8:34 PMstart_date
currently, but I’m looking at the code and there seems to be a replication_key that’s not clear how it’s set to me. @ken_payne is in South Africa so won’t be able to respond until tomorrow. Would you mind opening an issue on the repo for this?taylor
12/06/2022, 8:36 PMpeter_pezon
12/06/2022, 8:53 PMconfig:
plugins:
extractors:
- name: tap-snowflake
select:
- schema1-table1.a
- schema1-table1.b
- schema1-table1.c
- schema1-table1.date
- "!*.*"
metadata:
schema1-table1.*:
replication-method: INCREMENTAL
replication-key: date
date:
is-replication-key: true
taylor
12/06/2022, 8:59 PMmeltano invoke tap-snowflake
and see what the output looks likepeter_pezon
12/06/2022, 9:04 PMpat_nadolny
12/06/2022, 9:06 PMpat_nadolny
12/06/2022, 9:06 PMpat_nadolny
12/06/2022, 9:07 PMmeltano invoke tap-snowflake --state state.json
(I have to test that syntax)peter_pezon
12/06/2022, 9:14 PMpat_nadolny
12/06/2022, 9:18 PM{
"currently_syncing": null,
"bookmarks": {
"USERDEV_PROD-PNADOLNY_MELTANO_HUB-HUB_METRICS_AUDIT": {
"replication_key": "UPDATED_DATE",
"version": 1670361200499,
"replication_key_value": "2022-11-30T07:18:13.943169+00:00"
}
}
}
I just ran it without the --state
flag (my table has 2 rows) then copied the last state message into my state.json and edited from therepat_nadolny
12/06/2022, 9:18 PMmetadata:
'*':
replication-method: INCREMENTAL
replication-key: UPDATED_DATE
select:
- '*HUB_METRICS_AUDIT.UPDATED_DATE'
- '*HUB_METRICS_AUDIT.METRIC_TYPE'
pat_nadolny
12/06/2022, 9:19 PMpeter_pezon
12/06/2022, 9:38 PMmeltano run tap-snowflake target-jsonl
. Plugin loaded >50,000 records. Source table has >500mil records, so I cancelled it. Don't see a state.json file generatedpeter_pezon
12/06/2022, 9:48 PM% meltano state list
2022-12-06T21:46:30.166180Z [info ] Environment 'dev' is active
2022-12-06T21:46:30.233004Z [info ] No state IDs found.
The [docs](https://docs.meltano.com/guide/integration#incremental-replication-state) suggest running with --state-id
but seems that's an invalid flag:
meltano run tap-snowflake target-jsonl --state-id=tap-snowflake-target-jsonl-state-1
2022-12-06T21:45:00.094722Z [info ] Environment 'dev' is active
Usage: meltano run [OPTIONS] [BLOCKS]...
Try 'meltano run --help' for help.
Error: No such option: --state-id (Possible options: --no-state-update, --state-id-suffix)
pat_nadolny
12/06/2022, 10:01 PMmeltano state set dev:tap-snowflake-to-target-jsonl '{"singer_state": { "currently_syncing": null, "bookmarks": { "USERDEV_PROD-PNADOLNY_MELTANO_HUB-HUB_METRICS_AUDIT": { "replication_key": "UPDATED_DATE", "version": 1670361200499, "replication_key_value": "2022-11-30T07:18:13.943169+00:00" } } }}'
?pat_nadolny
12/06/2022, 10:03 PMmeltano elt
. Do your logs have any STATE
messages in them when you ran meltano invoke tap-snowflake
? Like for example mine looks like:cody_hanson
12/06/2022, 10:05 PMmeltano state set
cody_hanson
12/06/2022, 10:05 PMpeter_pezon
12/06/2022, 10:05 PMmeltano run
, meltano elt
, and meltano job
?peter_pezon
12/06/2022, 10:06 PMSTATE
messagespeter_pezon
12/06/2022, 10:08 PM2022-12-06T22:00:38.223765Z [debug ] {"type": "STATE", "value": {"bookmarks": {"sap-sales_invoice_v": {"starting_replication_value": null}}}} cmd_type=extractor name=tap-snowflake (out) run_id=6ab45b32-aaa7-4050-a2c7-9db30502ce0d state_id=2022-12-06T215744--tap-snowflake--target-jsonl stdio=stdout
it's output a few times but the value is the same.pat_nadolny
12/06/2022, 10:08 PMpat_nadolny
12/06/2022, 10:11 PMpeter_pezon
12/06/2022, 10:12 PMpeter_pezon
12/06/2022, 10:14 PMpat_nadolny
12/06/2022, 10:16 PMmeltano remove
then meltano add
will probably be the most sure fire way to get everything (python package, metadata, etc.) updated. It will remove your configs too so you might want to make a copy of those firstpeter_pezon
12/06/2022, 10:45 PMmeltano run tap-snowflake-x-y-z target-jsonl --state-id-suffix xyz
. It's stalled a bit. Looks like it's attempting to basically do select * from table order by date asc - on a table that's 500M rows. it's a bit mad.peter_pezon
12/06/2022, 10:45 PMpeter_pezon
12/06/2022, 10:47 PMpeter_pezon
12/06/2022, 10:59 PMSTATE
message:
{"type": "STATE", "value": {"currently_syncing": "XXX-YYY-ZZZ", "bookmarks": {"XXX-YYY-ZZZ":
{"replication_key": "DATE", "version": 1670366638536, "replication_key_value": "2016-08-12T00:00:00+00:00"}}}} cmd_type=elb consumer=False name=tap-snowflake-xxx-yyy-zzz producer=True stdio=stdout string_id=tap-snowflake-xxx-yyy-zzz
So, I can set the state with meltano state set tap-snowflake-xxx-yyy-zzz "{...}"
after changing the value of replication_key_value
. Can the date be arbitrary, or should it match a date value in the source table? Can the version be an arbitrary number?pat_nadolny
12/06/2022, 11:33 PMpeter_pezon
12/06/2022, 11:33 PM$ meltano state list
2022-12-06T23:33:06.916782Z [info ] Environment 'dev' is active
2022-12-06T23:33:06.985596Z [info ] No state IDs found.
$ meltano state set tap-snowflake-xxx-yyy-zzz-to-target-jsonlnum2 --input-file xxx-yyy-zzz-state.json
2022-12-06T23:33:11.169833Z [info ] Environment 'dev' is active
This will overwrite the state's current value. Continue? [y/N]: y
Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.
singer_state not found in top level of provided state
pat_nadolny
12/06/2022, 11:36 PMmeltano state set dev:tap-snowflake-to-target-jsonl '{"singer_state": <INSERT_YOUR_STATE>}'
- see if that workspat_nadolny
12/06/2022, 11:37 PMdev:tap-snowflake-to-target-jsonl
is built from the plugin names youre running so if you use tap-snowflake-xyz you'll need to include that in the state IDpeter_pezon
12/06/2022, 11:46 PMpat_nadolny
12/06/2022, 11:47 PMpeter_pezon
12/06/2022, 11:50 PMstarting_starting_timestamp
, and singer_sdk suggests that start_date
is used if there is no state. https://github.com/meltano/sdk/blob/main/singer_sdk/streams/core.py#L227
i don't see where start_date
is actually used. is the context populated by the config block?peter_pezon
12/06/2022, 11:50 PMpeter_pezon
12/06/2022, 11:52 PMpeter_pezon
12/06/2022, 11:54 PMValueError: Could not detect type from empty type_dict. Did you forget to define a property in the stream schema?
pat_nadolny
12/06/2022, 11:55 PMpat_nadolny
12/06/2022, 11:59 PMstart_date
be supported if you'd like. I'm not sure if thats already a planned feature or notpeter_pezon
12/07/2022, 12:06 AMpeter_pezon
12/07/2022, 12:07 AM- name: tap-snowflake-xxx-yyy
inherit_from: tap-snowflake
config:
start_date: "2022-12-01T00:00:00+00:00"
select:
- xxx-yyy.*
metadata:
"*":
replication-method: INCREMENTAL
replication-key: z_date
z_date:
is-replication-key: true
peter_pezon
12/07/2022, 12:10 AMFile "/x/y/z/.meltano/extractors/tap-snowflake/venv/lib/python3.10/site-packages/snowflake/connector/errors.py", line 209, in default_errorhandler
sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 252004: 252004: Failed processing pyformat-parameters: 255001: 255001: Binding data in type (column) is not supported.
[SQL: SELECT ...
FROM xxx.yyy
WHERE %(replication_key)s >= %(start_val)s ORDER BY xxx.yyy.z_date]
[parameters: {'replication_key': Column('z_date', DATE(), table=<yyy>), 'start_val': '2022-12-01T00:00:00+00:00'}]
(Background on this error at: <https://sqlalche.me/e/14/f405>)
peter_pezon
12/07/2022, 4:48 PMmeltano state set dev:tap-snowflake-xyz-to-target-jsonl '{"singer_state": { "currently_syncing": null, "bookmarks": { "XYZ": { "replication_key": "Z_DATE", "version": 1670361200499, "replication_key_value": "2022-11-30T07:18:13.943169+00:00" } } }}'
Run ELT:
meltano --log-level=debug elt tap-snowflake-xyz target-jsonl --state-id dev:tap-snowflake-xyz-to-target-jsonl
peter_pezon
12/07/2022, 4:49 PMpat_nadolny
12/07/2022, 4:52 PMrun
instead of elt
eventually. Yeah that workflow wasnt very clean, having a start_date would have solved it in a way simpler way.peter_pezon
12/07/2022, 4:53 PMrun
? instead of --state-id
there is --state-id-suffix
peter_pezon
12/07/2022, 4:55 PMWhen running within an active environment, meltano run activates
incremental job support. State ID's are autogenerated using the
format
`{active_environment.name}:{extractor_name}-to-{loader_name}(:{state-id-
suffix})` for each extract/load pair found:
`meltano --environment=prod run tap-gitlab target-postgres tap-
salesforce target-mysql`
peter_pezon
12/07/2022, 4:57 PMBATCH
. I can try this same thing again. It looks like start_date
is somewhat supported - at least, there is an attempt to fill in replication_key_value
with start_date
, but there's an error on query compilepat_nadolny
12/07/2022, 4:58 PM