chris_schmid
09/06/2023, 4:34 PM
"dbt_version": {
  "type": "string"
}
The config above results in an error whenever Meltano encounters a null value in that field, as in the run below:
```
2023-09-06T160752.170412Z [info ] 2023-09-06 160752,169 | INFO | singer_sdk.metrics | METRIC: {"type": "timer", "metric": "http_request_duration", "value": 0.229858, "tags": {"stream": "jobs", "endpoint": "/accounts/{account_id}/jobs", "http_status_code": 200, "status": "succeeded", "context": {"account_id": "10"}}} cmd_type=elb consumer=False name=tap-dbt producer=True stdio=stderr string_id=tap-dbt
2023-09-06T160752.209022Z [info ] 2023-09-06 160752,208 | WARNING | tap-dbt | Properties ('execution', 'run_generate_sources', 'raw_dbt_version', 'created_at', 'updated_at', 'deactivated', 'run_failure_count', 'deferring_job_definition_id', 'deferring_environment_id', 'lifecycle_webhooks', 'lifecycle_webhooks_url', 'is_deferrable', 'job_type', 'triggers_on_draft_pr', 'generate_sources', 'cron_humanized', 'next_run', 'next_run_humanized') were present in the 'jobs' stream but not found in catalog schema. Ignoring. cmd_type=elb consumer=False name=tap-dbt producer=True stdio=stderr string_id=tap-dbt
2023-09-06T160752.210651Z [info ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.211157Z [info ] File "/project/.meltano/loaders/target-jsonl/venv/bin/target-jsonl", line 8, in <module> cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.211339Z [info ] sys.exit(main()) cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.211741Z [info ] File "/project/.meltano/loaders/target-jsonl/venv/lib/python3.9/site-packages/target_jsonl.py", line 92, in main cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.212592Z [info ] state = persist_messages( cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.212920Z [info ] File "/project/.meltano/loaders/target-jsonl/venv/lib/python3.9/site-packages/target_jsonl.py", line 54, in persist_messages cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.213027Z [info ] validators[o['stream']].validate((o['record'])) cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.213166Z [info ] File "/project/.meltano/loaders/target-jsonl/venv/lib/python3.9/site-packages/jsonschema/validators.py", line 130, in validate cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.213307Z [info ] raise error cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.213450Z [info ] jsonschema.exceptions.ValidationError: None is not of type 'string' cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.213837Z [info ] cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-09-06T160752.214001Z [info ] Failed validating 'type' in schema['propert…
```
chris_schmid
09/06/2023, 4:36 PM
finished_at as the replication key. However, this field is sometimes NULL for our runs.
Are there any workarounds for this, aside from tweaking our local code to replicate the runs table with full table replication?
edgar_ramirez_mondragon
09/08/2023, 7:39 PM
1. Allowing NULL types globally
We probably need to either go and manually add the "null" type in the schemas, or refactor them to use the SDK typing helpers, which make fields nullable by default.
2. Incremental replication where the replication key is sometimes null:
I'm not sure what the best approach is here, or whether it's really unexpected that finished_at is null, or whether it simply can't be used as a replication key.
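For the first option, adding "null" by hand amounts to a small transformation over each stream schema. A minimal sketch, assuming the schemas are plain JSON Schema dicts; `with_null`, `make_properties_nullable`, and the sample `jobs_schema` are hypothetical names for illustration and not part of tap-dbt or the SDK:

```python
import copy


def with_null(type_decl):
    """Return a JSON Schema type declaration that also accepts null."""
    if isinstance(type_decl, str):
        return type_decl if type_decl == "null" else [type_decl, "null"]
    if isinstance(type_decl, list) and "null" not in type_decl:
        return [*type_decl, "null"]
    return type_decl


def make_properties_nullable(schema, skip=()):
    """Copy an object schema and make every property nullable, except names in `skip`."""
    result = copy.deepcopy(schema)
    for name, prop in result.get("properties", {}).items():
        if name in skip or not isinstance(prop, dict):
            continue
        if "type" in prop:
            prop["type"] = with_null(prop["type"])
        if "properties" in prop:
            # Nested objects get the same treatment.
            prop["properties"] = make_properties_nullable(prop)["properties"]
    return result


# Hypothetical schema fragment; the real stream schema has many more fields.
jobs_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "dbt_version": {"type": "string"},
    },
}

nullable_schema = make_properties_nullable(jobs_schema, skip=("id",))
print(nullable_schema["properties"]["dbt_version"])
```

The `skip` argument is there so that key fields such as id can stay non-nullable; whether that is desirable depends on the stream.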
Regardless, PRs are more than welcome!
chris_schmid
09/08/2023, 8:49 PM
edgar_ramirez_mondragon
09/08/2023, 11:03 PM
> is there a way to inject logic into the metadata file from the meltano.yml file that replaces all instances of "type": string to "type": ["string", "null"]?
There's no way to replace all instances of a type, but you can use the schema setting to override the types.
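As a rough illustration of the schema setting: the override is a mapping of stream name to property name to a partial JSON Schema. The sketch below builds that mapping in Python and prints it as JSON; it assumes dbt_version lives on the jobs stream, and `schema_override` is a hypothetical helper, not a Meltano API:

```python
import json


def schema_override(stream, fields, base_type="string"):
    """Build a stream -> property -> schema mapping that allows null for each field."""
    return {stream: {field: {"type": [base_type, "null"]} for field in fields}}


# Assumption: the failing field belongs to the "jobs" stream.
override = schema_override("jobs", ["dbt_version"])

# The printed JSON can be adapted into the `schema:` key of the tap-dbt
# extractor entry in meltano.yml.
print(json.dumps(override, indent=2))
```

Each nullable field still has to be listed explicitly, which matches the point that there is no global type replacement.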
edgar_ramirez_mondragon
09/08/2023, 11:08 PM
mark_johnston
09/10/2023, 11:16 PM
mark_johnston
09/10/2023, 11:22 PM
updated_at - the API does not allow ordering by that key, so it is not possible to perform the reverse-sort method that I'm describing in the README.md - I will update with a more detailed description on:
https://github.com/MeltanoLabs/tap-dbt/issues/213
mark_johnston
09/10/2023, 11:40 PM
finished_at works, I think all records with finished_at=null would be extracted first. This might lead to a lot of records being extracted repeatedly if you have a lot of 'runs' where that is the case.
The example in the README.md illustrates how the replication works:
https://github.com/MeltanoLabs/tap-dbt#incremental-run-stream
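Repeated extraction of the same runs can be absorbed downstream by keeping one record per id. A minimal sketch, assuming each run carries a unique id and that the most recently extracted copy should win; `dedupe_by_id` and the sample records are illustrative only and not part of tap-dbt:

```python
def dedupe_by_id(records, key="id"):
    """Keep one record per key value; later occurrences replace earlier ones."""
    latest = {}
    for record in records:
        latest[record[key]] = record
    return list(latest.values())


# Made-up records: run 1 is first seen unfinished, then seen again once finished.
runs = [
    {"id": 1, "finished_at": None},
    {"id": 2, "finished_at": "2023-09-01T00:00:00Z"},
    {"id": 1, "finished_at": "2023-09-02T00:00:00Z"},
]

unique_runs = dedupe_by_id(runs)
print(len(unique_runs))
```

A loader that upserts on a primary key achieves the same effect without custom code.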
Trying to work out how the null values coming through are causing issues in the replication so we can work on a fix.
edgar_ramirez_mondragon
09/11/2023, 2:22 PM
null fix.
chris_schmid
09/11/2023, 7:44 PM
chris_schmid
09/11/2023, 7:44 PM
mark_johnston
09/11/2023, 7:59 PM
finished_at=null forever?
If they do stay like that forever, then the way the incremental method is written, the set of runs where finished_at=null will appear first on each run, but you could deduplicate them by id against records you've already 'seen'. We do this with target-snowflake using id as the primary_key.
As you say, updated_at would be the ideal field to use for replication_key. Thanks for reaching out to dbt; keep us updated if they come back with anything.
chris_schmid
09/22/2023, 2:26 PM
mark_johnston
09/25/2023, 9:10 AM
mark_johnston
09/27/2023, 8:36 PM
chris_schmid
10/03/2023, 3:15 PM
edgar_ramirez_mondragon
10/03/2023, 3:37 PM