jan_soubusta
08/05/2024, 5:51 AM
!*.fields__customfield_10002
).
Is it possible to exclude columns nested in such a JSON?
Edgar Ramírez (Arch.dev)
08/05/2024, 11:49 PM
schema, then excluding it with select? Kind of like the example in https://github.com/edgarrmondragon/tap-zendesk/?tab=readme-ov-file#selected-custom-fields.
jan_soubusta
08/06/2024, 11:43 AM
Edgar Ramírez (Arch.dev)
08/06/2024, 2:00 PM
jan_soubusta
08/07/2024, 3:47 PM
plugins:
  extractors:
  - name: tap-jira
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-jira.git
    config:
      domain: gooddata.atlassian.net
      auth:
        flow: password
      start_date: 2024-07-01
      flattening_enabled: True
      flattening_max_depth: 1
      page_size:
        issues: 100
    # Add custom fields to the schema
    schema:
      issues:
        fields:
          type: object
          properties:
            customfield_10002:
              type: [array, ["null"]]
    select:
    - issues.*
    - issues.self
    - issues.id
    - issues.key
    - issues.fields
    - "!issues.fields.customfield_11322"
customfield_11322 is not excluded.
Moreover, tap-jira produces a huge amount of debug output on STDOUT mentioning fields that are not part of the catalog, e.g.:
2024-08-07T15:43:09.861648Z [info ] 2024-08-07 17:43:09,861 | WARNING | tap-jira.issues | Properties ('fields.customfield_10990', 'fields.customfield_10873', 'fields.customfield_10742', ..........
were present in the 'issues' stream but not found in catalog schema. Ignoring.
It overflows GitHub Actions (which parses STDOUT), and it's really annoying.
I tried to define all these fields in schema.
Meltano startup time increased drastically and then it failed with:
2024-08-07T15:45:52.343589Z [info ] Environment 'cicd_dev_local' is active
2024-08-07T15:46:13.796148Z [info ] Performing full refresh, ignoring state left behind by any previous runs.
.......
2024-08-07T15:46:14.953288Z [info ] if next(iter(field_schema.values()))[0]["type"] == "string": cmd_type=elb consumer=False name=tap-jira producer=True stdio=stderr string_id=tap-jira
2024-08-07T15:46:14.953489Z [info ] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^ cmd_type=elb consumer=False name=tap-jira producer=True stdio=stderr string_id=tap-jira
2024-08-07T15:46:14.953665Z [info ] KeyError: 0 cmd_type=elb consumer=False name=tap-jira producer=True stdio=stderr string_id=tap-jira
2024-08-07T15:46:15.113255Z [error ] Extractor failed
2024-08-07T15:46:15.113648Z [error ] Block run completed. block_type=ExtractLoadBlocks err=RunnerError('Extractor failed') exit_codes={<PluginType.EXTRACTORS: 'extractors'>: 1} set_number=0 success=False
Maybe I defined the type of some custom field incorrectly, I don't know. There are hundreds of them, so it's not possible to maintain everything manually.
Any guidance is more than welcome 🙏
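A note on the `KeyError: 0` in the traceback above: in Python, that error from an expression like `x[0]` means `x` is a dict without the key `0`, not a list. The sketch below is only an illustration under assumptions. It reuses the single expression visible in the log; the function name `first_entry_is_string` and both example schemas are hypothetical and are not taken from tap-jira. It suggests that at least one field definition may have a dict as its first value where that code expects a list of type definitions.

```python
def first_entry_is_string(field_schema: dict) -> bool:
    """Evaluate the same expression that appears in the traceback."""
    return next(iter(field_schema.values()))[0]["type"] == "string"


# Hypothetical shape the expression can handle:
# the first value is a list of dicts, so [0] is a positional index.
list_first = {"anyOf": [{"type": "string"}, {"type": "null"}]}

# Hypothetical shape that breaks it:
# the first value is a dict, so [0] becomes a key lookup and fails.
dict_first = {"items": {"type": "string"}, "type": ["array", "null"]}

handled = first_entry_is_string(list_first)

try:
    first_entry_is_string(dict_first)
    error = None
except KeyError as exc:
    error = repr(exc)

print(handled)
print(error)
```

If this guess about the cause is right, checking which of the manually defined custom fields start with a nested object (for example `items` or `properties`) rather than a list would narrow down the offending entry.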