# troubleshooting
j
Hey, tap-jira stopped working for me because one custom field in our Jira started to contain an unsupported value (field 10002: expected number, got array). The `fields` column is exposed by Jira as a huge JSON object, which can be flattened. No matter how I consume it, validation of field 10002 fails. I don't want to consume this column, but I can't find out how to exclude it; no pattern works (e.g. `!*.fields__customfield_10002`). Is it possible to exclude columns nested in such a JSON object?
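As far as I understand, one likely reason no pattern matches: `fields__customfield_10002` is a name produced by flattening in the tap's *output*, while Meltano `select` rules are matched against the catalog's nested property paths (e.g. `issues.fields.customfield_10002`), which exist before flattening. The `flatten` helper below is a hypothetical, simplified illustration of how `__`-joined names arise with `flattening_max_depth: 1`; it is not the SDK's actual implementation.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], max_depth: int, sep: str = "__",
            _prefix: str = "", _depth: int = 0) -> dict[str, Any]:
    """Hypothetical, simplified flattener: join nested keys with `sep` up to `max_depth`."""
    out: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{_prefix}{sep}{key}" if _prefix else key
        if isinstance(value, dict) and _depth < max_depth:
            # Descend one level and prefix the child keys with the parent name.
            out.update(flatten(value, max_depth, sep, name, _depth + 1))
        elif isinstance(value, (dict, list)):
            # Lists, and dicts nested deeper than max_depth, become JSON strings.
            out[name] = json.dumps(value)
        else:
            out[name] = value
    return out


# Hypothetical issue record with one array-valued custom field.
issue = {"id": "10001", "fields": {"customfield_10002": [1, 2], "summary": "Example"}}
print(flatten(issue, max_depth=1))
```

In this picture, `fields__customfield_10002` only exists after flattening, so a deselect rule would presumably have to target the nested catalog path, e.g. `"!issues.fields.customfield_10002"`.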
e
Have you tried declaring the field in `schema`, then excluding it with `select`? Kind of like the example in https://github.com/edgarrmondragon/tap-zendesk/?tab=readme-ov-file#selected-custom-fields.
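My reading of why this two-step approach is needed: a `select` rule can only deselect a property that actually exists in the catalog, and declaring the field under `schema` is what puts it there. Below is a simplified sketch, assuming a nested rule like `!issues.fields.customfield_10002` resolves to the Singer breadcrumb `properties/fields/properties/customfield_10002`. The helpers `rule_to_breadcrumb` and `breadcrumb_exists` are hypothetical, ignore glob patterns, and are not Meltano's code. `customfield_12345` is a made-up undeclared field.

```python
from typing import Any


def rule_to_breadcrumb(rule: str) -> tuple[str, tuple[str, ...], bool]:
    """Split a rule like '!issues.fields.customfield_10002' (hypothetical helper)."""
    excluded = rule.startswith("!")
    stream, *path = rule.removeprefix("!").split(".")
    breadcrumb: list[str] = []
    for part in path:
        # Each nested level sits under "properties" in the JSON Schema.
        breadcrumb += ["properties", part]
    return stream, tuple(breadcrumb), excluded


def breadcrumb_exists(stream_schema: dict[str, Any], breadcrumb: tuple[str, ...]) -> bool:
    """Return True if the breadcrumb resolves to a node in the stream schema."""
    node: Any = stream_schema
    for key in breadcrumb:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


# Hypothetical catalog schema after declaring only customfield_10002 via `schema`.
issues_schema = {
    "type": "object",
    "properties": {
        "id": {"type": ["string", "null"]},
        "fields": {
            "type": "object",
            "properties": {"customfield_10002": {"type": ["array", "null"]}},
        },
    },
}

for rule in ["!issues.fields.customfield_10002", "!issues.fields.customfield_12345"]:
    _, crumb, _ = rule_to_breadcrumb(rule)
    # Only a rule whose breadcrumb exists has a catalog entry to deselect.
    print(rule, breadcrumb_exists(issues_schema, crumb))
```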
j
Interesting! I'll try it, thanks! By the way, I'm using the tap-jira based on the Meltano SDK.
e
Yeah, then something like the above should work, AFAICT. But if not, let me know.
j
It does not work. Defining the schema helps in that the tap no longer fails with an error, but no field is excluded, no matter how I specify the exclusion. For example:
```yaml
plugins:
  extractors:
    - name: tap-jira
      variant: meltanolabs
      pip_url: git+https://github.com/MeltanoLabs/tap-jira.git
      config:
        domain: gooddata.atlassian.net
        auth:
          flow: password
        start_date: 2024-07-01
        flattening_enabled: True
        flattening_max_depth: 1
        page_size:
          issues: 100

      # Add custom fields to the schema
      schema:
        issues:
          fields:
            type: object
            properties:
              customfield_10002:
                type: [array, "null"]

      select:
        - issues.*
        - issues.self
        - issues.id
        - issues.key
        - issues.fields
        - "!issues.fields.customfield_11322"
```
`customfield_11322` is not excluded. Moreover, tap-jira produces a huge amount of log output on STDOUT mentioning fields that are not part of the catalog, e.g.:
```
2024-08-07T15:43:09.861648Z [info     ] 2024-08-07 17:43:09,861 | WARNING  | tap-jira.issues      | Properties ('fields.customfield_10990', 'fields.customfield_10873', 'fields.customfield_10742', ..........
were present in the 'issues' stream but not found in catalog schema. Ignoring.
```
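The warning lists the undeclared properties as a Python tuple inside a single message. If you want to see which custom fields are involved without scrolling the whole log, here is a small sketch that collects the names per stream. It assumes the message keeps the shape shown above (a tuple repr followed by `were present in the '<stream>' stream`). `undeclared_properties` is a hypothetical helper, not part of tap-jira or Meltano, and the sample log lines are made up from the names in the warning.

```python
import ast
import re

# Matches "Properties (<tuple repr>) were present in the '<stream>' stream".
WARNING_RE = re.compile(
    r"Properties \((?P<props>[^)]*)\)\s+were present in the '(?P<stream>[^']+)' stream"
)


def undeclared_properties(log_text: str) -> dict[str, set[str]]:
    """Collect property names from 'not found in catalog schema' warnings (hypothetical helper)."""
    found: dict[str, set[str]] = {}
    for match in WARNING_RE.finditer(log_text):
        # Re-wrap the captured text in parentheses and parse it as a Python literal.
        parsed = ast.literal_eval("(" + match.group("props") + ")")
        # A bare string (no trailing comma) parses as str, not a tuple.
        names = (parsed,) if isinstance(parsed, str) else parsed
        found.setdefault(match.group("stream"), set()).update(names)
    return found


sample = (
    "| WARNING  | tap-jira.issues | Properties ('fields.customfield_10990', "
    "'fields.customfield_10873') were present in the 'issues' stream "
    "but not found in catalog schema. Ignoring.\n"
    "| WARNING  | tap-jira.issues | Properties ('fields.customfield_10742',) "
    "were present in the 'issues' stream but not found in catalog schema. Ignoring.\n"
)

for stream, names in undeclared_properties(sample).items():
    print(stream, sorted(names))
```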
It overflows the GitHub Actions log (which parses STDOUT), and it's really annoying. I tried to define all of these fields in `schema`. Meltano's startup time increased drastically, and then it failed with:
```
2024-08-07T15:45:52.343589Z [info     ] Environment 'cicd_dev_local' is active
2024-08-07T15:46:13.796148Z [info     ] Performing full refresh, ignoring state left behind by any previous runs.
.......
2024-08-07T15:46:14.953288Z [info     ]     if next(iter(field_schema.values()))[0]["type"] == "string": cmd_type=elb consumer=False name=tap-jira producer=True stdio=stderr string_id=tap-jira
2024-08-07T15:46:14.953489Z [info     ]        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^ cmd_type=elb consumer=False name=tap-jira producer=True stdio=stderr string_id=tap-jira
2024-08-07T15:46:14.953665Z [info     ] KeyError: 0                    cmd_type=elb consumer=False name=tap-jira producer=True stdio=stderr string_id=tap-jira
2024-08-07T15:46:15.113255Z [error    ] Extractor failed
2024-08-07T15:46:15.113648Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Extractor failed') exit_codes={<PluginType.EXTRACTORS: 'extractors'>: 1} set_number=0 success=False
```
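The traceback points at `next(iter(field_schema.values()))[0]["type"]`. Without checking tap-jira's source, what `field_schema` holds there is an assumption. The expression only works when the first value in it is a list whose first element is a dict with a `type` key. When that first value is itself a dict, `[0]` becomes a dictionary lookup for the key `0`, which raises exactly `KeyError: 0`. A minimal reproduction with two hypothetical shapes:

```python
from typing import Any


def first_subschema_type(field_schema: dict[str, Any]) -> Any:
    """Evaluate the same expression as the failing line in the traceback."""
    return next(iter(field_schema.values()))[0]["type"]


# Hypothetical shape: the first value is a list, so [0] is a list index.
list_shaped = {"anyOf": [{"type": "string"}, {"type": "null"}]}

# Hypothetical shape: the first value is a dict, so [0] looks up the key 0.
dict_shaped = {"properties": {"value": {"type": "string"}}}

print(first_subschema_type(list_shaped))  # string

try:
    first_subschema_type(dict_shaped)
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 0
```

If this is what happens, at least one hand-written field schema has a shape this code path doesn't expect. Finding where tap-jira builds `field_schema` would confirm which shape it needs.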
Maybe I defined the type of some custom field wrongly; I don't know. There are hundreds of them, so maintaining everything manually isn't feasible. Any guidance is more than welcome 🙏