# troubleshooting
a
Hello, I'm using `tap-jira` to retrieve data and `target-s3-csv` to load the data into S3, and I'm having the following issue. Imagine that I have `column_a, column_b, column_c` coming from a `tap_stream_id`:
```
INPUT:
row_1:
1, 2, 3

row_2: (coming with empty value on column_b)
1,3

OUTPUT:
column_a, column_b, column_c
1, 2, 3
1,3,
```
This leads to column mismatches and data-integrity issues. Is there a way to configure Meltano to set the missing `column_b` value to `''` (empty string) or to null? Or how should we handle these cases? Thanks in advance.
What comes out of the tap:
```json
{"type": "RECORD", "stream": "fields", "record": {"id": "statuscategorychangedate", "key": "statuscategorychangedate", "name": "Status Category Changed", "custom": false, "searchable": true, "schema": {"type": "datetime", "system": "statuscategorychangedate"}}, "time_extracted": "2024-04-02T17:58:28.410891+00:00"}
{"type": "RECORD", "stream": "fields", "record": {"id": "parent", "key": "parent", "name": "Parent", "custom": false, "searchable": false}, "time_extracted": "2024-04-02T17:58:28.411065+00:00"}
```
What is written to the .csv:
```
custom,id,key,name,schema__system,schema__type,searchable
False,statuscategorychangedate,statuscategorychangedate,Status Category Changed,statuscategorychangedate,datetime,True
False,parent,parent,Parent,False
```
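This is incorrect: on the second row, the columns that had no values were skipped, so the remaining values shifted left and the `searchable` value (`False`) ended up under `schema__system`.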
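This is a minimal sketch of why the rows shift. It assumes the second record simply lacks the `schema__*` keys after flattening, as the tap output above suggests. It has not been checked against how `target-s3-csv` actually builds its rows. It only contrasts two approaches: writing values positionally, and writing them by column name with Python's standard `csv.DictWriter`, whose `restval` argument fills absent columns (here with `''`). The helper names `naive_rows` and `aligned_csv` are hypothetical.

```python
import csv
import io

# Records shaped like the flattened tap output above: the second one has no
# "schema" object, so it lacks schema__system and schema__type entirely.
records = [
    {"custom": False, "id": "statuscategorychangedate", "key": "statuscategorychangedate",
     "name": "Status Category Changed", "schema__system": "statuscategorychangedate",
     "schema__type": "datetime", "searchable": True},
    {"custom": False, "id": "parent", "key": "parent", "name": "Parent", "searchable": False},
]

header = ["custom", "id", "key", "name", "schema__system", "schema__type", "searchable"]


def naive_rows(records):
    """Hypothetical: join only the values present, so later values shift left."""
    return [",".join(str(v) for v in r.values()) for r in records]


def aligned_csv(records, header):
    """Hypothetical: write by column name; keys missing from a record get restval ('')."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

With the positional approach the second row comes out as `False,parent,parent,Parent,False`, which matches the broken .csv. With `DictWriter` it becomes `False,parent,parent,Parent,,,False`, keeping all seven columns aligned.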
Anyone? 🙂
r
Can you try with `target-jsonl`?
a
So, I've switched from `target-s3-csv` to `target-s3` and configured it to write .parquet files. Now I'm having other issues; I'll come back here later. But thanks a lot for the help, @Reuben (Matatika) 😄
I'm running:
```shell
meltano --log-level=debug el tap-jira target-s3
```
```yaml
extractors:
- name: tap-jira
  config:
    auth:
      flow: password
      username: ____
    domain: ____
  select:
  # - issues.*
  - fields.*

loaders:
- name: target-s3-csv
  config:
    s3_bucket: __-__
- name: target-s3
  config:
    format:
      format_type: parquet
    prefix: raw
    flattening_enabled: true
    flattening_max_depth: 1
```
When I run with `- fields.*`, it works with no issues. When I run with `- issues.*`, I get the following error:
```
2024-04-03T15:53:55.348524Z [info     ] Failed validating 'type' in schema['properties']['fields__customfield_10033']: cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
2024-04-03T15:53:55.348596Z [info     ]     {'type': ['string', 'null']} cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
2024-04-03T15:53:55.348660Z [info     ]                                cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
2024-04-03T15:53:55.348723Z [info     ] On instance['fields__customfield_10033']: cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
2024-04-03T15:53:55.348784Z [info     ]     Decimal('3.0')             cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
```
It looks like the problem is the `fields__customfield_10033` column:
```
2024-04-03T16:07:51.336465Z [debug    ] Created configuration at /Users/___/.meltano/run/elt/2024-04-03T160751--tap-jira--target-s3/e409a3d3-a5b9-46c6-9c02-9e2bb8fb9e9b/tap.53c45948-a14c-4d40-afb6-92942b253cf1.config.json
```
with
```json
"customfield_10033": {
  "type": [
    "string",
    "null"
  ]
},
```
```
2024-04-03T16:07:51.369129Z [debug    ] Invoking: ['/Users/afonso/Projects/data-integrations/.meltano/extractors/tap-jira/venv/bin/tap-jira', '--config', '/Users/afonso/Projects/data-integrations/.meltano/run/elt/2024-04-03T160751--tap-jira--target-s3/e409a3d3-a5b9-46c6-9c02-9e2bb8fb9e9b/tap.53c45948-a14c-4d40-afb6-92942b253cf1.config.json', '--discover']
```
This also shows up in the logs. Why does it assume the field is a string? How does Meltano handle this? What options do I have here? It's strange that only some rows get errors.
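This is a minimal sketch of why only some rows fail. It assumes the target checks each value against the `["string", "null"]` type declared for `customfield_10033` in the discovered schema. The log shows the offending value arrived as `Decimal('3.0')`, i.e. a number, so rows where the custom field is null or text would presumably pass. `matches_type` is a hypothetical, simplified stand-in for a JSON Schema validator's `type` check, not the target's actual code.

```python
from decimal import Decimal

# Simplified mapping from JSON Schema type names to Python types (assumption:
# numbers may arrive as Decimal, as in the log above).
JSON_TYPES = {
    "string": (str,),
    "null": (type(None),),
    "number": (int, float, Decimal),
    "integer": (int,),
    "boolean": (bool,),
}


def matches_type(value, schema):
    """Hypothetical: True if value fits one of the schema's declared types."""
    allowed = schema["type"]
    if isinstance(allowed, str):
        allowed = [allowed]
    for name in allowed:
        if name in ("number", "integer") and isinstance(value, bool):
            continue  # JSON Schema does not treat true/false as numbers
        if isinstance(value, JSON_TYPES[name]):
            return True
    return False


# Type declared for the column, as in the discovered schema above.
schema = {"type": ["string", "null"]}
rows = [
    {"fields__customfield_10033": None},
    {"fields__customfield_10033": "some text"},
    {"fields__customfield_10033": Decimal("3.0")},  # the value from the log
]
failing = [r for r in rows if not matches_type(r["fields__customfield_10033"], schema)]
```

Only the row carrying a number fails the check. That is consistent with the error appearing for some issues but not others.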
r
You might be able to use stream maps to coerce it to a string (you will possibly have to flatten on extract to avoid this issue):
```yaml
config:
  stream_maps:
    issues:  # fields__customfield_10033 comes from the issues stream, not the fields stream
      fields__customfield_10033: str(fields__customfield_10033)
```
a
OK, I'm already flattening the columns:
```yaml
flattening_enabled: true
flattening_max_depth: 1
```
But thanks a lot, Reuben. I'll take a look at that documentation and get back to you 🙂
But maybe I'll have to add more `max_depth` levels.
r
Yes, but you will need to flatten before the mappings are applied.
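This is a minimal sketch of why the order matters. It makes two assumptions: flattening joins nested keys with `__` (as the `fields__customfield_10033` name in the error suggests), and a `str(...)` stream map only sees top-level properties. `flatten` and `apply_str_map` are hypothetical helpers, not the Meltano SDK's implementation. `apply_str_map` also leaves nulls as null, which is an assumption about the desired behavior.

```python
import json
from decimal import Decimal


def flatten(record, max_depth, parent_key="", depth=0):
    """Hypothetical: join nested keys with '__'; objects nested deeper than
    max_depth are kept as JSON strings."""
    out = {}
    for key, value in record.items():
        new_key = f"{parent_key}__{key}" if parent_key else key
        if isinstance(value, dict) and depth < max_depth:
            out.update(flatten(value, max_depth, new_key, depth + 1))
        elif isinstance(value, dict):
            out[new_key] = json.dumps(value, default=str)
        else:
            out[new_key] = value
    return out


def apply_str_map(record, prop):
    """Hypothetical: mimic a `prop: str(prop)` map on a top-level property."""
    mapped = dict(record)
    if mapped.get(prop) is not None:
        mapped[prop] = str(mapped[prop])
    return mapped


# A made-up issue record with the numeric custom field nested under "fields".
issue = {"id": "10001", "fields": {"customfield_10033": Decimal("3.0"), "summary": "x"}}

# Mapping first: the flattened name doesn't exist yet, so nothing is coerced.
unflattened = apply_str_map(issue, "fields__customfield_10033")

# Flattening first (as it would happen in the tap), then mapping: the value becomes "3.0".
flat = flatten(issue, max_depth=1)
mapped = apply_str_map(flat, "fields__customfield_10033")
```

When the flattening only happens in the target, the tap-side map never sees a `fields__customfield_10033` property. That is why flattening has to happen on extract.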
a
Ah yes, the flattening I have is only on the target step, got it.
Thanks!
r
Good luck! 😅
a
So, in the end I didn't follow that approach, since that field isn't strictly needed; I just didn't select it 🙂 I'm now having a new issue: the tap has data, but I'm not getting any records in the bucket. This is the log from the `elt` command:
```json
{
  "type": "SCHEMA",
  "stream": "issue_worklogs",
  "schema": {
    "properties": {
      "id": {
        "type": [
          "string",
          "null"
        ]
      },
      "self": {
        "type": [
          "string",
          "null"
        ]
      },
      "author": {
        "properties": {
          "accountId": {
            "type": [
              "string",
              "null"
            ]
          },
          "self": {
            "type": [
              "string",
              "null"
            ]
          },
          "displayName": {
            "type": [
              "string",
              "null"
            ]
          },
          "active": {
            "type": [
              "boolean",
              "null"
            ]
          }
        },
        "type": [
          "object",
          "null"
        ]
      },
      "updateAuthor": {
        "properties": {
          "accountId": {
            "type": [
              "string",
              "null"
            ]
          },
          "self": {
            "type": [
              "string",
              "null"
            ]
          },
          "displayName": {
            "type": [
              "string",
              "null"
            ]
          },
          "active": {
            "type": [
              "boolean",
              "null"
            ]
          }
        },
        "type": [
          "object",
          "null"
        ]
      },
      "updated": {
        "format": "date-time",
        "type": [
          "string",
          "null"
        ]
      },
      "started": {
        "format": "date-time",
        "type": [
          "string",
          "null"
        ]
      },
      "timeSpentSeconds": {
        "type": [
          "integer",
          "null"
        ]
      },
      "issueId": {
        "type": [
          "string",
          "null"
        ]
      }
    },
    "type": "object"
  },
  "key_properties": [
    "id"
  ]
}
```
For this stream I'm only getting `SCHEMA` and `STATE` messages, whereas for the other, working streams I'm also getting `RECORD` messages. Do you have any idea why this is happening? It looks like the tap has data, but the S3 bucket isn't being populated. State example: in `catalog.json` I have
```json
"tap_stream_id": "issue_worklogs",
"replication_method": "FULL_TABLE",
"key_properties": [
  "id"
],
```
But then in the logs I'm getting:
```json
{"type": "STATE", "value": {"bookmarks": {"issues": {"starting_replication_value": null}
```
How does Meltano interpret `"starting_replication_value": null`, and why is it set this way, given that the stream is set to `FULL_TABLE`?
r
If you are seeing `RECORD` messages but no data, then it's probably a configuration issue with the loader.
a
Sorry, I'm not seeing `RECORD`. I'm only seeing `STATE` or `SCHEMA`.
r
I would try with `target-jsonl` to confirm. You can also just run `meltano invoke tap-jira` and inspect the output for `RECORD` messages, rather than combing through the output of `meltano elt` (which probably has a lot of logs you don't care about in this case).
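A small script can check for `RECORD` messages without scrolling through logs by tallying the Singer messages per type and stream. This sketch assumes the messages arrive one JSON object per line, as in the tap output earlier in the thread. `count_messages` and `main` are hypothetical names, and non-JSON lines are skipped in case log output gets mixed in.

```python
import json
import sys
from collections import Counter


def count_messages(lines):
    """Hypothetical: count Singer messages by (type, stream); skip non-JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue
        counts[(msg.get("type"), msg.get("stream"))] += 1  # STATE has no stream -> None
    return counts


def main(stream=sys.stdin):
    """Print one line per (type, stream) pair with its message count."""
    for (msg_type, name), n in sorted(count_messages(stream).items(), key=str):
        print(f"{msg_type!s:<8} {name or '-':<24} {n}")
```

Saved as, say, `count_messages.py`, it could be run as `meltano invoke tap-jira | python -c "import count_messages; count_messages.main()"`. If a stream shows `SCHEMA` but no `RECORD` line, the tap itself is emitting no data for it, before any loader is involved.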
a
Ok, I'll try it now. Thanks a lot again @Reuben (Matatika)
r
I think `"starting_replication_value": null` is just a way of indicating no state, which is implied by `FULL_TABLE`.
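This sketch illustrates the reading described above. It assumes the state layout seen in the log (`value` → `bookmarks` → stream → `starting_replication_value`). `starting_value` and `sync_mode` are hypothetical helpers that show two ideas: a null starting value means "no state", and `FULL_TABLE` streams re-read everything regardless. They are not how tap-jira or the Meltano SDK implement it.

```python
from typing import Any


def starting_value(state: dict, stream: str) -> Any:
    """Hypothetical: the bookmark a stream would resume from, or None for 'no state'."""
    bookmarks = state.get("value", {}).get("bookmarks", {})
    return bookmarks.get(stream, {}).get("starting_replication_value")


def sync_mode(state: dict, stream: str, replication_method: str) -> str:
    """Hypothetical: decide between a full re-read and an incremental sync."""
    if replication_method == "FULL_TABLE":
        return "full"  # full-table streams ignore any bookmark
    if starting_value(state, stream) is None:
        return "full"  # no state yet: start from the beginning
    return "incremental"
```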
a
`meltano invoke tap-jira`: I've tried this, but no `RECORD` messages are written to the console. I'll now try `target-jsonl`.
@Reuben (Matatika) `meltano run tap-jira target-jsonl` also does not output any `RECORD` messages. What else can I check here?
r
So something is probably wrong with your config. Can you share it?
a
Hello @Reuben (Matatika), I then realized that there was no data present there. The stakeholder was confused about the column/object name 😄