Hello I'm using `tap-jira` to retrieve data and ... (Meltano #troubleshooting)

Afonso Diniz

04/02/2024, 5:27 PM

Hello, I'm using `tap-jira` to retrieve data and `target-s3-csv` to load the data into S3. I'm having the following issue: imagine that I have `column_a, column_b, column_c` coming from a `tap_stream_id`:

INPUT:
row_1: 
1, 2, 3

row_2: (coming with empty value on column_b)
1,3

OUTPUT:
column_a, column_b, column_c
1, 2, 3 
1,3,

This leads to a column mismatch and integrity issues. Is there a way to configure Meltano to set the missing column_b value to '' (empty string) or null? Or how should we handle these cases? Thanks in advance.

Afonso Diniz

04/02/2024, 6:03 PM

What comes out of the tap:

{"type": "RECORD", "stream": "fields", "record": {"id": "statuscategorychangedate", "key": "statuscategorychangedate", "name": "Status Category Changed", "custom": false, "searchable": true, "schema": {"type": "datetime", "system": "statuscategorychangedate"}}, "time_extracted": "2024-04-02T17:58:28.410891+00:00"}

{"type": "RECORD", "stream": "fields", "record": {"id": "parent", "key": "parent", "name": "Parent", "custom": false, "searchable": false}, "time_extracted": "2024-04-02T17:58:28.411065+00:00"}

What is written to the .csv:

custom,id,key,name,schema__system,schema__type,searchable

False,statuscategorychangedate,statuscategorychangedate,Status Category Changed,statuscategorychangedate,datetime,True

False,parent,parent,Parent,False

This is incorrect: on the second row, the columns that had no values were skipped.
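For comparison, here is a minimal Python sketch of the output the question is asking for, where every row is aligned to a fixed header and missing columns become empty strings. It is not `target-s3-csv`'s code, and the thread does not establish whether the target has an option for this; `rows_to_csv` and `FIELDNAMES` are hypothetical names, and the rows are the two flattened records shown above.

```python
import csv
import io

# Header taken from the .csv output shown in the thread.
FIELDNAMES = [
    "custom", "id", "key", "name",
    "schema__system", "schema__type", "searchable",
]

def rows_to_csv(rows, fieldnames=FIELDNAMES):
    """Write dict rows as CSV text, filling any missing column with ''."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = [
    {
        "custom": False,
        "id": "statuscategorychangedate",
        "key": "statuscategorychangedate",
        "name": "Status Category Changed",
        "schema__system": "statuscategorychangedate",
        "schema__type": "datetime",
        "searchable": True,
    },
    # The second record has no schema__* keys at all.
    {"custom": False, "id": "parent", "key": "parent", "name": "Parent", "searchable": False},
]

print(rows_to_csv(rows))
```

With the header fixed up front, the second row keeps its two empty `schema__*` cells instead of shifting `searchable` to the left.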

Afonso Diniz

04/03/2024, 8:28 AM

Anyone? 🙂

Reuben (Matatika)

04/03/2024, 8:36 AM

Can you try with `target-jsonl`?

Afonso Diniz

04/03/2024, 2:54 PM

So, I've switched from `target-s3-csv` to `target-s3` and configured it to write .parquet files. Now I'm having other issues; I'll come back here later. But thanks a lot for the help @Reuben (Matatika) 😄

👀 1

Afonso Diniz

04/03/2024, 4:04 PM

I'm running:

meltano --log-level=debug el tap-jira target-s3

- name: tap-jira
  config:
    auth:
      flow: password
      username: ____
    domain: ____
  select:
  # - issues.*
  - fields.*

loaders:
- name: target-s3-csv
  config:
    s3_bucket: __-__
- name: target-s3
  config:
    format:
      format_type: parquet
    prefix: raw
    flattening_enabled: true
    flattening_max_depth: 1

When I run with `- fields.*` it works with no issues. When I run with `- issues.*` it gives me the following error:

2024-04-03T15:53:55.348524Z [info     ] Failed validating 'type' in schema['properties']['fields__customfield_10033']: cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr

2024-04-03T15:53:55.348596Z [info     ]     {'type': ['string', 'null']} cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr

2024-04-03T15:53:55.348660Z [info     ]                                cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr

2024-04-03T15:53:55.348723Z [info     ] On instance['fields__customfield_10033']: cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr

2024-04-03T15:53:55.348784Z [info     ]     Decimal('3.0')             cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr

It looks like the problem is in the `fields__customfield_10033` column.

2024-04-03T16:07:51.336465Z [debug    ] Created configuration at /Users/___/.meltano/run/elt/2024-04-03T160751--tap-jira--target-s3/e409a3d3-a5b9-46c6-9c02-9e2bb8fb9e9b/tap.53c45948-a14c-4d40-afb6-92942b253cf1.config.json

with

"customfield_10033": {
  "type": [
    "string",
    "null"
  ]
},

2024-04-03T16:07:51.369129Z [debug    ] Invoking: ['/Users/afonso/Projects/data-integrations/.meltano/extractors/tap-jira/venv/bin/tap-jira', '--config', '/Users/afonso/Projects/data-integrations/.meltano/run/elt/2024-04-03T160751--tap-jira--target-s3/e409a3d3-a5b9-46c6-9c02-9e2bb8fb9e9b/tap.53c45948-a14c-4d40-afb6-92942b253cf1.config.json', '--discover']

This is also present in the logs. Why does it assume that it is a string? How does Meltano handle this? What options do I have here? It is strange that only some rows get errors.
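A possible reading of the log above: the discovered schema declares `customfield_10033` as `["string", "null"]`, so presumably only the rows where Jira returns a number for that field fail validation, while rows holding a string or null pass. The following is a minimal stdlib-only sketch of that type check. `matches_type` is a hypothetical helper, far simpler than a real JSON Schema validator, and is not the target's code.

```python
from decimal import Decimal

# Simplified mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "string": (str,),
    "null": (type(None),),
    "number": (int, float, Decimal),
    "boolean": (bool,),
}

def matches_type(value, allowed):
    """Return True if `value` matches any JSON Schema type name in `allowed`."""
    for name in allowed:
        if name == "number" and isinstance(value, bool):
            continue  # JSON Schema does not treat booleans as numbers
        if isinstance(value, JSON_TYPES[name]):
            return True
    return False

# Declared type from the discovered schema shown in the logs.
declared = ["string", "null"]

# A string and a null pass the check; the Decimal from the error log does not.
for value in ("3", None, Decimal("3.0")):
    print(repr(value), matches_type(value, declared))
```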

Reuben (Matatika)

04/03/2024, 5:16 PM

You might be able to use stream maps to coerce it to a string (you will possibly have to flatten on extract to avoid this issue):

config:
  stream_maps:
    fields:
      fields__customfield_10033: str(fields__customfield_10033)
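As a rough illustration of the effect of such a mapping on a single record, here is a plain-Python sketch. It is not how the SDK evaluates stream maps, and `coerce_field` is a hypothetical helper. One assumption to note: Python's `str(None)` produces the text `'None'`, so this sketch leaves nulls untouched rather than converting them.

```python
from decimal import Decimal

def coerce_field(record, field):
    """Return a copy of `record` with `field` converted to str; None is kept as None."""
    coerced = dict(record)
    if coerced.get(field) is not None:
        coerced[field] = str(coerced[field])
    return coerced

# The failing value from the error log, in an otherwise hypothetical record.
record = {"fields__customfield_10033": Decimal("3.0")}
print(coerce_field(record, "fields__customfield_10033"))
```

After coercion the value is a string, which matches the `["string", "null"]` type declared in the schema.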

Afonso Diniz

04/03/2024, 5:21 PM

OK, I'm already flattening the columns (`flattening_enabled: true`, `flattening_max_depth: 1`), but thanks a lot Reuben. I'll take a look at that documentation and I'll get back to you 🙂

Afonso Diniz

04/03/2024, 5:23 PM

but maybe I'll have to add more max_depth levels

Reuben (Matatika)

04/03/2024, 5:23 PM

Yes, but you will need to flatten before the mappings are applied.
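To see why the order matters, here is a small Python sketch of depth-limited flattening. It is an illustration only and not the SDK's implementation; the `flatten` helper and the sample `issue` record are hypothetical. The key a mapping would refer to, `fields__customfield_10033`, only exists once the record has been flattened.

```python
def flatten(record, max_depth, sep="__", _prefix="", _level=0):
    """Flatten nested dicts up to `max_depth` levels; deeper values are left as-is."""
    flat = {}
    for key, value in record.items():
        name = f"{_prefix}{sep}{key}" if _prefix else key
        if isinstance(value, dict) and _level < max_depth:
            flat.update(flatten(value, max_depth, sep, name, _level + 1))
        else:
            flat[name] = value
    return flat

# Hypothetical record shaped like a nested Jira issue.
issue = {
    "id": "10001",
    "fields": {"customfield_10033": 3.0, "status": {"name": "Done"}},
}

flat = flatten(issue, max_depth=1)

# Before flattening the mapped key is absent; afterwards it is present.
print("fields__customfield_10033" in issue)
print("fields__customfield_10033" in flat)
print(flat)
```

With `max_depth=1`, `fields.status` stays a nested object under `fields__status`, which is why deeper structures may need a larger depth.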

Afonso Diniz

04/03/2024, 5:24 PM

Ah yes, the flattening I have is only on the target step, got it.

👍 1

Afonso Diniz

04/03/2024, 5:24 PM

Thanks!

Reuben (Matatika)

04/03/2024, 5:25 PM

Good luck! 😅

Afonso Diniz

04/04/2024, 8:42 AM

So, I did not follow that approach, since that field is not 100% needed; I just did not select it 🙂 I'm having a new issue now: the tap has data, but I'm not getting the records in the buckets. This is the log from the elt command:

{
  "type": "SCHEMA",
  "stream": "issue_worklogs",
  "schema": {
    "properties": {
      "id": {
        "type": [
          "string",
          "null"
        ]
      },
      "self": {
        "type": [
          "string",
          "null"
        ]
      },
      "author": {
        "properties": {
          "accountId": {
            "type": [
              "string",
              "null"
            ]
          },
          "self": {
            "type": [
              "string",
              "null"
            ]
          },
          "displayName": {
            "type": [
              "string",
              "null"
            ]
          },
          "active": {
            "type": [
              "boolean",
              "null"
            ]
          }
        },
        "type": [
          "object",
          "null"
        ]
      },
      "updateAuthor": {
        "properties": {
          "accountId": {
            "type": [
              "string",
              "null"
            ]
          },
          "self": {
            "type": [
              "string",
              "null"
            ]
          },
          "displayName": {
            "type": [
              "string",
              "null"
            ]
          },
          "active": {
            "type": [
              "boolean",
              "null"
            ]
          }
        },
        "type": [
          "object",
          "null"
        ]
      },
      "updated": {
        "format": "date-time",
        "type": [
          "string",
          "null"
        ]
      },
      "started": {
        "format": "date-time",
        "type": [
          "string",
          "null"
        ]
      },
      "timeSpentSeconds": {
        "type": [
          "integer",
          "null"
        ]
      },
      "issueId": {
        "type": [
          "string",
          "null"
        ]
      }
    },
    "type": "object"
  },
  "key_properties": [
    "id"
  ]
}

Here I'm only getting SCHEMA and STATE message types, whereas for other 'working' streams I'm also getting RECORD. Do you have any idea why this is happening? It looks like the tap has data, but it is not populating the S3 bucket. State example

Afonso Diniz

04/04/2024, 9:00 AM

catalog.json:

"tap_stream_id": "issue_worklogs",
"replication_method": "FULL_TABLE",
"key_properties": [
  "id"
],

but then in the logs I'm getting:

{"type": "STATE", "value": {"bookmarks": {"issues": {"starting_replication_value": null}

How does Meltano interpret `"starting_replication_value": null`? And why is it set this way, since the stream is set as FULL_TABLE?

Reuben (Matatika)

04/04/2024, 12:26 PM

If you are seeing `RECORD` messages but no data, then it's probably a configuration issue with the loader.

Afonso Diniz

04/04/2024, 12:26 PM

Sorry, I'm not seeing `RECORD`. I'm only seeing `STATE` and `SCHEMA`.

Reuben (Matatika)

04/04/2024, 12:29 PM

I would try with `target-jsonl` to confirm - you can also just `meltano invoke tap-jira` and inspect the output for `RECORD` messages, rather than combing through the output of `meltano elt` (which probably has a lot of logs you don't care about in this case).
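One way to do that inspection without reading the output by eye is to count message types per stream. The sketch below assumes the tap writes one JSON message per line; `count_messages` is a hypothetical helper and the sample lines are abbreviated stand-ins, not real tap output.

```python
import json
from collections import Counter

def count_messages(lines):
    """Count Singer messages per (stream, type); lines that are not JSON objects are skipped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. log output mixed into the stream
        if not isinstance(message, dict):
            continue
        counts[(message.get("stream"), message.get("type"))] += 1
    return counts

# Abbreviated sample: a SCHEMA and a STATE message, but no RECORD.
sample = [
    '{"type": "SCHEMA", "stream": "issue_worklogs", "schema": {"type": "object"}, "key_properties": ["id"]}',
    '{"type": "STATE", "value": {"bookmarks": {}}}',
    "not a json line",
]

for (stream, message_type), n in sorted(count_messages(sample).items(), key=str):
    print(stream, message_type, n)
```

To run it against the real tap, one option is to pipe the output of `meltano invoke tap-jira` into a script that calls `count_messages(sys.stdin)`. A stream that shows `SCHEMA` but never `RECORD` produced no rows.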

Afonso Diniz

04/04/2024, 12:38 PM

Ok, I'll try it now. Thanks a lot again @Reuben (Matatika)

Reuben (Matatika)

04/04/2024, 12:50 PM

I think `"starting_replication_value": null` is just a way of indicating no state, which is implied by `FULL_TABLE`.

Afonso Diniz

04/04/2024, 4:32 PM

`meltano invoke tap-jira`: I've tried this, but no `RECORD` type was written to the console. I'll now try `target-jsonl`.

Afonso Diniz

04/04/2024, 4:32 PM

@Reuben (Matatika)

Afonso Diniz

04/04/2024, 4:36 PM

`meltano run tap-jira target-jsonl` also does not output `RECORD`. What else can I check here?

Reuben (Matatika)

04/04/2024, 5:29 PM

So probably something is wrong with your config, can you share it?

Afonso Diniz

04/05/2024, 4:33 PM

Hello @Reuben (Matatika), I then realized that there was no data present there. The stakeholder was confused about the column/object name 😄

👍 1
