Afonso Diniz
04/02/2024, 5:27 PM
tap-jira to retrieve data and target-s3-csv to load the data into S3.
I'm having the following issue:
Imagine that I have column_a, column_b, column_c coming from a tap_stream_id.
INPUT:
row_1:
1, 2, 3
row_2: (coming with empty value on column_b)
1,3
OUTPUT:
column_a, column_b, column_c
1, 2, 3
1,3,
Which leads to a column mismatch and integrity issues.
Is there a way to configure Meltano to set the missing column_b value to '' (empty string) or null?
Or how should we handle these cases?
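For reference, the row layout being asked for can be sketched with Python's standard csv module. This is not target-s3-csv's own code, only a minimal illustration: the records are the hypothetical column_a/column_b/column_c rows from the example above, and write_csv is a made-up helper name. csv.DictWriter fills any key missing from a row with its restval value, so the columns stay aligned.

```python
import csv
import io

# Hypothetical records mirroring the example above: row_2 has no column_b.
records = [
    {"column_a": 1, "column_b": 2, "column_c": 3},
    {"column_a": 1, "column_c": 3},
]


def write_csv(rows, fieldnames):
    """Write rows as CSV, emitting an empty field for any missing key."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Build the header from every record, preserving first-seen key order.
fieldnames = list(dict.fromkeys(key for row in records for key in row))
output = write_csv(records, fieldnames)
print(output)
```

With this layout the second row is written as 1,,3, so the empty column_b keeps its position instead of shifting column_c to the left.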
Thanks in advance.
Afonso Diniz
04/02/2024, 6:03 PM
{"type": "RECORD", "stream": "fields", "record": {"id": "statuscategorychangedate", "key": "statuscategorychangedate", "name": "Status Category Changed", "custom": false, "searchable": true, "schema": {"type": "datetime", "system": "statuscategorychangedate"}}, "time_extracted": "2024-04-02T17:58:28.410891+00:00"}
{"type": "RECORD", "stream": "fields", "record": {"id": "parent", "key": "parent", "name": "Parent", "custom": false, "searchable": false}, "time_extracted": "2024-04-02T17:58:28.411065+00:00"}
What is inserted into the .csv:
custom,id,key,name,schema__system,schema__type,searchable
False,statuscategorychangedate,statuscategorychangedate,Status Category Changed,statuscategorychangedate,datetime,True
False,parent,parent,Parent,False
This is incorrect: on the second row it skipped the columns that had no values.
Afonso Diniz
04/03/2024, 8:28 AM
Reuben (Matatika)
04/03/2024, 8:36 AM
target-jsonl?
Afonso Diniz
04/03/2024, 2:54 PM
Afonso Diniz
04/03/2024, 4:04 PM
meltano --log-level=debug el tap-jira target-s3
- name: tap-jira
  config:
    auth:
      flow: password
      username: ____
    domain: ____
  select:
  # - issues.*
  - fields.*
loaders:
- name: target-s3-csv
  config:
    s3_bucket: __-__
- name: target-s3
  config:
    format:
      format_type: parquet
    prefix: raw
    flattening_enabled: true
    flattening_max_depth: 1
When I run with - fields.* it works with no issues.
When I run with - issues.* it gives me the following error:
2024-04-03T15:53:55.348524Z [info ] Failed validating 'type' in schema['properties']['fields__customfield_10033']: cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
2024-04-03T15:53:55.348596Z [info ] {'type': ['string', 'null']} cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
2024-04-03T15:53:55.348660Z [info ] cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
2024-04-03T15:53:55.348723Z [info ] On instance['fields__customfield_10033']: cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
2024-04-03T15:53:55.348784Z [info ] Decimal('3.0') cmd_type=loader name=target-s3 run_id=a49fd585-88cc-4794-a733-3ad02c3b893d state_id=2024-04-03T154910--tap-jira--target-s3 stdio=stderr
It looks like the problem is with the fields__customfield_10033 column.
2024-04-03T16:07:51.336465Z [debug ] Created configuration at /Users/___/.meltano/run/elt/2024-04-03T160751--tap-jira--target-s3/e409a3d3-a5b9-46c6-9c02-9e2bb8fb9e9b/tap.53c45948-a14c-4d40-afb6-92942b253cf1.config.json
with
"customfield_10033": {
"type": [
"string",
"null"
]
},
2024-04-03T16:07:51.369129Z [debug ] Invoking: ['/Users/afonso/Projects/data-integrations/.meltano/extractors/tap-jira/venv/bin/tap-jira', '--config', '/Users/afonso/Projects/data-integrations/.meltano/run/elt/2024-04-03T160751--tap-jira--target-s3/e409a3d3-a5b9-46c6-9c02-9e2bb8fb9e9b/tap.53c45948-a14c-4d40-afb6-92942b253cf1.config.json', '--discover']
This is also present in the logs.
Why does it assume that it is a string? How does Meltano handle this?
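The log lines above show the two sides of the failure: the schema for fields__customfield_10033 declares the types string and null, while the record carries Decimal('3.0'). The check that fails can be approximated with a few lines of plain Python. This is a simplified sketch, not the validator the loader actually uses; JSON_TYPES and conforms are hypothetical names, and the type mapping covers only the cases needed here.

```python
from decimal import Decimal

# Simplified mapping from JSON Schema type names to Python types.
# This is an assumption for illustration; a real validator covers more cases.
JSON_TYPES = {
    "string": (str,),
    "null": (type(None),),
    "number": (int, float, Decimal),
    "boolean": (bool,),
}


def conforms(value, allowed_types):
    """Return True if value matches any of the allowed JSON Schema type names."""
    for name in allowed_types:
        # bool is a subclass of int in Python, so do not count it as a number.
        if name == "number" and isinstance(value, bool):
            continue
        if isinstance(value, JSON_TYPES[name]):
            return True
    return False


declared = ["string", "null"]  # the type list shown in the error log
numeric_ok = conforms(Decimal("3.0"), declared)
string_ok = conforms("3.0", declared)
null_ok = conforms(None, declared)
print(numeric_ok, string_ok, null_ok)
```

Under this reading, rows where the field is null or already a string pass, and only rows carrying a number fail, which would be consistent with errors appearing on some rows only.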
What options do I have here? It is strange that only some rows get errors.
Reuben (Matatika)
04/03/2024, 5:16 PM
config:
  stream_maps:
    fields:
      fields__customfield_10033: str(fields__customfield_10033)
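In plain Python terms, the str(...) expression in the map above turns the numeric value into text so that it matches a declared string type. A rough sketch of that effect follows; cast_to_str and the sample records are hypothetical, and this is not the mapper's own code.

```python
from decimal import Decimal


def cast_to_str(record, key):
    """Return a copy of record with record[key] converted using str()."""
    mapped = dict(record)
    if key in mapped:
        mapped[key] = str(mapped[key])
    return mapped


KEY = "fields__customfield_10033"

# Hypothetical flattened records: one numeric value, one null.
numeric_record = cast_to_str({"id": "10001", KEY: Decimal("3.0")}, KEY)
null_record = cast_to_str({"id": "10002", KEY: None}, KEY)

print(repr(numeric_record[KEY]))
print(repr(null_record[KEY]))
```

Note that Python's str(None) yields the text 'None' rather than a null. Whether the same happens inside the mapper is an assumption worth checking before relying on the cast for nullable fields.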
Afonso Diniz
04/03/2024, 5:21 PM
flattening_enabled: true
flattening_max_depth: 1
but thanks a lot Reuben. I'll take a look at that documentation and I'll get back to you 🙂
Afonso Diniz
04/03/2024, 5:23 PM
Reuben (Matatika)
04/03/2024, 5:23 PM
Afonso Diniz
04/03/2024, 5:24 PM
Afonso Diniz
04/03/2024, 5:24 PM
Reuben (Matatika)
04/03/2024, 5:25 PM
Afonso Diniz
04/04/2024, 8:42 AM
{
"type": "SCHEMA",
"stream": "issue_worklogs",
"schema": {
"properties": {
"id": {
"type": [
"string",
"null"
]
},
"self": {
"type": [
"string",
"null"
]
},
"author": {
"properties": {
"accountId": {
"type": [
"string",
"null"
]
},
"self": {
"type": [
"string",
"null"
]
},
"displayName": {
"type": [
"string",
"null"
]
},
"active": {
"type": [
"boolean",
"null"
]
}
},
"type": [
"object",
"null"
]
},
"updateAuthor": {
"properties": {
"accountId": {
"type": [
"string",
"null"
]
},
"self": {
"type": [
"string",
"null"
]
},
"displayName": {
"type": [
"string",
"null"
]
},
"active": {
"type": [
"boolean",
"null"
]
}
},
"type": [
"object",
"null"
]
},
"updated": {
"format": "date-time",
"type": [
"string",
"null"
]
},
"started": {
"format": "date-time",
"type": [
"string",
"null"
]
},
"timeSpentSeconds": {
"type": [
"integer",
"null"
]
},
"issueId": {
"type": [
"string",
"null"
]
}
},
"type": "object"
},
"key_properties": [
"id"
]
}
Here I'm only getting SCHEMA and STATE message types, while for other 'working' streams I'm also getting RECORD.
Do you have any idea why this is happening? It looks like the tap has data, but it is not populating the S3 bucket.
State example
Afonso Diniz
04/04/2024, 9:00 AM
"tap_stream_id": "issue_worklogs",
"replication_method": "FULL_TABLE",
"key_properties": [
"id"
],
but then in the logs I'm getting
{"type": "STATE", "value": {"bookmarks": {"issues": {"starting_replication_value": null}
How does Meltano interpret "starting_replication_value": null, and why is it set this way, since it is set as FULL_TABLE?
Reuben (Matatika)
04/04/2024, 12:26 PM
RECORD messages but no data, then it's probably a configuration issue with the loader.
Afonso Diniz
04/04/2024, 12:26 PM
RECORD. I'm only seeing STATE or SCHEMA.
Reuben (Matatika)
04/04/2024, 12:29 PM
target-jsonl to confirm - you can also just meltano invoke tap-jira and inspect the output for RECORD messages, rather than combing through the output of meltano elt (which probably has a lot of logs you don't care about in this case).
Afonso Diniz
04/04/2024, 12:38 PM
Reuben (Matatika)
04/04/2024, 12:50 PM
"starting_replication_value": null is just a way of indicating no state, which is implied by FULL_TABLE.
Afonso Diniz
04/04/2024, 4:32 PM
meltano invoke tap-jira
I've tried this but no RECORD type was written to the console.
I'll now try target-jsonl.
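One way to check tap output for RECORD messages without reading it by eye is to count messages by type and stream. The sketch below assumes the output is one JSON message per line with "type" and "stream" keys, as in the messages quoted earlier in this thread; count_messages and the sample lines are hypothetical.

```python
import json
from collections import Counter


def count_messages(lines):
    """Count messages by (type, stream); non-JSON lines such as logs are skipped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a message, e.g. a log line
        if isinstance(message, dict) and "type" in message:
            counts[(message["type"], message.get("stream"))] += 1
    return counts


# Hypothetical sample standing in for real tap output.
sample = [
    '{"type": "SCHEMA", "stream": "issue_worklogs", "schema": {}, "key_properties": ["id"]}',
    '{"type": "STATE", "value": {}}',
    '{"type": "RECORD", "stream": "fields", "record": {"id": "parent"}}',
    "some log line that is not JSON",
]

counts = count_messages(sample)
# Sort on the string form because the stream can be None for STATE messages.
for (message_type, stream), total in sorted(counts.items(), key=str):
    print(message_type, stream, total)
```

To run it against real output, the sample list could be replaced with sys.stdin and the tap output piped into the script. A stream that shows SCHEMA counts but no RECORD counts would confirm that the tap itself is emitting no rows for it.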
Afonso Diniz
04/04/2024, 4:32 PM
Afonso Diniz
04/04/2024, 4:36 PM
meltano run tap-jira target-jsonl also does not output RECORD.
What else can I check here?
Reuben (Matatika)
04/04/2024, 5:29 PM
Afonso Diniz
04/05/2024, 4:33 PM