Afonso Diniz
04/07/2024, 3:03 PM
- name: tap-jira
  config:
    auth:
      flow: password
      username: karel@rauva.com
    domain: rauva.atlassian.net
    stream_maps:
      issues:
        __filter__: key.startswith('DATA')
        updated_test: fields.updated
  select:
  - issues.key
  - issues.fields
  - issues.fields.updated
  - updated_test
  metadata:
    issues:
      replication-method: INCREMENTAL
      # replication-key: updated_test # no
      # replication-key: fields.updated # no
      # replication-key: fields__updated # no
      replication-key: issues.fields.updated # no
issues.fields.updated is the field I want to use to track the state of the load.
Initially I tried this: replication-key: issues.fields.updated
2024-04-07T14:59:54.185389Z [info ] singer_sdk.exceptions.InvalidReplicationKeyException: Field 'issues.fields.updated' is not in schema for stream 'issues' cmd_type=elb consumer=False name=tap-jira producer=True stdio=stderr string_id=tap-jira
Then I thought the problem was that the column I'm trying to use for the INCREMENTAL load is not flattened.
To fix that, I created a new column named updated_test (using stream_maps).
When I select it (without the metadata step), I get exactly what I wanted: a copy of the fields.updated column.
But when I try to use that new column in the metadata step, I get the same error as before:
2024-04-07T14:55:32.185389Z [info ] singer_sdk.exceptions.InvalidReplicationKeyException: Field 'updated_test' is not in schema for stream 'issues' cmd_type=elb consumer=False name=tap-jira producer=True stdio=stderr string_id=tap-jira
What am I doing wrong here? Is the column not being created correctly? If not, how should it be done?
And is there a way to set replication-key to a column that is not flattened?
Let me know.
Thanks in advance, all help is welcome 🙂
Reuben (Matatika)
04/07/2024, 7:25 PM
updated_test that isn't defined in the tap's issues stream schema and are trying to set it as the replication key. You probably want to provide a schema override for updated_test.
Afonso Diniz
04/07/2024, 7:42 PM
updated_test, then with the stream_map assign updated_test: fields.updated, then applying the replication-key.
I'll keep you posted.
Thanks a lot for the help
Afonso Diniz
04/07/2024, 7:58 PM
Afonso Diniz
04/07/2024, 8:17 PM
- name: tap-jira
  schema:
    issues:
      updated_test_1:
        type: ["string", "null"]
      updated_test_2:
        type: ["string", "null"]
  config:
    stream_maps:
      issues:
        updated_test: fields.updated
  select:
  - issues.key
  - updated_test
  - updated_test_1
  - updated_test_2
When I run meltano select tap-jira --list --all > list_jira.txt, I get this on the console:
2024-04-07T20:12:39.108944Z [warning ] Stream `updated_test` was not found in the catalog
2024-04-07T20:12:39.109025Z [warning ] Stream `updated_test_1` was not found in the catalog
2024-04-07T20:12:39.109082Z [warning ] Stream `updated_test_2` was not found in the catalog
Then, in list_jira.txt:
Enabled patterns:
issues.key
updated_test
updated_test_1
updated_test_2
...
[automatic] issues.updated_test_1
[automatic] issues.updated_test_2
This indicates that updated_test_1 and updated_test_2 were added to the schema, right? And updated_test was not, because it was only set in the stream_map.
But then, when I add
metadata:
  issues:
    replication-method: INCREMENTAL
    replication-key: updated_test_1 # no
I keep getting the error:
2024-04-07T20:17:11.599170Z [info ] singer_sdk.exceptions.InvalidReplicationKeyException: Field 'updated_test_1' is not in schema for stream 'issues' cmd_type=extractor name=tap-jira run_id=2d93dcda-09f8-4c0e-8d97-67c35b1fb51c state_id=2024-04-07T201709--tap-jira--target-s3 stdio=stderr
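The error messages in this thread are consistent with the replication key being looked up only among the top-level properties of the stream schema, which would explain why a dotted path such as fields.updated is rejected. A minimal Python sketch of that kind of check, under stated assumptions: `check_replication_key` and `InvalidReplicationKey` are hypothetical stand-ins, the schema is a simplified guess at the shape of the issues stream, and none of this is the Singer SDK's actual code.

```python
class InvalidReplicationKey(Exception):
    """Hypothetical stand-in for singer_sdk's InvalidReplicationKeyException."""


def check_replication_key(stream_name: str, schema: dict, key: str) -> dict:
    """Return the schema of `key` if it is a top-level property, else raise."""
    properties = schema.get("properties", {})
    if key not in properties:
        raise InvalidReplicationKey(
            f"Field '{key}' is not in schema for stream '{stream_name}'"
        )
    return properties[key]


# Assumed, simplified schema: `updated` only exists nested under `fields`.
issues_schema = {
    "properties": {
        "id": {"type": ["string", "null"]},
        "key": {"type": ["string", "null"]},
        "fields": {
            "type": ["object", "null"],
            "properties": {"updated": {"type": ["string", "null"]}},
        },
    }
}

# Candidate replication keys tried in the thread, plus one top-level property.
for candidate in ("issues.fields.updated", "fields.updated", "updated_test_1", "key"):
    try:
        check_replication_key("issues", issues_schema, candidate)
        print(f"{candidate}: ok")
    except InvalidReplicationKey as exc:
        print(f"{candidate}: {exc}")
```

In this sketch only a name that appears directly under properties passes, so a new column would have to be present in the schema the tap itself checks before it could serve as the key.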
Afonso Diniz
04/07/2024, 8:25 PM
schema extra holds an object describing Singer stream schema override rules that are applied to the extractor's discovered catalog file when the extractor is run using meltano elt or meltano invoke. These rules are not applied when a catalog is provided manually.'
From what I read here, it should override the catalog file. It's strange that it's not working 😕
Reuben (Matatika)
04/07/2024, 9:00 PM
Afonso Diniz
04/07/2024, 9:44 PM
updated_test: fields.updated
The issue I'm having is that when
metadata:
  issues:
    replication-method: INCREMENTAL
    replication-key: updated_test_1 # no
    updated_test_1:
      is-replication-key: true
tries to use updated_test_1 as the replication-key, it says it does not exist in the catalog.
It does not exist there originally, but I'm creating it with:
- name: tap-jira
  schema:
    issues:
      updated_test_1:
        type: ["string", "null"]
😕
Reuben (Matatika)
04/07/2024, 10:32 PM
If a schema is specified for a property that does not yet exist in the discovered stream's schema, the property (and its schema) will be added to the catalog. This allows you to define a full schema for taps such as tap-dynamodb that do not themselves have the ability to discover the schema of their streams.
This makes it sound like it should work...
Reuben (Matatika)
04/08/2024, 12:00 AM
select rules are wrong also. Currently, it is identifying updated_test, updated_test_1 and updated_test_2 as streams to select, not properties of a stream (hence the Stream was not found in the catalog warnings). So everything that's not the key property of the issues stream will be ignored, most likely including your stream map property updated_test_1.
You probably want
select:
- issues.key
- issues.updated_test_1
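To illustrate the point about select rules: a pattern is read as stream.property, so a bare name with no dot is taken as a stream name. A rough Python sketch of that reading, with loud assumptions: `split_pattern` and `unmatched_stream_patterns` are hypothetical helpers written for this example (not Meltano functions), and the catalog is assumed to contain only an issues stream.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def split_pattern(pattern: str) -> Tuple[str, Optional[str]]:
    """Split a select pattern at the first dot into (stream, property)."""
    stream, _, prop = pattern.partition(".")
    return stream, (prop or None)


def unmatched_stream_patterns(patterns: List[str], stream_names: List[str]) -> List[str]:
    """Return the stream parts of patterns that match no stream in the catalog."""
    missing = []
    for pattern in patterns:
        stream_part, _ = split_pattern(pattern)
        if not any(fnmatchcase(name, stream_part) for name in stream_names):
            missing.append(stream_part)
    return missing


# Select rules from the earlier config; the stream list is an assumption.
original_select = ["issues.key", "updated_test", "updated_test_1", "updated_test_2"]
suggested_select = ["issues.key", "issues.updated_test_1"]
catalog_streams = ["issues"]

print(unmatched_stream_patterns(original_select, catalog_streams))
print(unmatched_stream_patterns(suggested_select, catalog_streams))
```

Under these assumptions the three bare names are reported as unmatched streams, which lines up with the three warnings, while the suggested rules all resolve to the issues stream.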
Afonso Diniz
04/08/2024, 9:20 AM
2024-04-07T20:12:39.108944Z [warning ] Stream `updated_test` was not found in the catalog
2024-04-07T20:12:39.109025Z [warning ] Stream `updated_test_1` was not found in the catalog
2024-04-07T20:12:39.109082Z [warning ] Stream `updated_test_2` was not found in the catalog
The question here is the order in which schema, select, metadata and stream_maps are applied.
And how to configure metadata to use the newly created columns, not only the ones already in the catalog.
Because I'm still getting:
2024-04-07T20:17:11.599170Z [info ] singer_sdk.exceptions.InvalidReplicationKeyException: Field 'updated_test_1' is not in schema for stream 'issues' cmd_type=extractor name=tap-jira run_id=2d93dcda-09f8-4c0e-8d97-67c35b1fb51c state_id=2024-04-07T201709--tap-jira--target-s3 stdio=stderr
when
metadata:
  issues:
    replication-method: INCREMENTAL
    replication-key: updated_test_1 # no
Reuben (Matatika)
04/08/2024, 10:07 AM
Afonso Diniz
04/08/2024, 11:25 AM
Edgar Ramírez (Arch.dev)
04/08/2024, 3:34 PM
Is fields.updated always present or is it particular to your Jira installation?
Afonso Diniz
04/08/2024, 3:35 PM
Edgar Ramírez (Arch.dev)
04/08/2024, 3:35 PM
(I wouldn't rely on stream_maps for setting a replication key, so I'm trying to see if we can change it upstream in the tap itself)
Afonso Diniz
04/08/2024, 3:36 PM
fields.updated is always present, yes
Afonso Diniz
04/08/2024, 3:37 PM
(I wouldn't rely on stream_maps for setting a replication key, so I'm trying to see if we can change it upstream in the tap itself)
Got it.
Edgar Ramírez (Arch.dev)
04/08/2024, 3:42 PM
Afonso Diniz
04/08/2024, 3:43 PM
issues.updated -> this column is empty (at least on my side)
issues.fields.updated -> this is the column I'm talking about, and the one I want to use as replication-key
Edgar Ramírez (Arch.dev)
04/08/2024, 3:44 PM
Afonso Diniz
04/08/2024, 3:45 PM
Afonso Diniz
04/08/2024, 3:46 PM
id cannot come in as null, right?
You're removing it from replication_key, but it is the primary_key, and therefore is never null.
Edgar Ramírez (Arch.dev)
04/08/2024, 3:46 PM
Afonso Diniz
04/08/2024, 3:46 PM
Afonso Diniz
04/08/2024, 3:46 PM
Edgar Ramírez (Arch.dev)
04/08/2024, 3:46 PM
pip_url: git+https://github.com/MeltanoLabs/tap-jira@refs/pull/71/head ?
Afonso Diniz
04/08/2024, 3:46 PM
Afonso Diniz
04/08/2024, 3:46 PM
Afonso Diniz
04/08/2024, 3:47 PM
Afonso Diniz
04/08/2024, 3:47 PM
Edgar Ramírez (Arch.dev)
04/08/2024, 3:47 PM
Afonso Diniz
04/08/2024, 3:48 PM
Afonso Diniz
04/09/2024, 7:22 AM
Afonso Diniz
04/09/2024, 7:25 AM
2024-04-09T07:22:10.277277Z [debug ] {"type": "RECORD", "stream": "issues", "record": {"id": "__", "self": "https://___.atlassian.net/rest/api/3/issue/__", "key": "__-3795", "fields": {"parent": {"id": "__", "key": "___", "fields": {"summary": "___"}}, "status": {"description": "", "name": "Done", "id": "10041"}, "creator": {"accountId": "__", "displayName": "__", "active": true, "timeZone": "Europe/__", "accountType": "atlassian"}, "reporter": {"accountId": "___", "displayName": "__", "active": true, "accountType": "atlassian"}, "issuetype": {"id": "10014", "name": "Sub-task"}, "project": {"id": "10008", "key": "__", "name": "__", "projectTypeKey": "software", "simplified": false}, "resolutiondate": "2024-03-21T16:47:44.769+0000", "updated": "2024-03-21T16:47:44.775+0000", "summary": "__", "duedate": null}}, "time_extracted": "2024-04-09T07:22:10.277080+00:00"} cmd_type=extractor name=tap-jira (out) run_id=ed16db38-22d1-4426-bd35-1541d5bf58eb state_id=2024-04-09T072105--tap-jira--target-s3-csv stdio=stdout
From what I can see, the updated column is not coming through with values, only fields.updated
Edgar Ramírez (Arch.dev)
04/09/2024, 1:41 PM
meltano install extractor tap-jira --clean
Afonso Diniz
04/09/2024, 4:47 PM