Hi I have a question: I'm using tap-kustomer > ...
# troubleshooting
p
Hi, I have a question. I'm using tap-kustomer > target-s3. tap-kustomer has `updated_at` as its replication key, and this key is also present in the record as a string. When I try to ingest the stream using a type-defined schema (in parquet), it fails. I noticed something peculiar: the `updated_at` key suddenly became `datetime.datetime` when passed to target-s3 via context. Is there a timestamp conversion that takes place implicitly in the taps?
This is quite weird and I need some direction here, thank you.
v
JSON Schema dates are strings with a date format, so it's not super weird; see https://json-schema.org/understanding-json-schema/reference/string#dates-and-times. As for your specifics, I don't know; there isn't enough information to really help 😕
The unfortunate thing is once you have enough info to help me solve it it'll probably be obvious what to do on your side 🙂
p
Oh, it's not about the schema definition. When the tap produces a record, a key in that record is `updated_at`, which is a string, but when the target processes it the same key becomes `datetime.datetime` for the same record.
v
Are you sure? If I were you I'd make a single output file that contains 1 schema message and 1 record message that hits your error, i.e. run `meltano invoke tap-kustomer > out`, then filter `out` down to just one record (maybe even one field if you're feeling up to it) that causes the error when running `cat out | meltano invoke target-s3`. If you still can't get it, share that output file and the response you're getting 🙂
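The filtering step suggested above could be scripted roughly as follows. This is only a sketch: the helper name `minimal_repro` is made up for illustration, and it assumes the `out` file holds one Singer message per line, possibly mixed with non-JSON log lines.

```python
import json


def minimal_repro(lines, stream):
    """Keep only the first SCHEMA and first RECORD message for `stream`.

    `minimal_repro` is a hypothetical helper, not part of Meltano or any tap.
    """
    kept = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON message (e.g. log noise)
        if not isinstance(msg, dict):
            continue
        mtype = msg.get("type")
        if mtype in ("SCHEMA", "RECORD") and msg.get("stream") == stream:
            kept.setdefault(mtype, line)  # remember only the first of each type
        if len(kept) == 2:
            break
    # Emit the schema before the record, as a target expects.
    return [kept[t] for t in ("SCHEMA", "RECORD") if t in kept]


# Possible usage (file names are assumptions):
# with open("out") as src, open("out.min", "w") as dst:
#     dst.write("\n".join(minimal_repro(src, "conversations")) + "\n")
```

The reduced file can then be piped into the target the same way as the full one, e.g. `cat out.min | meltano invoke target-s3`.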
p
yes, on it
```
-- stream output from tap-kustomer --
{"type": "STATE", "value": {}}
{"type": "SCHEMA", "stream": "conversations", "schema": {"properties": {"type": {"type": ["string", "null"]}, "id": {"type": ["string", "null"]}, "updated_at": {"format": "date-time", "type": ["string", "null"]}}, "type": ["object", "null"]}, "key_properties": ["id"], "bookmark_properties": ["updated_at"]}
{"type": "RECORD", "stream": "conversations", "record": {"type": "conversation", "id": "64c849056f6e881dd8474251", "updated_at": "2023-08-01T00:02:21.371Z"}, "time_extracted": "2023-10-06T14:58:45.180526+00:00"}

-- output from target-s3 --
{"type": "conversation", "id": "64c849056f6e881dd8474251", "updated_at": datetime.datetime(2023, 8, 1, 0, 2, 21, 371000, tzinfo=tzutc()), "_PROCESS_DATE": "2023-10-06T14:59:32.060194"}
```
`updated_at` is the replication key for the stream.
v
Which target-s3 are you using? Can you share your meltano.yml? Pretty clear it's a loader bug now.
p
yeap
let me share my yml
```yaml
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
  - name: target-s3
    variant: crowemi
    pip_url: git+https://github.com/crowemi/target-s3.git
    config:
      append_date_to_filename_grain: microsecond
      format:
        format_type: parquet
      include_process_date: true
      prefix: meltano/flat/2
```
v
Well it's a target bug for sure, I'd put an issue in there / look at the code for the target to figure out what's going on!
That repro file will be helpful
p
I've been on it for a while now, but thanks for the input. I'll update here with some findings 🙂
v
One odd thing I just noticed is you said the output is json, but your config says it's supposed to be parquet
p
Yes, I'm taking this from the record log because creation of the parquet file is failing; I'm trying to build the parquet schema from the tap's schema. This difference in the `updated_at` data type makes the file creation fail.
v
I'm questioning whether this is a target bug now in the way you're saying.
p
I'm developing this target to define the schema from the tap's schema output, so it's possible I got something wrong, but it's weird because there's been no change to any code that processes records whatsoever.
essentially finding any timestamp key and converting it
is this done to populate downstream datetime columns by default?
seems like it
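For illustration, "finding any timestamp key and converting it" might look roughly like the sketch below. This is not the target's or the SDK's actual code; `parse_datetime_fields` is a hypothetical name, and it uses only the standard library, so the resulting `tzinfo` is `timezone.utc` rather than the `tzutc()` seen in the log above.

```python
from datetime import datetime, timezone


def parse_datetime_fields(record, schema):
    """Return a copy of `record` with `date-time` string fields parsed.

    Hypothetical helper: walks the schema's top-level properties only.
    """
    out = dict(record)
    for key, prop in schema.get("properties", {}).items():
        value = out.get(key)
        if prop.get("format") == "date-time" and isinstance(value, str):
            # Older Python versions do not accept a trailing "Z" in
            # fromisoformat, so rewrite it as an explicit UTC offset.
            if value.endswith("Z"):
                value = value[:-1] + "+00:00"
            out[key] = datetime.fromisoformat(value)
    return out


schema = {
    "properties": {
        "id": {"type": ["string", "null"]},
        "updated_at": {"format": "date-time", "type": ["string", "null"]},
    }
}
record = {"id": "64c849056f6e881dd8474251", "updated_at": "2023-08-01T00:02:21.371Z"}
parsed = parse_datetime_fields(record, schema)
```

With a step like this in the pipeline, `parsed["updated_at"]` is a `datetime` object while `record["updated_at"]` is still the original string, which matches the mismatch described earlier in the thread.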
I can now adjust this in my schema generation code and hope all will be good. Thanks again @visch
I was able to get it working. Basically, I'm trying to build the schema in a more declarative way, using the schema emitted by the tap. I'll link the PR here.
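A minimal sketch of what deriving column types declaratively from the tap's schema could look like, with `date-time` strings mapped to a timestamp column so they line up with the parsed values. The function names and the type labels (`"timestamp"`, `"int64"`, and so on) are placeholders chosen for this example, not the PR's code or any parquet library's API.

```python
# Placeholder labels for illustration; a real target would map these
# to its parquet library's own type objects.
SIMPLE_TYPES = {
    "string": "string",
    "integer": "int64",
    "number": "float64",
    "boolean": "bool",
}


def column_type(prop):
    """Pick a column type label for one JSON Schema property (hypothetical)."""
    types = prop.get("type", [])
    if isinstance(types, str):
        types = [types]
    non_null = [t for t in types if t != "null"]
    # A string with format date-time arrives in the sink as a datetime,
    # so declare it as a timestamp column rather than a string column.
    if "string" in non_null and prop.get("format") == "date-time":
        return "timestamp"
    if len(non_null) == 1 and non_null[0] in SIMPLE_TYPES:
        return SIMPLE_TYPES[non_null[0]]
    return "string"  # fallback for objects, arrays, and mixed types


def columns_from_schema(schema):
    """Map every top-level property of a SCHEMA message's schema to a label."""
    return {
        name: column_type(prop)
        for name, prop in schema.get("properties", {}).items()
    }


conversations_schema = {
    "properties": {
        "type": {"type": ["string", "null"]},
        "id": {"type": ["string", "null"]},
        "updated_at": {"format": "date-time", "type": ["string", "null"]},
    },
    "type": ["object", "null"],
}
columns = columns_from_schema(conversations_schema)
```

For the `conversations` schema shown earlier, this gives a string column for `type` and `id` and a timestamp column for `updated_at`.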