# troubleshooting
a
Hey there, I have a tap whose date formats are incompatible with the target, and I'm having some issues moving the data around. Ideally I would like to tackle this at some point (during stream maps), but I'm not sure if it's possible, or how to replace a value in the records mid-stream based on some conditions. The date format is `0000-00-00 000000`, which I also assume is not a valid Python format. I can imagine something like this (pseudo config, there might be syntax/formatting issues):
```yaml
stream_maps:
    date_column: date_column if not str(date_column).startswith('0000-0') else None
```
r
You're describing a filter operation, so it would probably look like this:
```yaml
stream_maps:
  stream_name:
    # In a filter expression, refer to the property by its name (here: date_column)
    __filter__: not date_column.startswith('0000-0')
```
https://sdk.meltano.com/en/latest/stream_maps.html#filtering-out-records-from-a-stream-using-filter-operation
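In plain Python, a `__filter__` behaves roughly like the sketch below: the expression is evaluated once per record, and records for which it is falsy are dropped entirely, so the zero date never reaches the target, but neither does the rest of that row. This is a simplified model under that assumption, not the SDK's implementation; `apply_filter`, `date_column`, and the sample records are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]


def apply_filter(records: Iterable[Record], predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Yield only the records for which the predicate is truthy (a simplified __filter__)."""
    for record in records:
        if predicate(record):
            yield record


# Hypothetical records; `date_column` stands in for the real property name.
records = [
    {"id": 1, "date_column": "2024-01-01 00:00:00"},
    {"id": 2, "date_column": "0000-00-00 00:00:00"},
]

# Python equivalent of the expression `not date_column.startswith('0000-0')`.
kept = list(apply_filter(records, lambda r: not r["date_column"].startswith("0000-0")))
print([r["id"] for r in kept])  # [1]
```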
Python does parse that date format fine, though (it looks like ISO 8601 without the `T`):
```python
>>> from datetime import datetime
>>> datetime.fromisoformat("2024-01-01 00:00:00")
datetime.datetime(2024, 1, 1, 0, 0)
```
What's the error from the target? Maybe it's expecting the `T`?
```yaml
stream_maps:
  stream_name:
    date_column: self.replace(" ", "T")
```
https://stackoverflow.com/a/9532375
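The `self.replace(" ", "T")` expression is ordinary Python string handling applied to each value. A minimal sketch of the same transform outside Meltano (the helper name `to_iso_t` and the sample value are hypothetical) shows that the result is a `T`-separated ISO 8601 string that `datetime.fromisoformat` still accepts:

```python
from datetime import datetime


def to_iso_t(value: str) -> str:
    """Replace the space separator with 'T', mirroring `self.replace(" ", "T")`."""
    # str.replace swaps every space, which is fine for "YYYY-MM-DD HH:MM:SS" strings.
    return value.replace(" ", "T")


converted = to_iso_t("2024-01-01 00:00:00")
print(converted)  # 2024-01-01T00:00:00
print(datetime.fromisoformat(converted))  # 2024-01-01 00:00:00
```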
a
Oh wow, I thought that Python would refuse to parse this, given that it's clearly not a date, so my first assumption was wrong! I'm not sure tbh; the "real" dates seem to load fine into BQ, but these zero dates give issues!
r
Do you have a log output or something?
```shell
meltano invoke tap > tap.out
cat tap.out | meltano invoke target-bigquery  # check the logs from this run
```
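Before running the target, it can help to scan the captured tap output for the problematic values. A minimal sketch, assuming `tap.out` holds one Singer JSON message per line (as produced by the first command above); `find_zero_dates`, the `date_column` property, and the `orders` stream are hypothetical:

```python
import json
from typing import Iterable, Iterator, Tuple


def find_zero_dates(lines: Iterable[str], column: str) -> Iterator[Tuple[str, dict]]:
    """Yield (stream, record) pairs whose `column` value starts with a zero year."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        # Only RECORD messages carry data; SCHEMA and STATE messages are skipped.
        if message.get("type") != "RECORD":
            continue
        value = message.get("record", {}).get(column)
        if isinstance(value, str) and value.startswith("0000"):
            yield message.get("stream", ""), message["record"]


# Hypothetical sample lines standing in for the contents of tap.out.
sample = [
    '{"type": "SCHEMA", "stream": "orders", "schema": {}, "key_properties": []}',
    '{"type": "RECORD", "stream": "orders", "record": {"id": 1, "date_column": "2024-01-01 00:00:00"}}',
    '{"type": "RECORD", "stream": "orders", "record": {"id": 2, "date_column": "0000-00-00 00:00:00"}}',
]
for stream, record in find_zero_dates(sample, "date_column"):
    print(stream, record["id"])  # orders 2
```

To scan the real file, pass an open file handle for `tap.out` instead of `sample`; iterating over a text file yields one line at a time.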
a
Yeah, I can provide it later, but basically the target raises a BigQuery API error saying that too many records are invalid and that I should check a collection called `error[]`, which turns out to be empty.
It seems like the lower limit is `0001-01-01 00` according to https://popsql.com/learn-sql/bigquery/date-and-time-data-types-in-bigquery, so `0000-00-00` would be outside that range.
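Python agrees that this value is not a real timestamp: its `datetime` type starts at year 1, the same lower bound as the BigQuery range linked above, so parsing the zero date raises `ValueError`. A small sketch for checking values before they reach the target (the helper name `is_representable` is hypothetical):

```python
from datetime import datetime


def is_representable(value: str) -> bool:
    """Return True if `value` parses as an ISO 8601 datetime that Python can represent."""
    try:
        datetime.fromisoformat(value)
    except ValueError:
        # Raised for out-of-range parts such as year 0 or month 0.
        return False
    return True


print(datetime.min)  # 0001-01-01 00:00:00
print(is_representable("2024-01-01 00:00:00"))  # True
print(is_representable("0000-00-00 00:00:00"))  # False
```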
r
Ah, I see. I just assumed you were referring to `0000-00-00` as some placeholder value. That should be a fairly easy transform then:
```yaml
stream_maps:
  stream_name:
    date_column: self if not self.startswith("0000") else datetime.datetime.min.isoformat()
```
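For reference, `datetime.datetime.min.isoformat()` evaluates to `'0001-01-01T00:00:00'`, which sits exactly on the lower limit from the BigQuery page. The same conditional written as a plain Python function (the function name is hypothetical) behaves like this:

```python
import datetime


def replace_zero_date_with_min(value: str) -> str:
    """Mirror `self if not self.startswith("0000") else datetime.datetime.min.isoformat()`."""
    if not value.startswith("0000"):
        return value
    # datetime.min is January 1 of year 1 at midnight.
    return datetime.datetime.min.isoformat()


print(replace_zero_date_with_min("2024-01-01 00:00:00"))  # 2024-01-01 00:00:00
print(replace_zero_date_with_min("0000-00-00 00:00:00"))  # 0001-01-01T00:00:00
```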
a
Thanks! I think I'd rather have null than the min, though.
r
Right, of course. I definitely overthought that. 😅
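Switching to null keeps the same conditional with `None` in the else branch, i.e. something like `date_column: self if not self.startswith("0000") else None` in the stream map, assuming the expression engine accepts Python's `None` literal and the target column is nullable. As a plain Python sketch of that logic, with a hypothetical function name and an extra guard for values that are already null:

```python
from typing import Optional


def null_zero_date(value: Optional[str]) -> Optional[str]:
    """Return None for zero dates, otherwise pass the value through unchanged."""
    # Guard against values that are already null before calling str methods.
    if value is None or value.startswith("0000"):
        return None
    return value


print(null_zero_date("2024-01-01 00:00:00"))  # 2024-01-01 00:00:00
print(null_zero_date("0000-00-00 00:00:00"))  # None
print(null_zero_date(None))  # None
```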
a
I am having a very weird case where I am using `___filter___: 1 != 1`, but all the data for the stream is still being extracted (I am using this filter to debug, since the original filter did not work). Any idea?
It's a bit weird: in some configs I see filters accessing fields via `record['foo'] != 0`, and in others accessing the property directly, like `foo != 0`. Regardless, I can't get it to work.
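The two styles are meant to be interchangeable in SDK-based taps: each property is exposed to the expression under its own name, and the whole record is also available as `record`. The sketch below models that idea with Python's built-in `eval` and a names dict; it is an illustration under that assumption, not the SDK's actual evaluator, and `evaluate` and the sample row are hypothetical.

```python
from typing import Any, Dict


def evaluate(expression: str, record: Dict[str, Any]) -> Any:
    """Evaluate a filter expression against one record (illustration only)."""
    names: Dict[str, Any] = dict(record)  # each property under its own name
    names["record"] = record  # the full record as a dict
    # Empty __builtins__ keeps the example from reaching arbitrary built-ins;
    # do not use eval on untrusted input in real code.
    return eval(expression, {"__builtins__": {}}, names)


row = {"foo": 3}
print(evaluate("record['foo'] != 0", row))  # True
print(evaluate("foo != 0", row))  # True
print(evaluate("1 != 1", row))  # False
```

With `1 != 1` every record evaluates to falsy and would be dropped, so receiving every row suggests the expression is not being evaluated at all.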
I can even add fields that do not exist in the record, or streams that do not exist, and Meltano won't crash? Like:
```yaml
- name: foo-tap
  config:
    stream_maps:
      non_existing_stream:
        __filter__: non_existing_field != bar
```
I feel like the entire config is being ignored. Maybe it's a parsing issue with the YAML file on my end? Otherwise I'd expect Meltano to crash.
Mmm, I also just tried adding new fields:
```yaml
- name: foo-tap
  config:
    stream_maps:
      non_existing_stream:
        foo: bar
        __filter__: non_existing_field != bar
```
But they don't show up in the JSONL target output either, so yeah.
r
Yeah, `stream_maps` is only natively supported by taps/targets built with the Meltano SDK. For others, you can use the `meltano-map-transformer` mapper plugin.
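Conceptually, a standalone mapper sits in the pipe between tap and target: it reads Singer messages, rewrites RECORD messages, and passes everything else through. The sketch below is a hypothetical, minimal mapper that nulls zero dates in a `date_column` property; it is not `meltano-map-transformer`'s code, which is driven by `stream_maps` config instead of hard-coded logic. `map_messages` and the sample stream are illustrative names.

```python
import json
from typing import Iterable, Iterator


def map_messages(lines: Iterable[str], column: str = "date_column") -> Iterator[str]:
    """Null out zero dates in RECORD messages and pass other messages through."""
    for line in lines:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("type") == "RECORD":
            value = message["record"].get(column)
            if isinstance(value, str) and value.startswith("0000"):
                message["record"][column] = None
        # SCHEMA messages pass through unchanged here; a real mapper would also
        # need to mark the column as nullable so targets accept the None value.
        yield json.dumps(message) + "\n"


sample = [
    '{"type": "RECORD", "stream": "orders", "record": {"id": 2, "date_column": "0000-00-00 00:00:00"}}\n',
    '{"type": "STATE", "value": {}}\n',
]
for out in map_messages(sample):
    print(out, end="")
```

In a real pipeline the input would be `sys.stdin` and each yielded line would be written to `sys.stdout`, so the script can sit between `meltano invoke tap` and the target.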
a
Is there an easy way to identify which ones have been built with the Meltano SDK and which have not?
r
On Meltano Hub, they have the "Meltano SDK" badge: https://hub.meltano.com/extractors/tap-mysql/
From https://sdk.meltano.com/en/latest/stream_maps.html#a-feature-for-all-singer-users-enabled-by-the-sdk:

> Note: to support non-SDK taps and targets, the standalone inline mapper plugin `meltano-map-transformer` follows all specifications defined here and can apply mapping transformations between any Singer tap and target, even if they are not built using the SDK.
This part of the README is useful if you end up using `meltano-map-transformer`: https://github.com/MeltanoLabs/meltano-map-transform?tab=readme-ov-file#meltano-installation-instructions