# troubleshooting
c
Hello there! I started using a mapper (meltano-map-transformer) to do a simple column rename. The only way I found to do that was to use the `run` command and define new properties for the stream under the new name, using the original value. That worked just fine, but I ran into a couple of problems:
• The `run` command does not let me manually provide a catalog for the extractor (tap-s3-csv), as the `elt` command does. Since I'm dealing with CSV files, the extractor sometimes autodetects data types wrongly, because it infers them from a sample of the data.
• The way I'm renaming the stream properties has the side effect of losing the original data types, so the resulting stream identifies everything as a string. I found a way around this by using `int(` and `float(` expressions, but that introduced conversion errors for invalid values (such as empty strings), and the expressions don't support string-to-datetime conversions either.
Any ideas how to deal with those two issues?
a
For the first issue, I believe you can add a `catalog` mapping in your yaml. If you want to use multiple inputs, you can declare a few instances of your tap using the `inherits_from` feature.
```yaml
extractors:
- name: tap-gitlab
  catalog: extract/tap-gitlab.catalog.json
```
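If the discovered catalog carries wrongly detected types, one option is to patch it before pointing the `catalog` setting at the file. The following is a minimal sketch in plain Python, assuming the usual Singer catalog layout (a `streams` list whose entries have a `tap_stream_id` and a `schema.properties` map). The stream name `orders`, the column names, and the helper `override_types` are hypothetical and only serve as illustration.

```python
import copy


def override_types(catalog, stream_id, overrides):
    """Return a copy of a Singer catalog with column types replaced.

    `overrides` maps column name -> JSON Schema type (e.g. "string").
    Assumes each stream has `tap_stream_id` and `schema.properties`.
    """
    patched = copy.deepcopy(catalog)  # leave the input catalog untouched
    for stream in patched["streams"]:
        if stream.get("tap_stream_id") != stream_id:
            continue
        properties = stream["schema"]["properties"]
        for column, json_type in overrides.items():
            if column in properties:
                # Replace the property schema with a nullable version
                # of the requested type.
                properties[column] = {"type": ["null", json_type]}
    return patched


# Hypothetical catalog fragment, as sampling-based discovery might produce it.
discovered = {
    "streams": [
        {
            "tap_stream_id": "orders",
            "schema": {
                "type": "object",
                "properties": {
                    "order_id": {"type": ["null", "integer"]},
                    "zip_code": {"type": ["null", "integer"]},
                },
            },
        }
    ]
}

# Force zip_code to be treated as a string; order_id keeps its detected type.
patched = override_types(discovered, "orders", {"zip_code": "string"})
```

In practice the catalog would be read with `json.load` and the patched result written back with `json.dump` to the path referenced by the `catalog` setting.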
> I found a way to go around this issue by using `int(` and `float(` expressions but that introduced conversion errors for invalid values (such as empty strings) and the expressions don't support string to datetime conversions either
As of now, we don't yet have datetime support in mappers, and the hints you are using are the best that is currently available. To solve for null values, you could use a workaround of `str(my_col or '')`, which just relies on the standard Python `or` operator to coalesce from a null value to a non-null one.
You can also open an issue or PR in the SDK for expanding the capabilities of the mapping functions. For instance, I think others have requested some datetime helpers, but I don't think we have any issue logged as of yet.
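The `or` coalescing workaround can be tried out in plain Python before it goes into a mapping expression. This is a minimal sketch: the helper names are hypothetical, and whether the mapper's expression evaluator accepts a given form (such as the conditional used in `to_int_or_none`) is an assumption to verify against your mapper version.

```python
def coalesce_str(value):
    # `or` returns the right operand when the left one is falsy,
    # so both None and '' become ''.
    return str(value or '')


def to_int_or_none(value):
    # Skip the conversion for empty or missing values instead of raising.
    return int(value) if value else None


def to_float_or_none(value):
    # Same guard as above, for floating-point columns.
    return float(value) if value else None
```

One caveat: `or` treats every falsy value as missing, so a numeric `0` passed to `coalesce_str` also becomes an empty string. For CSV input, where values arrive as strings, this usually only affects empty fields.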
c
Thank you @aaronsteers! I'll test your suggestions
s
Not sure if this is helpful, but I have a variant of tap-s3-csv that I created to resolve a number of issues, including overriding the datatype discovered by the tap. It has an additional setting that lets me override the discovered datatype and treat it as a string. https://github.com/s7clarke10/pipelinewise-tap-s3-csv