How do I create my own `type` for a property-level...
# getting-started
c
How do I create my own `type` for a property-level transformation? I want to replace some Unicode characters that psycopg2 doesn't like. It looks like I can do this with Inline Stream Maps. The PR for the feature shows a couple of different types, `MASK-HIDDEN` and `lowercase`, but searching through the source I'm not seeing where the functionality of those types is defined. Approach suggested by @thomas_briggs (thanks!): https://meltano.slack.com/archives/C01TCRBBJD7/p1674916619997379
t
If I understand the question correctly: I don't think you need to. Just add an item to the stream with the name of the field you want to modify where the value is the Python code to modify the field. See the README for the mapper for a decent example.
c
From looking at the README, I had thought that value might just be Python. But `md5` in the example is not a default function in Python, so I kept looking and found Property-level transformations. I think my transform code is going to be pretty simple, but if it gets more complex, is there a way to define my own functions outside of meltano.yml?
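For context on why `md5` is not available in plain Python: a helper like it has to be defined somewhere and exposed to the expression. The sketch below only illustrates what such a helper could look like using the standard `hashlib` module. It is an assumption for illustration, not the mapper's actual source.

```python
import hashlib


def md5(value: str) -> str:
    """Return the hex MD5 digest of a string (illustrative stand-in only)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(md5("abc"))
```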
t
That is a great question. My guess is you'd need a custom mapper but I dunno. @edgar_ramirez_mondragon helped me figure out the map-transformer in the first place... maybe he can answer that question for you. 😉
c
And one more question: I need to replace invalid chars in all fields in a stream. I do not know the names of all the fields ahead of time. Can I apply the map to an entire record/stream(?) at a time? From what I can see, the examples in the SDK all reference fields by name. Or am I going to have to create two pipelines: (1) extract the original CSV as unseparated lines of text, replace the invalid chars, and load the clean text into a new intermediate CSV file, then (2) extract the clean intermediate CSV and load it into Postgres? Of course I could use Parquet or something else as the intermediate file, or perhaps I could do all of that inline? Is it possible to do `extractor -> mapper -> extractor -> loader`?
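To make the "all fields, names unknown" requirement concrete, here is a minimal Python sketch of the per-record logic. The names `BAD_CHARS`, `clean_value` and `clean_record` are hypothetical, and the NUL character is only an assumed example of an offending character, since the thread does not say which ones psycopg2 rejects. This is plain Python, not a stream map expression.

```python
# Assumption: which characters to strip. NUL is only an example.
BAD_CHARS = {"\x00": ""}
_TABLE = str.maketrans(BAD_CHARS)


def clean_value(value):
    """Strip unwanted characters from strings; pass other types through."""
    if isinstance(value, str):
        return value.translate(_TABLE)
    return value


def clean_record(record: dict) -> dict:
    """Apply clean_value to every field without knowing field names."""
    return {key: clean_value(value) for key, value in record.items()}


if __name__ == "__main__":
    print(clean_record({"name": "caf\x00e", "count": 3}))
```

Logic like this would have to live in a custom mapper plugin or a standalone script rather than in meltano.yml.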
t
Another good question. I wonder if you could put the Python code to remove the characters in the `__else__` element? I think that applies to all fields that don't have an explicit mapping.
c
That sounds like a fantastic idea, but unfortunately: Else behavior currently limited to `null` assignment. Seems like good reason to enhance the else capabilities, and I think that's one of the things they have in mind. "_For instance, we could in the future add the ability to remove or treat a property from any stream in which it appears._"
t
Bah, sorry, forgot about that 😕
e
I think my transform code is going to be pretty simple, but if it gets more complex, is there a way to define my own functions outside of meltano.yml?
Not really, other than contributing them to the SDK or spinning out your own mapper plugin. See (sdk#1175 and meltano-map-transform#11).
Can I apply the map to an entire record/stream(?) at a time?
As the comments above suggest, `__else__` at the property level is a good candidate to implement this, but I don’t think there’s any issue open for it, so feel free to log one 🙂
c
Thanks @thomas_briggs & @edgar_ramirez_mondragon - I've opened a new issue: Apply map to entire record/stream instead of a field
Regarding my other question: is it possible in the interim to do something like `extractor -> mapper -> extractor -> loader`? I'm wondering if I can read the file without splitting it into fields, apply the map, then read the stream and split the fields, then load the data.
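The "clean first, split second" order described here can be sketched in plain Python with the standard `csv` module. The function name `clean_then_split` and the default `bad_chars` value are assumptions for illustration. It shows only the ordering of the two steps, not how Meltano would chain plugins.

```python
import csv


def clean_then_split(lines, bad_chars="\x00"):
    """Remove unwanted characters from raw lines, then parse them as CSV."""
    # Three-argument maketrans: every char in bad_chars is deleted.
    table = str.maketrans("", "", bad_chars)
    cleaned = (line.translate(table) for line in lines)
    return list(csv.reader(cleaned))


if __name__ == "__main__":
    for row in clean_then_split(["a,b\x00,c", "1,2,3"]):
        print(row)
```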
e
OK so, the first extractor would output something like this, right?
{"schema": {"properties": {"line": {"type": "string"}}}}
{"record": {"line": "a,csv,line,with,bad,chars"}}
You could certainly do that but that’d probably require a custom tap (could be a .py script) and you’d also need to dump it somewhere (so, a target)
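A rough idea of what such a .py script could look like, using only the standard library. The stream name `raw_lines` and the function names are hypothetical. The `type`, `stream` and `key_properties` keys follow the general Singer message shape and are additions to the abbreviated messages above. Treat this as a sketch, not a complete tap (no config, state or discovery).

```python
import json
import sys

STREAM = "raw_lines"  # hypothetical stream name


def singer_messages(lines, stream=STREAM):
    """Yield one SCHEMA message, then one RECORD message per input line."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {"line": {"type": "string"}},
        },
        "key_properties": [],
    }
    for line in lines:
        # Keep the line unsplit; only drop the trailing newline.
        yield {
            "type": "RECORD",
            "stream": stream,
            "record": {"line": line.rstrip("\r\n")},
        }


def main(path):
    """Write the messages for a file to stdout, one JSON object per line."""
    with open(path, encoding="utf-8", newline="") as handle:
        for message in singer_messages(handle):
            sys.stdout.write(json.dumps(message) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```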