How do I create my own `type` for a property level transform Meltano #getting-started

chrish

01/30/2023, 12:59 PM

How do I create my own `type` for a property-level transformation? I want to replace some Unicode characters that psycopg2 doesn't like. It looks like I can do this with Inline Stream Maps. The PR for the feature shows a couple of different types, `MASK-HIDDEN` and `lowercase`, but searching through the source I'm not seeing where the functionality of those types is defined. Approach suggested by @thomas_briggs (thanks!): https://meltano.slack.com/archives/C01TCRBBJD7/p1674916619997379

thomas_briggs

01/30/2023, 1:45 PM

If I understand the question correctly: I don't think you need to. Just add an item to the stream with the name of the field you want to modify where the value is the Python code to modify the field. See the README for the mapper for a decent example.

chrish

01/30/2023, 2:01 PM

I had thought that value might just be Python from looking at the README. But `md5` in the example is not a default function in Python, so I kept looking and found Property-level transformations. I think my transform code is going to be pretty simple, but if it gets more complex, is there a way to define my own functions outside of meltano.yml?

thomas_briggs

01/30/2023, 2:15 PM

That is a great question. My guess is you'd need a custom mapper but I dunno. @edgar_ramirez_mondragon helped me figure out the map-transformer in the first place... maybe he can answer that question for you. 😉

chrish

01/30/2023, 2:27 PM

And one more question: I need to replace invalid chars in all fields in a stream. I do not know the names of all the fields ahead of time. Can I apply the map to an entire record/stream(?) at a time? From what I can see, examples in the SDK all reference fields by name. Or am I going to have to create 2 pipelines?
1. Extract the original CSV as unseparated lines of text, replace the invalid chars, and load the clean text into a new intermediate CSV file.
2. Extract the clean intermediate CSV and load it into Postgres.
Of course I could use Parquet or something else as the intermediate file, or perhaps I could do all of that inline? Is it possible to do extractor -> mapper -> extractor -> loader?

thomas_briggs

01/30/2023, 2:35 PM

Another good question. I wonder if you could put the Python code to remove the characters in the `__else__` element? I think that applies to all fields that don't have an explicit mapping.

chrish

01/30/2023, 2:38 PM

That sounds like a fantastic idea, but unfortunately: "Else behavior currently limited to `null` assignment." That seems like a good reason to enhance the else capabilities, and I think that's one of the things they have in mind: "For instance, we could in the future add the ability to remove or treat a property from any stream in which it appears."

thomas_briggs

01/30/2023, 3:00 PM

Bah, sorry, forgot about that 😕

edgar_ramirez_mondragon

01/30/2023, 4:45 PM

"I think my transform code is going to be pretty simple, but if it gets more complex, is there a way to define my own functions outside of meltano.yml?"

Not really, other than contributing them to the SDK or spinning out your own mapper plugin. See sdk#1175 and meltano-map-transform#11.

"Can I apply the map to an entire record/stream(?) at a time?"

As the comments above suggest, `__else__` at the property level is a good candidate to implement this, but I don’t think there’s any issue open for it, so feel free to log one 🙂
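
For illustration, here is a minimal sketch of what a record-wide cleanup could look like in plain Python, for example inside a custom mapper plugin. It is not a Meltano or SDK API: `BAD_CHARS`, `clean_value` and `clean_record` are hypothetical names, and treating NUL (`\x00`) as the offending character is an assumption, since the thread does not say which characters cause the problem.

```python
from typing import Any

# Assumption: NUL is the character to strip; extend this set as needed.
BAD_CHARS = {"\x00"}


def clean_value(value: Any, replacement: str = "") -> Any:
    """Replace unwanted characters in strings, recursing into dicts and lists."""
    if isinstance(value, str):
        return "".join(replacement if ch in BAD_CHARS else ch for ch in value)
    if isinstance(value, dict):
        return {key: clean_value(item, replacement) for key, item in value.items()}
    if isinstance(value, list):
        return [clean_value(item, replacement) for item in value]
    return value  # numbers, booleans and None pass through unchanged


def clean_record(record: dict) -> dict:
    """Clean every field of a record without knowing the field names in advance."""
    return clean_value(record)
```

Because the function walks whatever keys the record happens to have, it does not need a per-field mapping, which is the gap the `__else__` idea was meant to fill.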

chrish

01/31/2023, 1:47 PM

Thanks @thomas_briggs & @edgar_ramirez_mondragon - I've opened a new issue: Apply map to entire record/stream instead of a field

chrish

01/31/2023, 1:50 PM

Regarding my other question: is it possible in the interim to do something like extractor -> mapper -> extractor -> loader? I'm wondering if I can read the file without splitting it into fields, apply the map, then read the stream and split the fields, then load the data.
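
Outside of any Meltano pipeline, the "clean first, split later" idea can be sketched with the standard library alone. This is only an illustration of the ordering: `clean_text` and `rows_from_raw_text` are hypothetical helpers, and NUL (`\x00`) is again an assumed stand-in for the invalid characters.

```python
import csv
import io
from typing import List

# Assumption: NUL is the character to strip; extend this set as needed.
BAD_CHARS = {"\x00"}


def clean_text(text: str) -> str:
    """Drop unwanted characters from raw text before any CSV parsing happens."""
    return "".join(ch for ch in text if ch not in BAD_CHARS)


def rows_from_raw_text(raw_text: str) -> List[List[str]]:
    """Clean the unsplit text first, then let the csv module split it into fields."""
    cleaned = clean_text(raw_text)
    # newline="" leaves line endings untouched so csv can handle quoted newlines.
    return list(csv.reader(io.StringIO(cleaned, newline="")))
```

Cleaning the whole text before parsing means the field names and the number of columns never need to be known at the cleaning step.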

edgar_ramirez_mondragon

01/31/2023, 8:20 PM

OK so, the first extractor would output something like this, right?

{"schema": {"properties": {"line": {"type": "string"}}}}
{"record": {"line": "a,csv,line,with,bad,chars"}}

You could certainly do that but that’d probably require a custom tap (could be a .py script) and you’d also need to dump it somewhere (so, a target)
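
A rough sketch of what such a `.py` tap could look like follows. It emits Singer `SCHEMA` and `RECORD` messages with one `line` property per input line, matching the shape above but with the `type` and `stream` keys the Singer format expects. The stream name `raw_lines` and the function names are hypothetical, and this is a bare script rather than a tap built with the Meltano SDK.

```python
import json
import sys
from typing import Iterable, Iterator, TextIO

STREAM_NAME = "raw_lines"  # hypothetical stream name


def singer_messages(lines: Iterable[str], stream: str = STREAM_NAME) -> Iterator[dict]:
    """Yield one SCHEMA message, then one RECORD message per unsplit input line."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"type": "object", "properties": {"line": {"type": "string"}}},
        "key_properties": [],
    }
    for line in lines:
        # Strip only the trailing line ending; the rest of the line stays unsplit.
        yield {"type": "RECORD", "stream": stream, "record": {"line": line.rstrip("\r\n")}}


def write_messages(lines: Iterable[str], out: TextIO = sys.stdout) -> None:
    """Write each message as one JSON document per line."""
    for message in singer_messages(lines):
        out.write(json.dumps(message) + "\n")


# To use this as a script, call write_messages(sys.stdin) and pipe a file in.
```

A mapper could then clean the `line` property, but as noted above, the cleaned lines would still have to be written somewhere by a target before a second extractor could split them into fields.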
