# troubleshooting
andrew_stewart:
Can anyone tell if something looks off with my mapper config + invocation?
```
meltano --log-level debug run tap-mongodb filter-stream target-postgres
```

```
Run invocation could not be completed as block failed: Cannot start plugin meltano-map-transformer: Executable 'meltano-map-transformer' could not be found. Mapper 'meltano-map-transformer' may not have been installed yet using `meltano install mapper meltano-map-transformer`, or the executable name may be incorrect.
```
```yaml
mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    # pip_url: meltano-map-transform
    mappings:
    - name: filter-stream
      config:
        stream_maps:
          mystream:
            __filter__: "'\\u0000' in record['document']"
```
edgar_ramirez_mondragon:
@andrew_stewart can you try adding `executable: meltano-map-transform`, i.e.
```yaml
mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    # pip_url: meltano-map-transform
    executable: meltano-map-transform
    mappings:
    - name: filter-stream
      config:
        stream_maps:
          mystream:
            __filter__: "'\\u0000' in record['document']"
```
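(For completeness: after adding `executable`, the mapper likely needs to be (re)installed before re-running. Both commands below are taken verbatim from the error message and the original invocation above.)

```
meltano install mapper meltano-map-transformer
meltano --log-level debug run tap-mongodb filter-stream target-postgres
```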
andrew_stewart:
Ok, great, that worked! Thanks @edgar_ramirez_mondragon… I don’t suppose you have any advice on my actual filter logic, do you? 🙂
edgar_ramirez_mondragon:
you mean `"'\\u0000' in record['document']"`?
andrew_stewart:
yeah
I’m trying to figure out a way to filter out records containing `\u0000`
edgar_ramirez_mondragon:
is `record['document']` a string?
andrew_stewart:
That’s a good question… it’s coming from MongoDB (via `pipelinewise-tap-mongodb`), which basically treats an entire record as one field on the extractor side, and then in `target-postgres` that becomes a jsonb field.
(so I’m a little fuzzy on what `record['document']` should be at that point in time during the mapper)
Looks like it might be difficult to actually specify `\u0000` in the meltano.yml, because the YAML parser complains about an “embedded null byte”:

```
Run invocation could not be completed as block failed: Cannot start plugin meltano-map-transformer: embedded null byte
```
(so I tried to escape it… but that probably wouldn’t work)
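(A sketch of the escaping idea, untested and not confirmed in the thread: YAML’s double-quoted `"\\u0000"` parses to the six literal characters `\u0000`, so no raw null byte ever reaches the YAML parser; the expression evaluator then interprets the Python escape itself. Note also that `__filter__` keeps records for which the expression is true, so dropping the offending records would use `not in`.)

```yaml
# Sketch, untested: the double backslash keeps the raw null byte out of
# the parsed YAML, avoiding the "embedded null byte" error; the expression
# engine then sees the Python string literal '\u0000'.
stream_maps:
  mystream:
    __filter__: "'\\u0000' not in record['document']"
```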
edgar_ramirez_mondragon:
I used to see those null chars a lot in MySQL. I ended up removing them in pre-processing, so equivalently in the tap itself.
andrew_stewart:
Right, I was hoping to use the new mappers to try that
(but there’s not a ton of documentation on it yet)
edgar_ramirez_mondragon:
Wdyt @aaronsteers?
andrew_stewart:
@edgar_ramirez_mondragon what’s funny is that I think an issue exactly like this may well have been the original inspiration for mappers last year: https://meltano.slack.com/archives/C013EKWA2Q1/p1611959478006300
edgar_ramirez_mondragon:
No, that makes sense. This is a good use case for stream maps. Your usage looks good, but I can’t tell without trying it and failing 😅
aaronsteers:
Agreed, this is definitely a great use case.
> (so I’m a little fuzzy on what `record['document']` should be at that point in time during the mapper)
It should be the Python representation of whatever is in the "document" property. If it is a string, you should be able to manipulate it with Python string operations. If it is a node with subproperties, you would treat it as a dict object. The syntax should be basically the same as Python, but not all operations are easy to perform inline.
Since this is performed in the filter, you could probably coerce the object to str() and then check whether that string contains the problem character.
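(One Python subtlety worth flagging with the str() idea, added here as an aside rather than from the thread: str() of a dict renders values with repr(), which escapes non-printable characters, so a raw null byte only survives the coercion when the value is itself a string.)

```python
# str() of a dict repr-escapes non-printables, so the raw null disappears:
doc = {"name": "bad\x00value"}
print("\x00" in str(doc))    # False: the repr shows the escaped text \x00
print("\\x00" in str(doc))   # True: the escaped text is what is searchable

# str() of a plain string keeps the raw character:
top = "bad\x00value"
print("\x00" in str(top))    # True
```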
andrew_stewart:
Would `record` there be the property?
(because I got that part from looking at Meltano Squared… wasn’t sure if "record" is the property name or a reserved word)
aaronsteers:
Oh, sorry. I see the confusion. "record" is the top-level Python dict which represents all fields.
So, presuming a record structure of `{ id: 1, user: { name: John } }`, you could use filter expressions like `record['id'] > 0` and `record['user'].get('name', None) is not None`.
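(For illustration, those expressions would sit in a mapping config like this; the `users` stream name is hypothetical, taken from the example record rather than a real project.)

```yaml
# Sketch: __filter__ keeps only records for which the expression is true.
stream_maps:
  users:
    __filter__: >-
      record['id'] > 0
      and record['user'].get('name', None) is not None
```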
And to be clear, I’m also completely open to creating new convenience functions. An example, if you look into the source code, is `md5()`, which doesn’t exist as a built-in Python function, but we’ve defined it in the SDK and passed it along to the evaluation context. A similar function could be defined, like `deep_str_contains(dict_or_str, str_check)`. Probably this isn’t exactly what we’d want to build, but I wanted to let you know we have options for adding new convenience functions.
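(A rough sketch of what such a helper could look like, purely illustrative: `deep_str_contains` does not exist in the SDK; the name and signature come from the message above.)

```python
def deep_str_contains(dict_or_str, str_check):
    """Hypothetical helper: recursively check whether str_check occurs in any
    string value of a nested dict/list structure, or in a plain string."""
    if isinstance(dict_or_str, str):
        return str_check in dict_or_str
    if isinstance(dict_or_str, dict):
        return any(deep_str_contains(v, str_check) for v in dict_or_str.values())
    if isinstance(dict_or_str, (list, tuple)):
        return any(deep_str_contains(v, str_check) for v in dict_or_str)
    return False

# e.g. deep_str_contains(record['document'], '\u0000') inside a filter expression
```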
We’re all out at our all-hands conference this coming week, but are you free by chance the following Wednesday to join an office hours? We can keep discussing async, of course, but it would be great to also explore potential updates to the SDK mappers for your use case.
andrew_stewart:
Oops, sorry, didn’t see this… I def want to hit up an office hours soon.
aaronsteers:
No worries at all, @andrew_stewart. Tomorrow or next week perhaps?
andrew_stewart:
I just gotta find a week when I can duck out of the team meeting that’s scheduled then.