# troubleshooting
andrew_stewart:
Can anyone tell if something looks off with my mapper config + invocation?
```
meltano --log-level debug run tap-mongodb filter-stream target-postgres
```

```
Run invocation could not be completed as block failed: Cannot start plugin meltano-map-transformer: Executable 'meltano-map-transformer' could not be found. Mapper 'meltano-map-transformer' may not have been installed yet using `meltano install mapper meltano-map-transformer`, or the executable name may be incorrect.
```
```yaml
mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    # pip_url: meltano-map-transform
    mappings:
    - name: filter-stream
      config:
        stream_maps:
          mystream:
            __filter__: "'\\u0000' in record['document']"
```
edgar_ramirez_mondragon:
@andrew_stewart can you try adding `executable: meltano-map-transform`, i.e.
```yaml
mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    # pip_url: meltano-map-transform
    executable: meltano-map-transform
    mappings:
    - name: filter-stream
      config:
        stream_maps:
          mystream:
            __filter__: "'\\u0000' in record['document']"
```
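(For completeness: after adding `executable`, the mapper likely needs to be (re)installed before re-running. Both commands below are taken verbatim from the error message and the original invocation above.)

```
meltano install mapper meltano-map-transformer
meltano --log-level debug run tap-mongodb filter-stream target-postgres
```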
andrew_stewart:
Ok, great, that worked! Thanks @edgar_ramirez_mondragon… I don’t suppose you have any advice on my actual filter logic, do you? 🙂
edgar_ramirez_mondragon:
you mean `"'\\u0000' in record['document']"`?
andrew_stewart:
yeah
I’m trying to figure out a way to filter out records containing `\u0000`
edgar_ramirez_mondragon:
is `record['document']` a string?
andrew_stewart:
That’s a good question… it’s coming from MongoDB (via `pipelinewise-tap-mongodb`), which basically treats an entire record as one field on the extractor side, and then in `target-postgres` that becomes a jsonb field.
(so I’m a little fuzzy on what `record['document']` should be at that point in time during the mapper)
Looks like it might be difficult to actually specify `\u0000` in the meltano.yml, because the YAML parser complains about an “embedded null byte”:

```
Run invocation could not be completed as block failed: Cannot start plugin meltano-map-transformer: embedded null byte
```
(so I tried to escape it… but that probably wouldn’t work)
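(A sketch of the escaping idea, untested and not confirmed in the thread: YAML’s double-quoted `"\\u0000"` parses to the six literal characters `\u0000`, so no raw null byte ever reaches the YAML parser; the expression evaluator then interprets the Python escape itself. Note also that `__filter__` keeps records for which the expression is true, so dropping the offending records would use `not in`.)

```yaml
# Sketch, untested: the double backslash keeps the raw null byte out of
# the parsed YAML, avoiding the "embedded null byte" error; the expression
# engine then sees the Python string literal '\u0000'.
stream_maps:
  mystream:
    __filter__: "'\\u0000' not in record['document']"
```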
edgar_ramirez_mondragon:
I used to see those null chars a lot in MySQL. I ended up removing them in pre-processing, so equivalently in the tap itself.
andrew_stewart:
Right, I was hoping to use the new mappers to try that
(but there’s not a ton of documentation on it yet)
edgar_ramirez_mondragon:
Wdyt @aaronsteers?
andrew_stewart:
@edgar_ramirez_mondragon what’s funny is that I think an issue exactly like this may well have been the original inspiration for mappers last year: https://meltano.slack.com/archives/C013EKWA2Q1/p1611959478006300
edgar_ramirez_mondragon:
No, that makes sense. This is a good use case for stream maps. Your usage looks good, but I can’t tell without trying it and failing 😅
aaronsteers:
Agreed, this is definitely a great use case.
> (so I’m a little fuzzy on what `record['document']` should be at that point in time during the mapper)
It should be the Python representation of whatever is in the "document" property. If it is a string, you should be able to manipulate it with Python string operations. If it is a node with subproperties, you would treat it as a dict object. The syntax should be basically the same as Python, but not all operations are easy to perform inline.
Since this is performed in the filter, you could probably coerce the object to str() and then check whether that string contains the problem character.
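(One Python subtlety worth flagging with the str() idea, added here as an aside rather than from the thread: str() of a dict renders values with repr(), which escapes non-printable characters, so a raw null byte only survives the coercion when the value is itself a string.)

```python
# str() of a dict repr-escapes non-printables, so the raw null disappears:
doc = {"name": "bad\x00value"}
print("\x00" in str(doc))    # False: the repr shows the escaped text \x00
print("\\x00" in str(doc))   # True: the escaped text is what is searchable

# str() of a plain string keeps the raw character:
top = "bad\x00value"
print("\x00" in str(top))    # True
```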
andrew_stewart:
Would `record` there be the property?
(because I got that part from looking at Meltano Squared… wasn’t sure if "record" is the property name or a reserved word)
aaronsteers:
Oh, sorry. I see the confusion. "record" is the top-level Python dict which represents all fields.
So, presuming a record structure of `{ id: 1, user: { name: John } }`, you could use filter expressions like `record['id'] > 0` and `record['user'].get('name', None) is not None`.
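(For illustration, those expressions would sit in a mapping config like this; the `users` stream name is hypothetical, taken from the example record rather than a real project.)

```yaml
# Sketch: __filter__ keeps only records for which the expression is true.
stream_maps:
  users:
    __filter__: >-
      record['id'] > 0
      and record['user'].get('name', None) is not None
```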
And to be clear, I’m also completely open to creating new convenience functions. An example, if you look into the source code, is `md5()`, which doesn’t exist as a built-in Python function, but we’ve defined it in the SDK and passed it along to the evaluation context. A similar function could be defined, like `deep_str_contains(dict_or_str, str_check)`. Probably this isn’t exactly what we’d want to build, but I wanted to let you know we have options for adding new convenience functions.
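(A rough sketch of what such a helper could look like, purely illustrative: `deep_str_contains` does not exist in the SDK; the name and signature come from the message above.)

```python
def deep_str_contains(dict_or_str, str_check):
    """Hypothetical helper: recursively check whether str_check occurs in any
    string value of a nested dict/list structure, or in a plain string."""
    if isinstance(dict_or_str, str):
        return str_check in dict_or_str
    if isinstance(dict_or_str, dict):
        return any(deep_str_contains(v, str_check) for v in dict_or_str.values())
    if isinstance(dict_or_str, (list, tuple)):
        return any(deep_str_contains(v, str_check) for v in dict_or_str)
    return False

# e.g. deep_str_contains(record['document'], '\u0000') inside a filter expression
```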
We’re all out at our all-hands conference this coming week, but are you free by chance the following Wednesday to join an office hours? We can keep discussing async, of course, but it would be great to also explore potential updates to the SDK mappers for your use case.
andrew_stewart:
Oops, sorry, didn’t see this… I def want to hit up an office hours soon.
aaronsteers:
No worries at all, @andrew_stewart. Tomorrow or next week perhaps?
andrew_stewart:
I just gotta find a week when I can duck out of the team meeting that’s scheduled then.