# plugins-general
s
Hey, love the concept of using a schema around taps & targets so that community infra can be built around it. I need to get data out of a few niche applications (construction software), so I'm happy to share these with the community once I've got them working; just a few implementation details are holding me up. I have a custom tap I'm building for SignOnSite that works perfectly with
meltano invoke tap-signonsite --properties path/to/properties.json
(doesn't work unless properties is specified; I'm assuming that's intended behaviour). It doesn't work as part of a pipeline in Meltano UI though (just to target-csv, so it couldn't be simpler). It gives 2 errors:
• invalid catalog output
- I copied the base for my tap from
tap-github
, so assumed that it would work. It looks the same as
tap-stripe
too, so any insight on what's wrong? It results in the JSON catalog being dumped to the console when run.
• ERROR: unable to parse
- it can't parse the arguments for some reason. It works fine with
meltano invoke
as noted above, so does the UI pass the arguments differently? Also, I'll likely want to use
meltano elt
once this is working to schedule the pipeline myself - where does it get the properties file from? When I
meltano invoke
, I'm specifying the path, but it doesn't look like there's an option for that with
meltano elt
.
a
What happens if you run
meltano invoke tap-signonsite --discover
?
d
@sam_woolerton When using
meltano elt
,
meltano invoke
, or running a pipeline from the UI, you don't need to provide a properties file explicitly as long as the tap supports discovery mode. As it says on https://meltano.com/docs/integration.html#selecting-entities-and-attributes-for-extraction:
Whenever an extractor is run using
meltano elt
or
meltano invoke
, Meltano will generate the desired catalog on the fly by running the tap in discovery mode and applying the selection, metadata, and schema rules to the resulting catalog file before passing it to the tap in sync mode.
I assume you've specified that the tap supports the
discover
and
properties
capabilities? (Note that
--properties
is considered deprecated, and new taps should use `--catalog`: https://github.com/singer-io/getting-started/blob/master/docs/SYNC_MODE.md) In your case, it looks like this is not working correctly because Meltano considers the discovered catalog (result of
meltano invoke tap-signonsite --discover
) invalid. Your discovery code looks OK, but can you please share the discovered catalog so that we can help you figure out why Meltano might still not like it? The "unable to parse" error you're seeing
target-csv
print originates here: https://github.com/singer-io/target-csv/blob/1b73164ae7482a7f5dc625f2b08e85b7410e5473/target_csv.py#L51 The fact that nothing was printed after the colon suggests that the target received an empty line on its stdin, while all lines output by the tap are expected to be JSON-encoded Singer messages. Do you know why your tap may be outputting blank lines? If your tap is also outputting extra lines when you run
--discover
, that may explain why Meltano fails to parse the output, since it expects the entire output to be valid JSON, which it may not be in your case. You may want to run
meltano elt
in debug mode (https://meltano.com/docs/command-line-interface.html#debugging), so that you can see each line output by the tap, as well as the exact way the tap and target are invoked, including their command line arguments (
--properties
etc).
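To make the point about blank or extra lines concrete, here is a minimal sketch of a check you could run over captured tap output. It assumes only what is said above, that every line the tap writes to stdout should be a JSON-encoded Singer message; the function name `find_non_json_lines` and the sample output are hypothetical, not part of Meltano or Singer.

```python
import json


def find_non_json_lines(output: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that is not a JSON object."""
    bad = []
    for number, line in enumerate(output.splitlines(), start=1):
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            # Blank lines and stray debug prints both end up here.
            bad.append((number, line))
            continue
        if not isinstance(parsed, dict):
            # Valid JSON, but a Singer message is expected to be an object.
            bad.append((number, line))
    return bad


# Hypothetical captured stdout: one debug print and one blank line
# mixed in with two Singer messages.
sample = "\n".join([
    "fetching sites...",
    '{"type": "SCHEMA", "stream": "sites", "schema": {"properties": {}}, "key_properties": []}',
    "",
    '{"type": "RECORD", "stream": "sites", "record": {"id": 1}}',
])

for number, line in find_non_json_lines(sample):
    print(f"line {number} is not a JSON message: {line!r}")
```

Any line this reports is a candidate for what the target failed to parse; an empty string in the report would match the empty message after the colon in the error.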
Since the "invalid catalog output" error is pretty uninformative, I've made this change to print the full error message when extractor catalog discovery fails, which will land in the next release: https://gitlab.com/meltano/meltano/-/merge_requests/1849
s
Ok, solved the catalog issue - I had some print statements so I could track control flow, and hadn't clicked that they would also go to stdout (I started as a web dev so I'm still kind of new to stdin/stdout concepts). Commented those print lines out and the catalog issue is solved, thanks
d
@sam_woolerton Great! Yeah, for logging you're better off using STDERR. Those messages will be reflected in the
meltano elt
output as well 🙂
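A minimal sketch of that split, using only the Python standard library: diagnostics go through a logger attached to STDERR, and only JSON messages are written to stdout. The logger name, the `write_record` helper, and the sample record are hypothetical and only illustrate the idea; a real tap may already get an equivalent logger from its Singer library.

```python
import json
import logging
import sys

# Send all diagnostic output to STDERR so stdout carries only Singer messages.
logger = logging.getLogger("tap-signonsite")
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def write_record(stream, record, out=None):
    """Log progress to STDERR, then write one RECORD message to `out`."""
    if out is None:
        out = sys.stdout
    logger.info("writing record for stream %s", stream)
    message = {"type": "RECORD", "stream": stream, "record": record}
    out.write(json.dumps(message) + "\n")
    out.flush()


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    write_record("sites", {"id": 1, "name": "Example Site"})
```

Run on its own, this prints the log line to STDERR and a single JSON line to stdout, so a target reading stdout never sees the log text.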
s
Once I commented out the rest of my
print
statements, the pipeline itself ran perfectly too! Thanks for the help