# plugins-general
i
Hi everyone - another support question here. I'm working on a PoC for a process with Meltano. The success criterion is a working pipeline that fetches JSON events dumped into S3 by AWS Kinesis Firehose and loads them into Snowflake. I created a simple tap that fetches objects from S3 and outputs the parsed lines, and connected it to target-snowflake. The tap seems to be working fine according to singer-check-tap, but when I run the pipeline I get a Broken pipe error. This error is apparently most often caused by incorrect credentials, so I tested them, and they look correct. So here's my question: what is the best way to debug this error?
p
have you tried running `meltano config tap-s3-json list` to see if meltano is finding your credentials properly?
i
As you can see, I'm logging the schema and the parsed records, which means it successfully fetches the data from s3.
d
@ivanovyordan The tap failing with a "Broken pipe" error implies that the target quit unexpectedly, causing the target's `stdin` to close, and the `tap stdout` -> `target stdin` pipe to break. Could anything be causing `target-snowflake` to fail?
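The mechanism described above can be reproduced in isolation. This is a minimal sketch, assuming a POSIX system: the child process is only a stand-in for a target that quits before reading anything, and the function name is made up for illustration. It is not how Meltano wires the tap and target together.

```python
import subprocess
import sys


def write_to_quitting_consumer():
    """Write to the stdin of a child that has already exited, like a crashed target."""
    # Hypothetical stand-in for a target that fails on startup (exit code 1).
    consumer = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(1)"],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so the write itself hits the broken pipe
    )
    consumer.wait()  # the reading end of the pipe is now closed
    try:
        consumer.stdin.write(b'{"type": "SCHEMA"}\n')
        outcome = "ok"
    except BrokenPipeError:
        # The producer is the one that sees the error, although the consumer caused it.
        outcome = "BrokenPipeError"
    finally:
        consumer.stdin.close()
    return outcome, consumer.returncode
```

The error surfaces on the writing side, which is why the stack trace points at the tap even though the real failure is in whatever sits at the other end of the pipe.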
i
I guess it's related to the output from my tap. It contains a JSON-formatted string field. And just to give you a bit more context, I tested the tap with `loader-jsonl` and it was working.
d
@ivanovyordan Ah, all right, it's possible that `target-snowflake` is not liking the JSON formatted string as much. Have you tried dumping the `meltano invoke tap` output into a file, and piping it directly into `meltano invoke target`, so that we can see if it's indeed failing? You can also enable debug mode on `meltano elt` to see the specific messages being sent from the tap and the target, to get an idea of what the offending record might be: https://meltano.com/docs/command-line-interface.html#debugging
i
Oh! Thanks @douwe_maan. I'll try that.
I dropped the JSON columns and it still fails. It looks like there's something wrong with my schema, as the last invoked line in the stack trace is actually the line where I write it. Here's an example output from the tap:
{"type": "SCHEMA", "stream": "my-record", "schema": {"properties": {"namespace": {"type": "string"}, "object_key": {"type": "string"}, "source_system": {"type": "string"}, "timestamp": {"type": "string"}, "_file": {"type": "string"}, "_line": {"type": "integer"}}, "selected": true}, "key_properties": ["_file", "_line"]}
{"type": "RECORD", "stream": "my-record", "record": {"namespace": "my-record/bulk_sent", "object_key": "", "source_system": "main-app", "timestamp": "2020-04-28T07:45:21+00:00", "_file": "my-record/2020/04/28/07/my-record-1-2020-04-28-07-45-21-07ecc118-e76e-4456-b65d-10bd82148507", "_line": 1}}
{"type": "STATE", "value": {"my-record": "my-record/2020/04/28/07/my-record-1-2020-04-28-07-45-21-07ecc118-e76e-4456-b65d-10bd82148507"}}
d
@ivanovyordan The tap tries to flush its stdout to write the SCHEMA message, which fails because of a `BrokenPipe`, indicating that the pipe was actually broken and the target had already quit before the tap attempted to write that schema. Did you try running the target separately, piping in the tap output?
meltano invoke tap-s3-json > singer.jsonl
cat singer.jsonl | meltano invoke target-snowflake
That should show us whether the target fails unexpectedly.
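Before piping the dumped file into the target, it can also help to sanity-check the file itself. The following is a rough sketch, not an official Singer validator: the required keys per message type reflect my reading of the Singer spec and should be treated as an assumption, and `check_singer_lines` is a made-up helper name.

```python
import json

# Keys each message type is expected to carry (assumption based on the Singer spec).
REQUIRED_KEYS = {
    "SCHEMA": ("stream", "schema", "key_properties"),
    "RECORD": ("stream", "record"),
    "STATE": ("value",),
}


def check_singer_lines(lines):
    """Return a list of (line_number, problem) tuples for a stream of Singer messages."""
    problems = []
    streams_with_schema = set()
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError as exc:
            # Anything that is not JSON (e.g. a stray log line) ends up here.
            problems.append((number, f"not valid JSON: {exc.msg}"))
            continue
        kind = message.get("type") if isinstance(message, dict) else None
        if kind not in REQUIRED_KEYS:
            problems.append((number, f"unknown message type: {kind!r}"))
            continue
        missing = [key for key in REQUIRED_KEYS[kind] if key not in message]
        if missing:
            problems.append((number, f"{kind} is missing {missing}"))
            continue
        if kind == "SCHEMA":
            streams_with_schema.add(message["stream"])
        elif kind == "RECORD" and message["stream"] not in streams_with_schema:
            problems.append(
                (number, f"RECORD for {message['stream']!r} arrived before its SCHEMA")
            )
    return problems
```

Calling `check_singer_lines(open("singer.jsonl"))` returns an empty list when every line parses and has the expected shape. If that is the case, the tap output is less likely to be the culprit and the target deserves a closer look.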
i
Sorry @douwe_maan. Forgot to mention that. It fails silently unless I change the log level to debug. When I pass the debug argument, I see a log message about the configuration stub plus all environment variables. I checked the generated config file and it looks OK. I can tell for sure it fails because of the color of my prompt.
d
@ivanovyordan All right. I'd gladly help you debug `target-snowflake` further, but it may be more productive to see if you have better luck with https://github.com/transferwise/pipelinewise-target-snowflake or https://github.com/datamill-co/target-snowflake, either of which you can add as a custom plugin: https://meltano.com/docs/command-line-interface.html#how-to-use-custom-plugins. Have you considered using either?
i
Sure. Thanks @douwe_maan. I'll definitely do that. If it happens to be a bug in `target-snowflake`, I'll try to fix it and open a merge request.
d
@ivanovyordan Sounds good! If you'd like to debug the target-snowflake you're currently using some more, I'd suggest adding `logger.info` statements to your target-snowflake source at `.meltano/loaders/target-snowflake/venv/lib/python3.6/site-packages/target_snowflake` so that it won't fail silently anymore.
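As a rough illustration of that kind of instrumentation: the sketch below does not reflect the actual structure of target-snowflake. `handle_line`, `main`, and the logger name are hypothetical stand-ins for whatever the target's per-message handler and entry point really are. Logs go to stderr so that stdout stays free for the messages a target emits.

```python
import json
import logging
import sys

logger = logging.getLogger("target_debug")  # hypothetical logger name


def handle_line(line):
    """Hypothetical stand-in for the target's per-message handler."""
    message = json.loads(line)
    logger.info(
        "received %s message for stream %r", message["type"], message.get("stream")
    )
    return message["type"]


def main(lines):
    """Process input lines, logging progress and any failure before re-raising."""
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    try:
        return [handle_line(line) for line in lines]
    except Exception:
        # logger.exception records the full traceback, so the failure is no longer silent.
        logger.exception("target failed while processing input")
        raise
```

The last message logged before the traceback shows roughly how far the target got before it quit.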
i
Cool. Thanks @douwe_maan
Hey @douwe_maan I had success with using the target from pipelinewise. I also spent some more time experimenting with `target-snowflake` from GitLab. It looks like the problem comes from `snowflake-connector-python`. The version you depend on is a bit old and has trouble with Python 3.7.7. I could spend some time upgrading the dependencies, but I don't know if that fits your vision for the project. 🙂
d
@ivanovyordan If you've already done the research, I'd love it if you could look into bumping the dependencies! That would also allow us to resolve https://gitlab.com/meltano/target-snowflake/-/issues/27. As suggested in https://gitlab.com/meltano/meltano/-/issues/2134, though, we'll probably end up deprecating our version of target-snowflake in favor of pipelinewise's sooner rather than later, since the community is ultimately better off with 1 canonical version of each tap and target, and transferwise has done a good job of maintaining it. Upgrading the dependencies would help those who try to get started with Meltano and target-snowflake in the meantime, though, as well as those who already set up their pipelines to use it. I'll gladly accept your contributions to the target, but it's up to you whether you think it's worth the effort if pipelinewise's version is probably gonna be the future anyway 🙂
i
I'll definitely do it. I'm a huge GitLab fan and it's one more way to give back 🙂
d
@ivanovyordan Awesome, thank you ❤️
m
is the `tap-s3-json` mentioned in this thread a public tap? I don’t see it in the Meltano hub, pypi, github, or gitlab. I’m in need of the same kind of tap so would love to avoid re-implementing it if there’s a public version out there.
t
It’s likely this one https://github.com/dcereijodo/tap-s3-json I’ll add it to our list to get it added cc @pat_nadolny
m
thank you! (why didn’t my github search find that one…)
p
it looks like that's actually a scala project so we won't be able to install it with Meltano as of today 😞
t
Oh I didn’t even notice that!
p
it didn't show up in our tap-github search either so I was confused too. It's because we only look for python repos