Dumb ish question newlines in the source data should be enco Meltano #troubleshooting

thomas_briggs

11/15/2022, 8:26 PM

Dumb(ish) question: newlines in the source data should be encoded to \n by the tap, right? Or they have to be, really, since the Singer format is JSON? Is there then any way for the target to accurately distinguish between the literal string "\n" and an actual newline character? My specific scenario, if it matters, is MySQL to Postgres: varchars containing newlines in MySQL are being stored with literal "\n"s in PG. From testing with

meltano invoke tap-mysql

it looks like the newline is being written to the JSON as "\n" though, so I don't think the issue is with the target but with the tap. I actually think it may be the way the data is being read from MySQL and not the conversion to JSON, i.e. dumping the 'row' object to the log shows the \n, but my Python-fu is not strong enough for me to be sure I'm interpreting all this correctly. 😕

visch

11/15/2022, 8:31 PM

JSON can handle newlines, it's escaped like

{
  "newlinedata": "FirstLine\nSecond Line"
}

Specific scenario does matter 🙂 If you could post an example SCHEMA message and then a RECORD message from Singer, we could all run it against our favorite targets as well and debug with you

visch

11/15/2022, 8:31 PM

Yes it does work!
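
A minimal sketch of the JSON round trip discussed above, using only Python's standard `json` module. The field name `newlinedata` comes from the example in the thread; the variable names are illustrative. It shows that a real newline and a literal backslash followed by `n` serialize differently, so a target reading valid JSON can tell them apart.

```python
import json

# One string holds a real line feed, the other a backslash followed by 'n'.
real_newline = "FirstLine\nSecond Line"
literal_backslash_n = "FirstLine\\nSecond Line"

encoded_real = json.dumps({"newlinedata": real_newline})
encoded_literal = json.dumps({"newlinedata": literal_backslash_n})

print(encoded_real)     # {"newlinedata": "FirstLine\nSecond Line"}
print(encoded_literal)  # {"newlinedata": "FirstLine\\nSecond Line"}

# Decoding restores each original string exactly.
assert json.loads(encoded_real)["newlinedata"] == real_newline
assert json.loads(encoded_literal)["newlinedata"] == literal_backslash_n
```

If the tap hands `json.dumps` a string that already contains backslash-n instead of a line feed, the second form is what ends up in the RECORD message, and the target faithfully stores it.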

thomas_briggs

11/15/2022, 8:51 PM

Thanks @visch. Sounds like the issue is with the way tap-mysql (pipelinewise variant, BTW) is reading the data back from the DB... I think PyMySQL is returning a string with the characters "\n" in it so the tap is never actually seeing a newline character... and thus it doesn't properly get escaped in the JSON.

visch

11/15/2022, 8:52 PM

I've never seen that happen but good luck! I'd say look at the actual record being sent to see if that's what's going on
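
One way to look at the actual record, sketched under assumptions: the helper name `describe_newlines`, the stream name `notes`, and the field name `body` are all hypothetical, and the sample lines are hand-written rather than real tap output. The function decodes one Singer message line and reports, per string field, whether the decoded value holds a real newline or a literal backslash-n.

```python
import json


def describe_newlines(message_line):
    """Return {field: description} for string fields of one Singer RECORD line."""
    message = json.loads(message_line)
    if message.get("type") != "RECORD":
        return {}
    findings = {}
    for field, value in message.get("record", {}).items():
        if not isinstance(value, str):
            continue
        if "\n" in value:
            findings[field] = "real newline"
        elif "\\n" in value:
            findings[field] = "literal backslash-n"
    return findings


# Hand-written sample lines (hypothetical stream and field names).
# Raw strings keep the JSON text exactly as it would appear on stdout.
sample_real = r'{"type": "RECORD", "stream": "notes", "record": {"body": "a\nb"}}'
sample_literal = r'{"type": "RECORD", "stream": "notes", "record": {"body": "a\\nb"}}'

print(describe_newlines(sample_real))     # {'body': 'real newline'}
print(describe_newlines(sample_literal))  # {'body': 'literal backslash-n'}
```

The same function could be applied to each line of the output of `meltano invoke tap-mysql` to see which form the tap is emitting.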

thomas_briggs

11/16/2022, 9:41 PM

For the curious: this is caused by PyMySQL. It replaces the newline that it reads from the DB with the string "\n". 🤦 A literal "\n" in the string gets replaced with "\\n", so I think it's technically possible to distinguish a newline from the string "\n", but... it's annoying extra processing that I shouldn't have to do. 😕
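
A sketch of the extra processing described above, assuming the escaping works exactly as reported in this thread: a newline arrives as backslash-n, and a backslash arrives doubled. The helper name `unescape_newlines` is hypothetical, the behavior has not been verified against PyMySQL itself, and any other backslash sequence is left untouched.

```python
import re

# Assumed escape scheme: "\n" stands for a newline, "\\" for one backslash.
_ESCAPES = {"n": "\n", "\\": "\\"}


def unescape_newlines(value):
    """Undo the assumed escaping in a single left-to-right pass."""
    def replace(match):
        # Unknown escapes are returned unchanged.
        return _ESCAPES.get(match.group(1), match.group(0))

    return re.sub(r"\\(.)", replace, value, flags=re.DOTALL)


escaped_newline = "line one\\nline two"    # backslash + n: was a newline
escaped_literal = "path\\\\new_folder"     # doubled backslash + n: was a literal "\n"

assert unescape_newlines(escaped_newline) == "line one\nline two"
assert unescape_newlines(escaped_literal) == "path\\new_folder"
```

Doing the replacement in one pass matters: two separate `str.replace` calls would misread the `\n` left over after un-doubling a backslash as a newline.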
