#announcements
w

wide-salesclerk-68871

01/13/2021, 5:51 PM
Just tried running the first job for about 5m rows from MySQL to Postgres. It was humming along and then I got this error:
pipe closed by peer or os.write(pipe, data) raised exception.
Not sure what it means
r

ripe-musician-59933

01/13/2021, 5:51 PM
@wide-salesclerk-68871 What Meltano version are you on?
w

wide-salesclerk-68871

01/13/2021, 5:53 PM
1.63.0
r

ripe-musician-59933

01/13/2021, 5:54 PM
@wide-salesclerk-68871 That explains it then; this issue (https://gitlab.com/meltano/meltano/-/issues/2478) was fixed in v1.64.0: https://meltano.slack.com/archives/CP8K1MXAN/p1610064894064300 🙂
w

wide-salesclerk-68871

01/13/2021, 5:55 PM
ahhh cool
I’m guessing there isn’t any reason not to just bump up to 1.65.0
So yeah, I suggest going straight to the latest!
w

wide-salesclerk-68871

01/13/2021, 5:58 PM
Thanks!
I’m using LOG_BASED replication. It was replicating one of 5 tables when it failed. After upgrading I reran it, and it skipped the table it was replicating when it failed.
tap-mysql       | INFO LOG_BASED stream fake_schema-table_name requires full historical sync
is the only mention of it in the run. Not sure if this is expected or not. I’m planning on trying a --full-refresh, but the other tables worked great so I’d like to avoid it if possible.
I’m guessing this is an example where plugin inheritance can be pretty helpful. I could separate the large tables into their own plugins, allowing me to full-refresh individually if needed.
Actually, now when I try running it again without --full-refresh, it’s giving me this error:
tap-mysql       | CRITICAL 'NoneType' object has no attribute 'settimeout'
and
tap-mysql       | pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
r

ripe-musician-59933

01/13/2021, 8:44 PM
You can use --full-refresh in combination with --select=<stream> to only fully refresh a specific stream.
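For example, something like this (the tap-mysql and target-postgres plugin names, and the fake_schema-table_name stream ID, are taken from this thread; substitute your own):

# Fully refresh just the selected stream
meltano elt tap-mysql target-postgres --select=fake_schema-table_name --full-refresh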
w

wide-salesclerk-68871

01/13/2021, 8:45 PM
ohhhhh
cool
I’ll play with that then. Thanks!
r

ripe-musician-59933

01/13/2021, 8:45 PM
'NoneType' object has no attribute 'settimeout' looks like a legitimate bug in tap-mysql; it would help to see the rest of the stack trace, if there is any.
w

wide-salesclerk-68871

01/13/2021, 8:46 PM
tap-mysql       | CRITICAL 'NoneType' object has no attribute 'settimeout'
tap-mysql       | Traceback (most recent call last):
tap-mysql       |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/tap_mysql/sync_strategies/full_table.py", line 266, in sync_table
tap-mysql       |     params)
tap-mysql       |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/tap_mysql/sync_strategies/common.py", line 255, in sync_query
tap-mysql       |     row = cursor.fetchone()
tap-mysql       |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/cursors.py", line 469, in fetchone
tap-mysql       |     row = self.read_next()
tap-mysql       |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/cursors.py", line 464, in read_next
tap-mysql       |     return self._conv_row(self._result._read_rowdata_packet_unbuffered())
tap-mysql       |   File "/opt/meltano/rails/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/connections.py", line 1160, in _read_rowdata_packet_unbuffered
tap-mysql       |     packet = self.connection._read_packet()
tap-mysql       |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/connections.py", line 674, in _read_packet
tap-mysql       |     recv_data = self._read_bytes(bytes_to_read)
tap-mysql       |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/connections.py", line 707, in _read_bytes
tap-mysql       |     CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
tap-mysql       | pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
r

ripe-musician-59933

01/13/2021, 8:47 PM
That could be caused by the extractor being blocked for a while while the loader is working through the buffered messages, with the connection having expired by the time the extractor is unblocked. Increasing the buffer size may help: https://meltano.com/docs/settings.html#elt-buffer-size
Ideally the tap would handle the connection failure better by attempting to reconnect, but that'd need to be contributed to the tap; not much Meltano can do to help
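For reference, that setting can be changed in meltano.yml or through its environment variable; a sketch, assuming the MELTANO_ELT_BUFFER_SIZE variable documented on that settings page (the default is 10MiB):

# Raise the ELT buffer from the 10MiB default to 50MiB for this pipeline
export MELTANO_ELT_BUFFER_SIZE=52428800
meltano elt tap-mysql target-postgres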
w

wide-salesclerk-68871

01/13/2021, 8:52 PM
ok cool. Thanks. It’s interesting because I didn’t have this issue on the initial run. Only the subsequent one
r

ripe-musician-59933

01/13/2021, 8:54 PM
The exact timing of the extractor and loader won't be the same each time, of course, so it's possible that in the second run the delay just happened to be slightly over the MySQL timeout, while it was slightly under before.
w

wide-salesclerk-68871

01/13/2021, 9:08 PM
ahhhh, that makes sense. I’ll try adjusting the buffer.
I tried updating the buffer to 52428800 and got the same error. I was planning on going a bit higher, but it seemed like x5 would be enough.
r

ripe-musician-59933

01/15/2021, 4:46 PM
@wide-salesclerk-68871 How many records are you syncing in this run? You'll want to make the buffer large enough to fit the entire batch of extracted records so that the tap can finish in one go without being blocked when the buffer is full, since it's that block that creates the delay that leads to the connection expiring
A more durable fix would be to have the tap attempt to retry the connection, but that would need to be done in a contribution to https://github.com/transferwise/pipelinewise-tap-mysql
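As a back-of-the-envelope illustration (the per-record size here is an assumption, not a measurement): if one serialized RECORD message for this wide table is on the order of 1KiB, buffering the full ~5M-row sync would need roughly 5,000,000 * 1KiB ≈ 5GiB:

# Illustrative sizing only; measure your actual serialized record width first
export MELTANO_ELT_BUFFER_SIZE=5368709120   # ~5GiB, assuming the MELTANO_ELT_BUFFER_SIZE env var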
w

wide-salesclerk-68871

01/15/2021, 5:21 PM
gotcha. I’m actually not using the pipelinewise tap, I’m using the singer version.
r

ripe-musician-59933

01/15/2021, 5:22 PM
Ah, all right. Same deal though 🙂
👍 1
w

wide-salesclerk-68871

01/15/2021, 5:23 PM
but that fits with what I was thinking (I think). The table is ~5 million rows. It failed after about 1.1 million had synced. So it would need to be able to hold the 1.1 million rows? Right? Also, this is because it failed partway through the run right? I didn’t have this issue on the initial run of another table that was a similar size
r

ripe-musician-59933

01/15/2021, 5:26 PM
if it failed after about 1.1 million rows, that suggests that 1.1 million rows fit into the buffer, and then when the buffer was (mostly) depleted, the tap was given back control to start querying for new rows again, at which point it found that the connection had expired. So to prevent the tap from ever getting back control after a delay, the buffer would need to be big enough to fit all 5 million rows, so that the tap can run and run and run until it's done
I don't think it'd be related to being partway through the run; 2 runs don't typically impact each other in that way, although it's possible that the tap uses a different mechanism for the initial full table load and subsequent incremental loads, which could change the picture.
Are the tables of similar size also of similar width (column count + per-row data size)?
A same-sized buffer can contain many more narrow records than wide ones, of course, so it's possible that 5M of one table fit, while only 1M of another do
So with the narrow table, tap-mysql could get all of its work done in one run, while with the wider table it would need to continue after the buffer was depleted, hitting the connection timeout
Increasing the buffer size in Meltano is a workaround to prevent that timeout, but the real fix would be better connection handling in the tap, as mentioned 🙂
w

wide-salesclerk-68871

01/15/2021, 5:49 PM
Totally. I’m sorry, I should be clearer. I’m pretty sure it failed due to the bug in 1.63.0 the first time (the beginning of this thread). I do think it’s a pretty wide table though, so that might be it. It looked like the tap was chunking rows about 100,000 at a time. I would assume that would help control the size of the buffer? That’s why I thought it weird that it was able to handle 1.1m the first time (11 chunks) but won’t even run a single “chunk” the second time. But maybe I’m missing something basic.
r

ripe-musician-59933

01/15/2021, 5:52 PM
When you say "the first time", was that on 1.63.0?
w

wide-salesclerk-68871

01/15/2021, 5:59 PM
yes
r

ripe-musician-59933

01/15/2021, 6:01 PM
That explains it then: pre-1.64.0, the buffer size limit wasn't enforced at all, so the tap could keep going and the buffer would keep growing until the target would die (usually for OOM reasons) and Meltano would fail with the "pipe closed by peer or os.write(pipe, data) raised exception." error you saw. So I wouldn't read too much into the "1.1M on the first run, 100k on subsequent runs" thing.
100K rows appears to be what currently fits into your configured buffer size
w

wide-salesclerk-68871

01/15/2021, 6:03 PM
Cool. So for now raising the buffer size is the way to go. Makes sense. Makes me a bit nervous for the initial run on some even larger tables we have though
I guess I can just keep making the buffer bigger and bigger for now
r

ripe-musician-59933

01/15/2021, 6:04 PM
Yeah, or rip off the bandaid and contribute reconnection logic 😉
w

wide-salesclerk-68871

01/15/2021, 6:06 PM
Made a note to take a look. I’m happy to try, but it might take me a minute to figure out where to start. If you have any ideas of where to start searching feel free to point me in the right direction 🙂
I want to make sure I understand the problem correctly.
• The tap writes the data to a buffer
• The target reads from the buffer too slowly
• The tap's connection times out while it waits for the target to finish.
r

ripe-musician-59933

01/15/2021, 6:18 PM
The stacktrace in https://meltano.slack.com/archives/CFG3C3C66/p1610570783210400?thread_ts=1610560265.206500&cid=CFG3C3C66 points at cursor.fetchone() at https://github.com/singer-io/tap-mysql/blob/master/tap_mysql/sync_strategies/common.py#L255 as the call that triggers the pymysql.err.OperationalError, so that's where you'd likely want to add a try/except block to handle the error by attempting to reconnect. That sync_query method is called here: https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/incremental.py#L55-L56, wrapped by contextmanagers that create the connection and cursor, so you'd want to have sync_query communicate to its parent method that it needs to run the whole thing again, starting past the row it did manage to get to successfully. Fortunately, the replication key value of that row would already have been stored in state (https://github.com/singer-io/tap-mysql/blob/master/tap_mysql/sync_strategies/common.py#L248-L251), so just letting the logic get back to https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/incremental.py#L24 and try the whole thing again may be enough for it to pick up where it left off. That means that you may want to rescue from the OperationalError (with the correct disconnect error ID) around https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/incremental.py#L75 instead of inside sync_query, ensure that iterate_limit is not set to False, and let the loop continue as normal.
sync_query is also called from https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/full_table.py#L260, where you'd want to do something similar: handle the error and make it go back to with connect_with_backoff, and make sure you get into the perform_resumable_sync branch with the correct pk clause matching how far the sync got the last time: https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/full_table.py#L255
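Roughly, the shape of that retry would be something like the sketch below. This is not tap-mysql's actual code: open_connection, sync_batch, and get_bookmark are hypothetical stand-ins for the tap's connect_with_backoff, sync_query, and state bookmarking, and the retry limit is arbitrary.

import pymysql

CR_SERVER_LOST = 2013  # "Lost connection to MySQL server during query"
MAX_RECONNECTS = 3

def sync_with_reconnect(open_connection, sync_batch, get_bookmark):
    """Re-run the sync after a dropped connection, resuming from the last bookmark."""
    attempts = 0
    while True:
        conn = open_connection()  # stand-in for connect_with_backoff()
        try:
            # Resume past the last row whose replication-key value was written to state
            sync_batch(conn, start_after=get_bookmark())
            return
        except pymysql.err.OperationalError as exc:
            if exc.args[0] != CR_SERVER_LOST or attempts >= MAX_RECONNECTS:
                raise  # not a disconnect, or we've retried enough
            attempts += 1  # connection expired while the target drained the buffer; reconnect and resume
        finally:
            try:
                conn.close()
            except Exception:
                pass  # the broken connection may already be unusable

In the real tap the equivalent catch would go around the loops linked above, so the existing bookmark/state handling takes care of the resume.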
I want to make sure I understand the problem correctly.
• The tap writes the data to a buffer
• The target reads from the buffer too slowly
• The buffer gets filled up completely
• The tap is blocked on its write call until half the buffer is depleted
• The tap's connection times out while it waits for the target to finish.
More accurately: while it waits for the target to work through half the buffer and for Meltano to unblock the tap's write call.
As long as it takes the target to work through half the buffer is as long as the tap will be blocked on the write call.
As for contributing better disconnection handling to tap-mysql: I suggest contributing to pipelinewise-tap-mysql instead of the singer-io variant since it's much more actively maintained
w

wide-salesclerk-68871

01/15/2021, 6:36 PM
Fair. The problem is that we can’t use that tap until it respects column selection properly (the other thread we’ve been pinging back and forth in)
r

ripe-musician-59933

01/15/2021, 6:37 PM
Right... Fortunately pipelinewise-tap-mysql is based on singer-io/tap-mysql, so your disconnection fix in the latter may be easily portable to the former
w

wide-salesclerk-68871

01/15/2021, 6:41 PM
rad. Thanks again for all your help!
r

ripe-musician-59933

01/15/2021, 6:41 PM
My pleasure!
w

wide-salesclerk-68871

01/19/2021, 5:11 PM
Hi again! Got this working, but then it blew up again. I think I just need to up the BUFFER again, but I wanted to get set up locally to test where I should set it. I got set up and got a different error. For whatever reason, when I run this on my local Docker images, meltano doesn’t seem to be respecting the BUFFER. The RAM usage just keeps going up until the containers crash and then it fails:
File "/usr/local/lib/python3.7/asyncio/streams.py", line 202, in _drain_helper
    raise ConnectionResetError('Connection lost')
I can see this happening if I watch the docker stats. Anyway, I realize this is likely an issue with my local setup, but if you have any other ideas please let me know. Thanks!
r

ripe-musician-59933

01/19/2021, 5:14 PM
@wide-salesclerk-68871 Hmm, we saw that error in 1.64.0: https://gitlab.com/meltano/meltano/-/issues/2500, and I (thought I) fixed the root cause in 1.64.1. I assume you're still on the latest version? Have you pulled the latest meltano/meltano:latest-python3.7 Docker image as well?
w

wide-salesclerk-68871

01/19/2021, 5:17 PM
I’m on 1.65.0. We’re running our own Docker image (mono-repo using the apache/airflow:1.10.11-python3.7 image). For what it’s worth, the job table in the metadata db is showing a failed run.
Also, I don’t think this was happening on Production, which seems weird
r

ripe-musician-59933

01/19/2021, 5:21 PM
@wide-salesclerk-68871 Wanna jump on a quick Zoom call so I can see this for myself and help debug?
w

wide-salesclerk-68871

01/19/2021, 5:21 PM
sure!
r

ripe-musician-59933

01/19/2021, 5:23 PM
The issue turned out to be that `target-postgres`'s batch_size_rows setting (https://meltano.com/plugins/loaders/postgres--transferwise.html#batch-size-rows) was set to the default value of 100,000, meaning that it would keep that many rows in memory before flushing to Postgres. Because the rows were relatively wide, that ended up quickly exceeding the 2GB memory limit on the Docker container, resulting in the process being killed and meltano elt dying along with it. Dropping batch_size_rows down to 1000 resulted in maximum target memory usage of about 500MB, and the issue disappeared.
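One way to apply that, assuming the loader is installed under the name target-postgres:

# Keep at most 1,000 rows in the target's memory before flushing to Postgres
meltano config target-postgres set batch_size_rows 1000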
w

wide-salesclerk-68871

01/20/2021, 9:22 PM
Hi! I’m back 😉. No more RAM issues, but the batch_size_rows setting is causing another problem. It’s not syncing the last batch of rows (that is less than 10k). For example:
• batch_size_rows = 10000
• source table row count = 10,500
• After elt, the target table will have 10,000 rows.
I’m guessing this is an issue with the pipelinewise target?
r

ripe-musician-59933

01/20/2021, 9:26 PM
w

wide-salesclerk-68871

01/20/2021, 9:28 PM
Interesting. I’ll do some digging, but it’s pretty consistent. In fact if the table is smaller than the batch size nothing gets synced
r

ripe-musician-59933

01/20/2021, 9:29 PM
Well, that's not right. I wonder if Meltano is not letting the target finish once the tap is finished, for some reason
Wanna jump on another call?
w

wide-salesclerk-68871

01/20/2021, 9:31 PM
Can’t right now. Will also need to set up some dummy data if we’re going to do a call again (per my security team 😬 )
Dummy schema more accurately
the data was dummy data
r

ripe-musician-59933

01/20/2021, 9:31 PM
Hehe all right
Can you run meltano elt again with the PYTHONASYNCIODEBUG=1 env var? I'd like to see what it prints just before the pipeline completes, when presumably the tap finishes first, and then some time later, the target. It may tell us whether the target is getting killed early for some reason.
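For instance (plugin names assumed from this thread):

# Enable asyncio debug logging to see when the tap and target processes finish
PYTHONASYNCIODEBUG=1 meltano elt tap-mysql target-postgres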
w

wide-salesclerk-68871

01/20/2021, 9:33 PM
👍
r

ripe-musician-59933

01/20/2021, 9:40 PM
If the tap finishes before the target, Meltano will flush all remaining buffered tap output, close the target's stdin, wait for the target to finish generating output, and then wait for its process to finish: https://gitlab.com/meltano/meltano/-/blob/master/src/meltano/core/runner/singer.py#L210-219
When the target's stdin is closed, it will know it's not going to receive any more input and break out of this loop once it's worked through its own input buffer: https://github.com/transferwise/pipelinewise-target-postgres/blob/master/target_postgres/__init__.py#L100
That shouldn't cause the target to immediately quit though, since it should fall through to that logic I just linked to: https://github.com/transferwise/pipelinewise-target-postgres/blob/master/target_postgres/__init__.py#L237
I'm not sure what would cause the target to not make it to that point at all
w

wide-salesclerk-68871

01/20/2021, 10:21 PM
Nevermind. I got it
It flushes all of the tables at the end of the run
it looks good now
r

ripe-musician-59933

01/20/2021, 10:32 PM
Ah all right, I'm glad to hear that! Was the thing that tripped you up the fact that it would already move to a different table before finishing up the previous one(s)?
You may want to enable the flush_all_streams setting so that it flushes all streams when one batch is full: https://meltano.com/plugins/loaders/postgres--transferwise.html#flush-all-streams
I didn't realize it keeps a batch per stream.
Right now, memory usage could theoretically build up to <number of streams> * <batch size rows - 1>.
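With the numbers from this thread (5 streams and a 10,000-row batch size), that worst case works out to 5 * 9,999 ≈ 50,000 buffered rows. Enabling it might look like this, assuming the loader is installed as target-postgres:

# Flush every stream whenever any one stream's batch fills up
meltano config target-postgres set flush_all_streams true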
w

wide-salesclerk-68871

01/20/2021, 10:53 PM
Ah all right, I’m glad to hear that! Was the thing that tripped you up the fact that it would already move to a different table before finishing up the previous one(s)?
Yep! Taking a look at the flush_all_streams setting. Thanks!
👍 1
r

ripe-musician-59933

01/22/2021, 9:48 PM
I've created a new issue for the ConnectionResetError you saw: https://gitlab.com/meltano/meltano/-/issues/2526