wide-salesclerk-6887101/13/2021, 5:51 PM
Not sure what it means
pipe closed by peer or os.write(pipe, data) raised exception.
ripe-musician-5993301/13/2021, 5:51 PM
wide-salesclerk-6887101/13/2021, 5:53 PM
ripe-musician-5993301/13/2021, 5:54 PM
wide-salesclerk-6887101/13/2021, 5:55 PM
ripe-musician-5993301/13/2021, 5:56 PM
wide-salesclerk-6887101/13/2021, 5:58 PM
replication. It was replicating one of 5 tables when it failed. After upgrading I reran it and it skipped the table it was replicating when it failed.
is the only mention of it in the run. Not sure if this is expected or not. I’m planning on trying a
tap-mysql | INFO LOG_BASED stream fake_schema-table_name requires full historical sync
, but the other tables worked great so I’d like to avoid it if possible.
can be pretty helpful. I could separate the large tables into their own plugins, allowing me to full-refresh them individually if needed
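(For illustration only, not from the thread: a rough sketch of that "separate plugins per table group" idea in meltano.yml, assuming Meltano's plugin inheritance is available; the second plugin's name and the select pattern are made up.)

plugins:
  extractors:
    - name: tap-mysql
      # existing extractor, selecting the smaller tables
    - name: tap-mysql--big-tables
      inherit_from: tap-mysql        # reuses tap-mysql's connection settings
      select:
        - fake_schema-big_table.*    # hypothetical large table, so it can be full-refreshed on its own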
again it’s giving me this error:
tap-mysql | CRITICAL 'NoneType' object has no attribute 'settimeout'
tap-mysql | pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
ripe-musician-5993301/13/2021, 8:44 PM
in combination with
to only fully refresh a specific stream.
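(Roughly, assuming the flag names of the Meltano CLI of that era; worth double-checking against meltano elt --help. The job ID is a placeholder.)

meltano elt tap-mysql target-postgres --job_id=my-pipeline --full-refresh --select fake_schema-table_name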
wide-salesclerk-6887101/13/2021, 8:45 PM
ripe-musician-5993301/13/2021, 8:45 PM
looks like a legitimate bug in tap-mysql; it would help to see the rest of the stack trace if there is any
'NoneType' object has no attribute 'settimeout'
wide-salesclerk-6887101/13/2021, 8:46 PM
tap-mysql | CRITICAL 'NoneType' object has no attribute 'settimeout'
tap-mysql | Traceback (most recent call last):
tap-mysql |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/tap_mysql/sync_strategies/full_table.py", line 266, in sync_table
tap-mysql |     params)
tap-mysql |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/tap_mysql/sync_strategies/common.py", line 255, in sync_query
tap-mysql |     row = cursor.fetchone()
tap-mysql |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/cursors.py", line 469, in fetchone
tap-mysql |     row = self.read_next()
tap-mysql |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/cursors.py", line 464, in read_next
tap-mysql |     return self._conv_row(self._result._read_rowdata_packet_unbuffered())
tap-mysql |   File "/opt/meltano/rails/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/connections.py", line 1160, in _read_rowdata_packet_unbuffered
tap-mysql |     packet = self.connection._read_packet()
tap-mysql |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/connections.py", line 674, in _read_packet
tap-mysql |     recv_data = self._read_bytes(bytes_to_read)
tap-mysql |   File "/opt/meltano/.meltano/extractors/tap-mysql/venv/lib/python3.7/site-packages/pymysql/connections.py", line 707, in _read_bytes
tap-mysql |     CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
tap-mysql | pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
ripe-musician-5993301/13/2021, 8:47 PM
wide-salesclerk-6887101/13/2021, 8:52 PM
ripe-musician-5993301/13/2021, 8:54 PM
wide-salesclerk-6887101/13/2021, 9:08 PM
and got the same error. I was planning on going a bit higher, but it seemed like x5 would be enough
ripe-musician-5993301/15/2021, 4:46 PM
wide-salesclerk-6887101/15/2021, 5:21 PM
ripe-musician-5993301/15/2021, 5:22 PM
wide-salesclerk-6887101/15/2021, 5:23 PM
ripe-musician-5993301/15/2021, 5:26 PM
wide-salesclerk-6887101/15/2021, 5:49 PM
the first time (the beginning of this thread). I do think it’s a pretty wide table though so that might be it. It looked like the tap was chunking rows about 100,000 at a time. I would assume that would help control the size of the buffer? That’s why I thought it weird that it was able to handle 1.1m the first time (11 chunks) but won’t even run a single “chunk” the second time. But maybe I’m missing something basic
ripe-musician-5993301/15/2021, 5:52 PM
wide-salesclerk-6887101/15/2021, 5:59 PM
ripe-musician-5993301/15/2021, 6:01 PM
error you saw. So I wouldn't read too much into the "1.1M on the first run, 100k on subsequent runs" thing
pipe closed by peer or os.write(pipe, data) raised exception.
wide-salesclerk-6887101/15/2021, 6:03 PM
ripe-musician-5993301/15/2021, 6:04 PM
wide-salesclerk-6887101/15/2021, 6:06 PM
• The tap writes the data to a buffer
• The target reads from the buffer too slowly
• The tap’s MySQL connection times out while it waits for the target
ripe-musician-5993301/15/2021, 6:18 PM
at https://github.com/singer-io/tap-mysql/blob/master/tap_mysql/sync_strategies/common.py#L255 as the call that triggers the
, so that's where you'd likely want to add a
block to handle the error by attempting to reconnect. That
method is called here: https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/incremental.py#L55-L56, wrapped by contextmanagers that create the connection and cursor, so you'd want to have
communicate to its parent method that it needs to run the whole thing again, starting past the row it did manage to get to successfully. Fortunately, the replication key value of that row would already have been stored in state (https://github.com/singer-io/tap-mysql/blob/master/tap_mysql/sync_strategies/common.py#L248-L251), so just letting the logic get back to https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/incremental.py#L24 and try the whole thing again may be enough for it to pick up where it left off. That means that you may want to rescue from the
(with the correct disconnect error ID) around https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/incremental.py#L75 instead of inside
, ensure that
is not set to
, and let the loop continue as normal.
is also called from https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/full_table.py#L260, where you'd want to do something similar: handle the error and make it go back to
, and make sure you get into the
branch with the correct pk clause matching how far the sync got the last time: https://github.com/singer-io/tap-mysql/blob/f6b0277a0020764b0834aa77651895a7cc550ad7/tap_mysql/sync_strategies/full_table.py#L255
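Not tap-mysql’s actual code, but a minimal sketch of that reconnect-and-retry idea; make_connection and run_sync_query are stand-ins for the real connection/cursor contextmanagers and the sync_query call described above:

import pymysql

CR_SERVER_LOST = 2013  # MySQL client error: "Lost connection to MySQL server during query"

def sync_with_reconnect(make_connection, run_sync_query, max_attempts=5):
    # Hypothetical wrapper: make_connection opens a fresh MySQL connection and
    # run_sync_query performs one pass of the sync, writing Singer state as it
    # emits rows, so a retry resumes past the last replication-key value synced.
    for _ in range(max_attempts):
        conn = make_connection()
        try:
            run_sync_query(conn)
            return  # completed without losing the connection
        except pymysql.err.OperationalError as exc:
            if exc.args[0] != CR_SERVER_LOST:
                raise  # some other MySQL error; don't mask it
            # Connection dropped mid-fetch; loop around, reconnect, and let the
            # state bookmark decide where the next SELECT starts.
        finally:
            try:
                conn.close()
            except Exception:
                pass
    raise RuntimeError(f"gave up after {max_attempts} lost-connection retries")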
I want to make sure I understand the problem correctly.
> The tap writes the data to a buffer
> The target reads from the buffer too slowly
• The buffer gets filled up completely
• The tap is blocked on its write call until half the buffer is depleted
> The tap’s MySQL connection times out while it waits for the target
More accurately: while it waits for the target to work through half the buffer and for Meltano to unblock the tap’s writes
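As a standalone illustration of that backpressure (nothing Meltano-specific, just OS pipe behaviour): once the reader stops draining the pipe, writes can no longer complete, and that is the window in which an idle MySQL connection can time out.

import os

# A pipe's kernel buffer is finite. With nobody reading from r, writes to w
# succeed only until the buffer fills; a blocking writer would stall at that point.
r, w = os.pipe()
os.set_blocking(w, False)  # non-blocking so we can observe "buffer full" instead of hanging
chunk = b"x" * 65536
written = 0
while True:
    try:
        written += os.write(w, chunk)
    except BlockingIOError:
        print(f"pipe buffer full after {written} bytes; a blocking writer would be stuck here")
        break
os.close(w)
os.close(r)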
wide-salesclerk-6887101/15/2021, 6:36 PM
ripe-musician-5993301/15/2021, 6:37 PM
wide-salesclerk-6887101/15/2021, 6:41 PM
ripe-musician-5993301/15/2021, 6:41 PM
wide-salesclerk-6887101/19/2021, 5:11 PM
again, but I wanted to get set up locally to test where I should set it. I got set up and got a different error. For whatever reason when I run this on my local docker images,
doesn’t seem to be respecting the
. The RAM usage just keeps going up until the containers crash and then it fails:
I can see this happening if I watch the
File "/usr/local/lib/python3.7/asyncio/streams.py", line 202, in _drain_helper
    raise ConnectionResetError('Connection lost')
. Anyway, I realize this is likely an issue with my local setup but if you have any other ideas please let me know. Thanks!
ripe-musician-5993301/19/2021, 5:14 PM
Docker image as well?
wide-salesclerk-6887101/19/2021, 5:17 PM
. We’re running our own docker image (mono-repo using the
image). For what it’s worth, the
table in the metadata db is showing a failed run.
ripe-musician-5993301/19/2021, 5:21 PM
wide-salesclerk-6887101/19/2021, 5:21 PM
setting (https://meltano.com/plugins/loaders/postgres--transferwise.html#batch-size-rows) was set to the default value of 100,000, meaning that it would keep that many rows in memory before flushing to Postgres. Because the rows were relatively wide, that ended up quickly exceeding the 2GB memory limit on the Docker container, resulting in the process being killed and
dying along with it. Dropping
down to 1000 resulted in maximum target memory usage of about 500MB, and the issue disappeared.
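For reference, a minimal sketch of where that setting might live in meltano.yml, assuming the transferwise variant of target-postgres; the exact nesting depends on the project:

plugins:
  loaders:
    - name: target-postgres
      variant: transferwise
      config:
        batch_size_rows: 1000  # default is 100000; smaller batches trade throughput for lower memory use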
wide-salesclerk-6887101/20/2021, 9:22 PM
setting is causing another problem. It’s not syncing the last batch of rows (that is, less than 10k). For example:
• batch_size_rows = 10000
• source table row count = 10,500
• After elt the target table will have 10,000 rows.
I’m guessing this is an issue with the pipelinewise target?
ripe-musician-5993301/20/2021, 9:26 PM
wide-salesclerk-6887101/20/2021, 9:28 PM
ripe-musician-5993301/20/2021, 9:29 PM
wide-salesclerk-6887101/20/2021, 9:31 PM
ripe-musician-5993301/20/2021, 9:31 PM
again with the
env var? I'd like to see what it prints just before the pipeline completes, when presumably the tap finishes first, and then some time later, the target. It may tell us whether the target is getting killed early for some reason
wide-salesclerk-6887101/20/2021, 9:33 PM
ripe-musician-5993301/20/2021, 9:40 PM
wide-salesclerk-6887101/20/2021, 10:21 PM
ripe-musician-5993301/20/2021, 10:32 PM
<number of streams> * <batch size rows - 1>
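Reading that formula with, say, 5 streams and batch_size_rows = 10,000: up to 5 × 9,999 = 49,995 rows could be buffered and not yet visible in the target at any point while the run is still in progress.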
wide-salesclerk-6887101/20/2021, 10:53 PM
Ah all right, I’m glad to hear that! Was the thing that tripped you up the fact that it would already move to a different table before finishing up the previous one(s)?
Yep! Taking a look at the