Matt Menzenski
02/16/2023, 3:01 AM
2023-02-15T17:24:04.365968Z [error ] (psycopg2.OperationalError) could not receive data from server: Connection timed out
SSL SYSCALL error: Connection timed out
[SQL: SELECT runs.state AS runs_state, runs.id AS runs_id, runs.job_name AS runs_job_name, runs.run_id AS runs_run_id, runs.started_at AS runs_started_at, runs.last_heartbeat_at AS runs_last_heartbeat_at, runs.ended_at AS runs_ended_at, runs.payload AS runs_payload, runs.payload_flags AS runs_payload_flags, runs.trigger AS runs_trigger
FROM runs
WHERE runs.state = %(param_1)s AND (runs.last_heartbeat_at IS NOT NULL AND runs.last_heartbeat_at < %(last_heartbeat_at_1)s OR runs.last_heartbeat_at IS NULL AND runs.started_at < %(started_at_1)s) AND runs.job_name = %(job_name_1)s]
[parameters: {'param_1': 'RUNNING', 'last_heartbeat_at_1': datetime.datetime(2023, 2, 15, 17, 2, 51, 894075), 'started_at_1': datetime.datetime(2023, 2, 14, 17, 7, 51, 894075), 'job_name_1': 'dev:tap-google-analytics-1-to-target-redshift'}]
(Background on this error at: <https://sqlalche.me/e/14/e3q8>)
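Context on the error class, not something from the thread itself: SQLAlchemy's connection pool can hand back a connection that was silently dropped while idle, which then fails mid-query exactly like the trace above. A minimal sketch of the usual pool-level guards follows, assuming a placeholder engine URL rather than the actual Meltano system DB configuration; note this only protects connections as they are checked out of the pool, not one held open across an entire run.

```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:password@example-rds-host:5432/meltano",  # placeholder DSN, not the poster's setup
    pool_pre_ping=True,   # validate a pooled connection with a lightweight ping before reusing it
    pool_recycle=300,     # retire pooled connections older than ~5 minutes
)

# Rough standalone analogue of the failing query, for illustration only
with engine.connect() as conn:
    running = conn.execute(
        text("SELECT id, job_name, state FROM runs WHERE state = :state"),
        {"state": "RUNNING"},
    ).fetchall()
```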
christoph
02/16/2023, 3:09 AM
christoph
02/16/2023, 3:27 AM
Matt Menzenski
02/16/2023, 5:09 AM
> I'll take a stab at it and guess this is a "containerized" environment?
I'm running these jobs in GitHub Actions, but I'm not running them within a container there.
Matt Menzenski
02/16/2023, 5:11 AM
This issue occurs for me consistently when trying to load google analytics data with tap-google-analytics - it's the first (historical) load so there's a lot of data, maybe there's a long-running API call? :thinkspin:
christoph
02/16/2023, 5:32 AM
> I'm running these jobs in GitHub Actions, but I'm not running them within a container there.
I was just putting a hypothesis out there that some container networking might be what's tearing down the idle TCP stream. Sounds like that's not the case. Is there a good old "Enterprise Firewall" between the GitHub runner and the AWS RDS instance? Or is it a Microsoft-managed runner?
christoph
02/16/2023, 5:38 AM
> This issue occurs for me consistently when trying to load google analytics data with tap-google-analytics - it's the first (historical) load so there's a lot of data, maybe there's a long-running API call?
Well, the hypothesis here is that the Meltano system DB connection (managed via SQLAlchemy) is being kept open for the whole duration of the tap-to-target invocation (i.e. the "run", not sure what the correct terminology for it is 😂). Then some stateful network device sees that the TCP stream from the GitHub runner to the AWS RDS endpoint hasn't really had any traffic going on (because Meltano is waiting for that one long-running stream to finish so that it can save the bookmark for it in the state table), and that device probably has some idle-TCP-session logic that tears the session down. When the long-running stream finally completes and Meltano wants to write the state information for that stream into the system DB, that's when the TCP stack on the GitHub runner says: "Nope, sorry, that TCP socket isn't there anymore." I could be wildly off with this speculation, but that's what it smells like to me.
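If that idle-teardown hypothesis is right, one mitigation worth trying is enabling TCP keepalives on the system DB connection so a quiet session still produces traffic. The sketch below assumes the system DB is reached via psycopg2/libpq and uses a placeholder URL; the keepalive names are standard libpq connection parameters, not anything Meltano-specific.

```python
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user:password@example-rds-host:5432/meltano",  # placeholder DSN
    connect_args={
        "keepalives": 1,            # turn on TCP keepalives for this socket
        "keepalives_idle": 60,      # seconds of idle time before the first probe
        "keepalives_interval": 30,  # seconds between probes
        "keepalives_count": 5,      # unanswered probes before the connection is declared dead
        "connect_timeout": 10,
    },
)
```

Since libpq also accepts these as query parameters on a connection URI (e.g. ?keepalives=1&keepalives_idle=60), the same settings could plausibly be carried in whatever database URI Meltano is given, though that's untested here.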