dylan_just
09/09/2024, 12:48 PMdylan_just
09/09/2024, 12:48 PM9
2024-09-09T21:55:59.825+10:00
2024-09-09 11:55:59,824 | INFO | tap-postgres | Skipping deselected stream 'public-secure_collection_item_files'.
10
2024-09-09T21:55:59.825+10:00
2024-09-09 11:55:59,825 | INFO | tap-postgres | Skipping deselected stream 'public-secure_collection_items'.
11
2024-09-09T21:55:59.825+10:00
2024-09-09 11:55:59,825 | INFO | tap-postgres | Skipping deselected stream 'public-secure_collection_permissions'.
12
2024-09-09T21:55:59.824+10:00
2024-09-09 11:55:59,823 | INFO | singer_sdk.metrics | METRIC: {"type": "timer", "metric": "sync_duration", "value": 0.22330546379089355, "tags": {"stream": "public-practitioners", "context": {}, "status": "succeeded"}}
13
2024-09-09T21:55:59.824+10:00
2024-09-09 11:55:59,824 | INFO | singer_sdk.metrics | METRIC: {"type": "counter", "metric": "record_count", "value": 2890, "tags": {"stream": "public-practitioners", "context": {}}}
14
2024-09-09T21:55:59.824+10:00
2024-09-09 11:55:59,824 | INFO | tap-postgres | Skipping deselected stream 'public-practitioners_services'.
15
2024-09-09T21:55:59.824+10:00
2024-09-09 11:55:59,824 | INFO | tap-postgres | Skipping deselected stream 'public-practitioners_unavailabilities'.
16
2024-09-09T21:55:59.817+10:00
2024-09-09 11:55:59,816 | INFO | target-redshift | Target 'target-redshift' is listening for input from tap.
17
2024-09-09T21:55:59.817+10:00
2024-09-09 11:55:59,816 | INFO | target-redshift | Initializing 'target-redshift' target sink...
18
2024-09-09T21:55:59.817+10:00
2024-09-09 11:55:59,817 | INFO | target-redshift.public-forms | Initializing target sink for stream 'public-forms'...
19
2024-09-09T21:55:59.601+10:00
2024-09-09 11:55:59,600 | INFO | tap-postgres.public-practitioners | Beginning full_table sync of 'public-practitioners'...
20
2024-09-09T21:55:59.601+10:00
2024-09-09 11:55:59,600 | INFO | tap-postgres.public-practitioners | Tap has custom mapper. Using 1 provided map(s).
21
2024-09-09T21:55:59.600+10:00
2024-09-09 11:55:59,599 | INFO | tap-postgres | Skipping deselected stream 'public-forms_access'.
22
2024-09-09T21:55:59.600+10:00
2024-09-09 11:55:59,599 | INFO | tap-postgres | Skipping deselected stream 'public-instances'.
23
2024-09-09T21:55:59.600+10:00
2024-09-09 11:55:59,599 | INFO | tap-postgres | Skipping deselected stream 'public-keys'.
24
2024-09-09T21:55:59.600+10:00
2024-09-09 11:55:59,599 | INFO | tap-postgres | Skipping deselected stream 'public-knex_migrations'.
25
2024-09-09T21:55:59.600+10:00
2024-09-09 11:55:59,600 | INFO | tap-postgres | Skipping deselected stream 'public-knex_migrations_lock'.
26
2024-09-09T21:55:59.600+10:00
2024-09-09 11:55:59,600 | INFO | tap-postgres | Skipping deselected stream 'public-notifications'.
27
2024-09-09T21:55:59.599+10:00
2024-09-09 11:55:59,599 | INFO | singer_sdk.metrics | METRIC: {"type": "timer", "metric": "sync_duration", "value": 0.13203024864196777, "tags": {"stream": "public-forms", "context": {}, "status": "succeeded"}}
28
2024-09-09T21:55:59.599+10:00
2024-09-09 11:55:59,599 | INFO | singer_sdk.metrics | METRIC: {"type": "counter", "metric": "record_count", "value": 1, "tags": {"stream": "public-forms", "context": {}}}
29
2024-09-09T21:55:59.467+10:00
2024-09-09 11:55:59,466 | INFO | tap-postgres | Skipping deselected stream 'public-form_requests'.
30
2024-09-09T21:55:59.467+10:00
2024-09-09 11:55:59,466 | INFO | tap-postgres.public-forms | Beginning incremental sync of 'public-forms'...
31
2024-09-09T21:55:59.467+10:00
2024-09-09 11:55:59,466 | INFO | tap-postgres.public-forms | Tap has custom mapper. Using 1 provided map(s).
32
2024-09-09T21:55:59.466+10:00
2024-09-09 11:55:59,465 | INFO | tap-postgres | Skipping deselected stream 'public-audit_collection_events'.
33
2024-09-09T21:55:59.466+10:00
2024-09-09 11:55:59,466 | INFO | tap-postgres | Skipping deselected stream 'public-bookings'.
34
2024-09-09T21:55:59.466+10:00
2024-09-09 11:55:59,466 | INFO | tap-postgres | Skipping deselected stream 'public-communications'.
35
2024-09-09T21:55:59.466+10:00
2024-09-09 11:55:59,466 | INFO | tap-postgres | Skipping deselected stream 'public-form_request_forms'.
36
2024-09-09T21:55:55.941+10:00
Environment 'au' is active
37
2024-09-09T21:55:54.791+10:00
running tap [tap-agnodice-v3] with mappers: [mapper-agnodice] to target [target-warehouse-redshift]
38
2024-09-09T21:55:54.791+10:00
Executing: pipenv run .meltano/run/bin --environment=au run --no-install tap-agnodice-v3 mapper-agnodice target-warehouse-redshift
39
2024-09-09T21:55:53.501+10:00
> meltano-pipeline@1.0.0 el
40
2024-09-09T21:55:53.501+10:00
> ts-node ./scripts/extractLoad.ts
41
2024-09-09T21:55:53.298+10:00
> meltano-pipeline@1.0.0 elt
dylan_just
09/09/2024, 12:48 PMdylan_just
09/09/2024, 12:49 PMdylan_just
09/09/2024, 12:50 PMvisch
09/09/2024, 1:05 PMmeltano run tap-postgres target-csv
what I would do is enable debugging logs to see if I can find any clues, to get to the bottom most quickly, I"d try to get the stack trace from whatever is calling the application.
Most of the time when I see these things with folks with their custom orchestration scripts they do something wrong with process handling and end up having a full buffer from stdout or something OS-related they missed when implementing the subprocess flow. Complete guess but that's the most common thing I've seen.dylan_just
09/09/2024, 1:07 PMpipenv run .meltano/run/bin --environment=au run --no-install tap-agnodice-v3 mapper-agnodice target-warehouse-redshift
dylan_just
09/09/2024, 1:08 PMvisch
09/09/2024, 1:08 PMpipenv run .meltano/run/bin --environment=au run --no-install tap-agnodice-v3 mapper-agnodice target-warehouse-redshift
isn't meltanolabs-tap-postgres => ticketswap-target-redshift
which is one part of my multiple layers of confusiondylan_just
09/09/2024, 1:09 PMvisch
09/09/2024, 1:10 PMdylan_just
09/09/2024, 1:12 PMvisch
09/09/2024, 1:13 PMvisch
09/09/2024, 1:14 PMdylan_just
09/09/2024, 1:16 PMdylan_just
09/09/2024, 1:27 PMenable debugging logs to see if I can find any cluesNot sure exactly what you mean by this. Is this like setting a verbose debug level in meltano or the tap?
haleemur_ali
09/09/2024, 1:43 PMpipenv run .meltano/run/bin --log-level=debug --environment=au ...
dylan_just
09/09/2024, 11:10 PMdylan_just
09/10/2024, 1:20 AM1
2024-09-10T09:56:27.282+10:00
head producer completed first as expected
2
2024-09-10T09:56:27.211+10:00
head producer completed first as expected
3
2024-09-10T09:56:27.211+10:00
waiting for process completion or exception
4
2024-09-10T09:56:27.185+10:00
{"type":"STATE","value":{"bookmarks":{"public-forms":{"replication_key":"updated_at","replication_key_value":"2024-09-05T04:53:09.535000+00:00"},"public-practitioners":{}}}}
haleemur_ali
09/11/2024, 2:40 AMhaleemur_ali
09/11/2024, 2:41 AMdylan_just
09/11/2024, 2:43 AMhaleemur_ali
09/11/2024, 2:49 AMdylan_just
09/11/2024, 2:51 AMhaleemur_ali
09/11/2024, 3:06 AMdylan_just
09/11/2024, 3:10 AMhaleemur_ali
09/11/2024, 3:20 AMdylan_just
09/11/2024, 3:22 AMdylan_just
09/11/2024, 3:30 AMhaleemur_ali
09/11/2024, 12:21 PMconnect_args
. i'm not sure you can specify the timeout in the url.
https://stackoverflow.com/questions/35640726/how-to-set-connection-timeout-in-sqlalchemyvisch
09/11/2024, 1:18 PMdylan_just
09/11/2024, 10:55 PMCharles Feduke
09/12/2024, 12:28 AMconnect_timeout
you should be able to pass it in via the URL something like this:
<postgresql+psycopg://url/database?connect_timeout=10>
where 10 represents 10 seconds. I use this to set my application_name
so things can be identifiedEdgar Ramírez (Arch.dev)
09/12/2024, 6:40 PMconnect_timeout
, neither in the sqlalchemy docs nor in the postgres docs so I wonder if it actually works.Charles Feduke
09/12/2024, 7:04 PMCharles Feduke
09/12/2024, 7:04 PMCharles Feduke
09/12/2024, 7:05 PMEdgar Ramírez (Arch.dev)
09/12/2024, 7:05 PMhaleemur_ali
09/12/2024, 7:05 PMCharles Feduke
09/12/2024, 7:07 PMcreate_engine(str, kwargs)
where a connection string is passed wholesaleCharles Feduke
09/12/2024, 7:09 PM.env
and by retrieving SSM parameters in an Airflow plugin as a single string like this:
MELTANO_DATABASE_URI=postgresql+psycopg://$DATA_WAREHOUSE_DB_USER:$DATA_WAREHOUSE_DB_PASSWORD@$DATA_WAREHOUSE_DB_HOST:5432/some_database_here?options=-csearch_path%3Ddata_warehouse_state
and this has worked just fine, I definitely have state in production and its been running for a few months 🙂 I use application_name
in some of the other connection strings in tap-postgres but that’s a different use case entirelyCharles Feduke
09/12/2024, 7:10 PMconnect_timeout
is not an option, which just means it’s not part of the options=-c…
stuffCharles Feduke
09/12/2024, 7:10 PMCharles Feduke
09/12/2024, 7:21 PMMELTANO_DATABASE_URI="postgresql+psycopg://$DATA_WAREHOUSE_DB_USER:$DATA_WAREHOUSE_DB_PASSWORD@$DATA_WAREHOUSE_DB_HOST:5432/some_database_here?connect_timeout=1&options=-csearch_path%3Ddata_warehouse_state"
(note quotes around the variable value because I added a &
)
and disconnected from the VPN so I could not resolve my staging data warehouse via host name, and then ran a tap->target and immediately got:
2024-09-12T19:11:36.571551Z [info ] Environment 'stg' is active
2024-09-12T19:11:36.735105Z [info ] DB connection failed. Will retry after 5s. Attempt 1/3
2024-09-12T19:11:41.751013Z [info ] DB connection failed. Will retry after 5s. Attempt 2/3
2024-09-12T19:11:46.764098Z [info ] DB connection failed. Will retry after 5s. Attempt 3/3
then I changed it to connect_timeout=10
and verified the connection attempt time was more or less 10 seconds between each of the 5s retries. Then I connected to the VPN and ran the tap->target successfully. So, at least via the MELTANO_DATABASE_URI
this connection string parameter option does work.Charles Feduke
09/12/2024, 7:24 PMpostgresql+psycopg2://
as your connection string and then you can do everything via the PG_
variables… if that was a thing you wanted to do