e
```
meltano                | Running extract & load...
tap-mycustomtap        | Installing dependencies from lock file
tap-mycustomtap        |
tap-mycustomtap        | No dependencies to install or update
tap-mycustomtap        |
tap-mycustomtap        | Installing the current project: tap-mycustomtap (0.0.1)
tap-mycustomtap        | time=2021-09-16 09:17:07 name=tap-mycustomtap level=INFO message=tap-mycustomtap v0.0.1, Meltano SDK v0.3.6)
tap-mycustomtap        | time=2021-09-16 09:17:07 name=tap-mycustomtap level=INFO message=Skipping parse of env var settings...
tap-mycustomtap        | time=2021-09-16 09:17:07 name=tap-mycustomtap level=INFO message=Config validation passed with 0 errors and 0 warnings.
tap-mycustomtap        | time=2021-09-16 09:17:07 name=root level=INFO message=Operator '{MAPPER_ELSE_OPTION}=None' was not found. Unmapped streams will be included in output.
tap-mycustomtap        | time=2021-09-16 09:17:07 name=tap-mycustomtap level=INFO message=Beginning full_table sync of 'my_target_table'...
tap-mycustomtap        | time=2021-09-16 09:17:07 name=tap-mycustomtap level=INFO message=Tap has custom mapper. Using 1 provided map(s).
tap-mycustomtap        | time=2021-09-16 09:17:07 name=tap-mycustomtap level=INFO message=TESTING LOGGING STATEMENT??!!
target-postgres        | time=2021-09-16 09:17:07 name=target_postgres level=INFO message=Table '"my_target_table"' exists
tap-mycustomtap        | time=2021-09-16 09:17:08 name=tap-mycustomtap level=WARNING message=Property 'alphabet_partition' was present in the 'my_target_table' stream but not found in catalog schema. Ignoring.
tap-mycustomtap        | time=2021-09-16 09:17:32 name=tap-mycustomtap level=INFO message=TESTING LOGGING STATEMENT??!!
tap-mycustomtap        | time=2021-09-16 09:40:02 name=tap-mycustomtap level=INFO message=TESTING LOGGING STATEMENT??!!
tap-mycustomtap        | time=2021-09-17 05:06:13 name=tap-mycustomtap level=INFO message=INFO METRIC: {'type': 'counter', 'metric': 'record_count', 'value': 602512, 'tags': {'stream': 'my_target_table'}}
target-postgres        | time=2021-09-17 05:06:13 name=target_postgres level=INFO message=Loading 909 rows into 'tap_mycustomtap."my_target_table"'
target-postgres        | Traceback (most recent call last):
target-postgres        |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module>
target-postgres        |     sys.exit(main())
target-postgres        |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 373, in main
target-postgres        |     persist_lines(config, singer_messages)
target-postgres        |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 239, in persist_lines
target-postgres        |     flushed_state = flush_streams(records_to_load, row_count, stream_to_sync, config, state, flushed_state)
target-postgres        |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 288, in flush_streams
target-postgres        |     Parallel()(delayed(load_stream_batch)(
target-postgres        |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/joblib/parallel.py", line 1029, in __call__
target-postgres        |     if self.dispatch_one_batch(iterator):
target-postgres        |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/joblib/parallel.py", line 847, in dispatch_one_batch
target-postgres        |     self._dispatch(tasks)
target-postgres        |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/joblib/parallel.py", line 765, in _dispatch
target-postgres        |     job = self._backend.apply_async(batch, callback=cb)
target-postgres        |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/…
```
Full log of the run.
Big thanks to the Meltano team for all the help getting me here. @visch, your logging tip helped a bunch; you can see it doing its thing:
```
tap-mycustomtap        | time=2021-09-16 09:17:32 name=tap-mycustomtap level=INFO message=TESTING LOGGING STATEMENT??!!
```
Oh, one other thing I will try is to stop doing "full sync" runs... these are likely the reason it's not adding ANY records from one context to the next.
Seems the transferwise variant doesn't support anything but full table mirroring. I might set up each of the three variants of target-postgres and test further somehow.
v
On the speed issue: I'd wait before you jump to C++. Use a performance tool to figure out what the core issue is and see what's taking so much time. First is to get the data flowing properly. When I'm debugging those pesky row issues (1 row out of millions), I try hard to set up the tap in a way that detects those issues and doesn't let them happen again. For the first run through, I find where it's failing, wrap it in a try/except (Python's version of try/catch), and just swallow the error, to be sure that's the only issue I have left.
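A minimal sketch of that "swallow and log" idea, assuming a tap that parses raw rows with some `parse` function (all names here are hypothetical, not from the actual tap):

```python
import logging

def safe_rows(raw_rows, parse):
    """Yield parsed rows, logging and skipping any row that fails to parse,
    so one bad row out of millions doesn't kill the whole sync."""
    bad = 0
    for raw in raw_rows:
        try:
            yield parse(raw)
        except Exception:
            bad += 1
            logging.exception("Skipping bad row: %r", raw)
    if bad:
        # the logged rows are the remaining issues to fix
        logging.warning("Swallowed %d bad rows", bad)

# e.g. one malformed value hiding among otherwise clean rows
rows = list(safe_rows(["1", "2", "not-a-number", "3"], int))
```

Once a run completes this way, the log tells you exactly which rows are your only remaining problem.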
A lot of the time, if my tap runs successfully but it's failing on the target side, the easy thing to do is to dump your tap's output to a big file, i.e. `meltano invoke tap > out`,
and then just pipe it into your target: `cat out | target-blah`. If your data is small enough, this works. It's really nice for isolating issues (the Singer framework helps with a huge class of errors just from the piping functionality and letting you break the pipeline apart yourself).
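The same isolate-by-piping idea can be sketched in plain Python: replay the captured `out` file into a stand-in target one Singer message at a time, so the first failing line identifies itself. The messages and handler below are made up for illustration:

```python
import json

def replay(lines, handle_record):
    """Feed captured Singer messages to a target-like handler,
    reporting exactly which line breaks it."""
    for lineno, raw in enumerate(lines, 1):
        try:
            msg = json.loads(raw)
            if msg.get("type") == "RECORD":
                handle_record(msg["record"])
        except Exception as exc:
            raise RuntimeError(f"line {lineno} broke the target: {exc!r}") from exc

# stand-in for the contents of the `out` file
captured = [
    '{"type": "SCHEMA", "stream": "my_target_table", "schema": {"properties": {}}}',
    '{"type": "RECORD", "stream": "my_target_table", "record": {"id": 1}}',
]
loaded = []
replay(captured, loaded.append)
```

With real data you would read `open("out")` instead of the hardcoded list; the point is that a captured stream can be replayed as many times as you like while narrowing down the bad message.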
e
@visch the issue is my code, so sadly there's no need to review further (my first Python-based version was quite brute force). I've actually written the C++ version this weekend and I'm ecstatic to see I'm performing on par with my back-of-the-napkin figures. I'm going to demo to a few more people, but pretty soon I'll be ready to start showing how I set up Meltano and to push more updates to the open-source charm that I use.
What I've noticed is that when I use the C++ version of the vendor's API, everything runs smoother. I think this is twofold: 1. this piece is more tested than their Python API, and 2. my code is cognizant that you need to pace the calls at one call per second per client. So far, with that logic in place, it's been super smooth sailing.
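The pacing idea (at most one call per second per client) can be sketched roughly like this. The real implementation is C++; this is just an illustrative Python version with hypothetical names:

```python
import time

class PacedClient:
    """Wrap calls so consecutive ones are at least `min_interval`
    seconds apart, e.g. 1.0 for 'one call per second per client'."""
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = float("-inf")  # no previous call yet

    def call(self, fn, *args, **kwargs):
        wait = self.min_interval - (time.monotonic() - self._last)
        if wait > 0:
            time.sleep(wait)  # pace the call instead of hammering the API
        self._last = time.monotonic()
        return fn(*args, **kwargs)

# short interval so the demo runs quickly
client = PacedClient(min_interval=0.05)
start = time.monotonic()
results = [client.call(lambda i=i: i * 2) for i in range(3)]
elapsed = time.monotonic() - start  # >= two paced gaps after the first call
```

Using a monotonic clock keeps the throttle correct even if the wall clock jumps.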
If my next calculations are correct, I will be able to query the entire dataset in about 45 minutes, but I need to do some Juju magic to instantiate multiple instances of this client. To the tap, it will just be a different port to spread queries out amongst its list of queries.
OR I spend some effort and create a thread pool in C++ and handle it that way, with a pool of clients, so that to the tap it's all just one endpoint. This is likely the most desirable way, but it takes more time.
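That pool-of-clients design (the tap sees one endpoint; the pool spreads queries across clients) could look roughly like this. It's sketched in Python rather than C++, with plain functions standing in for the vendor API clients:

```python
import queue
from concurrent.futures import ThreadPoolExecutor

class ClientPool:
    """One endpoint in front of several clients: each submitted query
    borrows a free client, runs, and hands the client back."""
    def __init__(self, clients):
        self._free = queue.Queue()
        for c in clients:
            self._free.put(c)
        self._executor = ThreadPoolExecutor(max_workers=len(clients))

    def query(self, q):
        def run():
            client = self._free.get()   # borrow a client (blocks if all busy)
            try:
                return client(q)
            finally:
                self._free.put(client)  # return it to the pool
        return self._executor.submit(run)

# three fake "clients"; real ones would be paced vendor API connections
pool = ClientPool([lambda q: q * 2 for _ in range(3)])
futures = [pool.query(n) for n in range(6)]
results = sorted(f.result() for f in futures)
```

The queue guarantees each client handles one query at a time, which composes naturally with per-client call pacing.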
To be continued... Again, @visch, big thanks on that logging thing.
I owe you a beverage of your choice
v
Haha, no beverage needed... yet! I do want to come visit Sweden again in the next few years. Skål 😄