# plugins-general
e
I have an install with several extractors inheriting from tap-postgres, transferwise variant. Now, I want to try out 1 new extractor that is my own fork for transferwise-tap-postgres. Can I just do it like this?
- name: tap-postgres-experiment
  inherit_from: tap-postgres
  variant: transferwise
  pip_url: git+https://github.com/ers81239/pipelinewise-tap-postgres.git@time_based_sync
  config: ....
I've tried this and then `meltano install extractor tap-postgres-experiment` but I get `Extractor 'tap-postgres-cb_stats' is not known to Meltano`.
d
@edward_smith Can you try removing the `variant: transferwise`?
I assume that's already mentioned under the `tap-postgres` definition being inherited from
e
I actually don't have an entry with `- name: tap-postgres` in meltano.yml at this point, although all of the other extractors are working with that same inherit line in their config. I'll get to trying some things on this in a couple hours.
I have 4 "base" extractors, each named after the database server it connects to, and then 10 extractors that inherit from those 4, each one handling some or all of the tables from its inherited database config. The 4 base extractors inherit from "tap-postgres", although there is no extractor by that name in the file.
I'm having trouble making progress here... is it supported to have 2 variants of the same tap installed at the same time? It seems that I may need to either add mine as a custom extractor or use another instance of meltano.
d
is it supported to have 2 variants of the same tap installed at the same time?
Yes, that's fine. Can you share some more of your `meltano.yml`? The error you shared mentions `tap-postgres-cb_stats` but I haven't seen the definition for that one
e
Sure... I have 4 like this:
extractors:
  - name: tap-postgres-cb_tipping
    inherit_from: tap-postgres
    variant: transferwise
    config:
      host: 10.128.32.63
      user: data
      dbname: cb_tipping
each with a different name/host/credentials
Then a bunch that follow this pattern of inheriting from one of the above and then naming specific tables to sync. This is done to parallelize the sync across the 5 available db connections:
  - name: cb_tipping-0
    inherit_from: tap-postgres-cb_tipping
    metadata:
      public-tipping_bonustokensclaimed:
        replication-method: INCREMENTAL
        replication-key: id
    select:
    - public-tipping_bonustokensclaimed.*
cb_tipping-0 through cb_tipping-4 each have around 10 tables defined.
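The split described above could be generated rather than written by hand. This is a minimal Python sketch, assuming a plain round-robin assignment is good enough and that every table uses `INCREMENTAL` replication on `id`. The helper name `split_tables` and every table name except `public-tipping_bonustokensclaimed` are hypothetical, and the output only mirrors the shape of the `meltano.yml` entries shown above; it does not call any Meltano API.

```python
def split_tables(parent, prefix, tables, groups=5):
    """Spread tables round-robin over up to `groups` child extractor definitions."""
    plugins = []
    for i in range(groups):
        chunk = tables[i::groups]  # every `groups`-th table, starting at index i
        if not chunk:
            continue  # fewer tables than groups: do not emit empty extractors
        plugins.append({
            "name": f"{prefix}-{i}",
            "inherit_from": parent,
            # Assumption: all tables replicate incrementally on an `id` column.
            "metadata": {
                t: {"replication-method": "INCREMENTAL", "replication-key": "id"}
                for t in chunk
            },
            "select": [f"{t}.*" for t in chunk],
        })
    return plugins


# Hypothetical table list; only the first name appears in the conversation.
tables = ["public-tipping_bonustokensclaimed"] + [
    f"public-example_{n}" for n in range(11)
]
children = split_tables("tap-postgres-cb_tipping", "cb_tipping", tables)
for child in children:
    print(child["name"], len(child["select"]), "tables")
```

Each resulting dict could then be dumped to YAML and pasted under `extractors:`; tables that need a different replication key would have to be adjusted by hand.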
The above is all working well, but I have 1 table that can't sync using the stock methods, so I'm building a new sync strategy:
  - name: tap-postgres-cb_stats
    variant: transferwise
    pip_url: git+https://github.com/ers81239/pipelinewise-tap-postgres.git@time_based_sync
    config:
      host: 10.128.41.10
      user: data
      dbname: cb_stats
      log_level: DEBUG
d
@edward_smith Can you add `inherit_from: tap-postgres` to that `tap-postgres-cb_stats` definition?
e
Sure
And then `meltano install tap-postgres-cb_stats`?
d
Yep, `meltano install extractor tap-postgres-cb_stats`, that is
e
Right... ok, that ran. I figured I would have tried this combo already, but maybe not... let me see if it got the right version
Sure did, thanks!
d
🎉
e
It's running now... overall my sense is that this would be simpler if the name parameter were deconflicted from the desired plugin, possibly by adding a `plugin` configuration item so that you could have:
- name: production-db-tap
  plugin: tap-postgres
  variant: pipelinewise
  pip_url: http+git.......
d
Yeah, I can see that. The current behavior is described here: https://meltano.com/docs/project.html#shadowing-plugin-definitions. Want to create an issue for that? 🙂
e
Ahhh... so `inherit_from` basically works like `plugin` as I was describing. That makes sense... something I kept coming back to which is a little confusing is this sentence in the docs:
To add a new plugin to your project that inherits from an existing plugin, so that it can reuse the same package but override (parts of) its configuration, you can use the `--inherit-from` option on `meltano add`
In this case, I'm inheriting but using a different package
d
Hmm right, since you're overriding `pip_url`, "same package" isn't strictly true anymore, even though from Meltano's perspective you're using the same "base package definition" for `tap-postgres`/`variant: transferwise`, but you're just overriding the specific URL to get the package from
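One way to picture that override is a lookup that walks up the `inherit_from` chain and takes the nearest value it finds. The Python below is only a simplified model of that idea, not Meltano's actual code; the function name `effective_setting` is hypothetical, and the plugin dicts are trimmed copies of the definitions quoted in this thread.

```python
def effective_setting(plugins, name, key):
    """Return the nearest value for `key` along the inherit_from chain, or None."""
    by_name = {p["name"]: p for p in plugins}
    seen = set()  # guards against an accidental inheritance cycle
    while name in by_name and name not in seen:
        seen.add(name)
        plugin = by_name[name]
        if key in plugin:
            return plugin[key]
        name = plugin.get("inherit_from")
    # Not set anywhere in the project file: in this model the base
    # definition's default would apply.
    return None


plugins = [
    {
        "name": "tap-postgres-cb_stats",
        "inherit_from": "tap-postgres",
        "variant": "transferwise",
        "pip_url": "git+https://github.com/ers81239/pipelinewise-tap-postgres.git@time_based_sync",
    },
    {
        "name": "tap-postgres-cb_tipping",
        "inherit_from": "tap-postgres",
        "variant": "transferwise",
    },
]

print(effective_setting(plugins, "tap-postgres-cb_stats", "pip_url"))
print(effective_setting(plugins, "tap-postgres-cb_tipping", "pip_url"))
```

In this model only `tap-postgres-cb_stats` carries its own `pip_url`; the other extractor falls through to whatever the base `tap-postgres` definition provides.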
e
Right. Seems I'm not out of the woods...I have:
  - name: tap-postgres-cb_stats
    inherit_from: tap-postgres
    variant: transferwise
    pip_url: git+https://github.com/ers81239/pipelinewise-tap-postgres.git@time_based_sync
    config:
      host: 10.128.41.10
      user: data
      dbname: cb_stats
  - name: cb_stats-0
    inherit_from: tap-postgres-cb_stats
    metadata:
      public-stats_broadcasterstats:
        replication-method: TIME-BASED
        replication-key: time
        replication-time-interval: 10 MINUTES
    select:
    - public-stats_broadcasterstats.*
But when I run:
meltano --log-level debug elt cb_stats-0 target-snowflake --job_id cb_stats-0
cb_stats-0             |   File "/home/ec2-user/meltano-projects/cb-tipping-1/.meltano/extractors/cb_stats-0/venv/lib64/python3.7/site-packages/tap_postgres/__init__.py", line 106, in sync_method_for_streams
cb_stats-0             |     raise Exception("Unrecognized replication_method {}".format(replication_method))
However, the correct package is installed in `/.meltano/extractors/tap-postgres-cb_stats/venv/lib/python3.7/site-packages/tap_postgres/`, not `.meltano/extractors/cb_stats-0/venv/lib64/python3.7/site-packages/tap_postgres`.
I was able to fix this by also running `meltano install extractor cb_stats-0`... this is that same case we discussed yesterday where the installed version and the configured version are out of sync.
d
Yeah, that's an annoying limitation right now, having to remember to reinstall all of the descendant plugins when updating the URL for the parent. https://gitlab.com/meltano/meltano/-/issues/2701 and https://gitlab.com/meltano/meltano/-/issues/2490 are related in that sense
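Until that limitation is addressed, the list of plugins to reinstall can be worked out from the `inherit_from` entries. This is a rough Python sketch under the assumption that the extractor definitions have already been loaded into a list of dicts; the function name `descendants` is hypothetical, and the script only prints the `meltano install extractor <name>` form used earlier in this thread rather than running anything.

```python
def descendants(plugins, parent):
    """Return names of all plugins inheriting from `parent`, directly or indirectly."""
    children = {}
    for p in plugins:
        if "inherit_from" in p:
            children.setdefault(p["inherit_from"], []).append(p["name"])
    found, queue = [], [parent]
    while queue:
        # Breadth-first walk: direct children first, then their children.
        for child in children.get(queue.pop(0), []):
            if child not in found:
                found.append(child)
                queue.append(child)
    return found


# Trimmed copies of the definitions from the conversation.
extractors = [
    {"name": "tap-postgres-cb_stats", "inherit_from": "tap-postgres"},
    {"name": "cb_stats-0", "inherit_from": "tap-postgres-cb_stats"},
    {"name": "tap-postgres-cb_tipping", "inherit_from": "tap-postgres"},
    {"name": "cb_tipping-0", "inherit_from": "tap-postgres-cb_tipping"},
]

parent = "tap-postgres-cb_stats"
for name in [parent] + descendants(extractors, parent):
    print(f"meltano install extractor {name}")
```

For the `cb_stats` case this prints one command for the parent and one for `cb_stats-0`, which are the two installs that had to be run by hand above.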
e
Cool... thanks for the help, and this new replication strategy seems to be working dandy:
cb_stats-0       | time=2021-05-07 15:39:11 name=tap_postgres level=INFO message=next replication key value: 2019-02-22 02:55:00+00:00
cb_stats-0       | time=2021-05-07 15:39:11 name=tap_postgres level=INFO message=select statement: SELECT  ...
cb_stats-0       |                                     FROM "public"."stats_broadcasterstats"
cb_stats-0       |                                     WHERE  "time"  >= '2019-02-22 02:55:00+00:00'::timestamp with time zone
cb_stats-0       |                                     AND    "time"  < '2019-02-22 02:55:00+00:00'::timestamp with time zone + INTERVAL '10 MINUTES' with itersize 20000
cb_stats-0       | time=2021-05-07 15:39:18 name=tap_postgres level=INFO message=next replication key value: 2019-02-22 03:05:00+00:00
cb_stats-0       | time=2021-05-07 15:39:18 name=tap_postgres level=INFO message=select statement: SELECT  .....
cb_stats-0       |                                     FROM "public"."stats_broadcasterstats"
cb_stats-0       |                                     WHERE  "time"  >= '2019-02-22 03:05:00+00:00'::timestamp with time zone
cb_stats-0       |                                     AND    "time"  < '2019-02-22 03:05:00+00:00'::timestamp with time zone + INTERVAL '10 MINUTES' with itersize 20000
cb_stats-0       | time=2021-05-07 15:39:21 name=singer level=INFO message=METRIC: {"type": "counter", "metric": "record_count", "value": 113262, "tags": {}}
This was built because the database servers have 5-minute query timeouts, and this table is huge and can't be sorted in any way in that time.
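The windowing visible in the log above (a lower bound, an upper bound one interval later, then the upper bound becoming the next replication key value) can be sketched in a few lines of Python. This is an illustration of the general idea only, not the code in the fork; `time_windows` is a hypothetical name, and the stop time is made up for the example.

```python
from datetime import datetime, timedelta, timezone


def time_windows(start, stop, interval):
    """Yield half-open [lower, upper) windows from `start` until `stop` is covered."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    lower = start
    while lower < stop:
        upper = lower + interval
        yield lower, upper
        lower = upper  # the next bookmark is the previous upper bound


# Start matches the first bookmark in the log; the stop time is hypothetical.
start = datetime(2019, 2, 22, 2, 55, tzinfo=timezone.utc)
stop = datetime(2019, 2, 22, 3, 25, tzinfo=timezone.utc)
windows = list(time_windows(start, stop, timedelta(minutes=10)))

for lower, upper in windows:
    # Each window corresponds to one bounded query of the form
    #   WHERE "time" >= lower AND "time" < upper
    print(lower.isoformat(), "->", upper.isoformat())
```

Because each query touches only one bounded slice of the replication key, no single statement needs to sort or scan the whole table, which is what keeps it under the timeout.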
d
Nice! Do you expect to contribute that back to tap-postgres?
e
Yes, I figured I'd run it for a bit and work out any kinks, but it is already public here: https://github.com/ers81239/pipelinewise-tap-postgres
This was just a detour from tap-redis; that one is still coming, too 🙂
d
Nice 🙂 Looking forward to having it on the brand-new Hub 😄 https://meltano.slack.com/archives/CFG3C3C66/p1620400483433100
@edward_smith If we have time later in Demo Day, maybe you want to demo your tap-postgres fork or tap-redis?
e
I could definitely do a really short code walkthrough for the tap-postgres TIME_BASED replication strategy