# getting-started
d
Hi All, I have a more general question. When working with taps, is it normal for the catalog I generate to have issues for my targets? What I mean is that I am using tap-googleads with target-postgres. If I generate the catalog and run a test with target-jsonl, all works fine. However, if I use the stock catalog and try this with postgres, I need to go in and make lots of changes. Some examples of changes are column naming notation and selection of primary keys. I expected catalog generation to produce something that all loaders could use, but I'm starting to see that is not the case. I do not mind making these updates but want to make sure it is the proper workflow. The end result seems to be a pretty customized catalog file that I need to reference from meltano.yml.
a
I have found this specifically with tap-googleads, but as a minor tap developer I can see how easily it can happen. I did not do a custom catalog but I had to customize metadata in meltano.yml to define primary keys. I suspect most developers of taps are working with a specific target in mind - I know I am guilty of this and just tend to test with target-jsonl and target-postgres, as postgres is what I use in production. The issue becomes even more tricky when json flattening is involved. Perhaps some good practice around testing would help. It's relatively simple to spin up local postgres in docker, but I would have no idea how to test my taps with bigquery/snowflake as I don't have those services provisioned.
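For anyone wondering what "defining primary keys" amounts to at the catalog level, here is a minimal Python sketch, assuming a standard Singer catalog layout (a `streams` list whose entries carry `key_properties` and a `metadata` list). The function name `set_key_properties` and the `campaign` stream with an `id` key are hypothetical, chosen only for illustration; they are not taken from tap-googleads.

```python
import copy


def set_key_properties(catalog, stream_id, keys):
    """Return a copy of a Singer catalog with primary keys set on one stream.

    Hypothetical helper: it only illustrates which catalog fields carry
    primary-key information. It is not part of Meltano or any tap.
    """
    patched = copy.deepcopy(catalog)
    for stream in patched.get("streams", []):
        if stream.get("tap_stream_id") != stream_id:
            continue
        # Top-level key list on the catalog entry.
        stream["key_properties"] = list(keys)
        # Stream-level metadata is the entry with an empty breadcrumb.
        for entry in stream.setdefault("metadata", []):
            if list(entry.get("breadcrumb", [])) == []:
                entry.setdefault("metadata", {})["table-key-properties"] = list(keys)
                break
        else:
            # No stream-level entry existed yet, so add one.
            stream["metadata"].append(
                {"breadcrumb": [], "metadata": {"table-key-properties": list(keys)}}
            )
    return patched


# Example with a made-up stream name and key.
catalog = {"streams": [{"tap_stream_id": "campaign", "schema": {}, "metadata": []}]}
patched = set_key_properties(catalog, "campaign", ["id"])
```

The original catalog is left untouched, so the patched copy can be written to a separate file and compared against the stock one.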
e
Some examples of changes are column naming notation
If you're a tap developer, I think something like #2631 might help. Rather than having the target try to normalize column names or fail, nudge the tap developer into doing that upstream.
and selection of primary keys.
This is a bit tougher, but I think target-postgres handles missing primary keys by doing append-only. Is that not the case?
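As a rough illustration of normalizing column names upstream in the tap, here is a minimal Python sketch. The helpers `to_snake_case` and `normalize_record` are hypothetical names, and the rules (non-alphanumeric runs become underscores, camelCase boundaries are split, everything is lowercased) are one possible convention, not what any particular target requires.

```python
import re


def to_snake_case(name):
    """Convert a column name to a conservative snake_case form."""
    # Collapse dots, spaces, dashes and other separators into underscores.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    # Split camelCase boundaries: customerId -> customer_Id.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.strip("_").lower()


def normalize_record(record):
    """Return a copy of a record with every top-level key normalized."""
    return {to_snake_case(key): value for key, value in record.items()}


# Example keys are made up for illustration.
row = normalize_record({"campaign.id": 1, "customerId": 2, "Campaign Name": "x"})
```

If a tap did something like this, the same mapping would have to be applied to the schema it emits, so that records and schema stay in agreement.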
a
I think this pr resolved the PK issues on googleads: https://github.com/Matatika/tap-googleads/pull/69/files
e
Ah, it just hasn't been tagged
a
Love it when I get to remove something from meltano.yml 🙂
1
d
In my experience we've come across similar situations. We either go in and make the required updates to make the target compatible, or store the data as csv or jsonl and then from there send it to our destination (which for us is s3). The latter ends up being a two-step ELT process, in the sense that it takes two steps to land the file where required. Admittedly we've only done the latter in development environments.
m
Hello everyone, I've been trying Meltano for the past few days and it's a really good solution 😉. Lately, I've tried to use tap-googleads with target-postgres and it works fine, except I cannot specify the final table schema I want. Here is what I have inside my meltano.yml (I don't know if what I'm doing is correct):
plugins:
  extractors:
  - name: tap-googleads
    variant: singer-io
    pip_url: git+https://github.com/singer-io/tap-google-ads.git
    use_cached_catalog: false
    select:
    # CAMPAIGN INFORMATIONS
    - campaign_performance_report.customer_id
    - campaign_performance_report.customer_descriptive_name
    - campaign_performance_report.campaign_id
    - campaign_performance_report.campaign_name
    # etc.
    select_filter:
    # Same columns
I checked the tap.properties.json and all my fields are marked as "available", while when I run meltano select, only the fields I have in my yml are marked as selected. So when I run meltano run tap-googleads-2 target-postgres --full-refresh --refresh-catalog --no-install, all columns are used when creating the destination table, while I just want the columns I selected using the meltano select command. What should I do to only have the columns I specified inside my yml config?
e
Unfortunately I think that tap doesn't prune unselected fields from the schema: https://github.com/singer-io/tap-google-ads/blob/0f0bf57a1baa6ad1c719e58f3f8a47b4a5d35518/tap_google_ads/sync.py#L117 The Singer SDK would handle that automatically, but singer-io/tap-google-ads is not based on it: https://github.com/meltano/sdk/blob/de71fed2919d1ccc2dde72b1af27c660a4eb1114/singer_sdk/helpers/_catalog.py#L19-L27 I wonder if Meltano should do this for all taps (https://github.com/meltano/meltano/issues/2430).
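To make "pruning unselected fields from the schema" concrete, here is a minimal Python sketch. It assumes Singer-style metadata entries (a `breadcrumb` plus a `metadata` dict with `inclusion`, `selected` and `selected-by-default`) and handles top-level properties only. The function names `is_selected` and `prune_schema` are hypothetical, and this is a simplified illustration, not the Singer SDK's actual implementation.

```python
def is_selected(field_meta):
    """Decide whether a field is selected from its Singer metadata dict."""
    inclusion = field_meta.get("inclusion")
    if inclusion == "automatic":
        return True   # automatic fields are always emitted
    if inclusion == "unsupported":
        return False  # unsupported fields can never be emitted
    if "selected" in field_meta:
        return bool(field_meta["selected"])
    return bool(field_meta.get("selected-by-default", False))


def prune_schema(schema, metadata):
    """Return a copy of the schema keeping only selected top-level properties."""
    by_field = {
        entry["breadcrumb"][1]: entry.get("metadata", {})
        for entry in metadata
        if len(entry.get("breadcrumb", [])) == 2
        and entry["breadcrumb"][0] == "properties"
    }
    pruned = dict(schema)
    pruned["properties"] = {
        name: subschema
        for name, subschema in schema.get("properties", {}).items()
        if is_selected(by_field.get(name, {}))
    }
    return pruned


# Example using field names from the select rules above; the metadata is made up.
schema = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "integer"},
        "campaign_name": {"type": "string"},
        "clicks": {"type": "integer"},
    },
}
metadata = [
    {"breadcrumb": ["properties", "customer_id"],
     "metadata": {"inclusion": "available", "selected": True}},
    {"breadcrumb": ["properties", "campaign_name"],
     "metadata": {"inclusion": "available", "selected": True}},
    {"breadcrumb": ["properties", "clicks"],
     "metadata": {"inclusion": "available"}},
]
pruned = prune_schema(schema, metadata)
```

A target that creates columns from the SCHEMA message would then only see the selected fields, which is the behaviour being asked for here.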
m
okay thank you @Edgar Ramírez (Arch.dev) for the clear info ! I didn't have those in mind. Would love to see a google-ads tap based on the Singer SDK. Maybe i can help or is it on the way ?
a
m
Yes but this one doesn't have a keyword stream like the other one (only click_view which is not helpful in my case)
👍 1
👀 1
a
Understood. I have proposed a few extra stream definitions but they were not merged - quite sensibly I think as everyone would have their own requirements. Discussion here: https://github.com/Matatika/tap-googleads/pull/34 Instead you would need to fork it and add your own definitions. My stream definitions are shared in the discussions: https://github.com/Matatika/tap-googleads/discussions
And thanks for the link to singer-io version, I might add some of my own streams from these definitions!
e
Maybe i can help or is it on the way ?
Matatika's is the main one at the moment, but like Andy mentioned it's missing a few things.
d
I also noticed that the keywords stream is missing from the Matatika version. I started exploring the Airbyte variant but have run into an RPC error when doing catalog discovery. Has anyone else on this thread tried the googleads Airbyte variant with success?
e
I think @Pat Nadolny (Arch) has.
d
Andy, thanks for the info on the streams. I was able to fork the repo and add your streams. Good stuff and I have them flowing through. The conversion and pmax stuff will be a big help for us. I was also able to create a couple of streams for keyword and key phrase performance. This is exactly what we need right now. Thanks.
🙌 1
r
Given that a couple of people are asking for a keywords stream here, we would be open to a PR to add it. 🙂
🙌 1
p
I started exploring the Airbyte variant but have run into an RPC error when doing catalog discovery
I've used both variants. I have also seen the Airbyte RPC error, and I've noticed that it's way slower and requires more requests; I'm not sure if it's actually getting more data or what. The matatika variant worked great and ran fast, but the stream coverage is lower as mentioned, so it depends on what you need. Adding a custom_queries config to the matatika variant, like the airbyte variant has, would be nice: https://hub.meltano.com/extractors/tap-googleads--airbyte/#airbyte_config-custom_queries-setting. Then users can define their own streams if they don't exist yet.
👍 1
r
a
@dbname reading back through this, would you mind sharing your keyword and phrase performance stream definitions in the discussions here? https://github.com/Matatika/tap-googleads/discussions
1
r
I also asked for the keyword performance stream as a PR back: https://github.com/Matatika/tap-googleads/pull/88 Worth mentioning that @Pat Nadolny (Arch) is working on https://github.com/Matatika/tap-googleads/pull/93 as well. 😎
a
hot diggity dog - looks like some nice improvements