Hello everybody! I am starting a new project soon ...
# getting-started
a
Hello everybody! I am starting a new project soon where data extracted from Facebook-ads, Criteo and Bing-ads needs to be loaded into BigQuery. Of course, I would love to get the chance to use Meltano and finally make my colleagues understand why it is so great! Since I am no expert, I would like to know how to get all the official documentation available for these taps, what kind of data can be extracted from them (especially Bing-ads) and if these 3 taps can be called inside the same `meltano.yml`. Thanks in advance!
p
Hey @andrea_radaelli! Glad to hear you're trying out Meltano. The Meltano docs are at https://docs.meltano.com/ and you can find all the connectors on MeltanoHub at https://hub.meltano.com/. Sometimes clicking through to the tap repo README gives you more info as well. In terms of listing streams, you can run something like `meltano select tap-<x> --list --all` to list the available streams to select from (see docs). Specifically for Bing, the default variant looks like it uses these reports: https://github.com/singer-io/tap-bing-ads/blob/master/tap_bing_ads/reports.py#LL1
Also, for Criteo, the variants listed on the hub look a little stale, but I found https://github.com/edgarrmondragon/tap-criteo. @edgar_ramirez_mondragon should I add that to the hub? Any insights into the variant options and why you chose to build your own?
a
Hello @pat_nadolny and thanks for your reply. I followed your suggestions and found everything! From the linked documentation you provided (here), I see that the `meltano select tap-<x> --list --all` command lists all streams available for a specific tap instance (e.g. for `tap-facebook` you need to provide some required keys before being able to run a `select` command). What I would really love to find, is a complete description of all available streams (and their composition) that a tap can provide before configuring a connection to a specific instance. While researching, I found this documentation, which is somewhat similar to what I'm trying to find. I also searched the Git repositories and found these JSON schemas (tap-facebook, tap-criteo and none for tap-bing-ads). Are these schemas a representation of a stream's composition?
p
@andrea_radaelli glad to hear!
What I would really love to find, is a complete description of all available streams (and their composition) that a tap can provide before configuring a connection to a specific instance.
There are two challenges with this right now:
1. Some connectors use dynamic schema generation, based either on the data that's retrieved or on a schema requested from the source. It looks like Bing is doing that, so you wouldn't be able to list a schema without credentials. There might be ways around this, like hard-coding the schema, but for a lot of these dynamic-schema sources it's not possible (e.g. tap-csv or tap-google-sheets always need to view the data first).
2. Some connectors unnecessarily require credentials before they let you discover the streams/schemas, even if the schemas are static, like tap-facebook. This is something that the SDK resolves, but there are still many popular taps that are not built on the SDK standard.
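The first point can be illustrated with a small Python sketch. This is not how tap-csv, tap-google-sheets or tap-bing-ads actually implement discovery; it only shows why a tap whose schema depends on the data has nothing to report until it can read some records. The `infer_schema` helper and the sample rows are hypothetical.

```python
def json_type(value):
    """Return the JSON Schema type name for a Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    return "string"


def infer_schema(records):
    """Build a JSON Schema by collecting the types seen for each field."""
    seen = {}
    for record in records:
        for field, value in record.items():
            seen.setdefault(field, set()).add(json_type(value))
    properties = {field: {"type": sorted(types)} for field, types in seen.items()}
    return {"type": "object", "properties": properties}


# Hypothetical rows, standing in for data fetched with valid credentials.
rows = [
    {"campaign": "spring_sale", "clicks": 120, "cost": 35.5},
    {"campaign": "summer_sale", "clicks": None, "cost": 12.0},
]

schema = infer_schema(rows)
print(schema)
print(infer_schema([]))  # no data available, so no properties can be listed
```

With no rows to inspect, the inferred schema has an empty `properties` object, which is roughly the situation a dynamic-schema tap is in before it has credentials.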
It sounds like you found https://hub.meltano.com/singer/spec, but for context: the way that taps are run manually outside of Meltano is by first discovering the schema (i.e. the catalog), then passing that catalog to the tap when running a sync (again, Meltano handles all of this for you). That catalog is what you found here; it is the output of running the `--discover` command. The discovery command is what I'm referring to in my answer above: some taps use static schema discovery (e.g. Facebook) and some use dynamic.
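To make the catalog idea concrete, here is a minimal Python sketch that summarizes a Singer catalog, i.e. the JSON a tap prints when run with `--discover`. It assumes the usual catalog layout: a top-level `streams` array whose entries carry a `tap_stream_id` and a JSON Schema under `schema`. The sample catalog, its stream names and the `summarize_catalog` helper are made up for illustration and do not come from any real tap.

```python
import json

# Hypothetical catalog, shaped like the output of `tap-<x> --discover`.
SAMPLE_CATALOG = """
{
  "streams": [
    {
      "tap_stream_id": "campaigns",
      "stream": "campaigns",
      "schema": {
        "type": "object",
        "properties": {
          "id": {"type": ["null", "string"]},
          "name": {"type": ["null", "string"]},
          "spend": {"type": ["null", "number"]}
        }
      }
    },
    {
      "tap_stream_id": "ads",
      "stream": "ads",
      "schema": {"type": "object", "properties": {"id": {"type": "string"}}}
    }
  ]
}
"""


def summarize_catalog(catalog):
    """Map each stream id to {property name: list of JSON Schema types}."""
    summary = {}
    for stream in catalog.get("streams", []):
        stream_id = stream.get("tap_stream_id") or stream.get("stream")
        properties = stream.get("schema", {}).get("properties", {})
        fields = {}
        for name, spec in properties.items():
            types = spec.get("type", [])
            # JSON Schema allows "type" to be a single string or a list.
            fields[name] = [types] if isinstance(types, str) else list(types)
        summary[stream_id] = fields
    return summary


summary = summarize_catalog(json.loads(SAMPLE_CATALOG))
for stream_id, fields in summary.items():
    print(stream_id)
    for name, types in fields.items():
        print(f"  {name}: {', '.join(types)}")
```

With a real tap you would load the saved discovery output instead of `SAMPLE_CATALOG`. For taps that need credentials to discover, that file can only be produced once the tap is configured.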
a
@pat_nadolny
Some connectors use dynamic schema generation based on the data that's retrieved or by requesting a schema from the source. It looks like bing is doing that
This is both interesting and useful! Thank you so much