# plugins-general
e
Hi, I am interested in the Meltano and DataHub integration feature. I see that I can use the Meltano utility plugin to run `datahub ingest` through Meltano commands. What is the benefit of doing this, and how does it differ from running `datahub ingest` directly?
For example, can we run an extract and load from S3 to Redshift, and then run `datahub ingest` against S3 and Redshift to extract metadata from the load results?
s
Hi @eburairu, great question. The benefit is integration into any kind of workflow. You typically want to refresh your DataHub catalog after refreshing data. With the Meltano utility (which just wraps the datahub CLI!) you can do things like:
1. ingest a REST API into S3
2. refresh DataHub to reflect this
3. ingest data from the S3 data lake into Snowflake and run dbt over it
4. refresh DataHub again (using the dbt connector) to reflect this as well
5. ...
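A minimal sketch of how such a chain could be scripted, assuming the steps are driven from Python. The plugin names (`tap-rest-api`, `target-s3`, `tap-s3`, `target-snowflake`, `dbt-snowflake`, `datahub`) and the recipe paths are hypothetical placeholders, not taken from this thread; they would need to match whatever is defined in your own `meltano.yml`.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical plugin names and recipe paths: replace them with the
# plugins and DataHub recipe files configured in your own project.
PIPELINE: List[List[str]] = [
    # 1. extract and load: REST API -> S3
    ["meltano", "run", "tap-rest-api", "target-s3"],
    # 2. refresh the catalog from S3 via the wrapped datahub CLI
    ["meltano", "invoke", "datahub", "ingest", "-c", "recipes/s3.yml"],
    # 3. load S3 -> Snowflake, then run dbt in the same invocation
    ["meltano", "run", "tap-s3", "target-snowflake", "dbt-snowflake:run"],
    # 4. refresh the catalog again, this time from the dbt artifacts
    ["meltano", "invoke", "datahub", "ingest", "-c", "recipes/dbt.yml"],
]


def run_pipeline(
    steps: Sequence[Sequence[str]],
    runner: Callable = subprocess.run,
) -> int:
    """Run each command in order and stop at the first failure.

    Returns the number of steps that completed successfully.
    """
    completed = 0
    for cmd in steps:
        result = runner(list(cmd))
        if result.returncode != 0:
            # Do not refresh the catalog if the preceding load failed.
            raise RuntimeError(f"step failed: {' '.join(cmd)}")
        completed += 1
    return completed


if __name__ == "__main__":
    run_pipeline(PIPELINE)
```

The point of the sketch is only the ordering: each catalog refresh runs right after the load it describes, and a failed load stops the chain before DataHub is updated.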
e
It would have been a waste of resources to build Airflow or Argo workflows just to run dbt or datahub ingestion, so I think it's great!
However, how should Meltano execution be controlled? Is it still best practice to use a workflow engine such as Airflow?
s
Well, Meltano has a pretty slim footprint there. You can use Airflow, but you can also just use an existing Prefect or Dagster (cloud) orchestrator. If you're just starting out, you can go with one big instance and use cron until your needs grow enough to justify deploying Airflow as well.
If you want to go even slimmer, GitHub Actions or the Azure/GitLab equivalents are nice, quick ways to get started.
(In that case you would use GitHub Actions schedules, which are basically cron, to trigger execution.)
e
Thank you! Our dev environment includes Airflow and Argo Workflows built on Kubernetes. We will first try to implement an ETL pipeline and metadata extraction to DataHub using Meltano. I had heard that Meltano was an ETL tool without a GUI, but it turns out that it is no longer just an ETL tool. 😮
s
Well no, we've got some other awesome stuff: Great Expectations support, deploying BI tools, and using Jupyter notebooks from Meltano. But what we're building out at full power right now are the extract & load capabilities. (Although I think DataHub, for instance, ties in very well with that.)
e
That's fantastic! I look forward to trying out what is possible with Meltano. Since there are still few articles about Meltano on Japanese technical article pages, I plan to post the implementation details.
We tried Meltano, but it did not work in our environment because of errors with unmaintained loaders such as target-redshift.
I think Meltano itself is a great product, but I felt it was important to know who maintains the plugins and how they are managed.