# singer-tap-development
s
Hey team; very off-topic, so please direct me towards the best channel. I'm just hard-pressed to find a better team of Data Engineers 😛 After creating a solid basis for our EL pipeline using Meltano, our next big issue as a company is applying our T directly in our warehouse (GBQ). The goal here is to keep a clear separation between the raw data imported directly from the source and the data that ends up in our dashboards. Would anyone have insights on best practices for implementing the pipeline between your warehouse and your dashboards? This may include:
• Creating additional views with dbt
• Manipulating complicated data using Python
• Applying auditing
• Using tools to distribute load
Thank you so much!
d
Not sure I can dictate best practices, but I can tell you at a high level what we do 😅 We use Airflow (Composer) to orchestrate our pipelines. In short, a datasource-specific DAG would:
• Run the Meltano ELT job
• Run the resulting downstream dbt models
(see the sketch below)
For auditing, we export all query logs to BQ. Users only have read access, so any DDL has to be done via git/CI-CD. Pretty much all GCP infra is managed via Terraform (including table schemas). We also use Data Catalog's policy tags to limit access to certain columns.
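Roughly, a datasource-specific DAG like that could look something like the minimal sketch below. This assumes Airflow 2.x with Meltano and dbt available on the worker; the DAG id, project paths, tap/target names, and dbt selector are all hypothetical placeholders, not our actual setup.
```python
# Minimal sketch of a datasource-specific DAG: Meltano EL first, then downstream dbt models.
# Paths, tap/target names, and the dbt selector are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_source_elt",      # one DAG per data source (hypothetical name)
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Step 1: run the Meltano EL job for this source into BigQuery
    meltano_elt = BashOperator(
        task_id="meltano_elt",
        bash_command=(
            "cd /opt/meltano_project && "
            "meltano elt tap-example target-bigquery"
        ),
    )

    # Step 2: rebuild only the dbt models downstream of this source's raw tables
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=(
            "cd /opt/dbt_project && "
            "dbt run --select source:example_source+"
        ),
    )

    meltano_elt >> dbt_run
```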
p
dbt's style guide can be a good starting point @Stéphane Burwash https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md
Also more generally their best practices page is a good reference too https://docs.getdbt.com/docs/guides/best-practices
If possible, try to avoid the temptation of using Python to manipulate data; usually, with some thought, you can do all the manipulation in SQL, and then you have a single unified place for all your transformations
A notable exception is obviously Machine Learning
s
Thank you so much 😄