# getting-started
Hey everyone, I am a 22 y/o new grad working as a DE. I have been working on a Python project that runs transformations on specific CSV formats, using pandas as the intermediary data structure, and ultimately loads the results into a staging environment in Postgres (via SQLAlchemy). I came across Meltano through episode #141 of the Data Engineering Podcast. The data team as it stands is just me (awesome for learning, terrifying for a suite of reasons), and I think Meltano might fit my use case of abstracting away a lot of the Airflow work as I transition v0.0 of this project to run on something more than my local machine. A few questions:
1. From the pod, it seems the main use of Meltano orchestration is built around Singer pipelines. Would I want to turn my code into a custom Singer tap to get the most out of Meltano? (I strongly doubt anyone beyond my firm would find the code useful.)
2. Since I believe we only need Meltano as an abstraction over Airflow in this use case, is Meltano a more complex option than vanilla Airflow?
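For context, the current pipeline described above (CSV → pandas transformations → Postgres staging via SQLAlchemy) can be sketched roughly like this; all names and the example transformation are hypothetical, and the real connection string would point at Postgres:

```python
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine


def load_csv_to_staging(csv_path: str, engine: Engine, table: str = "staging_orders") -> int:
    """Read a CSV, apply pandas transformations, and append to a staging table.

    Returns the number of rows loaded.
    """
    df = pd.read_csv(csv_path)
    # Example transformation: normalize column names to snake_case.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df.to_sql(table, engine, if_exists="append", index=False)
    return len(df)


# In production this would target the Postgres staging db, e.g.:
# engine = create_engine("postgresql+psycopg2://user:pass@host:5432/db")
```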
Welcome @ian_mulhern! If your data is in CSV format, there's already a Singer tap-csv, so you probably don't need to write a custom tap. You'd use Meltano to load the raw data from the CSV files into your postgres db via Singer, and later transform the data inside postgres using SQL (Meltano uses dbt). You can run the complete ELT pipeline with just the CLI, and the Airflow integration writes the Meltano DAG for you based on a schedule. I'd argue that Meltano + Airflow is a far less complex solution than custom ETL + Airflow.
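To make that concrete, here's a sketch of the CLI flow using the standard tap-csv and target-postgres plugins; treat the entity names, file paths, and config values as placeholders, and note that exact commands and flags vary between Meltano versions:

```shell
pip install meltano
meltano init my-elt-project
cd my-elt-project

# Add the existing Singer extractor and loader -- no custom tap needed.
meltano add extractor tap-csv
meltano add loader target-postgres

# Point tap-csv at the CSV files and target-postgres at the staging db.
meltano config tap-csv set files '[{"entity": "orders", "path": "data/orders.csv", "keys": ["id"]}]'
meltano config target-postgres set host localhost

# Run the full EL pipeline once from the CLI...
meltano run tap-csv target-postgres

# ...or schedule it and let the Airflow integration generate the DAG.
meltano add utility airflow
meltano schedule add daily-csv-load --extractor tap-csv --loader target-postgres --interval '@daily'
```

The point of the last two commands is that you never write the Airflow DAG yourself: Meltano's Airflow plugin reads the schedule and emits the DAG for you.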