#announcements

proud-pillow-55935

03/23/2021, 4:54 PM
Hello all! My team is new to Meltano and Airflow, but we're trying to set up our data pipelines in GCP to be triggered when CSVs are uploaded to GCS (so not necessarily on a schedule). We're looking to use Airflow as our orchestrator (as we have more pipelines and tooling we want to add down the line), but we're wondering what the best course of action might be. It looks like we can develop our Airflow DAGs within Meltano, looking at this and this. But I was also looking at the Airflow DockerOperator, which could potentially be used, as well as the KubernetesPodOperator, to actually trigger our Meltano pipeline to run in a container. Based on this use case, where have people found success, and where would people recommend we start? We are currently running Airflow in GCP's managed service (aka Composer). Thank you!

great-gold-98639

03/23/2021, 5:23 PM
Hey Ricky, we're actually using Composer and the KubernetesPodOperator to trigger our Meltano pipelines and then store the state externally. I would say if you have an existing Composer instance, it makes sense to use it and not have to add additional infra for Meltano. https://meltano.slack.com/archives/C01QM86B83A/p1614871705229100
👍 2
Happy to connect/discuss more if you have any questions
👍 2

proud-pillow-55935

03/23/2021, 8:11 PM
Hi @great-gold-98639, thanks for reaching out, this is really helpful. We're trying out the KubernetesPodOperator to run our Meltano pipeline. We're having trouble connecting to the database currently, but I think that is just a networking issue.
Hey @great-gold-98639 - how were you able to get the logs for a Meltano elt job running in the pod? I'm trying to debug why it isn't working, and it seems like the pod never terminates.

great-gold-98639

03/23/2021, 9:07 PM
Yeah, I struggled with this since the logs would only output for me after the job finished. So I do something like the following when triggering the meltano command:
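import logging
import subprocess

# Merge stderr into stdout and read it line by line so logs stream while the job runs
ps = subprocess.Popen(
    "meltano elt {tap} {target}...",
    shell=True,  # the command is a single string, so it goes through the shell
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True)

for line in ps.stdout:
    logging.info(line.rstrip())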
That way it streams the logs back out of the pod as they happen
👍 2

proud-pillow-55935

03/23/2021, 9:37 PM
Do you run this in your DAG code?
Or is this baked into a custom Meltano image @great-gold-98639
💯 1

great-gold-98639

03/23/2021, 10:10 PM
No, we are creating a custom container that "wraps" around the meltano image. This is in the entrypoint for the container. Basically we have a Dockerfile that looks like:
FROM meltano/meltano:v1.70.0

ENV APP /meltano_image
WORKDIR $APP

# Copy some shell scripts like installing taps/targets
COPY helpers/ helpers/
# Copy our meltano directory (contains our meltano.yml, catalogs, etc)
COPY meltano/ meltano/
# Our entrypoint file that parses arguments from the KubernetesPodOperator, and runs the meltano command
COPY meltano.py meltano.py

RUN exec ./helpers/install_taps.sh

ENTRYPOINT ["python", "meltano.py"]
We then build the image and push it to GCR.
🙌 1
Then we reference that image in the KubernetesPodOperator
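As a rough sketch of what an entrypoint like meltano.py could do (this is not the actual file from the thread; the `--tap` and `--target` flag names, the `parse_args` and `build_command` helpers, and the plugin names are all hypothetical), it might parse the arguments passed by the KubernetesPodOperator and turn them into a `meltano elt` argument list:

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments the pod receives (hypothetical flag names)."""
    parser = argparse.ArgumentParser(description="Run a Meltano ELT pipeline")
    parser.add_argument("--tap", required=True, help="extractor name")
    parser.add_argument("--target", required=True, help="loader name")
    return parser.parse_args(argv)


def build_command(tap, target):
    """Build the argument list for `meltano elt <tap> <target>`."""
    return ["meltano", "elt", tap, target]


# Example with explicit arguments; in the container these would come from
# the arguments set on the KubernetesPodOperator.
args = parse_args(["--tap", "tap-csv", "--target", "target-postgres"])
command = build_command(args.tap, args.target)
print(command)
```

Building the command as a list rather than a single string means it can be handed to subprocess without going through a shell.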

proud-pillow-55935

03/23/2021, 10:12 PM
Thank you!!
Hey @great-gold-98639, I'm still not seeing any logs in the Airflow logs console... do we need to configure anything else in Airflow? Or in the KubernetesPodOperator? I see the logs when I run it locally in Docker.

great-gold-98639

03/24/2021, 3:14 PM
Hmm, even when the job finishes? Can you see the logs if you go to the pod in GKE? https://console.cloud.google.com/kubernetes/workload

proud-pillow-55935

03/24/2021, 4:32 PM
No, I see it after it finishes, but I was just hoping to see it while it is running -> https://medium.com/bluecore-engineering/kubernetes-pod-logging-in-the-airflow-ui-ed9ca6f37e9d looks like this article addresses why. Might need to dig deeper to get it during runtime.
Hey @great-gold-98639 - were you able to get Meltano to return a non-zero status code when it failed while running Meltano as a subprocess?

great-gold-98639

03/30/2021, 8:37 PM
Nope, haven't looked into that yet

proud-pillow-55935

03/30/2021, 8:48 PM
@great-gold-98639 how do you determine that your pipelines succeeded?

great-gold-98639

03/30/2021, 8:58 PM
Seems I don't as of now 😅 . Something else I'll have to implement
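One possible way to surface failures, sketched here under the assumption that `meltano elt` exits with a non-zero code when a run fails (the thread does not confirm this), is to wait for the subprocess after streaming its output and pass its exit code on. The `run_and_stream` helper is hypothetical, and a small Python child process stands in for the meltano command so the sketch runs anywhere:

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)


def run_and_stream(command):
    """Run command, log its output line by line, and return its exit code."""
    ps = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    for line in ps.stdout:
        logging.info(line.rstrip())
    # stdout is exhausted once the child closes it; wait() collects the exit code
    return ps.wait()


# Stand-in for ["meltano", "elt", tap, target]: a child that prints and fails.
failing_job = [sys.executable, "-c", "print('boom'); raise SystemExit(3)"]
exit_code = run_and_stream(failing_job)
print(exit_code)

# In a real entrypoint this would end with: sys.exit(exit_code)
```

If the entrypoint exits with that code, the container fails when the pipeline fails, which is what lets the KubernetesPodOperator task be marked as failed instead of succeeding silently.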