# infra-deployment
s
Good morning meltano peeps! I had a question regarding a failure in my airflow deployment in prod. I had the pleasure of waking up this morning to see that all of my jobs in meltano failed. Now, here are a few facts:
• All of these jobs were set for midnight
• When I run them manually, they work
• We are on ECS fargate
• The error I get is something along the lines of "missing http or https", but when I google the error on stack overflow it says that this is a BS error that appears when tasks actually couldn't start
Do any of you know what happened? Where should I investigate?
a
Hard to be sure, but this smells like an intermittent network or firewall issue to me...
Is there anyone in your org who could have been performing maintenance on the network at that time?
Because it's ECS, it's not like the whole network can go down (although doesn't hurt to check AWS availability logs) - more likely I'd suspect an SSL cert expiration or some kind of networking/routing rules change...
Are you okay to wait until tonight to see if the issue has self healed?
s
I'm definitely ok with it, it's just a very weird issue; when looking at my logs in grafana, I'm getting a blind spot at midnight every day
(8pm utc)
This is the day I introduced a
`dbt:run --select staging.<name_of_source>+`
for all of my sources; could this be causing my issue? Concurrent uses of dbt?
Update: We are receiving no logs in grafana since a large update. I think this is because we are experiencing an out-of-memory error; as the logs are buffered, they simply never get flushed
a
Geez. Yikes. Yeah, that'd do it. 😬
s
I just have no idea how to work around this 😛 do I need to allocate more memory to my individual airflow threads?
a
Is everything running on the same container? If so, you would likely need to stagger the start times or increase the size of the container. Alternatively, if each task runs in its own container, you'll avoid the contention of running all the things at the same time.
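A rough sketch of the staggering idea: instead of every job sharing `0 0 * * *`, generate cron schedules offset by a few minutes per job. The source names and the 15-minute spacing below are made up for illustration; in Airflow each DAG would then get its own `schedule_interval`.

```python
# Sketch: staggered cron schedules so heavy jobs don't all start
# at midnight at once. Names and the 15-minute step are illustrative.

def staggered_cron(index, base_hour=0, step_minutes=15):
    """Return a cron expression shifted by `step_minutes` per job index."""
    offset_minutes = index * step_minutes
    hour = base_hour + offset_minutes // 60
    minute = offset_minutes % 60
    return f"{minute} {hour} * * *"

sources = ["source_a", "source_b", "source_c"]  # hypothetical source names
schedules = {name: staggered_cron(i) for i, name in enumerate(sources)}
# Each DAG would use schedule_interval=schedules[name] instead of "0 0 * * *".
print(schedules)
```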
s
Everything is in 3 separate task definitions (airflow-scheduler, airflow-webserver, meltano-ui) on ecs fargate
I'm currently allocating 768 CPU units and 2048 MiB of memory; maybe I'll up those values, see what it looks like
If not, is it possible to allocate more workers to airflow in meltano?
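For reference, a hedged sketch of checking a CPU/memory pair against the Fargate size table before re-registering a task definition (the table below is my recollection of the standard Fargate combinations, and 1024/4096 is just an example bump, not a recommendation). The Airflow env-var names at the bottom are standard Airflow config keys, shown as an assumption about what "more workers" would mean with the LocalExecutor:

```python
# Sketch: validate a Fargate CPU/memory pair before bumping the task
# definition. Mapping is CPU units -> allowed memory values in MiB
# (values are passed to ECS as strings when actually registering).
FARGATE_COMBOS = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}

def is_valid_fargate_size(cpu_units, memory_mib):
    """True if the (cpu, memory) pair is an allowed Fargate combination."""
    return cpu_units in FARGATE_COMBOS and memory_mib in FARGATE_COMBOS[cpu_units]

print(is_valid_fargate_size(1024, 4096))  # an example bump from 768/2048

# As for "more workers": with the LocalExecutor, these Airflow settings
# (standard config keys, values illustrative) cap concurrent tasks and
# could be raised alongside a memory bump:
airflow_env = [
    {"name": "AIRFLOW__CORE__PARALLELISM", "value": "8"},
    {"name": "AIRFLOW__CORE__DAG_CONCURRENCY", "value": "4"},
]
```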
It worked 😄