# docker
j
Hi. I'm working with PII data, so in order to avoid a lot of paperwork I want to deploy Meltano in an environment we're already cleared for in terms of compliance audits. That means Heroku. Part of the EL workflow is to access our prod database and strip/mask the PII before loading it into another database (a new Heroku Postgres deployment with different credentials, etc.). I got `tap-postgres` and `target-postgres` up and running pretty quickly and I'm ready to try deploying to Heroku, but I'm wondering about the `.meltano` directory: besides the meltano.db SQLite database, does it contain anything that must be persisted? This docs page lists what's in the directory:
• I can live without the log files for prod (or potentially find a way to get the logs themselves extracted and loaded somehow)
• I suppose the `venv`s of the needed Python packages could be created at `docker build` time?
I know Meltano supports pluggable system databases, and I'm planning on just letting Meltano have a schema in my BI database for that. Other than that, what else do I need to know for a stateless Docker deployment (on Heroku, in my case)?
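(For anyone finding this thread later: a stateless image along the lines described above could be sketched roughly like this. The base image tag, file layout, and connection string are placeholder assumptions, not details from this thread.)

```dockerfile
# Hypothetical sketch of a stateless Meltano image.
# Base image and paths are assumptions, not from this thread.
FROM python:3.10-slim

WORKDIR /project
RUN pip install --no-cache-dir meltano

# Copy the project definition and install all plugins at build time.
# This creates the plugin venvs inside .meltano/ in the image layer,
# so nothing in .meltano/ needs a persistent volume at runtime.
COPY meltano.yml .
RUN meltano install

# At runtime, point the system database at Postgres instead of the
# default .meltano/meltano.db SQLite file, e.g. via a Heroku config var:
#   MELTANO_DATABASE_URI=postgresql://...
ENTRYPOINT ["meltano"]
```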
Seems I'm confusing System Database and State Backend somewhat
c
I’ve done something similar using Amazon’s Managed Workflows for Apache Airflow (MWAA), and this solution has been running in production for about a month. Nothing in the `.meltano` directory needs to be persisted if you manage to move the state database somewhere else (mine is hosted by the target PostgreSQL database). The only issue I ran into is that Meltano did not create the destination schema for the target warehouse during initialization in production, though that could have been a me thing, I suppose. (I had created it manually in our staging environment.)

For the virtual environments, you can run `meltano install` and it will create them. I had to do some manual creation of the hosting venv for Meltano itself because I’m running on MWAA, which runs Airflow (also uses Python), so after I create a venv, install Meltano, and copy my directory to the target machine, I then execute `meltano install` using the project directory as the working directory, and everything works fine. This would be, I think, easier and cleaner in a Docker environment. (Mine has to be repeatable for each MWAA worker that AWS brings up, and AWS doesn’t allow you to define a Docker image for MWAA.)

So based on your question in the other channel: basically, if you set `MELTANO_DATABASE_URI` and run `meltano install` during image creation, I think you’ll have a decent start at least. The logs you should be able to configure to go to some agent that sends them to centralized logging. I didn’t have to do any of that for MWAA, as it comes configured to send the logs to CloudWatch by default, and when Meltano runs as part of Airflow, it uses the Airflow logging mechanism.
❤️ 3
j
Thanks for the very detailed response! I appreciate it
v
besides the meltano.db SQLite database, does it contain anything that must be persisted?
No. I don't use incremental syncs for a lot of our stuff, so I don't even keep the db file
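(Side note for later readers: if you don't rely on incremental syncs, you can explicitly ignore any saved state with the standard `--full-refresh` flag, so losing the db file is harmless. Plugin names below are the ones from this thread.)

```shell
# Ignore any stored state and re-sync everything from scratch
meltano run --full-refresh tap-postgres target-postgres
```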
👀 1
e
Seems I'm confusing System Database and State Backend somewhat
We could certainly improve the conceptual docs to make it clear what purpose each serves, and what's their overlap.
j
Yeah, I did find a few places that mentioned the "old" way of using dbt (the one that wasn't called `utility`) and some other things, too. I should fork and send a PR with some docs work, but yesterday I just wanted to get something to work 😅
❤️ 3
dancingpenguin 2