# troubleshooting
n
Hey guys, I have a similar problem to the one Guro Khundadze had yesterday. I have an AWS repo where I push my Meltano code. After a push, a pipeline starts and builds the repo into a Docker image, and then I run ELT pipelines with AWS Batch. I'm trying to work with AWS Secrets and/or environment variables, but how can I set the .env file if I'm not allowed to push it (because of .gitignore and security concerns)? I'm trying to hide the Postgres username and password. Right now I'm doing it inside the Dockerfile with
ENV MELTANO_DATABASE_URI postgresql://username:password@host:port/db
Is there any way to use the .env file without pushing it to the repo? I already use environment variables in the meltano.yml file for a tap config, but I can't use them to define the database URI. Does anyone have an idea how to solve this? Is it possible to set the Meltano config in meltano.yml? Thank you guys! Greetings, Nick
a
Hi, @nick_muller. This is certainly doable. I'm not sure about AWS Batch specifically, but I've previously passed secrets from AWS Parameter Store and AWS Secrets Manager to environment variables in AWS ECS / Fargate. There should be a way to do a similar mapping with AWS Batch.
d
Yeah, you typically wouldn't store this sensitive information in meltano.yml or the repo, but you'd configure it directly in your production environment using its environment variable/secret manager
Is it possible to set the meltano config in the meltano.yml?
If you don't mind having sensitive info in meltano.yml, you can add database_uri: postgresql://etc to the top of the file
a
Agreed with Douwe's comments above. And this doc has references on how to send secure env vars to AWS Batch. https://docs.aws.amazon.com/batch/latest/userguide/specifying-sensitive-data.html (Confirmed AWS Batch supports both options: Parameter Store and Secrets Manager.)
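For illustration, here's a rough CloudFormation-style sketch of that mapping in a Batch job definition. Everything in it (region, account ID, image, parameter path, role name) is a placeholder rather than a known-good setup; the point is just that each entry under Secrets pulls a Parameter Store parameter or Secrets Manager secret into a container environment variable such as MELTANO_DATABASE_URI at run time, so nothing sensitive has to live in the repo or the image:
# Hypothetical AWS::Batch::JobDefinition fragment (all names/ARNs are placeholders).
# Each Secrets entry maps a Parameter Store / Secrets Manager value into an
# environment variable inside the running container.
MeltanoJobDefinition:
  Type: AWS::Batch::JobDefinition
  Properties:
    Type: container
    ContainerProperties:
      Image: 123456789012.dkr.ecr.eu-central-1.amazonaws.com/meltano:latest
      ResourceRequirements:
        - Type: VCPU
          Value: "1"
        - Type: MEMORY
          Value: "2048"
      ExecutionRoleArn: arn:aws:iam::123456789012:role/meltano-batch-execution-role
      Secrets:
        - Name: MELTANO_DATABASE_URI
          ValueFrom: arn:aws:ssm:eu-central-1:123456789012:parameter/meltano/database_uri
Note that it's the execution role (not the job role) that needs permission to read those parameters, since they're injected when the container is launched.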
n
Hi, thank you for your answers. I already tried to work with the docs from Amazon. @douwe_maan Re: meltano.yml: this is a good option, because the environment variables work in the meltano.yml file, but not in the Dockerfile. So I will put
database_uri: postgresql://$username...
at the top of the meltano.yml file (without sensitive data). Do I need to add anything before it, or just database_uri at the top? I will give you guys some feedback here.
d
@nick_muller Ah, yeah, you can leave the sensitive bits out and reference them using env vars, good idea. The database_uri key/value pair just needs to be at the top level of meltano.yml; it doesn't necessarily need to be the very first one in the file. If you add it there, you should see your value show up in meltano config meltano
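For example, a minimal meltano.yml sketch along those lines (DB_USER, DB_PASSWORD, DB_HOST, and DB_NAME are just placeholder environment variable names, not anything Meltano requires) could look like:
# Hypothetical meltano.yml fragment; the ${...} references are expanded from the
# environment at run time, so no credentials live in the file itself.
version: 1
database_uri: postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/${DB_NAME}
# ...the rest of your existing meltano.yml (plugins, schedules, etc.) stays unchanged
Once those variables are set in your Batch environment, meltano config meltano should show the resolved URI.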
n
All right, gonna try it tomorrow. Thank you!
It's not working as intended, because my Docker image fails to build: it always tries to check the database connection when running 'meltano install'. But I've solved the problem now, not 100%, but it is working. I set a global env var in the Batch job for MELTANO_DATABASE_URI. Greetings, Nick
d
@nick_muller All right, you typically wouldn't need the external database URI at image build / install time, just when running in production
c
Use Chamber to grab the set of vars from param store: https://github.com/segmentio/chamber
FYI I am also running batch+fargate
a
@chris_kings-lynne - can you say more about how you are using Chamber? Do you have a bootstrap script in your Docker/Fargate image to load those from Parameter Store, or are you loading them by another means? As mentioned above, I’ve previously mapped secrets via parameter store but those mappings were at the Fargate level, specified in my terraform code. I haven’t used Chamber before myself.
c
Yeah bootstrap.
Chamber is pretty good for hydrating a set of env vars into a container
a
Good to know! Thanks!
c
I’m not using a bootstrap script anymore; I'm doing this in my Dockerfile instead:
ARG MELTANO_IMAGE=meltano/meltano:latest
FROM $MELTANO_IMAGE

WORKDIR /project

# Install chamber
RUN curl -s https://packagecloud.io/install/repositories/segment/chamber/script.deb.sh | /bin/bash \
    && apt-get install -y chamber

# Install any additional requirements
COPY ./requirements.txt . 
RUN pip install -r requirements.txt

# Install all plugins into the `.meltano` directory
COPY ./meltano.yml . 
RUN meltano install

# Pin `discovery.yml` manifest by copying cached version to project root
RUN cp -n .meltano/cache/discovery.yml . 2>/dev/null || :

# Don't allow changes to containerized project files
ENV MELTANO_PROJECT_READONLY 1

# Tell Chamber to use the account default KMS key, and where to find it
ENV AWS_REGION ap-southeast-2
ENV CHAMBER_KMS_KEY_ALIAS aws/ssm

# Copy over remaining project files
COPY . .

# Expose default port used by `meltano ui`
EXPOSE 5000

ENTRYPOINT ["chamber", "exec", "meltano/tap-salesforce", "meltano/target-redshift", "--", "meltano"]
If you’re wondering how chamber hydrates those two services, you can do this:
root@b866c0eb4b30:/project# chamber export --format dotenv meltano/tap-salesforce meltano/target-redshift
TAP_SALESFORCE_CLIENT_ID="xxx"
TAP_SALESFORCE_CLIENT_SECRET="xxxE"
TAP_SALESFORCE_REFRESH_TOKEN="xxx"
TAP_SALESFORCE_START_DATE="2021-04-01T00:00:00Z"
TARGET_REDSHIFT_DBNAME="xxx"
TARGET_REDSHIFT_DEFAULT_TARGET_SCHEMA_SELECT_PERMISSION="xxx"
TARGET_REDSHIFT_HOST="xxx"
TARGET_REDSHIFT_PASSWORD="xxx"
TARGET_REDSHIFT_PORT="xxx"
TARGET_REDSHIFT_S3_BUCKET="xxx"
TARGET_REDSHIFT_S3_KEY_PREFIX="xxx"
TARGET_REDSHIFT_USER="xxx"
And that just looks like this in parameter store:
(screenshot of the Parameter Store entries has since been deleted)
It might be easier to have them all just in /meltano rather than per target and tap
But I'm trying to get a container where I can pass in the tap and target names, have it hydrate just those two sets of env vars, and then run the ELT
That way I can also restrict the task role's access to only the tap and target secrets it needs to run
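To sketch what that restriction might look like (CloudFormation-style YAML, with placeholder region/account values, and assuming Chamber only needs to read and decrypt parameters under those two paths), the job role's policy would be roughly:
# Hypothetical inline policy for the Batch job role; it only grants read access to the
# /meltano/tap-salesforce and /meltano/target-redshift parameter trees this job needs,
# plus kms:Decrypt for the SecureString values.
MeltanoJobRolePolicy:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: meltano-chamber-read
    Roles:
      - !Ref MeltanoJobRole   # placeholder job-role resource
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action:
            - ssm:GetParametersByPath
            - ssm:GetParameters
          Resource:
            - arn:aws:ssm:ap-southeast-2:123456789012:parameter/meltano/tap-salesforce*
            - arn:aws:ssm:ap-southeast-2:123456789012:parameter/meltano/target-redshift*
        - Effect: Allow
          Action: kms:Decrypt
          Resource: arn:aws:kms:ap-southeast-2:123456789012:key/your-aws-ssm-key-id   # the default aws/ssm key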
a
This is very cool, @chris_kings-lynne, and I think others could benefit from this approach. My first thought was to see if we could add it to the base Docker image, but since Chamber is AWS specific, I'm not sure what the best integration path is - whether it's to add a tutorial or sample Dockerfile, or ___… Do you have thoughts/ideas on how we might make this a reusable pattern for Meltano AWS users? I'm not sure yet which direction we should go, but it still would be great to log an issue on this if we don't have one already.
a
This is awesome @chris_kings-lynne - Thank you for the comprehensive explanation and example code!
c
Update: my current ENTRYPOINT is this:
ENTRYPOINT ["chamber", "exec", "meltano", "meltano/target-redshift", "meltano/tap-salesforce", "--", "meltano"]
So basically I have meltano-wide variables in SSM under /meltano and I put the tap last so that I can override target variables per-tap
Note that the exact target and tap are passed in as batch env vars too
so basically the container has everything in it that it needs to run any tap/target combination
I’ll put together a full solution one day, but an (old) gist I posted is here https://gist.github.com/chriskl/495ff766f9dfa8e8f6fcd00fae6e58f3
a
@chris_kings-lynne - In that gist, you have the ENTRYPOINT empty - is that so you can specify it at run time, since ENVs can't be used in ENTRYPOINT? (iirc)
c
No, it's just old, left over from during my development 🙂
Current entrypoint is
ENTRYPOINT ["chamber", "exec", "meltano", "meltano/target-redshift", "meltano/tap-salesforce", "--", "meltano"]
a
Got it. Do you happen to know how you might inject variable Parameter Store keys into the container at run time?
I guess maybe by executing it all under a shell
c
Not sure I follow - isn’t that what Chamber does?
a
I mean that I don’t think one could just do
ENTRYPOINT ["chamber", "exec", "$PARAM_STORE_PREFIX", "--", "meltano"]
Though I think maybe
ENTRYPOINT ["/bin/bash", "-c", "chamber", "exec", "$PARAM_STORE_PREFIX", "--", "meltano"]
might work.
c
ahhh… I do that in my AWS Batch job definition… e.g. line 139
a
@chris_kings-lynne - Do you have to do anything special to the container or batch job definition (presumably during CI/CD) in order to feed credentials to chamber? I keep getting this “Error: Failed to list store contents: NoCredentialProviders: no valid providers in chain. Deprecated.” error and I suspect it has something to do with the container not finding the IAM instance role.. or whatever the equiv might be in Fargate (not terribly familiar with Fargate). Just curious if that sticks out as something obvious to you.
c
Might be the roles in line 55 and 68 in the blob above? Chamber gets the env vars for me at runtime, not CI/CD time
a
Ah, I bet I’m mixing up the ExecuteRole and JobRole in the job definition.
Yep! That was it. I was missing a JobRole and had to add an SSM policy.
One of these days I'll learn CloudFormation or Terraform.
c
ExecuteRole vs. JobRole is the most confusing thing ever 😞. The Execution Role is used by the Batch service to start up Fargate jobs. The JobRole is what the actual container runs as… I think
a
Yes. Ditto what @chris_kings-lynne says above. I think of the "Execution Role" as getting everything ready to hand off to the Job Role. So the Execution Role would need access to creds (if mapping them to env vars of the container), the Docker image, and anything else needed to launch a container - then the container itself needs permission to read/write from S3/Redshift, any creds not already loaded into env vars, etc.
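In job-definition terms (placeholder role names again), the two sit side by side in ContainerProperties, which is probably why they're so easy to mix up:
# Hypothetical fragment showing where each role is attached.
ContainerProperties:
  Image: 123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/meltano:latest
  ExecutionRoleArn: arn:aws:iam::123456789012:role/meltano-execution-role   # used by Batch to launch the container and inject mapped secrets
  JobRoleArn: arn:aws:iam::123456789012:role/meltano-job-role               # assumed by the running container (Chamber/SSM, S3, Redshift, ...)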