# troubleshooting
n
Hey guys, I have a similar problem to the one Guro Khundadze had yesterday. I have an AWS repo where I push my Meltano code. After a push, a pipeline starts and builds the repo into a Docker image, and then I run ELT pipelines with AWS Batch. I'm trying to work with AWS Secrets and/or environment variables, but how can I set the .env file if I'm not allowed to push it (because of .gitignore and security concerns)? I'm trying to hide the Postgres username and password. Right now I'm doing it inside the Dockerfile with
ENV MELTANO_DATABASE_URI postgresql://username:password@host:port/db
Is there any way to use the .env file without pushing it to the repo? I already use environment variables in the meltano.yml file for a tap config, but I can't use them to define the database URI. Does anyone have an idea how to solve this? Is it possible to set the Meltano config in meltano.yml? Thank you guys! Greetings, Nick
a
Hi, @nick_muller. This is certainly doable. I'm not sure about AWS Batch specifically, but I've previously passed secrets from AWS Parameter Store and AWS Secrets Manager to environment variables in AWS ECS / Fargate. There should be a way to do a similar mapping with AWS Batch.
d
Yeah, you typically wouldn't store this sensitive information in meltano.yml or the repo, but you'd configure it directly in your production environment using its environment variable/secret manager
Is it possible to set the meltano config in the meltano.yml?
If you don't mind having sensitive info in meltano.yml, you can add database_uri: postgresql://etc to the top of the file
a
Agreed with Douwe's comments above. And this doc has references on how to send secure env vars to AWS Batch. https://docs.aws.amazon.com/batch/latest/userguide/specifying-sensitive-data.html (Confirmed AWS Batch supports both options: Parameter Store and Secrets Manager.)
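For illustration, here's a rough CloudFormation-style sketch of that mapping in a Batch job definition. Everything in it (region, account ID, image, parameter path, role name) is a placeholder rather than a known-good setup; the point is just that each entry under Secrets pulls a Parameter Store parameter or Secrets Manager secret into a container environment variable such as MELTANO_DATABASE_URI at run time, so nothing sensitive has to live in the repo or the image:
# Hypothetical AWS::Batch::JobDefinition fragment (all names/ARNs are placeholders).
# Each Secrets entry maps a Parameter Store / Secrets Manager value into an
# environment variable inside the running container.
MeltanoJobDefinition:
  Type: AWS::Batch::JobDefinition
  Properties:
    Type: container
    ContainerProperties:
      Image: 123456789012.dkr.ecr.eu-central-1.amazonaws.com/meltano:latest
      ResourceRequirements:
        - Type: VCPU
          Value: "1"
        - Type: MEMORY
          Value: "2048"
      ExecutionRoleArn: arn:aws:iam::123456789012:role/meltano-batch-execution-role
      Secrets:
        - Name: MELTANO_DATABASE_URI
          ValueFrom: arn:aws:ssm:eu-central-1:123456789012:parameter/meltano/database_uri
Note that it's the execution role (not the job role) that needs permission to read those parameters, since they're injected when the container is launched.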
n
Hi, thank you for your answers. I already tried to work with the docs from Amazon. @douwe_maan Re: meltano.yml: this is a good option, because the environment variables work in the meltano.yml file, but not in the Dockerfile. So I will put
database_uri: postgresql://$username...
at the top of the meltano.yml file (without sensitive data). Do I need to add anything before it, or just database_uri at the top? I will give you guys some feedback here.
d
@nick_muller Ah, yeah, you can leave the sensitive bits out and reference them using env vars, good idea. The database_uri key/value pair just needs to be at the top level of meltano.yml; it doesn't necessarily need to be the very first one in the file. If you add it there, you should see your value show up in meltano config meltano
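For example, a minimal meltano.yml sketch along those lines (DB_USER, DB_PASSWORD, DB_HOST, and DB_NAME are just placeholder environment variable names, not anything Meltano requires) could look like:
# Hypothetical meltano.yml fragment; the ${...} references are expanded from the
# environment at run time, so no credentials live in the file itself.
version: 1
database_uri: postgresql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:5432/${DB_NAME}
# ...the rest of your existing meltano.yml (plugins, schedules, etc.) stays unchanged
Once those variables are set in your Batch environment, meltano config meltano should show the resolved URI.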
n
All right, gonna try it tomorrow. Thank you!
It's not working as intended, because my Docker image fails to build: it always tries to check the database connection when running 'meltano install'. But I've solved the problem now, not 100%, but it is working. I set a global env var in the Batch job for MELTANO_DATABASE_URI. Greetings, Nick
d
@nick_muller All right, you typically wouldn't need the external database URI at image build / install time, just when running in production
c
Use Chamber to grab the set of vars from param store: https://github.com/segmentio/chamber
FYI I am also running batch+fargate
a
@chris_kings-lynne - can you say more about how you are using Chamber? Do you have a bootstrap script in your Docker/Fargate image to load those from Parameter Store, or are you loading them by another means? As mentioned above, I’ve previously mapped secrets via parameter store but those mappings were at the Fargate level, specified in my terraform code. I haven’t used Chamber before myself.
c
Yeah bootstrap.
Chamber is pretty good for hydrating a set of env vars into a container
a
Good to know! Thanks!
c
I’m not using a bootstrap script anymore; I'm doing this in my Dockerfile instead:
ARG MELTANO_IMAGE=meltano/meltano:latest
FROM $MELTANO_IMAGE

WORKDIR /project

# Install chamber
RUN curl -s https://packagecloud.io/install/repositories/segment/chamber/script.deb.sh | /bin/bash \
    && apt-get install -y chamber

# Install any additional requirements
COPY ./requirements.txt . 
RUN pip install -r requirements.txt

# Install all plugins into the `.meltano` directory
COPY ./meltano.yml . 
RUN meltano install

# Pin `discovery.yml` manifest by copying cached version to project root
RUN cp -n .meltano/cache/discovery.yml . 2>/dev/null || :

# Don't allow changes to containerized project files
ENV MELTANO_PROJECT_READONLY 1

# Tell Chamber to use the account default KMS key, and where to find it
ENV AWS_REGION ap-southeast-2
ENV CHAMBER_KMS_KEY_ALIAS aws/ssm

# Copy over remaining project files
COPY . .

# Expose default port used by `meltano ui`
EXPOSE 5000

ENTRYPOINT ["chamber", "exec", "meltano/tap-salesforce", "meltano/target-redshift", "--", "meltano"]
If you’re wondering how chamber hydrates those two services, you can do this:
root@b866c0eb4b30:/project# chamber export --format dotenv meltano/tap-salesforce meltano/target-redshift
TAP_SALESFORCE_CLIENT_ID="xxx"
TAP_SALESFORCE_CLIENT_SECRET="xxxE"
TAP_SALESFORCE_REFRESH_TOKEN="xxx"
TAP_SALESFORCE_START_DATE="2021-04-01T00:00:00Z"
TARGET_REDSHIFT_DBNAME="xxx"
TARGET_REDSHIFT_DEFAULT_TARGET_SCHEMA_SELECT_PERMISSION="xxx"
TARGET_REDSHIFT_HOST="xxx"
TARGET_REDSHIFT_PASSWORD="xxx"
TARGET_REDSHIFT_PORT="xxx"
TARGET_REDSHIFT_S3_BUCKET="xxx"
TARGET_REDSHIFT_S3_KEY_PREFIX="xxx"
TARGET_REDSHIFT_USER="xxx"
And that just looks like this in parameter store:
(screenshot of the Parameter Store entries has since been deleted)
It might be easier to have them all just in /meltano rather than per target and tap
But I'm trying to get a container where I can pass in the tap and target names, have it hydrate just those two sets of env vars, and then run the ELT
That way I can also restrict the task role's access to only the tap and target secrets it needs to run
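To sketch what that restriction might look like (CloudFormation-style YAML, with placeholder region/account values, and assuming Chamber only needs to read and decrypt parameters under those two paths), the job role's policy would be roughly:
# Hypothetical inline policy for the Batch job role; it only grants read access to the
# /meltano/tap-salesforce and /meltano/target-redshift parameter trees this job needs,
# plus kms:Decrypt for the SecureString values.
MeltanoJobRolePolicy:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: meltano-chamber-read
    Roles:
      - !Ref MeltanoJobRole   # placeholder job-role resource
    PolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Action:
            - ssm:GetParametersByPath
            - ssm:GetParameters
          Resource:
            - arn:aws:ssm:ap-southeast-2:123456789012:parameter/meltano/tap-salesforce*
            - arn:aws:ssm:ap-southeast-2:123456789012:parameter/meltano/target-redshift*
        - Effect: Allow
          Action: kms:Decrypt
          Resource: arn:aws:kms:ap-southeast-2:123456789012:key/your-aws-ssm-key-id   # the default aws/ssm key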
a
This is very cool, @chris_kings-lynne, and I think others could benefit from this approach. My first thought was to see if we could add it to the base Docker image, but since Chamber is AWS specific, I'm not sure what the best integration path is - whether it's to add a tutorial or sample Dockerfile, or ___… Do you have thoughts/ideas on how we might make this a reusable pattern for Meltano AWS users? I'm not sure yet which direction we should go, but it still would be great to log an issue on this if we don't have one already.
a
This is awesome @chris_kings-lynne - Thank you for the comprehensive explanation and example code!
c
Update: my current ENTRYPOINT is this:
ENTRYPOINT ["chamber", "exec", "meltano", "meltano/target-redshift", "meltano/tap-salesforce", "--", "meltano"]
So basically I have meltano-wide variables in SSM under /meltano and I put the tap last so that I can override target variables per-tap
Note that the exact target and tap are passed in as batch env vars too
so basically the container has everything in it that it needs to run any tap/target combination
I’ll put together a full solution one day, but an (old) gist I posted is here https://gist.github.com/chriskl/495ff766f9dfa8e8f6fcd00fae6e58f3
a
@chris_kings-lynne - In that gist, you have the ENTRYPOINT empty - is that so you can specify it at run time, since ENVs can't be used in ENTRYPOINT? (iirc)
c
No, it's just old, left over from during my development 🙂
Current entrypoint is
ENTRYPOINT ["chamber", "exec", "meltano", "meltano/target-redshift", "meltano/tap-salesforce", "--", "meltano"]
a
Got it. Do you happen to know how you might inject variable Parameter Store keys into the container at run time?
I guess maybe by executing it all under a shell
c
Not sure I follow - isn’t that what Chamber does?
a
I mean that I don’t think one could just do
ENTRYPOINT ["chamber", "exec", "$PARAM_STORE_PREFIX", "--", "meltano"]
Though I think maybe
ENTRYPOINT ["/bin/bash", "-c", "chamber", "exec", "$PARAM_STORE_PREFIX", "--", "meltano"]
might work.
c
ahhh… I do that in my AWS Batch job definition… e.g. line 139
a
@chris_kings-lynne - Do you have to do anything special to the container or batch job definition (presumably during CI/CD) in order to feed credentials to chamber? I keep getting this “Error: Failed to list store contents: NoCredentialProviders: no valid providers in chain. Deprecated.” error and I suspect it has something to do with the container not finding the IAM instance role.. or whatever the equiv might be in Fargate (not terribly familiar with Fargate). Just curious if that sticks out as something obvious to you.
c
Might be the roles in line 55 and 68 in the blob above? Chamber gets the env vars for me at runtime, not CI/CD time
a
Ah, I bet I’m mixing up the ExecuteRole and JobRole in the job definition.
Yep! That was it. I was missing a JobRole and had to add an SSM policy.
One of these days I'll learn CloudFormation or Terraform.
c
ExecuteRole vs. JobRole is the most confusing thing ever 😞. The Execution Role is used by the Batch service to start up Fargate jobs. The JobRole is what the actual container runs as… I think
a
Yes. Ditto what @chris_kings-lynne says above. I think of the "Execution Role" as getting everything ready to hand off to the Job Role. So the Execution Role would need access to creds (if mapping them to env vars of the container), the Docker image, and anything else needed to launch a container - then the container itself needs permission to read/write from S3/Redshift, any creds not already loaded into env vars, etc.
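In job-definition terms (placeholder role names again), the two sit side by side in ContainerProperties, which is probably why they're so easy to mix up:
# Hypothetical fragment showing where each role is attached.
ContainerProperties:
  Image: 123456789012.dkr.ecr.ap-southeast-2.amazonaws.com/meltano:latest
  ExecutionRoleArn: arn:aws:iam::123456789012:role/meltano-execution-role   # used by Batch to launch the container and inject mapped secrets
  JobRoleArn: arn:aws:iam::123456789012:role/meltano-job-role               # assumed by the running container (Chamber/SSM, S3, Redshift, ...)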