par_degerman
08/24/2021, 7:56 AM
ken_payne
08/24/2021, 1:09 PM
par_degerman
08/24/2021, 1:44 PM
cloud_sql_proxy in a container next to meltano using docker compose).
One gotcha I ran into is that meltano ui requires a local project folder. Up until now I have simply used a Dockerfile that sets up the base project (which is shared between all meltano runners as well as the UI). However, the runners make slight changes at runtime (I run a series of meltano select commands when starting the container, before starting the pipeline with meltano elt), so the project directories are not entirely identical. Will this be a problem?
ken_payne
08/24/2021, 2:22 PM
select commands dynamic? Do they change from one run to the next?
My only concern with doing that would be with state management. Some well-known targets/loaders have a bug whereby they do not emit state for unselected streams, even if state is present for those streams in the received state message 🤦‍♂️ This would only be a problem if you were dynamically removing and re-adding streams on a regular basis, and only with some targets.
In general the approach I have used in the past is to 'bake' select criteria into our meltano.yml project file under the select extra key in extractor plugin config, making use of * wildcards to catch streams matching specific criteria (schemas, prefixes etc.). This key is documented here. Each deployment has a commit id used as the docker tag and changes are pushed through CI/CD with new commit ids. All that runs in production is: meltano run ...
Would be interested if your use case doesn't fit this model.
edgar_ramirez_mondragon
08/24/2021, 3:11 PM
TAP_MYSQL__SELECT='["my_stream.*"]' meltano elt tap-mysql target-bigquery
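For illustration, here is a minimal sketch of how the env var approach above might replace runtime meltano select calls in an entrypoint script, so the project directory is never modified. The build_select helper is hypothetical (it is not part of Meltano), and the sketch assumes that an entity pattern plus attribute pattern passed to meltano select corresponds to a single "entity.attribute" rule, and that the patterns contain no double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper (not a Meltano command): join the arguments into a
# JSON array of strings, in the same shape as the TAP_MYSQL__SELECT value
# shown above. Assumes patterns contain no double quotes or backslashes.
build_select() {
  out=""
  for pattern in "$@"; do
    # Append a comma only when there is already an earlier element.
    out="${out:+$out,}\"$pattern\""
  done
  printf '[%s]' "$out"
}

# Assumed equivalents of:
#   meltano select tap-mysql "*-Event" "*"
#   meltano select tap-mysql "*-EventDataKey" "*"
TAP_MYSQL__SELECT="$(build_select '*-Event.*' '*-EventDataKey.*')"
export TAP_MYSQL__SELECT

echo "$TAP_MYSQL__SELECT"
# prints ["*-Event.*","*-EventDataKey.*"]

# The pipeline would then be started as before, for example:
#   meltano elt tap-mysql target-bigquery --job_id="$PIPELINE_NAME"
```

Because the selection lives only in the environment of the process, the project files baked into the image would stay identical across runners and the UI.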
par_degerman
08/24/2021, 5:08 PM
FROM meltano/meltano:latest
ARG PROJECT_ID
WORKDIR /meltano
RUN meltano init $PROJECT_ID && cd $PROJECT_ID && \
    meltano add extractor tap-mysql && \
    meltano config tap-mysql set host cloud-sql-proxy && \
    meltano config tap-mysql set user meltano && \
    meltano config tap-mysql set _metadata "*" replication-method LOG_BASED && \
    meltano add loader target-bigquery && \
    DATASET_ID=$(echo $PROJECT_ID | tr '-' '_') && \
    meltano config target-bigquery set dataset_id $DATASET_ID && \
    meltano config target-bigquery set project_id data-analytics && \
    meltano config target-bigquery set location EU
COPY entrypoint.sh .
RUN chmod u+x entrypoint.sh
ENV PROJECT_ID=$PROJECT_ID
ENTRYPOINT [ "./entrypoint.sh" ]
And an entrypoint.sh that looks like this:
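#!/bin/sh
# Trace each command as it runs, which makes the container logs easier to debug.
set -x
if [ -z "$PROJECT_ID" ]; then
  echo "FATAL: Project is not defined, exiting..."
  exit 1
fi
# Test cd directly instead of inspecting $? afterwards.
if ! cd "$PROJECT_ID"; then
  echo "FATAL: Cannot change into project directory, exiting..."
  exit 1
fi
echo "Starting ETL pipeline $PIPELINE_NAME..."
cd "$PIPELINE_NAME"
# Select every attribute of the streams whose names end in -Event and -EventDataKey.
meltano select tap-mysql "*-Event" "*"
meltano select tap-mysql "*-EventDataKey" "*"
meltano elt tap-mysql target-bigquery --job_id="$PIPELINE_NAME"
# Hand control over to whatever command was passed to the container.
exec "$@"
#EOF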
Again, as a total n00b, I don't know whether this is in line with best practices. I'm eager to learn and hear your input.
par_degerman
08/24/2021, 5:12 PM
ken_payne
08/24/2021, 5:22 PM
meltano.yml which, provided secrets are injected using env vars (rather than hard-coded), is safe to check in to version control. If nothing else, that does mean you can inspect and modify the yml without needing to connect to a running docker image. But as you are mostly adding new streams, you shouldn't have any trouble with state clearing, so all should just work as expected for you.
par_degerman
08/25/2021, 6:15 AM
meltano install?
3. Issuing the meltano elt command when starting the container
ken_payne
08/25/2021, 7:33 AM