matt_elgazar
08/18/2023, 1:55 PM
state-id when using meltano run or meltano elt. I have a Docker image that builds and calls meltano run tap-mongodb target-snowflake once per hour in production. When I push an update to that repository, the Docker image rebuilds, causing it to run the entire pipeline from scratch even though INCREMENTAL and LOG_BASED are set (for different collections). I don’t see it loading the data into the target, making me think that Meltano did not keep track of the state-id. What are the steps I should take to keep track of the state-id even if the Docker image rebuilds?
edgar_ramirez_mondragon
08/18/2023, 2:03 PM
.meltano/meltano.db, so it's most likely getting lost after the container is removed. You may want to use an external state backend: https://docs.meltano.com/concepts/state_backends
matt_elgazar
08/18/2023, 2:25 PM
• state_backend.uri: azure://<your container_name>/<prefix for state JSON blobs>
• and the connection string state_backend.azure.connection_string
Once I have these, I should be able to authenticate the Azure DB connection by setting the state backend in meltano.yml? Are there any other steps I need to take?
extractors:
  - name: tap-mongodb
    namespace: tap_mongodb
state_backend:
  type: remote
  uri: azure://<your container_name>/<prefix for state JSON blobs> # get this from .env ${TAP_MONGODB_AZURE_MELTANO_STATE}
  connection_string: ${TAP_MONGODB_STATE_CONNECTION_STRING}
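As an alternative to editing meltano.yml, the two settings named above can be supplied as environment variables. The variable names MELTANO_STATE_BACKEND_URI and MELTANO_STATE_BACKEND_AZURE_CONNECTION_STRING are the ones given later in this thread. Everything else below (the helper names, the placeholder container, prefix, and connection string) is hypothetical. This is a minimal sketch, not a tested deployment recipe.

```python
import os
import subprocess


def state_backend_env(container: str, prefix: str, connection_string: str) -> dict:
    """Build the two state backend environment variables named in this thread."""
    return {
        "MELTANO_STATE_BACKEND_URI": f"azure://{container}/{prefix}",
        "MELTANO_STATE_BACKEND_AZURE_CONNECTION_STRING": connection_string,
    }


def run_pipeline(extra_env: dict) -> int:
    """Run the hourly pipeline with the state backend variables added to the environment."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        ["meltano", "run", "tap-mongodb", "target-snowflake"],
        env=env,
        check=False,
    )
    return result.returncode


# Placeholder values only; real ones would come from the container's secrets.
example_env = state_backend_env("my-container", "meltano/state", "<connection string>")
```

Calling run_pipeline(example_env) would invoke the same meltano run command used in the Docker image, with the state backend pointing at blob storage instead of the container's local files.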
Andy Carter
08/18/2023, 2:33 PM
meltano config meltano set state_backend.uri get persisted? They're not in my meltano.yml, somewhere else?
matt_elgazar
08/18/2023, 2:49 PM
RUN meltano config meltano set state_backend.uri azure://<your container_name>/<prefix for state JSON blobs> and then added the connection string in .env as AZURE_STORAGE_CONNECTION_STRING? It looks like I don’t need to do it this way, and should theoretically be able to do this in meltano.yml using my example above (but not sure).
edgar_ramirez_mondragon
08/18/2023, 2:59 PM
.env by that command (see the setting definition)
@matt_elgazar You can pass both as env vars to the container:
• MELTANO_STATE_BACKEND_URI
• MELTANO_STATE_BACKEND_AZURE_CONNECTION_STRING
Andy Carter
08/18/2023, 3:13 PM
matt_elgazar
08/18/2023, 3:29 PM
edgar_ramirez_mondragon
08/18/2023, 4:16 PM
project_id: ...
state_backend:
  uri: azure://<your container_name>/<prefix for state JSON blobs>
  azure:
    connection_string: ...
Or just pass the env vars I mentioned above. That should work too.
Andy Carter
08/18/2023, 7:22 PM
connection_string is not present, will it default back to using DefaultAzureCredential? Is that what return BlobServiceClient() suggests? I would love to turn off the access via connection string to make my IT team happier.
edgar_ramirez_mondragon
08/18/2023, 8:27 PM
>>> from azure.storage.blob import BlobServiceClient
>>> BlobServiceClient()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: BlobServiceClient.__init__() missing 1 required positional argument: 'account_url'
Can you log an issue? I can add details to it later 🙂
user
08/21/2023, 10:23 PM
matt_elgazar
08/22/2023, 4:07 PM
collection3) under the select but it doesn’t appear to be working.
version: 1
send_anonymous_usage_stats: true
project_id: tap-mongodb
default_environment: dev
state_backend:
  type: remote
  uri: ${AZURE_TAP_MONGODB_STATE_URI}
  azure:
    connection_string: ${AZURE_TAP_MONGODB_STATE_CONNECTION_STRING}
plugins:
  extractors:
  - name: tap-mongodb
    namespace: tap_mongodb
    pip_url: git+https://github.com/melgazar9/tap-mongodb.git@20738c1272ff12eb403abb9f8019200e5acd573f
    capabilities:
    - state
    - catalog
    - discover
    - about
    - stream-maps
    config:
      add_record_metadata: true
      allow_modify_change_streams: true
    select:
    - 'collection1.*'
    - 'collection2.*'
    - 'collection3.*'
environments:
- name: testing
  config:
    plugins:
      extractors:
      - name: tap-mongodb
        config:
          mongodb_connection_string: ${TESTING_MONGODB_CONNECTION_STRING}
          database: Testing
        select:
        - '!collection3.*' # bug in mongodb data - I want to disregard collection3 when environment = 'testing'
      loaders:
      - name: target-snowflake
        env:
          TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA: MONGODB_TESTING
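Since the state backend settings above depend on ${...} placeholders, a quick pre-flight check before meltano run can confirm that every referenced variable is actually set in the container. This is only a sketch: the helper names are hypothetical, the regex covers just the ${NAME} form used in this file, and it does not reproduce how Meltano itself expands variables.

```python
import os
import re

# Matches ${NAME} placeholders like the ones used in the meltano.yml above.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def referenced_vars(yaml_text: str) -> list:
    """Return the sorted, de-duplicated variable names referenced as ${NAME}."""
    return sorted(set(PLACEHOLDER.findall(yaml_text)))


def missing_vars(yaml_text: str, environ=None) -> list:
    """Return referenced variables that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in referenced_vars(yaml_text) if not environ.get(name)]


# Fragment copied from the state_backend section of the config above.
sample = """
state_backend:
  uri: ${AZURE_TAP_MONGODB_STATE_URI}
  azure:
    connection_string: ${AZURE_TAP_MONGODB_STATE_CONNECTION_STRING}
"""
```

Running missing_vars on the full meltano.yml text at container start would list any placeholder that has no value, which is one thing worth ruling out when state does not seem to persist.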
user
08/22/2023, 4:19 PM
select arrays are not additive across environments, which means you have to be explicit about what you want selected in that environment:
Base plugin def:
select:
- 'collection1.*'
- 'collection2.*'
- 'collection3.*'
testing environment:
select:
- 'collection1.*'
- 'collection2.*'
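The "not additive" behavior described above can be pictured with a tiny model: an environment-level select list replaces the base list outright instead of being merged into it. The function below is a hypothetical illustration of that rule only. It is not Meltano's implementation and does not evaluate the patterns themselves.

```python
def effective_select(base, env_override=None):
    """Return the select list in effect: the environment's list if it defines one, else the base list."""
    return list(env_override) if env_override is not None else list(base)


base_select = ["collection1.*", "collection2.*", "collection3.*"]

# The original attempt: only the exclusion survives, the base entries are gone.
attempted = effective_select(base_select, ["!collection3.*"])

# The suggested fix: list explicitly what the testing environment should select.
fixed = effective_select(base_select, ["collection1.*", "collection2.*"])
```

Under this model, attempted contains just the negated pattern, which is why the testing environment needs collection1 and collection2 spelled out.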
matt_elgazar
08/22/2023, 4:20 PM
edgar_ramirez_mondragon
08/22/2023, 5:46 PM