jose
11/17/2022, 1:07 AM
docker run -v $(pwd):/project -w /project meltano/meltano --log-level=debug elt tap-postgres target-bigquery
If I run this from the UI, I get the message "No state was found, complete import." on the first run, but the second time I try to run it I get:
2022-11-17T00:54:50.284064Z [info ] Found state from 2022-11-17 00:49:10.989026.
Any idea of what I could be missing here?
Sven Balnojan
11/21/2022, 12:05 PM
Will Da Silva (Arch)
11/21/2022, 2:00 PM
jose
11/21/2022, 2:47 PM
docker run -v $(pwd):/project -w /project meltano/meltano state list
jose
11/21/2022, 2:47 PM
jose
11/21/2022, 2:50 PM
docker run -v $(pwd):/project -w /project meltano/meltano --log-level=debug elt tap-postgres target-bigquery-xxx
Looking at the log I can see this in the end:
jose
11/21/2022, 2:51 PM
jose
11/21/2022, 2:54 PM
Sven Balnojan
11/21/2022, 3:16 PM
cody_hanson
11/21/2022, 3:35 PMstate.json log line is expected. Meltano writes the state it finds in its backend to a state.json file for the tap to read. If there is no state to write, then you'll see that log line.
2. Could you share the configuration for default_replication_method ? If you're doing log-based replication, ensure that your source postgres database is configured correctly for log-based replication. If you're doing incremental replication, make sure that you've set the correct replication_key for each of your streams. This behavior could also be caused by the tap not having any replication keys configured.jose
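(Note: a minimal sketch of what that incremental configuration can look like in meltano.yml; the stream name and replication key below are illustrative, mirroring the ones used later in this thread.)
plugins:
  extractors:
    - name: tap-postgres
      config:
        default_replication_method: INCREMENTAL
      metadata:
        dw-table1_report:
          replication-method: INCREMENTAL
          replication-key: date_changed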
jose
11/21/2022, 4:40 PM
cody_hanson
11/21/2022, 5:21 PM
> Why does running this from the UI and from the command line have different behavior?
That's a really good question and would probably be worth filing a bug report. I'll also loop in <!subteam^S02BCD9FFEF> in case anyone else with more in-depth knowledge of the UI has any insight on this.
> I'm assuming that on the first run the process doesn't use the log and just does a full dump of the table, and because of that does not need wal2json
This seems likely. My guess would be that the underlying issue is the same (that log-based replication isn't working) but for whatever reason that error is being swallowed on the CLI. Let me know how incremental replication works and whether or not the issue with log-based replication persists after your environment is fully configured for it.
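(Note: for completeness, log-based replication with pipelinewise-tap-postgres requires the wal2json output plugin installed on the source database, plus logical WAL settings along these lines in postgresql.conf; exact values depend on your setup.)
wal_level = logical
max_replication_slots = 5
max_wal_senders = 5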
ken_payne
11/21/2022, 7:15 PM
Which tap-postgres are you using? Guessing it's the pipelinewise (PPW) one? This behaviour sounds like PPW fallback to me 🤔 The PPW taps contain logic for intelligently selecting the replication method, intended for running backfills using FULL_TABLE and then switching to LOG_BASED thereafter. Here is the relevant logic in the pipelinewise-tap-postgres codebase - it checks for specific values in the bookmark and "falls back" if they aren't found. So it may be that i) Meltano cannot access your state backend to retrieve state (and so is using an empty sqlite database locally) or ii) your state file is missing the bookmark values expected by the tap and is erroneously marking the stream as FULL_TABLE. Given your recent experience with the UI and the wal2json error, it seems the tap is correctly receiving state and attempting to read logs via the UI container. So I would check network connectivity/permissions between your workers calling the Meltano CLI and your DB, and also verify you are correctly injecting the Meltano backend database URI secret in prod 🙂
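(Note: to illustrate the fallback described above — a LOG_BASED bookmark written by the pipelinewise tap carries an lsn value, roughly like the sketch below with illustrative values; if state arrives without those keys, the tap treats the stream as needing a FULL_TABLE sync.)
{"bookmarks": {"dw-table1_report": {"last_replication_method": "LOG_BASED", "lsn": 108197488, "version": 1669119374130}}}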
> Why does running this from the UI and from the command line have different behavior?
This can happen when using different system databases or state backends, as this is where state is stored between invocations of Meltano. By default, Meltano will use a sqlite database in the local .meltano folder if no state backend or system database is configured. It may be that the UI is using a sqlite database, or the workers are using a sqlite database, or both 😅 This can be solved by ensuring both the UI and the machines running commands via the CLI have access and are configured to use the same system database or state backend.
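(Note: in practice this means pointing every environment at the same URI, e.g. via the MELTANO_DATABASE_URI environment variable; the connection string below is illustrative.)
export MELTANO_DATABASE_URI="postgresql://user:pass@dbhost:5432/meltano"
docker run -e MELTANO_DATABASE_URI -v $(pwd):/project -w /project meltano/meltano state list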
jose
11/22/2022, 12:13 AM
jose
11/22/2022, 12:17 AM
--state-id=postgres-to-bigquery
I was running this command: docker run -v $(pwd):/project -w /project meltano/meltano --log-level=debug elt tap-postgres target-bigquery
and if I run it like this: docker run -v $(pwd):/project -w /project meltano/meltano --log-level=debug elt tap-postgres target-bigquery --state-id=postgres-to-bigquery
I'm getting the same error as in the UI 🙂.
ken_payne
11/22/2022, 10:06 AM
meltano elt has largely been superseded by meltano run, which automatically handles state IDs 😅 We have rough plans to remove elt in v3 of Meltano (expected some time in 2023). If you haven't already given the run command a try, I recommend checking it out 👍
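(Note: meltano run derives the state ID automatically as <environment>:<tap>-to-<target>, which is why the dev:tap-postgres-to-target-bigquery ID appears later in this thread; a sketch of the equivalent invocation:)
docker run -v $(pwd):/project -w /project meltano/meltano --environment=dev run tap-postgres target-bigquery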
jose
11/22/2022, 10:48 AM
ken_payne
11/22/2022, 10:53 AM
Where did you find meltano elt when you began with your project? We have been updating our docs and tutorials to point everyone to meltano run, but we may have missed some 😅
jose
11/22/2022, 11:42 AM
jose
11/24/2022, 12:54 AM
docker run -v $(pwd):/project -w /project meltano/meltano config tap-postgres set metadata.dw-table1_report.replication-method "INCREMENTAL"
docker run -v $(pwd):/project -w /project meltano/meltano config tap-postgres set metadata.dw-table1_report.replication-key date_changed
I've also set the state backend URI and database URI to an external Postgres DB:
docker run -v $(pwd):/project -w /project meltano/meltano config meltano set state_backend.uri "postgresql://user:pass@server12345.aws.com:5432/dbtest?options=-csearch_path%3Dmelty"
docker run -v $(pwd):/project -w /project meltano/meltano config meltano set database_uri "postgresql://user:pass@server12345.aws.com:5432/dbtest?options=-csearch_path%3Dmelty"
I'm using this command to run:
docker run -v $(pwd):/project -w /project meltano/meltano --log-level=debug run tap-postgres target-bigquery
I have the table already loaded in BigQuery and only want to upload the deltas, basically everything that was updated since the last run.
When the process ran, it just did a full load again and didn't consider the state.
In the log file I can see this:
2022-11-24T00:38:39.402260Z [debug ] Could not find state.json in /project/.meltano/extractors/tap-postgres/state.json, skipping.
2022-11-24T00:38:39.411062Z [info ] Found state from 2022-11-24 00:23:12.677537.
Here it ran and uploaded all the records again.
In the end I'm getting this:
2022-11-24T00:39:41.568131Z [info ] time=2022-11-24 00:39:41 name=target_bigquery level=INFO message=Emitting state {"bookmarks": {"dw-table1_report": {"last_replication_method": "INCREMENTAL", "replication_key": "date_changed", "version": 1669119374130, "replication_key_value": "2022-11-23T09:10:47.560000+00:00"}}, "currently_syncing": null} cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stderr string_id=target-bigquery
2022-11-24T00:39:41.577153Z [info ] Incremental state has been updated at 2022-11-24 00:39:41.576912.
2022-11-24T00:39:41.577484Z [debug ] Incremental state: {'bookmarks': {'dw-table1_report': {'last_replication_method': 'INCREMENTAL', 'replication_key': 'date_changed', 'version': 1669119374130, 'replication_key_value': '2022-11-23T09:10:47.560000+00:00'}}, 'currently_syncing': None}
2022-11-24T00:39:41.577649Z [info ] {"bookmarks": {"dw-table1_report": {"last_replication_method": "INCREMENTAL", "replication_key": "date_changed", "version": 1669119374130, "replication_key_value": "2022-11-23T09:10:47.560000+00:00"}}, "currently_syncing": null} cmd_type=elb consumer=True name=target-bigquery producer=False stdio=stdout string_id=target-bigquery
The most recent update in the table has a date_changed = 2022-11-23T09:10:47.560
If I run the process again it just does a full table sync again, and the log file will be similar to what I've posted above.
Any idea of what could be happening?
jose
11/24/2022, 12:58 AM
jose
11/24/2022, 12:59 AM
jose
11/24/2022, 1:02 AM
docker run -v $(pwd):/project -w /project meltano/meltano state get dev:tap-postgres-to-target-bigquery
I'm getting this info, which is in line with what can be seen in the log:
2022-11-24T00:59:48.720352Z [info ] Found state from 2022-11-24 00:38:39.151956.
{"singer_state": {"bookmarks": {"dw-table1_report": {"last_replication_method": "INCREMENTAL", "replication_key": "date_changed", "version": 1669119374130, "replication_key_value": "2022-11-23T09:10:47.560000+00:00"}}, "currently_syncing": null}}abhishek_ajmera
11/24/2022, 4:31 AMmeltano elt with a --state-id flag given instead of meltano run and see if you specifying the state helps?jose
11/24/2022, 9:56 AM
add_metadata_columns: true
credentials_path: /project/access.json
dataset_id: ds_test
default_target_schema: ds_test
location: europe-west6
primary_key_required: false
project_id: project-dev
From what I can see in the BigQuery project history, Meltano is uploading an Avro file and then creating a temp table with the content. After that it runs a merge command between the temp table and the final table.
To be honest, I would prefer this to do an append on the existing table rather than a merge. The merge makes some assumptions in terms of uniqueness that I would prefer to control. Do you know if there's a way of doing it?
I'll try the meltano elt and let you know
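(Note: on the merge-vs-append question — pipelinewise-style targets generally merge only when a stream has key properties; one possible approach, untested here, is clearing the stream's key properties in the tap metadata so the target falls back to plain inserts. A sketch in meltano.yml:)
plugins:
  extractors:
    - name: tap-postgres
      metadata:
        dw-table1_report:
          table-key-properties: []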
jose
11/24/2022, 10:15 AM
I tried meltano run and had the same behaviour as the meltano elt
abhishek_ajmera
11/24/2022, 11:26 AM