ashutosh_shanker
10/10/2023, 7:05 AM
ashutosh_shanker
10/10/2023, 7:05 AM
| Target sink for 'public-users_user' is full. Draining... cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
2023-10-10T06:36:47.574519Z [info ] 2023-10-10 12:06:47,574 | INFO | target-s3 | key: mds-bh/local/ashu/parquet/public-users_user/20231010-0636 cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
2023-10-10T06:36:58.701592Z [info ] 2023-10-10 12:06:58,701 | INFO | target-s3 | Target 'target-s3' completed reading 91093 lines of input (1 schemas, 91080 records, 0 batch manifests, 11 state messages). cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
2023-10-10T06:36:58.738839Z [info ] 2023-10-10 12:06:58,738 | INFO | target-s3 | key: mds-bh/local/ashu/parquet/public-users_user/20231010-0636 cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
2023-10-10T06:37:03.868288Z [info ] 2023-10-10 12:07:03,867 | INFO | target-s3 | Cleaning up public-users_user cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
2023-10-10T06:37:03.868871Z [info ] 2023-10-10 12:07:03,868 | INFO | target-s3 | Emitting completed target state {"bookmarks": {"public-users_user": {"last_replication_method": "INCREMENTAL", "replication_key": "id", "version": 1696919723638, "replication_key_value": 114042}}, "currently_syncing": null} cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
2023-10-10T06:37:03.900582Z [info ] Writing state to AWS S3
2023-10-10T06:37:06.421520Z [info ] smart_open.s3.MultipartWriter('mds-bh', 'dev/meltano/state/dev:tap-postgres-to-target-s3/lock'): uploading part_num: 1, 17 bytes (total 0.000GB)
2023-10-10T06:37:07.579636Z [info ] smart_open.s3.MultipartWriter('mds-bh', 'dev/meltano/state/dev:tap-postgres-to-target-s3/state.json'): uploading part_num: 1, 239 bytes (total 0.000GB)
2023-10-10T06:37:08.707181Z [info ] Incremental state has been updated at 2023-10-10 06:37:08.706740.
2023-10-10T06:37:08.723602Z [info ] Block run completed. block_type=ExtractLoadBlocks err=None set_number=0 success=True
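A note on the state path in the log above: `dev:tap-postgres-to-target-s3` looks like Meltano's usual state ID pattern of environment, tap, and target joined together. The sketch below only rebuilds that string to show why a different target would get a separate (initially empty) state entry. The `state_id` helper and the variable names are made up for illustration, and the pattern itself is inferred from the log rather than confirmed in this thread.

```shell
# Sketch only: rebuild the state ID from its parts.
# ENVIRONMENT and TAP come from the log above; state_id is a hypothetical helper.
# The "<env>:<tap>-to-<target>" pattern is an assumption inferred from the S3 state path.
ENVIRONMENT="dev"
TAP="tap-postgres"

state_id() {
  # $1 = target (loader) name
  printf '%s:%s-to-%s\n' "$ENVIRONMENT" "$TAP" "$1"
}

S3_STATE_ID="$(state_id target-s3)"
JSONL_STATE_ID="$(state_id target-jsonl)"

echo "$S3_STATE_ID"      # dev:tap-postgres-to-target-s3
echo "$JSONL_STATE_ID"   # dev:tap-postgres-to-target-jsonl
```

Since the two IDs differ, bookmarks saved for one tap/target pair would not be picked up when the same tap runs against the other target.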
ashutosh_shanker
10/10/2023, 7:06 AM
- name: target-s3
  variant: crowemi
  pip_url: git+https://github.com/crowemi/target-s3.git
  config:
    append_date_to_filename: true
    append_date_to_filename_grain: minute
    append_date_to_prefix: false
    cloud_provider:
      aws:
        aws_bucket: mds-bh
        aws_region: us-east-2
    include_process_date: false
    format:
      format_parquet:
        validate: false
      format_type: parquet
    prefix: local/ashu/parquet
ashutosh_shanker
10/10/2023, 8:38 AM
name: target-s3-parquet
variant: jkausti
user
10/10/2023, 12:58 PM
run
then state is tracked based on the combination of tap-x + target-y, so the second time you run it the tap will be incremental and have state (although idk if tap-postgres does incremental automatically 🤔). If you switch the target to jsonl there will be no state, so syncing all records would be expected.
user
10/10/2023, 12:59 PM
user
10/10/2023, 1:01 PM
ashutosh_shanker
10/11/2023, 7:07 AM
- name: target-s3
  variant: crowemi
  pip_url: git+https://github.com/crowemi/target-s3.git
  config:
    append_date_to_filename: true
    append_date_to_filename_grain: minute
    append_date_to_prefix: false
    cloud_provider:
      aws:
        aws_bucket: mds-bh
        aws_region: us-east-2
    include_process_date: false
    format:
      format_parquet:
        validate: false
      format_type: parquet
    prefix: local/ashu/parquet
- name: target-jsonl
  variant: andyh1203
  pip_url: target-jsonl
  config:
    destination_path: ${MELTANO_PROJECT_ROOT}/output/
- name: target-s3-parquet
  variant: jkausti
  pip_url: git+https://github.com/jkausti/target-s3.git
  config:
    aws_region: us-east-2
    filetype: parquet
    path: mds-bh/local/ashu1/parquet
ashutosh_shanker
10/11/2023, 7:08 AM
extractors:
- name: tap-postgres
  variant: transferwise
  pip_url: pipelinewise-tap-postgres
  config:
    host: XXXX
    port: XXXX
    user: XXXX
    dbname: XXX
    filter_schemas: public
    default_replication_method: FULL_TABLE
  select:
  - public-users_user.*
  #- public-customer_customer.*
  metadata:
    public-users_user:
      replication-method: INCREMENTAL
      replication-key: id
ashutosh_shanker
10/11/2023, 7:09 AM
ashutosh_shanker
10/11/2023, 7:10 AM
ashutosh_shanker
10/11/2023, 7:12 AM
ashutosh_shanker
10/11/2023, 7:12 AM
meltano run tap-postgres column-remover target-s3
ashutosh_shanker
10/11/2023, 7:13 AM
user
10/11/2023, 2:05 PM
"it is incremental, so I removed state.json and cleaned up the s3 parquet folder too"
@ashutosh_shanker when you say you removed the state.json, what exactly did you do? The state is managed by meltano, so you'd have to remove it from your state backend, which defaults to the local sqlite db. Alternatively you can run with the --full-refresh flag, like meltano run tap-postgres column-remover target-s3 --full-refresh, to clear it automatically.
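For reference, a rough sketch of checking and clearing the stored state by hand with the `meltano state` subcommands, instead of deleting files. The `STATE_ID` value is an assumption taken from the state path in the earlier log (`dev:tap-postgres-to-target-s3`); use whatever `meltano state list` actually prints in your project. The `command -v` guard is only there so the snippet does nothing when meltano is not installed.

```shell
# Sketch only: inspect and clear stored state via the Meltano CLI.
# STATE_ID is assumed from the S3 state path in the earlier log; adjust as needed.
STATE_ID="dev:tap-postgres-to-target-s3"

if command -v meltano >/dev/null 2>&1; then
  # Show every state ID known to the configured state backend
  meltano state list
  # Print the stored bookmarks for this tap/target pair
  meltano state get "$STATE_ID"
  # Remove the stored state so the next run starts from scratch
  # (may ask for confirmation, depending on the Meltano version)
  meltano state clear "$STATE_ID"
else
  echo "meltano not found on PATH; skipping"
fi
```

The earlier log shows state being written to AWS S3, so in this setup these commands would presumably act on that S3 backend rather than on the local sqlite db.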
Other than that I'd probably recommend opening an issue in the target repository to see if the maintainer has any recommendations.
ashutosh_shanker
10/11/2023, 4:05 PM