hey there just got working on meltano and I have a quick YES Meltano #troubleshooting

minh

10/06/2022, 7:23 PM

Hey there! I just started working with Meltano and I have a quick YES/NO question. I'm replicating a MySQL DB to GCS. My config looks like this:

```yaml
version: 1
default_environment: dev
project_id: 5174ce55-6bbf-4c90-a65d-f6b31afd4a40
environments:
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-mysql
        select:
        - '*.*'
        metadata:
          '*':
            replication-method: LOG_BASED
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-mysql
    variant: transferwise
    pip_url: pipelinewise-tap-mysql
    config:
      host: redacted
      port: redacted
      user: redacted
      database: redacted
      ssl: false
  loaders:
  - name: target-gcs
    variant: datateer
    pip_url: git+https://github.com/Datateer/target-gcs.git
    config:
      bucket_name: redacted
      credentials_file: redacted
      date_format: redacted
      key_prefix: ingest/
```

Now I see files in the destination, but the run keeps failing with this error:

level=CRITICAL message='NoneType' object has no attribute 'settimeout'
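For context, that error text is exactly what Python raises when code calls `settimeout` on an attribute that has been set to `None`. Below is a minimal hypothetical reproduction, assuming (not confirmed in this thread) that the tap's MySQL client behaves like PyMySQL: it clears its socket reference when the connection is dropped, and a later read still calls `settimeout` on it. `HypotheticalConnection` and `FakeSocket` are illustrative names, not the tap's code.

```python
class FakeSocket:
    """Stand-in for a real socket; implements only the method named in the error."""

    def settimeout(self, seconds):
        self.timeout = seconds


class HypotheticalConnection:
    """Hypothetical client that (assumed, as in PyMySQL) drops its socket on disconnect."""

    def __init__(self):
        self._sock = FakeSocket()
        self._read_timeout = 30

    def force_close(self):
        # Simulates the server or network closing the connection mid-sync.
        self._sock = None

    def read_packet(self):
        # The next read touches the socket without checking it still exists.
        self._sock.settimeout(self._read_timeout)
        return b""


conn = HypotheticalConnection()
conn.force_close()
try:
    conn.read_packet()
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'settimeout'
```

If that reading is right, the message points to the MySQL connection being closed during a long sync (for example by a server-side timeout), which fits the idea of trying session arguments. This is a sketch of the failure mode, not a diagnosis.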

I have seen some threads about this and I'll try out some session arguments. My question, though: if I run `meltano run tap-mysql target-gcs` over and over, will it pick up from where it left off? I'm asking because I see something like this at the end of a failed job:

```
4', 'Content-Range': 'bytes 37224448-37486591/*'} cmd_type=elb consumer=True name=target-gcs producer=False stdio=stderr string_id=target-gcs
2022-10-06T19:14:50.800926Z [info     ] time=2022-10-06 15:14:50 name=target-gcs level=INFO message=Target 'target-gcs' completed reading 144531 lines of input (144384 records, 145 state messages). cmd_type=elb consumer=True name=target-gcs producer=False stdio=stderr string_id=target-gcs
2022-10-06T19:14:50.806808Z [info     ] time=2022-10-06 15:14:50 name=target-gcs level=INFO message=Emitting completed target state {"currently_syncing": "dev-foo", "bookmarks": {"dev-admins": {"version": 1665083536009, "log_file": "mysql-bin.000680", "log_pos": 31760098}, "dev-uses": {"version": 1665083537415, "log_file": "mysql-bin.000680", "log_pos": 31763039}, "dev-buzz": {"version": 1665083539568, "log_file": "mysql-bin.000680", "log_pos": 31801221}, "dev-foo": {"version": 1665083541240, "log_file": "mysql-bin.000680", "log_pos": 31854385, "max_pk_values": {"id": 24264758}, "last_pk_fetched": {"id": 274233}}}} cmd_type=elb consumer=True name=target-gcs producer=False stdio=stderr string_id=target-gcs
```

And when I rerun it, I see something like this:

```
2022-10-06T19:13:54.825587Z [info     ] time=2022-10-06 15:13:54 name=tap_mysql level=INFO message=LOG_BASED stream dev-foo will resume its historical sync cmd_type=elb consumer=False name=tap-mysql producer=True stdio=stderr string_id=tap-mysql
```

But after running it again and again, I see that `log_pos` is the same as in the previous run.
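The state emitted at the end of the failed run may explain why `log_pos` doesn't move. Assuming pipelinewise-tap-mysql behaves like the upstream singer-io tap-mysql here (not confirmed in this thread), a LOG_BASED stream first does a full-table "historical" copy: the binlog coordinates (`log_file`/`log_pos`) are recorded once when the copy starts, progress through the copy is tracked in `max_pk_values`/`last_pk_fetched`, and binlog reading from `log_pos` only begins once the copy finishes. This sketch (the function name `describe_bookmarks` is made up) classifies each bookmark from the logged state on that assumption.

```python
import json

# State emitted by target-gcs at the end of the failed run, copied from the log.
STATE_JSON = """
{"currently_syncing": "dev-foo", "bookmarks": {
  "dev-admins": {"version": 1665083536009, "log_file": "mysql-bin.000680", "log_pos": 31760098},
  "dev-uses": {"version": 1665083537415, "log_file": "mysql-bin.000680", "log_pos": 31763039},
  "dev-buzz": {"version": 1665083539568, "log_file": "mysql-bin.000680", "log_pos": 31801221},
  "dev-foo": {"version": 1665083541240, "log_file": "mysql-bin.000680", "log_pos": 31854385,
              "max_pk_values": {"id": 24264758}, "last_pk_fetched": {"id": 274233}}}}
"""


def describe_bookmarks(state: dict) -> dict:
    """Classify each stream's bookmark (assumes upstream tap-mysql semantics)."""
    report = {}
    for stream, bookmark in state["bookmarks"].items():
        if "max_pk_values" in bookmark:
            # Initial copy still running: progress is last_pk_fetched,
            # while log_pos stays at the value captured when the copy began.
            report[stream] = (
                "historical sync in progress: "
                f"last_pk_fetched={bookmark.get('last_pk_fetched')}, "
                f"max_pk_values={bookmark['max_pk_values']}"
            )
        else:
            # Initial copy done (assumed): later runs read the binlog from here.
            report[stream] = f"binlog bookmark {bookmark['log_file']}:{bookmark['log_pos']}"
    return report


state = json.loads(STATE_JSON)
print("currently syncing:", state["currently_syncing"])
for stream, status in describe_bookmarks(state).items():
    print(f"{stream}: {status}")
```

On that reading, `dev-foo` is still in its initial copy (last fetched id 274233 against a max id of 24264758), so an unchanged `log_pos` between runs would be expected; the thing to watch across reruns is whether `last_pk_fetched` advances.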

thomas_briggs

10/06/2022, 8:44 PM

It should checkpoint when the target writes a batch and then resume from that point on the next run. So where exactly it'll resume from depends on where the error occurs in the data, how the target behaves, etc. Target behavior varies by target, so... it's hard to say. 😕
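The checkpoint-and-resume behaviour described in this reply can be illustrated with a toy, pure-Python simulation. This is not Meltano's or target-gcs's actual code: the tap emits a state message after every record, the hypothetical target only treats a state as committed once the batch it covers has been flushed, and a second run resumes from the last committed state. `ROWS`, `batch_size`, `fail_at` and the dict standing in for Meltano's state store are all assumptions for illustration.

```python
from typing import Iterator, List, Optional, Tuple

ROWS = list(range(1, 11))  # hypothetical source table: primary keys 1..10


def tap(rows: List[int], start_after: Optional[int]) -> Iterator[Tuple[str, int]]:
    """Toy tap: emit a RECORD for each row past the bookmark, then a STATE for it."""
    for row_id in rows:
        if start_after is not None and row_id <= start_after:
            continue  # already delivered in an earlier run
        yield ("RECORD", row_id)
        yield ("STATE", row_id)


def run_pipeline(rows, store: dict, batch_size: int, fail_at: Optional[int] = None):
    """One toy run: resume from store, commit state only after a batch is flushed."""
    written: List[int] = []
    buffer: List[int] = []
    pending_state: Optional[int] = None
    for kind, value in tap(rows, store.get("last_id")):
        if kind == "RECORD":
            if value == fail_at:
                return written, False  # crash: buffered rows and uncommitted state are lost
            buffer.append(value)
        else:  # STATE message
            pending_state = value
            if len(buffer) >= batch_size:
                written.extend(buffer)  # batch is now durable in the destination...
                buffer.clear()
                store["last_id"] = pending_state  # ...so its state can be checkpointed
    written.extend(buffer)  # final flush at end of stream
    if pending_state is not None:
        store["last_id"] = pending_state
    return written, True


for batch_size in (3, 5):
    store: dict = {}
    first, first_ok = run_pipeline(ROWS, store, batch_size, fail_at=8)
    resumed_after = store.get("last_id")
    second, second_ok = run_pipeline(ROWS, store, batch_size)
    print(f"batch_size={batch_size}: run 1 wrote {first} (ok={first_ok}), "
          f"run 2 resumed after id {resumed_after} and wrote {second} (ok={second_ok})")
```

With the same failure at id 8, the rerun resumes after id 6 when `batch_size=3` but after id 5 when `batch_size=5`, which is the point of the reply: the resume point depends on when the target last flushed and checkpointed, not on where the error happened.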
