hey there! just got working on meltano and I have ...
# troubleshooting
m
hey there! just started working with meltano and I have a quick YES/NO question. I'm replicating a MySQL DB to GCS. My config looks like this:
version: 1
default_environment: dev
project_id: 5174ce55-6bbf-4c90-a65d-f6b31afd4a40
environments:
- name: dev
  config:
    plugins:
      extractors:
      - name: tap-mysql
        select:
        - '*.*'
        metadata:
          '*':
            replication-method: LOG_BASED
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-mysql
    variant: transferwise
    pip_url: pipelinewise-tap-mysql
    config:
      host: redacted
      port: redacted
      user: redacted
      database: redacted
      ssl: false
  loaders:
  - name: target-gcs
    variant: datateer
    pip_url: git+https://github.com/Datateer/target-gcs.git
    config:
      bucket_name: redacted
      credentials_file: redacted
      date_format: redacted
      key_prefix: ingest/
Now I see files in the destination, but it keeps failing with this error:
level=CRITICAL message='NoneType' object has no attribute 'settimeout'
I have seen some threads about this and I'll try out some session arguments. However, my question is: if I run
meltano run tap-mysql target-gcs
over and over, will it pick up from where it left off? I'm asking because I see something like this at the end of a failed job:
4', 'Content-Range': 'bytes 37224448-37486591/*'} cmd_type=elb consumer=True name=target-gcs producer=False stdio=stderr string_id=target-gcs
2022-10-06T19:14:50.800926Z [info     ] time=2022-10-06 15:14:50 name=target-gcs level=INFO message=Target 'target-gcs' completed reading 144531 lines of input (144384 records, 145 state messages). cmd_type=elb consumer=True name=target-gcs producer=False stdio=stderr string_id=target-gcs
2022-10-06T19:14:50.806808Z [info     ] time=2022-10-06 15:14:50 name=target-gcs level=INFO message=Emitting completed target state {"currently_syncing": "dev-foo", "bookmarks": {"dev-admins": {"version": 1665083536009, "log_file": "mysql-bin.000680", "log_pos": 31760098}, "dev-uses": {"version": 1665083537415, "log_file": "mysql-bin.000680", "log_pos": 31763039}, "dev-buzz": {"version": 1665083539568, "log_file": "mysql-bin.000680", "log_pos": 31801221}, "dev-foo": {"version": 1665083541240, "log_file": "mysql-bin.000680", "log_pos": 31854385, "max_pk_values": {"id": 24264758}, "last_pk_fetched": {"id": 274233}}}} cmd_type=elb consumer=True name=target-gcs producer=False stdio=stderr string_id=target-gcs
and when I rerun it I see something like:
2022-10-06T19:13:54.825587Z [info     ] time=2022-10-06 15:13:54 name=tap_mysql level=INFO message=LOG_BASED stream dev-foo will resume its historical sync cmd_type=elb consumer=False name=tap-mysql producer=True stdio=stderr string_id=tap-mysql
But running it again and again, I see that the log_pos is the same as the previous run.
t
It should checkpoint when the target writes a batch and then resume from that point on the next run. So where exactly it'll resume from depends on where the error occurs in the data, how the target behaves, etc. Target behavior varies by target, so... it's hard to say. 😕
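One way to check whether a rerun actually moved forward is to compare the state emitted at the end of two runs. Below is a minimal Python sketch that pulls the state JSON out of an "Emitting completed target state" log line and reports which bookmark keys changed. The helper names (`extract_state`, `diff_bookmarks`) are made up for illustration and are not part of Meltano or either plugin. The first sample line is trimmed from the log above; the second is hypothetical. It may turn out that log_pos stays put while last_pk_fetched advances, if the tap is still in its historical sync for that stream, but that is an assumption to verify against your own logs.

```python
import json

STATE_MARKER = "Emitting completed target state "


def extract_state(log_line):
    """Return the state dict embedded in a target log line, or None if absent."""
    start = log_line.find(STATE_MARKER)
    if start == -1:
        return None
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores the trailing key=value fields on the log line.
    state, _ = json.JSONDecoder().raw_decode(log_line, start + len(STATE_MARKER))
    return state


def diff_bookmarks(previous, current):
    """Map each stream to the bookmark keys whose values differ between runs."""
    changes = {}
    for stream, bookmark in current.get("bookmarks", {}).items():
        old = previous.get("bookmarks", {}).get(stream, {})
        changed = {
            key: (old.get(key), value)
            for key, value in bookmark.items()
            if old.get(key) != value
        }
        if changed:
            changes[stream] = changed
    return changes


# Trimmed from the failed run shown earlier in the thread.
run_1 = (
    'level=INFO message=Emitting completed target state {"bookmarks": '
    '{"dev-foo": {"log_file": "mysql-bin.000680", "log_pos": 31854385, '
    '"last_pk_fetched": {"id": 274233}}}} cmd_type=elb'
)
# Hypothetical second run, invented for illustration only.
run_2 = (
    'level=INFO message=Emitting completed target state {"bookmarks": '
    '{"dev-foo": {"log_file": "mysql-bin.000680", "log_pos": 31854385, '
    '"last_pk_fetched": {"id": 500000}}}} cmd_type=elb'
)

print(diff_bookmarks(extract_state(run_1), extract_state(run_2)))
```

If the diff comes back empty across several reruns, the state probably isn't being advanced or saved between runs, which would be worth digging into separately from the settimeout error.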