Hello, I've been trying to get the Meltano EL proc...
# troubleshooting
j
Hello, I've been trying to get the Meltano EL process up and running for a few weeks, but it hasn't been as straightforward as I had hoped. While I’ve successfully run the pipeline, I’m not getting the right number of records. I've tried optimizing my code, but it hasn’t worked, and I'm puzzled about how to fix this so I can move forward with the project. It also takes forever to build, and I’m not sure what I’m doing wrong. Please find my meltano.yml file below.
Copy code
version: 1
default_environment: dev
project_id: 5dbc4547-80ec-4699-9dc3-2dc4a1569786
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-mysql
    variant: transferwise
    pip_url: 
      git+<https://github.com/transferwise/pipelinewise.git#subdirectory=singer-connectors/tap-mysql>
    config:
      session_sqls:
        - SET @@session.max_execution_time=0       # No limit
        - SET @@session.net_read_timeout=7200      # 1 hour
        - SET @@session.net_write_timeout=7200     # 1 hour
          # Set other session variables to the default PPW ones
        - SET @@session.time_zone="+0:00"
        - SET @@session.wait_timeout=28800
        - SET @@session.innodb_lock_wait_timeout=7200
    select:
      - '*.*'          # Select all tables first
      - '!*_audit*' 
    metadata:
      '*.*':    # Apply metadata to all non-excluded tables
        replication-method: LOG_BASED
        replication-key: id
        id:
          is-replication-key: true

  mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+<https://github.com/MeltanoLabs/meltano-map-transform.git>
    executable: meltano-map-transform
    mappings:
    - name: rename_stream
      config:
        stream_maps:
          '*':
            __alias__: __stream_name__ .replace("smartterm-", "")

  loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+<https://github.com/z3z1ma/target-bigquery.git>
    config:
      schema_resolver_version: 1
      denormalized: true
      upsert: true
      partition_granularity: month
      fail_fast: false
      timeout: 3600
      flattening_enabled: true
      flattening_max_depth: 3
      stream_maps:
        "*":  # Apply this transformation to all streams
          "*":  # Apply this transformation to all properties (columns) in the stream
            __else__: "__NULL__"  # Replace NULL values with "__NULL__"
r
I’m not getting the right number of records
Is there a pattern you can identify for the data you are missing?
It also takes forever to build
As in the EL process, or are you trying to Dockerise the whole project?
j
@Reuben (Matatika) Firstly, I want to ensure that each table returns the exact same number of records and that no table is empty, which isn't the case at the moment. Before the tables were flattened, I could see the correct records in JSON format, but the issue was that subsequent runs caused record duplication. To address this, I decided to flatten the tables to remove duplicates. While this worked in preventing duplication, it resulted in incorrect records for most tables, and in some cases, no records at all. Additionally, I noticed that the replication method LOG_BASED re-replicates all the tables every time the pipeline runs. This increases processing time and occasionally causes the pipeline to break. Finally, the EL process is already containerized using Docker and managed with docker-compose to run the entire process, as shown below
Copy code
├── Makefile
├── containers
│   ├── Dockerfile
│   └── requirements.txt
├── data_pipeline
│   ├── README.md
│   ├── analyze
│   ├── extract
│   ├── load
│   ├── meltano.yml
│   ├── notebook
│   ├── orchestrate
│   ├── output
│   ├── plugins
│   │   ├── extractors
│   │   │   └── tap-mysql--transferwise.lock
│   │   ├── loaders
│   │   │   └── target-bigquery--z3z1ma.lock
│   │   └── mappers
│   │       └── meltano-map-transformer--meltano.lock
│   ├── requirements.txt
│   └── transform
└── docker-compose.yml
I tried one of the tables that returned empty and below is a snippet of the code. I'm not sure what this means `ACTIVATE_VERSION message received but not implemented by this target.`:
Copy code
2025-02-11T17:27:59.837423Z [info     ] 2025-02-11 17:27:59,834 | INFO     | target-bigquery      | Setting up clients cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-bigquery:test name=target-bigquery producer=False run_id=82931c72-da02-44f5-ba7a-df0c6252b737 stdio=stderr string_id=target-bigquery
2025-02-11T17:27:59.838866Z [info     ] 2025-02-11 17:27:59,835 | WARNING  | target-bigquery      | ACTIVATE_VERSION message received but not implemented by this target. Ignoring. cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-bigquery:test name=target-bigquery producer=False run_id=82931c72-da02-44f5-ba7a-df0c6252b737 stdio=stderr string_id=target-bigquery
2025-02-11T17:27:59.975044Z [info     ] 2025-02-11 17:27:59,974 | WARNING  | target-bigquery      | ACTIVATE_VERSION message received but not implemented by this target. Ignoring. cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-bigquery:test name=target-bigquery producer=False run_id=82931c72-da02-44f5-ba7a-df0c6252b737 stdio=stderr string_id=target-bigquery
2025-02-11T17:27:59.975865Z [info     ] 2025-02-11 17:27:59,975 | INFO     | target-bigquery      | Target 'target-bigquery' completed reading 116 lines of input (109 records, (0 batch manifests, 4 state messages). cmd_type=elb consumer=True job_name=dev:tap-mysql-to-target-bigquery:test name=target-bigquery producer=False run_id=82931c72-da02-44f5-ba7a-df0c6252b737 stdio=stderr string_id=target-bigquery
r
> I'm not sure what this means
ACTIVATE_VERSION message received but not implemented by this target.
> Pretty sure it's not related to your issue(s), but if you are interested: https://hub.meltano.com/singer/docs/#activate-version
I don't see how flattening would have solved the duplication problem, unless you were trying to set primary keys from nested properties (which it doesn't look like you are). If you run the tap
Copy code
meltano invoke tap-mysql > tap.out
in
tap.out
, you should see some
STATE
messages that have
key_properties
defined - these should be the same as the primary keys in your MySQL tables, and should be used by the SQL target in the destination tables too.
Can you verify if there are primary keys set in the BigQuery tables?
👍 1