# Troubleshooting
s
Hi Team, I have set up a Meltano EL job which is replicating data from Postgres to ClickHouse. I am able to run the replication job successfully and data is being ingested into ClickHouse. But the issue is that on continuous runs it duplicates the data: say I have 5 rows, then on the next run it will be 10, then 15, and so on. What might be the issue? I am using LOG_BASED replication.
m
Hi @Syed Hamza Raza Kazmi! I'm facing exactly the same issue. Have you managed to solve it?
s
nope
m
😞 thanks. I was planning to create an issue (but I'm not really sure where: tap-postgres or target-clickhouse), unless you already created one?
s
No, I am not using Meltano anymore; I switched to Airbyte.
I did not have a great experience with Meltano.
m
yeah. We were testing Airbyte already but heard some good opinions about Meltano and we decided to give it a try. Looks like we need to fall back to Airbyte 🤷
s
Airbyte, I would say, is definitely a much better option.
m
Yup. We were a bit unhappy that the data loaded from Postgres to ClickHouse was JSON encoded and we had to create views to decode it, so Meltano looked promising.
b
Hello @Michal & @Syed Hamza Raza Kazmi, have you tried setting the engine_type to 'ReplacingMergeTree' and the load_method to 'upsert'?
e.g.
loaders:
  - name: target-clickhouse
    config:
      load_method: upsert
      engine_type: ReplacingMergeTree
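For background on why this can help: ReplacingMergeTree only collapses rows with the same sorting key at merge time, so duplicates can remain visible in queries until a merge happens. A ClickHouse sketch (the table, columns, and version column here are made up for illustration):

    CREATE TABLE demo_upsert
    (
        id UInt64,
        payload String,
        _version UInt64
    )
    ENGINE = ReplacingMergeTree(_version)
    ORDER BY id;

    INSERT INTO demo_upsert VALUES (1, 'old', 1);
    INSERT INTO demo_upsert VALUES (1, 'new', 2);

    -- Both rows may still be visible until the parts merge:
    SELECT count() FROM demo_upsert;

    -- FINAL deduplicates at read time, keeping the highest _version:
    SELECT * FROM demo_upsert FINAL;

    -- OPTIMIZE ... FINAL forces the merge itself:
    OPTIMIZE TABLE demo_upsert FINAL;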
m
Hello @brian_njenga, thanks for the hints, but it didn't help. I initially had 4 rows in my table. I ran a FULL_TABLE sync first and then changed to LOG_BASED with the settings you posted. Unfortunately, every time I run Meltano it adds the latest records again even though I didn't edit anything.
b
Maybe a final suggestion @Michal: set default_replication_method in tap-postgres to INCREMENTAL, then do the following:
1. Run -> meltano state list
2. Run -> meltano state clear <state(s) listed above>
3. Add optimize_after: true to target-clickhouse in meltano.yml
Then try running again and see if this is resolved.
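Putting the suggestions from this thread together, the relevant meltano.yml fragment might look like this (option names are taken from the messages above; whether each is supported depends on your tap/target versions):

    plugins:
      extractors:
        - name: tap-postgres
          config:
            default_replication_method: INCREMENTAL
      loaders:
        - name: target-clickhouse
          config:
            load_method: upsert
            engine_type: ReplacingMergeTree
            optimize_after: true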
m
Thanks a lot @brian_njenga! It helped, but only partially: for my small table it worked well. When I tried running it on a bigger one with some real data we use, with partitioning enabled (I don't know if that changes anything in Meltano), it unfortunately didn't do the job. I also had to set
<merge_tree>
      <allow_nullable_key>true</allow_nullable_key>
  </merge_tree>
in the config to allow nullable keys, as Meltano complained about it when running the initial FULL_TABLE sync. Then, when I changed to LOG_BASED, it stopped appending new records. I also noticed an interesting thing: it looks like Meltano is not saving the bookmarks the way it did for my small table in the previous test
INFO     | target-clickhouse    | Emitting completed target state {"bookmarks": {"<schema_name>-document": {}}}
cat .meltano/run/tap-postgres/state.json
{
  "bookmarks": {
    "<schema_name>-document": {}
  }
}
and here is the file for working example:
cat .meltano/run/tap-postgres/state.json
{
  "bookmarks": {
    "<schema_name>-mgtest": {
      "replication_key": "_sdc_lsn",
      "replication_key_value": 9546101697424
    }
  }
}
I don't know what's going on. I did try clearing the state, as you suggested, but no luck 🤷