Is the replication key lowercase? (Meltano #singer-tap-development)


Stéphane Burwash

04/26/2022, 2:41 PM

Is the replication key lowercase? With my HubSpot tap, I receive an `updatedAt` value, which I set as the replication key, but it seems I'm still doing full-table replications instead of incremental ones:

```yaml
select:
  - '*.*'
metadata:
  '*':
    replication-method: INCREMENTAL
    replication-key: updatedAt
```

edgar_ramirez_mondragon

04/26/2022, 2:43 PM

Hi @Stéphane Burwash! Are you using `meltano run`?

Stéphane Burwash

04/26/2022, 2:44 PM

No, `meltano elt`; more specifically:

```shell
meltano elt tap-hubspot target-stitch--hubspot --job_id=blablabla
```

edgar_ramirez_mondragon

04/26/2022, 3:04 PM

Ok, so you're passing a `job_id`; that was gonna be my next suggestion 😅. It's uncommon to override the metadata for an API tap, since that's usually baked in, so the tap may just be ignoring it.

edgar_ramirez_mondragon

04/26/2022, 3:05 PM

If you dump the catalog, you may be able to see if it at least looks right:

```shell
meltano invoke --dump=catalog tap-hubspot
```
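To check what the dumped catalog actually declares, a small helper can list each stream's replication settings. This is a sketch that assumes the dump has been saved to a file and loaded with `json.load`. It reads only the top-level `replication_method` and `replication_key` fields that appear in the catalog excerpt later in this thread. `summarize_replication` is a hypothetical name.

```python
from typing import Any, Dict, List


def summarize_replication(catalog: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return each stream's id together with its declared replication settings."""
    summary = []
    for stream in catalog.get("streams", []):
        summary.append(
            {
                "stream": stream.get("tap_stream_id"),
                # None means the catalog doesn't declare the value at this level
                "method": stream.get("replication_method"),
                "key": stream.get("replication_key"),
            }
        )
    return summary
```

For example, redirect the dump to a file (e.g. `catalog.json`), then pass `json.load(open("catalog.json"))` to the function and print each entry.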

Stéphane Burwash

04/26/2022, 3:15 PM

I created my own tap with the SDK, so should I have set the replication method directly in my tap? I think my catalog file looks good; here is what's at the top:

```json
"streams": [
  {
    "tap_stream_id": "companies",
    "replication_key": "updatedAt",
    "replication_method": "INCREMENTAL",
    "key_properties": [
      "id"
    ],
```

Is there a way to view the current state file? I could compare it with the data coming in.

edgar_ramirez_mondragon

04/26/2022, 3:23 PM

> I created my own tap with the sdk, so should I have set the replication method directly in my tap?

Yeah, but with the SDK that's done automatically if you define a `replication_key` in the stream.
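The rule described here can be pictured with a toy stand-in. This is not the SDK's source: `ToyStream`, `CompaniesStream`, and `OwnersStream` are hypothetical classes written only to show that declaring a `replication_key` on a stream is what makes it advertise `INCREMENTAL`, unless a method is explicitly forced.

```python
from typing import Optional


class ToyStream:
    """Toy stand-in for an SDK stream; not the real singer_sdk class."""

    replication_key: Optional[str] = None
    forced_replication_method: Optional[str] = None

    @property
    def replication_method(self) -> str:
        # An explicitly forced method wins over everything else
        if self.forced_replication_method:
            return self.forced_replication_method
        # Declaring a replication key is enough to make the stream incremental
        if self.replication_key:
            return "INCREMENTAL"
        return "FULL_TABLE"


class CompaniesStream(ToyStream):
    replication_key = "updatedAt"


class OwnersStream(ToyStream):
    pass
```

In a real SDK tap, the equivalent is setting `replication_key = "updatedAt"` as a class attribute on the stream, so no `metadata` override in `meltano.yml` should be needed.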

Stéphane Burwash

04/26/2022, 3:37 PM

I removed the metadata, but I'm still getting the issue, which is weird. With the BigQuery loader, I'm even appending full tables instead of updating, which I thought was impossible.

Stéphane Burwash

04/26/2022, 3:38 PM

I went from 4255 entries to 8510

Stéphane Burwash

04/26/2022, 3:38 PM

Which I'm guessing is linked to my replication key

edgar_ramirez_mondragon

04/26/2022, 3:38 PM

which bigquery loader are you using?

Stéphane Burwash

04/26/2022, 3:39 PM

Ruslan's, https://github.com/adswerve/target-bigquery.git@0.12.1

edgar_ramirez_mondragon

04/26/2022, 3:39 PM

yup, there's a `replication_method` config option: https://github.com/adswerve/target-bigquery#step-3-configure
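Going from 4255 to 8510 rows (exactly 2 × 4255) is what you'd expect if each run re-sends the full table and the loader appends. The toy sketch below contrasts appending with upserting on the stream's key property (`id` here). It only models the two behaviours in memory. It is not how target-bigquery is implemented, and the exact values its `replication_method` option accepts should be checked in the README linked above.

```python
from typing import Dict, List


def append_load(table: List[dict], batch: List[dict]) -> List[dict]:
    """Append mode: every incoming record becomes a new row."""
    return table + batch


def upsert_load(table: List[dict], batch: List[dict], key: str = "id") -> List[dict]:
    """Upsert mode: incoming records replace existing rows with the same key."""
    by_key: Dict[object, dict] = {row[key]: row for row in table}
    for record in batch:
        by_key[record[key]] = record
    return list(by_key.values())


# A full-table sync of 4255 records, sent twice
full_sync = [{"id": i, "updatedAt": "2022-04-26T00:00:00Z"} for i in range(4255)]

appended = append_load(append_load([], full_sync), full_sync)
upserted = upsert_load(upsert_load([], full_sync), full_sync)
print(len(appended), len(upserted))  # 8510 4255
```

Upserting only hides the symptom in the warehouse: the tap still re-reads and re-sends every record on each run.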

Stéphane Burwash

04/26/2022, 3:54 PM

Ok thanks! I shall look into it and get back to you 😄 Hopefully this is fixed shortly

edgar_ramirez_mondragon

04/26/2022, 3:57 PM

cool. do let me know

Stéphane Burwash

04/26/2022, 5:19 PM

Well, sadly I can't find the source of my error. It's also harder to test since I'm doing all of my testing directly on BigQuery, while my main tap actually goes through Stitch, which is really our main issue since it's costing us a lot of rows every 15 minutes. Also, I've been having an issue with the start_date: whatever value I put, the tap always syncs the entire table. Could this be linked to my issue?

edgar_ramirez_mondragon

04/26/2022, 7:10 PM

> Also, I've been having an issue with the start_date, where whatever value I put, the tap will always sync the entire table

is this in the custom tap you made with the sdk? it may just be that you need to implement the actual use of the bookmark in a url param or whatever the api expects, like here: https://github.com/MeltanoLabs/tap-stackexchange/blob/9a27f873c27c181c24271a250d8c94e275c32b8e/tap_stackexchange/client.py#L118
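A minimal stdlib sketch of that idea, assuming the API accepts a "modified since" style query parameter. `build_url_params` and the `updated_since` parameter are hypothetical names, and the real HubSpot filtering mechanism may differ per endpoint. The point is that the bookmark from state, or `start_date` on a first run, only limits the sync if the tap actually sends it to the API. In an SDK tap this would typically live in the stream's `get_url_params`, with the starting value coming from `get_starting_timestamp(context)`.

```python
from datetime import datetime
from typing import Dict, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_url_params(bookmark: Optional[str], start_date: Optional[str]) -> Dict[str, str]:
    """Build query params that push the incremental filter down to the API."""
    params: Dict[str, str] = {}
    # Prefer the bookmark saved in state; fall back to start_date on a first run
    starting_value = bookmark or start_date
    if starting_value:
        # Hypothetical parameter name: use whatever the real endpoint expects
        params["updated_since"] = parse_ts(starting_value).isoformat()
    return params


print(build_url_params(None, "2022-01-01T00:00:00Z"))
# {'updated_since': '2022-01-01T00:00:00+00:00'}
```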

Stéphane Burwash

04/26/2022, 7:36 PM

Awesome, thank you so much! Back to my replication issue then 😉

Stéphane Burwash

04/27/2022, 1:17 PM

Update: I'm back to square one. I created a testing API endpoint on Stitch, but every time I sync a table (e.g. owners in HubSpot), Stitch counts all of the data as loaded again. So for 132 owners, I'm now up to 396 loaded rows in Stitch (but only 132 rows in GBQ, my final warehouse). Does this mean the issue is with Meltano or with Stitch?

edgar_ramirez_mondragon

04/27/2022, 3:01 PM

I guess that means Stitch has processed the same 132 rows three times? If you're upserting in BQ, 132 seems right, although it doesn't seem to be running incrementally.

Stéphane Burwash

04/27/2022, 3:02 PM

```
{"type": "STATE", "value": {"bookmarks": {"owners": {"replication_key": "updatedAt", "replication_key_value": "2022-04-27T14:13:53.871Z", "replication_key_signpost": "2022-04-27T14:49:56.738716+00:00", "starting_replication_value": "2022-04-27T14:13:53.871Z", "progress_markers": {"Note": "Progress is not resumable if interrupted.", "replication_key": "updatedAt", "replication_key_value": "2020-03-10T06:42:02.879Z"}}}}} cmd_type=extractor job_id=test_hubspot-to-bigquery name=tap-hubspot (out) run_id=3628d47c-8497-429c-a1af-fd957390ebc8 stdio=stdout
```
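Since the tap emits its state as a Singer `STATE` message, the bookmark can be pulled out of a captured line with plain `json`. A small sketch, assuming the trailing log fields Meltano appends (`cmd_type=...` and so on) have already been stripped so only the JSON remains; `bookmark_for` is a hypothetical helper.

```python
import json
from typing import Optional


def bookmark_for(state_line: str, stream: str) -> Optional[str]:
    """Return the saved replication_key_value for one stream, if any."""
    message = json.loads(state_line)
    if message.get("type") != "STATE":
        return None
    bookmark = message["value"].get("bookmarks", {}).get(stream, {})
    # replication_key_value is the value the next run should start from;
    # progress_markers only track in-flight progress of the current sync
    return bookmark.get("replication_key_value")


line = (
    '{"type": "STATE", "value": {"bookmarks": {"owners": {'
    '"replication_key": "updatedAt", '
    '"replication_key_value": "2022-04-27T14:13:53.871Z"}}}}'
)
print(bookmark_for(line, "owners"))  # 2022-04-27T14:13:53.871Z
```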

Stéphane Burwash

04/27/2022, 3:03 PM

Well, it seems my replication key is set properly, and the state exists.

Stéphane Burwash

04/27/2022, 3:03 PM

```
Incremental state has been updated at 2022-04-27 14:49:57.473453.
```

Stéphane Burwash

04/27/2022, 3:03 PM

And at the end it says my incremental state has been updated; I just can't see anywhere that it was actually taken into account 😛

edgar_ramirez_mondragon

04/27/2022, 3:06 PM

ok I just realized I had even starred your tap-hubspot repo 🤦‍♂️. I'm looking at it now...

edgar_ramirez_mondragon

04/27/2022, 3:16 PM

so it seems you're not using that state anywhere to query the API. You still need to call `get_starting_timestamp` and use the value in the streams that can be filtered. And looking at other variants of the tap, it seems like some endpoints like `owners` don't really support filtering, so it's done after the fact: https://github.com/singer-io/tap-hubspot/blob/master/tap_hubspot/__init__.py#L862-L863
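For an endpoint like `owners` that can't be filtered server-side, records have to be dropped after they're fetched. A minimal stdlib sketch, assuming ISO-8601 `updatedAt` strings; `filter_since` is a hypothetical helper. It parses timestamps rather than comparing raw strings, because values with and without fractional seconds (e.g. `...53Z` vs `...53.871Z`) don't sort correctly as text.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_since(
    records: Iterable[dict], key: str, starting_value: Optional[str]
) -> Iterator[dict]:
    """Yield records at or after the bookmark; pass everything through if none."""
    start = parse_ts(starting_value) if starting_value else None
    for record in records:
        if start is None or parse_ts(record[key]) >= start:
            yield record


owners = [
    {"id": 1, "updatedAt": "2020-03-10T06:42:02.879Z"},
    {"id": 2, "updatedAt": "2022-04-27T14:13:53Z"},  # text compare would keep this
    {"id": 3, "updatedAt": "2022-04-27T14:20:00.500Z"},
]
kept = list(filter_since(owners, "updatedAt", "2022-04-27T14:13:53.871Z"))
print([r["id"] for r in kept])  # [3]
```

Inside an SDK stream, the same check would go in `post_process`, returning `None` for records older than `get_starting_replication_key_value(context)`.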

Stéphane Burwash

04/27/2022, 3:19 PM

So I should manually be managing state? I thought the SDK managed that under the hood, no?

Stéphane Burwash

04/27/2022, 3:23 PM

But ok, awesome! From here I should be able to adapt my code. Do you have a code example of someone manually managing state?

edgar_ramirez_mondragon

04/27/2022, 3:27 PM

> So I should manually be managing state?

At least reading state, yes. That is actually expected. For "faking" incremental replication on streams that don't really support filtering in the upstream system, we have an issue: https://gitlab.com/meltano/sdk/-/issues/227

edgar_ramirez_mondragon

04/27/2022, 3:30 PM

> But ok awesome! From here I should be able to adapt my code. Do you have a code example of someone manually managing state?

Yup. Something like https://github.com/MeltanoLabs/tap-stackexchange/blob/main/tap_stackexchange/client.py#L117-L118 except you'll most likely have to use it in `post_process` to filter out unwanted records.

Stéphane Burwash

04/27/2022, 3:31 PM

As always, you've been infinitely helpful, thank you so much! I'll update you when I'm done.

Stéphane Burwash

04/27/2022, 4:24 PM

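```python
from typing import Optional

def post_process(self, row: dict, context: Optional[dict]) -> Optional[dict]:
    """As needed, append or transform raw data to match expected structure.
    Returns row, or None if row is to be excluded."""
    if self.replication_key:
        # Bookmark from state (the SDK falls back to start_date); may be None
        start = self.get_starting_replication_key_value(context)
        # Plain string comparison assumes both values share one ISO-8601 format
        if start and row[self.replication_key] < start:
            return None
    return row
```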

After pain and suffering, I think we got it! So just to recap: with the SDK, we need to manage most interactions/settings ourselves (e.g. start date, replication, etc.).

Stéphane Burwash

04/27/2022, 4:24 PM

Again, thank you so much @edgar_ramirez_mondragon

edgar_ramirez_mondragon

04/27/2022, 5:44 PM

> So just to recap, with the sdk, we need to manage most interactions / settings (ex: start date, replication, etc.)

Yeah. Unfortunately, we (team and community) haven't come up with the right abstractions that would work declaratively for any sort of filtering the source might do, so it's left for the dev to implement 🙂
