Stéphane Burwash
04/26/2022, 2:41 PM
select:
- '*.*'
metadata:
  '*':
    replication-method: INCREMENTAL
    replication-key: updatedAt
edgar_ramirez_mondragon
04/26/2022, 2:43 PM
meltano run?
Stéphane Burwash
04/26/2022, 2:44 PM
meltano elt tap-hubspot target-stitch--hubspot --job_id=blablabla
edgar_ramirez_mondragon
04/26/2022, 3:04 PM
edgar_ramirez_mondragon
04/26/2022, 3:05 PM
meltano invoke --dump=catalog tap-hubspot
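Once the catalog is dumped (for example redirected to a file with `meltano invoke --dump=catalog tap-hubspot > catalog.json`; the file name is just an assumption), a minimal stdlib-only sketch like the one below can list each stream's replication method and key. It reads the top-level `replication_method` / `replication_key` fields that appear in the catalog excerpt Stéphane posts next, and falls back to the stream-level Singer metadata keys (`replication-method`, `replication-key`). The helper names are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def stream_metadata(stream: Dict[str, Any]) -> Dict[str, Any]:
    """Return the stream-level Singer metadata (the entry whose breadcrumb is [])."""
    for entry in stream.get("metadata", []):
        if entry.get("breadcrumb") == []:
            return entry.get("metadata", {})
    return {}


def replication_summary(
    catalog: Dict[str, Any]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """List (stream id, replication method, replication key) for every stream."""
    rows = []
    for stream in catalog.get("streams", []):
        meta = stream_metadata(stream)
        # Prefer the top-level fields, then fall back to stream-level metadata
        method = stream.get("replication_method") or meta.get("replication-method")
        key = stream.get("replication_key") or meta.get("replication-key")
        rows.append((stream["tap_stream_id"], method, key))
    return rows


# Inline sample shaped like the excerpt in this thread; for a real dump use
# replication_summary(json.load(open("catalog.json")))
sample = json.loads(
    '{"streams": [{"tap_stream_id": "companies", "replication_key": "updatedAt",'
    ' "replication_method": "INCREMENTAL", "key_properties": ["id"], "metadata": []}]}'
)
for stream_id, method, key in replication_summary(sample):
    print(f"{stream_id}: method={method} key={key}")
```

If a stream you expect to be incremental shows `FULL_TABLE` or no key here, the problem is upstream of state handling.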
Stéphane Burwash
04/26/2022, 3:15 PM
"streams": [
  {
    "tap_stream_id": "companies",
    "replication_key": "updatedAt",
    "replication_method": "INCREMENTAL",
    "key_properties": [
      "id"
    ],
Is there a way to view the current state file? I could make a comparison with the data coming in.
edgar_ramirez_mondragon
04/26/2022, 3:23 PM
> I created my own tap with the sdk, so should I have set the replication method directly in my tap?
Yeah, but with the SDK that's done automatically if you define a replication_key in the stream.
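To illustrate the point: declaring a `replication_key` on an SDK stream is what makes the catalog report INCREMENTAL, so nothing else has to be set in the tap. The sketch below is a simplified, stdlib-only stand-in for that rule; `ToyStream` and the stream classes are hypothetical and this is not the SDK's source. In a real SDK tap you would just set `replication_key = "updatedAt"` as a class attribute on the stream.

```python
from typing import Optional

INCREMENTAL = "INCREMENTAL"
FULL_TABLE = "FULL_TABLE"


class ToyStream:
    """Simplified stand-in for an SDK stream (hypothetical, for illustration)."""

    name: str = "toy"
    replication_key: Optional[str] = None

    @property
    def replication_method(self) -> str:
        # A declared replication key implies incremental sync; otherwise full table
        return INCREMENTAL if self.replication_key else FULL_TABLE


class CompaniesStream(ToyStream):
    name = "companies"
    replication_key = "updatedAt"  # the only thing the tap author declares


class PipelinesStream(ToyStream):
    name = "pipelines"  # hypothetical stream with no replication key


for stream in (CompaniesStream(), PipelinesStream()):
    print(stream.name, stream.replication_method)
```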
Stéphane Burwash
04/26/2022, 3:37 PM
Stéphane Burwash
04/26/2022, 3:38 PM
Stéphane Burwash
04/26/2022, 3:38 PM
edgar_ramirez_mondragon
04/26/2022, 3:38 PM
Stéphane Burwash
04/26/2022, 3:39 PM
edgar_ramirez_mondragon
04/26/2022, 3:39 PM
replication_method config option: https://github.com/adswerve/target-bigquery#step-3-configure
Stéphane Burwash
04/26/2022, 3:54 PM
edgar_ramirez_mondragon
04/26/2022, 3:57 PM
Stéphane Burwash
04/26/2022, 5:19 PM
edgar_ramirez_mondragon
04/26/2022, 7:10 PM
> Also, I've been having an issue with the start_date, where whatever value I put, the tap will always sync the entire table
Is this in the custom tap you made with the SDK? It may just be that you need to implement the actual use of the bookmark in a URL param or whatever the API expects, like here: https://github.com/MeltanoLabs/tap-stackexchange/blob/9a27f873c27c181c24271a250d8c94e275c32b8e/tap_stackexchange/client.py#L118
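"Using the bookmark" here means turning the starting value (the saved bookmark, or start_date on a first run) into whatever filter the API accepts. A minimal stdlib-only sketch of that idea follows; the `updated_since` and `after` parameter names, the epoch-milliseconds encoding, and the `api.example.com` URL are hypothetical placeholders, so check what your source API actually supports. In an SDK tap this logic would live in the stream's `get_url_params`, with the start value coming from `self.get_starting_timestamp(context)`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def build_url_params(
    starting_timestamp: Optional[datetime],
    next_page_token: Optional[str] = None,
) -> Dict[str, Any]:
    """Build query params that push the incremental filter to the API.

    `updated_since` and `after` are hypothetical parameter names.
    """
    params: Dict[str, Any] = {}
    if starting_timestamp is not None:
        # Hypothetical encoding: milliseconds since the Unix epoch
        params["updated_since"] = int(starting_timestamp.timestamp() * 1000)
    if next_page_token:
        params["after"] = next_page_token  # hypothetical pagination param
    return params


start = datetime(2022, 4, 1, tzinfo=timezone.utc)
print("https://api.example.com/companies?" + urlencode(build_url_params(start)))
```

If no start value is available the filter is simply omitted, which matches the "always syncs the entire table" symptom when the bookmark is never wired into the request.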
Stéphane Burwash
04/26/2022, 7:36 PM
Stéphane Burwash
04/27/2022, 1:17 PM
edgar_ramirez_mondragon
04/27/2022, 3:01 PM
Stéphane Burwash
04/27/2022, 3:02 PM
{"type": "STATE", "value": {"bookmarks": {"owners": {"replication_key": "updatedAt", "replication_key_value": "2022-04-27T14:13:53.871Z", "replication_key_signpost": "2022-04-27T14:49:56.738716+00:00", "starting_replication_value": "2022-04-27T14:13:53.871Z", "progress_markers": {"Note": "Progress is not resumable if interrupted.", "replication_key": "updatedAt", "replication_key_value": "2020-03-10T06:42:02.879Z"}}}}} cmd_type=extractor job_id=test_hubspot-to-bigquery name=tap-hubspot (out) run_id=3628d47c-8497-429c-a1af-fd957390ebc8 stdio=stdout
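On the earlier question of viewing the current state: the bookmark is visible in the STATE messages the tap writes to stdout, like the one above. A small sketch for pulling the bookmark fields out of such a message, assuming the JSON payload has already been separated from Meltano's trailing log fields (`summarize_state` is a hypothetical helper). In this message the in-progress value (2020-03-10) is older than the starting bookmark (2022-04-27), which suggests records older than the bookmark are still being emitted for owners.

```python
import json
from typing import Any, Dict


def summarize_state(line: str) -> Dict[str, Dict[str, Any]]:
    """Map each stream in a Singer STATE message to its bookmark fields."""
    message = json.loads(line)
    if message.get("type") != "STATE":
        raise ValueError("not a STATE message")
    summary = {}
    for stream, bookmark in message["value"].get("bookmarks", {}).items():
        # progress_markers may be absent; default to an empty dict
        progress = bookmark.get("progress_markers", {})
        summary[stream] = {
            "replication_key": bookmark.get("replication_key"),
            "bookmark": bookmark.get("replication_key_value"),
            "starting_value": bookmark.get("starting_replication_value"),
            "in_progress": progress.get("replication_key_value"),
        }
    return summary


# Trimmed copy of the STATE message from the log line above
state_line = (
    '{"type": "STATE", "value": {"bookmarks": {"owners": {'
    '"replication_key": "updatedAt", '
    '"replication_key_value": "2022-04-27T14:13:53.871Z", '
    '"starting_replication_value": "2022-04-27T14:13:53.871Z", '
    '"progress_markers": {"replication_key": "updatedAt", '
    '"replication_key_value": "2020-03-10T06:42:02.879Z"}}}}}'
)
for stream, fields in summarize_state(state_line).items():
    print(stream, fields)
```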
Stéphane Burwash
04/27/2022, 3:03 PM
Stéphane Burwash
04/27/2022, 3:03 PM
Stéphane Burwash
04/27/2022, 3:03 PM
edgar_ramirez_mondragon
04/27/2022, 3:06 PM
edgar_ramirez_mondragon
04/27/2022, 3:16 PM
owners doesn't really support filtering, so in the singer-io tap it's done after the fact: https://github.com/singer-io/tap-hubspot/blob/master/tap_hubspot/__init__.py#L862-L863
Stéphane Burwash
04/27/2022, 3:19 PM
Stéphane Burwash
04/27/2022, 3:23 PM
edgar_ramirez_mondragon
04/27/2022, 3:27 PM
> So I should manually be managing state?
At least reading state, yes. That is actually expected. For "faking" incremental replication on streams that don't really support filtering in the upstream system, we have an issue: https://gitlab.com/meltano/sdk/-/issues/227
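A stdlib-only sketch of what "faking" incremental replication amounts to when the upstream API can't filter: fetch everything, drop records whose replication key is older than the starting bookmark, and advance the bookmark to the newest value kept. This illustrates the general technique only; it is not the SDK's implementation or the design proposed in the linked issue, and `filter_incremental` / `parse_ts` are hypothetical helpers. Timestamps are parsed rather than compared as strings, so mixed ISO 8601 precisions still order correctly.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_incremental(
    records: Iterable[Dict], key: str, bookmark: Optional[str]
) -> Tuple[List[Dict], Optional[str]]:
    """Keep records at or after the bookmark; return them and the new bookmark."""
    start = parse_ts(bookmark) if bookmark else None
    kept: List[Dict] = []
    newest, newest_raw = start, bookmark
    for record in records:
        ts = parse_ts(record[key])
        if start is not None and ts < start:
            continue  # older than the bookmark: already synced, drop it
        kept.append(record)
        if newest is None or ts > newest:
            newest, newest_raw = ts, record[key]
    return kept, newest_raw


owners = [
    {"id": 1, "updatedAt": "2020-03-10T06:42:02.879Z"},
    {"id": 2, "updatedAt": "2022-04-27T15:00:00.000Z"},
]
kept, new_bookmark = filter_incremental(owners, "updatedAt", "2022-04-27T14:13:53.871Z")
print([r["id"] for r in kept], new_bookmark)
```

In an SDK tap, advancing the bookmark is largely handled by the framework from the records you emit; the part you write yourself is the drop-old-records check, as in the post_process code later in this thread.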
edgar_ramirez_mondragon
04/27/2022, 3:30 PM
> But ok awesome! From here I should be able to adapt my code. Do you have a code example of someone manually managing state?
Yup. Something like https://github.com/MeltanoLabs/tap-stackexchange/blob/main/tap_stackexchange/client.py#L117-L118, except you'll most likely have to use it in post_process to filter out unwanted records.
Stéphane Burwash
04/27/2022, 3:31 PM
Stéphane Burwash
04/27/2022, 4:24 PM
from typing import Optional

def post_process(self, row: dict, context: Optional[dict]) -> Optional[dict]:
    """As needed, append or transform raw data to match expected structure.
    Returns row, or None if row is to be excluded"""
    if self.replication_key:
        # The API doesn't filter server-side, so drop rows older than the bookmark here.
        # Assumes both values are ISO 8601 strings in the same format.
        start = self.get_starting_replication_key_value(context)
        if start is not None and row[self.replication_key] < start:
            return None
    return row
After pain and suffering, I think we got it! So just to recap, with the sdk, we need to manage most interactions / settings (ex: start date, replication, etc.)
Stéphane Burwash
04/27/2022, 4:24 PM
edgar_ramirez_mondragon
04/27/2022, 5:44 PM
> So just to recap, with the sdk, we need to manage most interactions / settings (ex: start date, replication, etc.)
Yeah. Unfortunately we (team and community) haven't come up with the right abstractions that might work declaratively for any sort of filtering the source might do, so it's left for the dev to implement 🙂