Stéphane Burwash
10/21/2022, 8:39 PM_sdc_received_at
flag, so it is possible to establish which record is the "most recent"
Now my question is: What would be better->
1. Delete the old, deprecated records in my raw data
2. Only query the new, updated records in my staging data
Any other suggestions would be appreciatedtaylor
10/21/2022, 9:02 PMchristoph
10/22/2022, 12:05 AMStéphane Burwash
10/24/2022, 12:44 PMtaylor
10/24/2022, 1:59 PMchristoph
10/24/2022, 8:29 PMStéphane Burwash
10/25/2022, 3:38 PMwith source as (
select
*,
row_number() over(partition by id order by _sdc_extracted_at desc) as row_number
FROM {{source('hubspot', 'associations_deals_contacts')}}
),
renamed as (
select
CAST(id as STRING) as id,
CAST(toObjectId as STRING) as toObjectId,
associationTypes.value.category as association_types_category,
associationTypes.value.typeId as association_types_type_id,
associationTypes.value.label as association_types_label,
_sdc_extracted_at,
from source,
UNNEST(associationTypes) as associationTypes
where row_number=1
)
select
id,
toObjectId,
association_types_category,
association_types_type_id,
association_types_label,
_sdc_extracted_at
from renamed
taylor
10/25/2022, 3:54 PMStéphane Burwash
10/25/2022, 3:54 PMStéphane Burwash
10/25/2022, 3:54 PMtaylor
10/25/2022, 3:54 PMStéphane Burwash
10/25/2022, 3:55 PMtaylor
10/25/2022, 3:55 PMStéphane Burwash
10/25/2022, 3:57 PMStéphane Burwash
10/25/2022, 3:58 PMtaylor
10/25/2022, 3:58 PMwith source as (
SELECT *
FROM {{source('hubspot', 'associations_deals_contacts')}}
QUALIFY rank() over(partition by id order by _sdc_extracted_at desc) = 1
),
as your first CTE and drop the where clause in the second oneStéphane Burwash
10/25/2022, 3:59 PMStéphane Burwash
10/25/2022, 6:06 PMtaylor
10/25/2022, 6:51 PMtaylor
10/25/2022, 6:52 PMtaylor
10/25/2022, 6:52 PMchristoph
10/25/2022, 8:24 PMdbt_utils.deduplicate
😁alexander_butler
10/25/2022, 8:41 PMqualify
is better. This code in dbt utils comes from before they introduced the keyword I think.
Before "qualify" was introduced, you would typically put the row_number() over ... as rn
in a subquery and then the outer query would `select * except(rn) where rn = 1`; and in that case, what they did in dbt utils may be more performant. But I dont think it is in todays BQ.
Its anecdotal on my end and not properly benchmarked so try both for yourself 😄 and let us know