biel_llobera
10/16/2023, 3:28 PM
Is there a way to use data_interval_start and data_interval_end?
We usually use a functional data engineering approach where we like our data to be written to the data lake in "partitions". E.g., today we get all data from a database between 2023-10-15 00:00 (data_interval_start) and 2023-10-16 00:00 (data_interval_end). In addition, these intervals should be used as filters for key-based replication. This metadata enables us to track, partition, delete, and redo loads. This is implicit in Airflow's philosophy and described here: https://maximebeauchemin.medium.com/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a. Is there a way to accomplish this?
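For context, here is a minimal sketch of the pattern being described: one data interval drives both the partition path and the key-based replication filter, so a load can be deleted and redone idempotently. The bucket, table, and column names are hypothetical; in an Airflow task these datetimes would come from the data_interval_start / data_interval_end template variables.

```python
from datetime import datetime, timezone


def extract_partition(data_interval_start, data_interval_end, table="events"):
    """Derive the partition path and key-based filter for one data interval.

    The interval alone determines where data lands and which rows are
    selected, so rerunning the same interval overwrites the same partition.
    Bucket/table/column names are illustrative, not a real API.
    """
    partition = data_interval_start.strftime("%Y-%m-%d")
    path = f"s3://datalake/{table}/ds={partition}/"
    query = (
        f"SELECT * FROM {table} "
        f"WHERE updated_at >= '{data_interval_start.isoformat()}' "
        f"AND updated_at < '{data_interval_end.isoformat()}'"
    )
    return path, query


# The 2023-10-15 -> 2023-10-16 interval from the example above.
start = datetime(2023, 10, 15, tzinfo=timezone.utc)
end = datetime(2023, 10, 16, tzinfo=timezone.utc)
path, query = extract_partition(start, end)
print(path)
print(query)
```

Because the half-open interval [start, end) comes entirely from the scheduler, redoing a load is just clearing the partition at path and re-running the same query.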