julian_knight
09/16/2022, 8:17 PMpayment_methods
first, and then later transactions
. transactions
may have a row with a payment_method_id
which was inserted after copying the payment_methods
table. This isn’t much of a problem from a data analysis perspective but it breaks DBT source data tests.
Would wrapping the whole tap operation in a database transaction solve this? Would it create performance problems to do a lot of reads in a transaction? Is there a different solution people use for this? We could just remove our DBT foreign key tests but I’d prefer data to be a little stale but consistentvisch
09/17/2022, 3:45 AMvisch
09/17/2022, 3:46 AMaaron_phethean
09/17/2022, 7:11 AMjulian_knight
09/17/2022, 2:47 PMvisch
09/17/2022, 4:16 PMaaronsteers
09/20/2022, 3:13 PMpayment_methods
after the rest of the tables. Similarly, you could also just sync those few trouble-tables again after the main load completes. In theory, you could use a strategy like this to avoid orphaned fact records that otherwise would fail referential integrity checks.julian_knight
09/20/2022, 4:01 PMWould you consider a sync strategy where the ‘fact-like’ tables always sync first and ‘dim-like’ tables sync afterwards?I don’t love this solution. The application is always evolving and adding new relationships, so complicating the pipeline configuration to match sounds like large a maintenance burden. It also seems a bit of a design smell; the transform layer is where dealing with quirks like this should go; the EL layer should be largely agnostic to the source data, instead replicating it as closely as possible and moving on. Additionally, the source data isn’t oriented around star schema, so deciding which relations should go where is a hard problem, and likely not limited to just 2 copy cycles, but potentially several. If I can’t use transactions, I’d instead lean towards one of these solutions: • using transforms to filter out data that is newer than when the EL started • take a snapshot of the database before running EL • ignore it
aaronsteers
09/21/2022, 5:09 PM