# singer-tap-development
Hey folks, one of our EL pipelines comes from an application database, so it's always being updated. It doesn't happen often since we run our pipelines during off-hours, but sometimes this can cause race conditions where two related tables get sync'd in different states depending on the order. For example, maybe we copy `payment_methods` first, and then later `transactions`. `transactions` may have a row with a `payment_method_id` that was inserted after we copied the `payment_methods` table. This isn't much of a problem from a data analysis perspective, but it breaks dbt source data tests. Would wrapping the whole tap operation in a database transaction solve this? Would it create performance problems to do a lot of reads in a transaction? Is there a different solution people use for this? We could just remove our dbt foreign key tests, but I'd prefer data that's a little stale but consistent.