# troubleshooting
m
The Meltano documentation on the Singer internals makes a guarantee of “at-least once” delivery:
The Singer Spec promises that each record in the source system will be processed successfully in the target at least once. This promises that no record will ever go missing or be omitted, but it does not guarantee that all records will be received exactly once.
Is there a similarly stated guarantee that records in a Meltano stream are processed in order? Here’s an example to make this question more concrete:
• I have a Meltano tap that produces records in order
• I have a Meltano target that processes records in the order in which it receives them
• I am using either no inline stream maps, or an inline stream map with a 1:1 record transformation (no dropping records, no merging/splitting streams, etc.)
Can I guarantee that my target receives records in the same order in which my tap produces them?
v
That's an interesting one. By the Singer spec itself, I don't think so, but in practice, yes.
As in, for Meltano's implementation the answer is yes.
With a plain tap-name | target-name pipe, yes.
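To make the pipe argument concrete, here is a minimal sketch, assuming a tap that writes newline-delimited Singer-style RECORD messages to stdout and a reader that consumes them line by line. The `TAP_SCRIPT` stand-in and the `read_ids_through_pipe` helper are hypothetical names made up for illustration; they are not part of Meltano or any Singer SDK. An OS pipe is a first-in, first-out byte stream, so the reader sees lines in the order they were written.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for a tap: emits Singer-style RECORD messages in order.
TAP_SCRIPT = r"""
import json, sys
for i in range(1000):
    msg = {"type": "RECORD", "stream": "events", "record": {"id": i}}
    sys.stdout.write(json.dumps(msg) + "\n")
"""


def read_ids_through_pipe() -> list:
    """Run the fake tap and read its stdout the way a target would read stdin."""
    proc = subprocess.Popen(
        [sys.executable, "-c", TAP_SCRIPT],
        stdout=subprocess.PIPE,
        text=True,
    )
    ids = []
    assert proc.stdout is not None
    for line in proc.stdout:  # lines arrive in the order they were written
        message = json.loads(line)
        if message["type"] == "RECORD":
            ids.append(message["record"]["id"])
    proc.wait()
    return ids


if __name__ == "__main__":
    ids = read_ids_through_pipe()
    print("in order:", ids == sorted(ids))
```

This only illustrates the transport between two processes. It says nothing about what a particular tap emits or what a particular target does after reading a line.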
m
Yeah I’m more interested in the practical Meltano implementation answer
and I am generally confident that the answer here is “yes”
v
Curious what the Meltano folks say if I'm off here, but I can't imagine their architecture supporting this. I'm not privy to all of those talks about what might/could be, though.
I'd give my answer 95% confidence
Actually let me think more
Targets sometimes implement parallelism. So yes, the target receives the messages in order, but it may process them out of order.
But that's not your question
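The receive-versus-process distinction can be sketched like this, assuming a hypothetical target that hands each record to a thread pool. `process_in_parallel` and the `delays` argument are invented for illustration and do not reflect how any specific target is written. Records are received in order, but the order in which workers finish depends on how long each one takes.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(records, delays):
    """Return (ids in receipt order, ids in completion order)."""
    received = []
    completed = []
    lock = threading.Lock()

    def load(record, delay):
        time.sleep(delay)  # simulate uneven work per record
        with lock:
            completed.append(record["id"])

    with ThreadPoolExecutor(max_workers=4) as pool:
        for record, delay in zip(records, delays):
            received.append(record["id"])  # receipt order is the input order
            pool.submit(load, record, delay)
    # Leaving the with-block waits for all submitted work to finish.
    return received, completed


if __name__ == "__main__":
    records = [{"id": 0}, {"id": 1}, {"id": 2}]
    received, completed = process_in_parallel(records, [0.3, 0.2, 0.1])
    print("received:", received)
    print("completed:", completed)  # may differ from the receipt order
```

If downstream logic depends on order, an explicit ordering column is safer than relying on completion order.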
c
I think it really depends on the tap and target implementation. Meltano and Singer don't make any guarantees about that, since the only guarantee is about at-least-once delivery.
a
Also curious to know what sort of processes you are thinking about that depend on maintaining record order? Not doubting that they exist, just always keen to understand wider issues.
m
We land data from Meltano into a single table in Snowflake and then have a stored procedure that routes that data into specific tables. We are implementing a row number in those downstream tables and want to confirm we can rely on the incoming order or if we need to implement an explicit order numbering mechanism in the Meltano tap.
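An explicit order numbering mechanism like the one mentioned above could look roughly like this. This is a minimal sketch in plain Python, not the actual tap code: `with_sequence`, `first_out_of_order`, and the `_source_seq` column name are all hypothetical. It also assumes a simple per-run counter; a real tap would more likely take the ordering value from the source or persist it between runs.

```python
from typing import Iterable, Iterator, Optional


def with_sequence(records: Iterable[dict], start: int = 1) -> Iterator[dict]:
    """Tap side: attach an increasing sequence number to each record."""
    for seq, record in enumerate(records, start=start):
        yield {**record, "_source_seq": seq}


def first_out_of_order(rows: Iterable[dict]) -> Optional[dict]:
    """Downstream side: return the first row whose sequence number does not
    increase relative to the previous row, or None if all rows are in order."""
    previous = None
    for row in rows:
        seq = row["_source_seq"]
        if previous is not None and seq <= previous:
            return row
        previous = seq
    return None


if __name__ == "__main__":
    rows = list(with_sequence([{"id": "a"}, {"id": "b"}, {"id": "c"}]))
    print(first_out_of_order(rows))  # no violation in the emitted order
    shuffled = [rows[0], rows[2], rows[1]]
    print(first_out_of_order(shuffled))  # reports the row that arrived late
```

With a column like this in the landing table, the downstream row number can be derived from the source order, and the same column can be used to check whether arrival order ever diverged from it.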
v
With that context, I'd say go look at the target-snowflake implementation and verify for yourself that the batching works the way you need it to. I think it should, but I'd just check.
m
Following up here to say that I’ve implemented an update to our tap that captures explicit ordering information from the source. We’ve loaded more than 85 million records through this tap since that change was made, and the source ordering has matched the Meltano processing order 100% of the time, with no occurrences of a mismatch. So I’m now even more confident that the answer to my original question here is “yes”.