Is it a good idea to share state and data in the s...
# troubleshooting
d
Is it a good idea to keep state and data in the same DB server? Or is it better to keep them apart (for example, state in Azure Blob Storage, S3, etc.)?
s
Would love to hear other opinions on this. I could argue both ways:
1. As data engineers, we had our state inside our data warehouse, and we had some fun with it: because it logs when a run starts and ends, you can use it to set up triggers, check for freshness, put a dashboard on top, etc.
2. From a best-practices perspective, I would say keep them separate, because you want to do very different things with them (separate what has a different purpose). If you're rolling back your database, do you really also want to roll back your state?
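To make the freshness idea in point 1 concrete, here is a minimal sketch. It assumes the run log has already been read into a list of dicts; the field names (`job`, `started_at`, `ended_at`) and the function `is_fresh` are hypothetical and not the schema of any particular tool.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(runs, job_name, max_age, now=None):
    """Return True if job_name finished a run within max_age of now.

    `runs` is a list of dicts with hypothetical keys "job", "started_at"
    and "ended_at"; "ended_at" is None while a run is still in progress.
    """
    now = now or datetime.now(timezone.utc)
    # Only completed runs of the requested job count towards freshness.
    finished = [
        r["ended_at"]
        for r in runs
        if r["job"] == job_name and r["ended_at"] is not None
    ]
    if not finished:
        return False
    return now - max(finished) <= max_age
```

The same predicate could back a trigger or a dashboard tile; in a warehouse it would more likely be a SQL query over the run table than Python.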
d
I understand it like this: when you roll back your data, you roll back your state with it, and you are able to sync data from the last point.
s
If that's how you want things done, then having state + db as one is totally fine.
d
Thank you Sven
j
The rollback use case is definitely interesting. I decided to go with AWS S3 for state, because in the cloud it is complicated to run PostgreSQL/MSSQL for free. I found only the bit.io service, but it does not perform well... Locally, both work and perform very well (S3 with MinIO).
p
I like having both in the same DB, so that I can run tests that check that state is consistent with what's actually loaded. (They can become inconsistent if someone has manually deleted some data, which is necessary on occasion, but didn't make a corresponding modification to state.)
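A minimal sketch of such a consistency test, using an in-memory SQLite database so it runs anywhere. The table names (`orders`, `load_state`), the columns, and the rule "the newest loaded row must match the stored bookmark" are all assumptions for illustration; a real state table will look different.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway database with one data table and one state table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, updated_at TEXT);
        CREATE TABLE load_state (stream TEXT PRIMARY KEY, bookmark TEXT);
        INSERT INTO orders VALUES (1, '2024-01-01'), (2, '2024-01-02');
        INSERT INTO load_state VALUES ('orders', '2024-01-02');
        """
    )
    return conn


def state_is_consistent(conn, table, stream):
    """Return True if the newest row in `table` matches the stored bookmark."""
    row = conn.execute(
        "SELECT bookmark FROM load_state WHERE stream = ?", (stream,)
    ).fetchone()
    # Table names cannot be bound as parameters; `table` is assumed trusted.
    newest = conn.execute(f"SELECT MAX(updated_at) FROM {table}").fetchone()[0]
    if row is None:
        # No state recorded: consistent only if nothing has been loaded.
        return newest is None
    return newest == row[0]
```

This only catches deletions that remove the newest rows. Deleting older rows leaves the maximum unchanged, so a stricter test would also have to compare row counts or checksums.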
j
I'm thinking about implementing a specialized tap that loads Meltano state from AWS S3 into any target. The tests could be based on that. You would not have to rely on the DB state backends Meltano supports, and you could run tests even when your target is Snowflake, BigQuery, ...
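A rough sketch of the core of such a tap: turning one state document into Singer `SCHEMA` and `RECORD` messages that any target can load. Fetching the object from S3 is left out. The state layout (`completed` -> `singer_state` -> `bookmarks`) is an assumption about what is stored, and the output stream name `meltano_state` and the function name are made up for illustration.

```python
import json


def state_to_singer_messages(state_json, state_id, stream="meltano_state"):
    """Convert one state document (a JSON string) into Singer messages.

    Emits one SCHEMA message followed by one RECORD per bookmarked stream.
    """
    state = json.loads(state_json)
    # Assumed layout; missing keys simply yield no records.
    singer_state = state.get("completed", {}).get("singer_state", {})
    bookmarks = singer_state.get("bookmarks", {})

    messages = [
        {
            "type": "SCHEMA",
            "stream": stream,
            "schema": {
                "type": "object",
                "properties": {
                    "state_id": {"type": "string"},
                    "source_stream": {"type": "string"},
                    "bookmark": {"type": "string"},
                },
            },
            "key_properties": ["state_id", "source_stream"],
        }
    ]
    for source_stream, bookmark in sorted(bookmarks.items()):
        messages.append(
            {
                "type": "RECORD",
                "stream": stream,
                "record": {
                    "state_id": state_id,
                    "source_stream": source_stream,
                    # Bookmarks vary per tap, so keep them as a JSON string.
                    "bookmark": json.dumps(bookmark, sort_keys=True),
                },
            }
        )
    return messages
```

A tap would print each message as one JSON line on stdout. Once the records land in the target, the consistency tests can be written as queries there, whatever the target is.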
p
Yes that’s probably a cleaner solution.