steve_ivy (09/13/2021, 3:52 PM):

aaronsteers (09/14/2021, 3:02 PM):
1. …`updated_on` timestamp columns. (Some tables might not have them.) You can avoid future headaches by starting with a plan that includes at least some tables synced via log-based replication.
2. Streamed vs Batch. Do daily incremental volumes fit in your daily processing window using a streaming-records-from-select approach, or do you need the (as of now, upcoming) batch functionality to reach your daily SLAs?
3. Initial Backfill. Same questions as in (2), but for the initial sync: that much data flowing record-by-record will take a while to backfill. It's certainly not an unreasonable size of DB, but it'll take some patience on the first run.
4. Target Loader Speed. Can you say what your target database is? Is it Redshift? There are a few different forks of Redshift targets, and some may be more performant than others. If you do experience slowness, keep in mind that the loader might be the bottleneck. (It might not become an issue at all, but I'm bringing it up here for completeness.)
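On the initial-backfill point above, a quick back-of-envelope estimate can tell you how much patience the first run will need. This is purely an illustrative sketch with made-up numbers (row counts and throughput are hypothetical, not from the thread):

```python
def backfill_hours(total_rows: int, rows_per_second: float) -> float:
    """Rough estimate of initial-sync duration for record-by-record extraction."""
    return total_rows / rows_per_second / 3600

# Hypothetical: 500M rows streamed via SELECT at ~5,000 rows/sec.
print(round(backfill_hours(500_000_000, 5_000), 1))  # ~27.8 hours
```

If the estimate blows past your processing window, that's a signal to look at log-based replication or a file-based bulk export path instead of a plain select.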
aaronsteers (09/14/2021, 3:20 PM):
> I'm not familiar with "streamed-from-select" as a term so not sure how to answer you there

I just made that up, sorry. I just mean the difference between sending records downstream via the tried-and-true `select ... from ...` versus its more modern file-based alternatives, `unload` / `copy from ... to ...`. If a normal select isn't fast enough, the bulk export path is something we're looking at for sources which can accept it.
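To make that distinction concrete, here is a hedged sketch of the two extract styles as SQL strings (table, column, and S3 path names are invented for illustration; the `UNLOAD` form follows Redshift's syntax as one example of the file-based path):

```python
def select_extract(table: str, replication_key: str, bookmark: str) -> str:
    """Record-by-record extraction: rows stream back over the DB connection."""
    return (f"SELECT * FROM {table} "
            f"WHERE {replication_key} > '{bookmark}' "
            f"ORDER BY {replication_key}")

def unload_extract(table: str, s3_path: str, iam_role: str) -> str:
    """File-based extraction (Redshift-style): the warehouse writes files to S3,
    and the loader ingests those files instead of individually streamed records."""
    return (f"UNLOAD ('SELECT * FROM {table}') "
            f"TO '{s3_path}' IAM_ROLE '{iam_role}' FORMAT PARQUET")

print(select_extract("orders", "updated_on", "2021-09-01"))
```

The select path is simple and universal; the unload/copy path trades that simplicity for much higher throughput on sources and targets that support it.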
steve_ivy (09/14/2021, 3:33 PM):
`copy into` for Snowflake; that code has been in flux recently