# best-practices
d
Maybe a dumb question, but when picking a type of compute, would Meltano benefit from a memory- or CPU-optimized host?
In my head, memory-optimized compute would make more sense.
v
Compute, in almost all scenarios. The buffer is low in Meltano and most of the taps are optimized for super low memory use.
I honestly don't think about it when deploying things until I hit an issue
d
Cool. The reason I ask is that initial syncs are taking a very long time on tables ~10M rows in size, and I'm looking for ways to optimize. I've worked with itersize and the loader's batch size, but I'm looking for additional ways to tweak.
Your reply makes me wonder whether a lower batch/itersize plus compute optimization is better than a higher batch/itersize.
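(For reference, itersize is a setting on the pipelinewise tap-postgres variant controlling the server-side cursor fetch size, so it can be tuned from the CLI. A sketch, assuming that variant:)

```sh
# Rows fetched per round trip by tap-postgres's server-side cursor
# (pipelinewise variant; defaults to 20000 there)
meltano config tap-postgres set itersize 20000
```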
t
I think the answer varies depending on the scenario. The postgres target, for example, buffers rows for all streams until there are 100k rows for a particular stream, then flushes that one stream. So if you are replicating a lot of tables with lots of rows (which we are), then target-postgres will suck up gigs and gigs of memory.
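That 100k threshold corresponds to the batch_size_rows setting in the pipelinewise target-postgres, so lowering it trades some throughput for a smaller per-stream memory footprint. A sketch, assuming that variant:

```sh
# Flush each stream after 20k rows instead of the 100k default,
# capping how much target-postgres buffers per stream
meltano config target-postgres set batch_size_rows 20000
```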
Replicating tables with a lot of rows is a different issue. I spent a bunch of time a couple of months ago attempting to tune things in our environment and eventually concluded that (a) the Python MySQL client is slow, and (b) the Python JSON parser is slow. More compute doesn't help; it's just slow. So we actually do the initial setup outside Meltano, then set the state data appropriately and use Meltano to replicate changes after that. Otherwise it would take us weeks to stand up a new environment, which is obviously not practical.
d
How did you migrate state? Copy the state file over?
t
By tweaking the state data in the meltano database using SQL 😱 But there's a new `state` command coming (or already released? not sure) that should make that a little cleaner...
v
already released 🙂
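It looks roughly like this; the state ID format and the bookmark JSON below are illustrative Singer-style assumptions, not copied from a real project:

```sh
# List the state IDs Meltano knows about, then inspect one
meltano state list
meltano state get dev:tap-postgres-to-target-postgres

# Seed state after an out-of-band initial load
# (hypothetical stream name and bookmark value)
meltano state set dev:tap-postgres-to-target-postgres \
  '{"bookmarks": {"public-users": {"replication_key": "updated_at", "replication_key_value": "2022-06-01T00:00:00Z"}}}'
```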
d
Jeez, big woof on me for not realizing, but is state 100% controlled in the DB?
I was literally up last night trying to figure out how Meltano maintains state after I rebuild my Docker images from a clean slate.
t
It's stored in the DB when the job finishes, yeah. The payload field of the job table, IIRC.
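If you want to see it for yourself, something like this works against the default SQLite system database (a sketch; the path assumes a local project, and the schema has changed across Meltano versions):

```sh
# Show the most recent successful run's state payload
# (job table / payload column per older Meltano schemas)
sqlite3 .meltano/meltano.db \
  "SELECT job_id, ended_at, payload FROM job WHERE state = 'SUCCESS' ORDER BY ended_at DESC LIMIT 1;"
```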
d
You're the best 😄
t
That field actually contains JSON, but JSON is ultimately just a string, so if you're careful you can use SQL to manipulate it 😛
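A sketch of that kind of surgery using SQLite's JSON1 functions; the singer_state wrapper, stream name, and job_id here are assumptions, so check your actual payload first (and back up the database):

```sh
# Rewrite one bookmark inside the state JSON (hypothetical names throughout)
sqlite3 .meltano/meltano.db <<'SQL'
UPDATE job
SET payload = json_set(
  payload,
  '$.singer_state.bookmarks."public-users".replication_key_value',
  '2022-06-01T00:00:00Z'
)
WHERE job_id = 'dev:tap-postgres-to-target-postgres';
SQL
```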