I'm curious how everyone is doing CI/CD / testing?...
# best-practices
w
I'm curious how everyone is doing CI/CD / testing? Specifically, I would like to run pipelines in QA with only samples of data from my sources (mssql/postgres) to ensure things are working. I'm currently using mappers to filter data, and I also know I could modify state to sync only a portion of the data, but it seems like there could be a better way. Any thoughts or suggestions?
👀 3
p
Check out our internal meltano project https://github.com/meltano/squared/blob/main/.github/workflows/test.yml. The approach that’s worked pretty well for us is to define start_dates that look for only 1 day or in some cases a few hours of data. This allows us to basically run all EL and dbt transformations in a reasonable amount of time. It’s not perfect because it’s a subset of data but it’s been pretty good so far. There’s also an SDK issue related to configuring taps to only sync x records, which would do a better job than our loose start date approach.
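To illustrate the rolling start date idea, here is a minimal Python sketch that computes a start date a fixed window before "now" and exposes it as an environment variable for a CI run. The helper names (`rolling_start_date`, `ci_env`) are hypothetical, and the `<PLUGIN>_START_DATE` variable name assumes the tap actually has a `start_date` setting; this is not taken from the linked workflow.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def rolling_start_date(now: datetime, lookback: timedelta) -> str:
    """Return an ISO 8601 UTC timestamp `lookback` before `now`.

    `now` is expected to be timezone-aware.
    """
    start = now.astimezone(timezone.utc) - lookback
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")


def ci_env(tap_name: str, now: datetime, lookback: timedelta) -> Dict[str, str]:
    """Build an env var override for a tap's start date (hypothetical helper).

    Assumption: the tap has a `start_date` setting that can be overridden
    through an upper-cased, underscore-separated environment variable.
    """
    env_key = tap_name.upper().replace("-", "_") + "_START_DATE"
    return {env_key: rolling_start_date(now, lookback)}


if __name__ == "__main__":
    # One day of data relative to the current time.
    print(ci_env("tap-postgres", datetime.now(timezone.utc), timedelta(days=1)))
```

A CI job could merge the returned mapping into the environment before invoking the pipeline, so the window moves forward on every run without editing the project config.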
w
ya - makes sense... my problem is I often have tables that have data that haven't changed in a while combined with frequently changing tables
and I'm trying to avoid modifying lots of taps; some of my taps don't have a start date parameter available. Would there be interest in a PR for something like this as part of the core meltano code rather than handling it per tap or in the SDK? My team has kind of a hacky version working that we could clean up into a PR.
p
oh yeah that makes it a bit trickier then. I opened an issue in the SDK a while ago. Contributions are always welcomed! cc @Edgar Ramírez (Arch.dev)
👀 1
e
I'll second what Pat said, and that I'm happy to review a PR for this. There's a stale PR that started to address some form of this, but I'm happy to break it up into smaller efforts.
b
@Edgar Ramírez (Arch.dev) I think I can see a way to limit the amount of data written to the target system via the `Target` and `Sink`, but I don't see a way to tell the `Tap` to stop sending data from a `Stream` once you hit that limit.
e
Ah you're right. The request is about limiting tap output, and the PR is about handling record batches.
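As a rough illustration of limiting on the tap side rather than the target side, the sketch below caps a lazy record generator so the source stops being read once the limit is reached. It is plain Python with hypothetical names (`limit_records`, `fake_stream`), not the SDK's API and not the approach taken in any PR mentioned here.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


def limit_records(records: Iterable[Dict], max_records: Optional[int]) -> Iterator[Dict]:
    """Yield at most `max_records` items; `None` means no limit.

    Because `islice` is lazy, the underlying source is not asked for
    any record beyond the limit.
    """
    if max_records is None:
        yield from records
    else:
        yield from islice(records, max_records)


def fake_stream(name: str, total: int, produced: List[int]) -> Iterator[Dict]:
    """Hypothetical stand-in for a stream's record generator.

    Appends to `produced` each time a record is actually generated,
    so the caller can see how much work the source did.
    """
    for i in range(total):
        produced.append(i)
        yield {"type": "RECORD", "stream": name, "record": {"id": i}}


if __name__ == "__main__":
    produced: List[int] = []
    sample = list(limit_records(fake_stream("users", 1000, produced), 3))
    print(len(sample), len(produced))
```

The difference from a target-side cap is that the source only generates the records that are kept, so a large table is not fully read just to discard most of it downstream.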
w
Cool - I’m working with a colleague Dave and we will make an attempt at a PR!
❤️ 1
fyi - still working on this ^
👍 1
@Pat Nadolny (Arch) @Edgar Ramírez (Arch.dev) https://github.com/meltano/meltano/pull/8364
👀 1
❤️ 1