# singer-target-development
**visch:**
Cookie cutters for targets should be more "test driven" by design, to push folks toward implementing and producing tests that pass in ways that make sense. The current suite of tests doesn't do this well (or maybe it does and I misunderstand), because it's "hidden" in the SDK rather than being part of the code generated by the cookiecutter itself. (I'm not saying the suite of tests isn't nice to use; I'm just trying to describe a feeling I've had while writing targets that made me shy away from using the suite to write tests.)

A use case that made this clear to me today: what if your source data has a schema with a type of `["object", "array", "string", "number", "boolean"]`? What should the target do? The answer really is "it depends", and it depends on the target. Some targets should fail because they can't pick something suitable; some should cast the data to a `jsonb` object. To get target developers to think about this kind of thing properly, I think a single test that fails by default (raising a `NotImplementedError` or something in the cookiecutter) would drive folks to think about it the way we'd expect; the target dev can then decide to skip it for now, or pick something that makes sense for them.

The way to handle this so that we can test all targets (instead of testing in the SDK) is a suite of tests that is "just" `jsonl` data (i.e. `.singer` files) that any target can run against, covering all the "types" of data we'd expect targets to work with. The suite of tests in the SDK may make sense for this as well; I'm not sure.
**Pat:**
@visch thanks for sharing your thoughts! I spent a lot of time in the last couple of weeks implementing tests for target-snowflake using the test framework, and I have some similar thoughts/feelings. Part of the initial challenge, I think, is just that the tests are relatively new, so there's almost no documentation or examples; I had a hard time figuring out how exactly I should use them. Some thoughts:

1. We should document the recommended way of using them 😄
2. The standard target tests are great, but I agree that they shouldn't pass until the developer implements them. Ken and I have talked about this a bit recently, but it's a little misleading to see them pass when there aren't any concrete assertions other than the fact that exceptions aren't thrown. No exceptions thrown is a valid assertion, but I did find a few bugs in target-snowflake once I started asserting row and column counts/types/column names/etc. for specific cases (e.g. target-snowflake was not handling duplicate records within a single batch).
3. The test data and standard tests in the SDK are nice because they provide a wide variety of test data with edge cases that I probably wouldn't be able to come up with myself. I was somewhat hesitant to use them without making a direct copy, though, because any change to the test data in the SDK will likely break my tests, since I have strict assertions on top of that data, i.e. asserting column types and names for every column.
4. Again, the test data is great, but in order to implement assertions I needed to dive into the SDK and test files to see what the input was before I could write my expectations for the output. Users who are less familiar with parsing singer jsonl files will struggle with this. I'm not sure exactly how to improve that; maybe default assertions that users can customize? I'm not sure.
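As a toy illustration of the kind of concrete assertion I mean (row counts, column names, duplicate handling within a batch), here is a self-contained sketch. The singer input and the `load_stream` loader are both made up for the example; a real test would run the actual target and query the destination.

```python
import json

# Hypothetical singer .jsonl input: one SCHEMA message and three RECORDs,
# two of which share the same key (id=1).
SINGER_LINES = """\
{"type": "SCHEMA", "stream": "users", "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}}, "key_properties": ["id"]}
{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "a"}}
{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "b"}}
{"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "c"}}
"""


def load_stream(lines: str):
    """Toy stand-in for a target: collects columns from the SCHEMA message
    and deduplicates RECORDs on key_properties (last write wins), which is
    what a relational target with primary keys should do within a batch."""
    columns, key_props, rows = [], [], {}
    for line in lines.strip().splitlines():
        msg = json.loads(line)
        if msg["type"] == "SCHEMA":
            columns = list(msg["schema"]["properties"])
            key_props = msg["key_properties"]
        elif msg["type"] == "RECORD":
            key = tuple(msg["record"][k] for k in key_props)
            rows[key] = msg["record"]
    return columns, list(rows.values())


columns, rows = load_stream(SINGER_LINES)
assert columns == ["id", "name"]
assert len(rows) == 2          # the duplicate id=1 record was collapsed
assert rows[0]["name"] == "b"  # last write wins for id=1
```

A "no exceptions thrown" test would pass even if the duplicate row were loaded twice; the explicit row-count assertion is what catches that class of bug.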
**Reply:**
Absolutely agree. I copied the test suite from target-postgres early on, and it has served me really well: easy to extend with new cases, and it really lets me do TDD if I want to. But it is too simple. It has fooled me a couple of times; once, when I was manually checking that the target tables ended up with rows, someone pointed out that all the values were NULL... The new tests in Snowflake are great, though; I will be stealing them.

I have actually been toying with the idea of including great-expectations as a dev dependency so that I can assert things like "column contains nulls", "column contains the value 'A'", etc. But given how far Pat has come, maybe a small utility library is the better way to go?

Veering slightly off topic, it just occurred to me that we can use tests as a way to automatically describe functionality: does a target handle the strange `["object", "array", "string", "number", "boolean"]` example, or does it throw an error? Does a target cast JSON to string? I'm sure we could create a number of tests that would be informative for users.
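A minimal sketch of that "tests as capability report" idea, with an entirely made-up target type mapping (`fake_target_column_type` pretends the target stringifies JSON but rejects multi-type fields); the shape of the report, not the mapping itself, is the point.

```python
import json

# Hypothetical "tricky" probe schemas we would run against a target.
PROBES = {
    "multi_type_field": {"type": ["object", "array", "string", "number", "boolean"]},
    "json_as_string": {"type": "object"},
}


def fake_target_column_type(json_type):
    """Stand-in for a real target's type mapping: this imaginary target
    casts JSON objects to string but rejects multi-type fields."""
    if isinstance(json_type, list):
        raise ValueError("multi-type fields not supported")
    if json_type == "object":
        return "string"  # casts json to string
    return json_type


def capability_report(probes: dict) -> dict:
    """Run each probe and record whether the target accepted it, turning
    the test suite into a user-facing description of functionality."""
    report = {}
    for name, schema in probes.items():
        try:
            fake_target_column_type(schema["type"])
            report[name] = "supported"
        except Exception:
            report[name] = "unsupported"
    return report


print(json.dumps(capability_report(PROBES), indent=2))
```

The same probe files could be plain `.singer` data shared across all targets, so every target's docs could publish the resulting support matrix.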