Is anyone actively working on target-athena? (Meltano #singer-tap-development)


fred_reimer

11/03/2021, 7:53 PM

Is anyone actively working on target-athena? I need to make some changes and want to be sure they are acceptable for a merge.

1. The version in pyproject.toml is 0.0.1, while the latest tag is 1.4.0. I'd propose either:
   a. bumping it to 1.5.0 as a non-breaking change with added features, or
   b. maybe 2.0, as I'd like to make breaking changes that would change the default location of files, and provide options for it.
2. Add GitLab CI. I know it's on GitHub, but we use GitLab.
3. The naming for the files in S3:
   a. has a lot to be... desired.
      i. get_target_key in utils.py doesn't do anything with naming_convention.
      ii. Passing naming_convention to get_target_keys is actually commented out.
      iii. As far as I can tell, data_location, which uses a completely different (static) pattern for the S3 object (bucket plus key), is used as a prefix for the target_key.
   b. So change it so that:
      i. The actual prefix (everything before the last / in the key) is configurable; the default might be able to match the current behavior.
      ii. The filename for the object (everything after the last / in the key) is more configurable; again, the default might be able to match the current behavior.

The goal essentially would be to implement issue #15. The most important part is that any changes are integrated, as we do not want to maintain our own version.
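To make item 3.b concrete, here is a minimal Python sketch (not target-athena's actual code) of composing the S3 key from two independently configurable templates: one for the prefix before the last `/` and one for the filename after it. The function `build_target_key`, its `prefix_template` / `filename_template` parameters, and the `{stream}`, `{date}`, `{timestamp}`, `{ext}` placeholders are all hypothetical, and the defaults are illustrative rather than a reproduction of the target's current layout.

```python
from datetime import datetime, timezone
from typing import Optional


def build_target_key(
    stream: str,
    prefix_template: str = "{stream}",
    filename_template: str = "{stream}-{timestamp}.{ext}",
    ext: str = "csv",
    when: Optional[datetime] = None,
) -> str:
    """Render an S3 key as '<prefix>/<filename>' from two separate templates.

    Hypothetical helper: the prefix is everything before the last '/' in the
    key and the filename is everything after it, each configured on its own.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    fields = {
        "stream": stream,
        "ext": ext,
        "date": when.strftime("%Y-%m-%d"),
        "timestamp": when.strftime("%Y%m%dT%H%M%S"),
    }
    # Leading/trailing slashes in the rendered prefix are normalized away.
    prefix = prefix_template.format(**fields).strip("/")
    filename = filename_template.format(**fields)
    # Keep the split unambiguous: path segments belong in the prefix template.
    if "/" in filename:
        raise ValueError("filename_template must not contain '/'")
    return f"{prefix}/{filename}" if prefix else filename


# Usage (placeholder values): a date-based prefix with a fixed filename.
print(build_target_key("orders", "raw/{date}", "{stream}.{ext}", "json",
                       datetime(2021, 11, 3, tzinfo=timezone.utc)))
# → raw/2021-11-03/orders.json
```

Keeping the two templates separate means a default that mimics today's keys can ship alongside a new layout, which is what item 3.b asks for.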

taylor

11/03/2021, 7:55 PM

@pat_nadolny is actively using it and @aaronsteers wrote it, so I’d say yes. They’d be better positioned than me to give feedback.

pat_nadolny

11/03/2021, 8:44 PM

@fred_reimer yes, I'm using target-athena and just merged in a fix last week! I think the version mismatch came from the fact that it's originally a fork of target-s3-csv; syncing that up seems like a great idea. I didn't know it was possible to use GitLab CI for a GitHub project, but it looks like it is, and adding CI is definitely great. I haven't dug in too deep on the current state of the naming conventions, but what you listed makes sense to me.

fred_reimer

11/03/2021, 8:46 PM

Well, the gitlab CI won't work in github, but when I push a branch I'm working on to our internal gitlab (or the public gitlab.com) it would 😉 I'll be submitting a PR in a bit.

pat_nadolny

11/03/2021, 8:48 PM

@fred_reimer Would you be interested in becoming a maintainer for this repo based on our new MeltanoLabs ownership model? cc @taylor

pat_nadolny

11/03/2021, 8:49 PM

yeah I was just checking out these docs https://docs.gitlab.com/ee/ci/ci_cd_for_external_repos/github_integration.html

fred_reimer

11/03/2021, 8:49 PM

Possibly. I'll have to see how much time I have.

fred_reimer

11/03/2021, 9:01 PM

All the CI does currently is build the package and upload it to the package registry of the GitLab server the CI runs on. Not sure if that would be useful. However, if the gitlab.com server allows use of the package registry, the GitLab project could be created under a Meltano group, and users could point an extra index URL (pip's --extra-index-url) at that location to pick up published, Meltano-curated or -maintained packages for taps and targets. That's roughly what we are doing, but with our internal GitLab server for internal proprietary taps. That said, it would be interesting to expand the CI to include the security checking and other CI tools available on GitLab...
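For the "point an extra index URL at the registry" idea, a small Python sketch of the URLs involved. The URL shapes follow GitLab's PyPI package registry documentation as we understand it (project-level `/api/v4/projects/<id>/packages/pypi/simple`, group-level `/api/v4/groups/<id>/-/packages/pypi/simple`); the helper names and the group ID are hypothetical, and private projects would additionally need credentials such as a deploy token in the URL.

```python
from typing import List, Optional


def gitlab_pypi_simple_url(host: str, project_id: Optional[int] = None,
                           group_id: Optional[int] = None) -> str:
    """Return a GitLab PyPI 'simple' index URL for one project or one group."""
    if (project_id is None) == (group_id is None):
        raise ValueError("pass exactly one of project_id or group_id")
    base = f"https://{host}/api/v4"
    if project_id is not None:
        return f"{base}/projects/{project_id}/packages/pypi/simple"
    # Group-level index: serves packages published by projects in the group.
    return f"{base}/groups/{group_id}/-/packages/pypi/simple"


def pip_install_args(package: str, index_url: str) -> List[str]:
    """Build a pip command that consults PyPI plus the extra GitLab index."""
    return ["pip", "install", "--extra-index-url", index_url, package]


# Usage with a placeholder group ID (12345 is not a real Meltano group).
print(" ".join(pip_install_args(
    "target-athena", gitlab_pypi_simple_url("gitlab.com", group_id=12345))))
# → pip install --extra-index-url https://gitlab.com/api/v4/groups/12345/-/packages/pypi/simple target-athena
```

One design caveat: with `--extra-index-url`, pip considers both indexes and picks the best matching version, so a same-named package on public PyPI could shadow the curated one; `--index-url` with only the GitLab index avoids that.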

aaronsteers

11/03/2021, 9:31 PM

@fred_reimer - yes, we'd happily accept issues and PRs on those items. To Pat's note, we'd also welcome some level of ongoing support after any submitted MRs are merged, and we might invite you to review future MRs from other contributors as well. As also noted above, we'd want to get CI tuned up and in a good place, and maybe even add an automatic PyPI publish step to streamline ongoing release and maintenance.
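An automatic publish step could also guard against the version drift reported in the first message (0.0.1 in pyproject.toml, latest tag 1.4.0). A minimal Python sketch, assuming the project keeps its version under `[tool.poetry]` and tags releases as `1.5.0` or `v1.5.0`; `poetry_version` and `versions_match` are hypothetical helpers, not part of any existing release tooling.

```python
import re


def poetry_version(pyproject_text: str) -> str:
    """Return `version` from the [tool.poetry] table of a pyproject.toml.

    Deliberately tiny line-based parse so it runs on any Python 3; on 3.11+,
    tomllib.loads(text)["tool"]["poetry"]["version"] is the sturdier choice.
    """
    in_poetry = False
    for raw in pyproject_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Only the exact [tool.poetry] table counts, not its sub-tables.
            in_poetry = line == "[tool.poetry]"
            continue
        match = re.match(r'version\s*=\s*"([^"]+)"', line)
        if in_poetry and match:
            return match.group(1)
    raise ValueError("no version found in [tool.poetry]")


def versions_match(pyproject_text: str, git_tag: str) -> bool:
    """True when the release tag (with or without a leading 'v') matches."""
    tag_version = git_tag[1:] if git_tag.startswith("v") else git_tag
    return poetry_version(pyproject_text) == tag_version


print(versions_match('[tool.poetry]\nversion = "0.0.1"\n', "v1.4.0"))
# → False
```

In CI, a publish job could run this against the tag that triggered it (GitLab exposes it as `CI_COMMIT_TAG`, GitHub Actions as `GITHUB_REF_NAME`) and fail before anything is uploaded.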

andrew_stewart

11/03/2021, 9:40 PM

woo!!! More maintainers would be great!

andrew_stewart

11/03/2021, 9:48 PM

@fred_reimer - it’s useful to be aware of the ‘archaeology’ behind this target.. as mentioned it was originally forked from pipelinewise-target-s3-csv with the Athena metastore DDL sorta bolted onto that. Hence why some things (like the pyproject.toml) might not make complete sense. I think it was also probably one of the earlier examples of a non-sdk target being converted to an sdk target.

andrew_stewart

11/03/2021, 9:51 PM

My general 2 cents is that things like file formats used (csv vs json vs parquet vs orc) and some kind of flexible-but-not-too-flexible s3 key naming convention are topics that could use a few different perspectives to help illuminate.
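One way to read "flexible-but-not-too-flexible": accept user-supplied key templates, but only with a short, documented whitelist of bare placeholders. A Python sketch that inspects a template with the standard library's `string.Formatter().parse` before it is ever rendered; the whitelist and the `validate_key_template` name are hypothetical.

```python
from string import Formatter

# Hypothetical whitelist: the only placeholders a key template may use.
ALLOWED_FIELDS = frozenset({"stream", "date", "timestamp", "ext"})


def validate_key_template(template: str, allowed: frozenset = ALLOWED_FIELDS) -> None:
    """Raise ValueError unless every placeholder is a bare, whitelisted name."""
    # parse() yields (literal_text, field_name, format_spec, conversion) tuples
    # and itself raises ValueError on malformed braces such as "{stream".
    for _literal, field, spec, conversion in Formatter().parse(template):
        if field is None:
            continue  # trailing literal text with no placeholder
        if field == "":
            raise ValueError("positional '{}' placeholders are not allowed")
        if field not in allowed:
            raise ValueError(f"unknown placeholder {{{field}}}; allowed: {sorted(allowed)}")
        if spec or conversion:
            raise ValueError(f"format specs/conversions are not allowed in {{{field}}}")
    if template.startswith("/") or "//" in template:
        raise ValueError("key templates must not start with '/' or contain '//'")


validate_key_template("{stream}/{date}/{stream}-{timestamp}.{ext}")  # passes
```

Rejecting format specs keeps rendered keys predictable: date formatting then lives in one place (the code that fills `date` and `timestamp`) instead of in every user's config.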

fred_reimer

11/03/2021, 9:52 PM

Yea, I also noticed that it is not using the Meltano SDK and is using singer directly. I'm thinking that long-term we will have to redo everything in Meltano's SDK, integrate all of the partitioning, and call it 2.0. I'm not sure small steps can be taken for this one.
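On the partitioning point: Athena can treat Hive-style `key=value` path segments as partition columns. A Python sketch of building such a prefix from a record timestamp; the function names and the year/month/day scheme are assumptions, and the Athena table DDL would still need matching `PARTITIONED BY` columns, with partitions registered (for example via `MSCK REPAIR TABLE` or partition projection).

```python
from datetime import datetime
from typing import Dict, Mapping


def date_partitions(ts: datetime) -> Dict[str, str]:
    """Zero-padded year/month/day partition values for a record timestamp."""
    return {"year": f"{ts.year:04d}", "month": f"{ts.month:02d}", "day": f"{ts.day:02d}"}


def hive_partition_prefix(base: str, partitions: Mapping[str, str]) -> str:
    """Join a base path and Hive-style 'key=value' segments, in mapping order."""
    segments = [base.strip("/")] if base.strip("/") else []
    for key, value in partitions.items():
        # A '/' or '=' in the key, or a '/' in the value, would corrupt the layout.
        if "/" in key or "=" in key or "/" in value:
            raise ValueError(f"invalid partition segment {key}={value}")
        segments.append(f"{key}={value}")
    return "/".join(segments)


print(hive_partition_prefix("orders", date_partitions(datetime(2021, 11, 3))))
# → orders/year=2021/month=11/day=03
```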

andrew_stewart

11/03/2021, 9:53 PM

You’ve read my mind 🙂

andrew_stewart

11/05/2021, 8:36 PM

Would anyone want to participate in starting to jot down some ‘v2 requirements’ somewhere?
