Is anyone actively working on target-athena? I ne...
# singer-tap-development
f
Is anyone actively working on target-athena? I need to make some changes and want to be sure they are acceptable for a merge.
1. version in pyproject.toml is 0.0.1, while latest tag is 1.4.0. I'd propose:
   a. bump it to 1.5.0 as a non-breaking change with added features
   b. or maybe 2.0 as I'd like to make breaking changes that would change the default location of files, and give options
2. Add Gitlab CI - I know it's on github, but we use gitlab
3. The naming for the files in S3
   a. has a lot to be... desired.
      i. get_target_key in utils.py doesn't do anything with naming_convention
      ii. passing naming_convention to get_target_keys is actually commented out
      iii. As far as I can tell data_location with a completely different (static) pattern for the S3 object (bucket plus key) is used as a prefix for the target_key
   b. So change it so that:
      i. The actual real prefix (everything before the last / in the key) is configurable, might be able to make the default match what is current
      ii. The filename for the object (everything after the last / in the key) is more configurable, might be able to make the default match what is current
The goal essentially would be to implement issue #15. The most important part is that any changes are integrated, as we do not want to maintain our own version.
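To make item 3b concrete, here is a rough sketch of splitting the S3 key into a configurable prefix and a configurable filename. Everything in it is an assumption for illustration: `build_object_key`, the two template settings, their defaults, and the `{stream}`, `{date}`, and `{timestamp}` placeholders are hypothetical names, not the current target-athena code or config.

```python
from datetime import datetime, timezone

# Hypothetical defaults for illustration only; they do not reproduce the
# current target-athena key layout.
DEFAULT_PREFIX_TEMPLATE = "{stream}"
DEFAULT_FILENAME_TEMPLATE = "{timestamp}-{stream}.csv"


def build_object_key(
    stream,
    prefix_template=DEFAULT_PREFIX_TEMPLATE,
    filename_template=DEFAULT_FILENAME_TEMPLATE,
    now=None,
):
    """Build an S3 object key from two independently configurable templates.

    The prefix is everything before the last "/" in the key and the
    filename is everything after it.
    """
    now = now or datetime.now(timezone.utc)
    fields = {
        "stream": stream,
        "date": now.strftime("%Y-%m-%d"),
        "timestamp": now.strftime("%Y%m%dT%H%M%S"),
    }
    # Strip stray slashes so the prefix and filename join with exactly one "/".
    prefix = prefix_template.format(**fields).strip("/")
    filename = filename_template.format(**fields)
    if "/" in filename:
        raise ValueError("filename template must not contain '/'")
    return f"{prefix}/{filename}" if prefix else filename


if __name__ == "__main__":
    fixed_time = datetime(2022, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    print(build_object_key("orders", now=fixed_time))
    print(
        build_object_key(
            "orders",
            prefix_template="raw/{stream}/dt={date}",
            filename_template="{stream}-{timestamp}.csv",
            now=fixed_time,
        )
    )
```

Keeping the two templates separate means the defaults could be chosen to match the current layout, while anyone who wants partition-style prefixes only has to override the prefix setting.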
t
@pat_nadolny is actively using it and @aaronsteers wrote it, so I’d say yes - they’re better positioned to give feedback than I am
p
@fred_reimer yes I'm using target-athena and just merged in a fix last week! I think the mismatch in version came from the fact that it's a fork of target-s3-csv originally, syncing that up seems like a great idea. I didn't know it was possible to use gitlab ci for a github project but looks like it is, definitely adding CI is great. I haven't dug in too deep on the current state of the naming conventions but what you listed makes sense to me.
f
Well, the gitlab CI won't work in github, but when I push a branch I'm working on to our internal gitlab (or the public gitlab.com) it would 😉 I'll be submitting a PR in a bit.
p
@fred_reimer Would you be interested in becoming a maintainer for this repo based on our new MeltanoLabs ownership model? cc @taylor
f
Possibly. I'll have to see how much time I have.
All the CI does currently is build the package and upload it to the package registry (defined on the Gitlab server the CI is running on). Not sure if that would be useful. However, if the gitlab.com server allows use of the package registry, the Gitlab project could be created under a Meltano group, and then users could point an extra-url to that location to pick up published, Meltano curated or maintained packages for taps and targets. That's kind of what we are doing, but using our internal gitlab server for internal proprietary taps. That said, it would be interesting to expand on the CI to include the security checking and other CI tools available on gitlab...
a
@fred_reimer - yes, we'd happily accept issues and PRs on those items. To Pat's note, we'd also welcome a level of support after any submitted MRs are merged. And for future MRs from other contributors, we might invite you to review as well. As also noted above, we'd want to get CI tuned up and in a good place, and maybe even add an auto-PyPI publish step to streamline ongoing release / maintenance steps.
a
woo!!! More maintainers would be great!
@fred_reimer - it’s useful to be aware of the ‘archaeology’ behind this target.. as mentioned it was originally forked from pipelinewise-target-s3-csv with the Athena metastore DDL sorta bolted onto that. Hence why some things (like the pyproject.toml) might not make complete sense. I think it was also probably one of the earlier examples of a non-sdk target being converted to an sdk target.
My general 2 cents is that things like file formats used (csv vs json vs parquet vs orc) and some kind of flexible-but-not-too-flexible s3 key naming convention are topics that could use a few different perspectives to help illuminate.
f
Yea, I noticed that it is not using the Meltano SDK also, and is using singer directly. I am thinking long-term we will have to redo everything in Meltano's SDK, integrate all of the partitioning, and call it a 2.0. I'm not sure small steps can be taken for this one.
a
You’ve read my mind 🙂
Would anyone want to participate in starting to jot down some ‘v2 requirements’ somewhere ?