<@U06BXFDNKJT> related to your Athena issues you m...
# troubleshooting
p
@lukas_gust related to your Athena issues you mentioned in #C01QM86B83A 🧵
Here's some info I sent to someone recently that might help you on your Athena journey. We don't have a ton of Athena users, so the plugins around it could use a little love, but I was using them successfully for our internal data team for a decent amount of time. I used both tap-athena and target-athena for a while before we switched over to Snowflake. Here's our meltano.yml right before we migrated to the Snowflake connectors:
• target-athena config - I was pinning to a specific branch, but that was merged into main, so it should work without the pin. A quirk that I think still exists is that the CSV object format has a bug with column sorting, so using JSONL is suggested.
• tap-athena config - I think when I started using this tap there was a work-in-progress branch migrating to the SDK that I got working in this branch. I don't remember having any issues after that.
Additionally, these are now listed on MeltanoHub, so there's no need to include the long list of capabilities/settings like I had them. One other thing to note is that there's also target-s3-parquet, which should work with Athena as well, but I haven't tried it myself (see my issue related to Athena).
l
Thank you! I think currently I'm just wrestling with local Python versions. I'm just being lazy on that front TBH; we use Docker to actually deploy, so I've just gotta get it going in a container. The Zendesk extractor is more problematic in that it isn't super actively maintained; see my PR here: https://github.com/twilio-labs/twilio-tap-zendesk/pull/19.
p
Related to Zendesk, I found another variant that looks to be a hard fork, and I think it has the fix you're looking for. I created a PR to add it to the hub. Once that merges, you can run
meltano remove extractor tap-zendesk
then
meltano add extractor tap-zendesk --variant hotgluexyz
to get the variant I mentioned
@Matt Menzenski what variant are you using for Zendesk, and do you have any insights for us? I found a hotglue variant that looks like it's been maintained more recently, but it looks like they disabled incremental syncs for some reason. The way I'm interpreting it, singer-io was hard forked by twilio-labs (at the
1.0.11
release), which then maintained it from there, and hotglue also hard forked singer-io (at the
1.7.5
release) and has its own fixes and improvements, most recently 2 weeks ago.
There's also https://github.com/Pathlight/tap-zendesk, which is a fork of singer-io and is 116 commits ahead 🤔
m
We are using twilio-labs variant
twilio-tap-zendesk==1.0.13
p
And you've had good luck with it so far, right? Any weird quirks to be aware of? There's also a desire to have an SDK-based version in meltanolabs, but I'm not sure when that will get prioritized.
m
Mostly it’s been fine for us. The big issue for us has been that it doesn’t support all of the APIs we’d like it to - the Brands API specifically would be really valuable.
l
Thank you for the info
m
we are not
l
I'm currently working around it by installing my forked version. I'd really prefer not to have this solution, but it's not bad. @pat_nadolny back to Athena-related troubleshooting. The target-s3-parquet loader doesn't seem to support AWS profiles and requires the AWS keys, which we do not use. We use AWS SSO locally, and our k8s pods are configured to just use an assumed role. If need be, we can get temp creds using the assumed role. That being said, what is the auth mechanism used in the target-athena loader?
Here is a really poor example of how I have some custom scripts running in k8s and locally with profiles for context.
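import os
import boto3
from botocore.exceptions import ProfileNotFound
try:
    session = boto3.Session(profile_name=os.environ.get('PROFILE_NAME'))
except ProfileNotFound:
    # No such profile (e.g. in k8s): fall back to the default credential chain.
    session = boto3.Session(region_name=os.environ.get('AWS_REGION_NAME'))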
u
l
Yeah that's what I was looking at as well. target-s3-parquet would work only if the access keys were not a required config
u
I personally never used target-s3-parquet but did use target-athena. There are a few quirks with it, but it works.
u
Also, for what it's worth, I wrote this boto connector base class that I'm trying to get into the SDK. It should standardize how all of these AWS connectors do auth: https://github.com/MeltanoLabs/tap-dynamodb/blob/main/tap_dynamodb/connectors/aws_boto_connector.py
l
Yeah that would be a good addition for aws connectors
Oh, and it will work for my local run where I have a profile, but it won't run in k8s where it uses an assumed role provider. I haven't been able to test that to be absolutely sure, but from reading the code it looks likely. https://meltano.slack.com/archives/C01TCRBBJD7/p1682629653977839?thread_ts=1682543632.246059&cid=C01TCRBBJD7
p
Yeah, that sounds right to me. A couple thoughts: 1. Since target-athena is in meltanolabs, we have a lot of control to get PRs reviewed and merged quickly, and I think this is a very minor update. 2. The SDK now has a SQL sink base class (using SQLAlchemy), which might make the Athena target a lot simpler. For example, using the SDK a sqlite target is 58 lines of code, and I think target-snowflake is a more complex/optimized example.
l
#1: for sure! I actually have some ideas on how to get around it in the short term. That leads to #2 which would require a bit more learning on my end if I were to contribute.
p
Yeah for sure - a short and a long term solution
@lukas_gust FYI I cleaned up tap-athena a bit and implemented the SDK's SQL connector pattern, got the tests passing, etc. It should be much more stable now!
l
Nice!! I really appreciate the work!
it was working as expected when I was testing it a couple weeks ago.
n
@pat_nadolny - I’ve circled back to working with tap-athena. There are two issues that would be helpful to address, and I was wondering what you recommend as the best approach to proceed.
• I don’t see any functionality for incremental key-based replication (perhaps I’m missing something that’s already baked in by the SDK).
• Our use case requires switching AWS roles where security tokens are session-based. I’d like to be able to include the
aws_security_token
in the connection string as [implemented by PyAthena](https://github.com/laughingman7743/PyAthena#credentials). I’m working on a fork of
MeltanoLabs/tap-athena
but am new to tap development. I’m going to see how far I can get in addressing the two items above to submit a PR, but was wondering if you had any insights that might be relevant for addressing or tracking the issues above.
l
@niall_keleher curious about your role switching use case? Are you doing this during runtime?
n
Not exactly. We’re not switching during run time but want to pass the appropriate role credentials to tap-athena. What I’m trying to do is assume a role such that I can access data with specific credentials. I’m able to set
aws_access_key_id
,
aws_secret_access_key
,
aws_region
, and
s3_staging_dir
using env variables that are specific to the role I want to use, but when I run
meltano run tap-athena target-{x}
, I get a token error:
The security token included in the request is invalid.
l
Makes sense. You should be able to set the
aws_session_token
too as an env var?
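For context on why the session token matters here: temporary credentials from an assumed role come as a set of three values, and using the key pair without the token is a common cause of the "security token included in the request is invalid" error. Below is a minimal sketch, not something from tap-athena itself. The helper name `assume_role_env` and the default session name are hypothetical; `sts_client` is assumed to behave like `boto3.client("sts")`. Whether tap-athena actually reads `AWS_SESSION_TOKEN` depends on the tap's settings, which this thread does not confirm.

```python
def assume_role_env(sts_client, role_arn, session_name="meltano-athena"):
    """Assume a role and return its temporary credentials as env-var pairs.

    `sts_client` is expected to behave like boto3.client("sts").
    The function name and default session name are illustrative only.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        # Temporary credentials are only valid together with this token.
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
```

One way to use it would be `os.environ.update(assume_role_env(boto3.client("sts"), role_arn))` in a wrapper script before invoking Meltano, so all three values travel together.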
u
@niall_keleher check out https://github.com/MeltanoLabs/tap-dynamodb. I added support for basically all AWS auth methods and have an open PR to add it to the SDK, but it's been stalled for a while: https://github.com/meltano/sdk/pull/1655. You can probably borrow a lot of that code. Related to incremental replication, it should work out of the box with the SQL stream; see the SDK default implementation: https://github.com/meltano/sdk/blob/aece0a3851398b457650911e82cdcdda6f8949a6/singer_sdk/streams/sql.py#L190
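To illustrate what key-based incremental replication means in general, here is a rough sketch using the standard-library `sqlite3` module instead of Athena. This is not the SDK's code (the SDK builds its query with SQLAlchemy); it only shows the usual idea of ordering by the replication key, filtering from the last bookmark, and saving the highest value seen. The function name `fetch_incremental` and the table and column names are made up for the example.

```python
import sqlite3


def fetch_incremental(conn, table, replication_key, bookmark=None):
    """Return (rows, new_bookmark) for rows at or after the bookmark."""
    # Identifiers are interpolated directly: fine for a sketch,
    # not for untrusted input.
    query = f'SELECT * FROM "{table}"'
    params = ()
    if bookmark is not None:
        # ">=" re-reads the bookmarked row, so rows sharing the same
        # key value are never skipped.
        query += f' WHERE "{replication_key}" >= ?'
        params = (bookmark,)
    query += f' ORDER BY "{replication_key}"'
    cursor = conn.execute(query, params)
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor]
    # Rows are sorted ascending, so the last one holds the highest key.
    new_bookmark = rows[-1][replication_key] if rows else bookmark
    return rows, new_bookmark
```

The first run passes no bookmark and reads everything; later runs pass the saved bookmark and only pick up rows at or after it. Because the comparison is inclusive, the bookmarked row is emitted again, so a downstream loader would need to tolerate that duplicate.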
n
Lukas - you're right about setting the env var. I suspect Pat's handling of auth methods will help to pass the variables for the tap-athena client. Pat, I'll take a look at tap-dynamodb. Thanks for the pointer!