# getting-started
j
Heya! How do you guys manage meltano elt jobs where the source lives in AWS account A and the destination in B, with no way for the two to talk to each other? In other words, from wherever it runs, the job can't reach both the RDS running in A and the Redshift running in B.
AWS Account (A) [Some DB] <----- meltano elt -----> [DWH] AWS account (B)
If the container runs in account A, there's no problem accessing [Some DB], but it can't reach the DWH, and vice versa. My next thought is to have two meltano elt runs, one on each side [A, B]:
# Account A
[Some DB] <----- meltano elt #1 -----> S3

# Account B
S3 <----- meltano elt #2 -----> [DWH]
But is that the way? It sounds like a lot of engineering for a rather simple problem.. 🤷 edit: The reasoning behind no comms between the AWS accounts is not something I have much say in.
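Roughly what I have in mind, plugin-wise, using tap-postgres / target-s3-csv for the first leg and tap-s3-csv / target-redshift for the second (those plugin choices are just placeholders at this point):

# meltano.yml (sketch only; plugin choices are assumptions)
plugins:
  extractors:
    - name: tap-postgres      # reads from the RDS instance in Account A
    - name: tap-s3-csv        # reads the staged files back out of the S3 bucket
  loaders:
    - name: target-s3-csv     # writes extracted records into the S3 bucket
    - name: target-redshift   # loads the staged files into the DWH in Account B

# Run #1, invoked from the Account A side:
#   meltano elt tap-postgres target-s3-csv
# Run #2, invoked from the Account B side:
#   meltano elt tap-s3-csv target-redshift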
j
you can go over the public internet or create a vpc peering
j
you can go over the public internet or create a vpc peering
Indeed! Unfortunately neither of these options is feasible, due to decisions I cannot influence. We are very locked down 😞
j
wouldn't any tool have this problem?
j
Indeed, @jacob_matson. Unless someone has some crazy idea for how to run this differently, I'm leaning towards splitting the project in two and having S3 in between 🤷 I suppose this sort of problem is not that common, as you'd usually have some network flow allowed, e.g. either A to B or B to A via VPC peering (as suggested) with appropriate security groups.
j
I think what you are thinking about makes sense given the constraints.
you could manage it in one project though, with two jobs (two different sets of creds for S3)
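something along these lines, maybe (schedule names made up, treat it as a sketch):

# meltano.yml (sketch; schedule names are hypothetical)
schedules:
  - name: rds-to-s3           # triggered by the runner deployed in Account A
    extractor: tap-postgres
    loader: target-s3-csv
    transform: skip
    interval: "@hourly"
  - name: s3-to-dwh           # triggered by the runner deployed in Account B
    extractor: tap-s3-csv
    loader: target-redshift
    transform: skip
    interval: "@hourly"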
j
Nah, it's the network access that's the blocker, not the auth 😞 With a single project running on either the A or B side, it would not be able to reach the source / destination on the other.
j
Oh right
a
@janis_puris - Can you add detail on the specific source and destination services?
Do I read correctly that the source is RDS in Region A, owned by Account A, and the destination is Redshift in Region B, owned by Account B? And is it a correct inference that the AWS administrators have locked down network ingress on both accounts' VPCs? Generally Redshift connectivity is allowed from outside the VPC - because analytics happens from business users and applications. Can you confirm? And can I assume you have access to a distinct user account in both accounts?
Assuming the above is basically correct, I think you've got a couple options. The first might be something like this:
1. Assuming RDS ingress is not allowed from outside Account A, your Meltano runner will need to be invoked from Account A.
2. The tap that connects to RDS (tap-postgres, for instance) runs using Postgres creds.
3. The target-redshift connection is configured with the AWS creds of User B. (Assuming Redshift access is not user/pass but AWS cred-based.)
4. You will likely need to write to a bucket in Region B. (This part I'm less sure about, but I don't think target-redshift today is able to load from S3 files in a different region from the cluster.)
5. You'll need to grant S3 write access on the bucket to User B, which is the user running the upload.
The above assumes that Redshift access is allowed from outside the VPC, but RDS access is not. It would have to be adjusted if those assumptions are not right. Hope it helps!
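For steps 3-5, the loader config could look roughly like this - setting names here follow the pipelinewise target-redshift variant, so double-check them against whichever variant you're actually using, and all values are placeholders:

# meltano.yml fragment (sketch; keys per the pipelinewise target-redshift variant)
plugins:
  loaders:
    - name: target-redshift
      config:
        host: my-cluster.abc123.eu-west-1.redshift.amazonaws.com   # placeholder endpoint
        dbname: analytics
        user: loader_user
        s3_bucket: dwh-staging-bucket     # bucket in the same region as the cluster
        s3_key_prefix: meltano/
      # password, aws_access_key_id and aws_secret_access_key are best kept out of
      # the file and supplied via env vars, e.g. TARGET_REDSHIFT_PASSWORD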
j
Thank you @aaronsteers!
Generally Redshift connectivity is allowed from outside the VPC - because analytics happens from business users and applications. Can you confirm?
You bring up a very valid point here! Indeed this is something that would typically be allowed, however, as it stands, we intend to run our BI stack self-managed within Account B (which is where the DWH lives), so we would not expose the DWH to the public network. However (!), after a sit-down with Ops, we've managed to reach an agreement where we'll have VPC peering between [A, B], and meltano elt will run close to the source (Account A). The compromise is that the network flow will be tightly locked down and connections will need to be initiated from Account A (which is where the meltano container runs). I'd love to go into more detail on why we have such fencing in place, but unfortunately I can not.. regardless, there is a very good reason behind it. Anyhow, below is a rough illustration of how this would look. Previously, the EC2 instance running the meltano containers would have run in Account B.
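Something like this, simplified:

# Account A                                        # Account B
[Some DB] <-- meltano elt (EC2 in A) -- VPC peering --> [DWH]
# all connections initiated from the Account A side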
a
Nice! So it sounds like you have a path forward? Let us know how it goes! 🤓