Hi All - Is there a way to stop Meltano calling <h...
# troubleshooting
t
Hi All - Is there a way to stop Meltano calling https://hub.meltano.com when you do meltano run <tap> <target> ? I'm trying to run meltano in an air gapped environment and it's not working. I'm getting:
HTTPSConnectionPool(host='hub.meltano.com', port=443): Max retries exceeded with url: /meltano/api/v1/plugins/extractors/index (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fdc95816820>: Failed to establish a new connection: [Errno 110] Connection timed out'))
You can get Meltano to stop calling out to discovery.meltano.com by setting the MELTANO_DISCOVERY_URL env variable to false, but you can't do the same for the MELTANO_HUB_URL env variable, which is what the above error relates to. The docs say "This manifest is primarily used by meltano discover and meltano add. It is also used in cases where the full plugin definition is needed but no lock artifact or cached discovery.yml is found." So perhaps our Meltano container doesn't have a local copy of the discovery.yml. I am actively exploring this now but would appreciate any other advice people have.
p
Hey @tom_saunders, if you run the lock command it will keep a local copy of what it's getting from the hub API in your project repo, which you can check into version control. I believe if those files are present then it won't make a request to the hub API. This is a good practice for production too, so that you have a stable copy of those plugin definitions and they only change when you want them to.
t
Thanks Pat let me give that a go!
p
Also I should have asked, what version are you on?
t
2.8
p
ok yeah then what I said should be true
t
It seems to try and communicate with meltano hub when you run meltano lock so that command isn't working for me at the moment.
w
You'd have to run it outside of the air-gapped environment, then commit the lockfiles to the project repository.
t
ok understood. How easy would it be to create our own lockfiles in the air gapped env? I'm a little against the clock at the moment so just interested if there is a quick / hacky solution.
w
While strictly speaking that should be possible, that seems impractical. I suppose the project files (particularly meltano.yml) cannot leave the air-gapped environment?
p
You could run the lock commands locally, then copy the resulting lock files into your air-gapped deployment for a quick fix.
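A minimal sketch of that quick fix, assuming the lock files are written under ./plugins/ in the project root. The helper name lock_and_bundle and the archive name are made up for illustration; only meltano lock --all comes from this thread.

```shell
# Run on a machine WITH internet access, from the Meltano project root.
# lock_and_bundle is a hypothetical helper name; the archive name is arbitrary.
lock_and_bundle() {
  archive="${1:-plugin-locks.tar.gz}"
  # Fetch plugin definitions from the Hub and write lock files (assumed: ./plugins/)
  meltano lock --all || return 1
  # Bundle the lock files so they can be copied into the air-gapped project
  tar -czf "$archive" plugins
}

# Usage (connected machine):        lock_and_bundle plugin-locks.tar.gz
# Then, in the air-gapped project:  tar -xzf plugin-locks.tar.gz
```

Committing the plugins/ directory to the project repository achieves the same thing without a manual copy step.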
t
I'll work something out. Once we get the lock command working and have got the changes in the air gapped env I'll let you guys know if it worked. Thanks for your responsiveness
That didn't work.
same error
p
Are you sure they're all there in the expected folder structure? See https://github.com/meltano/squared/tree/main/data/plugins/extractors as an example of mine. I just tested a 2.8.0 version locally with no internet and I was able to run mine. Altering the lock file name then caused the same connection error you're getting, so it is working as I expect, i.e. if the lock file exists, the request is skipped.
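One way to check the folder structure from the project root is sketched below. It assumes lock files are named plugins/<plugin type>/<name>--<variant>.lock, as in the linked example repository; has_lockfile is a hypothetical helper, not a Meltano command.

```shell
# has_lockfile is a hypothetical helper; run it from the Meltano project root.
# Assumed layout: plugins/<plugin type>/<name>--<variant>.lock
has_lockfile() {
  plugin_type="$1"   # e.g. extractors or loaders
  name="$2"          # e.g. tap-oracle
  variant="$3"       # e.g. s7clarke10
  [ -f "plugins/${plugin_type}/${name}--${variant}.lock" ]
}

# Usage:
#   has_lockfile extractors tap-oracle s7clarke10 && echo "lock file found"
```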
t
It looks like this:
This was achieved by running meltano lock --all in a non-air-gapped env.
p
Hmm yeah, that looks correct to me 🤔. Which one are you running when you get the error message?
t
oracle and postgres
p
Would you be able to share your meltano.yml (with sensitive stuff removed)? I wonder if there's anything unexpected that's confusing Meltano. @edgar_ramirez_mondragon any ideas?
t
I might be able to share it after taking table and columns out but I'll share this snippet to illustrate our tap plugin install method:
plugins:
  extractors:
  - name: tap-oracle
    pip_url: -e ./xxxx-tap-oracle
    config:
      host:
      port:
      user:
      password:
      sid:
      filter_schemas:
      select_filter:
We pulled the repo for tap-oracle and made a change to allow the tap to pull blobs... This could be a cause of the issue, I guess, as I read that the lock command works on non-custom plugins 🤷‍♂️
p
That still looks fine to me, it would still consider that a non-custom plugin even though you altered the pip url
In the local project where you ran the lock command, have you tried running anything? Maybe with internet disabled to replicate the issue. I'm not sure what could be causing this
t
At the moment we don't actually have a development environment with internet. We didn't think we had internet while we were developing and running everything, and then when we deployed to prod we saw the above error. That means our dev/test servers did actually have internet while we were developing and testing, but this has now been fixed and they no longer have it. The packages and dependencies get added / installed during a terraform build, and once built our dev/test servers don't have internet. To get lock --all to run I asked a platform engineer to temporarily allow the traffic before he switched it back. The pipeline did not run even after the lock files were added and internet was switched back off...
a
@tom_saunders - It may be worth opening a bug with attached log when --log-level=debug is specified. While Meltano shouldn't need to reach out to the Hub after the files are locked, there may be some code that is retrieving the definition anyway "in case" it were needed. That said, if you've already locked the files, and Meltano is still reaching out to the Hub, that sounds like a bug to me. Do you mind opening something in our GitHub issue tracker? https://github.com/meltano/meltano/issues
w
CC @edgar_ramirez_mondragon
e
Yeah, I started a PR that I might need to retake: https://github.com/meltano/meltano/pull/7097
t
It looks like it could be that bug for sure. How did @pat_nadolny get Meltano working without an internet connection though? Or does the bug only surface in certain scenarios?
a
@edgar_ramirez_mondragon - That scope seems broader than what we'd be fixing here, since I think for this context, we'd just want to make sure that locked plugins are not requiring extra lookups. I dropped this just now into the issue also as a comment, but I just want to be careful to still allow the Hub to serve plugin definitions when the project does not already have lockfiles created.
Above, I've logged a new issue for this specifically. I've also linked to #7095 and #6754 as related.
@tom_saunders - If you don't mind posting an excerpt of your log files in the above issue, it could help us in debugging. Ideally, use --log-level=debug to help us pinpoint the part of the code where the reach-out is occurring.
t
Sure thing 👍
a
Great, thanks! And definitely in Meltano v3.0, we will require all plugins to be locked before upgrading - so this will be much less of an edge case. (3.0 is still a ways off, but just wanted to put that out there.)
t
Can I just check something. Is it possible to run Meltano without internet access with the current version(s)?
Just trying to work out whether my issue is just a result of the setup / config or whether it is a fundamental issue at the moment.
a
Can I just check something. Is it possible to run Meltano without internet access with the current version(s)?
If everything is working as expected, and all plugins have lock files, and if telemetry is turned off, then yes. In theory, at least. This future feature would make that formally a tested and official capability.
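For reference, the telemetry half of that answer can be set from the shell. A minimal sketch, assuming the usual mapping of the send_anonymous_usage_stats setting to the MELTANO_SEND_ANONYMOUS_USAGE_STATS environment variable:

```shell
# Disable anonymous usage stats (telemetry) for this shell session and its children.
# Assumption: Meltano reads the send_anonymous_usage_stats setting from this variable.
export MELTANO_SEND_ANONYMOUS_USAGE_STATS=false
```

The lock-file half still has to be covered separately, by having a lock file in the project for every plugin.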
t
It's getting a bit late in the uk so I'm logging off for today but will add some logs to the bug tomorrow.
c
Does setting
export MELTANO_DISCOVERY_URL=false
still apply as a possible workaround for this problem?
t
No, that was the first thing I tried. But you say "still apply": does that mean that if we were to go back to a certain version of Meltano it would work?
What is the best way to provide you with the logs? I have tried
meltano --log-level=debug run tap-oracle target-postgres &> logfile.txt
But this file is horrible to read.
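A rough way to narrow such a file down before attaching it to an issue is to keep only the lines that mention the Hub host, plus a little context. This is a sketch; hub_requests is a hypothetical helper name, and the three lines of context are an arbitrary choice.

```shell
# hub_requests is a hypothetical helper: print (with line numbers) every line of a
# captured log that mentions the Hub host, plus 3 lines of context before each match.
hub_requests() {
  grep -n -i -B 3 'hub\.meltano\.com' "$1"
}

# Usage:
#   hub_requests logfile.txt > hub-excerpt.txt
```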
a
Hi @tom_saunders, getting something live is never easy, is it! Dropping a couple of ideas here in case you need help / inspiration. (We're also in the UK, be great to meet sometime!)
• I guess you have the telemetry switch off already, but thought it would be worth dropping that here in case you bump into that: https://docs.meltano.com/reference/settings#send_anonymous_usage_stats
• Looks like the Meltano guys have spotted a bug in the hub functionality, and are looking at that, but I had a couple of workaround ideas:
1. I wonder if setting the hub URL to an empty string would disable the request? https://docs.meltano.com/reference/settings#hub_url
2. I wonder if you could run up a local instance of the hub and point the hub URL to that. Either running the hub itself: https://github.com/meltano/hub or our Matatika Community Edition, which is a locally hosted private hub: https://github.com/Matatika/matatika-ce (turn off a couple of services and it would be pretty light on resources)
t
Thanks @aaron_phethean - We did try setting the hub URL to empty / false and it didn't work: it tries to validate the hub URL regardless. We had thought about creating our own API endpoint to mimic the behavior because we didn't realise the hub codebase was available. That is interesting, but probably not viable for us given our timescales and environment constraints.
a
Anything is worth a shout when you're in a go live bind!
t
I just want to let everyone know that I THINK I've found the issue. I shared a section of my meltano.yml earlier and it looked like this:
plugins:
  extractors:
  - name: tap-oracle
    pip_url: -e ./xxxx-tap-oracle
    config:
      host:
      port:
      user:
      password:
      sid:
      filter_schemas:
      select_filter:
Well I ran a clean install on a fresh box and installed the extractor the normal way and this pipeline ran without internet so that was a relief. I noticed that when installing with
meltano add extractor tap-oracle
the meltano.yml looked a bit different, so I added the variant parameter like
plugins:
  extractors:
  - name: tap-oracle
    variant: s7clarke10
    pip_url: -e ./xxxx-tap-oracle
    config:
      host:
      port:
      user:
      password:
      sid:
      filter_schemas:
      select_filter:
and it worked. I'll hold off on pouring myself a whiskey until it has run in our pre-prod env, but I am in a much better mood than yesterday. 😂
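A quick way to look for other entries with the same gap is sketched below. It is only a text heuristic, not a YAML parser: plugins_missing_variant is a hypothetical helper that treats every "- name:" list item as a plugin entry, so named jobs, schedules or environments in the same file would be reported as well.

```shell
# plugins_missing_variant is a hypothetical helper, not a Meltano command.
# It prints the name of every "- name:" entry that has no "variant:" key
# before the next "- name:" entry (or the end of the file).
plugins_missing_variant() {
  awk '
    function report() { if (name != "" && !has_variant) print name }
    /^[ \t]*-[ \t]*name[ \t]*:/ {
      report()                       # close out the previous entry
      name = $0
      sub(/^[ \t]*-[ \t]*name[ \t]*:[ \t]*/, "", name)
      has_variant = 0
      next
    }
    /^[ \t]*variant[ \t]*:/ { has_variant = 1 }
    END { report() }
  ' "$1"
}

# Usage:
#   plugins_missing_variant meltano.yml
```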
a
Wow - that's great to hear. Sounds like the behavior of reaching out to the hub is specifically when variant is missing or not provided... 🤔 That makes sense in a way because variant is part of how the lock file is found... but without a variant we would never find a match on the Hub, so the lookup to the Hub is still probably not worth running.
t
Yeah - I agree. I think this means that the bug you raised isn't really a bug 🤷‍♂️.
a
Replaced by a different bug logged 🙂 https://github.com/meltano/meltano/issues/7305