# plugins-general
j
I’ve got two very similar runs happening in parallel processes on the same machine (using dagster). The only difference between the two runs are the environment variables, otherwise I’m running the exact same tap, just targeting two different instances of mysql. I’m running tap-mysql with target-snowflake. When I run these two pipelines separately they run fine, but when they run in parallel (starting at the exact same time) I get this error:
Cannot start plugin tap-mysql--foo: Catalog discovery failed: invalid catalog: Extra data: line 6544 column 11 (char 157140)
v
ahh yes, I have hit this. The easiest fix is to run everything in a Docker container and you'll be good to go. The issue (I believe) is that meltano accesses the catalog file at the same time as another job is modifying it, so the file can sometimes end up in a state where it's not valid JSON. Not exactly sure why, but it happens to me all the time as well. Docker makes the problem go away because the two runs are no longer both accessing the same catalog file.
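For illustration only (this is not Meltano's actual file-handling code), here's a minimal sketch of how that kind of unsynchronized overwrite can produce exactly the "Extra data" error from the traceback: if a shorter catalog document lands over a longer one without truncation or locking, the reader sees one valid JSON value followed by leftover bytes.

```python
import json

# Two hypothetical catalog documents written by two parallel runs.
catalog_a = json.dumps({"streams": [{"tap_stream_id": "foo"}]})
catalog_b = json.dumps({"streams": []})

# Simulate writer B's shorter document landing over the start of
# writer A's without truncating the file: A's tail bytes survive,
# so the file is no longer a single JSON value.
corrupted = catalog_b + catalog_a[len(catalog_b):]

try:
    json.loads(corrupted)
except json.JSONDecodeError as err:
    # Same failure mode as the Meltano error message:
    # "invalid catalog: Extra data: line N column M"
    print(err.msg)  # prints "Extra data"
```

This is why the error only shows up when the runs start at the same time: the file is valid JSON before and after each complete write, and only transiently corrupt in between.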
j
hmmm. I suspected as much, the problem is that Dagster is already running this process in a container, so I’m not entirely sure it’s possible for meltano to kick off a separate container from the one already in play 🤷
a
Definitely sounds like a bug. Does this also happen if you run discovery first before kicking off both pipelines?
From reading the above, it sounds like both are writing out JSON text to the same file. If you run something like `meltano select tap-mysql --list` before invoking either, then I think (🤞) you'd start both of the other processes with the catalog already cached.
j
good question. Not sure I can test that in the current configuration because I'm using the `dagster_meltano` plugin, which only supports running `meltano run` commands. I'm even struggling to get the `--log-level` set to `debug`.
Guess I have to wait for the `dagster_meltano` plugin to update its code first. I'm trying to migrate to it because it has several advantages over my previous code, but now that I'm running into this, it might stop me until an update like that is possible …
my previous code didn’t have this problem oddly enough but it was calling subprocesses slightly differently
@aaronsteers I managed to force it to do a `meltano select tap-mysql --list` before running the other two, but unfortunately I got the exact same error
a
Separate containers or wiped execution workspace between runs perhaps?
That's very odd
If they were in separate containers/directories, then I wouldn't expect the file conflict in the first place 🤔
j
they are definitely not running in separate containers
I’m not sure how to confirm or deny the wiped execution workspace though 🤔
As an interesting follow up to this. I have a second dagster pipeline that is nearly identical to this one in that it runs 2 parallel meltano pipelines starting at the exact same time, the only difference is the tap itself. I get no errors when running this other tap-dynamodb based pipeline, but I do with the tap-mysql pipeline …
v
What I think I "know" about this issue: it happens "randomly", it's caused by something overwriting the catalog file, and it shows up when jobs are run at the "same" time. It's hard to debug because it doesn't happen all of the time. When I use different containers for each run I don't have this issue. One bandaid fix would be to have meltano create catalog files named `guid-properties.json` instead of just `properties.json`
An easy fix for you (maybe?), Josh, if you can, would be to use a different tap name for each of the runs? Depends on how many of the same instance you need though! For me it is 100's so I didn't go that route
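The different-tap-name-per-run workaround can be expressed with Meltano's plugin inheritance, so each run gets its own plugin name (and therefore its own cached catalog). This is a hedged sketch of a `meltano.yml` fragment; the `tap-mysql-a`/`tap-mysql-b` names are made up for illustration, and the variant/pip_url shown are just one common choice:

```yaml
plugins:
  extractors:
    - name: tap-mysql
      variant: transferwise
      pip_url: pipelinewise-tap-mysql
    # Inherited copies of the base extractor: each has its own name,
    # so each parallel run should cache and read its own catalog file.
    # Point each at a different MySQL instance via env vars or config.
    - name: tap-mysql-a
      inherit_from: tap-mysql
    - name: tap-mysql-b
      inherit_from: tap-mysql
```

You'd then invoke `meltano run tap-mysql-a target-snowflake` and `meltano run tap-mysql-b target-snowflake` in the two parallel Dagster jobs. As noted above, this doesn't scale well if you need hundreds of instances.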