# troubleshooting
m
Hey, I have a job hosted on a GCP VM running within a Docker container. It's scheduled with Python's APScheduler (I could probably use something more mainstream, but this has usually worked fine). When running on small data (configured with meltano.yml) it runs without issue; however, when running the EL job on the full dataset, the job appears to continue "running" when looking at
htop
but looks like it's hanging indefinitely. The data does not get populated in BigQuery, and when running
sudo docker-compose up --build > logs/docker_logs_$(date +%s).log
the logs stop populating after a few hours. Any idea what could be causing this?
v
By "running", could you share the command you're using? Generally in Meltano you "run" a tap/target combo, so there would actually be 3 processes spawned: 1. meltano, 2. tap-whatever, 3. target-whatever. The target waits on data from stdin from the tap (the meltano process sits in between, passing data between the two). Long way of saying it's probably something up with the tap process. I'd try running
meltano invoke tap-whatever
and see if you get data or if it times out
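A minimal sketch of that probe, wrapped so a stalled tap fails loudly instead of hanging forever (the `tap-whatever` name and the 10-minute limit are placeholders, not Meltano defaults):

```python
import shutil
import subprocess

def probe(cmd, timeout_s=600):
    """Run cmd and return (stdout, timed_out)."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        return result.stdout, False
    except subprocess.TimeoutExpired:
        return "", True

# Hypothetical usage -- guard on meltano being installed:
if shutil.which("meltano"):
    out, hung = probe(["meltano", "invoke", "tap-whatever"])
    print("tap hung" if hung else out[:500])
```

If `hung` comes back `True`, the tap itself is the thing stalling, not the target or the scheduler.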
m
I’m running
meltano el tap-yfinance target-bigquery --state-id meltano-tap-yfinance-dev
In
host.py
I have:
```python
@app.route('/financial-elt/yfinance/tap-yfinance-dev', methods=['GET'])
def tap_yfinance_dev():
    with app.app_context():
        run_command = f'meltano --environment=dev el tap-yfinance target-bigquery --state-id tap_yfinance_dev'
        shell_command = f'cd {os.path.join(app.root_path)}; {run_command};'
        subprocess.run(shell_command, shell=True)
        return make_response(f'Last ran project tap-yfinance-dev at {cur_timestamp()}.', 200)

scheduler.add_job(tap_yfinance_dev, trigger='cron', **cron, jitter=120)

if __name__ == "__main__":
    serve(app, host=HOST, port=PORT, threads=2)  # waitress wsgi production server
```
/home/melgazar9/meltano_projects/tap-yfinance
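One way to surface the hang in a route like the one above (a sketch, not the thread's settled fix): replace the `cd ...; cmd` shell string with an argument list plus `cwd=`, and give `subprocess.run` a timeout so a stuck pipeline raises instead of silently blocking one of the two waitress worker threads. The 6-hour cap is an assumption; tune it to the real job length.

```python
import subprocess

def run_el(cmd, cwd=None, timeout_s=6 * 3600):
    """Run an EL command with captured output and a hard time limit."""
    try:
        result = subprocess.run(
            cmd,
            cwd=cwd,                # replaces the 'cd {app.root_path}; ...' shell string
            capture_output=True,    # keep stdout/stderr so the hang leaves evidence
            text=True,
            timeout=timeout_s,      # assumed cap, not a Meltano default
        )
        return result.returncode, result.stdout, result.stderr
    except subprocess.TimeoutExpired:
        return None, "", f"command exceeded {timeout_s}s"

# Hypothetical call matching the route above:
# run_el(["meltano", "--environment=dev", "el", "tap-yfinance",
#         "target-bigquery", "--state-id", "tap_yfinance_dev"],
#        cwd=app.root_path)
```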
v
There's a lot of context missing (Docker, GCP, VM, etc.) and things that aren't related to Meltano.
meltano --environment=dev el tap-yfinance target-bigquery --state-id tap_yfinance_dev
is the only part that's related to Meltano here. From a quick look at your file, I'd say it looks like a webapp to me, so you'd expect the job to run indefinitely, but I'm sure my answer is incomplete because of the lack of any more context. If I were debugging from where you're at right now, I'd start with: can I get
meltano --environment=dev el tap-yfinance target-bigquery --state-id tap_yfinance_dev
running directly?
m
Hey @visch, yep, that last command is what I'm running (without the state id for debugging). I cannot see any meltano logs in GCP Cloud Logging, but I can see other Python logs when using the
google-cloud-logging
library
v
I can't spend the time to help debug outside of Meltano land on this; if you can give some specifics, maybe, but this is what I have time for 😮