# singer-tap-development
p
I'm struggling a bit with getting state working properly with the SDK-based tap-google-analytics. These are dynamic streams generated from a report definition file provided by the user. The old version doesn't support state, so my first iteration was to add ga_date as the replication key when it's included in the dynamic report; otherwise state isn't kept and a warning is logged. I tried to implement that here, but when I run it in Meltano it can't find the saved state, even though I can see it in the DB.
Running the tap manually outside Meltano creates a state file like
{"bookmarks": {"daily_active_users": {"replication_key": "ga_date", "replication_key_value": "20211103"}, ...}}
As a sanity check I ran tap-gitlab to see what its state message looks like in the Meltano DB, and I got something like this:
{
  "singer_state": {
    "project_foo": "2021-11-02T00:00:00Z"
  }
}
in comparison to what my tap-google-analytics is saving:
{"singer_state": {"bookmarks": {"daily_active_users": {"replication_key": "ga_date", "replication_key_value": "20211103"}...}}}
v
How are you running Meltano? Do you have a job_id in the elt command? (I'd assume all of these things are true but thought I'd ask 😄) To check, does `meltano elt tap-google-analytics target-athena --job_id=123 --dump=state` return anything? I've only had issues with this when I forget about a job_id, fwiw.
p
Thanks for the help @visch - I'm running it with `elt`, and when I connect to my system DB I can see the job and its state in there. I just see a difference between the state format of other taps where state works and my state messages. With that `--dump=state` command I get `Could not find state file for this pipeline`, but I see it in the DB for the last successful run:
{"singer_state": {"bookmarks": {"events": {"replication_key": "ga_date", "replication_key_value": "20211103"}}}}
@aaronsteers does that ^^ state message look weird to you? For something in the system db
a
@pat_nadolny - I think `{"singer_state":` is a wrapper around what looks like a valid state: `{"bookmarks": {"events": {"replication_key": "ga_date", "replication_key_value": "20211103"}}}`. To @visch’s point, can you confirm that you are using a job ID in all invocations?
The observation that --dump=state is not returning the value, even though you do see a value in the DB, makes me think there's a job ID disconnect.
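To make the wrapper point concrete, here is a minimal Python sketch that unwraps a payload like the one in the system DB and reads a stream's bookmark. The helper names `unwrap_singer_state` and `bookmark_value` are hypothetical and not part of Meltano or the SDK; the payload string is the one quoted earlier in this thread.

```python
import json


def unwrap_singer_state(payload_text):
    """Return the Singer state, stripping the "singer_state" wrapper if present."""
    payload = json.loads(payload_text)
    # Assumption: the DB payload wraps the tap's state under "singer_state";
    # a state file written by the tap directly has no wrapper.
    return payload.get("singer_state", payload)


def bookmark_value(state, stream):
    """Look up replication_key_value for a stream, or None if it is missing."""
    return state.get("bookmarks", {}).get(stream, {}).get("replication_key_value")


# Payload as it appears in the system DB for the last successful run.
payload_text = (
    '{"singer_state": {"bookmarks": {"events": '
    '{"replication_key": "ga_date", "replication_key_value": "20211103"}}}}'
)

state = unwrap_singer_state(payload_text)
print(bookmark_value(state, "events"))
```

The same helpers accept the unwrapped form produced when running the tap outside Meltano, which makes it easy to compare the two.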
p
@aaronsteers hmm I just tried again with a new job_id:
meltano elt tap-google-analytics target-jsonl --job_id=new_job
meltano elt tap-google-analytics target-jsonl --job_id=new_job --dump=state
and the output is
Could not find state file for this pipeline
and the DB record is:
"id","job_id","state","started_at","ended_at","payload","payload_flags","run_id","trigger","last_heartbeat_at"
112,new_job,SUCCESS,"2021-11-05 16:29:07.722740","2021-11-05 16:29:13.466263","{""singer_state"": {""bookmarks"": {""events"": {""replication_key"": ""ga_date"", ""replication_key_value"": ""20211104""}}}}",1,"156cfe1940d841acb068bb4e3ceafcc5",cli,"2021-11-05 16:29:13.064889"
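For anyone trying to read that export, the row can be parsed with Python's standard `csv` and `json` modules. This is only a sketch over the exact header and row pasted above; nothing here queries the Meltano system DB itself.

```python
import csv
import io
import json

# The CSV export from the message above, header line first.
EXPORT = '''"id","job_id","state","started_at","ended_at","payload","payload_flags","run_id","trigger","last_heartbeat_at"
112,new_job,SUCCESS,"2021-11-05 16:29:07.722740","2021-11-05 16:29:13.466263","{""singer_state"": {""bookmarks"": {""events"": {""replication_key"": ""ga_date"", ""replication_key_value"": ""20211104""}}}}",1,"156cfe1940d841acb068bb4e3ceafcc5",cli,"2021-11-05 16:29:13.064889"
'''

# DictReader maps each column name to its value; the doubled quotes inside
# the payload column are CSV escaping and collapse back to plain JSON quotes.
record = next(csv.DictReader(io.StringIO(EXPORT)))
payload = json.loads(record["payload"])

print(record["job_id"], record["state"])
print(json.dumps(payload, indent=2))
```

Read this way, the record shows a successful run for `new_job` whose payload holds the wrapped state with a bookmark for the `events` stream.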
a
I'm not sure how to read the DB record output. Could be too much or too little going to STDOUT from the target. If that's the case though, even the Meltano error message seems misleading.
p
I updated my message above so the DB record is CSV with headers, if that helps 🤷. I did find something else interesting in the --dump=state debug logs:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/pnadolny/Documents/Git/GitLab/squared/data/.meltano/run/elt/new_job/a99bab1f-5bd6-4c14-b1c8-d8fa7502e3f2/state.json'
and that directory exists, but the file does not. It seems like --dump=state doesn't care about what's in the DB, only the directory? I did a full reinstall of Meltano with no luck. Those directories are just empty every time. tap-gitlab target-jsonl does create those directories and populate them with a state.json file. I can create an issue if that's helpful, but it feels like it has to be related to my tap since tap-gitlab is working.
a
tap-gitlab target-jsonl does create those directories and populate with a state.json file
Bizarre! And thanks for adding the CSV headers; that helps a lot. Just to confirm - can you run your tap (and compare with the other sample tap) by running in this manner:
tap-google-analytics | target-jsonl > stateout.json
After that, I think it's probably time to log as an issue.
p
I know right?!? I still get
{"bookmarks": {"events": {"replication_key": "ga_date", "replication_key_value": "20211104"}}}
when using the tap directly. I tried tap-stack-exchange and it had the same behavior, good state in the DB but no file in the .meltano/run/elt/ folder. I'll create an issue
v
Crazy if it's a Meltano bug 😄 It'll be fun to figure this one out! @pat_nadolny I'm pretty certain the file is only created for the run and then deleted, so you'd expect not to see data there unless a job is currently running. It could be a new bug from all the logging that was introduced; maybe there's a test missing or something 🤷. I haven't tried the new versions of Meltano, so maybe something is up there? If it's not a bug in Meltano, the only other two things I could guess from the environment would be 1. permissions issues, or 2. internal DB environment issues; maybe you're using a Postgres DB and there's some difference between that config and the SQLite config. Just guessing, and those probably won't help much, but good luck!!
p
@visch I've never looked into this, but based on my guess-and-check method it does look like those `.meltano/run/elt/` files persist, or at least they do for tap-gitlab, which I was using as my non-SDK-based test tap.
v
Hmm, got it! I could be very wrong, I haven't looked at it that closely 😄 I'll watch that issue.
a
Hi, I hope you're all doing well! I'm not sure this is the right way to help solve this issue, but I encountered the same problem with the Salesforce extractor. Here's what I'm getting after upgrading to 1.87.0 (the last time I upgraded was in March or April) and running this command:
meltano --log-level=debug elt tap-salesforce target-redshift --job_id=salesforce_to_redshift --dump=state > extract/salesforce_to_redshift.state.json
Result:
```
[109027|MainThread|root] [DEBUG] Deleted configuration at /home/ubuntu/meltano/zenchef/.meltano/run/elt/salesforce_to_redshift/652b357f-52c9-4575-9cf5-153eb0a461f8/tap.6be5bef0-070f-47e3-b59e-cc5283991b7f.config.json
[2021-11-08 111027,072] [109027|MainThread|meltano.cli.utils] [DEBUG] Could not find state file for this pipeline
Traceback (most recent call last):
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/core/plugin_invoker.py", line 293, in dump
    return self.files[file_id].read_text()
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/async_generator/_util.py", line 53, in __aexit__
    await self._agen.athrow(type, value, traceback)
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/core/plugin_invoker.py", line 272, in _invoke
    raise ExecutableNotFoundError(
meltano.core.plugin_invoker.ExecutableNotFoundError: Executable 'tap-salesforce' could not be found. Extractor 'tap-salesforce' may not have been installed yet using `meltano install extractor tap-salesforce`, or the executable name may be incorrect.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/cli/elt.py", line 191, in dump_file
    content = await invoker.dump(file_id)
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/core/plugin_invoker.py", line 298, in dump
    raise err.__cause__
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/core/plugin_invoker.py", line 270, in _invoke
    yield (popen_args, popen_options, popen_env)
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/core/plugin_invoker.py", line 293, in dump
    return self.files[file_id].read_text()
  File "/usr/lib/python3.8/pathlib.py", line 1236, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/usr/lib/python3.8/pathlib.py", line 1222, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.8/pathlib.py", line 1078, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/meltano/zenchef/.meltano/run/elt/salesforce_to_redshift/652b357f-52c9-4575-9cf5-153eb0a461f8/state.json'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/cli/__init__.py", line 44, in main
    cli(obj={"project": None})
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/cli/params.py", line 23, in decorate
    return func(*args, **kwargs)
  File "/home/ubuntu/meltano/.venv/lib/python3.8/site-packages/meltano/…
```
Downgrading to meltano==1.67.0 solved my issue.
v
So the fix was to add the state capability to the tap. https://gitlab.com/meltano/meltano/-/issues/3052 is supposed to have the long term fixes?
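The capability fix can be sanity-checked before a run. A minimal Python sketch, assuming the extractor's definition from meltano.yml has already been loaded into a dict with a `capabilities` list; the dict contents and the helper name `missing_capabilities` are hypothetical, and this is not how Meltano itself performs the check.

```python
def missing_capabilities(plugin_def, required=("state",)):
    """Return the required capabilities that the plugin definition does not declare."""
    declared = set(plugin_def.get("capabilities", []))
    return [cap for cap in required if cap not in declared]


# Hypothetical extractor definitions, before and after adding "state".
tap_before = {"name": "tap-google-analytics", "capabilities": ["catalog", "discover"]}
tap_after = {"name": "tap-google-analytics", "capabilities": ["catalog", "discover", "state"]}

for tap in (tap_before, tap_after):
    missing = missing_capabilities(tap)
    if missing:
        print(f"{tap['name']}: missing capabilities {missing}")
    else:
        print(f"{tap['name']}: state capability declared")
```

A check like this would flag the situation described in this thread, where state is written to the DB but not handed back on the next run.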