# singer-taps
a
Hello everyone! I am trying to debug an Airflow job that consumes a lot of resources and runs very slowly. I am using `tap-zendesk:twilio` to EL a `tickets` table. I have noticed that `generated_timestamp` is used for incremental mode, and that it is the same thing as `updated_at`, which is all fine and dandy. My config in `meltano.yml` looks like the one below. For testing purposes, I set a `start_date` and an `end_date`, leaving a short interval of only 5 minutes.
- name: tap-zendesk
  variant: twilio-labs
  pip_url: twilio-tap-zendesk
  config:
    default_replication_method: INCREMENTAL
    email: zendesk-api@one.com
    subdomain: mysubdomain
    start_date: '2024-01-18T12:00:00Z'
    end_date: '2024-01-18T12:05:00Z'
The way I understand it, it should only load data where `generated_timestamp` is within these boundaries, yet I see it go back as far as `2023-12-11`. I have concluded that my job is slow because it is loading data from much further back than configured. Without going too deep into the source code of this tap or of Singer, does anyone know if I am doing something wrong or misinterpreting something? Many thanks to whoever might help me crack this one! Cheers. 😄
e
Hi @Alin! Looking at the tap's source, it seems like it should be using the start date to extract incrementally: https://github.com/twilio-labs/twilio-tap-zendesk/blob/1fd317a8b9b5e64aea7f7fb7b9c989c9c05e46cf/tap_zendesk/streams.py#L244-L250 And the client implementation in the zenpy library says the same thing: https://github.com/facetoe/zenpy/blob/7bb3ff89f517ab77e80edfaed2486ad52839d699/zenpy/lib/endpoint.py#L222
a
Hey, I actually figured it out, sort of. So the tap uses a SQLite database, by default, where it stores information about past runs. Thus, there is a 'state' that is being passed when invoking the tap. And apparently that always overrides the configuration. A bit weird but...
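For anyone reading along, here is a minimal Python sketch of the behaviour described above, assuming the common Singer convention that a saved bookmark takes precedence over `start_date`. This is not the tap's actual code: `resolve_start`, `parse_ts`, the shape of the state dictionary, and the exact bookmark timestamp are all hypothetical and only illustrate why an old bookmark can pull the run back to `2023-12-11`.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z' into an aware UTC datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def resolve_start(config: dict, state: dict, stream: str, key: str) -> datetime:
    """Hypothetical helper: prefer the bookmark stored in state, else fall back to start_date."""
    bookmark = state.get("bookmarks", {}).get(stream, {}).get(key)
    if bookmark is not None:
        # A bookmark from a previous run wins over the configured start_date.
        return parse_ts(bookmark)
    return parse_ts(config["start_date"])


# start_date taken from the config above; the bookmark time of day is made up.
config = {"start_date": "2024-01-18T12:00:00Z"}
saved_state = {"bookmarks": {"tickets": {"generated_timestamp": "2023-12-11T00:00:00Z"}}}

with_state = resolve_start(config, saved_state, "tickets", "generated_timestamp")
without_state = resolve_start(config, {}, "tickets", "generated_timestamp")
print(with_state.isoformat())
print(without_state.isoformat())
```

With the saved state, the sketch starts from the old bookmark; with an empty state, it starts from the configured `start_date`.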
e
> So the tap uses a SQLite database

You mean `.meltano/meltano.db`? That's expected. See https://docs.meltano.com/contribute/prerequisites/#system-database.

> Thus, there is a 'state' that is being passed when invoking the tap. And apparently that always overrides the configuration. A bit weird but...

OK, you could try using the `--full-refresh` flag for `meltano run`. That should tell Meltano to ignore the state.
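Since the job runs from Airflow, a small Python sketch of how the command line could be assembled may help. It only builds the argument list and does not run anything. `build_meltano_run` is a hypothetical helper, and `target-jsonl` is a placeholder loader name, since the thread does not say which loader is in use.

```python
import shlex
from typing import List


def build_meltano_run(extractor: str, loader: str, full_refresh: bool = False) -> List[str]:
    """Hypothetical helper: assemble a `meltano run` invocation as an argument list."""
    cmd = ["meltano", "run"]
    if full_refresh:
        # Options go before the extractor/loader names.
        cmd.append("--full-refresh")
    cmd += [extractor, loader]
    return cmd


# "target-jsonl" is a placeholder; substitute the loader your project uses.
command = build_meltano_run("tap-zendesk", "target-jsonl", full_refresh=True)
print(shlex.join(command))
```

The resulting list can be passed to whatever the Airflow task uses to launch Meltano. Dropping `full_refresh=True` gives the normal incremental invocation that reuses the stored state.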