# troubleshooting
b
Hey everyone 👋 I have a pipeline that is incrementally extracting from SQL using `tap-mysql` and loading to Snowflake using `target-snowflake`. One of the tables has a column named `last_checking`, which is referenced in the tap metadata as the `replication-key`. We later renamed the column from `last_checkin` to `time`, and this broke the pipeline. I'm wondering how to fix it 🤔 The options I'm thinking of are:
1. Wiping all the data in Snowflake and deleting the `state.json` file that Meltano uses for incremental extraction, so extraction starts from the beginning (this way, in Snowflake we end up with the same table with only the new column name, and we might need to keep the old column as well).
2. Editing the `state.json` file and changing the replication key name from `last_checking` to `time` (I'm not sure whether Meltano will create a new column named `time` and continue the extraction from where it stopped the day before, or whether I should manually create the new column in Snowflake so the `target-snowflake` loader isn't confused).

Any other options? Thanks a lot for the help.
e
If it was a simple column name change, then editing the state file and making sure the catalog refers to the new column name should suffice. If the column is present in the SCHEMA message emitted by the tap, then the target should automatically create it.
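For what it's worth, here's a minimal sketch of what that state edit could look like with `jq`. The bookmark structure, stream name, and values below are assumptions for illustration; inspect your real `state.json` first, since the exact keys depend on the tap:

```shell
# Assumed Singer-style bookmark structure; your real state.json may differ.
cat > state.json <<'EOF'
{"bookmarks":{"mydb-mytable":{"replication_key":"last_checkin","replication_key_value":"2023-01-01T00:00:00Z"}}}
EOF

# Point the bookmark at the new column name, keeping the saved cursor value
# so extraction resumes from where it stopped.
jq '.bookmarks["mydb-mytable"].replication_key = "time"' state.json > state.tmp \
  && mv state.tmp state.json

cat state.json
```

Whether resuming with the old cursor value is safe depends on the two columns holding comparable values; if `time` has a different scale or type, a full re-sync is the safer option.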
b
Which catalog do you mean? How would I check if the column is present in the SCHEMA message? The way we are running the ETL pipeline is by building a Meltano Docker image and triggering an AWS Batch job that runs fresh every time, except for the `state.json`, which is stored in an S3 bucket. @edgar_ramirez_mondragon
e
I mean the Singer tap catalog. If the only thing you persist after a pipeline run is the state file, then you shouldn't have a stale catalog on each run, so that's good.
> How would I check if the column is present in the SCHEMA message?
I'd try running the tap with `meltano invoke tap-mysql` and inspecting the output to see if the SCHEMA message for the stream in question has the `time` field.
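For reference, the replication key lives in the catalog's stream-level metadata entry. A hypothetical excerpt (stream name and layout assumed, trimmed to the relevant fields) looks roughly like:

```json
{
  "streams": [
    {
      "tap_stream_id": "mydb-mytable",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "replication-method": "INCREMENTAL",
            "replication-key": "time"
          }
        }
      ]
    }
  ]
}
```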
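To make that inspection concrete, here's a sketch of filtering the tap output with `jq`. The stream name `mytable` and the one-line sample message are assumptions standing in for real output; in practice you'd pipe `meltano invoke tap-mysql` into the filter instead of the `echo`:

```shell
# Simulated SCHEMA message standing in for real tap output; replace the echo
# with `meltano invoke tap-mysql` to inspect the actual stream.
echo '{"type":"SCHEMA","stream":"mytable","schema":{"properties":{"id":{},"time":{}}}}' \
  | jq -r 'select(.type == "SCHEMA" and .stream == "mytable") | .schema.properties | keys[]'
```

If `time` shows up in the printed property names, the target should create the column automatically.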