brandon_isom
07/01/2021, 1:37 PM
FULL_TABLE or INCREMENTAL replication? And if it's INCREMENTAL, how do I determine if a --full-refresh flag has been passed in to the meltano elt?
douwe_maan
07/01/2021, 2:45 PM
--full-refresh and the replication-method only affect the tap’s behavior. The tap does not tell the target whether it’s doing a full refresh or an incremental sync.
This is Singer behavior that we could change if the need is high enough. What are you trying to accomplish that would require the target to know this?
brandon_isom
07/01/2021, 4:34 PM
brandon_isom
07/01/2021, 4:35 PM
incremental from the presence of a replication_key in the schema, though maybe that's out of spec.
brandon_isom
07/01/2021, 4:35 PM
SCHEMA message
douwe_maan
07/01/2021, 5:41 PM
ACTIVATE_VERSION message was invented for: https://gitlab.com/meltano/meltano/-/issues/2508
douwe_maan
07/01/2021, 5:42 PM
douwe_maan
07/01/2021, 5:43 PM
douwe_maan
07/01/2021, 5:45 PM
aaronsteers
07/01/2021, 5:50 PM
aaronsteers
07/01/2021, 5:54 PM
Sink.remove_old_records(prior_to_version_num) and Sink implementations could optionally add logic to remove records prior to that specified version.
brandon_isom
07/01/2021, 7:22 PM
brandon_isom
07/01/2021, 7:42 PM
brandon_isom
07/01/2021, 7:50 PM
For FULL_TABLE streams, we'd want to overwrite. For INCREMENTAL streams, we'd want to append, normally, but we'd like to support --full-refresh functionality, without having to do any out-of-band janitoring.
In some cases in the past, we've also run into issues where some source system switched from integer ids to uuids, which ends up breaking spectrum queries, as the table now expects a string, but the older partitions are integers. So, gracefully rewriting partitions on a schema change like that would be a nice-to-have.
douwe_maan
07/01/2021, 8:17 PM
douwe_maan
07/01/2021, 8:18 PM
aaronsteers
07/01/2021, 8:34 PM
"So, a bit of context that may help, I'm putting together a target that will write partitioned (or sometimes not) parquet out to s3 with glue catalog tables/partitions on top of it, so they're queryable via spectrum."
First, just as a quick aside, @brandon_isom, have you seen our new/WIP target-athena as discussed in #C01ASPH8GSX? While the name has "Athena", technically we also register the tables in the Glue catalog, so it could be used for Spark and other Glue-compatible services. We're adding Parquet support and partitioning there as well. Would love to pool resources with you if that's something you wanted to work together on. (cc @andrew_stewart)
aaronsteers
07/01/2021, 8:36 PM
"In some cases in the past, we've also run into issues where some source system switched from integer ids to uuids, which ends up breaking spectrum queries, as the table now expects a string, but the older partitions are integers. So, gracefully rewriting partitions on a schema change like that would be a nice-to-have."
The pattern I've seen work well in these cases is: when a column type is modified to an incompatible type (such as uuid->int or str->date), rename the old column with a suffix and create a new column with the original name and new data type.
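The rename-and-recreate pattern described above could be sketched roughly as follows. This is only an illustration, not code from any existing target: the function name evolve_columns, the "__old" suffix, and the plain dict of column names to type strings are all assumptions made for the example, and any type difference is treated as incompatible.

```python
from typing import Dict


def evolve_columns(
    existing: Dict[str, str],
    incoming: Dict[str, str],
    suffix: str = "__old",  # hypothetical suffix; choose one that fits your naming rules
) -> Dict[str, str]:
    """Return the column layout after applying the incoming column types.

    A column whose type changed keeps its old data under `<name><suffix>`,
    and a new column with the original name takes the new type.
    Assumption: any difference in type string counts as incompatible, and an
    earlier suffixed column with the same name would simply be overwritten.
    """
    result = dict(existing)
    for name, new_type in incoming.items():
        old_type = existing.get(name)
        if old_type is not None and old_type != new_type:
            # Preserve the old column under the suffixed name.
            result[name + suffix] = old_type
        # Create (or keep) the column under its original name with the new type.
        result[name] = new_type
    return result


# Example: the source system switched `id` from integers to uuids (strings).
columns = evolve_columns(
    existing={"id": "bigint", "name": "string"},
    incoming={"id": "string", "name": "string"},
)
print(columns)
```

Older partitions would keep being read through the suffixed column, so no existing data has to be rewritten when the type changes.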
aaronsteers
07/01/2021, 8:38 PM
aaronsteers
07/01/2021, 8:41 PM
"For FULL_TABLE streams, we'd want to overwrite. For INCREMENTAL streams, we'd want to append, normally, but we'd like to support --full-refresh functionality, without having to do any out of band janitoring."
I think this ties back to the ACTIVATE_VERSION message type we discussed above - OR some kind of an override of the target table name so your automated processes can just delete the older version.
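A rough sketch of how the ACTIVATE_VERSION idea could cover the full-refresh case: records written during a refresh carry a new version number, and once the tap sends ACTIVATE_VERSION for that version, the target drops everything older. The in-memory VersionedSink class below is hypothetical (it is not the SDK's Sink), its remove_old_records method only borrows the name floated earlier in this thread, and the message shapes assume RECORD messages carry an optional version field. Treating unversioned records as old is also an assumption.

```python
import json
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterable, List, Optional, Tuple

Row = Tuple[Optional[int], Dict[str, Any]]


class VersionedSink:
    """Hypothetical in-memory sink that stores (version, record) pairs per stream."""

    def __init__(self) -> None:
        self.rows: DefaultDict[str, List[Row]] = defaultdict(list)

    def write(self, stream: str, version: Optional[int], record: Dict[str, Any]) -> None:
        self.rows[stream].append((version, record))

    def remove_old_records(self, stream: str, prior_to_version_num: int) -> None:
        # Keep only rows at or above the activated version.
        # Assumption: rows without a version are treated as old and removed.
        self.rows[stream] = [
            (version, record)
            for version, record in self.rows[stream]
            if version is not None and version >= prior_to_version_num
        ]


def process(lines: Iterable[str], sink: VersionedSink) -> None:
    """Feed Singer-style JSON lines into the sink."""
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            sink.write(message["stream"], message.get("version"), message["record"])
        elif message["type"] == "ACTIVATE_VERSION":
            sink.remove_old_records(message["stream"], message["version"])


sink = VersionedSink()
process(
    [
        # Rows from an earlier sync (version 1).
        '{"type": "RECORD", "stream": "users", "version": 1, "record": {"id": 1}}',
        '{"type": "RECORD", "stream": "users", "version": 1, "record": {"id": 2}}',
        # A full refresh writes everything again under version 2 ...
        '{"type": "RECORD", "stream": "users", "version": 2, "record": {"id": 2}}',
        '{"type": "RECORD", "stream": "users", "version": 2, "record": {"id": 3}}',
        # ... and then activates it, which removes the version 1 rows.
        '{"type": "ACTIVATE_VERSION", "stream": "users", "version": 2}',
    ],
    sink,
)
print([record for _, record in sink.rows["users"]])
```

For a partitioned Parquet layout, the same hook could delete or repoint whole partitions rather than filtering rows, which is roughly what the table-name override alternative amounts to.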
aaronsteers
07/01/2021, 8:42 PM
aaronsteers
07/01/2021, 8:43 PM
target-athena implementation, or if additional effort would be needed.
brandon_isom
07/01/2021, 8:44 PM
"have you seen our new/WIP target-athena as discussed in #C01ASPH8GSX?"
Yeah, I've taken a peek. We've got some processes already that we're basically wrapping with this target, and I don't think we want to add a dependency on Athena, atm.