# singer-taps
m
Hi all, I'm evaluating this for a project and wondering how you would handle flat files that contain multiple record types. The first column dictates the record type.
h
do all the records have the same schema, despite having different record types? if yes, you might be able to do an extract-load and split record types to their own tables once the data is in the warehouse. if no, but you know in advance the different record types, you can set up different taps, one for each record type, and filter the stream so that only the specific record type is selected. this approach will result in a separate table per record type.
m
So I know the different types and the schemas of those types. They have a different number/type of columns depending on the first column. I assume I would use the csv tap (these are pipe delimited) and then filter to the 3 tables based on that. Can you point me in the right direction? I just started looking at this and am unsure where to apply the filtering.
h
Since the data is in csv files & tap-csv is based on the meltano singer-sdk, you might be able to use the stream-maps functionality. You would specify the __filter__ config for each stream map. if stream-maps is not supported (i'm not sure, because the tap is based on the sdk but the readme only advertises the capabilities sync, catalog & discover), you can use the standalone stream mapper meltano-map-transform to perform the same filtering.
if doing the latter, you'd have to use the meltano run command as shown in the mapper's readme:
meltano run tap-csv hash_email target-sqlite
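to make the filtering idea concrete, here is a rough plain-Python sketch of what "one table per record type, keyed on the first column" amounts to. it is not Meltano or stream-maps code, just the same logic using the standard csv module. the record types (H, D, T), the column names and the sample rows are all made up for illustration, since the real schemas aren't given in this thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical record types and column names (assumptions, not from the thread).
SCHEMAS = {
    "H": ["record_type", "batch_id", "created"],
    "D": ["record_type", "batch_id", "item", "qty"],
    "T": ["record_type", "batch_id", "row_count"],
}


def split_by_record_type(lines):
    """Group pipe-delimited rows by the value in their first column."""
    tables = defaultdict(list)
    for row in csv.reader(lines, delimiter="|"):
        if not row:
            continue  # skip blank lines
        columns = SCHEMAS.get(row[0])
        if columns is None or len(columns) != len(row):
            continue  # unknown type or wrong column count; a real pipeline might log these
        tables[row[0]].append(dict(zip(columns, row)))
    return dict(tables)


# Made-up sample file contents with three record types.
sample = io.StringIO("H|42|2024-01-01\nD|42|widget|3\nD|42|gadget|1\nT|42|2\n")
tables = split_by_record_type(sample)
for record_type, rows in tables.items():
    print(record_type, len(rows))
```

with stream maps or the mapper, each filtered stream would play the role of one of these per-type lists and end up as its own table in the target.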
m
thanks, will take a look at those to see if I can get a POC up and running