diwakar_kasi
08/29/2023, 1:39 PM
I'm using the `tap-universal-file` extractor to pull JSON data from S3 storage.
Specifically, I'm working with a particular data stream that contains multiple JSON files, and I've noticed that the schema isn't consistent across all of them. Some files have additional JSON fields that are not present in others.
Because `jsonl_sampling_strategy` defaults to `first` and `all` is not supported yet, the extractor never reads the additional fields. I also tried specifying a catalog file to explicitly select these additional fields when they're available, but it doesn't seem to work as expected.
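To illustrate what I'm after, here is a rough sketch of the `all`-style sampling done by hand: read every record and take the union of the top-level keys. This is not how `tap-universal-file` works internally. The helper names `json_type` and `union_schema` are made up for illustration, and it assumes every line is a JSON object and ignores nested fields.

```python
import json
from typing import Any, Dict, Iterable, Set


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def union_schema(lines: Iterable[str]) -> Dict[str, Any]:
    """Build a flat schema whose properties are the union of all top-level keys."""
    seen_types: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)  # assumption: each line is a JSON object
        for key, value in record.items():
            seen_types.setdefault(key, set()).add(json_type(value))
    # Every field is made nullable because it may be absent from some files.
    properties = {
        key: {"type": sorted(types | {"null"})}
        for key, types in seen_types.items()
    }
    return {"type": "object", "properties": properties}
```

Feeding it the lines of every file in the stream, rather than only the first one, would give a schema that includes the fields that only appear in some files.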
I'm hoping that some of you might have encountered a similar issue or have suggestions on how to proceed. Any insights or guidance you can provide would be greatly appreciated.
Thanks!
diwakar_kasi
08/29/2023, 2:21 PM
2023-08-29T14:08:03.391967Z [info ] Found catalog in /Users/dkasi/Documents/GitHub/data-dagster/meltano/extract/tap-sailthru-data-exporter.catalog.json
2023-08-29T14:08:12.545093Z [info ] 2023-08-29 10:08:12,544 | WARNING | tap-universal-file | Properties ('template',) were present in the 'blast' stream but not found in catalog schema. Ignoring. cmd_type=extractor job_id=2023-08-29T140800--tap-sailthru-data-exporter--blast--target-jsonl name=tap-sailthru-data-exporter--blast run_id=dd4cf68f-3f71-440f-9d59-14b7b531fbb4 stdio=stderr
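The warning suggests the catalog schema for `blast` has no `template` property. Below is a minimal sketch of patching a catalog to add one, assuming the file follows the usual Singer layout (`streams`, `tap_stream_id`, `schema.properties`, `metadata` with `breadcrumb` entries). The function name `add_property` is hypothetical, and the type to use for `template` is a guess that should be checked against the real data. Whether the tap honors the patched catalog is not confirmed here.

```python
from typing import Any, Dict


def add_property(
    catalog: Dict[str, Any],
    stream_id: str,
    name: str,
    schema: Dict[str, Any],
) -> bool:
    """Add `name` to one stream's schema and mark it selected.

    Returns True if the stream was found, False otherwise.
    """
    for stream in catalog.get("streams", []):
        if stream.get("tap_stream_id") != stream_id:
            continue
        # Declare the field in the stream's JSON Schema.
        stream.setdefault("schema", {}).setdefault("properties", {})[name] = schema
        # Select the field via its metadata entry, creating one if missing.
        breadcrumb = ["properties", name]
        metadata = stream.setdefault("metadata", [])
        for entry in metadata:
            if entry.get("breadcrumb") == breadcrumb:
                entry.setdefault("metadata", {})["selected"] = True
                break
        else:
            metadata.append(
                {
                    "breadcrumb": breadcrumb,
                    "metadata": {"inclusion": "available", "selected": True},
                }
            )
        return True
    return False
```

For example, after loading the catalog file with `json.load`, calling `add_property(catalog, "blast", "template", {"type": ["string", "null"]})` and writing the result back with `json.dump` would declare the field, with the `string` type here being an assumption.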
diwakar_kasi
08/29/2023, 2:24 PM