# troubleshooting
Hi Team, I hope you're all doing well. I wanted to reach out regarding an issue I've been encountering while using the `tap-universal-file` extractor to pull JSON data from S3 storage. Specifically, I'm working with a data stream that contains multiple JSON files, and I've noticed that the schema isn't consistent across all of them: some files have additional JSON fields that are not present in others. Because `jsonl_sampling_strategy` defaults to `first` and `all` is not supported yet, the extractor doesn't read the additional fields. I also tried specifying a catalog file to explicitly select these additional fields when they're available, but unfortunately it doesn't seem to be working as expected. I'm hoping some of you have encountered a similar issue or have suggestions on how to proceed. Any insights or guidance would be greatly appreciated. Thanks!
Here's the message on stdout:
2023-08-29T14:08:03.391967Z [info     ] Found catalog in /Users/dkasi/Documents/GitHub/data-dagster/meltano/extract/tap-sailthru-data-exporter.catalog.json
2023-08-29T14:08:12.545093Z [info     ] 2023-08-29 10:08:12,544 | WARNING  | tap-universal-file   | Properties ('template',) were present in the 'blast' stream but not found in catalog schema. Ignoring. cmd_type=extractor job_id=2023-08-29T140800--tap-sailthru-data-exporter--blast--target-jsonl name=tap-sailthru-data-exporter--blast run_id=dd4cf68f-3f71-440f-9d59-14b7b531fbb4 stdio=stderr
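To illustrate why a field like `template` can end up missing, here is a minimal Python sketch comparing a schema inferred from only the first record with one inferred from every record. This is not how `tap-universal-file` is implemented. The `infer_properties` helper, the `blast_id` and `name` fields, and the sample values are all hypothetical, and the `"all"` branch only mimics what such a strategy might do.

```python
import json
from typing import Iterable


def infer_properties(lines: Iterable[str], strategy: str = "first") -> set[str]:
    """Collect top-level field names from JSONL lines (hypothetical helper).

    strategy="first" stops after the first non-empty record;
    any other value scans every record.
    """
    seen: set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        seen.update(record.keys())
        if strategy == "first":
            break
    return seen


# Made-up records: only the second one carries the extra "template" field.
sample = [
    '{"blast_id": 1, "name": "welcome"}',
    '{"blast_id": 2, "name": "promo", "template": "spring-sale"}',
]

first_only = infer_properties(sample, "first")
all_records = infer_properties(sample, "all")

# Fields that a first-record schema would not include.
missing = all_records - first_only
print(sorted(missing))
```

A schema built from the first record alone has no entry for `template`, which would be consistent with the "not found in catalog schema" warning in the log above if the catalog schema was discovered the same way.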