Hi Team I hope you re all doing well I wanted to reach out r Meltano #troubleshooting

diwakar_kasi

08/29/2023, 1:39 PM

Hi Team, I hope you're all doing well. I wanted to reach out regarding an issue I've been encountering while using the `tap-universal-file` extractor to pull JSON data from S3 storage. Specifically, I'm working with a particular data stream that contains multiple JSON files, and I've noticed that the schema isn't consistent across all of them. Some files have additional JSON fields that are not present in others. Because `jsonl_sampling_strategy` defaults to `first` and `all` is not supported yet, the extractor does not read the additional fields. I also attempted to specify a catalog file to explicitly select these additional fields when they're available, but unfortunately, it doesn't seem to be working as expected. I'm hoping that some of you might have encountered a similar issue or have suggestions on how to proceed. Any insights or guidance you can provide would be greatly appreciated. Thanks!
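
To illustrate what sampling every record (rather than only the first) would involve, here is a minimal Python sketch that builds a merged JSON Schema from the union of keys seen across all JSONL lines. It is not part of `tap-universal-file`; the function names `json_type` and `merged_schema` are hypothetical, and every field is assumed to be nullable because any given file may omit it.

```python
import json
from typing import Any, Dict, Iterable, Set


def json_type(value: Any) -> str:
    """Map a decoded JSON value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def merged_schema(lines: Iterable[str]) -> Dict[str, Any]:
    """Build one schema covering every top-level key seen in any record.

    Hypothetical helper: assumes each non-empty line is a JSON object.
    """
    seen: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for key, value in record.items():
            seen.setdefault(key, set()).add(json_type(value))
    # Assumption: mark every field nullable, since some files omit it.
    properties = {
        key: {"type": sorted(types | {"null"})} for key, types in seen.items()
    }
    return {"type": "object", "properties": properties}


if __name__ == "__main__":
    sample = [
        '{"id": 1, "name": "a"}',
        '{"id": 2, "template": {"x": 1}}',
    ]
    print(json.dumps(merged_schema(sample), indent=2))
```

A schema produced this way could serve as a starting point for a hand-written catalog, though whether the extractor honors the extra fields depends on the tap itself.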

diwakar_kasi

08/29/2023, 2:21 PM

Here's the message on stdout:

2023-08-29T14:08:03.391967Z [info     ] Found catalog in /Users/dkasi/Documents/GitHub/data-dagster/meltano/extract/tap-sailthru-data-exporter.catalog.json
2023-08-29T14:08:12.545093Z [info     ] 2023-08-29 10:08:12,544 | WARNING  | tap-universal-file   | Properties ('template',) were present in the 'blast' stream but not found in catalog schema. Ignoring. cmd_type=extractor job_id=2023-08-29T140800--tap-sailthru-data-exporter--blast--target-jsonl name=tap-sailthru-data-exporter--blast run_id=dd4cf68f-3f71-440f-9d59-14b7b531fbb4 stdio=stderr
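
The warning says that `template` appeared in the `blast` stream's records but is missing from the catalog schema. As a rough sketch of what adding it could look like, the Python below patches a Singer-style catalog dictionary in memory. The helper name `add_property` is hypothetical, the permissive type list for `template` is an assumption (the thread does not say what type the field has), and nothing here confirms that the tap will then emit the field.

```python
import copy
from typing import Any, Dict


def add_property(
    catalog: Dict[str, Any], stream_id: str, name: str, schema: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of a Singer-style catalog with one extra property.

    Hypothetical helper: adds the property to the stream's schema and a
    matching metadata entry marked as selected.
    """
    patched = copy.deepcopy(catalog)
    for stream in patched.get("streams", []):
        if stream.get("tap_stream_id") != stream_id:
            continue
        stream.setdefault("schema", {}).setdefault("properties", {})[name] = schema
        metadata = stream.setdefault("metadata", [])
        breadcrumb = ["properties", name]
        # Avoid duplicating the metadata entry if it already exists.
        if not any(entry.get("breadcrumb") == breadcrumb for entry in metadata):
            metadata.append(
                {
                    "breadcrumb": breadcrumb,
                    "metadata": {"inclusion": "available", "selected": True},
                }
            )
    return patched


if __name__ == "__main__":
    # Toy catalog for illustration only; not the real catalog from the thread.
    catalog = {
        "streams": [
            {
                "tap_stream_id": "blast",
                "schema": {"type": "object", "properties": {"id": {"type": ["integer", "null"]}}},
                "metadata": [],
            }
        ]
    }
    # Assumption: the type of `template` is unknown, so accept several.
    template_schema = {"type": ["object", "string", "null"]}
    patched = add_property(catalog, "blast", "template", template_schema)
    print(sorted(patched["streams"][0]["schema"]["properties"]))
```

The function returns a patched copy and leaves the input untouched, so the original catalog can be compared against the result before anything is written back to disk.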

diwakar_kasi

08/29/2023, 2:24 PM

message has been deleted
