# troubleshooting
Alex Maras: Hi all - I'm a little new to Meltano, but enjoying my time with it so far. I'm running the Airbyte wrapper with the `airbyte/source-azure-blob-storage` image to clone data from Azure Blob Storage to S3. This may not be a particularly intended use case, but if it helps us keep our data pipeline orchestration and auth centralised, it'd be pretty handy. I'm successfully pulling data out of Azure Blob using the Airbyte wrapper. At the moment, it's only got a single stream with file type `jsonl`. With `target-jsonl`, it outputs all data into a single file with one line per file from the source system, each of which has an `_ab_source_file_url` field. I'd like to use that field to specify an output filename with `target-s3`, so that the one stream can be split and essentially mirror the storage from Azure Blob to S3. Can anyone point me in the right direction? I imagine stream maps and flattening might have some bearing, but I'm struggling to map the relationship between the jsonl output from the Airbyte connector and the target-s3 output.
Edgar Ramírez (Arch.dev): Hi @Alex Maras! `target-s3` only seems to use the stream name (https://github.com/crowemi/target-s3/blob/24d451a9c1b38910b247fdaf478960f5a8084b27/target_s3/formats/format_base.py#L109) and the batch timestamp (https://github.com/crowemi/target-s3/blob/24d451a9c1b38910b247fdaf478960f5a8084b27/target_s3/formats/format_base.py#L121) to determine the file path. That means you'd need to map `_ab_source_file_url` to the stream name, which is not a use case currently supported by stream maps. The good news is you can write your own mapper script that does exactly what you need. See https://github.com/edgarrmondragon/singer-playground/blob/main/merge_streams/map.py for an example of a very simple mapper.
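For reference, a minimal sketch of such a mapper, assuming the same stdin/stdout Singer message protocol as the linked example. The script and its stream-name sanitisation are illustrative, not Alex's actual code:

```python
#!/usr/bin/env python3
"""Minimal sketch of a Singer mapper (illustrative, not the actual script):
reads Singer messages on stdin, renames each RECORD's stream to a name
derived from its _ab_source_file_url field, and re-emits the source SCHEMA
under each new stream name so the target sees a schema before the records."""
import json
import re
import sys

schemas = {}       # original SCHEMA message per source stream
announced = set()  # derived stream names we've already emitted a SCHEMA for


def derive_stream(url: str) -> str:
    # Assumption: replacing non-alphanumeric characters yields a usable
    # stream name (and hence S3 key) from the blob URL.
    return re.sub(r"[^A-Za-z0-9_]+", "_", url).strip("_")


for line in sys.stdin:
    msg = json.loads(line)

    if msg.get("type") == "SCHEMA":
        # Hold the schema; re-emit it per derived stream as records arrive.
        schemas[msg["stream"]] = msg
        continue

    if msg.get("type") == "RECORD":
        new_stream = derive_stream(msg["record"].get("_ab_source_file_url", "unknown"))
        if new_stream not in announced:
            sys.stdout.write(json.dumps(dict(schemas[msg["stream"]], stream=new_stream)) + "\n")
            announced.add(new_stream)
        msg["stream"] = new_stream

    sys.stdout.write(json.dumps(msg) + "\n")
```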
Alex Maras: Thanks @Edgar Ramírez (Arch.dev)! I wrote a simple mapper that prefixes tables with a given string for a sync job between MSSQL and Snowflake, so I should be able to build on that to get this working. Thanks for the pointer.
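For anyone wiring a script like that into a Meltano project, a rough meltano.yml sketch for a custom mapper plugin. The plugin name, executable path, mapping name, and tap name here are all hypothetical:

```yaml
# Sketch only - plugin name, paths, and mapping name are hypothetical.
plugins:
  mappers:
    - name: split-by-file-url
      namespace: split_by_file_url
      executable: ./mappers/split_by_file_url.py
      mappings:
        - name: split-blob-files
```

The mapping name then goes between the tap and target in the pipeline, e.g. `meltano run tap-airbyte split-blob-files target-s3`.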
Edgar Ramírez (Arch.dev): Awesome!
Alex Maras: OK, got it working - I ended up having to fork `target-s3` too, so that I could force it to use the stream name directly instead of trying to append `.json` and always `gzip` everything. Thanks again for the help! If I manage to clean up the mapping plugin to be generic enough to work - i.e. so that you could split a table by a field within that table into multiple streams - then I'll look at putting it up as a public plugin. It should be relatively easy; I just need to handle multiple streams initially, as I'm only dealing with a single stream here and my config won't account for multiple streams with multiple schemas.
👍 1
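A rough sketch of the generic, multi-stream version Alex describes: splitting any stream by a configured field, with schemas tracked per source stream. The SPLIT_FIELDS config dict and the script itself are assumptions for illustration, not the actual plugin:

```python
#!/usr/bin/env python3
"""Sketch of a generic splitting mapper (illustrative, not the plugin
described above). A config dict maps each source stream to the record
field used to split it, so streams with different schemas can coexist;
unmapped streams pass through untouched."""
import json
import re
import sys

# Assumption: in a real plugin this would come from mapper config;
# hard-coded here for illustration.
SPLIT_FIELDS = {"azure_blobs": "_ab_source_file_url"}

schemas = {}     # latest SCHEMA message per source stream
announced = set()  # (source_stream, derived_stream) pairs already emitted


def derive_stream(value: str) -> str:
    return re.sub(r"[^A-Za-z0-9_]+", "_", str(value)).strip("_")


for line in sys.stdin:
    msg = json.loads(line)
    stream = msg.get("stream")
    split_field = SPLIT_FIELDS.get(stream)

    if msg.get("type") == "SCHEMA" and split_field:
        # Hold the schema; re-emit it per derived stream as records arrive.
        schemas[stream] = msg
        continue

    if msg.get("type") == "RECORD" and split_field:
        new_stream = derive_stream(msg["record"].get(split_field, "unknown"))
        if (stream, new_stream) not in announced:
            sys.stdout.write(json.dumps(dict(schemas[stream], stream=new_stream)) + "\n")
            announced.add((stream, new_stream))
        msg["stream"] = new_stream

    # Everything else (STATE, unmapped streams) passes through unchanged.
    sys.stdout.write(json.dumps(msg) + "\n")
```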