# getting-started
gary_lucas
Hi Everyone, had a question regarding the Meltano SDK and schemas. The source I'm trying to ingest from ships data columnwise, and the number of columns will change over time (each column is named for a month, `mm/yyyy`, so the header looks like `id`, `02/2020`, and so on). As we run this in the days and months ahead, seeing more columns is expected, and I'm wondering if we can generate the schema in sequence with hitting the source API, i.e.: 1. Call the API, get a response. 2. From the response, generate the schema. 3. Yield rows that have been post-processed. The API we're calling claims to be RESTful; it's absolutely not, so I'm using a custom client for this. Right now it's working fine, but with a hardcoded schema, i.e.:
from singer_sdk import typing as th  # JSON Schema helpers from the Singer SDK


class VersionsStream(StreamyStream):  # StreamyStream: the tap's custom client base class
    """Define custom stream."""

    name = "Versions"
    primary_keys = [
        "Field_a",
        "Field_b",
        "Field_c",
        "Field_d",
        "Field_e",
    ]
    replication_key = None

    # Hardcoded schema: the fixed string fields plus one numeric property per month column.
    schema = th.PropertiesList(
        th.Property("Field_a", th.StringType),
        th.Property("Field_b", th.StringType),
        th.Property("Field_c", th.StringType),
        th.Property("Field_d", th.StringType),
        th.Property("Field_e", th.StringType),
        th.Property("02/2014", th.NumberType),
        th.Property("03/2014", th.NumberType),
        th.Property("04/2014", th.NumberType),
        # ... etc.
    ).to_dict()
Thanks a bunch!
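For step 3 above (yielding rows that have been post-processed), the SDK's `Stream.post_process` hook is one natural place for the per-row cleanup. A minimal sketch, reusing the class from the snippet above and assuming that month values arrive as strings and that every column outside the fixed `Field_*` set is a numeric `mm/yyyy` column:

from typing import Optional


class VersionsStream(StreamyStream):
    """Same stream as above, with per-row post-processing added."""

    def post_process(self, row: dict, context: Optional[dict] = None) -> Optional[dict]:
        """Coerce month columns (e.g. "02/2014") to numbers before the row is emitted."""
        for key, value in row.items():
            # Assumption: every non-"Field_*" column is a mm/yyyy measure column.
            if not key.startswith("Field_") and value is not None:
                row[key] = float(value)
        return row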
aaronsteers
Hi, @gary_lucas - and welcome! Yes, you can absolutely create this schema on the fly. Where you currently have `schema` defined as a static attribute, you can instead provide a dynamic `@property` method, as in the sample here: Code Samples — Meltano SDK 0.2.0 documentation. A couple of things you'd have to tackle, though:
1. We don't yet have an easy entrypoint for sending supplemental requests using the same auth/request rails. That's certainly doable, but we don't have existing patterns or docs for that approach. (Tracked here: Streamline complementary REST requests (#93).) The workaround would just be to call `requests` directly, optionally piggy-backing on `Stream.http_headers` and/or `Stream.authenticator.auth_headers`.
2. The fully adaptive schema use case isn't yet supported - wherein any change to the stream schema could happen anywhere during the stream. Essentially, you'd need to dynamically update `Stream.schema` and make extra calls to `Stream._write_schema_message()` - but that is probably overkill for the use case you describe. The unfortunate downsides of implementing a fully adaptive schema are that (1) the `--discover` output, which documents the best/latest known schema, would presumably not know about the adaptive changes, and (2) downstream targets don't always deal gracefully with a schema that is modified mid-stream (although the spec generally has no problem with it).
For either approach - streamlining complementary REST requests or adding adaptive schema capabilities - we'd welcome contributions. The first is much easier, and may even be doable today. Is this helpful at all?
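To make the `@property` suggestion and the `requests` workaround above concrete, here is a rough sketch. The `/versions/columns` endpoint, the column-name pattern, and the assumption that the base class exposes `url_base`, `http_headers`, and `authenticator` are all illustrative, not part of the real tap:

import re
from typing import List

import requests
from singer_sdk import typing as th


class VersionsStream(StreamyStream):
    """Versions stream with a schema built from a live API response."""

    name = "Versions"
    primary_keys = ["Field_a", "Field_b", "Field_c", "Field_d", "Field_e"]
    replication_key = None

    @property
    def schema(self) -> dict:
        """Dynamically build the JSON schema from the columns the API currently returns."""
        # Workaround from point 1 above: call `requests` directly, piggy-backing
        # on the stream's `http_headers` and `authenticator.auth_headers`.
        # `columns_url` is a hypothetical endpoint that returns the column names.
        columns_url = f"{self.url_base}/versions/columns"
        headers = {**self.http_headers, **self.authenticator.auth_headers}
        response = requests.get(columns_url, headers=headers)
        response.raise_for_status()

        properties: List[th.Property] = [
            th.Property("Field_a", th.StringType),
            th.Property("Field_b", th.StringType),
            th.Property("Field_c", th.StringType),
            th.Property("Field_d", th.StringType),
            th.Property("Field_e", th.StringType),
        ]
        for column in response.json():
            # Assumption: month columns are named mm/yyyy and hold numeric values.
            if re.fullmatch(r"\d{2}/\d{4}", column):
                properties.append(th.Property(column, th.NumberType))
        return th.PropertiesList(*properties).to_dict()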
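And purely to illustrate the adaptive path in point 2 (which, as noted, is probably overkill here and carries the downsides described above), a sketch of updating the schema and re-emitting it mid-stream. It builds on the dynamic-schema sketch just above; the caching property and the numeric-column assumption are illustrative, while `_write_schema_message()` is the SDK-internal call named in the reply:

from typing import Optional


class AdaptiveVersionsStream(VersionsStream):
    """Illustration only: re-emit the schema when an unseen column shows up mid-stream."""

    @property
    def schema(self) -> dict:
        # Cache the dynamically built schema so mid-stream additions persist.
        if not hasattr(self, "_cached_schema"):
            self._cached_schema = super().schema
        return self._cached_schema

    def post_process(self, row: dict, context: Optional[dict] = None) -> Optional[dict]:
        unseen = [key for key in row if key not in self.schema["properties"]]
        if unseen:
            for key in unseen:
                # Assumption: any unseen column is another numeric mm/yyyy column.
                self.schema["properties"][key] = {"type": ["number", "null"]}
            # Re-emit the SCHEMA message so downstream targets see the new columns.
            self._write_schema_message()
        return row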
gary_lucas
Hi @aaronsteers, thanks! It's helpful. My current thought is that we can manually stub the schema out to `12/2023`, and that should be fine for a while. I will create a ticket for our team to investigate dynamically creating the schema and try to get that scheduled in the next quarter or two. My theory is that by the time we do that work, we will have written several Meltano taps/targets and will just generally be more competent with the framework.
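If it's useful in the meantime, that manual stub through `12/2023` can also be generated with a small loop rather than typed out by hand; the 2014 start year and the numeric type are carried over from the earlier snippet, and the helper name is just for illustration:

from singer_sdk import typing as th


def month_properties(start_year: int = 2014, end_year: int = 2023):
    """Yield one th.Property per mm/yyyy column, from 01/<start_year> through 12/<end_year>."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            yield th.Property(f"{month:02d}/{year}", th.NumberType)


schema = th.PropertiesList(
    th.Property("Field_a", th.StringType),
    th.Property("Field_b", th.StringType),
    th.Property("Field_c", th.StringType),
    th.Property("Field_d", th.StringType),
    th.Property("Field_e", th.StringType),
    *month_properties(),  # stubs every month column through 12/2023
).to_dict()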