# getting-started
g
How possible is the following use case with Meltano? We have one API data source we want to extract from, let's say it provides 4 fields in its payload:
```json
{
  "field_1": "value_1",
  "field_2": "value_2",
  "field_3": "value_3",
  "field_4": "value_4"
}
```
We want to send data from 2 of these fields to Table A:
```
| field_1 | field_2 |
```
And send one of the other fields to Table B:
```
| field_4 |
```
We don't want to hit the API for the data twice (once for each table), because the API provides all the data we need in a single extraction. We'd rather parse the payload and redirect part of it to a second table (i.e., field_4 to Table B), and ignore field_3 altogether. Any thoughts?
e
Hi Garret! I think you could use stream maps for that with a config like:
```json
{
  "stream_maps": {
    "stream_2": {
      "__source__": "stream_1",
      "field_4": "field_4",
      "__else__": null
    },
    "stream_1": {
      "field_1": "field_1",
      "field_2": "field_2",
      "__else__": null
    }
  }
}
```
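To make the effect of that config concrete, here's an illustrative Python sketch (not the actual Singer SDK implementation) of how each extracted record would fan out: `stream_1` keeps `field_1` and `field_2`, `stream_2` keeps only `field_4`, and `__else__: null` drops everything else. `apply_stream_maps` is a hypothetical helper for demonstration only.

```python
def apply_stream_maps(record: dict) -> dict:
    """Emit one record per mapped stream from a single source record."""
    return {
        # stream_1 keeps field_1 and field_2; __else__: null drops the rest
        "stream_1": {"field_1": record["field_1"], "field_2": record["field_2"]},
        # stream_2 aliases stream_1 via __source__ and keeps only field_4
        "stream_2": {"field_4": record["field_4"]},
    }

payload = {
    "field_1": "value_1",
    "field_2": "value_2",
    "field_3": "value_3",  # ignored by both streams
    "field_4": "value_4",
}
out = apply_stream_maps(payload)
```

Note the API payload is read once; both output records are derived from the same in-memory dict.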
g
That could be just what I'm looking for, thanks! I'm a bit confused as to where to put this config though. We have created custom taps with our own Tap and Stream classes that override some of the default methods. Would this config be a property of our custom Stream class?
e
You would put this in the tap config option, e.g. `--config config.json`, or if you're using Meltano you can use something like:
```yaml
plugins:
  extractors:
  - name: your-tap
    config:
      stream_maps: ...
```
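Putting the two together, a sketch of what the full `meltano.yml` entry might look like with the stream_maps config from above (the tap name and stream names are the placeholders from this thread):

```yaml
plugins:
  extractors:
  - name: your-tap
    config:
      stream_maps:
        stream_2:
          __source__: "stream_1"
          field_4: "field_4"
          __else__: null
        stream_1:
          field_1: "field_1"
          field_2: "field_2"
          __else__: null
```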
g
Amazing, thank you!
e
Np, do let me know how it goes!
g
I think I'm missing something. I set my meltano.yml file up like this:
```yaml
plugins:
  extractors:
    - name: tap-my-custom-tap
      config:
        stream_maps:
          stream_1:
            id: "id"
            ... # all my other fields
          stream_2:
            __source__: "stream_1"
            ... # more fields
```
I run my job like this:
```shell
meltano --log-level info elt tap-my-custom-tap target-postgres --state-id my-state-id --select stream_1 --select stream_2
```
And two issues arise:
1. The field names are being parsed as simpleeval expressions and aren't recognized (`singer_sdk.exceptions.MapExpressionError: Failed to evaluate simpleeval expressions id.`).
2. Both streams hit the API for the data. I imagine there's some other config I need to set up so that stream_1 always hits the API for data, and stream_2 only hits the API if it's the only stream selected; otherwise it should reuse the data from stream_1's run.
e
Ok, so a couple of notes:
1. `--select` does not currently work with streams generated by stream maps, only with the original streams (it has to do with the Singer catalog). You can see which streams are the originals by running `meltano select tap-my-custom-tap --list --all`.
2. Make sure you're referencing an existing stream. E.g., if the single original stream in your tap is `my_stream` (which is only extracted once, but you want to split each record in two), you'd need a config like this:
```yaml
plugins:
  extractors:
    - name: tap-my-custom-tap
      config:
        stream_maps:
          my_stream:
            id: "id"
            ... # all my other fields
          new_stream:
            __source__: "my_stream"
            ... # more fields
```
The tap would only be hitting the API for `my_stream`, and stream maps operate only on the already-extracted records.
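A sketch of why the API is hit only once: a hypothetical `extract()` is called a single time for `my_stream`, and each emitted record fans out to both streams downstream, since `new_stream` is just an alias of `my_stream` through `__source__`. The function names and payload here are illustrative, not Singer SDK internals.

```python
def extract():
    # One API call for the whole run (hypothetical payload)
    yield {"id": 1, "field_4": "value_4"}

def run(stream_maps):
    """Fan each extracted record out to every mapped stream."""
    emitted = []
    for record in extract():  # extraction happens once
        for stream_name, keep_fields in stream_maps.items():
            emitted.append((stream_name, {f: record[f] for f in keep_fields}))
    return emitted

messages = run({"my_stream": ["id"], "new_stream": ["field_4"]})
```

Both streams' records come from the single `extract()` pass; no second request is made for `new_stream`.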