# troubleshooting
j
So I would need to convert some postgres columns, for example `int4range`, into two new columns `x_start`, `x_end`, as part of the loading. Has anyone solved this with existing taps?
what I want is for the tap, instead of this:
```sql
SELECT x FROM table
```
to do this:
```sql
SELECT lower(x) AS x_start, upper(x) AS x_end FROM table
```
v
With existing taps I haven't seen anything. I've thought about this a bit in a slightly different context, but I think it'd be great to have the ability to add ad hoc streams to any SQL tap like postgres, where you could do something like
```yaml
name: tap-postgres
config:
  additional_queries:
    - name: xstart_xend
      sql: SELECT lower(x) AS x_start, upper(x) AS x_end FROM table
```
I don't think that's really what you're after here, though, as you're looking to apply this to multiple queries, right? If you're trying to use lower(x) and upper(x) for incremental loads or something, then it's a different question 🤷
j
this is just one example. The more prominent one would be our spatial columns, which need `st_text(...)` to be run
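For what it's worth, that conversion would look something like the sketch below. It assumes PostGIS's standard `ST_AsText` (presumably what `st_text` refers to); the `roads` table and `geom` column are made up for illustration.
```sql
-- Hypothetical example: emit a PostGIS geometry column as WKT text
-- during extraction, instead of the raw geometry value.
-- ST_AsText is standard PostGIS; table and column names are invented.
SELECT id, ST_AsText(geom) AS geom_wkt
FROM roads;
```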
v
Do you want to run custom SQL arbitrarily? Or is it something consistent / programmable?
j
it's consistent, as I know how I want it transformed every time. Currently we use a couple of views in the product database to do the conversion for us, but for reasons we want to stop using views
v
I think the additional queries piece would work pretty well then, but it'd be a fairly big mod; views are much easier with the existing taps 🤷
j
yeah. then I'll continue hacking on my own SQL tap
v
I really like the idea of running a few queries I know I want to run up front with the SQL. There are a lot of times that's useful for me, especially when I don't have access to create views in the source db
or I don't really want the other team controlling the view 🙂
j
exactly..
or rather, when they neglect the view..
a
There are two paths I can see, which we've discussed internally and which I think could help here... The first is to allow additional SQL expressions to be appended to an existing 'stream' - like a calculated column in a mapper, but pushed down to the SQL layer so it uses the SQL engine. The second is to allow user-provided queries as the stream definition itself.
In both cases, detecting the new schema types is a relatively difficult problem - but not impossible.
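As a rough illustration of the first path, a pushed-down calculated column might be configured like the sketch below. Nothing like this exists in any tap today; every key here (`calculated_columns` in particular) is hypothetical.
```yaml
# Hypothetical sketch -- no existing tap supports this.
# The idea: extra SELECT expressions appended to the generated query,
# like a mapper's calculated column but computed by the SQL engine.
plugins:
  extractors:
    - name: tap-postgres
      config:
        calculated_columns:             # made-up key
          public-roads:                 # stream to extend
            geom_wkt: ST_AsText(geom)   # expression pushed into the SELECT
```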
j
I ended up solving this with https://github.com/YouCruit/youcruit-tap-rawpostgresql/ - it allows you to write arbitrary SQL to run against the database, and you have to manually define the schema of the stream. This works since we only have one or two columns in a handful of tables which need custom SQL.
The README is very boilerplate. The test demonstrates how the Meltano config would look: https://github.com/YouCruit/youcruit-tap-rawpostgresql/blob/master/tap_rawpostgresql/tests/test_postgres.py
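Going by the description above (not the repo itself - the linked test shows the actual shape), such a config might look roughly like:
```yaml
# Hypothetical sketch only -- see the linked test_postgres.py for the
# tap's real config shape. 'sql' is the arbitrary query; 'schema' has
# to be written by hand since the tap doesn't infer it.
plugins:
  extractors:
    - name: tap-rawpostgresql
      config:
        streams:
          - name: xstart_xend
            sql: SELECT lower(x) AS x_start, upper(x) AS x_end FROM some_table
            schema:
              properties:
                x_start: { type: ["integer", "null"] }
                x_end: { type: ["integer", "null"] }
```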
a
Fantastic - thanks for sharing this!
cc @visch, @ken_payne, @edgar_ramirez_mondragon - the new work above takes a 'streams' config which can accept 'sql' as an arbitrary query. (It still needs 'schema' manually defined, as we feared.) Definitely a strong start for something we'd eventually also want in the SDK.
v
yes, I had someone asking for this yesterday (without directly asking)! For schema during discovery there's some stuff we could do to make it work dynamically (select one record or something). We could probably remove the round trip, but it's probably easiest to let SQLAlchemy take the record and make a schema for us
a
In the past, I've done odd things like creating a view (temp if supported) so I could scan the metadata from the generated view. I've also done
`CREATE (temp?) TABLE temp_foo AS SELECT ... LIMIT 0`
and scanned metadata from that. Neither of these solutions is very elegant. That said, I just realized now that (at least in theory) this cost could be paid during discovery and not during stream time, if the schema is cached into the catalog.json and the sync-mode operation leverages it when available. 🤔 In theory, 'discovery' could also be run with additional permissions - and additional patience for the added scan/compile costs.
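Concretely, the temp-table trick might look like this in Postgres (a sketch; the range query stands in for any user-provided SQL, and `some_table` is hypothetical):
```sql
-- Materialize the query's shape without paying for its rows.
CREATE TEMP TABLE temp_foo AS
SELECT lower(x) AS x_start, upper(x) AS x_end
FROM some_table
LIMIT 0;

-- Scan the resulting column metadata to derive the stream schema.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'temp_foo';
```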