# troubleshooting
j
So I would need to convert some postgres columns, for example `int4range`, into two new columns `x_start`, `x_end`, as part of the loading. Has anyone solved this with existing taps?
what I want is for the tap, instead of this:
```sql
SELECT x FROM table
```
to do this:
```sql
SELECT lower(x) AS x_start, upper(x) AS x_end FROM table
```
v
With existing taps I haven't seen anything. I've thought about this a bit in a slightly different context, but I think it'd be great to have the ability to add ad hoc streams to any SQL tap like postgres, where you could do something like
```yaml
name: tap-postgres
config:
  additional_queries:
    - name: xstart_xend
      sql: SELECT lower(x) AS x_start, upper(x) AS x_end FROM table
```
I don't think that's really what you're after here, though, as you're looking to apply this to multiple queries, right? If you're trying to use lower(x) and upper(x) for incremental loads or something, then it's a different question 🤷
j
this is just one example. The more prominent one would be our spatial columns, which need `st_text(...)` to be run
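For what it's worth, that conversion would look something like the sketch below. It assumes PostGIS's standard `ST_AsText` (presumably what `st_text` refers to); the `roads` table and `geom` column are made up for illustration.
```sql
-- Hypothetical example: emit a PostGIS geometry column as WKT text
-- during extraction, instead of the raw geometry value.
-- ST_AsText is standard PostGIS; table and column names are invented.
SELECT id, ST_AsText(geom) AS geom_wkt
FROM roads;
```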
v
Do you want to run custom SQL arbitrarily? Or is it something consistent / programmable?
j
it's consistent, as I know how I want it transformed every time. Currently we use a couple of views in the product database to do the conversion for us, but for reasons we want to stop using views
v
I think the additional queries piece would work pretty well then, but it'd be a fairly big mod; views are much easier with the existing taps 🤷
j
yeah. then I'll continue hacking on my own SQL tap
v
I really like the idea of running a few queries I know I want to run up front with the SQL. There are a lot of times that's useful for me, especially when I don't have access to create views in the source db
or I don't really want the other team controlling the view 🙂
j
exactly..
or rather, when they neglect the view..
a
There are two paths I can see, which we've discussed internally and which I think could help here... The first is to allow additional SQL expressions to be appended to an existing 'stream' - like a calculated column in a mapper, but pushed down to the SQL layer so it uses the SQL engine. The second is to allow user-provided queries as the stream definition itself.
In both cases, detecting the new schema types is a relatively difficult problem - but not impossible.
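As a rough illustration of the first path, a pushed-down calculated column might be configured like the sketch below. Nothing like this exists in any tap today; every key here (`calculated_columns` in particular) is hypothetical.
```yaml
# Hypothetical sketch -- no existing tap supports this.
# The idea: extra SELECT expressions appended to the generated query,
# like a mapper's calculated column but computed by the SQL engine.
plugins:
  extractors:
    - name: tap-postgres
      config:
        calculated_columns:             # made-up key
          public-roads:                 # stream to extend
            geom_wkt: ST_AsText(geom)   # expression pushed into the SELECT
```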
j
I ended up solving this with https://github.com/YouCruit/youcruit-tap-rawpostgresql/ - it allows you to write arbitrary SQL to run against the database, and you have to manually define the schema of the stream. This works since we only have one or two columns in a handful of tables which need custom SQL.
The README is very boilerplate. The test demonstrates how the Meltano config would look: https://github.com/YouCruit/youcruit-tap-rawpostgresql/blob/master/tap_rawpostgresql/tests/test_postgres.py
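Going by the description above (not the repo itself - the linked test shows the actual shape), such a config might look roughly like:
```yaml
# Hypothetical sketch only -- see the linked test_postgres.py for the
# tap's real config shape. 'sql' is the arbitrary query; 'schema' has
# to be written by hand since the tap doesn't infer it.
plugins:
  extractors:
    - name: tap-rawpostgresql
      config:
        streams:
          - name: xstart_xend
            sql: SELECT lower(x) AS x_start, upper(x) AS x_end FROM some_table
            schema:
              properties:
                x_start: { type: ["integer", "null"] }
                x_end: { type: ["integer", "null"] }
```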
a
Fantastic - thanks for sharing this!
cc @visch, @ken_payne, @edgar_ramirez_mondragon - the new work above takes a 'streams' config which can accept 'sql' as an arbitrary query. (It still needs 'schema' manually defined, as we feared.) Definitely a strong start for something we'd eventually also want in the SDK.
v
yes, I had someone asking for this yesterday (without directly asking)! For schema during discovery there's some stuff we could do to make it work dynamically (select one record or something). We could probably remove the round trip, but it's probably easiest to let SQLAlchemy take the record and make a schema for us
a
In the past, I've done odd things like creating a view (temp if supported) so I could scan the metadata from the generated view. I've also done
`CREATE (temp?) TABLE temp_foo AS SELECT ... LIMIT 0`
and scanned metadata from that. Neither of these solutions is very elegant. That said, I just realized now that (at least in theory) this cost could be paid during discovery and not during stream time, if the schema is cached into the catalog.json and the sync-mode operation leverages it when available. 🤔 In theory, 'discovery' could also be run with additional permissions - and additional patience for the added scan/compile costs.
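Concretely, the temp-table trick might look like this in Postgres (a sketch; the range query stands in for any user-provided SQL, and `some_table` is hypothetical):
```sql
-- Materialize the query's shape without paying for its rows.
CREATE TEMP TABLE temp_foo AS
SELECT lower(x) AS x_start, upper(x) AS x_end
FROM some_table
LIMIT 0;

-- Scan the resulting column metadata to derive the stream schema.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'temp_foo';
```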