# getting-started
m
Not Meltano related, but I have questions about db indexes... • Can we define indexes on target tables using the Meltano config file? • Is there a way to carry over the same indexes from the source db when moving from Postgres to Postgres? • What's the best practice here? Do we just ignore indexing on the initial target tables, since they will get transformed into new tables anyway?
I realised my really slow queries were caused by the unindexed _sdc_batched_at and _sdc_deleted_at fields. I fixed it for now with a custom DDL, but I also want to define indexes for certain columns. Is there a good way to manage this type of requirement?
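A minimal sketch of the kind of one-off DDL this refers to, with placeholder schema and table names (raw_public and orders are assumptions, not taken from the thread):

-- index the Singer metadata columns the slow queries were filtering on
CREATE INDEX IF NOT EXISTS orders__sdc_batched_at_idx
    ON raw_public."orders" (_sdc_batched_at);
CREATE INDEX IF NOT EXISTS orders__sdc_deleted_at_idx
    ON raw_public."orders" (_sdc_deleted_at);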
v
Using dbt for indexes isn't bad. An argument can also be made for target-postgres creating indexes itself.
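For the dbt side of that suggestion: the dbt-postgres adapter supports an indexes config on models, so tables that dbt itself materializes can declare indexes without hand-written DDL. A minimal sketch, where the source reference and column choices are illustrative assumptions:

{{ config(
    materialized = 'table',
    indexes = [
        {'columns': ['_sdc_batched_at']},
        {'columns': ['_sdc_deleted_at']}
    ]
) }}

select * from {{ source('raw', 'orders') }}

Note this only covers tables dbt creates; the raw tables loaded by target-postgres still need manual DDL or a pre_hook, as discussed below.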
m
Can you give me an example of configuring indexes in target-postgres, or point me to a link? I couldn't find anything.
thanks for the reply by the way
v
I'm not sure you understood what I'm saying, maybe? Some options: 1. use dbt to create the indexes, or 2. add a feature to target-postgres to do it for you.
m
Oh okay, I'm already using dbt indexes. I thought you said we can define the indexes in the yaml file. Got it now.
t
FWIW I have a dbt macro that creates an index on the _sdc_batched_at column in a given table. In our "first level" dbt models (the ones that read the raw data populated by target-postgres) I call that macro using a pre_hook.
m
would you mind sharing it?
t
Sure. The macro is just
{% macro create_batched_at_index(tableName) %}
{% set sql %}
    CREATE INDEX IF NOT EXISTS {{ tableName|lower }}_batched_at ON <your-schema>."{{ tableName|lower }}"(_sdc_batched_at)
{% endset %}

{% do run_query(sql) %}
{% do log("Ensured existence of batched_at index on " ~ tableName, info=True) %}
{% endmacro %}
and then in a model that uses that table, include
{{ config(
    ...
    pre_hook = "{{ create_batched_at_index('<raw-table>') }}",
    ...
    )
}}
It amounts to dbt altering tables created by target-postgres, but... it does the job.
And importantly continues to do the job even if the raw tables are recreated, for some reason.
m
Thanks, I'll try this. It gave me an idea: define custom indices in the dbt source.yml and create them with this pre-hook method. I'm not sure if dbt allows additional arguments in source.yml, though.
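One way that idea could work, sketched under the assumption that dbt's meta property on source tables is enough for this (the source, table, key, and macro names here are hypothetical): record the desired index columns in meta in source.yml,

sources:
  - name: raw
    schema: raw_public
    tables:
      - name: orders
        meta:
          index_columns: ['_sdc_batched_at', '_sdc_deleted_at']

and have a macro read them back out of the graph context variable:

{% macro create_source_indexes(source_name, table_name) %}
{# skipped at parse time; only runs during execution #}
{% if execute %}
    {% for node in graph.sources.values() %}
        {% if node.source_name == source_name and node.name == table_name %}
            {% for col in node.meta.get('index_columns', []) %}
                {% set sql %}
                    CREATE INDEX IF NOT EXISTS {{ table_name }}_{{ col }}_idx
                    ON {{ node.schema }}."{{ table_name }}" ({{ col }})
                {% endset %}
                {% do run_query(sql) %}
            {% endfor %}
        {% endif %}
    {% endfor %}
{% endif %}
{% endmacro %}

The macro would then be called from a model's pre_hook just like create_batched_at_index above.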
This generated the log message twice for me. Is it running twice, or does dbt compile the macro at the start?
19:35:37  Running with dbt=1.5.1
19:35:40  Ensured existence of batched_at and deleted_at indexes on xyz
19:35:45  Found 208 models, 482 tests, 0 snapshots, 0 analyses, 654 macros, 0 operations, 5 seed files, 112 sources, 0 exposures, 0 metrics, 0 groups
19:35:45  
19:35:45  Concurrency: 4 threads (target='....')
19:35:45  
19:35:45  1 of 32 START sql view model --- .............. [RUN]
19:35:45  2 of 32 START sql view model --- ................. [RUN]
19:35:45  3 of 32 START sql view model --- .................. [RUN]
19:35:45  4 of 32 START sql view model xyz ............ [RUN]
19:35:46  Ensured existence of batched_at and deleted_at indexes on xyz
This fixed it. It seems dbt renders the pre_hook (and therefore the macro) once while parsing the project and again when the model actually runs, so guarding the body with the execute flag keeps the parse-time render from doing anything:
{% if execute %}
    {# execute is False while dbt parses the project, so the DDL and log only fire when the model runs #}
    {% do run_query(sql) %}
    {% do log("Ensured existence of batched_at and deleted_at indexes on " ~ schema_name ~ table_name, info=True) %}
{% endif %}