# getting-started
m
Not Meltano related, but I have questions about db indexes... • Can we define indexes on target tables using the Meltano config file? • Is there a way to carry over the same indexes from the source db when moving from Postgres to Postgres? • What's the best practice here? Do we just ignore indexing on the initial target tables, since they will get transformed into new tables anyway?
I realised my really slow queries were caused by the unindexed _sdc_batched_at and _sdc_deleted_at fields. I fixed it for now with a custom DDL, but I also want to define indexes for certain columns. Is there a good way to manage this type of requirement?
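A minimal sketch of the kind of one-off DDL this refers to, with placeholder schema and table names (raw_public and orders are assumptions, not taken from the thread):

-- index the Singer metadata columns the slow queries were filtering on
CREATE INDEX IF NOT EXISTS orders__sdc_batched_at_idx
    ON raw_public."orders" (_sdc_batched_at);
CREATE INDEX IF NOT EXISTS orders__sdc_deleted_at_idx
    ON raw_public."orders" (_sdc_deleted_at);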
v
Using dbt for indexes isn't bad. An argument can also be made for target-postgres creating indexes itself.
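For the dbt side of that suggestion: the dbt-postgres adapter supports an indexes config on models, so tables that dbt itself materializes can declare indexes without hand-written DDL. A minimal sketch, where the source reference and column choices are illustrative assumptions:

{{ config(
    materialized = 'table',
    indexes = [
        {'columns': ['_sdc_batched_at']},
        {'columns': ['_sdc_deleted_at']}
    ]
) }}

select * from {{ source('raw', 'orders') }}

Note this only covers tables dbt creates; the raw tables loaded by target-postgres still need manual DDL or a pre_hook, as discussed below.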
m
Can you give me an example of configuring indexes in target-postgres, or point me to a link? I couldn't find anything.
thanks for the reply by the way
v
I'm not sure you understood what I'm saying, maybe? Some options: 1. use dbt to create the indexes, or 2. add a feature to target-postgres to do it for you.
m
Oh okay, I'm already using dbt indexes. I thought you said we can define the indexes in the yaml file. Got it now.
t
FWIW I have a dbt macro that creates an index on the _sdc_batched_at column in a given table. In our "first level" dbt models (the ones that read the raw data populated by target-postgres) I call that macro using a pre_hook.
m
would you mind sharing it?
t
Sure. The macro is just
{% macro create_batched_at_index(tableName) %}
{% set sql %}
    CREATE INDEX IF NOT EXISTS {{ tableName|lower }}_batched_at ON <your-schema>."{{ tableName|lower }}"(_sdc_batched_at)
{% endset %}

{% do run_query(sql) %}
{% do log("Ensured existence of batched_at index on " ~ tableName, info=True) %}
{% endmacro %}
and then in a model that uses that table, include
{{ config(
    ...
    pre_hook = "{{ create_batched_at_index('<raw-table>') }}",
    ...
    )
}}
It amounts to dbt altering tables created by target-postgres, but... it does the job.
And importantly continues to do the job even if the raw tables are recreated, for some reason.
m
Thanks, I'll try this. It gave me an idea: define custom indices in the dbt source.yml and create them with this pre-hook method. I'm not sure if dbt allows additional arguments in source.yml, though.
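One way that idea could work, sketched under the assumption that dbt's meta property on source tables is enough for this (the source, table, key, and macro names here are hypothetical): record the desired index columns in meta in source.yml,

sources:
  - name: raw
    schema: raw_public
    tables:
      - name: orders
        meta:
          index_columns: ['_sdc_batched_at', '_sdc_deleted_at']

and have a macro read them back out of the graph context variable:

{% macro create_source_indexes(source_name, table_name) %}
{# skipped at parse time; only runs during execution #}
{% if execute %}
    {% for node in graph.sources.values() %}
        {% if node.source_name == source_name and node.name == table_name %}
            {% for col in node.meta.get('index_columns', []) %}
                {% set sql %}
                    CREATE INDEX IF NOT EXISTS {{ table_name }}_{{ col }}_idx
                    ON {{ node.schema }}."{{ table_name }}" ({{ col }})
                {% endset %}
                {% do run_query(sql) %}
            {% endfor %}
        {% endif %}
    {% endfor %}
{% endif %}
{% endmacro %}

The macro would then be called from a model's pre_hook just like create_batched_at_index above.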
This generated the log message twice for me. Is it running twice, or does dbt compile the macro at the start?
19:35:37  Running with dbt=1.5.1
19:35:40  Ensured existence of batched_at and deleted_at indexes on xyz
19:35:45  Found 208 models, 482 tests, 0 snapshots, 0 analyses, 654 macros, 0 operations, 5 seed files, 112 sources, 0 exposures, 0 metrics, 0 groups
19:35:45  
19:35:45  Concurrency: 4 threads (target='....')
19:35:45  
19:35:45  1 of 32 START sql view model --- .............. [RUN]
19:35:45  2 of 32 START sql view model --- ................. [RUN]
19:35:45  3 of 32 START sql view model --- .................. [RUN]
19:35:45  4 of 32 START sql view model xyz ............ [RUN]
19:35:46  Ensured existence of batched_at and deleted_at indexes on xyz
This fixed it. It seems dbt renders the pre_hook (and therefore the macro) once while parsing the project and again when the model actually runs, so guarding the body with the execute flag keeps the parse-time render from doing anything:
{% if execute %}
    {# execute is False while dbt parses the project, so the DDL and log only fire when the model runs #}
    {% do run_query(sql) %}
    {% do log("Ensured existence of batched_at and deleted_at indexes on " ~ schema_name ~ table_name, info=True) %}
{% endif %}