What s the suggested approach for handling transformations t Meltano #getting-started

What's the suggested approach for handling transfo...

Oren Teich

03/20/2025, 4:21 AM

What's the suggested approach for handling transformations that are beyond DBT's capabilities? I've got a postgres warehouse, loading CSV data in. I need to do fuzzy matching on company name (e.g. ACME Corp. vs Acme Co. vs Acme). It's brutal in SQL/DBT. Ideally I could write a python script to do it. I see that there are plugins that are 'strongly discouraged'. Is there a recommended approach? Worst case, I'll write a custom script that is outside the meltano pipeline, but was hoping for something that might be more integrated, for example to help deal with incremental updates.

Andy Carter

03/20/2025, 8:13 AM

Do you have an option to use an orchestrator that supports python ops like airflow or dagster?

Oren Teich

03/20/2025, 4:03 PM

@Andy Carter you bet - I can use anything. Wasn’t sure if meltano had something specifically recommended for this, or if dagster was the way to go. Happy to go that route.

visch

03/20/2025, 6:56 PM

I do this a lot with utilities in Meltano, I make a python script and just run it with a Meltano utility!

✅ 1

visch

03/20/2025, 6:58 PM

Also I'd try to give soundex a try in SQL, it tends to getg you pretty close for that kind of thing

visch

03/20/2025, 6:58 PM

but totally understand wanting better closeness functions!

🙌 1

Andy Carter

03/21/2025, 8:34 AM

@visch would be interested to see an example of how to run python script as a utility, is there a way to do it natively?

visch

03/21/2025, 1:09 PM

Here's one way to do it (there's more depending on what you're after as at the end of the day we're just running an executable with utilities)

Copy code

utilities:
  - name: autoidm-transform
    namespace: autoidm_transform
    pip_url: -e ./autoidm-transform
  - name: autoidm-ehsinsight_report
    namespace: autoidm_transform
    pip_url: -e ./autoidm-transform

In the project there's a folder called autoidm-transform that's a standard python project so

./autoidm-transform/pyproject.toml

Copy code

./autoidm-transform/autoidm_transform/script.py

pyproject.toml that looks like this

Copy code

[tool.poetry]
name = "autoidm-transform"
version = "0.1.0"
description = "Client Transformations"
authors = ["Derek Visch <dvisch@autoidm.com>"]

[tool.poetry.dependencies]
python = ">=3.9"
pandas = "1.5.0"
numpy = "1.26.4" # Pinned to prevent ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
sqlalchemy = "1.4.40"
psycopg2-binary = "2.9.3"

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry.scripts]
autoidm-transform = "autoidm_transform.desired_azuread:Transformation.run"

visch

03/21/2025, 1:10 PM

an even simpler way would be to not worry about a pyproject.toml file, put all your deps in

pip_url

and then add the exectuable key in your meltano.yml to point to your python file that has whatever you want to run

visch

03/21/2025, 1:12 PM

something like

Copy code

utilities:
  - name: autoidm-transform
    executable: $MELTANO_PROJECT_ROOT/script.py
    pip_url: sqlalchemy

🙌 1

Oren Teich

03/21/2025, 9:21 PM

Very cool! But I’m not sure how to get it to work.

Copy code

utilities:
  - name: dbt-postgres
    variant: dbt-labs
    pip_url: dbt-core dbt-postgres meltano-dbt-ext~=0.3.0
    config:
      host: localhost... etc
  - name: company-matcher
    executable: $MELTANO_PROJECT_ROOT/matcher.py
    pip_url: sqlalchemy python-dotenv

Copy code

$ meltano invoke company-matcher                                      
2025-03-21T21:19:31.159251Z [info     ] Environment 'dev' is active
Utility 'company-matcher' is not known to Meltano. Try running `meltano lock --update --all` to ensure your plugins are up to date.

$ meltano lock --update --all
Utility 'company-matcher' is not known to Meltano. Check <https://hub.meltano.com/> for available plugins.

Running the matcher.py by itself works ‘great’ (it’s a hello world to test how to invoke utilities’. I’m running meltano version 3.6.0

visch

03/25/2025, 3:43 PM

add a namespace

namespace: company_match

see if that does it for you or not

9 Views

Open in Slack

Previous Next