What's the suggested approach for handling transfo...
# getting-started
o
What's the suggested approach for handling transformations that are beyond DBT's capabilities? I've got a postgres warehouse, loading CSV data in. I need to do fuzzy matching on company name (e.g. ACME Corp. vs Acme Co. vs Acme). It's brutal in SQL/DBT. Ideally I could write a python script to do it. I see that there are plugins that are 'strongly discouraged'. Is there a recommended approach? Worst case, I'll write a custom script that is outside the meltano pipeline, but was hoping for something that might be more integrated, for example to help deal with incremental updates.
a
Do you have an option to use an orchestrator that supports python ops like airflow or dagster?
o
@Andy Carter you bet - I can use anything. Wasn’t sure if meltano had something specifically recommended for this, or if dagster was the way to go. Happy to go that route.
v
I do this a lot with utilities in Meltano, I make a python script and just run it with a Meltano utility!
1
Also I'd try to give soundex a try in SQL, it tends to getg you pretty close for that kind of thing
but totally understand wanting better closeness functions!
🙌 1
a
@visch would be interested to see an example of how to run python script as a utility, is there a way to do it natively?
v
Here's one way to do it (there's more depending on what you're after as at the end of the day we're just running an executable with utilities)
Copy code
utilities:
  - name: autoidm-transform
    namespace: autoidm_transform
    pip_url: -e ./autoidm-transform
  - name: autoidm-ehsinsight_report
    namespace: autoidm_transform
    pip_url: -e ./autoidm-transform
In the project there's a folder called autoidm-transform that's a standard python project so
./autoidm-transform/pyproject.toml
Copy code
./autoidm-transform/autoidm_transform/script.py
pyproject.toml that looks like this
Copy code
[tool.poetry]
name = "autoidm-transform"
version = "0.1.0"
description = "Client Transformations"
authors = ["Derek Visch <dvisch@autoidm.com>"]

[tool.poetry.dependencies]
python = ">=3.9"
pandas = "1.5.0"
numpy = "1.26.4" # Pinned to prevent ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
sqlalchemy = "1.4.40"
psycopg2-binary = "2.9.3"

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry.scripts]
autoidm-transform = "autoidm_transform.desired_azuread:Transformation.run"
an even simpler way would be to not worry about a pyproject.toml file, put all your deps in
pip_url
and then add the exectuable key in your meltano.yml to point to your python file that has whatever you want to run
something like
Copy code
utilities:
  - name: autoidm-transform
    executable: $MELTANO_PROJECT_ROOT/script.py
    pip_url: sqlalchemy
🙌 1
o
Very cool! But I’m not sure how to get it to work.
Copy code
utilities:
  - name: dbt-postgres
    variant: dbt-labs
    pip_url: dbt-core dbt-postgres meltano-dbt-ext~=0.3.0
    config:
      host: localhost... etc
  - name: company-matcher
    executable: $MELTANO_PROJECT_ROOT/matcher.py
    pip_url: sqlalchemy python-dotenv
Copy code
$ meltano invoke company-matcher                                      
2025-03-21T21:19:31.159251Z [info     ] Environment 'dev' is active
Utility 'company-matcher' is not known to Meltano. Try running `meltano lock --update --all` to ensure your plugins are up to date.

$ meltano lock --update --all
Utility 'company-matcher' is not known to Meltano. Check <https://hub.meltano.com/> for available plugins.
Running the matcher.py by itself works ‘great’ (it’s a hello world to test how to invoke utilities’. I’m running meltano version 3.6.0
v
add a namespace
namespace: company_match
see if that does it for you or not