# getting-started
s
Hello! I am new to Meltano and I’m working on a pipeline using tap-csv to ingest a CSV file and would like to load its data into multiple Postgres tables, depending on column mappings. Has anyone done this with Meltano before? Curious if you’d recommend using stream maps, custom plugins, or something like dbt for post-load splitting. Any best practices or patterns would be greatly appreciated!
v
Generally the recommendation would be to first pull the CSV file into the database with Meltano, then transform it using dbt (or just straight SQL, etc.) to put the data into the multiple tables you refer to.
It's the general idea behind ELT vs. ETL. A few reasons this approach is good:
1. When your logic changes for those downstream tables, you still have your source data.
2. The raw data can be reused for other purposes.
3. The way you pull the data into the database is less coupled to the downstream tables, so you can change the source (FTP -> SFTP -> S3, etc.) without having to worry about the downstream logic.
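As a concrete (and entirely illustrative) sketch of that pattern: suppose tap-csv lands the file as a single raw table `public.customers_raw`; the split into multiple tables is then just a couple of SELECTs, whether run as plain SQL or as dbt models. All table, schema, and column names below are made up for the example.

```sql
-- Illustrative sketch only: names are assumptions, not from this thread.
-- The raw CSV lands untouched in public.customers_raw; each downstream
-- table takes only the columns it needs.
CREATE SCHEMA IF NOT EXISTS analytics;

CREATE TABLE IF NOT EXISTS analytics.customers AS
SELECT id, first_name, last_name, email
FROM public.customers_raw;

CREATE TABLE IF NOT EXISTS analytics.customer_addresses AS
SELECT id AS customer_id, street, city, postal_code
FROM public.customers_raw;
```

In dbt each SELECT would live in its own model, which is what gives you points 1 to 3 above: the raw table is never modified, so the split logic can change or the source can move without touching it.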
s
Thanks Derek, that makes sense. I feel like I am maybe trying to extend stream maps too far, and some of this should be done in dbt instead. For example, this is a really basic example pipeline I have written for learning purposes:
```yaml
version: 1
default_environment: ${MELTANO_ENV}
project_id: 17a04890-5b7e-4473-b2ff-d4346303690a
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
    config:
      files:
      - entity: users
        path: ./fixtures/users.csv
        keys: [id]
      stream_maps:
        users:
          id: int(user_id)
          full_name: first_name + " " + last_name
          email: email_address
          signup_source: source
          is_active: bool(active.lower() == "true")
          __else__: __NULL__
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
    config:
      host: ${PG_HOST}
      port: ${PG_PORT}
      user: ${PG_USER}
      password: ${PG_PASSWORD}
      database: ${PG_DATABASE}
      default_target_schema: public
```
I am taking `first_name` and `last_name` from the CSV and concatenating them for the actual Postgres column. Am I reaching here, and is this something that should be done in dbt instead? Or is this indeed a good use case for stream maps?
Disregard ^ as I have figured it out. I was entirely oversimplifying.
m
A lot of targets support a stream name format of `<schema_name>-<table_name>` (schema name and table name separated by a hyphen). If your variant of target-postgres supports this (I believe the MeltanoLabs variant does), you might be able to do this entirely with stream maps.
Stream maps will require that you know all table names and mapping information ahead of time, though; dynamic mappings aren't supported (issue).
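A hedged sketch of what that could look like in meltano.yml, reusing the column expressions from the example above. It assumes two things worth verifying before relying on it: that your target really does split `<schema_name>-<table_name>` stream names on the hyphen, and that your tap's SDK version supports `__alias__`/`__source__` for renaming and cloning streams.

```yaml
plugins:
  extractors:
  - name: tap-csv
    config:
      stream_maps:
        # Rename the original stream so it lands in public.user_accounts
        # (assuming the target parses the hyphenated stream name as schema-table).
        users:
          __alias__: public-user_accounts
          id: int(user_id)
          full_name: first_name + " " + last_name
          email: email_address
          __else__: __NULL__
        # Clone the stream into a second one for public.user_signups.
        public-user_signups:
          __source__: users
          id: int(user_id)
          signup_source: source
          is_active: bool(active.lower() == "true")
          __else__: __NULL__
```

The caveat above still applies: every schema, table, and column mapping has to be spelled out statically like this; the routing can't be derived from the data itself.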
s
Thanks Matt! I was able to do it with stream maps, but it seemed more appropriate to place this logic in the dbt SQL file.
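For anyone finding this thread later, a minimal sketch of what such a dbt model might look like, assuming the raw stream lands as `public.users` and that a dbt source for it is declared in the project (the model name and source reference are illustrative, not from the thread):

```sql
-- models/stg_users.sql (hypothetical model name)
-- Same shaping as the stream_maps example, done after load instead of in flight.
SELECT
    CAST(user_id AS integer)        AS id,
    first_name || ' ' || last_name  AS full_name,
    email_address                   AS email,
    source                          AS signup_source,
    lower(active) = 'true'          AS is_active
FROM {{ source('public', 'users') }}
```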