# getting-started
s
Hello! I am new to Meltano and I’m working on a pipeline using tap-csv to ingest a CSV file and would like to load its data into multiple Postgres tables, depending on column mappings. Has anyone done this with Meltano before? Curious if you’d recommend using stream maps, custom plugins, or something like dbt for post-load splitting. Any best practices or patterns would be greatly appreciated!
v
Generally the recommendation would be to first pull the CSV file into the database with Meltano, then transform it using dbt (or just straight SQL, etc.) to put the data into the multiple tables you refer to.
It's the general idea behind ELT vs. ETL. A few reasons this approach is good:
1. When your logic changes for those downstream tables, you still have your source data.
2. The raw data can be reused for other purposes.
3. The way you pull the data into the database is less coupled to the downstream tables, so you can change the source (FTP -> SFTP -> S3, etc.) without having to worry about the downstream logic.
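As a concrete (and entirely illustrative) sketch of that pattern: suppose tap-csv lands the file as a single raw table `public.customers_raw`; the split into multiple tables is then just a couple of SELECTs, whether run as plain SQL or as dbt models. All table, schema, and column names below are made up for the example.

```sql
-- Illustrative sketch only: names are assumptions, not from this thread.
-- The raw CSV lands untouched in public.customers_raw; each downstream
-- table takes only the columns it needs.
CREATE SCHEMA IF NOT EXISTS analytics;

CREATE TABLE IF NOT EXISTS analytics.customers AS
SELECT id, first_name, last_name, email
FROM public.customers_raw;

CREATE TABLE IF NOT EXISTS analytics.customer_addresses AS
SELECT id AS customer_id, street, city, postal_code
FROM public.customers_raw;
```

In dbt each SELECT would live in its own model, which is what gives you points 1 to 3 above: the raw table is never modified, so the split logic can change or the source can move without touching it.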
s
Thanks Derek, that makes sense. I feel like I am maybe trying to extend stream maps too far, and some of this should be done in dbt instead. For example, this is a really basic example pipeline I have written for learning purposes:
```yaml
version: 1
default_environment: ${MELTANO_ENV}
project_id: 17a04890-5b7e-4473-b2ff-d4346303690a
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
    config:
      files:
      - entity: users
        path: ./fixtures/users.csv
        keys: [id]
      stream_maps:
        users:
          id: int(user_id)
          full_name: first_name + " " + last_name
          email: email_address
          signup_source: source
          is_active: bool(active.lower() == "true")
          __else__: __NULL__
  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
    config:
      host: ${PG_HOST}
      port: ${PG_PORT}
      user: ${PG_USER}
      password: ${PG_PASSWORD}
      database: ${PG_DATABASE}
      default_target_schema: public
```
I am taking `first_name` and `last_name` from the CSV and concatenating them for the actual Postgres column. Am I reaching here, and is this something that should be done in dbt instead? Or is this indeed a good use case for stream maps?
Disregard ^ as I have figured it out. I was entirely oversimplifying.
m
A lot of targets support a stream name format of `<schema_name>-<table_name>` (schema name and table name separated by a hyphen). If your variant of target-postgres supports this (I believe the MeltanoLabs variant does), you might be able to do this entirely with stream maps.
Stream maps will require that you know all table names and mapping information ahead of time, though; dynamic mappings aren't supported (issue).
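A hedged sketch of what that could look like in meltano.yml, reusing the column expressions from the example above. It assumes two things worth verifying before relying on it: that your target really does split `<schema_name>-<table_name>` stream names on the hyphen, and that your tap's SDK version supports `__alias__`/`__source__` for renaming and cloning streams.

```yaml
plugins:
  extractors:
  - name: tap-csv
    config:
      stream_maps:
        # Rename the original stream so it lands in public.user_accounts
        # (assuming the target parses the hyphenated stream name as schema-table).
        users:
          __alias__: public-user_accounts
          id: int(user_id)
          full_name: first_name + " " + last_name
          email: email_address
          __else__: __NULL__
        # Clone the stream into a second one for public.user_signups.
        public-user_signups:
          __source__: users
          id: int(user_id)
          signup_source: source
          is_active: bool(active.lower() == "true")
          __else__: __NULL__
```

The caveat above still applies: every schema, table, and column mapping has to be spelled out statically like this; the routing can't be derived from the data itself.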
s
Thanks Matt! I was able to do it with stream maps, but it seemed more appropriate to place this logic in the dbt SQL file.
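For anyone finding this thread later, a minimal sketch of what such a dbt model might look like, assuming the raw stream lands as `public.users` and that a dbt source for it is declared in the project (the model name and source reference are illustrative, not from the thread):

```sql
-- models/stg_users.sql (hypothetical model name)
-- Same shaping as the stream_maps example, done after load instead of in flight.
SELECT
    CAST(user_id AS integer)        AS id,
    first_name || ' ' || last_name  AS full_name,
    email_address                   AS email,
    source                          AS signup_source,
    lower(active) = 'true'          AS is_active
FROM {{ source('public', 'users') }}
```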