Jesse Neumann
02/22/2025, 11:42 PM

utilities:
  - name: airflow
    variant: apache
    pip_url: ...
  - name: llm-postgres
    namespace: llm-postgres
    commands:
      run:
        executable: python
        args: llm/main.py
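With the utility defined this way, the command can be smoke-tested on its own with meltano invoke llm-postgres:run before wiring it into a job.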
Then, in the llm folder at the root of the Meltano project (/llm/configs), you specify the requirements for the LLM call. Each config YAML specifies the required LLM arguments, such as the model details and the prompt template. The AI-generated JSON results are then saved in the specified destination table.
config:
  model: "deepseek-r1-distill-qwen-32b"
  api_key_env: "GROQ_API_KEY" # Name of environment variable with actual key
  base_url: "https://api.groq.com/openai/v1"
  messages:
    - role: "system"
      content: "You are an expert at crafting jokes given a subject's name."
    - role: "user"
      content: |
        Create a joke about {author}.
  parameters:
    temperature: 0.7 # Range: 0-2, default 1.0
    top_p: 0.9 # Range: 0-1, default 1.0
sources:
  - name: "authors"
    schema: "public"
    tables:
      - name: "authors"
        primary_key: "authors"
        input_mapping:
          - column: "authors"
            alias: "author"
destination:
  schema: "public"
  table: "jokes"
  columns: ["content", "explanation", "rating"]
load_strategy: "upsert" # append-only|upsert|overwrite
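To make the flow concrete, here is a minimal sketch of what the llm/main.py runner could look like for one such config. This is not the implementation from the thread: the helper names (load_config, render_messages, run), the openai and psycopg2 dependencies, the JSON-mode request, and the plain INSERT (rather than a true upsert honoring load_strategy) are all assumptions for illustration.

import json
import os

import psycopg2
import yaml
from openai import OpenAI


def load_config(path: str) -> dict:
    # Read one LLM call definition from llm/configs/*.yml (hypothetical layout).
    with open(path) as f:
        return yaml.safe_load(f)


def render_messages(messages: list[dict], row: dict) -> list[dict]:
    # Substitute source-row values (e.g. {author}) into the prompt template.
    return [{"role": m["role"], "content": m["content"].format(**row)} for m in messages]


def run(config: dict, conn) -> None:
    llm = config["config"]
    client = OpenAI(api_key=os.environ[llm["api_key_env"]], base_url=llm["base_url"])

    source = config["sources"][0]  # sketch handles a single source table
    table = source["tables"][0]
    mapping = table["input_mapping"][0]
    dest = config["destination"]

    with conn.cursor() as cur:
        cur.execute(f'SELECT {mapping["column"]} FROM {source["schema"]}.{table["name"]}')
        rows = cur.fetchall()

    for (value,) in rows:
        response = client.chat.completions.create(
            model=llm["model"],
            messages=render_messages(llm["messages"], {mapping["alias"]: value}),
            response_format={"type": "json_object"},  # assumes the model supports JSON mode
            **llm.get("parameters", {}),
        )
        result = json.loads(response.choices[0].message.content)

        # One row per source row; a real runner would honor load_strategy (upsert/overwrite).
        cols = dest["columns"]
        placeholders = ", ".join(["%s"] * len(cols))
        with conn.cursor() as cur:
            cur.execute(
                f'INSERT INTO {dest["schema"]}.{dest["table"]} ({", ".join(cols)}) '
                f"VALUES ({placeholders})",
                [result.get(c) for c in cols],
            )
    conn.commit()


if __name__ == "__main__":
    # Hypothetical: connection string and config path would come from env/CLI.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    run(load_config("llm/configs/jokes.yml"), conn)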
Now I can add the utility to the job, and it runs as part of the pipeline as desired:
jobs:
  - name: demo-pipeline
    tasks:
      - tap-github target-postgres dbt-postgres:run llm-postgres:run
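Running meltano run demo-pipeline then executes the extract/load step, the dbt models, and the LLM enrichment in sequence.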
This lets me keep data transformations and orchestration inside Meltano, which is great.