# singer-targets
v
followup on https://meltano.slack.com/archives/C04632W0HT2/p1674572371394779 from #C01QS0RV78D @thomas_briggs ๐Ÿงต
@thomas_briggs Here's what I have; I'm trying to share as much of this as I can. It's a generator I have. Note this is really, really crap code that breaks in all sorts of scenarios, which is why I didn't share it. It does just what I need, but it's not going to apply generally. There's an overlap with target-jinja in the cookiecutter section. generator.py
import os
from datetime import datetime
from distutils.dir_util import copy_tree, remove_tree

import click
from cookiecutter.main import cookiecutter


@click.command()
@click.argument('model_folder')
@click.argument('schema_name')
@click.argument('model_name')
@click.option('--extract_model_name')
@click.option('--extract_source_relation')
@click.option('--cookiecutter_folder', default='cookiecutter/')
@click.option('--output_folder', default='out/', help='Folder I will copy cookiecutter generated files to')
#@click.option('--temp_folder', default='/temp', help='Folder I will generate cookiecutters in')
def generate_models(model_folder, schema_name, model_name, cookiecutter_folder, output_folder, extract_model_name, extract_source_relation):
    click.echo(locals())
    model_name = model_name.upper()
    primary_keys = {}
    with open("powerschool_primary_keys.txt", "r") as f:
        # Read in each line split by tabs to a dictionary
        primary_keys = {line.split("\t")[1]: line.split("\t")[0] for line in f.read().splitlines()}
    extra = {
        "model_folder": model_folder,
        "schema_name": schema_name,
        "model_name": model_name,
        "output_folder": output_folder,
        "extract_model_name": extract_model_name,
        "extract_source_relation": extract_source_relation,
        "primary_key": primary_keys[model_name],
    }
    
    #Generate cookie cutter files
    cookiecutter_caller(cookiecutter_folder, extra)

    #Move them to output folder
    copy_tree(model_folder, output_folder + model_folder + "/")

    #Remove old cookie cutter files
    remove_tree(model_folder)

    #Move to DBT folder

    #datamart
    source_folder = output_folder+model_folder+"/"+schema_name
    db_name = "DataMart"
    source_file_name = f"/{db_name}__PowerSchool__"+model_name+".sql"
    output_folder = f"../meltano/transform/models/{db_name.lower()}/"+schema_name
    os.rename(source_folder + source_file_name, output_folder+source_file_name)

    #stage
    db_name = "Stage"
    source_file_name = f"/{db_name}__PowerSchool__"+model_name+".sql"
    output_folder = f"../meltano/transform/models/{db_name.lower()}/"+schema_name
    os.rename(source_folder + source_file_name, output_folder+source_file_name)

    #update sources.yml
    with open(f"../meltano/transform/models/stage/{schema_name}/sources.yml", "a") as sources_file:
        sources_file.write("\n      - name: PS_"+model_name)
    
    #TODO Meltano.yml

    print(f"Run this: python autoidm.py --tap_name=tap-powerschool --target_name=target-mssql --select_filter=\"PS-{model_name}\" --dbt_modelfilter=\"+DataMart__PowerSchool__{model_name}\"")


def cookiecutter_caller(cookiecutter_folder, extra):
    cookiecutter(
        cookiecutter_folder,
        extra_context=extra,
        no_input=True
    )

if __name__ == '__main__':
    generate_models()
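In case it helps anyone follow along: the `primary_keys` lookup near the top reads a tab-separated file of `primary_key<TAB>MODEL_NAME` pairs into a dict keyed by model name. A standalone sketch of just that parse (the sample file contents here are made up, not real PowerSchool data):

```python
import io

def load_primary_keys(f):
    # One "primary_key<TAB>MODEL_NAME" pair per line, keyed by model name.
    return {
        line.split("\t")[1]: line.split("\t")[0]
        for line in f.read().splitlines()
    }

# Hypothetical sample contents for powerschool_primary_keys.txt:
sample = io.StringIO("DCID\tSTUDENTS\nID\tSCHOOLS\n")
keys = load_primary_keys(sample)
# keys == {"STUDENTS": "DCID", "SCHOOLS": "ID"}
```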
Example cookie cutter file
with source as (
        
        {{ '{{' }} cleaned_source('{{cookiecutter.schema_name}}', 'PS_{{cookiecutter.model_name|upper}}') {{ '}}'}}    

), final AS (
	select 
	*
	from source
)
select * from final
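A quick way to see what the `{{ '{{' }}` escaping is doing: cookiecutter runs its own Jinja2 pass over the template, so dbt's `{{ ... }}` delimiters have to be emitted as string literals or cookiecutter would try to evaluate them itself. Here's a crude stdlib-only stand-in for that first pass, just to show the two-pass idea (the real render is Jinja2, and `cleaned_source` is my dbt macro, not anything standard):

```python
def cookiecutter_pass(template: str, ctx: dict) -> str:
    # Crude stand-in for cookiecutter's Jinja2 render (sketch only):
    # literal-brace escapes become real braces, and the
    # cookiecutter.* variables are substituted.
    out = template.replace("{{ '{{' }}", "{{").replace("{{ '}}' }}", "}}")
    out = out.replace("{{cookiecutter.schema_name}}", ctx["schema_name"])
    out = out.replace("{{cookiecutter.model_name|upper}}", ctx["model_name"].upper())
    return out

template = (
    "{{ '{{' }} cleaned_source('{{cookiecutter.schema_name}}', "
    "'PS_{{cookiecutter.model_name|upper}}') {{ '}}' }}"
)
rendered = cookiecutter_pass(
    template, {"schema_name": "powerschool", "model_name": "students"}
)
# rendered == "{{ cleaned_source('powerschool', 'PS_STUDENTS') }}"
```

So what lands in the output folder is a plain dbt model whose `{{ cleaned_source(...) }}` call dbt then evaluates in its own Jinja pass.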
Maybe your approach is the best as it's most "unix"-like: 1. a target-jinja (or it could be cookiecutter if you wanted; jinja may just be easier), 2. a target-dbt_source, 3. a target-meltano. And then combine them in different ways. Hmm
Then I'd want a target-gitlab-repo / target-github-repo, but really you'd normally do this in a GitLab CI file / GitHub Actions file, and you'd have an MR get auto pushed up
The thing that's not clear to me is where to pull out the pieces
t
Sorry @visch I don't understand that script... I don't know Python ๐Ÿคฃ ๐Ÿ˜›
j/k real comments after lunch ๐Ÿ˜‰
v
๐Ÿ˜„
t
So here's a thought: what if we built a target-meltano-cli? ๐Ÿคฏ It would execute a meltano CLI command per record, with the fields in the record specifying parameters of the command. You could use that to run a
meltano select tap <new_table>.*
and I could use it to run
meltano state remove <streamId>
... at least once the state command has a remove option, anyway. ๐Ÿ˜‰ You might also need a meltano utility to raise a PR to get the new/modified files checked in though.
I'm totally in Mad Scientist Mode here though so these thoughts are admittedly only half-baked.
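Half-baked or not, the record-to-CLI-command loop is small enough to sketch. Everything below is hypothetical — no such target exists, and the `command_template` with `{field}` placeholders is made up for illustration; it just shows a Singer-style loop that reads RECORD messages and shells out once per record:

```python
import json
import shlex
import subprocess

def build_command(template, record):
    # Fill {field} placeholders in the template from the record's fields,
    # e.g. "meltano state remove {stream_id}" + {"stream_id": "tap-foo"}.
    return shlex.split(template.format(**record))

def run_target(lines, command_template, dry_run=False):
    commands = []
    for line in lines:
        msg = json.loads(line)
        if msg.get("type") != "RECORD":
            continue  # this sketch ignores SCHEMA/STATE messages
        cmd = build_command(command_template, msg["record"])
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands

# Dry-run demo with fabricated Singer messages:
demo = [
    '{"type": "SCHEMA", "stream": "students", "schema": {}}',
    '{"type": "RECORD", "stream": "students", "record": {"stream_id": "tap-powerschool"}}',
]
cmds = run_target(demo, "meltano state remove {stream_id}", dry_run=True)
# cmds == [["meltano", "state", "remove", "tap-powerschool"]]
```

A real target would also need to emit STATE messages back and probably batch or rate-limit the commands, but the per-record dispatch is really just this.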