# Troubleshooting
e
I've opened up a GitHub issue with `datamill-co/target-postgres`, but wondering.. this thing looks about 2 years old with no changes.. are others on a better version of a Postgres target? https://github.com/datamill-co/target-postgres/issues/208
a
Have you checked out the variants here? https://hub.meltano.com/loaders/postgres
e
I did indeed see there are some other alternatives, and if no one shouted out for a specific one I was planning to switch to the Meltano variant
Holy smokes.. I switched variants in about a minute, and it's now running.. at least the basic query. Going to let that finish and then try the stuff that was failing with Papermill. I am going to demo this with someone tomorrow and just talk a bit about the culture/mindset shift from old monolithic data platforms to this type of flexible switching and changing things up
and I can see the meltano variant offers some nicer output too, very nice
Hmm, upon inspection I have a bit more work to do apparently.
All the string-based data is coming in as `NULL`. Oddly, the integer data seems okay. And the query datetimestamp.. that too is `NULL`.
Weird.. now the `query_start_time` is returning `NaT`.
Is this.. maybe how the variant works? Incrementally? And I need to just let it finish, perhaps?
I see the variant defaults to a sort of `COPY` / `UPDATE` / `INSERT` pattern.
I guess I've got a lot of learning to do in this area.. but it would be interesting to see how I can speed this up:
```
target-postgres | INFO Loading 27 rows into 'my_table'
target-postgres | INFO COPY my_table_temp ("col2", "col3", "col4", "col5", "col6", "col7", "col8") FROM STDIN WITH (FORMAT CSV, ESCAPE '\')
target-postgres | INFO UPDATE 27
target-postgres | INFO INSERT 0 0
```
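As I understand the log above, the target COPYs the batch into a temp table, UPDATEs rows whose key already exists in the real table, and INSERTs only the remainder. Here's my own plain-Python simulation of that merge logic (not the target's actual code; the key column and row counts are just illustrative):

```python
# Simulate the COPY-temp / UPDATE / INSERT load pattern from the log:
# rows whose key exists are updated, the rest are inserted.

def upsert(table: dict, temp_rows: list, key: str) -> tuple:
    """Merge temp_rows into table (a dict keyed by `key`); return (updated, inserted)."""
    updated = inserted = 0
    for row in temp_rows:
        if row[key] in table:       # existing key -> counts toward "UPDATE n"
            table[row[key]] = row
            updated += 1
        else:                       # new key -> counts toward "INSERT 0 n"
            table[row[key]] = row
            inserted += 1
    return updated, inserted

# If all 27 incoming rows already exist, the log reads UPDATE 27, INSERT 0 0:
existing = {i: {"contract_id": i, "col2": "old"} for i in range(27)}
batch = [{"contract_id": i, "col2": "new"} for i in range(27)]
print(upsert(existing, batch, "contract_id"))  # -> (27, 0)
```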
I think this INFO line might hold a clue to why my string-based data is not showing up. Maybe I need to change something in the column descriptions? Seeing it's `VARCHAR`.
`streams.py` looks like

```python
schema = th.PropertiesList(
    th.Property("col2", th.StringType),
    th.Property("col3", th.StringType),
    th.Property("col4", th.StringType),
    th.Property("col5", th.StringType),
    th.Property("col6", th.StringType),
    th.Property("contract_id", th.IntegerType),
    th.Property("col7", th.DateTimeType),
).to_dict()
```
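For reference, `to_dict()` on a `PropertiesList` emits plain JSON Schema, which the target uses to pick column types. This is a hand-written approximation of the rough shape (the exact nullability and formats may differ by SDK version, so treat it as a sketch, not the SDK's real output):

```python
import json

# Hand-written approximation of the JSON Schema a PropertiesList like the
# one above would emit. If record values don't match the declared types
# (e.g. a badly formatted string for a "date-time" field), a target may
# end up loading them as NULL.
schema = {
    "type": "object",
    "properties": {
        "col2": {"type": ["string", "null"]},
        "contract_id": {"type": ["integer", "null"]},
        "col7": {"type": ["string", "null"], "format": "date-time"},
    },
}
print(json.dumps(schema, indent=2))
```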
and my record looks like

```python
new_record = {
    "col2": python_string.lower(),
    "col3": python_string.lower(),
    "col4": python_string.lower(),
    "col5": python_string.lower(),
    "col6": python_string.lower(),
    "contract_id": integer_input,
    "col7": python_string_with_datetime,
}
```
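One thing worth checking, since `col7` is declared `DateTimeType` but comes back `NaT`: whether `python_string_with_datetime` is actually an RFC 3339 / ISO 8601 string. A quick stdlib sketch (the normalizing helper is my own, not part of any tap or target):

```python
from datetime import datetime, timezone

def as_rfc3339(value) -> str:
    """Normalize a datetime (or ISO-ish string) to an RFC 3339 string,
    the shape Singer "date-time" fields generally expect."""
    if isinstance(value, str):
        # fromisoformat rejects malformed strings early, instead of the
        # target silently loading NULL/NaT downstream.
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.isoformat()

print(as_rfc3339("2022-03-01 12:30:00"))  # -> 2022-03-01T12:30:00+00:00
print(as_rfc3339(datetime(2022, 3, 1)))   # -> 2022-03-01T00:00:00+00:00
```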
I used `schema: public` in the setup of the target... not sure whether I should drop that or..
Based on my quick research.. this could be my custom tap's fault? https://towardsdatascience.com/data-stacks-for-fun-nonprofit-part-iii-dcfd46da9f9f Basically, other taps set up some delimiter info in the config, like so:
```yaml
version: 1
send_anonymous_usage_stats: true
project_id: xxx
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: original
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    capabilities:
    - catalog
    - discover
    - state
    config:
      tables:
      - path: https://dataverse.harvard.edu/api/access/datafile/
        name: mit__house_elections
        pattern: "4202836"
        start_date: '2010-01-01T00:00:00Z'
        key_properties: []
        format: csv
        delimiter: "\t"
```
Perhaps I too need to learn how to set up my tap for pushing into Meltano's variant?
Switching now to try the transferwise variant.
Okay so.. with transferwise I get the strings to show up!
But wow, I have loads to learn about Postgres now, because to debug this.. AND Apache Superset.. to get everything running smoothly, I need to understand what schemas are for.
Got it, just add the schema to the query, `SELECT * FROM schema_name.table_name`, and it's working now.
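Postgres schemas are essentially namespaces for tables, and a qualified `schema_name.table_name` picks the table explicitly instead of relying on the session's `search_path`. The stdlib `sqlite3` module can illustrate the qualified-name idea (an ATTACH-ed database is only a loose stand-in for a Postgres schema; the names below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ATTACH creates a second named namespace, loosely like a Postgres schema.
conn.execute("ATTACH DATABASE ':memory:' AS analytics")
conn.execute("CREATE TABLE analytics.my_table (contract_id INTEGER, col2 TEXT)")
conn.execute("INSERT INTO analytics.my_table VALUES (1, 'abc')")

# Qualified name, like schema_name.table_name in Postgres:
rows = conn.execute("SELECT * FROM analytics.my_table").fetchall()
print(rows)  # -> [(1, 'abc')]
```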
I'm curious to learn why the Meltano variant did what it did, but for now I'll go with the transferwise variant and test with the larger query.. about 83,681 different unique combos of A-Z, from length 1-5.. but I think now I need to work on cleaning up the query to be more efficient.
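(For what it's worth, assuming "combos" means unordered selections without repetition, 83,681 is exactly the number of combinations of the 26 letters taken 1 to 5 at a time, which is quick to check:)

```python
from math import comb

# Unordered, no-repetition combinations of the 26 letters A-Z,
# for lengths 1 through 5: C(26,1) + C(26,2) + ... + C(26,5).
total = sum(comb(26, k) for k in range(1, 6))
print(total)  # -> 83681
```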
Update: so far the Postgres targets are doing this:
- datamill-co = works until you start to increase the amount of queries.. maybe I need to increase some buffer or resource, but the cursor collapses at some point and crashes the job
- meltano = works for small jobs, but the string data appears to get the `VARCHAR` type in Postgres... and the data just appears as `NULL`
- transferwise = still running the big big job (takes about 12 hours in a bare-bones Python script).. happy to say the string data and all gets populated, and I currently have the job running at 9 hours 20 minutes

I'm a noob at Postgres in general, so lots of this is just me learning how it works and the basics.. but transferwise looks to be the winner so far
a
Sorry, just getting caught up on this. Glad you got it sorted out! That's interesting feedback about the variants...I'll pass that on to the team. I'll look into the varchar/NULL issue for the Meltano variant.
e
@jo_pearson seems like a similar issue to the one we're facing