# Troubleshooting
e
I've opened up a GitHub issue with `datamill-co/target-postgres`, but wondering.. this thing looks about 2 years old with no changes.. are others on a better version of a Postgres target? https://github.com/datamill-co/target-postgres/issues/208
a
Have you checked out the variants here? https://hub.meltano.com/loaders/postgres
e
I did indeed see there are some other alternatives, and if no one shouted out for a specific one I was planning to switch to the Meltano variant
Holy smokes.. I switched variants in about a minute, and it's now running.. at least the basic query. Going to let that finish and then try the stuff that was failing with Papermill. I am going to demo this with someone tomorrow and just talk a bit about the culture/mindset shift from old monolithic data platforms to this type of flexible switching and changing things up
and I can see the meltano variant offers some nicer output too, very nice
Hmm, upon inspection I have a bit more work to do apparently.
All the string-based data is coming in as `NULL`. Oddly, the integer data seems okay. And the query datetimestamp.. that too is `NULL`.
Weird.. now the `query_start_time` is returning `NaT`.
Is this.. maybe how the variant works? Incrementally? And I need to just let it finish, perhaps?
I see the variant defaults to a sort of `COPY` / `UPDATE` / `INSERT` pattern.
I guess I've got a lot of learning to do in this area.. but it would be interesting to see how I can speed this up:
```
target-postgres | INFO Loading 27 rows into 'my_table'
target-postgres | INFO COPY my_table_temp ("col2", "col3", "col4", "col5", "col6", "col7", "col8") FROM STDIN WITH (FORMAT CSV, ESCAPE '\')
target-postgres | INFO UPDATE 27
target-postgres | INFO INSERT 0 0
```
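As I understand the log above, the target COPYs the batch into a temp table, UPDATEs rows whose key already exists in the real table, and INSERTs only the remainder. Here's my own plain-Python simulation of that merge logic (not the target's actual code; the key column and row counts are just illustrative):

```python
# Simulate the COPY-temp / UPDATE / INSERT load pattern from the log:
# rows whose key exists are updated, the rest are inserted.

def upsert(table: dict, temp_rows: list, key: str) -> tuple:
    """Merge temp_rows into table (a dict keyed by `key`); return (updated, inserted)."""
    updated = inserted = 0
    for row in temp_rows:
        if row[key] in table:       # existing key -> counts toward "UPDATE n"
            table[row[key]] = row
            updated += 1
        else:                       # new key -> counts toward "INSERT 0 n"
            table[row[key]] = row
            inserted += 1
    return updated, inserted

# If all 27 incoming rows already exist, the log reads UPDATE 27, INSERT 0 0:
existing = {i: {"contract_id": i, "col2": "old"} for i in range(27)}
batch = [{"contract_id": i, "col2": "new"} for i in range(27)]
print(upsert(existing, batch, "contract_id"))  # -> (27, 0)
```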
I think this INFO line might hold a clue to why my string-based data is not showing up. Maybe I need to change something in the column descriptions? Seeing it's `VARCHAR`.
`streams.py` looks like

```python
schema = th.PropertiesList(
    th.Property("col2", th.StringType),
    th.Property("col3", th.StringType),
    th.Property("col4", th.StringType),
    th.Property("col5", th.StringType),
    th.Property("col6", th.StringType),
    th.Property("contract_id", th.IntegerType),
    th.Property("col7", th.DateTimeType),
).to_dict()
```
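For reference, `to_dict()` on a `PropertiesList` emits plain JSON Schema, which the target uses to pick column types. This is a hand-written approximation of the rough shape (the exact nullability and formats may differ by SDK version, so treat it as a sketch, not the SDK's real output):

```python
import json

# Hand-written approximation of the JSON Schema a PropertiesList like the
# one above would emit. If record values don't match the declared types
# (e.g. a badly formatted string for a "date-time" field), a target may
# end up loading them as NULL.
schema = {
    "type": "object",
    "properties": {
        "col2": {"type": ["string", "null"]},
        "contract_id": {"type": ["integer", "null"]},
        "col7": {"type": ["string", "null"], "format": "date-time"},
    },
}
print(json.dumps(schema, indent=2))
```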
and my record looks like

```python
new_record = {
    "col2": python_string.lower(),
    "col3": python_string.lower(),
    "col4": python_string.lower(),
    "col5": python_string.lower(),
    "col6": python_string.lower(),
    "contract_id": integer_input,
    "col7": python_string_with_datetime,
}
```
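One thing worth checking, since `col7` is declared `DateTimeType` but comes back `NaT`: whether `python_string_with_datetime` is actually an RFC 3339 / ISO 8601 string. A quick stdlib sketch (the normalizing helper is my own, not part of any tap or target):

```python
from datetime import datetime, timezone

def as_rfc3339(value) -> str:
    """Normalize a datetime (or ISO-ish string) to an RFC 3339 string,
    the shape Singer "date-time" fields generally expect."""
    if isinstance(value, str):
        # fromisoformat rejects malformed strings early, instead of the
        # target silently loading NULL/NaT downstream.
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        value = value.replace(tzinfo=timezone.utc)
    return value.isoformat()

print(as_rfc3339("2022-03-01 12:30:00"))  # -> 2022-03-01T12:30:00+00:00
print(as_rfc3339(datetime(2022, 3, 1)))   # -> 2022-03-01T00:00:00+00:00
```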
I used `schema: public` in the setup of the target... not sure whether I should drop that or..
Based on my quick research.. this could be my custom tap's fault? https://towardsdatascience.com/data-stacks-for-fun-nonprofit-part-iii-dcfd46da9f9f Basically, other taps set up some delimiter info in the config, like so:
```yaml
version: 1
send_anonymous_usage_stats: true
project_id: xxx
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: original
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    capabilities:
    - catalog
    - discover
    - state
    config:
      tables:
      - path: https://dataverse.harvard.edu/api/access/datafile/
        name: mit__house_elections
        pattern: "4202836"
        start_date: '2010-01-01T00:00:00Z'
        key_properties: []
        format: csv
        delimiter: "\t"
```
Perhaps I too need to learn how to set up my tap for pushing into Meltano's variant?
Switching now to try the transferwise variant.
Okay so.. with transferwise I get the strings to show up!
But wow, I have loads to learn about Postgres now, because to debug this.. AND Apache Superset.. to get everything running smoothly, I need to understand what schemas are for.
Got it, just add the schema to the query, `SELECT * FROM schema_name.table_name`, and it's working now.
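Postgres schemas are essentially namespaces for tables, and a qualified `schema_name.table_name` picks the table explicitly instead of relying on the session's `search_path`. The stdlib `sqlite3` module can illustrate the qualified-name idea (an ATTACH-ed database is only a loose stand-in for a Postgres schema; the names below are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ATTACH creates a second named namespace, loosely like a Postgres schema.
conn.execute("ATTACH DATABASE ':memory:' AS analytics")
conn.execute("CREATE TABLE analytics.my_table (contract_id INTEGER, col2 TEXT)")
conn.execute("INSERT INTO analytics.my_table VALUES (1, 'abc')")

# Qualified name, like schema_name.table_name in Postgres:
rows = conn.execute("SELECT * FROM analytics.my_table").fetchall()
print(rows)  # -> [(1, 'abc')]
```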
I'm curious to learn why the Meltano variant did what it did, but for now I'll go with the transferwise variant and test with the larger query.. about 83,681 different unique combos of A-Z, from length 1-5.. but I think now I need to work on cleaning up the query to be more efficient.
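(For what it's worth, assuming "combos" means unordered selections without repetition, 83,681 is exactly the number of combinations of the 26 letters taken 1 to 5 at a time, which is quick to check:)

```python
from math import comb

# Unordered, no-repetition combinations of the 26 letters A-Z,
# for lengths 1 through 5: C(26,1) + C(26,2) + ... + C(26,5).
total = sum(comb(26, k) for k in range(1, 6))
print(total)  # -> 83681
```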
Update: so far the Postgres targets are doing this:
- datamill-co = works until you start to increase the amount of queries.. maybe I need to increase some buffer or resource, but the cursor collapses at some point and crashes the job
- meltano = works for small jobs, but the string data appears to get the `VARCHAR` type in Postgres... and the data just appears as `NULL`
- transferwise = still running the big big job (takes about 12 hours in a bare-bones Python script).. happy to say the string data and all gets populated, and I currently have the job running at 9 hours 20 minutes

I'm a noob at Postgres in general, so lots of this is just me learning how it works and the basics.. but transferwise looks to be the winner so far
a
Sorry, just getting caught up on this. Glad you got it sorted out! That's interesting feedback about the variants...I'll pass that on to the team. I'll look into the varchar/NULL issue for the Meltano variant.
e
@jo_pearson seems like a similar issue to the one we're facing