The spreadsheets-anywhere tap references a string schema override (Meltano #getting-started)

greg_vaslo

04/30/2022, 8:37 PM

The spreadsheets-anywhere tap references a string schema override in the link below. I get what the point is, but is there any more documentation, or are there examples of how it's used? I'm not able to set it up using just the directions on the page. I'm using the one that spreadsheets-anywhere already has built in, but it sounds like it would be really good to learn the built-in Meltano version so it could be used anywhere. Thanks! https://meltano.com/blog/now-available-meltano-v1-41-1/#What_else_is_new

edgar_ramirez_mondragon

05/01/2022, 4:48 AM

hi @greg_vaslo. What does your `meltano.yml` look like?

greg_vaslo

05/04/2022, 2:26 AM

Here's what I have for the tap. The piece that I commented out works, but even the author of the tap says to try to use the Meltano extra.

greg_vaslo

05/04/2022, 2:26 AM

version: 1
send_anonymous_usage_stats: true
project_id: 774e6b3c-feb6-48f8-b6f4-dcea9ccf3fc7
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
        - path: file:///mnt/c/Users/gxv383/Documents/postgresql_tables/ledger/extract
          name: accounts
          pattern: "accounts.csv"
          start_date: "2017-05-01T00:00:00Z"
          key_properties: [acct_number]
          format: csv
          # you will need to add a forced data type for text columns that appear to be numbers since SA forces it to a number and breaks target postgres
          # schema_overrides:
          #   acct_number:
          #     type: [string]
    schema:
      accounts:
        created_at:
          type: ["string", "null"]
          format: string

greg_vaslo

05/04/2022, 2:28 AM

I'm not quite sure what a tap_stream_id is, even after reading, probably because I'm really new to this. I tried a generic name, the tap name, and the table name there as well. Any thought as to why it won't force "acct_number" to a text string?

edgar_ramirez_mondragon

05/04/2022, 3:23 PM

Any thought as to why it won't force "acct_number" to a text string?

well, you're not currently overriding the schema for that field. What you're doing with `created_at` is the way to go. You can check what the produced catalog will look like with `meltano invoke --dump=catalog tap-spreadsheets-anywhere`. Those schema overrides should be applied there.
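
To make the nesting of the `schema` extra concrete, which also bears on the tap_stream_id question above, here is a minimal sketch assuming the table is still named `accounts`. The first key under `schema` is the stream's tap_stream_id; for this tap it appears to match the `name` of the table entry (the catalog dump later in this thread shows `"tap_stream_id": "accounts"`). The next key is the column, and its value is a JSON Schema fragment for that column. Only the `schema` part is shown; the other plugin keys are assumed unchanged from the config above.

```yaml
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    # variant, pip_url and config: assumed unchanged from the config above
    schema:
      accounts:          # tap_stream_id of the stream (here, the table `name`)
        acct_number:     # property (column) whose schema is overridden
          type: ["string", "null"]   # JSON Schema for that property: nullable text
```

After saving, the `accounts` stream in the `meltano invoke --dump=catalog tap-spreadsheets-anywhere` output should list `"string"` in the `type` of `acct_number`.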

greg_vaslo

05/04/2022, 6:18 PM

Sorry @edgar_ramirez_mondragon, are you saying I should dump the stream ID with "accounts"? What I'm showing here is throwing an error. I'll run the command you suggest, though I'm still new to how the YAML/JSON is parsed.

greg_vaslo

05/04/2022, 6:30 PM

Actually, sorry, it is not working for some reason. I guess I didn't save before I went back and made the change.

greg_vaslo

05/04/2022, 6:30 PM

Let me look at what the error is

greg_vaslo

05/04/2022, 6:37 PM

Yeah, it's still saying that acct_number is an integer, despite the schema change:

greg_vaslo

05/04/2022, 6:37 PM

{
      "tap_stream_id": "accounts",
      "key_properties": [
        "acct_number"
      ],
      "schema": {
        "properties": {
          "acct_number": {
            "type": [
              "null",
              "integer"
            ]

greg_vaslo

05/04/2022, 6:37 PM

And I see this below:

greg_vaslo

05/04/2022, 6:38 PM

"created_at": {
            "type": [
              "string",
              "null"
            ],
            "format": "string"
          }

edgar_ramirez_mondragon

05/04/2022, 7:11 PM

@greg_vaslo the `meltano.yml` above is missing the schema override for `acct_number`, I think:

extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
        - path: file:///mnt/c/Users/gxv383/Documents/postgresql_tables/ledger/extract
          name: accounts
          pattern: "accounts.csv"
          start_date: "2017-05-01T00:00:00Z"
          key_properties: [acct_number]
          format: csv
    schema:
      accounts:
        created_at:
          type: ["string", "null"]
          format: date-time
        acct_number:
          type: ["integer", "null"]

greg_vaslo

05/04/2022, 7:54 PM

Thanks - this did the trick

greg_vaslo

05/04/2022, 7:54 PM

plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
        - path: file:///mnt/c/Users/gxv383/Documents/postgresql_tables/ledger/extract
          name: accounts
          pattern: "accounts*.csv"
          start_date: "2017-05-01T00:00:00Z"
          key_properties: [acct_number]
          format: csv
          # you will need to add a forced data type for text columns that appear to be numbers since SA forces it to a number and breaks target postgres
          # schema_overrides:
          #   acct_number:
          #     type: [string]
    schema:
      accounts:
        created_at:
          type: ["string", "null"]
          format: string
        acct_number:
          type: ["string","null"]

edgar_ramirez_mondragon

05/04/2022, 7:55 PM

Nice!

greg_vaslo

05/04/2022, 7:55 PM

Can you tell me what "created_at" refers to? Why do you need both? I get now why you would need the acct_number one.

edgar_ramirez_mondragon

05/04/2022, 7:57 PM

oh, I see you copied `created_at` from the docs; I thought you'd added that yourself, lol. It's just an example, i.e. for a table that had a `created_at` field whose schema you wanted to override. You can remove it from your `meltano.yml`.
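
Applying that cleanup to the working config above gives roughly the following extractor definition. This is a sketch: every value is copied from greg_vaslo's final config, with the Slack link brackets stripped from `pip_url`. Dropping the commented-out `schema_overrides` lines as well is an assumption, on the basis that the `schema` extra now does that job.

```yaml
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
        - path: file:///mnt/c/Users/gxv383/Documents/postgresql_tables/ledger/extract
          name: accounts
          pattern: "accounts*.csv"
          start_date: "2017-05-01T00:00:00Z"
          key_properties: [acct_number]
          format: csv
    schema:
      accounts:
        # Force the account number to text so target-postgres
        # does not receive it as an integer.
        acct_number:
          type: ["string", "null"]
```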

greg_vaslo

05/04/2022, 7:58 PM

😂

greg_vaslo

05/04/2022, 7:58 PM

Ok let me clean that up, thank you again

edgar_ramirez_mondragon

05/04/2022, 7:59 PM

np!
