The spreadsheets-anywhere tap references a string ...
# getting-started
g
The spreadsheets-anywhere tap references a string schema override in the link below. I get what the point is, but is there any more documentation or are there examples of how it's used? I'm not able to set it up using just the directions on the page. I'm using the override that spreadsheets-anywhere already has built in, but it sounds like it would be really good to learn the built-in Meltano version so it could be used anywhere. Thanks! https://meltano.com/blog/now-available-meltano-v1-41-1/#What_else_is_new
e
Hi @greg_vaslo. What does your meltano.yml look like?
g
Here's what I have for the tap. The piece that I commented out works, but even the author of the tap says to try to use the Meltano extra instead:
version: 1
send_anonymous_usage_stats: true
project_id: 774e6b3c-feb6-48f8-b6f4-dcea9ccf3fc7
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
        - path: file:///mnt/c/Users/gxv383/Documents/postgresql_tables/ledger/extract
          name: accounts
          pattern: "accounts.csv"
          start_date: "2017-05-01T00:00:00Z"
          key_properties: [acct_number]
          format: csv
          # you will need to add a forced data type for text columns that appear to be numbers since SA forces it to a number and breaks target postgres
          # schema_overrides:
          #   acct_number:
          #     type: [string]
    schema:
      accounts:
        created_at:
          type: ["string", "null"]
          format: string
I'm not quite sure what a tap_stream_id is even after reading, probably because I'm really new to this. I tried a generic name, I tried the tap name there, and I tried the name of the table as well. Any thoughts as to why it won't force "acct_number" to a text string?
e
Any thought as to why it won't force "acct_number" to a text string?
Well, you're not currently overriding the schema for that field. What you're doing with created_at is the way to go. You can check what the produced catalog will look like with
meltano invoke --dump=catalog tap-spreadsheets-anywhere
Those schema overrides should be applied there.
g
Sorry @edgar_ramirez_mondragon, are you saying I should dump the stream id with “accounts”? What I am showing here is throwing an error. I'll run the command you suggest, though I'm still new to how the yaml/json is parsed.
Actually, sorry, it is not working for some reason. I guess I didn't save before I went back and made the change.
Let me look at what the error is.
Yeah, it's still saying that acct_number is an integer despite the schema change:
{
      "tap_stream_id": "accounts",
      "key_properties": [
        "acct_number"
      ],
      "schema": {
        "properties": {
          "acct_number": {
            "type": [
              "null",
              "integer"
            ]
And I see this below:
"created_at": {
            "type": [
              "string",
              "null"
            ],
            "format": "string"
          }
e
@greg_vaslo the meltano.yml above is missing the schema override for acct_number, I think:
extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
        - path: file:///mnt/c/Users/gxv383/Documents/postgresql_tables/ledger/extract
          name: accounts
          pattern: "accounts.csv"
          start_date: "2017-05-01T00:00:00Z"
          key_properties: [acct_number]
          format: csv
    schema:
      accounts:
        created_at:
          type: ["string", "null"]
          format: date-time
        acct_number:
          type: ["string", "null"]
g
Thanks - this did the trick
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
        - path: file:///mnt/c/Users/gxv383/Documents/postgresql_tables/ledger/extract
          name: accounts
          pattern: "accounts*.csv"
          start_date: "2017-05-01T00:00:00Z"
          key_properties: [acct_number]
          format: csv
          # you will need to add a forced data type for text columns that appear to be numbers since SA forces it to a number and breaks target postgres
          # schema_overrides:
          #   acct_number:
          #     type: [string]
    schema:
      accounts:
        created_at:
          type: ["string", "null"]
          format: string
        acct_number:
          type: ["string","null"]
e
Nice!
g
Can you tell me what "created_at" is referring to? Like, why do you need both? I get now why you would need the acct_number.
e
Oh, I see you copied created_at from the docs. I thought you'd added that yourself, lol. It's just an example, i.e. if the table had a created_at field whose schema you wanted to override. You can remove it from your meltano.yml.
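For reference, with the example created_at entry dropped, the extractor definition would presumably reduce to something like the sketch below. It only rearranges values already posted in this thread. The comments are added for illustration, and the assumption is that the key under schema must match the stream's tap_stream_id, which the catalog dump above shows as "accounts" (the table's name).

```yaml
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere
    variant: ets
    pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
    config:
      tables:
        - path: file:///mnt/c/Users/gxv383/Documents/postgresql_tables/ledger/extract
          name: accounts            # shows up as the tap_stream_id in the dumped catalog
          pattern: "accounts*.csv"
          start_date: "2017-05-01T00:00:00Z"
          key_properties: [acct_number]
          format: csv
    # Meltano schema extra: stream name -> property name -> JSON Schema override
    schema:
      accounts:
        acct_number:
          type: ["string", "null"]  # keep numeric-looking account numbers as text
```

Running meltano invoke --dump=catalog tap-spreadsheets-anywhere again should then show acct_number with the string type instead of integer.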
g
😂
Ok let me clean that up, thank you again
e
np!