david_tout (06/18/2021, 10:30 AM):
-1 is less than the minimum of 0

Failed validating 'minimum' in schema['properties']['properties']['additionalProperties']['properties']['maxLength']:
    {'minimum': 0, 'type': 'integer'}

On instance['properties']['project_desc']['maxLength']:
    -1
Here is a snippet from my catalog.json file:
{
  "streams": [
    {
      "selected": true,
      "tap_stream_id": "dbo-vwProjectRepl",
      "table_name": "vwProjectRepl",
      "schema": {
        "properties": {
          ...
          "project_desc": {
            "inclusion": "available",
            "maxLength": "8000",
            "type": [
              "null",
              "string"
            ]
          }
          ...
        },
        "type": "object"
      },
      "stream": "vwProjectRepl",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "selected": true,
            "replication-method": "INCREMENTAL",
            "replication-key": "updated_at",
            "selected-by-default": false,
            "database-name": "xxx",
            "is-view": true,
            "table-key-properties": []
          }
        },
        ...
        {
          "breadcrumb": [
            "properties",
            "project_desc"
          ],
          "metadata": {
            "selected-by-default": true,
            "sql-datatype": "nvarchar"
          }
        },
        ...
      ]
    }
  ]
}
Why is it not respecting my settings?

visch (06/18/2021, 12:26 PM):

visch (06/18/2021, 12:28 PM):
SCHEMA: Blah
Record: Blah
cat testdata | meltano invoke target
Becomes a lot easier imo to debug. Haven't been clever enough to think about what debugging Meltano could make easier, but there are lots of options, I think.
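A minimal sketch of visch's debugging approach: generate a hand-written SCHEMA plus RECORD stream that can be piped straight into the loader. The stream name and columns below are illustrative assumptions, not from the thread.

```python
import json

# Sketch: build a tiny Singer message stream for exercising a target in
# isolation. Save the output and pipe it in, e.g.:
#   python make_testdata.py > testdata
#   cat testdata | meltano invoke target-postgres
# Stream and column names here are made up for illustration.
messages = [
    {
        "type": "SCHEMA",
        "stream": "test_stream",
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "project_desc": {"type": ["null", "string"], "maxLength": 8000},
            },
        },
        "key_properties": ["id"],
    },
    {
        "type": "RECORD",
        "stream": "test_stream",
        "record": {"id": 1, "project_desc": "hello"},
    },
]

for message in messages:
    print(json.dumps(message))  # Singer targets read one JSON message per line
```

This makes it quick to reproduce a failure like the one above: put maxLength: -1 in the hand-written SCHEMA message and the target should raise the same validation error without involving the tap at all.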
visch (06/18/2021, 12:29 PM):

david_tout (06/20/2021, 1:54 AM):
It looks to me like the error from target-postgres is telling me that the value of maxLength violates a rule that says it cannot be lower than 0, and that at runtime it is seeing maxLength: -1, even though I have my catalogue configured to set it to 8000. It is certainly possible that my catalog is not configured correctly, though.
The source field is nvarchar(max). When I ran meltano invoke tap-mssql --discover > catalog.json, that particular field did originally have maxLength: -1 (which kind of makes sense, as nvarchar(max) doesn't really have a practical upper limit).
I am quite confident that it is using my catalogue file: the source database has plenty of tables, but I am only selecting three, and Meltano is respecting that selection. It is only importing the three tables that I expect.
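One possible band-aid while staying at the catalog level is to post-process the discovered catalog and strip the invalid value before running the pipeline. This is only a sketch under the thread's assumptions: the helper name is made up, and it simply drops maxLength: -1 rather than guessing a cap.

```python
import json

def strip_negative_max_length(node):
    """Recursively remove maxLength: -1 entries that JSON Schema rejects.

    Hypothetical helper, not part of Meltano or the tap: it walks a
    discovered Singer catalog structure and deletes the invalid value.
    """
    if isinstance(node, dict):
        if node.get("maxLength") == -1:
            del node["maxLength"]
        for value in node.values():
            strip_negative_max_length(value)
    elif isinstance(node, list):
        for item in node:
            strip_negative_max_length(item)
    return node

# Minimal stand-in for a discovered catalog.json with the offending field:
catalog = {
    "streams": [{
        "schema": {
            "properties": {
                "project_desc": {"type": ["null", "string"], "maxLength": -1}
            }
        }
    }]
}
strip_negative_max_length(catalog)
print(json.dumps(catalog))
```

In practice this would read catalog.json, rewrite it, and pass the cleaned file to the tap; the snippet only shows the transformation itself.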
david_tout (06/20/2021, 3:44 AM):
The error is coming from target-postgres, and not from what is being read from `tap-mssql`:

target-postgres | CRITICAL `schema` is an invalid JSON Schema instance:
{
  "type": "SCHEMA",
  "stream": "xxx-vwProjectRepl",
  "schema": {
    "properties": {
      ...
      "project_desc": {
        "inclusion": "available",
        "maxLength": -1,
        "type": [
          "null",
          "string"
        ]
      },
      ...
    },
    "type": "object"
  },
  "key_properties": []
}
-1 is less than the minimum of 0
Failed validating 'minimum' in schema['properties']['properties']['additionalProperties']['properties']['maxLength']:
    {
        "minimum": 0,
        "type": "integer"
    }
On instance['properties']['project_desc']['maxLength']: -1
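For reference, the rule being enforced here is just {'minimum': 0, 'type': 'integer'} applied to maxLength. A simplified re-creation of that check (not the actual Singer or target-postgres code):

```python
# Simplified re-creation of the JSON Schema rule the target enforces:
# maxLength must be an integer with a minimum of 0.
def validate_max_length(value):
    errors = []
    if not isinstance(value, int) or isinstance(value, bool):
        errors.append(f"{value!r} is not of type 'integer'")
    elif value < 0:
        errors.append(f"{value} is less than the minimum of 0")
    return errors

print(validate_max_length(-1))    # the discovered nvarchar(max) value fails
print(validate_max_length(8000))  # the intended value passes
```

Note that a quoted value such as "8000" (as it appears in the earlier catalog.json snippet) would also fail this check, since it is a string rather than an integer.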
I am not sure how to influence this schema definition. I've tried setting it in a catalog file, and with extras in meltano.yml while relying on the discovered catalogue instead. E.g. meltano.yml:
extractors:
- name: tap-mssql
  namespace: tap_mssql
  pip_url: -e extract/pipelinewise-tap-mssql
  executable: tap-mssql
  capabilities:
  - catalog
  - discover
  - state
  settings:
  - name: host
  - name: port
    kind: integer
  - name: user
  - name: password
    kind: password
  config:
    host: xxx.database.windows.net
    port: 1433
    user: xxx@xxx
    password: $TAP_MSSQL_PASSWORD
    database: xxx
  select:
  - xxx-vwProjectRepl.*
  schema:
    xxx-vwProjectRepl:
      project_desc:
        type:
        - 'null'
        - string
        maxLength: 8000
  metadata:
    xxx-vwProjectRepl:
      replication-method: INCREMENTAL
      replication-key: updated_at
      updated_at:
        is-replication-key: true
loaders:
- name: target-postgres
  variant: datamill-co
  pip_url: singer-target-postgres
  config:
    logging_level: DEBUG
    disable_collection: true
    postgres_host: xxx
    postgres_database: postgres
    postgres_username: postgres
    postgres_schema: $MELTANO_EXTRACT__LOAD_SCHEMA
    invalid_records_detect: false
    persist_empty_tables: true
david_tout (06/20/2021, 3:47 AM):
(The metadata and schema extras are not used if specifying a catalog file. I have tried it both ways - no luck with either method.)
david_tout (06/20/2021, 6:04 AM):

david_tout (06/20/2021, 6:05 AM):

visch (06/20/2021, 10:47 AM):

visch (06/20/2021, 10:48 AM):

visch (06/20/2021, 10:51 AM):

visch (06/20/2021, 10:54 AM):
david_tout (06/20/2021, 11:06 PM):
The problem columns were:
1. an nvarchar(max) column
2. a `real` column

This particular tap leverages the system catalog for determining the data types, and it made two errors:
1. nvarchar(max) columns have a "character_maximum_length" attribute of -1 (i.e. no limit). The tap takes this value verbatim and uses it to set the maxLength attribute in the catalog. However, the JSON Schema validation being applied by Singer says that this attribute must be a non-negative integer. My workaround was to detect the -1 for string types and just not set the maximum. The tap therefore now provides the string as-is, with no maxLength validation for nvarchar(max) and varchar(max) columns.
2. real was incorrectly categorised as an integer type. I moved it into the list of FLOAT_TYPES.
3. Even after treating real as a float type, there was another issue. A real is a single, but the catalogue converts it to number in the JSON Schema, which was resulting in a double when the schema is created at the Postgres end. The tap was also misconfigured: it uses the numeric_scale attribute or, in the case of a NULL, sets the number of decimal places (multipleOf) to 6. I had some lat/long values in decimal degrees, and with all the conversion malarkey they ended up with many more than 6 decimal places. I used the same workaround: if numeric_scale is null, then omit the multipleOf attribute.
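The fixes described in that message can be sketched as follows. The function names and the contents of FLOAT_TYPES are hypothetical stand-ins, not the actual pipelinewise-tap-mssql code (the message only says that FLOAT_TYPES is the list `real` was moved into).

```python
# Sketch of the tap-side fixes described above (names are illustrative).

# Fix 2: treat SQL Server `real` as a float type rather than an integer.
FLOAT_TYPES = {"float", "real"}

def string_schema(character_maximum_length):
    """Fix 1: omit maxLength when the system catalog reports -1, i.e. (max)."""
    schema = {"type": ["null", "string"]}
    if character_maximum_length is not None and character_maximum_length >= 0:
        schema["maxLength"] = character_maximum_length
    return schema

def number_schema(numeric_scale):
    """Fix 3: only emit multipleOf when a numeric scale is actually known."""
    schema = {"type": ["null", "number"]}
    if numeric_scale is not None:
        schema["multipleOf"] = 10 ** (0 - numeric_scale)
    return schema

print(string_schema(-1))    # nvarchar(max): no maxLength emitted
print(string_schema(8000))  # nvarchar(8000): bounded as usual
print(number_schema(None))  # real with NULL scale: no multipleOf emitted
```

The common thread in all three workarounds is the same: when the system catalog has no meaningful value (-1 or NULL), omit the JSON Schema constraint entirely instead of emitting an invalid or made-up one.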
david_tout (06/20/2021, 11:07 PM):

visch (06/21/2021, 1:10 PM):

visch (06/21/2021, 1:12 PM):

visch (06/21/2021, 1:12 PM):