jacob_mulligan
05/01/2023, 3:40 PM
tap-spreadsheets-anywhere tap. I'm pulling JSON files from S3, and each file contains 1 JSON object. From what I can tell, the tap requires each JSON file to contain an array.
Is there any way to configure the tap to work with JSON docs that contain just 1 object that's not wrapped in [...]?
pat_nadolny
05/01/2023, 4:05 PM
pat_nadolny
05/01/2023, 4:06 PM
Matt Menzenski
05/01/2023, 4:07 PM
Matt Menzenski
05/01/2023, 4:08 PM
pat_nadolny
05/01/2023, 4:09 PM
json?
Matt Menzenski
05/01/2023, 4:12 PM
jacob_mulligan
05/01/2023, 5:31 PM
Unable to write Catalog entry for 'ccda_documents' - it will be skipped due to error Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
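For context, this is the kind of error Python's standard json module raises when it is handed only the first line of a pretty-printed object. The following is a minimal sketch, assuming (not confirmed against the tap's source) that a JSONL-style reader decodes each line as its own document; parse_lines is a hypothetical helper, not the tap's code.

```python
import json


def parse_lines(text):
    """Decode each non-empty line as its own JSON document (JSONL style)."""
    lines = text.splitlines(keepends=True)
    return [json.loads(line) for line in lines if line.strip()]


# One object per line: every line is a complete document, so this works.
jsonl_text = '{"user_id": 1}\n{"user_id": 2}\n'
records = parse_lines(jsonl_text)

# One pretty-printed object spread over several lines: the first line is
# just "{", which is not a complete JSON document on its own.
pretty_text = '{\n  "ClinicalDocument": {"id": 1}\n}\n'
try:
    parse_lines(pretty_text)
    error = None
except json.JSONDecodeError as exc:
    error = exc
```

In this sketch the decoder runs out of input right after the opening brace and newline, which is consistent with the "line 2 column 1 (char 2)" position in the log.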
Matt Menzenski
05/01/2023, 5:32 PM
jacob_mulligan
05/01/2023, 5:32 PM
ERROR Unable to write Catalog entry for 'ccda_documents' - it will be skipped due to error 'str' object has no attribute 'items'
Matt Menzenski
05/01/2023, 5:33 PM
jacob_mulligan
05/01/2023, 5:33 PM
Matt Menzenski
05/01/2023, 5:33 PM
This error matches my experience trying to use JSON type, FWIW:
ERROR Unable to write Catalog entry for 'ccda_documents' - it will be skipped due to error 'str' object has no attribute 'items'
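One plausible explanation for this message, offered as a sketch and not verified against the tap's source: if a reader iterates over the parsed top-level value and expects each element to be a dict, a top-level object yields its keys (plain strings) instead of records. The iter_records helper below is hypothetical and only illustrates that Python behavior.

```python
import json


def iter_records(text):
    """Yield whatever iterating over the parsed top-level value yields."""
    root = json.loads(text)
    for obj in root:
        yield obj


array_doc = '[{"user_id": 1}, {"user_id": 2}]'
object_doc = '{"ClinicalDocument": {"id": 1}}'

rows_from_array = list(iter_records(array_doc))    # a list of dicts
rows_from_object = list(iter_records(object_doc))  # the object's keys, as str

# Treating a key as if it were a record fails the same way the log shows.
try:
    rows_from_object[0].items()
    message = None
except AttributeError as exc:
    message = str(exc)
```

With a top-level array the loop yields dicts, so calling .items() on each element works; with a single top-level object it does not.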
jacob_mulligan
05/01/2023, 5:34 PM
Matt Menzenski
05/01/2023, 5:34 PM
Matt Menzenski
05/01/2023, 5:35 PM
pattern match any JSON files that do have a containing array / more than one item?
jacob_mulligan
05/01/2023, 5:38 PM
jacob_mulligan
05/01/2023, 5:39 PM
{
  "+p_xml": "version=\"1.0\" ",
  "ClinicalDocument": {
    "+@xmlns": "urn:hl7-org:v3",
    "realmCode": {
      "+@code": "US"
    },
    "typeId": {
      "+@root": "2.16.840.1.abcdefg",
      "+@extension": "POCD_HD000040"
    },
    "templateId": [
      {
        "+@root": "2.16.840.abcdefg"
      },
      {
        "+@root": "2.16.840.1.abcdefg"
      }
    ],
    "id": {
      "+@root": "dce8b808-c812-11ed-redacted"
    },
    "code": {
      "+@code": "34133-9",
      "+@codeSystem": "2.16.840.1.113883.6.1",
      "+@codeSystemName": "LN",
      "+@displayName": "Summarization of Episode Note"
    },
}
The + prefixes represent XML properties. I wonder if the leading + causes any problems? Also, these files are ~10-20k lines long.
Matt Menzenski
05/01/2023, 5:41 PM
jacob_mulligan
05/01/2023, 5:42 PM
jacob_mulligan
05/01/2023, 5:44 PM
Matt Menzenski
05/01/2023, 5:45 PM
Matt Menzenski
05/01/2023, 5:45 PM
Matt Menzenski
05/01/2023, 5:46 PM
jacob_mulligan
05/01/2023, 5:48 PM
jacob_mulligan
05/01/2023, 5:50 PM
jacob_mulligan
05/01/2023, 5:50 PM
Matt Menzenski
05/01/2023, 5:50 PM
jsonl support, if that’s useful - you might start by adding test cases that have newlines in the JSON objects and seeing what error is thrown. https://github.com/ets/tap-spreadsheets-anywhere/pull/28/files
jacob_mulligan
05/01/2023, 5:51 PM
jacob_mulligan
05/01/2023, 5:51 PM
Matt Menzenski
05/01/2023, 5:51 PM
for obj in root_iterator: is essentially equal to for line in file: I believe
pat_nadolny
05/01/2023, 5:55 PM
{"user_id": 1, "first_name": "John", "last_name": "Doe"}
{"user_id": 2, "first_name": "Sarah", "last_name": "Smith"}
{"user_id": 3, "first_name": "Joe", "last_name": "Momma"}
{"user_id": 4, "first_name": "Steve", "last_name": "Madden"}
with my meltano.yml
- name: tap-spreadsheets-anywhere
  variant: ets
  pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
  config:
    tables:
    - path: s3://<bucket>
      format: jsonl
      key_properties: [user_id]
      name: user_names
      start_date: '2020-01-01T00:00:00Z'
      pattern: "spreadsheets_test/user_names\\.json"
or as
- name: tap-spreadsheets-anywhere
  variant: ets
  pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
  config:
    tables:
    - path: s3://<bucket>
      format: json
      key_properties: [user_id]
      name: user_names
      start_date: '2020-01-01T00:00:00Z'
      pattern: "spreadsheets_test/user_names\\.json"
Matt Menzenski
05/01/2023, 5:56 PM
pat_nadolny
05/01/2023, 5:56 PM
pat_nadolny
05/01/2023, 5:57 PM
json as expected also
[
  {"user_id": 1, "first_name": "John", "last_name": "Doe"},
  {"user_id": 2, "first_name": "Sarah", "last_name": "Smith"},
  {"user_id": 3, "first_name": "Joe", "last_name": "Momma"},
  {"user_id": 4, "first_name": "Steve", "last_name": "Madden"}
]
Matt Menzenski
05/01/2023, 5:57 PM
Matt Menzenski
05/01/2023, 5:58 PM
JSON will error
pat_nadolny
05/01/2023, 5:59 PM
jacob_mulligan
05/02/2023, 3:27 PM
[..] to create a list of 1, which we already know the package handles today
jacob_mulligan
05/02/2023, 3:28 PM
jacob_mulligan
05/02/2023, 3:28 PM
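The workaround mentioned above, wrapping a single object in [..] so it becomes a list of 1, could be applied as a preprocessing step before the tap reads the files. A minimal sketch in Python, assuming each file fits in memory (files of ~10-20k lines should); as_record_list and wrap_json_text are hypothetical helper names and are not part of tap-spreadsheets-anywhere.

```python
import json


def as_record_list(text):
    """Parse JSON text and return a list of records.

    A top-level array is returned unchanged; a single top-level object is
    wrapped in a one-element list.
    """
    root = json.loads(text)
    if isinstance(root, list):
        return root
    if isinstance(root, dict):
        return [root]
    raise ValueError(f"unsupported top-level JSON type: {type(root).__name__}")


def wrap_json_text(text):
    """Re-serialize the document so its top level is always an array."""
    return json.dumps(as_record_list(text))


single = '{"ClinicalDocument": {"id": {"+@root": "abc"}}}'
wrapped = wrap_json_text(single)
```

The wrapped text can then be written back to storage under a key the table's pattern matches, so the json format sees an array as in the working example earlier in the thread.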