jobert_abma
08/17/2021, 12:22 AM
With --discover, I see the properties of the schema. However, when I run the tap directly, I’m only getting empty records. The schema in the tap output looks like this (please ignore the xmlStream name, I’ll rename this later):
{
  "type": "SCHEMA",
  "stream": "xmlStream",
  "schema": {
    "properties": {},
    "type": "object"
  },
  "key_properties": [
    "@ID"
  ]
}
Here’s the skeleton of the stream:
class xmlStream(Stream):
    name = "xmlStream"
    primary_keys = ["@ID"]
    schema = th.PropertiesList(
        th.Property("@ID", th.StringType),
        th.Property("@Name", th.StringType),
    ).to_dict()

    def get_records(self, context: Optional[dict]) -> Iterable[dict]:
        print(self.schema)
        data = open("/Users/jobert/Downloads/cwec_v4.5.xml", "r").read()
        cwes = xmltodict.parse(data, process_namespaces=False)['Weakness_Catalog']['Weaknesses']['Weakness']
        for cwe in cwes:
            yield cwe
Does anyone have a clue why I’m not seeing the properties in the schema output? Here’s what the records look like:
{"type": "RECORD", "stream": "xmlStream", "record": {}, "time_extracted": "2021-08-17T00:17:20.100213Z"}
{"type": "RECORD", "stream": "xmlStream", "record": {}, "time_extracted": "2021-08-17T00:17:20.100676Z"}
{"type": "RECORD", "stream": "xmlStream", "record": {}, "time_extracted": "2021-08-17T00:17:20.100906Z"}
{"type": "RECORD", "stream": "xmlStream", "record": {}, "time_extracted": "2021-08-17T00:17:20.101292Z"}