# troubleshooting
f
Hi everyone! 😄 I need to access an endpoint from a Databricks API which, when accessed, downloads some cost logs as a CSV file. Is there a specific configuration I need for this download to work in my custom tap? How should I specify my schema? Is there any way I can transform this data to JSON? I tried with th.PropertiesList as follows:
```python
schema = th.PropertiesList(
    th.Property("workspaceId", th.StringType),
    th.Property("timestamp", th.StringType),
    th.Property("clusterId", th.StringType),
    th.Property("clusterName", th.StringType),
    th.Property("clusterNodeType", th.StringType),
    th.Property("clusterOwnerUserId", th.StringType),
    th.Property("clusterCustomTags", th.StringType),
    th.Property("sku", th.StringType),
    th.Property("dbus", th.StringType),
    th.Property("machineHours", th.StringType),
    th.Property("clusterOwnerUserName", th.StringType),
    th.Property("tags", th.StringType),
).to_dict()
```
But I end up getting the following error:
```
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
Here is the link to the endpoint: https://docs.databricks.com/api/account/billableusage/download
e
Hi Felipe! You probably need to override `parse_response` in the stream class for that endpoint with something like:
```python
import csv

class BillableUsage(DatabricksStream):
    def parse_response(self, response):
        csv_lines = response.text.splitlines()
        csv_reader = csv.DictReader(csv_lines)
        yield from csv_reader
```
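As an aside (with inline sample data, not the real endpoint response): `csv.DictReader` yields one dict per data row, keyed by the header line, which is exactly the record shape a Singer stream should emit:

```python
import csv

# Hypothetical two-column sample standing in for the billable usage CSV.
csv_text = "workspaceId,sku\n123,STANDARD_ALL_PURPOSE_COMPUTE\n"

# DictReader uses the first line as field names and maps each row onto them.
rows = list(csv.DictReader(csv_text.splitlines()))
# -> [{'workspaceId': '123', 'sku': 'STANDARD_ALL_PURPOSE_COMPUTE'}]
```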
f
Hi, @edgar_ramirez_mondragon! Thanks for your support. You are right. I made that modification, and this time a lot of data is going to S3 (because I'm using target-s3). But at the end the same error happens. Any idea?
e
What's the command you're using to load data?
f
meltano run tap-databricksbilling target-s3
Just above the error message, I have this: [info ] 2023-07-21 14:18:01,912 Target 'target-s3' completed reading 1492 lines of input (1491 records, 0 batch manifests, 0 state messages).
e
Ok. So the error implies there's a line with invalid JSON somewhere. Without more context, I'd try to push the tap data to an intermediate location:
```shell
meltano invoke tap-databricksbilling > databricks.singer.jsonl
```
And inspect the file contents looking for a line that's not valid JSON (maybe you're `print`ing somewhere in your tap?)
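If manual inspection gets tedious, a small helper like this (hypothetical, not part of the SDK or Meltano) can flag the offending lines:

```python
import json

def find_invalid_lines(lines):
    """Return (line_number, line) pairs that fail to parse as JSON."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append((lineno, line))
    return bad

# e.g.:
# with open("databricks.singer.jsonl") as f:
#     print(find_invalid_lines(f))
```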
f
I'm getting the same error when executing this command 😐 But the file was generated, and apparently I don't have any invalid line
u
Hmm, ok, so the error is in the tap. How about:
```shell
meltano invoke tap-databricksbilling --discover > catalog.json
```
f
This command works without errors. How can I use that? I tried to use this schema in my stream, but I get:
```
2023-07-21T17:42:53.279677Z [info     ]     properties_dict = self.schema["properties"] cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
2023-07-21T17:42:53.279780Z [info     ] KeyError: 'properties'         cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
```
u
It's not really that you should use that file, rather we were just confirming that error occurred during sync mode rather than discovery. Does your tap have other streams apart from the billable usage one?
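For reference, the `KeyError: 'properties'` is consistent with passing a catalog document where a bare JSON schema is expected: in a catalog, each stream's schema sits one level down. A minimal hypothetical catalog shaped like the `--discover` output:

```python
# Hypothetical minimal catalog, shaped like `meltano invoke ... --discover` output.
catalog = {
    "streams": [
        {
            "tap_stream_id": "billable_usage",
            "schema": {
                "type": "object",
                "properties": {"workspaceId": {"type": "string"}},
            },
        }
    ]
}

# A consumer expecting a bare schema dict looks for a top-level "properties"
# key, which a catalog document doesn't have -- hence the KeyError.
assert "properties" not in catalog

# The actual JSON schema lives one level down, per stream:
schema = catalog["streams"][0]["schema"]
assert "properties" in schema
```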
f
Hm, ok. No, only that stream. On discovery, despite the error, the file was generated and apparently I do not have any invalid line.
e
Ok, so perhaps you can inspect `.meltano/run/tap-databricksbilling/tap.properties.json` to see if it's valid JSON
f
Yes, it appears to be correct...
The problem was the pagination. I fixed it and now it's working. Thanks for your help.
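The actual pagination fix isn't shown in the thread, but as a sketch of the idea: when an endpoint returns the complete CSV in a single response, the stream shouldn't request further pages, which with the singer-sdk `get_next_page_token` hook (an assumption about the tap's base class; `DatabricksStream` is stubbed here) amounts to always returning `None`:

```python
class DatabricksStream:
    """Stub standing in for the tap's real RESTStream subclass."""

class BillableUsage(DatabricksStream):
    def get_next_page_token(self, response, previous_token):
        # The download endpoint returns the full CSV at once: no next page,
        # so the SDK stops requesting after the first response.
        return None
```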