# troubleshooting
f
Hi everyone! 😄 I need to access an endpoint from a Databricks API which, when accessed, downloads some cost logs as a CSV file. Is there a specific configuration I need for this download to work in my custom tap? How should I specify my schema? Is there any way I can transform this data to JSON? I tried with th.PropertiesList as follows:
```python
schema = th.PropertiesList(
    th.Property("workspaceId", th.StringType),
    th.Property("timestamp", th.StringType),
    th.Property("clusterId", th.StringType),
    th.Property("clusterName", th.StringType),
    th.Property("clusterNodeType", th.StringType),
    th.Property("clusterOwnerUserId", th.StringType),
    th.Property("clusterCustomTags", th.StringType),
    th.Property("sku", th.StringType),
    th.Property("dbus", th.StringType),
    th.Property("machineHours", th.StringType),
    th.Property("clusterOwnerUserName", th.StringType),
    th.Property("tags", th.StringType),
).to_dict()
```
But I end up getting the following error:
```
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
Here is the link to the endpoint: https://docs.databricks.com/api/account/billableusage/download
e
Hi Felipe! You probably need to override `parse_response` in the stream class for that endpoint with something like:
```python
import csv

class BillableUsage(DatabricksStream):
    def parse_response(self, response):
        csv_lines = response.text.splitlines()
        csv_reader = csv.DictReader(csv_lines)
        yield from csv_reader
```
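As an aside (with inline sample data, not the real endpoint response): `csv.DictReader` yields one dict per data row, keyed by the header line, which is exactly the record shape a Singer stream should emit:

```python
import csv

# Hypothetical two-column sample standing in for the billable usage CSV.
csv_text = "workspaceId,sku\n123,STANDARD_ALL_PURPOSE_COMPUTE\n"

# DictReader uses the first line as field names and maps each row onto them.
rows = list(csv.DictReader(csv_text.splitlines()))
# -> [{'workspaceId': '123', 'sku': 'STANDARD_ALL_PURPOSE_COMPUTE'}]
```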
f
Hi, @edgar_ramirez_mondragon! Thanks for your support. You are right. I made that modification, and this time a lot of data is going to S3 (because I'm using target-s3). But at the end the same error happens. Any idea?
e
What's the command you're using to load data?
f
meltano run tap-databricksbilling target-s3
Just above the error message, I have this: [info ] 2023-07-21 14:18:01,912 Target 'target-s3' completed reading 1492 lines of input (1491 records, 0 batch manifests, 0 state messages).
e
Ok. So the error implies there's a line with invalid JSON somewhere. Without more context, I'd try to push the tap data to an intermediate location:
```shell
meltano invoke tap-databricksbilling > databricks.singer.jsonl
```
And inspect the file contents looking for a line that's not valid JSON (maybe you're `print`ing somewhere in your tap?)
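If manual inspection gets tedious, a small helper like this (hypothetical, not part of the SDK or Meltano) can flag the offending lines:

```python
import json

def find_invalid_lines(lines):
    """Return (line_number, line) pairs that fail to parse as JSON."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append((lineno, line))
    return bad

# e.g.:
# with open("databricks.singer.jsonl") as f:
#     print(find_invalid_lines(f))
```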
f
I'm getting the same error when executing this command 😐 But the file was generated, and apparently I don't have any invalid line
u
Hmm, ok, so the error is in the tap. How about:
```shell
meltano invoke tap-databricksbilling --discover > catalog.json
```
f
This command works without errors. How can I use that? I tried to use this schema in my stream, but I get:
```
2023-07-21T17:42:53.279677Z [info     ]     properties_dict = self.schema["properties"] cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
2023-07-21T17:42:53.279780Z [info     ] KeyError: 'properties'         cmd_type=elb consumer=True name=target-s3 producer=False stdio=stderr string_id=target-s3
```
u
It's not really that you should use that file, rather we were just confirming that error occurred during sync mode rather than discovery. Does your tap have other streams apart from the billable usage one?
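For reference, the `KeyError: 'properties'` is consistent with passing a catalog document where a bare JSON schema is expected: in a catalog, each stream's schema sits one level down. A minimal hypothetical catalog shaped like the `--discover` output:

```python
# Hypothetical minimal catalog, shaped like `meltano invoke ... --discover` output.
catalog = {
    "streams": [
        {
            "tap_stream_id": "billable_usage",
            "schema": {
                "type": "object",
                "properties": {"workspaceId": {"type": "string"}},
            },
        }
    ]
}

# A consumer expecting a bare schema dict looks for a top-level "properties"
# key, which a catalog document doesn't have -- hence the KeyError.
assert "properties" not in catalog

# The actual JSON schema lives one level down, per stream:
schema = catalog["streams"][0]["schema"]
assert "properties" in schema
```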
f
Hm, ok. No, only that stream. On discovery, despite the error, the file was generated and apparently I do not have any invalid line.
e
Ok, so perhaps you can inspect `.meltano/run/tap-databricksbilling/tap.properties.json` to see if it's valid JSON
f
Yes, it appears to be correct...
The problem was the pagination. I fixed it and now it's working. Thanks for your help.
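The actual pagination fix isn't shown in the thread, but as a sketch of the idea: when an endpoint returns the complete CSV in a single response, the stream shouldn't request further pages, which with the singer-sdk `get_next_page_token` hook (an assumption about the tap's base class; `DatabricksStream` is stubbed here) amounts to always returning `None`:

```python
class DatabricksStream:
    """Stub standing in for the tap's real RESTStream subclass."""

class BillableUsage(DatabricksStream):
    def get_next_page_token(self, response, previous_token):
        # The download endpoint returns the full CSV at once: no next page,
        # so the SDK stops requesting after the first response.
        return None
```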