# getting-started
k
Hey guys, I'm having a hard time creating a tap that pulls data from an API and loads it into a Postgres DB. Can anyone help?
r
Can you be more specific?
k
Hey @Reuben (Matatika), I'm trying to create a custom extractor to fetch data from the SpaceX API and eventually load it into Postgres using the target-postgres loader
r
I meant about the custom tap you are creating. What errors are you running into?
k
@Reuben (Matatika) I am new to Meltano so please excuse me if I miss something.
1. I scaffolded a custom extractor using the cookiecutter template and set it up for a REST API.
2. I changed the API URL in `url_base` to point to the SpaceX API (https://api.spacexdata.com/v5/launches).
3. I configured a target-postgres loader, giving it just the username, database, password and port.
I can recreate the error and share it with you, but do you think I am headed in the right direction? Or if you could give me the step-wise approach I should be following instead, that would mean the world.
r
Doesn't sound like you have done anything wrong so far, but you will need to define streams to pull data from the specific SpaceX API endpoints you are interested in (make sure you initialise them as well).
From the sounds of it, you've set a base URL but haven't defined a JSONPath expression to collect "launch" records from the JSON response body, or perhaps you need to define authentication to access the API?
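To illustrate what the JSONPath expression does (with a hypothetical, trimmed-down sample - the real endpoint returns much larger launch objects): the SpaceX v5 `/launches` endpoint appears to return a top-level JSON array, so a `records_jsonpath` of `"$[*]"` would select each array element as one record, equivalent to iterating the parsed list directly:

```python
import json

# Hypothetical, trimmed-down response body shaped like the SpaceX v5
# /launches endpoint: a top-level JSON array of launch objects.
body = '[{"flight_number": 1, "name": "FalconSat"}, {"flight_number": 2, "name": "DemoSat"}]'

# records_jsonpath = "$[*]" tells the SDK "each element of the top-level
# array is one record" -- the same as iterating the parsed list:
records = json.loads(body)
for record in records:
    print(record["flight_number"], record["name"])
```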
The general pattern I suggest following would be to have a top-level `SpaceXStream` class inheriting from `RESTStream`, and then stream classes inheriting from `SpaceXStream` for specific data (usually targeting one endpoint). That way, you can define logic common across the SpaceX API in the client stream class (e.g. base URL, authentication, pagination), and then define and/or override logic as necessary in each data stream. Incomplete example:
`client.py`:
```python
from singer_sdk.streams import RESTStream

class SpaceXStream(RESTStream):
    url_base = "https://api.spacexdata.com/v5"
```
`streams.py`:
```python
from tap_spacex.client import SpaceXStream  # assuming the package is named tap_spacex

class LaunchStream(SpaceXStream):
    name = "launches"
    path = "/launches"
```
k
Reuben - thank you so much. Do you think you could spare 5 minutes of your time to quickly get on a call? I just need a little bit of context and I should be good from there.
If not, I just wanted to ask: there are two prebuilt streams, GroupStream and UserStream - can you help me understand the significance of both? I would still request a quick call if you could manage it - that would be lovely.
r
Sure, where do you want to call?
k
Google Meet?
@Reuben (Matatika) have you ever faced this issue? I have the flight_number column in my schema JSON (https://github.com/r-spacex/SpaceX-API/blob/master/docs/launches/v5/schema.md) but for some reason it wouldn't pick it up. I have tried multiple column names but none of them work :/
r
How is the schema defined in the tap?
k
I created a json file, placed it in the schemas folder and defined it in the streams.py file.
I converted the JSON file into Python and used that instead - it's working fine this way
r
If you just copied the JSON from that Markdown file, that doesn't constitute a valid JSON schema by itself (which is probably why it didn't work).
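For reference, a valid JSON schema file for a launch record would look something like this (field list trimmed and types guessed from the API docs - the point is the `"type": "object"` / `"properties"` wrapper that the copied Markdown content lacks):

```json
{
  "type": "object",
  "properties": {
    "flight_number": { "type": ["integer", "null"] },
    "name": { "type": ["string", "null"] },
    "date_utc": { "type": ["string", "null"], "format": "date-time" }
  }
}
```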
k
I did try adding a properties key at the top, but that didn't work either, so I converted it into Python and made it work that way
Reuben - just wondering. Have you ever worked with dbt-postgres transformer?
r
Yeah I have.
k
I am struggling with that now 🤦‍♂️ I tried adding the dbt-postgres transformer and configured it, but when I run the compile it gives me this error. I have already set up the test environment and everything
r
`dbt-postgres` brings in a `profiles.yml` that contains dev, staging and prod profiles (i.e. no `test` profile). You would need to add a `test` profile there (or just change `dev` to `test`), or update your `meltano.yml` to define one/all of `dev`, `staging` and `prod` environments (at the moment, it only defines `test`) and then set one as the `default_environment` (or run with `meltano --environment <name>` as a one-time override).
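Something like this, for the first option (a sketch - the profile name and connection values here are placeholders, so check your generated file for the real ones):

```yaml
# profiles.yml -- sketch; profile name and credentials are placeholders
dbt-postgres:
  target: test
  outputs:
    test:
      type: postgres
      host: localhost
      port: 5432
      user: postgres
      password: "{{ env_var('PG_PASSWORD') }}"
      dbname: warehouse
      schema: analytics
```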
k
I had already made the changes in profiles.yml and just ran with `meltano --environment=test`
Still the same issue 😕
I tried running this as well
`meltano invoke dbt-postgres:compile --target=dev`
but for some reason it just wouldn't pick it up
r
What does your `profiles.yml` look like?
k
message has been deleted
r
Did you add an entry for `test` under `outputs`?
k
Rookie mistake - yeah, I changed `dev` to `test`, and I had also added a `target: test` in profiles.yml; removed that. It helped
Reuben, a quick question please - the basic configs for dbt are already done. Do you have a template or a starter for dbt? From what I understand, I only need to define the models, but I don't know where to start
r
No sorry, I really only have an understanding of the basic concepts and how it works with Meltano - not writing models. I suggest you check out the dbt docs for a starting point. 🙂
k
I am on it Reuben - thank you so much