welp got another issue here, when i try to run an ...
# troubleshooting
j
welp got another issue here, when i try to run an etl with a redshift loader - it tells me that the column doesn't exist
```root@60789494ab28:/projects/taps/tap-pardot# meltano run tap-pardot target-redshift 2023-06-22T231616.639211Z [info ] Environment 'test' is active 2023-06-22T231618.644561Z [info ] time=2023-06-22 231618 name=target_redshift level=INFO message=Getting catalog objects from table cache... cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231618.728635Z [info ] 2023-06-22 231618,728 | INFO | tap-pardot | Beginning full_table sync of 'visitor_activities'... cmd_type=elb consumer=False name=tap-pardot producer=True stdio=stderr string_id=tap-pardot 2023-06-22T231618.729207Z [info ] 2023-06-22 231618,728 | INFO | tap-pardot | Tap has custom mapper. Using 1 provided map(s). cmd_type=elb consumer=False name=tap-pardot producer=True stdio=stderr string_id=tap-pardot 2023-06-22T231618.969486Z [info ] time=2023-06-22 231618 name=target_redshift level=INFO message=Table 'pardot5_dev."VISITOR_ACTIVITIES"' does not exist. Creating... cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231618.970318Z [info ] time=2023-06-22 231618 name=target_redshift level=INFO message=(Re)creating pardot5_dev."VISITOR_ACTIVITIES" table... cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.041369Z [info ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.041837Z [info ] File "/projects/taps/tap-pardot/.meltano/loaders/target-redshift/venv/bin/target-redshift", line 8, in <module> cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.042204Z [info ] sys.exit(main()) cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.042462Z [info ] File "/projects/taps/tap-pardot/.meltano/loaders/target-redshift/venv/lib/python3.9/site-packages/target_redshift/__init__.py", line 447, in main cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.042701Z [info ] persist_lines(config, singer_messages, table_cache) cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.043081Z [info ] File "/projects/taps/tap-pardot/.meltano/loaders/target-redshift/venv/lib/python3.9/site-packages/target_redshift/__init__.py", line 242, in persist_lines cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.043356Z [info ] stream_to_sync[stream].sync_table() cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.043577Z [info ] File "/projects/taps/tap-pardot/.meltano/loaders/target-redshift/venv/lib/python3.9/site-packages/target_redshift/db_sync.py", line 738, in sync_table cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.043790Z [info ] self.create_table_and_grant_privilege() cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.044000Z [info ] File "/projects/taps/tap-pardot/.meltano/loaders/target-redshift/venv/lib/python3.9/site-packages/target_redshift/db_sync.py", line 718, in create_table_and_grant_privilege cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06-22T231619.044317Z [info ] self.create_table(is_stage=is_stage) cmd_type=elb consumer=True name=target-redshift producer=False stdio=stderr string_id=target-redshift 2023-06…
i should also note that the tap is custom, and i see that it says the tap has a mapper, but not sure what i'm supposed to do to fix this
u
Two things you could try: 1. It seems like the tap stream could be labeling the ID column as a PK but then its not provided in the records. You can run
meltano invoke tap-pardot > output.json
then inspect the records, see the spec docs for details on the records themselves https://hub.meltano.com/singer/spec. Make sure the id property is in the schema and record messages 2. You can also try deselecting that visitor activities stream to see if the rest of your streams work, which would isolate the bad stream. Its hard to tell right now but my gut says this sounds like its probably a tap issue and not a target issue
j
hey @pat_nadolny thanks for the advice, so it looks like it takes everything as 1 record. when i get a response from the api it comes in like this:
Copy code
{
  "nextPageToken": null,
  "nextPageUrl": null,
  "values": [
    {
      "id": 17778405,
      "campaignId": 87445,
      "createdAt": "2022-08-31T22:24:32-07:00",
      "customRedirectId": null,
      "details": "N2E Test Form - R&L",
      "emailId": null,
      "emailTemplateId": null,
      "fileId": null,
      "formHandlerId": null,
      "formId": 2398,
      "landingPageId": null,
      "listEmailId": null,
      "multivariateTestVariationId": null,
      "opportunityId": null,
      "paidSearchAdId": null,
      "prospectId": 31170486,
      "siteSearchQueryId": null,
      "type": 2,
      "typeName": "Form",
      "updatedAt": "2022-08-31T22:25:09-07:00",
      "visitId": null,
      "visitorId": 1500660,
      "visitorPageViewId": null
    },
    {
      "id": 17778621,
      "campaignId": 87445,
      "createdAt": "2022-08-31T22:25:09-07:00",
      "customRedirectId": null,
      "details": "N2E Test Form - R&L",
      "emailId": null,
      "emailTemplateId": null,
      "fileId": null,
      "formHandlerId": null,
      "formId": 2398,
      "landingPageId": null,
      "listEmailId": null,
      "multivariateTestVariationId": null,
      "opportunityId": null,
      "paidSearchAdId": null,
      "prospectId": 31170486,
      "siteSearchQueryId": null,
      "type": 4,
      "typeName": "Form",
      "updatedAt": "2022-08-31T22:25:09-07:00",
      "visitId": null,
      "visitorId": 1500660,
      "visitorPageViewId": null
    },
    {
      "id": 17780790,
      "campaignId": 87445,
      "createdAt": "2022-08-31T22:33:01-07:00",
      "customRedirectId": null,
      "details": "N2E Test Form - Software",
      "emailId": null,
      "emailTemplateId": null,
      "fileId": null,
      "formHandlerId": null,
      "formId": 2395,
      "landingPageId": null,
      "listEmailId": null,
      "multivariateTestVariationId": null,
      "opportunityId": null,
      "paidSearchAdId": null,
      "prospectId": 31170486,
      "siteSearchQueryId": null,
      "type": 2,
      "typeName": "Form",
      "updatedAt": "2022-08-31T22:33:32-07:00",
      "visitId": null,
      "visitorId": 1501005,
      "visitorPageViewId": null
    },
    {
      "id": 17780940,
      "campaignId": 87445,
      "createdAt": "2022-08-31T22:33:32-07:00",
      "customRedirectId": null,
      "details": "N2E Test Form - Software",
      "emailId": null,
      "emailTemplateId": null,
      "fileId": null,
      "formHandlerId": null,
      "formId": 2395,
      "landingPageId": null,
      "listEmailId": null,
      "multivariateTestVariationId": null,
      "opportunityId": null,
      "paidSearchAdId": null,
      "prospectId": 31170486,
      "siteSearchQueryId": null,
      "type": 4,
      "typeName": "Form",
      "updatedAt": "2022-08-31T22:33:32-07:00",
      "visitId": null,
      "visitorId": 1501005,
      "visitorPageViewId": null
    }
  ]
}
but i'm trying to get it so that each value is it's own record...otherwise i'm always going to just get 1 record...
u
You said this was a custom tap right? Is the code public? You need to implement the logic to parse that response and return the contents properly
p
Also if youre developing a new tap I'd recommend testing it either without a target by inspecting the output or using something very similar like target-jsonl to avoid convoluting tap and target issues
j
yep im just working on the tap only i used
meltano invoke my-tap > restults.txt
and in the tap i defined the schema to be exactly what i get from the api:
Copy code
{
    "$schema": "<http://json-schema.org/draft-07/schema#>",
    "type": "object",
    "properties": {
      "nextPageToken": {
        "type": ["null", "string"],
        "description": "Token for retrieving the next page of results."
      },
      "nextPageUrl": {
        "type": ["null", "string"],
        "description": "URL for retrieving the next page of results."
      },
      "values": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "campaignId": {
                "type": "integer",
                "description": "Account Engagement ID for the associated campaign."
              },
              "campaign": {
                "type": "object",
                "description": "Object representing the campaign related to this object. See documentation for Campaign for fields."
              },

              etc.
              etc.
am i supposed bring it in like this, and then transform it to only output the
values
key? i'm assuming that i would need the first two for any pagination, so kinda confused about that. but mostly, trying to understand if the schema i use on the tap is being used by the target? meaning, it validates the the source data, and then has to be used by the target as well to create the tables in the target db?
Copy code
Inline Data Mapping
Meltano gives you the flexibility to alter data passing through your EL pipelines to do precisely what you need for your use case. Although commonly users want to replicate their data in the most raw format, using ELT vs ETL, there are many use cases that require minor alterations to data on the fly. This is where mappers, also referred to as inline stream maps, shine!
<https://docs.meltano.com/guide/mappers>
i assumed im supposed to keep it as is because it says users replicate it in the most raw format, and then meltano would be able to figure out the values. but now im thinking that's not the case and i should parse it as you say to remove the values? but if i do that then do i create a new schema for the target somehow?
u
yeah the schema is the schema for the individual record, not the whole api response. So your schema ends up just being
campaignId
,
campaign
, etc. It might be helpful for you to check out a couple examples from https://github.com/MeltanoLabs, I think https://github.com/MeltanoLabs/tap-gmail is a relatively simple REST example but check out others too
u
Raw data in this context means that the data "rows" returned from the API is not aggregated or altered. Extracting the data from the API response json wouldnt be considered manipulating the data at all, the raw data is returned as is.
j
ahh gotcha, so really i get the response, and then code into the
parse_response
method whatever i'm trying to extract for each row?
i removed the pagination part from the schema and added
records_jsonpath = "$.values[*]"
and it returned a bunch of records...so i guess i can still extract the pagination stuff, while leaving the schema to not include it right?
u
that sounds like its working! If I understand your question correctly then yes