# troubleshooting
k
Does anyone have a complete guide or suggestions on how I can set up the AWS Redshift loader? I am just using target-redshift, configured everything, and when I run Meltano it seems to be connecting to Redshift properly, as it creates the tables, but no data is being loaded, and the run fails with BrokenPipeError. When I load into JSON (using target-jsonl), it seems to work fine, so the extractor does not seem to be the issue. Thank you.
e
k
Hello, yes. Increased it to 50MB and still fails eventually. I think the concern is that no data is inserted into redshift, even though it created the tables.
c
I haven't tried Meltano with Redshift specifically, but I have years of experience using Redshift in production. My guess is that target-redshift relies on an S3 bucket to place data prior to loading it into Redshift - does Redshift have read access permissions to S3? It'd make sense that the tables were created, since that happens over a regular JDBC connection; the load command is invoked over JDBC as well, but the actual data is usually (at least, in my experience) copied from S3.
k
oh interesting. I have configured it to have read access, but there's no write access. The target did not ask for an ARN or any credentials with S3 write access though. I just set:
• aws_redshift_copy_role_arn
• cluster_identifier
• dbname
• host
• password
• port
• user
Wouldn't that mean that it needs to put data from the extractor into S3, then load it somehow? But this loader (https://hub.meltano.com/loaders/target-redshift) does not seem to require anything that has S3 write access?
or maybe I need to already have aws creds on the system. let me try that too.
c
aws_redshift_copy_role_arn: that's an ARN that should allow Redshift to have read access to S3, just my guess, similar to how most other things interface with Redshift
k
it doesn't seem to put anything in the S3 bucket after setting creds.
Yes, that arn has read access to s3, but not write access.
c
where is Meltano executing? Your local machine, or an EC2 instance?
k
doesn't that mean something should put files in s3 first after reading from the extractor?
ec2 instance
c
yeah I would imagine that it will write files to S3
can the EC2 instance write files to the s3_bucket/s3_key_prefix?
(if it's Amazon Linux I think the AWS CLI tools are installed by default, I could be wrong, it's been a while since I've set up new EC2 instances)
assuming you are ssh-ing in you can aws s3 cp /tmp/some-file.txt s3://bucket/prefix/some-file.txt and verify quickly that way
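or, if you'd rather check it from Python, a boto3 sketch like this should tell you whether the instance can write to that location (bucket/key are placeholders, swap in your s3_bucket / s3_key_prefix values):
Copy code
import boto3

# Placeholders -- use the bucket/prefix from your target-redshift config.
bucket = "my-staging-bucket"
key = "my/prefix/write-test.txt"

s3 = boto3.client("s3")
# Uploads a tiny test object; raises an AccessDenied-style ClientError if the
# credentials in effect can't write to that bucket/prefix.
s3.put_object(Bucket=bucket, Key=key, Body=b"write test")
print(f"wrote s3://{bucket}/{key}")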
k
yes, that succeeded. I set creds that have admin access.
c
and the credentials are associated with the IAM role attached to the EC2 instance, or you've set them in the shell with something like the AWS_PROFILE environment variable?
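(one quick way to see which identity is actually in effect is a standard STS call via boto3, nothing project-specific:)
Copy code
import boto3

# Asks AWS "who am I?" using whatever credentials boto3 resolves
# (instance role, AWS_PROFILE, or ~/.aws/credentials from aws configure).
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])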
k
and I see that S3 file in the S3 bucket.
I set them using aws configure.
c
looks like target-redshift has the configuration option remove_s3_files - is that set to false? It will help in debugging this problem.
okay - you are running meltano as a command in the same shell/same user where you've run aws configure, correct?
k
yes
c
and you’ve verified with another target that your tap is functional so you know you’re reading data
do you have any additional log output?
k
yes to remove_s3_files. Default is False. I just set it to False as well to make sure.
Yes, it outputs data if I use the jsonl target to write to JSON files.
c
certain there are no misspellings in either s3_bucket or s3_key_prefix?
k
Copy code
/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/structlog/stdlib.py:1098: UserWarning: Remove `format_exc_info` from your processor chain if you want pretty exceptions.
  ed = p(logger, meth_name, cast(EventDict, ed))
2024-08-06T19:24:16.044211Z [debug    ] Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.

Traceback (most recent call last):
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/__init__.py", line 103, in _run_cli
    cli(obj={"project": None})
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/cli.py", line 46, in main
    return super().main(*args, windows_expand_args=False, **kwargs)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/utils.py", line 662, in invoke
    super().invoke(ctx)  # type: ignore[misc]
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/utils.py", line 711, in invoke
    super().invoke(ctx)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/params.py", line 74, in decorate
    return func(*args, **kwargs)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/params.py", line 164, in decorate
    func(project, *args, **kwargs)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/utils/__init__.py", line 73, in wrapper
    return asyncio.run(func(*args, **kwargs))
  File "/usr/lib64/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib64/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/run.py", line 191, in run
    raise err
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/run.py", line 188, in run
    await _run_blocks(tracker, parsed_blocks, dry_run=dry_run)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/run.py", line 241, in _run_blocks
    raise bare_err
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/run.py", line 222, in _run_blocks
    await blk.run()
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/block/extract_load.py", line 505, in run
    await self.run_with_job()
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/block/extract_load.py", line 537, in run_with_job
    await self.execute()
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/block/extract_load.py", line 497, in execute
    await manager.run()
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/block/extract_load.py", line 704, in run
    await self._wait_for_process_completion(self.elb.head)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/block/extract_load.py", line 777, in _wait_for_process_completion
    raise output_futures_failed.exception()  # noqa: RSE102
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/logging/utils.py", line 230, in capture_subprocess_output
    if not await _write_line_writer(writer, line):
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/logging/utils.py", line 198, in _write_line_writer
    await writer.wait_closed()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 359, in wait_closed
    await self._protocol._get_close_waiter(self)
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/core/logging/utils.py", line 196, in _write_line_writer
    await writer.drain()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib64/python3.9/asyncio/streams.py", line 197, in _drain_helper
    await waiter
BrokenPipeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ec2-user/.local/share/pipx/venvs/meltano/lib64/python3.9/site-packages/meltano/cli/__init__.py", line 113, in _run_cli
    raise CliError(f"{troubleshooting_message}\n{err}") from err  # noqa: EM102
meltano.cli.utils.CliError: Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.


Need help fixing this problem? Visit <http://melta.no/> for troubleshooting steps, or to
join our friendly Slack community.
I didn't have anything for the prefix as it wasn't required, but let me set it and re-run and see.
c
yeah I don’t imagine it is required since it is just part of the key name prepended to whatever meltano is generating for the rest of the key
k
thank you for your help btw, first day trying to run meltano, but running into some issues. 😞
setting prefix doesn't seem to do anything.
c
I'd expect to see files being created on S3 at the very least, assuming the permissions are correct - unfortunately the logs and stack trace above are just an async trace, which isn't really useful. (I mean, I'm here… in this Slack… because of those error messages. Heh.)
if we can get a trace output that actually points to something in the target-redshift source code that’s at least something that can be diagnosed
k
I am running the run command with debug log level, and it doesn't seem to give an error related to S3.
c
what do you get if you run free -h?
k
image.png
I assume it should be enough as I'm testing a small load.
c
could try upping the buffer size 🙂 which Edgar suggested
k
bigger than 50m?
c
I’d expect the error message to be a bit more descriptive though if it was memory constrained, but who knows
so 50m is 25/25, I mean that should be fine
you were able to write to a different target already so really this should not be the issue
ahh
hmm
where is your state database?
are you using sqlite/the default?
k
us-east-1
no sqlite
what's state database? you mean redshift?
c
because we’re not seeing anything like attempting to retrieve the data from the tap which I believe occurs after reading from state
state database is where Meltano stores information like bookmarks from previous invocations
by default it's SQLite and lives in the .meltano directory
k
I see. I didn't set it separately
c
if you haven’t, then it should still be the default state - something you’ll want to look into changing at some point in the future since EC2 instances are ephemeral - but probably not the issue right now
k
I see meltano.db under the .meltano/ directory.
c
yeah that should be the default
if you echo MELTANO_DATABASE_URI it comes up as an empty string, right?
err, echo $MELTANO_DATABASE_URI (don't paste results in Slack)
empty string = using default Meltano state database - non-empty string = tries to connect to that database (or S3) as the state backend
k
yes, empty string
c
alternatively in your meltano.yml make sure you don't have a state_backend.uri defined to some value as well
k
right, no state_backend setting in meltano.yml
c
hmm how are we seeing BROKEN_PIPE before even printing anything from the tap logs
that’s the full logs from the run you pasted above right?
k
I have 15MB of output with debug log enabled.
but that may contain some sensitive data.
the portion of log I pasted above is the one at the end, when it errors out.
c
yeah don't paste that, ultimately we're looking for anything like 'The config profile (aws_profile) could not be found'
you’re seeing the tables get created so connectivity to Redshift is functional, but not seeing the files get created on S3
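if you want to double-check from the instance itself, a quick listing like this would show whether the target staged anything at all under that prefix (just a sketch, bucket/prefix are placeholders for your settings):
Copy code
import boto3

bucket = "my-staging-bucket"   # placeholder: your s3_bucket setting
prefix = "my/prefix/"          # placeholder: your s3_key_prefix setting

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
# If target-redshift got as far as staging files, they'd show up here.
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])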
k
Right, no line like that.
I think I can upload this. The line before that is data it probably extracted from the source. After that, it seems like it's trying to flush data (it has lines like that before this too), then it eventually fails.
c
do you see any lines like 'writing # records to s3://…'?
Copy code
self.write_csv(records)
self.logger.info(f'writing {len(records)} records to s3://{self.config["s3_bucket"]}/{self.object}')
self.copy_to_s3()
self.copy_to_redshift(table, cursor)
this is from target_redshift’s sinks.py
k
right. no line like that.
interesting. so it does try to write to s3 first? weird.
Oh, I guess their guide does say the s3 bucket is for 'S3 bucket to save staging files before using COPY command'. So.. I should see files in S3 if it's set up correctly.
c
yeah
the typical workflow for loading data into Redshift is: 1. write data to S3, 2. issue a COPY command from a connected Redshift client pointing at the S3 key where the data was written in step 1
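in rough code terms the whole flow is something like this, just a sketch with placeholder names (the target uses its own client internally; this only shows the shape of step 1 and step 2 with boto3 and psycopg2):
Copy code
import boto3
import psycopg2  # any Postgres-compatible client works against Redshift

# 1. stage the batch file on S3 (placeholder file/bucket/key names)
boto3.client("s3").upload_file("/tmp/batch.csv", "my-staging-bucket", "my/prefix/batch.csv")

# 2. over a normal SQL connection, tell Redshift to COPY it in
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="dev", user="me", password="...",
)
with conn, conn.cursor() as cur:
    cur.execute(
        """
        COPY my_schema.my_table
        FROM 's3://my-staging-bucket/my/prefix/batch.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        CSV;
        """
    )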
k
ah, there's a csv file in the temp folder. I assume that's what it should upload.
c
yeah probably
I wonder
Copy code
2024-08-06T19:24:15.414824Z [info     ]     raise TypeError(f'Object of type {o.__class__.__name__} ' cmd_type=elb consumer=True job_name=staging:tap-hubspot-to-target-redshift name=target-redshift producer=False run_id=d7ca55e2-84c2-4ad7-b724-211055c09ea6 stdio=stderr string_id=target-redshift
2024-08-06T19:24:15.415020Z [info     ] TypeError: Object of type Decimal is not JSON serializable cmd_type=elb consumer=True job_name=staging:tap-hubspot-to-target-redshift name=target-redshift producer=False run_id=d7ca55e2-84c2-4ad7-b724-211055c09ea6 stdio=stderr string_id=target-redshift
does your CSV file appear truncated or does it contain the number of rows you might expect?
the log stack trace includes this:
Copy code
2024-08-06T19:24:15.413077Z [info     ]   File "/home/ec2-user/eval/.meltano/loaders/target-redshift/venv/lib/python3.9/site-packages/target_redshift/sinks.py", line 215, in <dictcomp> cmd_type=elb consumer=True job_name=staging:tap-hubspot-to-target-redshift name=target-redshift producer=False run_id=d7ca55e2-84c2-4ad7-b724-211055c09ea6 stdio=stderr string_id=target-redshift
k
hmm.. I can't tell, there's only one file in there. I assume it probably failed somewhere along the way.
It should actually have a lot more files.
c
line 215 in sinks.py is this:
Copy code
{key: (json.dumps(value).replace("None", "") if key in object_keys else value) for key, value in record.items()}
so I'm thinking it's not completing the CSV dump
k
😞 hmm.. is that file just using boto3, using creds in .aws/credentials file?
https://github.com/TicketSwap/target-redshift/blob/main/target_redshift/sinks.py#L35 although s3 is supposed to be region free, so it may not matter.
c
yeah I don’t even think you’re getting to the S3 delivery part
it’s dying on line 215 due to a type error of some sort
my best guess is that it is related to this [for whatever reason tagged info-level] line in the log:
TypeError: Object of type Decimal is not JSON serializable
even in Python 3.12 Decimal is not JSON serializable:
Copy code
>>> import json
>>> from decimal import Decimal
>>> num = Decimal('3.14')
>>> json.dumps({'xyz': num})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Decimal is not JSON serializable
however, if default=str is specified it works:
Copy code
json.dumps({'xyz':num}, default=str)
'{"xyz": "3.14"}'
how… this exists is beyond me, I mean I have to assume people have used target-redshift in production and encountered this issue, decimals aren't exactly rare
k
lol, odd.. so.. it's possible that this particular loader isn't working. 😞
c
I don't know enough about Singer and Python to know if there are downstream issues with just adding default=str - there must be a reason why it's not present (and that reason could be… someone just forgot that Decimal is a type)
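hypothetically the patch would be as small as this (untested sketch, just showing where default=str would go in that line-215 dict comprehension):
Copy code
# Hypothetical tweak to the dict comprehension quoted above: default=str makes
# json.dumps fall back to str() for Decimal (and similar) values instead of raising.
{
    key: (json.dumps(value, default=str).replace("None", "") if key in object_keys else value)
    for key, value in record.items()
}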
k
there's only one other redshift loader, but the last commit was in 2021, so I didn't want to use it. Maybe I should try that one. It's not like Redshift has changed that much in the last three years.
c
yeah the redshift copy/unload semantics have remained mostly the same since I started using it back in like 2015 or so
k
let me try the other one and see if that works. 😞
c
I'd recommend creating an issue on the repo for this target; it seems like they are pretty responsive
k
let me do that too. thank you.
c
if you create the issue link it here and I can put in additional info if necessary
Ended up switching to the other redshift variant and was able to make it work, though with that one the primary key setting needed to be set to false (not sure what kind of issues this will cause in the future, as it does seem to be updating the correct data).