# troubleshooting
**nelis_verhoef:**
Hi Team, I've been banging my head against a wall for the last couple of days. I'm trying to get the `dynamodb` tap to feed the `postgres` target. After some struggling I got all the setup to work and created the relevant schema (of our horrible Dynamo structure). I even got it to write to Postgres. The issue I am running into is that all of my Dynamo tables end with `-prod`, e.g. `Users-<UUID>-prod` or `Organization-<UUID>-prod`, and for some reason when writing the data to Postgres the table name just becomes `prod`, dropping the GUID and actual table name somewhere (I've done some digging but couldn't find where in the code the name gets changed). I thought it might be an easy fix: I can just use a `stream-map` and `__alias__` each table, like this:
```yaml
loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-postgres.git
    config:
      stream_maps:
        QuestionSet-<SomeGUID>-prod:
          __alias__: prod_question_set
```
but as soon as I do that I get this error:
```
2022-12-22T05:53:11.422307Z [info ] +Sync Summary---------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.423435Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.428362Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.429771Z [info ] | QuestionSet-<SomeGUID>-prod | LOG_BASED | 74 records | 20.2 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.435444Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.438289Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:17.576898Z [info ] 2022-12-22 05:53:17,574 Target 'target-postgres' is listening for input from tap. cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:17.585458Z [info ] 2022-12-22 05:53:17,582 Initializing 'target-postgres' target sink... cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:17.587014Z [info ] 2022-12-22 05:53:17,583 Initializing target sink for stream 'prod_question_set'... cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:20.043043Z [info ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:20.049124Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module> cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:20.050546Z [info ] sys.exit(TargetPostgres.cli()) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:20.052834Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__ cmd_type=elb consumer=True name=target-postgres producer=False st…
```
Hi Team, does anyone have any ideas on this one? I have ended up loading all tables into one table called `prod` with a heap of nulls in the columns, which I then clean up with dbt. It would be much cleaner to have each source dump into its own raw table, though.
**aaronsteers:**
> The issue I am running into is that all of my Dynamo tables end with `-prod`, e.g. `Users-<UUID>-prod` or `Organization-<UUID>-prod`, and for some reason when writing the data to Postgres the table name just becomes `prod`, dropping the GUID and actual table name somewhere (I've done some digging but couldn't find where in the code the name gets changed).
I have run into the same issue when pulling from a past employer's DynamoDB source. Many/most SQL targets treat `-` as a delimiter, as in `<db_name>-<schema_name>-<table_name>` or `<schema_name>-<table_name>`. Targets which interpret the stream name this way are not great with DynamoDB sources, since `-` is very common in Dynamo table names. Given this convention, the target interprets all of your tables as being named `prod` (e.g. `Users-<UUID>-prod` parses as database `Users`, schema `<UUID>`, table `prod`), which is not at all what you want here. In my case (before I had Meltano and before stream maps existed), I used the fact that the Singer catalog spec treats `table_name` as an optional override for upstream table name mapping, and `stream`/`tap_stream_id` as the identifier to send downstream. So I modified the catalog file to use the DynamoDB source table name as `table_name`, and a common-sense `stream` ID that didn't end in `-prod`.
This doesn't necessarily give a clear path forward, but hopefully it explains what you're running into and outlines a couple of options for a path forward.
The challenge with our approach, manually modifying the catalog file, is that you then have to maintain it. But if you are using one of the tap-dynamodb extractors which require you to specify `schema` manually anyway, perhaps this isn't much additional lift?
Can you say whether you are already modifying/freezing a catalog file? If so, I recommend trying an overridden `table_name` per stream that points to the upstream DynamoDB source table, decoupled from the stream ID/name that gets passed downstream to the target.
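To illustrate the shape of that override (a sketch only; the stream names and the abbreviated `schema`/`metadata` below are placeholders, not your real catalog), the idea is that `table_name` keeps pointing at the raw DynamoDB table while the sanitized stream ID is what travels downstream:
```json
{
  "streams": [
    {
      "table_name": "Users-<UUID>-prod",
      "tap_stream_id": "users_prod",
      "stream": "users_prod",
      "schema": {"properties": {}},
      "metadata": []
    }
  ]
}
```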
Does this help at all?
**nelis_verhoef:**
> Can you say whether you are already modifying/freezing a catalog file? If so, I recommend trying an overridden `table_name` per stream that points to the upstream DynamoDB source table, decoupled from the stream ID/name that gets passed downstream to the target.

Thanks @aaronsteers, I am not manually touching the catalog file at the moment (or at least I don't think I am). I have, however, created the schemas and saved them in a separate yml file; here is a snippet of that file:
```yaml
- name: tap-dynamodb
  variant: singer-io
  pip_url: tap-dynamodb
  capabilities:
  - discover
  - state
  - catalog
  metadata:
    Question-<123>-prod:
      replication-key: _lastChangedAt
      replication-method: LOG_BASED
    User-<456>-prod:
      replication-key: _lastChangedAt
      replication-method: LOG_BASED
  schema:
    Question-<123>-prod:
      __typename:
        type: string
      _deleted:
        type: boolean
      _lastChangedAt:
        type: number
      _ttl:
        type: [number, 'null']
      ....
```
Would I be able to just pass the `table_name` through in the metadata block? Or do I need to change the catalog file?
**aaronsteers:**
I'm not sure, actually. The tricky part here is that you need to add a `table_name` entry but change the stream ID. Since our Meltano `metadata` and `schema` entries are indexed/addressed by stream ID, I'm not sure it is possible to use this method to change them.
**nelis_verhoef:**
Thanks for getting back to me so quickly. Shouldn't this be a reasonably easy fix using the `meltano-map-transformer`? Then I can take the stream names including `-`, rename them to use `_` in the transformer, and let `target-postgres` just do its thing. Feels like the cleanest separation of concerns. So something like this:
```yaml
- name: meltano-map-transformer
  variant: meltano
  pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
  mappings:
  - name: alias_stream_names
    config:
      stream_maps:
        User_prod:
          __source__: User-<123>-prod
          __alias__: prod_user
        Question_prod:
          __source__: Question-<456>-prod
          __alias__: prod_question
```
This is what I've tried, but unfortunately it does not seem to work consistently: it only moves one of the tables to its own location in Postgres; the rest still just go to `prod`.
**aaronsteers:**
The problem may be that `__source__` and `__alias__` are intended as inverse alternatives of basically the same operation. How about this syntax?
```yaml
- name: meltano-map-transformer
  variant: meltano
  pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
  mappings:
  - name: alias_stream_names
    config:
      stream_maps:
        User-<123>-prod:
          __alias__: prod_user
        Question-<456>-prod:
          __alias__: prod_question
```
Or conversely:
```yaml
- name: meltano-map-transformer
  variant: meltano
  pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
  mappings:
  - name: alias_stream_names
    config:
      stream_maps:
        prod_user:
          __source__: User-<123>-prod
        prod_question:
          __source__: Question-<456>-prod
```
**nelis_verhoef:**
Ahhh right, I tried `__alias__` on its own previously, but I hadn't tried `__source__` on its own until now:
```
2023-01-05T02:56:47.479247Z [info ] +Sync Summary---------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.482014Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.483498Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.485555Z [info ] | Question-<123>-prod | LOG_BASED | 566 records | 83.2 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.489006Z [info ] | QuestionSet-<456>-prod | LOG_BASED | 70 records | 38.9 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.491657Z [info ] | User-<789>-prod | LOG_BASED | 204 records | 51.4 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.492917Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.505035Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:48.375200Z [info ] time=2023-01-05 02:56:48 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:49.412441Z [info ] time=2023-01-05 02:56:49 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:50.536650Z [info ] time=2023-01-05 02:56:50 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:51.542919Z [info ] time=2023-01-05 02:56:51 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:52.334863Z [info ] time=2023-01-05 02:56:52 name=target_postgres level=INFO message=Loading 70 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:53.281289Z [info ] time=2023-01-05 02:56:53 name=target_postgres level=INFO message=Loading 566 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:53.303586Z [info ] time=2023-01-05 02:56:53 name=target_postgres level=INFO message=Loading 566 rows into 'tap_dynamodb."question_prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:53.354515Z [info ] time=2023-01-05 02:56:53 name=target_postgres level=INFO message=Loading 204 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:53.639268Z [info ] time=2023-01-05 02:56:53 name=target_postgres level=INFO message=Loading into tap_dynamodb…
```
**aaronsteers:**
Okay, yes, there's some progress here. I'd forgotten that the `__source__` operation doesn't remove the original stream but duplicates it under a new name. These lines show that at least one stream (`question_prod`) is being successfully cloned/aliased, although the original stream is still coming through:
```
2023-01-05T02:56:54.823194Z [info     ] time=2023-01-05 02:56:54 name=target_postgres level=INFO message=Loading into tap_dynamodb."prod": {"inserts": 0, "updates": 566, "size_bytes": 549053} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:55.053296Z [info     ] time=2023-01-05 02:56:55 name=target_postgres level=INFO message=Loading into tap_dynamodb."question_prod": {"inserts": 566, "updates": 0, "size_bytes": 549053} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
```
Based on your above log, I'd try something like this next:
```yaml
# ...
        stream_maps:
          question_prod:
            __source__: Question-<123>-prod
          questionset_prod:
            __source__: QuestionSet-<456>-prod
          user_prod:
            __source__: User-<789>-prod
          Question-<123>-prod: null
          QuestionSet-<456>-prod: null
          User-<789>-prod: null
```
In theory, that would be functionally identical to:
```yaml
# ...
        stream_maps:
          Question-<123>-prod:
            __alias__: question_prod
          QuestionSet-<456>-prod:
            __alias__: questionset_prod
          User-<789>-prod:
            __alias__: user_prod
```
Are the angle-bracketed numbers literally part of the table name, or is that a placeholder?
**nelis_verhoef:**
> Are the angle-bracketed numbers literally part of the table name, or is that a placeholder?

Nope, it is just a placeholder; they look something like this: `Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod`. I'll try setting the streams to `null` in the next 10-15 minutes. Thanks for all the help with this one!
**aaronsteers:**
> Nope, it is just a placeholder; they look something like this: `Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod`.

Makes sense. Just wanted to ask in case special characters were part of the issue (and it sounds like they are not). To help isolate the issue (especially since they all end in `-prod`), it might be worthwhile tackling one stream at a time: set the other streams to `null` in the stream map, or deselect all the others with `select` or `filter`, then enable and alias each one at a time to control for others that might be sneaking through.
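For instance, here is a minimal sketch of isolating a single stream with the extractor's `select` extra (the stream name is the usual placeholder pattern from above); with an explicit `select`, everything not matched is deselected:
```yaml
extractors:
- name: tap-dynamodb
  select:
  - Question-<123>-prod.*  # only this stream (and all of its properties) is selected
```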
**nelis_verhoef:**
So, back on a run with this `stream_maps` config:
```yaml
stream_maps:
  question_prod:
    __source__: Question-<123>-prod
  questionset_prod:
    __source__: QuestionSet-<456>-prod
  user_prod:
    __source__: User-<789>-prod
  Question-<123>-prod: null
  QuestionSet-<456>-prod: null
  User-<789>-prod: null
```
It threw this new error: again it happily did `Question` but failed on `QuestionSet`, because it's missing the `key_properties` field:
```
2023-01-05T06:39:44.641303Z [info ] INFO Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Starting sync cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.642359Z [info ] INFO Syncing log based for stream: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.643366Z [info ] INFO Clearing state because stream has aged out cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.644469Z [info ] INFO Must complete full table sync before replicating from dynamodb streams for Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.645477Z [info ] INFO Scanning table Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod with params: cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.646394Z [info ] INFO TableName = Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.647303Z [info ] INFO Limit = 1000 cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.651679Z [info ] 2023-01-05 06:39:44,612 | INFO | root | Set null tansform as default for 'Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T06:39:44.657629Z [info ] 2023-01-05 06:39:44,622 | INFO | root | Set null tansform as default for 'Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T06:39:45.246880Z [info ] INFO Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Completed sync (566 rows) cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.258943Z [info ] INFO QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Starting sync cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.262879Z [info ] INFO Syncing log based for stream: QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.264479Z [info ] INFO Clearing state because stream has aged out cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.265870Z [info ] INFO Must complete full table sync before replicating from dynamodb streams for QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.269818Z [info ] 2023-01-05 06:39:45,264 | INFO | root | Set null tansform as default for 'QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T06:39:45.272520Z [info ] 2023-01-05 06:39:45,268 |…
```
I'll test now with the other streams omitted from the `select` list...
Nope, no dice. All streams fail with the same `key_properties` error when running one by one. Am I just completely misusing the `meltano-map-transformer`, or is this weird/unexpected behaviour? This might be a silly question, but is there a straightforward way for me to debug this? For example, could I get the raw and transformed streams written to disk, so that I can step through where things go wrong?
**aaronsteers:**
> All streams fail with the same `key_properties` error when running one by one.

This is still progress, I think.
> Am I just completely misusing the `meltano-map-transformer`, or is this weird/unexpected behaviour?

Do I understand correctly that you are seeing the streams come through (more or less) correctly under the aliased names? The failure around `key_properties` is a separate issue (in the target layer) and has a couple of options to mitigate.
In order of what I'd try next:
1. If the streams don't have key properties defined in tap-dynamodb (or if they just aren't coming through), you have two options:
   a. Within the same map-transform config, you can add explicit primary key overrides using the `__key_properties__` syntax (sketched below).
   b. Configure the target to not require primary keys, or use a target that does not require them.
2. If the streams do have key properties defined in tap-dynamodb, and the map transformer is losing track of them, that would be a bug and we'd welcome a bug report for it.
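For option 1a, the override would look something like this (a sketch only, reusing the placeholder stream names from above and assuming `id` is the table's hash key):
```yaml
stream_maps:
  user_prod:
    __source__: User-<789>-prod
    __key_properties__: ["id"]  # explicit primary key for the aliased stream
```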
**nelis_verhoef:**
Hi, no, not really. It kinda works if I only select one table with `__source__`, but it then continues to complain about the key properties if I set the original stream to `null`. So if my mapper looks like this:
```yaml
mappings:
- name: alias_stream_names
  config:
    stream_maps:
      user_prod_1:
        __source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
        __key_properties__: ["id"]
      User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: null
```
it hates life.
```
2023-01-06T05:44:51.346010Z [info     ] +Sync Summary--------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.350770Z [info     ] | table name                           | replication method | total records | write speed         | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.355206Z [info     ] +--------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.357058Z [info     ] | User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED          | 218 records   | 35.3 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.358449Z [info     ] +--------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.359695Z [info     ] INFO Done syncing.             cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.519742Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.522662Z [info     ]   File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module> cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.524737Z [info     ]     sys.exit(main())           cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.526232Z [info     ]   File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/target_postgres/__init__.py", line 373, in main cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.527487Z [info     ]     persist_lines(config, singer_messages) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.529310Z [info     ]   File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/target_postgres/__init__.py", line 197, in persist_lines cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.530479Z [info     ]     raise Exception("key_properties field is required") cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.532066Z [info     ] Exception: key_properties field is required cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.626756Z [error    ] Loader failed
```
If I just let it write to the `prod` table, i.e. the mapper looks like this (while only selecting the User table in the dynamodb tap and commenting out the key properties):
```yaml
mappings:
- name: alias_stream_names
  config:
    stream_maps:
      user_prod_1:
        …
```
But if I then select another table, so that my mapper looks like this:
```yaml
mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    mappings:
    - name: alias_stream_names
      config:
        stream_maps:
          user_prod_1:
            __source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
          question_prod_1:
            __source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
```
it is as if the stream mapper is not iterating through the tables/streams anymore:
```
2023-01-06T05:58:22.994755Z [info ] +Sync Summary------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:22.998344Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.000691Z [info ] +------------------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.004125Z [info ] | Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED | 567 records | 102.6 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.010621Z [info ] | User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED | 218 records | 81.9 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.012924Z [info ] +------------------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.015614Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.609215Z [info ] time=2023-01-06 05:58:23 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:24.711233Z [info ] time=2023-01-06 05:58:24 name=target_postgres level=INFO message=Table '"question_prod_1"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:26.273088Z [info ] time=2023-01-06 05:58:26 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:27.332930Z [info ] time=2023-01-06 05:58:27 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:28.632480Z [info ] time=2023-01-06 05:58:28 name=target_postgres level=INFO message=Loading 567 rows into 'tap_dynamodb."question_prod_1"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:28.763276Z [info ] time=2023-01-06 05:58:28 name=target_postgres level=INFO message=Loading 567 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:28.863333Z [info ] time=2023-01-06 05:58:28 name=target_postgres level=INFO message=Loading 218 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:29.749532Z [info ] time=2023-01-06 05:58:29 name=target_postgres level=INFO message=Loading into tap_dynamodb."questio…
```
**edgar_ramirez_mondragon:**
> It now writes data to the `question_prod_1` table, for questions, but it completely ignores the previously working `user_prod_1` table.

Does it not output anything for the `user_*` table, or does it just not map it? (I'm thinking of a potential issue with state.)
**nelis_verhoef:**
> Does it not output anything for the `user_*` table, or does it just not map it? (I'm thinking of a potential issue with state.)

Nope, no data is written to the `user` table; it's as if there is no mapping to it.
**edgar_ramirez_mondragon:**
@visch sorry for the ping, can you confirm if this 👆🏼 is a bug?
**visch:**
@edgar_ramirez_mondragon I'll take a look now!
I'm sure I'm missing something. I'll try this with the mapper as well, but this works for me 😕:
`cat out | meltano invoke target-postgres`
Also my tables are `question_prod_1`, as I'd expect.
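For anyone following along, that's the capture-and-replay debugging pattern, roughly like this (`out` is just an arbitrary file name, and `tap-smoke-test` stands in for whatever tap you're debugging):
```sh
meltano invoke tap-smoke-test > out        # capture the tap's raw Singer messages to a file
cat out | meltano invoke target-postgres   # replay them straight into the target, bypassing the runner
```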
I'm seeing something odd with dashes in the stream names; diving in more.
Down a couple of rabbit holes trying to help with the debugging of mappers by using https://github.com/kgpayne/target-singer-jsonl/blob/main/target_singer_jsonl/__init__.py. Getting closer; I'll have something tonight or tomorrow (if I have some time). First look: the remap isn't happening as you'd expect in the mapper's `stream_maps` config.
```yaml
version: 1
send_anonymous_usage_stats: true
project_id: "target-postgres"
default_environment: "dev"
plugins:
  extractors:
  - name: "tap-smoke-test"
    namespace: "tap_smoke_test"
    executable: "tap-smoke-test"
    pip_url: "git+https://gitlab.com/meltano/tap-smoke-test.git"
    config:
      streams:
      - stream_name: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
        "input_filename": "https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/animals-data.jsonl"
      - stream_name: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
        "input_filename": "https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/pageviews-data.jsonl"
  loaders:
  - name: "target-postgres"
    namespace: "target_postgres"
    pip_url: -e .
    config:
      sqlalchemy_url: "postgresql://postgres:postgres@localhost:5432/postgres"
      target_schema: test
      stream_maps:
        user_prod_1:
          __source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
        question_prod_1:
          __source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
    settings:
    - name: sqlalchemy_url
      kind: password
  - name: target-singer-jsonl
    namespace: target_singer_jsonl
    pip_url: "git+https://github.com/visch/target-singer-jsonl@patch-1"
    executable: target-singer-jsonl
    settings:
    - name: destination
      kind: string
    - name: local.folder
      kind: string
    - name: s3.bucket
      kind: string
    - name: s3.prefix
      kind: string
    - name: add_record_metadata
      kind: boolean
    config:
      source: local
      local:
        folder: output/extract/
        recursive: true
  mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    mappings:
    - name: alias_stream_names
      config:
        stream_maps:
          user_prod_1_test:
            __source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
          question_prod_1_test:
            __source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
environments:
- name: dev
```
I'll be back!
**nelis_verhoef:**
Thanks so much for digging into this, team! I really like the look of that `tap-smoke-test` extractor; I will absolutely use it to debug going forward.
**visch:**
https://github.com/MeltanoLabs/meltano-map-transform/issues/92
`meltano-map-transform` has a bug. I didn't dive into fixing it, but what you were seeing is a pretty bad bug: if you pass in a stream map like this (fyi @edgar_ramirez_mondragon):
```json
{
  "stream_maps": {
    "question_prod_1_test": {
      "__source__": "User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod"
    }
  }
}
```
but the first `SCHEMA` message is from `User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod`, the stream map will just be ignored. This is really hard to debug with `meltano run` and family, which is why @nelis_verhoef's question about how to debug this is very valid. See https://github.com/meltano/meltano/issues/3232 (please upvote this).
@nelis_verhoef I'll dive into this fix, but I want to get you up and running first, and I had a thought last night: the "stream name splits" feature shouldn't be unhelpful, and if it is, you should be able to turn it off. I'm going to put together a config var on `target-postgres` for `stream_name_splits`, and I'll default it to True. There are a couple of other things I'll get in with this too. Stay tuned.
@nelis_verhoef give the latest target-postgres a shot, with the `stream_name_splits` config flag set to `False`; that should do the trick for you. PR: https://github.com/MeltanoLabs/target-postgres/pull/59. I'm working on getting SDK issues in for this as well, to see whether we want to support this globally.
https://github.com/meltano/sdk/issues/1319 is the SDK issue vouching for this to get added to the SDK 🙂
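In loader config, that would presumably look something like the sketch below (the flag name is taken from the PR discussion; treat it as tentative, since the change was still in flight at this point):
```yaml
loaders:
- name: target-postgres
  variant: meltanolabs
  config:
    stream_name_splits: false  # proposed flag from PR #59: keep the full stream name instead of splitting on "-"
```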
@nelis_verhoef if we want to make `stream_maps` work, then we should use aliases instead:
```json
{
  "stream_maps": {
    "User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod": {
      "__alias__": "user_test"
    },
    "Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod": {
      "__alias__": "question_test"
    }
  }
}
```
Works as you'd expect
Summary of all of this:
1. Alias works better than source here, so it seems: https://meltano.slack.com/archives/C01TCRBBJD7/p1673386444248969?thread_ts=1671688563.297449&cid=C01TCRBBJD7
2. Using source has a major bug (the bug may be in the SDK, but I'm not certain): https://github.com/MeltanoLabs/meltano-map-transform/issues/92
3. All of the mapping can be avoided if we toggle this functionality in `target-postgres`, so I implemented it here: https://github.com/MeltanoLabs/target-postgres/pull/59

Other things we may want to do:
1. If we detect dashes in the stream name, we should log that it is happening so folks can look at the `stream_name_splits` config. Seems like a low-cost, high-value thing to do.
2. It's really hard to debug mappers:
   a. Can't invoke a mapper
   b. Can't dump config for a mapper
3. The smoke test should offer key properties, or a way to override them, as the catalog indicator for primary keys doesn't work.
**nelis_verhoef:**
Excellent, thanks for the `target-postgres` fix and the explanation around this, @visch! I'll give the Postgres solution a crack.
I'm so sorry, it's me again. With the uppercase stream name changes you made in the PR, I think there is a small bug (or I just need to change something on my end). Because of adding the `"` quoting to let Postgres read case-sensitive table names, we'd need to wrap only the table name in `"`, and be sure not to wrap the fully qualified table name in `"`. I.e. what I'm seeing is errors like these:
```
2023-01-11T23:47:31.827337Z [info     ] sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" does not exist cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-11T23:47:31.828738Z [info     ]                                cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-11T23:47:31.830161Z [info     ] [SQL: ALTER TABLE "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER] cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
```
So from my understanding, the issue is this command:
`ALTER TABLE "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER`
I think it should be:
`ALTER TABLE tap_dynamo_db."Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER`
So for this one we'd need to pass the table and schema separately to https://github.com/MeltanoLabs/target-postgres/blob/b580cdeee453c7945d9f9101ad3e75e41ee4245d/target_postgres/connector.py#L148, but we'd have to do that across the board, right? Or we could just change https://github.com/MeltanoLabs/target-postgres/blob/b580cdeee453c7945d9f9101ad3e75e41ee4245d/target_postgres/sinks.py#L224 to something like this:
```python
from typing import Optional  # needed at module top for the annotation

def conform_name(self, name: str, object_type: Optional[str] = None) -> str:
    """Conform names of tables, schemas, and column names."""
    return name.replace("-", "_").lower()
```
and then remove the explicit handling of cased stream/table names? (That would conform `Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod` to `question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod`, for example.)
**visch:**
Can't believe I made that mistake and my tests didn't catch it! I'll fix it first thing in the morning.
And add tests!
@nelis_verhoef I had to revert the change we made to fix your issue in `target-postgres`. I'm working through this, but it's going to be a bit before I can get the toggle in place. If you're interested, I wrote up what happened here: https://github.com/MeltanoLabs/target-postgres/issues/58. Getting close though!
**nelis_verhoef:**
Thanks for all the work on this one, @visch!