# troubleshooting
**nelis_verhoef:**
Hi Team, I've been banging my head against a wall for the last couple of days. I'm trying to get the `dynamodb` tap to feed the `postgres` target. After some struggling I got all the setup to work and created the relevant schema (of our horrible Dynamo structure). I even got it to write to Postgres. The issue I am running into is that all of my Dynamo tables end with `-prod`, e.g. `Users-<UUID>-prod` or `Organization-<UUID>-prod`, and for some reason when writing the data to Postgres the table name just becomes `prod`, dropping the GUID and actual table name somewhere (I've done some digging but couldn't find where in the code the name gets changed). I thought it might be an easy fix: I can just use a `stream-map` and `__alias__` each table, like this:
```yaml
loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-postgres.git
    config:
      stream_maps:
        QuestionSet-<SomeGUID>-prod:
          __alias__: prod_question_set
```
but as soon as I do that I get this error:
```
2022-12-22T05:53:11.422307Z [info ] +Sync Summary---------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.423435Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.428362Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.429771Z [info ] | QuestionSet-<SomeGUID>-prod | LOG_BASED | 74 records | 20.2 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.435444Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:11.438289Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T05:53:17.576898Z [info ] 2022-12-22 05:53:17,574 Target 'target-postgres' is listening for input from tap. cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:17.585458Z [info ] 2022-12-22 05:53:17,582 Initializing 'target-postgres' target sink... cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:17.587014Z [info ] 2022-12-22 05:53:17,583 Initializing target sink for stream 'prod_question_set'... cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:20.043043Z [info ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:20.049124Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module> cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:20.050546Z [info ] sys.exit(TargetPostgres.cli()) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T05:53:20.052834Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__ cmd_type=elb consumer=True name=target-postgres producer=False st…
```
Hi Team, does anyone have any ideas on this one? I have ended up loading all tables into one table called `prod` with a heap of nulls in the columns, which I then clean up with dbt. It would be much cleaner to have each source dump into its own raw table, though.
**aaronsteers:**
> The issue I am running into is that all of my Dynamo tables end with `-prod`, e.g. `Users-<UUID>-prod` or `Organization-<UUID>-prod`, and for some reason when writing the data to Postgres the table name just becomes `prod`, dropping the GUID and actual table name somewhere (I've done some digging but couldn't find where in the code the name gets changed).
I have run into the same issue when pulling from a past employer's DynamoDB source. Many/most SQL targets treat `-` as a delimiter, as in `<db_name>-<schema_name>-<table_name>` or `<schema_name>-<table_name>`. Targets which interpret the stream name this way are not great with DynamoDB sources, since `-` is very common in Dynamo table names. Given this convention, the target interprets all of your tables as being named `prod` (e.g. `Users-<UUID>-prod` parses as database `Users`, schema `<UUID>`, table `prod`), which is not at all what you want here. In my case (before I had Meltano and before stream maps existed), I used the fact that the Singer catalog spec treats `table_name` as an optional override for upstream table name mapping, and `stream`/`tap_stream_id` as the identifier to send downstream. So I modified the catalog file to use the DynamoDB source table name as `table_name`, and a common-sense `stream` ID that didn't end in `-prod`.
This doesn't necessarily give a clear path forward, but hopefully it explains what you're running into and outlines a couple of options for a path forward.
The challenge with our approach, manually modifying the catalog file, is that you then have to maintain it. But if you are using one of the tap-dynamodb extractors which require you to specify `schema` manually anyway, perhaps this isn't much additional lift?
Can you say whether you are already modifying/freezing a catalog file? If so, I recommend trying an overridden `table_name` per stream that points to the upstream DynamoDB source table, decoupled from the stream ID/name that gets passed downstream to the target.
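To illustrate the shape of that override (a sketch only; the stream names and the abbreviated `schema`/`metadata` below are placeholders, not your real catalog), the idea is that `table_name` keeps pointing at the raw DynamoDB table while the sanitized stream ID is what travels downstream:
```json
{
  "streams": [
    {
      "table_name": "Users-<UUID>-prod",
      "tap_stream_id": "users_prod",
      "stream": "users_prod",
      "schema": {"properties": {}},
      "metadata": []
    }
  ]
}
```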
Does this help at all?
**nelis_verhoef:**
> Can you say whether you are already modifying/freezing a catalog file? If so, I recommend trying an overridden `table_name` per stream that points to the upstream DynamoDB source table, decoupled from the stream ID/name that gets passed downstream to the target.

Thanks @aaronsteers, I am not manually touching the catalog file at the moment (or at least I don't think I am). I have, however, created the schemas and saved them in a separate yml file; here is a snippet of that file:
```yaml
- name: tap-dynamodb
  variant: singer-io
  pip_url: tap-dynamodb
  capabilities:
  - discover
  - state
  - catalog
  metadata:
    Question-<123>-prod:
      replication-key: _lastChangedAt
      replication-method: LOG_BASED
    User-<456>-prod:
      replication-key: _lastChangedAt
      replication-method: LOG_BASED
  schema:
    Question-<123>-prod:
      __typename:
        type: string
      _deleted:
        type: boolean
      _lastChangedAt:
        type: number
      _ttl:
        type: [number, 'null']
      ....
```
Would I be able to just pass the `table_name` through in the metadata block? Or do I need to change the catalog file?
**aaronsteers:**
I'm not sure, actually. The tricky part here is that you need to add a `table_name` entry but change the stream ID. Since our Meltano `metadata` and `schema` entries are indexed/addressed by stream ID, I'm not sure it is possible to use this method to change them.
**nelis_verhoef:**
Thanks for getting back to me so quickly. Shouldn't this be a reasonably easy fix using the `meltano-map-transformer`? Then I can take the stream names including `-`, rename them to use `_` in the transformer, and let `target-postgres` just do its thing. Feels like the cleanest separation of concerns. So something like this:
```yaml
- name: meltano-map-transformer
  variant: meltano
  pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
  mappings:
  - name: alias_stream_names
    config:
      stream_maps:
        User_prod:
          __source__: User-<123>-prod
          __alias__: prod_user
        Question_prod:
          __source__: Question-<456>-prod
          __alias__: prod_question
```
This is what I've tried, but unfortunately it does not seem to work consistently: it only moves one of the tables to its own location in Postgres; the rest still just go to `prod`.
**aaronsteers:**
The problem may be that `__source__` and `__alias__` are intended as inverse alternatives of basically the same operation. How about this syntax?
```yaml
- name: meltano-map-transformer
  variant: meltano
  pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
  mappings:
  - name: alias_stream_names
    config:
      stream_maps:
        User-<123>-prod:
          __alias__: prod_user
        Question-<456>-prod:
          __alias__: prod_question
```
Or conversely:
```yaml
- name: meltano-map-transformer
  variant: meltano
  pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
  mappings:
  - name: alias_stream_names
    config:
      stream_maps:
        prod_user:
          __source__: User-<123>-prod
        prod_question:
          __source__: Question-<456>-prod
```
**nelis_verhoef:**
Ahhh right, I tried `__alias__` on its own previously, but I hadn't tried `__source__` on its own until now:
```
2023-01-05T02:56:47.479247Z [info ] +Sync Summary---------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.482014Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.483498Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.485555Z [info ] | Question-<123>-prod | LOG_BASED | 566 records | 83.2 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.489006Z [info ] | QuestionSet-<456>-prod | LOG_BASED | 70 records | 38.9 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.491657Z [info ] | User-<789>-prod | LOG_BASED | 204 records | 51.4 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.492917Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:47.505035Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T02:56:48.375200Z [info ] time=2023-01-05 02:56:48 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:49.412441Z [info ] time=2023-01-05 02:56:49 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:50.536650Z [info ] time=2023-01-05 02:56:50 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:51.542919Z [info ] time=2023-01-05 02:56:51 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:52.334863Z [info ] time=2023-01-05 02:56:52 name=target_postgres level=INFO message=Loading 70 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:53.281289Z [info ] time=2023-01-05 02:56:53 name=target_postgres level=INFO message=Loading 566 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:53.303586Z [info ] time=2023-01-05 02:56:53 name=target_postgres level=INFO message=Loading 566 rows into 'tap_dynamodb."question_prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:53.354515Z [info ] time=2023-01-05 02:56:53 name=target_postgres level=INFO message=Loading 204 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:53.639268Z [info ] time=2023-01-05 02:56:53 name=target_postgres level=INFO message=Loading into tap_dynamodb…
```
**aaronsteers:**
Okay, yes, there's some progress here. I'd forgotten that the `__source__` operation doesn't remove the original stream but duplicates it under a new name. These lines show that at least one stream (`question_prod`) is being successfully cloned/aliased, although the original stream is still coming through:
```
2023-01-05T02:56:54.823194Z [info     ] time=2023-01-05 02:56:54 name=target_postgres level=INFO message=Loading into tap_dynamodb."prod": {"inserts": 0, "updates": 566, "size_bytes": 549053} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:55.053296Z [info     ] time=2023-01-05 02:56:55 name=target_postgres level=INFO message=Loading into tap_dynamodb."question_prod": {"inserts": 566, "updates": 0, "size_bytes": 549053} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
```
Based on your above log, I'd try something like this next:
```yaml
# ...
        stream_maps:
          question_prod:
            __source__: Question-<123>-prod
          questionset_prod:
            __source__: QuestionSet-<456>-prod
          user_prod:
            __source__: User-<789>-prod
          Question-<123>-prod: null
          QuestionSet-<456>-prod: null
          User-<789>-prod: null
```
In theory, that would be functionally identical to:
```yaml
# ...
        stream_maps:
          Question-<123>-prod:
            __alias__: question_prod
          QuestionSet-<456>-prod:
            __alias__: questionset_prod
          User-<789>-prod:
            __alias__: user_prod
```
Are the angle-bracketed numbers literally part of the table name, or is that a placeholder?
**nelis_verhoef:**
> Are the angle-bracketed numbers literally part of the table name, or is that a placeholder?

Nope, it is just a placeholder; they look something like this: `Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod`. I'll try setting the streams to `null` in the next 10-15 minutes. Thanks for all the help with this one!
**aaronsteers:**
> Nope, it is just a placeholder; they look something like this: `Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod`.

Makes sense. Just wanted to ask in case special characters were part of the issue (and it sounds like they are not). To help isolate the issue (especially since they all end in `-prod`), it might be worthwhile tackling one stream at a time: set the other streams to `null` in the stream map, or deselect all the others with `select` or `filter`, then enable and alias each one at a time to control for others that might be sneaking through.
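For instance, here is a minimal sketch of isolating a single stream with the extractor's `select` extra (the stream name is the usual placeholder pattern from above); with an explicit `select`, everything not matched is deselected:
```yaml
extractors:
- name: tap-dynamodb
  select:
  - Question-<123>-prod.*  # only this stream (and all of its properties) is selected
```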
**nelis_verhoef:**
So, back on a run with this `stream_maps` config:
```yaml
stream_maps:
  question_prod:
    __source__: Question-<123>-prod
  questionset_prod:
    __source__: QuestionSet-<456>-prod
  user_prod:
    __source__: User-<789>-prod
  Question-<123>-prod: null
  QuestionSet-<456>-prod: null
  User-<789>-prod: null
```
It threw this new error: again it happily did `Question` but failed on `QuestionSet`, because it's missing the `key_properties` field:
```
2023-01-05T06:39:44.641303Z [info ] INFO Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Starting sync cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.642359Z [info ] INFO Syncing log based for stream: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.643366Z [info ] INFO Clearing state because stream has aged out cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.644469Z [info ] INFO Must complete full table sync before replicating from dynamodb streams for Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.645477Z [info ] INFO Scanning table Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod with params: cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.646394Z [info ] INFO TableName = Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.647303Z [info ] INFO Limit = 1000 cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:44.651679Z [info ] 2023-01-05 06:39:44,612 | INFO | root | Set null tansform as default for 'Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T06:39:44.657629Z [info ] 2023-01-05 06:39:44,622 | INFO | root | Set null tansform as default for 'Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T06:39:45.246880Z [info ] INFO Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Completed sync (566 rows) cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.258943Z [info ] INFO QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Starting sync cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.262879Z [info ] INFO Syncing log based for stream: QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.264479Z [info ] INFO Clearing state because stream has aged out cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.265870Z [info ] INFO Must complete full table sync before replicating from dynamodb streams for QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T06:39:45.269818Z [info ] 2023-01-05 06:39:45,264 | INFO | root | Set null tansform as default for 'QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T06:39:45.272520Z [info ] 2023-01-05 06:39:45,268 |…
```
I'll test now with the other streams omitted from the `select` list...
Nope, no dice. All streams fail with the same `key_properties` error when running one by one. Am I just completely misusing the `meltano-map-transformer`, or is this weird/unexpected behaviour? This might be a silly question, but is there a straightforward way for me to debug this? For example, could I get the raw and transformed streams written to disk, so that I can step through where things go wrong?
**aaronsteers:**
> All streams fail with the same `key_properties` error when running one by one.

This is still progress, I think.
> Am I just completely misusing the `meltano-map-transformer`, or is this weird/unexpected behaviour?

Do I understand correctly that you are seeing the streams come through (more or less) correctly under the aliased names? The failure around `key_properties` is a separate issue (in the target layer) and has a couple of options to mitigate.
In order of what I'd try next:
1. If the streams don't have key properties defined in tap-dynamodb (or if they just aren't coming through), you have two options:
   a. Within the same map-transform config, you can add explicit primary key overrides using the `__key_properties__` syntax (sketched below).
   b. Configure the target to not require primary keys, or use a target that does not require them.
2. If the streams do have key properties defined in tap-dynamodb, and the map transformer is losing track of them, that would be a bug and we'd welcome a bug report for it.
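For option 1a, the override would look something like this (a sketch only, reusing the placeholder stream names from above and assuming `id` is the table's hash key):
```yaml
stream_maps:
  user_prod:
    __source__: User-<789>-prod
    __key_properties__: ["id"]  # explicit primary key for the aliased stream
```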
**nelis_verhoef:**
Hi, no, not really. It kinda works if I only select one table with `__source__`, but it then continues to complain about the key properties if I set the original stream to `null`. So if my mapper looks like this:
```yaml
mappings:
- name: alias_stream_names
  config:
    stream_maps:
      user_prod_1:
        __source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
        __key_properties__: ["id"]
      User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: null
```
it hates life.
```
2023-01-06T05:44:51.346010Z [info     ] +Sync Summary--------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.350770Z [info     ] | table name                           | replication method | total records | write speed         | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.355206Z [info     ] +--------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.357058Z [info     ] | User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED          | 218 records   | 35.3 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.358449Z [info     ] +--------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.359695Z [info     ] INFO Done syncing.             cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.519742Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.522662Z [info     ]   File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module> cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.524737Z [info     ]     sys.exit(main())           cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.526232Z [info     ]   File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/target_postgres/__init__.py", line 373, in main cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.527487Z [info     ]     persist_lines(config, singer_messages) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.529310Z [info     ]   File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/target_postgres/__init__.py", line 197, in persist_lines cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.530479Z [info     ]     raise Exception("key_properties field is required") cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.532066Z [info     ] Exception: key_properties field is required cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.626756Z [error    ] Loader failed
```
If I just let it write to the `prod` table, i.e. the mapper looks like this (while only selecting the User table in the dynamodb tap and commenting out the key properties):
```yaml
mappings:
- name: alias_stream_names
  config:
    stream_maps:
      user_prod_1:
        …
```
But if I then select another table, so that my mapper looks like this:
```yaml
mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    mappings:
    - name: alias_stream_names
      config:
        stream_maps:
          user_prod_1:
            __source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
          question_prod_1:
            __source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
```
it is as if the stream mapper is not iterating through the tables/streams anymore:
```
2023-01-06T05:58:22.994755Z [info ] +Sync Summary------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:22.998344Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.000691Z [info ] +------------------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.004125Z [info ] | Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED | 567 records | 102.6 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.010621Z [info ] | User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED | 218 records | 81.9 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.012924Z [info ] +------------------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.015614Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:58:23.609215Z [info ] time=2023-01-06 05:58:23 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:24.711233Z [info ] time=2023-01-06 05:58:24 name=target_postgres level=INFO message=Table '"question_prod_1"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:26.273088Z [info ] time=2023-01-06 05:58:26 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:27.332930Z [info ] time=2023-01-06 05:58:27 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:28.632480Z [info ] time=2023-01-06 05:58:28 name=target_postgres level=INFO message=Loading 567 rows into 'tap_dynamodb."question_prod_1"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:28.763276Z [info ] time=2023-01-06 05:58:28 name=target_postgres level=INFO message=Loading 567 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:28.863333Z [info ] time=2023-01-06 05:58:28 name=target_postgres level=INFO message=Loading 218 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:58:29.749532Z [info ] time=2023-01-06 05:58:29 name=target_postgres level=INFO message=Loading into tap_dynamodb."questio…
```
**edgar_ramirez_mondragon:**
> It now writes data to the `question_prod_1` table, for questions, but it completely ignores the previously working `user_prod_1` table.

Does it not output anything for the `user_*` table, or does it just not map it? (I'm thinking of a potential issue with state.)
**nelis_verhoef:**
> Does it not output anything for the `user_*` table, or does it just not map it? (I'm thinking of a potential issue with state.)

Nope, no data is written to the `user` table; it's as if there is no mapping to it.
**edgar_ramirez_mondragon:**
@visch sorry for the ping, can you confirm if this 👆🏼 is a bug?
**visch:**
@edgar_ramirez_mondragon I'll take a look now!
I'm sure I'm missing something. I'll try this with the mapper as well, but this works for me 😕:
`cat out | meltano invoke target-postgres`
Also my tables are `question_prod_1`, as I'd expect.
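For anyone following along, that's the capture-and-replay debugging pattern, roughly like this (`out` is just an arbitrary file name, and `tap-smoke-test` stands in for whatever tap you're debugging):
```sh
meltano invoke tap-smoke-test > out        # capture the tap's raw Singer messages to a file
cat out | meltano invoke target-postgres   # replay them straight into the target, bypassing the runner
```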
I'm seeing something odd with dashes in the stream names; diving in more.
Down a couple of rabbit holes trying to help with the debugging of mappers by using https://github.com/kgpayne/target-singer-jsonl/blob/main/target_singer_jsonl/__init__.py. Getting closer; I'll have something tonight or tomorrow (if I have some time). First look: the remap isn't happening as you'd expect in the mapper's `stream_maps` config.
```yaml
version: 1
send_anonymous_usage_stats: true
project_id: "target-postgres"
default_environment: "dev"
plugins:
  extractors:
  - name: "tap-smoke-test"
    namespace: "tap_smoke_test"
    executable: "tap-smoke-test"
    pip_url: "git+https://gitlab.com/meltano/tap-smoke-test.git"
    config:
      streams:
      - stream_name: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
        "input_filename": "https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/animals-data.jsonl"
      - stream_name: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
        "input_filename": "https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/pageviews-data.jsonl"
  loaders:
  - name: "target-postgres"
    namespace: "target_postgres"
    pip_url: -e .
    config:
      sqlalchemy_url: "postgresql://postgres:postgres@localhost:5432/postgres"
      target_schema: test
      stream_maps:
        user_prod_1:
          __source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
        question_prod_1:
          __source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
    settings:
    - name: sqlalchemy_url
      kind: password
  - name: target-singer-jsonl
    namespace: target_singer_jsonl
    pip_url: "git+https://github.com/visch/target-singer-jsonl@patch-1"
    executable: target-singer-jsonl
    settings:
    - name: destination
      kind: string
    - name: local.folder
      kind: string
    - name: s3.bucket
      kind: string
    - name: s3.prefix
      kind: string
    - name: add_record_metadata
      kind: boolean
    config:
      source: local
      local:
        folder: output/extract/
        recursive: true
  mappers:
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    mappings:
    - name: alias_stream_names
      config:
        stream_maps:
          user_prod_1_test:
            __source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
          question_prod_1_test:
            __source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
environments:
- name: dev
```
I'll be back!
**nelis_verhoef:**
Thanks so much for digging into this, team! I really like the look of that `tap-smoke-test` extractor; I will absolutely use it to debug going forward.
**visch:**
https://github.com/MeltanoLabs/meltano-map-transform/issues/92
`meltano-map-transform` has a bug. I didn't dive into fixing it, but what you were seeing is a pretty bad bug: if you pass in a stream map like this (fyi @edgar_ramirez_mondragon):
```json
{
  "stream_maps": {
    "question_prod_1_test": {
      "__source__": "User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod"
    }
  }
}
```
but the first `SCHEMA` message is from `User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod`, the stream map will just be ignored. This is really hard to debug with `meltano run` and family, which is why @nelis_verhoef's question about how to debug this is very valid. See https://github.com/meltano/meltano/issues/3232 (please upvote this).
@nelis_verhoef I'll dive into this fix, but I want to get you up and running first, and I had a thought last night: the "stream name splits" feature shouldn't be unhelpful, and if it is, you should be able to turn it off. I'm going to put together a config var on `target-postgres` for `stream_name_splits`, and I'll default it to True. There are a couple of other things I'll get in with this too. Stay tuned.
@nelis_verhoef give the latest target-postgres a shot, with the `stream_name_splits` config flag set to `False`; that should do the trick for you. PR: https://github.com/MeltanoLabs/target-postgres/pull/59. I'm working on getting SDK issues in for this as well, to see whether we want to support this globally.
https://github.com/meltano/sdk/issues/1319 is the SDK issue vouching for this to get added to the SDK 🙂
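In loader config, that would presumably look something like the sketch below (the flag name is taken from the PR discussion; treat it as tentative, since the change was still in flight at this point):
```yaml
loaders:
- name: target-postgres
  variant: meltanolabs
  config:
    stream_name_splits: false  # proposed flag from PR #59: keep the full stream name instead of splitting on "-"
```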
@nelis_verhoef if we want to make `stream_maps` work, then we should use aliases instead:
```json
{
  "stream_maps": {
    "User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod": {
      "__alias__": "user_test"
    },
    "Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod": {
      "__alias__": "question_test"
    }
  }
}
```
Works as you'd expect
Summary of all of this:
1. Alias works better than source here, so it seems: https://meltano.slack.com/archives/C01TCRBBJD7/p1673386444248969?thread_ts=1671688563.297449&cid=C01TCRBBJD7
2. Using source has a major bug (the bug may be in the SDK, but I'm not certain): https://github.com/MeltanoLabs/meltano-map-transform/issues/92
3. All of the mapping can be avoided if we toggle this functionality in `target-postgres`, so I implemented it here: https://github.com/MeltanoLabs/target-postgres/pull/59

Other things we may want to do:
1. If we detect dashes in the stream name, we should log that it is happening so folks can look at the `stream_name_splits` config. Seems like a low-cost, high-value thing to do.
2. It's really hard to debug mappers:
   a. Can't invoke a mapper
   b. Can't dump config for a mapper
3. The smoke test should offer key properties, or a way to override them, as the catalog indicator for primary keys doesn't work.
**nelis_verhoef:**
Excellent, thanks for the `target-postgres` fix and the explanation around this, @visch! I'll give the Postgres solution a crack.
I'm so sorry, it's me again. With the uppercase stream name changes you made in the PR, I think there is a small bug (or I just need to change something on my end). Because of adding the `"` quoting to let Postgres read case-sensitive table names, we'd need to wrap only the table name in `"`, and be sure not to wrap the fully qualified table name in `"`. I.e. what I'm seeing is errors like these:
```
2023-01-11T23:47:31.827337Z [info     ] sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" does not exist cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-11T23:47:31.828738Z [info     ]                                cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-11T23:47:31.830161Z [info     ] [SQL: ALTER TABLE "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER] cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
```
So from my understanding, the issue is this command:
`ALTER TABLE "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER`
I think it should be:
`ALTER TABLE tap_dynamo_db."Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER`
So for this one we'd need to pass the table and schema separately to https://github.com/MeltanoLabs/target-postgres/blob/b580cdeee453c7945d9f9101ad3e75e41ee4245d/target_postgres/connector.py#L148, but we'd have to do that across the board, right? Or we could just change https://github.com/MeltanoLabs/target-postgres/blob/b580cdeee453c7945d9f9101ad3e75e41ee4245d/target_postgres/sinks.py#L224 to something like this:
```python
from typing import Optional  # needed at module top for the annotation

def conform_name(self, name: str, object_type: Optional[str] = None) -> str:
    """Conform names of tables, schemas, and column names."""
    return name.replace("-", "_").lower()
```
and then remove the explicit handling of cased stream/table names? (That would conform `Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod` to `question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod`, for example.)
**visch:**
Can't believe I made that mistake and my tests didn't catch it! I'll fix it first thing in the morning.
And add tests!
@nelis_verhoef I had to revert the change we made to fix your issue in `target-postgres`. I'm working through this, but it's going to be a bit before I can get the toggle in place. If you're interested, I wrote up what happened here: https://github.com/MeltanoLabs/target-postgres/issues/58. Getting close though!
**nelis_verhoef:**
Thanks for all the work on this one, @visch!