nelis_verhoef
12/22/2022, 5:56 AMdynamodb
tap to target postgres
. After some struggling I got all the set up to work, created the relevant schema (of our horrible dynamo structure). I even got it to write to Postgres.
The issue I am running into is that all of my Dynamo tables end with with -prod
e.g Users-<UUID>-prod
or Organization-<UUID>-prod
etc. and for some reason when writing the data to postgres the table name just becomes prod
dropping the GUID and actual table name somewhere (I've done some digging but couldn't find where in the code the name gets changed). I though it might be an easy fix, I can just use a stream-map
and __alias__
each table, like this:
loaders:
- name: target-postgres
variant: meltanolabs
pip_url: git+<https://github.com/MeltanoLabs/target-postgres.git>
config:
stream_maps:
QuestionSet-<SomeGUID>-prod:
__alias__: prod_question_set
but as soon as I do that I get this error:
```2022-12-22T055311.422307Z [info ] +Sync Summary---------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T055311.423435Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T055311.428362Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T055311.429771Z [info ] | QuestionSet-<SomeGUID>-prod | LOG_BASED | 74 records | 20.2 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T055311.435444Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T055311.438289Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2022-12-22T055317.576898Z [info ] 2022-12-22 055317,574 Target 'target-postgres' is listening for input from tap. cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T055317.585458Z [info ] 2022-12-22 055317,582 Initializing 'target-postgres' target sink... cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T055317.587014Z [info ] 2022-12-22 055317,583 Initializing target sink for stream 'prod_question_set'... cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T055320.043043Z [info ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T055320.049124Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module> cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T055320.050546Z [info ] sys.exit(TargetPostgres.cli()) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2022-12-22T055320.052834Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/click/core.py", line 1130, in call cmd_type=elb consumer=True name=target-postgres producer=False st…nelis_verhoef
01/04/2023, 10:18 PMprod
with a heap of nulls in the columns and then I just clean it up with DBT, it would be much cleaner to have each source dump into it's own raw table though.aaronsteers
01/05/2023, 1:32 AMThe issue I am running into is that all of my Dynamo tables end with withI have run into the same issue when pulling from a past employer's DynamoDB source. Many/most SQL targets treate.g-prod
orUsers-<UUID>-prod
etc. and for some reason when writing the data to postgres the table name just becomesOrganization-<UUID>-prod
dropping the GUID and actual table name somewhere (I've done some digging but couldn't find where in the code the name gets changed).prod
-
as a delimiter, as in <db_name>-<schema_name>-<table_name>
or <schema_name>-<table_name>
. Targets which interpret the stream name this way, are not great with DynamoDB sources, since -
is very common in Dynamo source tables. Given this convention, the target interprets all of your tables as being named prod
, which is not at all what you want here.
In my case (before I had Meltano and before stream maps existed), I used the fact that the Singer catalog spec treats 'table_name' as an optional override for upstream table name mapping and 'stream' or 'stream_id' as the identifier to send downstream. So, I modified the catalog file to use DynamoDB source table name as 'table_name', and a common-sense 'stream' ID that didn't end in '-prod'.aaronsteers
01/05/2023, 1:33 AMaaronsteers
01/05/2023, 1:35 AMschema
manually anyway, perhaps this isn't much additional lift?aaronsteers
01/05/2023, 1:36 AMtable_name
per stream that points to the upstream DynamoDB source table, decoupled from the stream ID/name that would get passed downstream to the target.aaronsteers
01/05/2023, 1:36 AMnelis_verhoef
01/05/2023, 2:02 AMCan you say if you are already modifying/freezing a catalog file? If so, I recommend trying an overridenThanks @aaronsteers, I am not manually touching the catalog file at the moment (or at least I don't think I am), I have however created the schemas and saved them in a seperate yml file, here is a snippet of that file:per stream that points to the upstream DynamoDB source table, decoupled from the stream ID/name that would get passed downstream to the target.table_name
- name: tap-dynamodb
variant: singer-io
pip_url: tap-dynamodb
capabilities:
- discover
- state
- catalog
metadata:
Question-<123>-prod:
replication-key: _lastChangedAt
replication-method: LOG_BASED
User-<456>-prod:
replication-key: _lastChangedAt
replication-method: LOG_BASED
schema:
Question-<123>-prod:
__typename:
type: string
_deleted:
type: boolean
_lastChangedAt:
type: number
_ttl:
type: [number, 'null']
....
would I be able to just pass through the table_name
in the metadata block?
Or do I need to change the catalog file?aaronsteers
01/05/2023, 2:17 AMmetedata
and schema
entried are indexed/addressed by stream ID, I'm not sure if it would be possible to use this method to change them.nelis_verhoef
01/05/2023, 2:22 AMmeltano-map-transformer
? then I can get the stream names including -
then just rename them to use _
using the transformer and let target-postgres
just do its thing? Feels like it is the cleanest separation of concerns.
So something like this:
- name: meltano-map-transformer
variant: meltano
pip_url: git+<https://github.com/MeltanoLabs/meltano-map-transform.git>
mappings:
- name: alias_stream_names
config:
stream_maps:
User_prod:
__source__: User-<123>-prod
__alias__: prod_user
Question_prod:
__source__: Question-<456>-prod
__alias__: prod_question
nelis_verhoef
01/05/2023, 2:23 AMaaronsteers
01/05/2023, 2:37 AM__source__
and __alias__
are intended as inverse alternatives of basically the same operation. How about this syntax?
- name: meltano-map-transformer
variant: meltano
pip_url: git+<https://github.com/MeltanoLabs/meltano-map-transform.git>
mappings:
- name: alias_stream_names
config:
stream_maps:
User-<123>-prod:
__alias__: prod_user
Question-<456>-prod:
__alias__: prod_question
Or conversely:
- name: meltano-map-transformer
variant: meltano
pip_url: git+<https://github.com/MeltanoLabs/meltano-map-transform.git>
mappings:
- name: alias_stream_names
config:
stream_maps:
prod_user:
__source__: User-<123>-prod
prod_question:
__source__: Question-<456>-prod
nelis_verhoef
01/05/2023, 3:30 AM__alias__
on it's own previously but I've not tried __source__
on its own yet (until now):
```2023-01-05T025647.479247Z [info ] +Sync Summary---------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T025647.482014Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T025647.483498Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T025647.485555Z [info ] | Question-<123>-prod | LOG_BASED | 566 records | 83.2 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T025647.489006Z [info ] | QuestionSet-<456>-prod | LOG_BASED | 70 records | 38.9 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T025647.491657Z [info ] | User-<789>-prod | LOG_BASED | 204 records | 51.4 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T025647.492917Z [info ] +---------------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T025647.505035Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T025648.375200Z [info ] time=2023-01-05 025648 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T025649.412441Z [info ] time=2023-01-05 025649 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T025650.536650Z [info ] time=2023-01-05 025650 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T025651.542919Z [info ] time=2023-01-05 025651 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T025652.334863Z [info ] time=2023-01-05 025652 name=target_postgres level=INFO message=Loading 70 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T025653.281289Z [info ] time=2023-01-05 025653 name=target_postgres level=INFO message=Loading 566 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T025653.303586Z [info ] time=2023-01-05 025653 name=target_postgres level=INFO message=Loading 566 rows into 'tap_dynamodb."question_prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T025653.354515Z [info ] time=2023-01-05 025653 name=target_postgres level=INFO message=Loading 204 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T025653.639268Z [info ] time=2023-01-05 025653 name=target_postgres level=INFO message=Loading into tap_dynamodb…aaronsteers
01/05/2023, 5:56 AMquestion_prod
) is being successfully cloned/aliased, although the original stream is still coming through.
2023-01-05T02:56:54.823194Z [info ] time=2023-01-05 02:56:54 name=target_postgres level=INFO message=Loading into tap_dynamodb."prod": {"inserts": 0, "updates": 566, "size_bytes": 549053} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-05T02:56:55.053296Z [info ] time=2023-01-05 02:56:55 name=target_postgres level=INFO message=Loading into tap_dynamodb."question_prod": {"inserts": 566, "updates": 0, "size_bytes": 549053} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
aaronsteers
01/05/2023, 6:00 AM# ...
stream_maps:
question_prod:
__source__: Question-<123>-prod
questionset_prod:
__source__: QuestionSet-<456>-prod
user_prod:
__source__: User-<789>-prod
Question-<123>-prod: null
QuestionSet-<456>-prod: null
User-<789>-prod: null
aaronsteers
01/05/2023, 6:01 AM# ...
stream_maps:
Question-<123>-prod:
__alias__: question_prod
QuestionSet-<456>-prod:
__alias__: questionset_prod
User-<789>-prod:
__alias__: user_prod
aaronsteers
01/05/2023, 6:03 AMnelis_verhoef
01/05/2023, 6:05 AMAre the carrot-bracketed numbers literally part of the table name, or is that a placeholder?Nope, it us just a place holder, they look something like this
Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
nelis_verhoef
01/05/2023, 6:07 AMnull
, in the next 10-15 minutes.
Thanks for all the help with this one!aaronsteers
01/05/2023, 6:36 AMNope, it us just a place holder, they look something like thisMakes sense. Just wanted to ask, in case special characters were part of the issue (and sounds like they are not). To help isolate the issue (especially since they all end inQuestion-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
-prod
), it might be worthwhile tackling one stream at a time - setting the other streams to null
in the stream map or deselecting all the others with select
or filter
, enabling and aliasing each one at a time to control for others that might be sneaking through.nelis_verhoef
01/05/2023, 6:53 AMstream_maps:
question_prod:
__source__: Question-<123>-prod
questionset_prod:
__source__: QuestionSet-<456>-prod
user_prod:
__source__: User-<789>-prod
Question-<123>-prod: null
QuestionSet-<456>-prod: null
User-<789>-prod: null
It threw this new error, so again it happily did Question
but failed on QuestionSet
because it's missing the key_properties
field.
```2023-01-05T063944.641303Z [info ] INFO Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Starting sync cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063944.642359Z [info ] INFO Syncing log based for stream: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063944.643366Z [info ] INFO Clearing state because stream has aged out cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063944.644469Z [info ] INFO Must complete full table sync before replicating from dynamodb streams for Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063944.645477Z [info ] INFO Scanning table Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod with params: cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063944.646394Z [info ] INFO TableName = Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063944.647303Z [info ] INFO Limit = 1000 cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063944.651679Z [info ] 2023-01-05 063944,612 | INFO | root | Set null tansform as default for 'Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T063944.657629Z [info ] 2023-01-05 063944,622 | INFO | root | Set null tansform as default for 'Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T063945.246880Z [info ] INFO Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Completed sync (566 rows) cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063945.258943Z [info ] INFO QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: Starting sync cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063945.262879Z [info ] INFO Syncing log based for stream: QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063945.264479Z [info ] INFO Clearing state because stream has aged out cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063945.265870Z [info ] INFO Must complete full table sync before replicating from dynamodb streams for QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-05T063945.269818Z [info ] 2023-01-05 063945,264 | INFO | root | Set null tansform as default for 'QuestionSet-pzjs3twtm2y4ip6uvfvy6ukhpa-prod' cmd_type=elb consumer=True name=meltano-map-transformer producer=True stdio=stderr string_id=meltano-map-transformer
2023-01-05T063945.272520Z [info ] 2023-01-05 063945,268 |…nelis_verhoef
01/05/2023, 6:54 AMselect
list...nelis_verhoef
01/05/2023, 10:49 PMkey_properties
error when running one by one.
Am I just completely trying to misuse the meltano-map-transformer
or is this weird/unexpected behaviour?
This might be a silly question but is there a straight forward way for me to debug this? For example could I get the raw and transformed streams to write to disk, that way I can step trough where things go wrong?aaronsteers
01/06/2023, 2:36 AMAll streams fail with the sameThis is still progress, I think.error when running one by one.key_properties
aaronsteers
01/06/2023, 2:39 AMAm I just completely trying to misuse theDo I understand correctly that you are seeing the streams come through (more-or-less) correctly under the aliased names? The failure around key_properties is a separate issue (in the target layer) and has a couple options to mitigateor is this weird/unexpected behaviour?meltano-map-transformer
aaronsteers
01/06/2023, 2:44 AMnelis_verhoef
01/06/2023, 5:55 AM___source___
but then continues to complaine about the key properties if I set the original stream to null
so if my mapper looks like this:
mappings:
- name: alias_stream_names
config:
stream_maps:
user_prod_1:
__source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
"__key_properties__": ["id"]
User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod: null
it hates life.
2023-01-06T05:44:51.346010Z [info ] +Sync Summary--------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.350770Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.355206Z [info ] +--------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.357058Z [info ] | User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED | 218 records | 35.3 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.358449Z [info ] +--------------------------------------+--------------------+---------------+---------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.359695Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T05:44:51.519742Z [info ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.522662Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module> cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.524737Z [info ] sys.exit(main()) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.526232Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/target_postgres/__init__.py", line 373, in main cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.527487Z [info ] persist_lines(config, singer_messages) cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.529310Z [info ] File "/workspaces/mh_warehouse/cdk/src/mh_warehouse/.meltano/loaders/target-postgres/venv/lib/python3.9/site-packages/target_postgres/__init__.py", line 197, in persist_lines cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.530479Z [info ] raise Exception("key_properties field is required") cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T05:44:51.532066Z [info ] Exception: key_properties field is required cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T054451.626756Z [error ] Loader failed
If I just let it write to the prod
table i.e mapper looks like this (while only selecting the User table in the dynamodb tap) and commenting out the key properties:
```mappings:
- name: alias_stream_names
config:
stream_maps:
user_prod_1:
…nelis_verhoef
01/06/2023, 6:00 AMmappers:
- name: meltano-map-transformer
variant: meltano
pip_url: git+<https://github.com/MeltanoLabs/meltano-map-transform.git>
mappings:
- name: alias_stream_names
config:
stream_maps:
user_prod_1:
__source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
question_prod_1:
__source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
it is like the stream mapper is not iterating through the tables/streams anymore:
```2023-01-06T055822.994755Z [info ] +Sync Summary------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T055822.998344Z [info ] | table name | replication method | total records | write speed | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T055823.000691Z [info ] +------------------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T055823.004125Z [info ] | Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED | 567 records | 102.6 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T055823.010621Z [info ] | User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod | LOG_BASED | 218 records | 81.9 records/second | cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T055823.012924Z [info ] +------------------------------------------+--------------------+---------------+----------------------+ cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T055823.015614Z [info ] INFO Done syncing. cmd_type=elb consumer=False name=tap-dynamodb producer=True stdio=stderr string_id=tap-dynamodb
2023-01-06T055823.609215Z [info ] time=2023-01-06 055823 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T055824.711233Z [info ] time=2023-01-06 055824 name=target_postgres level=INFO message=Table '"question_prod_1"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T055826.273088Z [info ] time=2023-01-06 055826 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T055827.332930Z [info ] time=2023-01-06 055827 name=target_postgres level=INFO message=Table '"prod"' exists cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T055828.632480Z [info ] time=2023-01-06 055828 name=target_postgres level=INFO message=Loading 567 rows into 'tap_dynamodb."question_prod_1"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T055828.763276Z [info ] time=2023-01-06 055828 name=target_postgres level=INFO message=Loading 567 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T055828.863333Z [info ] time=2023-01-06 055828 name=target_postgres level=INFO message=Loading 218 rows into 'tap_dynamodb."prod"' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-06T055829.749532Z [info ] time=2023-01-06 055829 name=target_postgres level=INFO message=Loading into tap_dynamodb."questio…edgar_ramirez_mondragon
01/06/2023, 7:22 PMIt now writes data to theDoes it not output anything for thetable, for questions, but it completely ignores the previously workingquestion_prod_1
table.user_prod_1
user_*
table or it just doesn’t map it? (I’m thinking of a potential issue with state)nelis_verhoef
01/08/2023, 10:44 PMDoes it not output anything for theNope, no data is written to thetable or it just doesn’t map it? (I’m thinking of a potential issue with state)user_*
user
table - it's as if there is no mapping to it.edgar_ramirez_mondragon
01/09/2023, 6:39 PMvisch
01/09/2023, 8:15 PMvisch
01/09/2023, 8:58 PMcat out | meltano invoke target-postgres
Also my tables are question_prod_1
as I'd expect as well.visch
01/09/2023, 8:59 PMvisch
01/09/2023, 9:07 PMvisch
01/09/2023, 9:48 PMvisch
01/09/2023, 9:48 PMversion: 1
send_anonymous_usage_stats: true
project_id: "target-postgres"
default_environment: "dev"
plugins:
extractors:
- name: "tap-smoke-test"
namespace: "tap_smoke_test"
executable: "tap-smoke-test"
pip_url: "git+<https://gitlab.com/meltano/tap-smoke-test.git>"
config:
streams:
- stream_name: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
"input_filename": "<https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/animals-data.jsonl>"
- stream_name: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
"input_filename": "<https://gitlab.com/meltano/tap-smoke-test/-/raw/main/demo-data/pageviews-data.jsonl>"
loaders:
- name: "target-postgres"
namespace: "target_postgres"
pip_url: -e .
config:
sqlalchemy_url: "<postgresql://postgres:postgres@localhost:5432/postgres>"
target_schema: test
stream_maps:
user_prod_1:
__source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
question_prod_1:
__source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
settings:
- name: sqlalchemy_url
kind: password
- name: target-singer-jsonl
namespace: target_singer_jsonl
pip_url: "git+<https://github.com/visch/target-singer-jsonl@patch-1>"
executable: target-singer-jsonl
settings:
- name: destination
kind: string
- name: local.folder
kind: string
- name: s3.bucket
kind: string
- name: s3.prefix
kind: string
- name: add_record_metadata
kind: boolean
config:
source: local
local:
folder: output/extract/
recursive: true
mappers:
- name: meltano-map-transformer
variant: meltano
pip_url: git+<https://github.com/MeltanoLabs/meltano-map-transform.git>
mappings:
- name: alias_stream_names
config:
stream_maps:
user_prod_1_test:
__source__: User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
question_prod_1_test:
__source__: Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
environments:
- name: dev
I'll be back!nelis_verhoef
01/09/2023, 9:57 PMtap-smoke-test
extractor, will absolutely use it to debug going forward.visch
01/10/2023, 4:04 PMmeltano-map-transform
has a bug I didn't dive into fixing it but what you were seeing was is a pretty bad bug where if you pass in a stream map like. fyi @ @edgar_ramirez_mondragon
{
"stream_maps": {
"question_prod_1_test": {
"__source__": "User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod"
}
}
}
but the first SCHEMA
message is from the User-pz-pzjs3twtm2y4ip6uvfvy6ukhpa-prod
the stream map will just be ignored.
This is really hard to debug with meltano run
and family which is why @nelis_verhoef’s quesiton about how to debug this is very valid. See https://github.com/meltano/meltano/issues/3232 (please upvote this)visch
01/10/2023, 4:06 PMtarget-postgres
for stream_name_splits
, and I"ll default it to True. There's a couple other things I'll get in with this too. Stay tuned.visch
01/10/2023, 8:29 PMstream_name_splits
config flag set to False
that should do the trick for you. PR https://github.com/MeltanoLabs/target-postgres/pull/59
I'm working on getting SDK issues in for this as well to see if we want to support this globally or notvisch
01/10/2023, 9:16 PMvisch
01/10/2023, 9:34 PMstream_maps
work then we should use alias's
instead
{
"stream_maps": {
"User-pzjs3twtm2y4ip6uvfvy6ukhpa-prod": {
"__alias__": "user_test"
},
"Question-pzjs3twtm2y4ip6uvfvy6ukhpa-prod": {
"__alias__": "question_test"
}
}
}
Works as you'd expectvisch
01/10/2023, 9:37 PMtarget-postgres
so I implemented it here https://github.com/MeltanoLabs/target-postgres/pull/59
Other things we may want to do
1. If we detect dashes in the stream name we should log that this is happening so folks can look at the config stream_name_splits. Seems like a low cost high value thing to do
2. It's really hard to debug mappers.
a. Can't invoke a mapper
b. Can't dump config for a mapper
3. Smoke test should offer key properties or offer a way to override them as the catalog indicator for primary keys doesn't work.nelis_verhoef
01/11/2023, 6:22 AMtap-postgres
fix and the explanation around this @visch! I'll give the postgres solution a cracknelis_verhoef
01/12/2023, 12:08 AM"
to enable postgres to read case sensitive table names we 'd need to only wrap the table name in "
and be sure to not wrap the fully qualified table name in "
. i.e what I'm seeing happening is that I get errors like these:
2023-01-11T23:47:31.827337Z [info ] sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" does not exist cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-11T23:47:31.828738Z [info ] cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-01-11T23:47:31.830161Z [info ] [SQL: ALTER TABLE "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER] cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
so from my understanding the issue is this command:
ALTER TABLE "tap_dynamo_db.Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER
I think it should be:
ALTER TABLE tap_dynamo_db."Question_pzjs3twtm2y4ip6uvfvy6ukhpa_prod" ADD COLUMN "_sdc_table_version" INTEGER
nelis_verhoef
01/12/2023, 12:15 AMdef conform_name(self, name: str, object_type: Optional[str] = None) -> str:
"""Conforming names of tables, schemas, column names."""
return name.replace("-", "_").lower()
and then remove the explicit handling of cased stream/table names?visch
01/12/2023, 1:50 AMvisch
01/12/2023, 1:50 AMvisch
01/12/2023, 3:09 PMtarget-postgres
I'm working through this, but it's going to a bit before I can get the toggle in place if you're interested I wrote up what happened here https://github.com/MeltanoLabs/target-postgres/issues/58
Getting close though!nelis_verhoef
01/17/2023, 4:04 AM