# troubleshooting
j
hello, i'm trying to tap postgres and target bigquery. i have a column in one of my db tables that is type jsonb, but for some reason the target type is set to REPEATED JSON so it is expecting an array and failing when it does not find one. does anyone know why the target column has the wrong type, and how to fix it? thank you!
v
can you share more information, best case would be a replicable example into target-bigquery, at a minimum the stack trace you have!
`meltano.yml` would be great too
j
here's the relevant pieces of `meltano.yml`:
```yaml
version: 1
plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      database: db
      host: [redacted]
      port: [redacted]
      user: [redacted]
      filter_schemas:
      - public
  loaders:
  - name: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      project: [redacted]
      credentials_path: google_credentials.json
      denormalized: true
      method: batch_job
      upsert: true
      dedupe_before_upsert: true
  - name: target-bq-canvas
    inherit_from: target-bigquery
    config:
      dataset: lk_canvas
```
will grab a stacktrace in a minute
here's the table in postgres - note the `narrative_json` field has type `jsonb`
and here you can see how it shows up in bigquery - note the REPEATED on `narrative_json`:
still trying to pull the stacktrace, but it's just an error about being unable to parse the json as an array, and the above mismatch of data types is the root cause as far as i can tell
here's the trace @visch :
```
2024-10-24T18:28:36.066315Z [info     ] 2024-10-24 18:28:36,049 | ERROR    | target-bigquery      | Draining all sinks and terminating. cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.144763Z [info     ] Exception in thread Thread-7:  cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.145681Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.146994Z [info     ]   File "/usr/local/lib/python3.9/threading.py", line 980, in _bootstrap_inner cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.147987Z [info     ]     self.run()                 cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.148836Z [info     ]   File "/project/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/target_bigquery/batch_job.py", line 63, in run cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.149286Z [info     ]     client.load_table_from_file( cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.149794Z [info     ]   File "/project/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/cloud/bigquery/job/base.py", line 966, in result cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.150739Z [info     ]     return super(_AsyncJob, self).result(timeout=timeout, **kwargs) cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.151180Z [info     ]   File "/project/.meltano/loaders/target-bigquery/venv/lib/python3.9/site-packages/google/api_core/future/polling.py", line 261, in result cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.152081Z [info     ]     raise self._exception      cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
2024-10-24T18:28:38.152554Z [info     ] google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.; reason: invalid, message: Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.; reason: invalid, message: Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0; reason: invalid, message: Error while reading data, error message: JSON parsing error in row starting at position 0: Repeated field must be imported as a JSON array. Field: data. cmd_type=elb consumer=True job_name=dev:tap-postgres-to-target-bq-canvas name=target-bq-canvas producer=False run_id=8c88d629-1686-4255-bf02-fbce0ac29643 stdio=stderr string_id=target-bq-canvas
```
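The key line in the trace is "Repeated field must be imported as a JSON array". As a toy illustration of that mismatch (this is not the loader's actual validation code, just a sketch of the rule BigQuery enforces): a column in REPEATED mode only accepts a JSON array, while a `jsonb` object arrives as a single JSON object.

```python
import json

def validate_repeated(value_json: str) -> bool:
    """A BigQuery REPEATED column requires the value to be a JSON array."""
    return isinstance(json.loads(value_json), list)

# a jsonb object from postgres: rejected by a REPEATED column
validate_repeated('{"chapter": 1}')    # False
# the same value wrapped in an array: accepted
validate_repeated('[{"chapter": 1}]')  # True
```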
v
You could try overriding the schema in the tap to be an array, not certain the bigquery target will support that but that might help you get unblocked here!
j
yeah, trying to force the column in the tap to an array or force the column in the target to non-repeated were my first thoughts - i wasn't sure what the right way to do that is, i'm somewhat new to meltano
(second thought was to use a stream-map (?) to put the json into an array on its way over, but that seems worse)
v
What I'm not sure about is if the type mapping will work or not with bigquery, but it's certainly worth a shot!
j
ah perfect, thanks! yeah somehow i didn't find that documentation when googling.
v
Not a bigquery user myself so 🤷
I've had trouble with google and meltano too, meltano's search is pretty good
j
i got it all to work by setting `type: ["object", "null"]` for all the json fields in postgres. unclear why that isn't how they were already being exposed 🤷
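This fix can live in `meltano.yml` via Meltano's `schema` extra on the extractor, which overrides the discovered JSON schema per stream and column. A sketch, where the stream id `public-my_table` is a hypothetical placeholder (tap-postgres stream ids follow a schema-table pattern); `narrative_json` is the column from the thread:

```yaml
plugins:
  extractors:
  - name: tap-postgres
    # Override the discovered JSON schema for one column so the
    # BigQuery target maps it to a plain JSON column, not REPEATED.
    schema:
      public-my_table:        # hypothetical stream id
        narrative_json:
          type: ["object", "null"]
```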
thanks again!
v
https://meltano.slack.com/archives/C069CQNHDNF/p1729793238310739?thread_ts=1729784981.090099&cid=C069CQNHDNF Can you add `is_nullable` to that query, and then be sure to either clear the cache or run with `--refresh-catalog`. To be safe with the catalog cache just run a `rm -rf .meltano` and then run `meltano invoke --dump=catalog tap-json > catalog.json` and shoot over that file. Might be a bug in tap-postgres - we could dump that all into an issue
j
sorry not quite following - where did you want me to add `is_nullable`? and did you mean `tap-postgres` instead of `tap-json` in the `invoke` command at the end?
v
```sql
select
  column_name,
  data_type,
  is_nullable
from information_schema.columns
```
I see how it's confusing.
1. I'd like the results of the select with `is_nullable`
2. I'd like a dump of the discovery call from tap-postgres, so `meltano invoke tap-postgres --discover > catalog.json`
j
yeah interesting, i think the ones that have `is_nullable` as `YES` are the ones that were causing me trouble
v
one more field sorry, can you put `data_type` in there
j
oh that's the result for `where data_type='jsonb'`
so `catalog.json` is 85000 lines long ..... but the json columns say
```json
"type": [
  "string",
  "number",
  "integer",
  "array",
  "object",
  "boolean",
  "null"
]
```
(both the is_nullable ones and not)
v
ahh there it is!
https://github.com/MeltanoLabs/tap-postgres/blob/main/tap_postgres/client.py#L58 We should probably make a table in the Readme that maps postgres column types to the JSON schema, just to make it more clear, and then explain why it's not just `object` for this case. There are a couple good reasons (we started with `object`), I just can't remember them off the top of my head
Glad you have it fixed, I was concerned there was a bug!
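The two mappings being debated can be sketched in plain Python. This is an illustration of the behavior discussed, not the actual tap-postgres code; the `json_as_object` flag mirrors the option proposed later in the thread:

```python
# Sketch of two possible JSON-schema mappings for a postgres jsonb column
# (illustrative only, not the real tap-postgres implementation).

# The permissive mapping the catalog dump showed: jsonb can hold any JSON value.
PERMISSIVE_JSONB_SCHEMA = {
    "type": ["string", "number", "integer", "array", "object", "boolean", "null"]
}

# The narrower mapping a json_as_object-style option would produce.
OBJECT_ONLY_JSONB_SCHEMA = {"type": ["object", "null"]}

def jsonb_schema(json_as_object: bool = False) -> dict:
    """Return the JSON schema to advertise for a jsonb column."""
    return OBJECT_ONLY_JSONB_SCHEMA if json_as_object else PERMISSIVE_JSONB_SCHEMA
```

The permissive variant is what led the BigQuery target to pick REPEATED JSON here; the narrower one matches the manual override that unblocked the pipeline.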
j
hm ok. maybe the default could be for it to just be `object`, and then there could be an option for this more expansive behavior? seems unexpected to me ...
v
probably has to be the opposite, but making that more known should be a thing - if you could throw an issue in referencing this it would be helpful! Or even submit a PR 😄
j
i could try to do a PR - maybe with an option similar to `dates_as_string` called `json_as_object`?
v
That sounds perfect to me
j
ok i'll look into doing that next week
does running the `tap-postgres` tests require a locally running postgres server? if so, what's the best way to get that running? i tried using docker:
```shell
docker run --name my_postgres -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=postgres -p 5432:5432 -d postgres
```
but am getting the error
```
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "localhost" (::1), port 5432 failed: FATAL:  role "postgres" does not exist
```
e
@Jonathan Cass are you on a Mac and by chance ran `brew services start postgres` at some point? Then the ports might be conflicting. I think you can check using `brew services`.
j
ah yes it looks like that might be the issue, thank you!
created PR as https://github.com/MeltanoLabs/tap-postgres/pull/526 - i'm pretty new to contributing to open source, so let me know if i did anything wrong and i can adjust. thanks!
v
Awesome @Jonathan Cass I took a quick stab https://github.com/MeltanoLabs/tap-postgres/pull/526#pullrequestreview-2410157890 I'm good with approving and merging @Edgar Ramírez (Arch.dev) I'll let you run with it though 😄
j
thanks @visch - updated the PR with your suggested change
v
Sorry for the bad idea 😄
j
😆
v
We'll merge and then release it
j
🚀
v
ty 😄
j
i'm having trouble figuring out how to update my tap-postgres to 0.0.16. i don't specify a version anywhere, and running `meltano lock --all --update` isn't updating it. deleting the lockfile and removing/readding it didn't help. i noticed that the new option isn't up on meltano hub yet - maybe that is the problem? if so, when does that get updated?
e
What's the value of `pip_url` for `tap-postgres` in your `meltano.yml`?
j
`git+https://github.com/MeltanoLabs/tap-postgres.git`
e
Then you should be getting the very latest from the `main` branch. You can try running `meltano install --clean` to refresh the plugin installation, or changing it to `meltanolabs-tap-postgres` (and also doing a `meltano install`)?
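As a side note: a git `pip_url` pointing at the repo always installs the latest commit on `main`, which is why no version bump was needed. For a reproducible install, the `pip_url` can instead pin the PyPI distribution - a sketch, assuming the 0.0.16 release mentioned above is the target:

```yaml
plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    # Pin a released version from PyPI instead of tracking main.
    pip_url: meltanolabs-tap-postgres==0.0.16
```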
j
yeah i think i was getting the latest version, i just didn't see the option documented in meltanohub or the lockfile because of the yml file linked above.
to clarify:
• i originally looked in my tap-postgres lockfile for the new json_as_object option. when i didn't see it, i thought that meant i wasn't using the latest version
• i did various reinstalls and that didn't change anything
• i noticed on Meltano Hub that it said "Please consider adding any settings you have defined locally to this definition on MeltanoHub by making a pull request to the YAML file that defines the settings for this plugin."
• i tried just using the new option even though it wasn't documented and it seemed to have an effect
• i concluded that i am indeed on the latest version, but my lockfile and meltano hub are not documenting the new option because the YAML file needs an explicit update
v
Ahh I get the confusion now - I thought you said you weren't getting the "latest version". You are getting the latest version, and you can override the `settings` attribute, but MeltanoHub doesn't have the new settings available automatically. Got it!
e
Yup, I was similarly confused 😅
I'll update the Hub
j
> I thought you said you weren't getting the "latest version"
yes, that's what i said, because that's what i thought haha
thanks @Edgar Ramírez (Arch.dev)!
e
Our automation is broken in a few places 🫠 so I'll just update the tap's metadata manually for now
Done!