I'm trying to use `tap-postgres` with `target-bigq...
# plugins-general
c
I'm trying to use
tap-postgres
with
target-bigquery
. Both are "standard" flavors: transferwise and adswerve, respectively. But when I come upon a record that has an array type, I get the following error:
Copy code
target-bigquery       | CRITICAL 'type' or 'anyOf' are required fields in property: {'$ref': '#/definitions/sdc_recursive_string_array'}
target-bigquery       | CRITICAL ['Traceback (most recent call last):\n', '  File "/home/chb/code/work/learning_equality/le-pipeline/meltano-projects/get-star
t/.meltano/loaders/target-bigquery/venv/lib/python3.7/site-packages/target_bigquery/__init__.py", line 93, in main\n    for state in state_iterator:\n', '  Fi
le "/home/chb/code/work/learning_equality/le-pipeline/meltano-projects/get-start/.meltano/loaders/target-bigquery/venv/lib/python3.7/site-packages/target_bigq
uery/process.py", line 40, in process\n    for s in handler.handle_record_message(msg):\n', '  File "/home/chb/code/work/learning_equality/le-pipeline/meltano
-projects/get-start/.meltano/loaders/target-bigquery/venv/lib/python3.7/site-packages/target_bigquery/processhandler.py", line 110, in handle_record_message\n
    new_rec = filter_by_schema(schema, msg.record)\n', '  File "/home/chb/code/work/learning_equality/le-pipeline/meltano-projects/get-start/.meltano/loaders/
target-bigquery/venv/lib/python3.7/site-packages/target_bigquery/schema.py", line 75, in filter\n    record[key])  # adswerve fix to match schema field name\n
', '  File "/home/chb/code/work/learning_equality/le-pipeline/meltano-projects/get-start/.meltano/loaders/target-bigquery/venv/lib/python3.7/site-packages/tar
get_bigquery/schema.py", line 84, in filter\n    prop_type, _ = get_type(props)\n', '  File "/home/chb/code/work/learning_equality/le-pipeline/meltano-project
s/get-start/.meltano/loaders/target-bigquery/venv/lib/python3.7/site-packages/target_bigquery/schema.py", line 23, in get_type\n    f"\'type\' or \'anyOf\' ar
e required fields in property: {property}"\n', "ValueError: 'type' or 'anyOf' are required fields in property: {'$ref': '#/definitions/sdc_recursive_string_ar
ray'}\n"]
I've looked at what I assume is the underlying json at
.meltano/run/tap-postgres/tap.properties.json
and I see what appear to be type definitions:
Copy code
"definitions": {
...

 "sdc_recursive_string_array": {
  "type": [
    "null",
    "string",
    "array"
  ],
   "items": {"$ref": "#/definitions/sdc_recursive_string_array"
   }
  ,
I encountered this on python 3.8.7 and 3.7.10
There's this note in the docs that states: "`data_flattening_max_level`
Integer
(Default: 0) Object type RECORD items from taps can be loaded into VARIANT columns as JSON (default) or we can flatten the schema by creating columns automatically. When value is 0 (default) then flattening functionality is turned off."
Looks like the same issue: https://github.com/RealSelf/target-bigquery/issues/12 , but I'm not using the realself variant, rather adwise
d
Are you comfortable enough with Python to see if you can fix the issue in target-bigquery yourself?
c
Hi, Douwe. Thanks for the response. I'm hesitant to go too deep on finding a solution to this issue as we're still evaluating products in this domain. But let me have a look.
d
Yeah, makes sense. I wonder if another variant of tap-postgres that doesn’t use these $refs, or if there’s another variant of target-bigquery that supports them, for example https://github.com/transferwise/pipelinewise-target-bigquery
I suggest adding that as a custom plugin and seeing if it works. Although the fact that the description still says Snoeflake does not inspire confidence 😅
c
Haha, I'll give it a shot. I've got my eye on the simplejson dependency that seems at the heart of everything. Apparently, the initial RealSelf project calls for 3.11.1, whereas meltano wants <4.0.0,>=3.16.0. There's also the singer-python lib too take into consideration...
This is interesting...
d
@casey What are you thinking?
c
you advised another user to see if the pipelinewise fork was a drop-in replacement for the singer-python dependency... and I think it would be good to check that out... look at the divergence in versioning between each set of dependencies!
d
Yeah, PipelineWise's taps and targets and their singer-python are a lot better maintained than the "official" ones under singer-io
c
how easy would it be to point meltano to the pipelinewise variants? Forking meltano, cloning to a local repo, changing the requirments, and then installing via pip from that local repo?
Much easier
That's if you want to use one of the pipelinewise taps or targets in Meltano that isn't supported out of the box yet.
If you want to move a specific tap or target over to pipelinewise-singer-python, that would require forking and modifying
c
are you saying that, if I were to reference a pipelinewise tap or target in my pipeline, meltano would bring in the pipelinewise fork of singer-python automatically?
and I could avoid singer-python?
d
Most likely, yes, because that tap or target would itself define pipelinewise-singer-python as a dependency
But Meltano just installs each plugin using
pip install <pip_url>
, it doesn't specifically inject any new dependencies
c
This sounds good. I'm going to give it a shot. Thanks for the lead.
d
Let me know how it goes!
c
Will do!
Well, I was able to install
Copy code
pipelinewise-singer-python==1.2.0
pipelinewise-tap-postgres==1.7.1
pipelinewise-target-bigquery==1.0.1
but the problem is that, if I let the version of meltano "float" so to speak, pip ends up installing
0.15.1
. No joke--I saw it literally iterate over each version, from 1.69 to 0.15.1, trying to find something that was compatible with the previously installed packages from pipelinewise. The deps at the hear of the matter are simpleson and jsonschema. It's as if the various maintainers agreed to pin versioning in an interleaved fashion, with the transferwise/pipelinewise team pinning on every other range of versions of these two libraries and meltano's choosing the interleaving ranges.
so I get the following:
Copy code
meltano 1.69.0 requires jsonschema<3.0.0,>=2.6.0, but you have jsonschema 3.2.0 which is incompatible. 
meltano 1.69.0 requires simplejson<4.0.0,>=3.16.0, but you have simplejson 3.11.1 which is incompatible

and for meltano 1.68

meltano 1.68.0 depends on simplejson<4.0.0 and >=3.16.0       
pipelinewise-singer-python 1.2.0 depends on simplejson==3.11.1

and for meltano 1.67

The conflict is caused by:                                         
    meltano 1.67.0 depends on jsonschema<3.0.0 and >=2.6.0         
    pipelinewise-singer-python 1.2.0 depends on jsonschema==3.2.0  
    meltano 1.67.0 depends on jsonschema<3.0.0 and >=2.6.0         
    pipelinewise-singer-python 1.1.4 depends on jsonschema==3.2.0  
    meltano 1.67.0 depends on jsonschema<3.0.0 and >=2.6.0         
    pipelinewise-singer-python 1.1.3 depends on jsonschema==3.2.0  
    meltano 1.67.0 depends on simplejson<4.0.0 and >=3.16.0        
    pipelinewise-singer-python 1.1.2 depends on simplejson==3.11.1 
    meltano 1.67.0 depends on simplejson<4.0.0 and >=3.16.0        
    pipelinewise-singer-python 1.1.1 depends on simplejson==3.11.1 
    meltano 1.67.0 depends on simplejson<4.0.0 and >=3.16.0        
    pipelinewise-singer-python 1.1.0 depends on simplejson==3.11.1 
    meltano 1.67.0 depends on simplejson<4.0.0 and >=3.16.0        
    pipelinewise-singer-python 1.0.0 depends on simplejson==3.11.1
d
@casey If we're installing dependencies in the same virtual environment where Meltano lives, something is going wrong 😬 As mentioned under http://meltano.com/docs/command-line-interface.html#add, Meltano automatically installs every plugin in a dedicated venv with its own set of dependencies, exactly to prevent dependency conflicts between different plugins and Meltano itself. There should be no need to install
(pipelinewise-)singer-python
where
meltano
is installed, if you're using
meltano add
to add specific plugins.
@casey Let me know if you'd like to jump on a call for a moment so we can look at this together
j
I'm running into the same error when running
tap-postgres
target-bigquery
so jumping on this thread to follow updates 🙂
d
@jake_hannan Try adding pipelinewise-target-bigquery as a custom plugin and using that instead of adswerve's variant: https://meltano.slack.com/archives/C013EKWA2Q1/p1613597074060900?thread_ts=1613534128.056500&amp;cid=C013EKWA2Q1
j
yep will give that a shot when I have a moment today!
c
Okay. A little more progress: the pipelinewise repos seem to have updated their dependencies automatically, via a dependabot, in some cases only 4 commits after their last release, going from 3.11.1 to 3.17.2 for
simpleson
. So I just installed those via
pip install <git url>
. I cloned meltano and changed the
poetry.lock
and
pyproject.toml
files such that the
jsonschema
is
3.2.0
(it's at
2.6.0
, iirc). Then I did a
poetry build
in the
meltano
project directory. That get's me a wheel file that I then can use with
pip install ./meltano/dist/wheelfilename
.
d
@casey There should be no need to touch Meltano's own dependencies, since every plugin gets its own virtual environment with its own set of dependencies, so it's fine if Meltano uses one version of
jsonschema
, a tap uses another, and a target uses another still.
I would suggest creating a new, clean environment for Meltano and installing only Meltano into it using
pip install meltano
, and then adding new plugins (taps and targets) using
meltano add --custom
, so that they get their own virtual environment.
I'd be happy to jump on a quick Zoom call so we can look at the current state of things together and clear up any confusion around plugins, packages and dependencies 🙂
c
Let me give what you've just suggested a try.
@douwe_maan what version of Python do you suggest using with Meltano?
d
3.8.latest
But 3.7 and 3.6 are also supported
c
meltano invoke --plugin-type extractor pipelinewise-tap-postgres --help
fails after I add it as a custom plugin. Here's my `meltano.yml`:
Copy code
version: 1
send_anonymous_usage_stats: true
project_id: 77721c56-636c-4ea0-af68-95f764fc9319
plugins:
  extractors:
  - name: pipelinewise-tap-postgres
    namespace: pipelinewise_tap_postgres
    pip_url: git+<https://github.com/transferwise/pipelinewise-tap-postgres.git>
    executable: pipelinewise-tap-postgres
    capabilities:
    - discover
    - state
    settings:
    - name: host
    - name: port
    - name: user
    - name: password
    - name: dbname
    - name: filter_schemas
    - name: ssl
    - name: logical_poll_seconds
    - name: break_at_end_lsn
    - name: max_run_seconds
    - name: debug_lsn
    config:
      host: x.x.x.x
      user: usrnamex
      dbname: dbnamex
Copy code
[2021-02-19 14:53:14] meltano install extractor pipelinewise-tap-postgres                                                                                      
Installing 1 plugins...                                                                                                                                        
Installing extractor 'pipelinewise-tap-postgres'...                                                                                                            
Installed extractor 'pipelinewise-tap-postgres'                                                                                                                
                                                                                                                                                               
                                                                                                                                                               
[chb] le-pipeline/meltano/le-meltano via v3.8.7 (le-meltano)                                                                                                  
[2021-02-19 14:54:05] meltano invoke --plugin-type extractor pipelinewise-tap-postgres --help                                                                  
Executable 'pipelinewise-tap-postgres' could not be found. Extractor 'pipelinewise-tap-postgres' may not have been installed yet using `meltano install extract
or pipelinewise-tap-postgres`, or the executable name may be incorrect.                                                                                        
                                                                                                                                                               
[chb] le-pipeline/meltano/le-meltano via v3.8.7 (le-meltano)                                                                                                  
[2021-02-19 14:54:21]
Again, with a brand-new virtualenv:
Copy code
[chb] leq-meltano via v3.8.7 (le-meltano)
[2021-02-19 15:05:44] meltano add --custom extractor pipelinewise-tap-postgres
Adding new custom extractor with name 'pipelinewise-tap-postgres'...

Specify the plugin's namespace, which will serve as the:
- identifier to find related/compatible plugins
- default database schema (`load_schema` extra),
  for use by loaders that support a target schema

Hit Return to accept the default: plugin name with underscores instead of dashes

(namespace) [pipelinewise_tap_postgres]:

Specify the plugin's `pip install` argument, for example:
- PyPI package name:
        pipelinewise-tap-postgres
- Git repository URL:
        git+<https://gitlab.com/meltano/pipelinewise-tap-postgres.git>
- local directory, in editable/development mode:
        -e extract/pipelinewise-tap-postgres

Default: plugin name as PyPI package name

(pip_url) [pipelinewise-tap-postgres]: git+<https://github.com/transferwise/pipelinewise-tap-postgres.git>

Specify the package's executable name

Default: package name derived from `pip_url`

(executable) [pipelinewise-tap-postgres]:

Specify the tap's supported Singer features (executable flags), for example:
        `catalog`: supports the `--catalog` flag
        `discover`: supports the `--discover` flag
        `properties`: supports the `--properties` flag
        `state`: supports the `--state` flag

To find out what features a tap supports, reference its documentation or try one
of the tricks under <https://meltano.com/docs/contributor-guide.html#how-to-test-a-tap>.

Multiple capabilities can be separated using commas.

Default: no capabilities

(capabilities) [[]]: discover,state

Specify the tap's supported settings (`config.json` keys)

Nested properties can be represented using the `.` separator,
e.g. `auth.username` for `{ "auth": { "username": value } }`.

To find out what settings a tap supports, reference its documentation.

Multiple setting names (keys) can be separated using commas.

Default: no settings

(settings) [[]]: host,port,user,password,dbname,filter_schemas,ssl,logical_poll_seconds,break_at_end_lsn,max_run_seconds,debug_lsn
Added extractor 'pipelinewise-tap-postgres' to your Meltano project

Installing extractor 'pipelinewise-tap-postgres'...
Installed extractor 'pipelinewise-tap-postgres'


[chb] leq-meltano via v3.8.7 (le-meltano)
[2021-02-19 15:08:01] meltano invoke pipelinewise-tap-postgres --help
Executable 'pipelinewise-tap-postgres' could not be found. Extractor 'pipelinewise-tap-postgres' may not have been installed yet using `meltano install extractor pipelinewise-tap-postgres`, or the executable name may be incorrect.

[chb] leq-meltano via v3.8.7 (le-meltano)
[2021-02-19 15:08:49]
Groan... despite me giving a
pipelinewise-tap-postgres
when prompted to
Specify the package's executable name
, the binary in
venv/bin
is named
tap-postgres
d
Is it working now?
c
I changed `executable`under
extractors
in
meltano.yml
to
tap-postgres
. Now,
meltano invoke pipelinewise-tap-postgres --help
works. But I wonder if this will pose problems in the future.
d
No, should be good! That's exactly what the
executable
option is for; executable names and package names don't always match, and I don't imagine that'll change for this particular package
c
I'm now starting to understand the utility of meltano and the necessity of having different python venvs for each tap/target
j
I got `pipelinewise-target-bigquery`working after updating the `executable`as @casey mentioned above -- thanks both of you for the help 🙂
d
@jake_hannan Great! Consider contributing the definition to Meltano's index of discoverable plugins: https://meltano.com/docs/contributor-guide.html#discoverable-plugins as an alternative to the existing
adswerve
variant of
target-bigquery
: https://gitlab.com/meltano/meltano/blob/master/src/meltano/core/bundle/discovery.yml#L1132 If you're not sure how exactly you'd add it to
discovery.yml
, it would already help if you could create an issue and share the
meltano.yml
definition you ended up with!
c
@jake_hannan what version of Python were you using when you got this to work?
j
@casey I'm running these commands using docker, and the default for those images is 3.6 https://meltano.com/docs/installation.html#using-pre-built-docker-images
c
Thanks, I'll try it out if I can't get anything working outside of a container.
j
Copy code
loaders:
  - name: pipelinewise-target-bigquery
    namespace: pipelinewise_target_bigquery
    pip_url: pipelinewise-target-bigquery
    executable: target-bigquery
    config:
      project_id: warehouse-name
      default_target_schema: schema-name
      primary_key_required: false
here's my `meltano.yml`for that loader - all of my issues seemed to be around the executable that you mentioned and explicitly making it `target-bigquery`seemingly solved my issues ... not sure if that helps!