magnificent-morning-79243
09/15/2023, 4:46 AM
careful-nail-2651
09/15/2023, 10:21 AM
full-parrot-81023
09/15/2023, 3:07 PM
- name: target-snowflake
  variant: meltanolabs
  pip_url: meltanolabs-target-snowflake
  config:
    account: ${SNOWFLAKE_ACCOUNT}
    database: ${SNOWFLAKE_DATABASE}
    default_target_schema: ${SNOWFLAKE_DEFAULT_SCHEMA}
    schema: ${SNOWFLAKE_SCHEMA}
    user: ${SNOWFLAKE_USER}
    warehouse: ${SNOWFLAKE_WAREHOUSE}
The output of meltano run tap-csv target-snowflake --dry-run:
2023-09-15T14:55:23.540493Z [info ] Environment 'dev' is active
2023-09-15T14:55:25.102589Z [info ] Setting 'console' handler log level to 'debug' for dry run
2023-09-15T14:55:25.693958Z [debug ] Remote `discovery.yml` manifest could not be downloaded.
2023-09-15T14:55:25.694135Z [debug ] 404 Client Error: Not Found for url: https://discovery.meltano.com/discovery.yml?project_id=bf26b5bf-1148-4dd7-beb6-24dce88addd1
2023-09-15T14:55:26.458180Z [debug ] Found plugin parent parent=tap-csv plugin=tap-csv source=<DefinitionSource.DISCOVERY: 1>
2023-09-15T14:55:26.459196Z [debug ] found plugin in cli invocation plugin_name=tap-csv
2023-09-15T14:55:28.497938Z [debug ] Found plugin parent parent=target-snowflake plugin=target-snowflake source=<DefinitionSource.HUB: 2>
2023-09-15T14:55:28.500111Z [debug ] found plugin in cli invocation plugin_name=target-snowflake
2023-09-15T14:55:28.500310Z [debug ] head of set is extractor as expected block=<meltano.core.plugin.project_plugin.ProjectPlugin object at 0x7f8d2f85b550>
2023-09-15T14:55:28.643617Z [debug ] Variable '$variable_for_dev_tests' is not set in the provided env dictionary.
2023-09-15T14:55:28.643699Z [debug ] Variable '$AZURE_CONNECTION_STRING_STATE_BACKEND_DEV' is not set in the provided env dictionary.
2023-09-15T14:55:28.643768Z [debug ] Variable '$AWS_ACCESS_KEY_ID_STATE_BACKEND_DEV' is not set in the provided env dictionary.
2023-09-15T14:55:28.643839Z [debug ] Variable '$AWS_SECRET_KEY_STATE_BACKEND_DEV' is not set in the provided env dictionary.
2023-09-15T14:55:28.731258Z [debug ] found block block_type=loaders index=1
2023-09-15T14:55:28.731393Z [debug ] blocks idx=1 offset=0
2023-09-15T14:55:28.877574Z [debug ] Variable '$variable_for_dev_tests' is not set in the provided env dictionary.
2023-09-15T14:55:28.877658Z [debug ] Variable '$AZURE_CONNECTION_STRING_STATE_BACKEND_DEV' is not set in the provided env dictionary.
2023-09-15T14:55:28.877712Z [debug ] Variable '$AWS_ACCESS_KEY_ID_STATE_BACKEND_DEV' is not set in the provided env dictionary.
2023-09-15T14:55:28.877776Z [debug ] Variable '$AWS_SECRET_KEY_STATE_BACKEND_DEV' is not set in the provided env dictionary.
2023-09-15T14:55:28.904886Z [debug ] Variable '$SNOWFLAKE_DEFAULT_SCHEMA' is not set in the provided env dictionary.
2023-09-15T14:55:28.916385Z [debug ] Variable '$SNOWFLAKE_SCHEMA' is not set in the provided env dictionary.
2023-09-15T14:55:28.939637Z [debug ] Variable '$SNOWFLAKE_DEFAULT_SCHEMA' is not set in the provided env dictionary.
2023-09-15T14:55:28.950039Z [debug ] Variable '$SNOWFLAKE_SCHEMA' is not set in the provided env dictionary.
2023-09-15T14:55:28.974250Z [debug ] Variable '$SNOWFLAKE_DEFAULT_SCHEMA' is not set in the provided env dictionary.
2023-09-15T14:55:28.986175Z [debug ] Variable '$SNOWFLAKE_SCHEMA' is not set in the provided env dictionary.
2023-09-15T14:55:28.998077Z [debug ] Variable '$MELTANO_LOAD_SCHEMA' is not set in the provided env dictionary.
2023-09-15T14:55:28.998939Z [debug ] found ExtractLoadBlocks set offset=0
2023-09-15T14:55:28.999109Z [debug ] All ExtractLoadBlocks validated, starting execution.
2023-09-15T14:55:30.395169Z [info ] Dry run, but would have run block 1/1.
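For reference, the debug lines above show that the SNOWFLAKE_* variables referenced in the config were never resolved. A minimal sketch of a project .env that would satisfy them (all values are placeholders):
SNOWFLAKE_ACCOUNT=my_account
SNOWFLAKE_DATABASE=MY_DB
SNOWFLAKE_DEFAULT_SCHEMA=PUBLIC
SNOWFLAKE_SCHEMA=PUBLIC
SNOWFLAKE_USER=MELTANO_USER
SNOWFLAKE_WAREHOUSE=MY_WH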
acoustic-kilobyte-81969
09/15/2023, 7:26 PM
polite-hydrogen-86575
09/16/2023, 11:34 AM
full_table. Since taps and targets don't know about each other by design, when the tap sends full_table, the target still upserts while keeping the existing records on the target. How do I make it truncate the target for the tables that are configured as full_table refresh?
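One possibility, assuming an SDK-based target recent enough to expose the load_method setting (availability depends on the target variant and SDK version; check with meltano config target-snowflake list):
- name: target-snowflake
  variant: meltanolabs
  config:
    load_method: overwrite
Note this applies to every stream the target receives, not only the FULL_TABLE ones.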
09/16/2023, 12:35 PM
replication_key_value (the value stored in the Meltano state) falls behind. For example, in the source system the max value of the updated_at field is in August, so new records do not get inserted.
Is there a way to configure this properly, like telling the tap to use coalesce(updated_at, created_at)?
alert-easter-79768
09/16/2023, 1:57 PM
average-carpenter-55609
09/16/2023, 8:41 PM
Failed validating 'type' in schema['properties']['amount']: {'type': ['string', 'null']}
On instance['amount']: Decimal('500.0')
This is my config, I also tried to manually override the schema:
extractors:
- name: tap-google-sheets
  variant: matatika
  pip_url: git+https://github.com/Matatika/tap-google-sheets.git
  config:
    child_sheet_name: ''
    flattening_enabled: false
    flattening_max_depth: 5
    stream_maps:
      finance:
        amount: float(record['amount'])
    schema:
      amount:
        type: ["float","null"]
If I just call invoke on tap-google-sheets, I get this in the logs:
2023-09-16 20:24:32,070 | INFO | tap-google-sheets | Tap has custom mapper. Using 1 provided map(s).
{"type": "SCHEMA", "stream": "finance", "schema": {"properties": {"date": {"type": ["string", "null"]}, "what": {"type": ["string", "null"]}, "amount": {"type": ["number", "null"]}, "account": {"type": ["string", "null"]}, "category": {"type": ["string", "null"]}, "one_time": {"type": ["string", "null"]}}, "type": "object"}, "key_properties": []}
which has the correct type on amount, but then a few lines below it seems to use the auto-generated schema:
{"type": "SCHEMA", "stream": "finance", "schema": {"type": "object", "properties": {"date": {"type": ["string", "null"]}, "what": {"type": ["string", "null"]}, "amount": {"type": ["string", "null"]}, "account": {"type": ["string", "null"]}, "category": {"type": ["string", "null"]}, "one_time": {"type": ["string", "null"]}}}, "key_properties": []}
Does anyone know what I'm doing wrong? Thanks in advance
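A sketch of how the override is usually expressed on the Meltano side, assuming the stream is named finance as above: the schema extra sits at the plugin level (not under config) and nests by stream name, and JSON Schema has no "float" type, only "number":
- name: tap-google-sheets
  variant: matatika
  schema:
    finance:
      amount:
        type: ["number", "null"]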
adventurous-iron-91541
09/18/2023, 8:48 AM
With meltano run there's no error. However, I couldn't see the state file in S3. I'm using Meltano version 3.0.0. I dockerized the project files using the image meltano/meltano:v3.0.0, based on https://docs.meltano.com/guide/containerization, and ran it. Here's my `meltano.yml`:
version: 1
default_environment: dev
project_id: 82c0c0c9-9249-4fbb-adea-2d3f2c6bb210
environments:
- name: dev
- name: staging
- name: prod
state_backend:
  uri: s3://my-bucket/meltano/state
plugins:
  extractors:
  - name: tap-facebook--my
    inherit_from: tap-facebook
    variant: singer-io
    pip_url: git+https://github.com/singer-io/tap-facebook.git
    config:
      start_date: '2023-09-18'
      account_id: ''
      include_deleted: true
      insights_buffer_days: 1
    select:
    - ads_insights.*
  loaders:
  - name: target-bigquery--facebook-my
    inherit_from: target-bigquery
    variant: z3z1ma
    pip_url: git+https://github.com/z3z1ma/target-bigquery.git
    config:
      credentials_path: .config/cred.json
      project: project
      dataset: dataset
      denormalized: true
      method: batch_job
      upsert: true
      dedupe_before_upsert: true
and when I check the logs, I can see there's no error:
2023-09-18T08:40:34.505263Z [info ] smart_open.s3.MultipartWriter('my-bucket', 'meltano/state/dev:tap-facebook--my-to-target-bigquery--facebook-my/lock'): uploading part_num: 1, 16 bytes (total 0.000GB)
2023-09-18T08:40:34.696251Z [info ] No state found for dev:tap-facebook--my-to-target-bigquery--facebook-my
These are the logs at the end of the run:
2023-09-18T08:42:19.250693Z [info ] 2023-09-18 08:42:19,250 | INFO | target-bigquery | Target 'target-bigquery' completed reading 883 lines of input (880 records, (0 batch manifests, 2 state messages). cmd_type=elb consumer=True name=target-bigquery--facebook-my producer=False stdio=stderr string_id=target-bigquery--facebook-c2b-my
2023-09-18T08:42:19.251212Z [info ] 2023-09-18 08:42:19,250 | INFO | target-bigquery | [f0453f7fe38b4e83957d55dfb937ac56] Loaded 100238 bytes into data-298903.meltano_ingestion.ads_insights__1695026440. cmd_type=elb consumer=True name=target-bigquery--facebook-my producer=False stdio=stderr string_id=target-bigquery--facebook-my
2023-09-18T08:42:32.423594Z [info ] Block run completed. block_type=ExtractLoadBlocks err=None set_number=0 success=True
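One way to check whether state was actually written after a run like this, assuming the state backend is configured as above:
meltano state list
meltano state get dev:tap-facebook--my-to-target-bigquery--facebook-my
Both commands read from the configured backend, so an empty listing would confirm that state is not landing in S3.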
polite-hydrogen-86575
09/18/2023, 8:54 AM
metadata:
  "public-*":
    replication-method: INCREMENTAL
    replication-key: update
  public-table01:
    replication-method: FULL_TABLE
Without public-*, public-table01 works fine. When I include public-*, it crashes at public-table01. I think the replication-key defined on public-* also affects public-table01.
meltano 3.0, python 3.10, tap-postgres v[could not be detected], Meltano SDK v0.31.1
Update: opened a bug ticket: https://github.com/meltano/sdk/issues/1964
polite-hydrogen-86575
09/18/2023, 10:06 AM
polite-hydrogen-86575
09/18/2023, 11:08 AM
id | options
1  | null
2  | null
3  | [{"label": "Missing Tracking", "value": 1}, {"label": "Wrong label", "value": 4}]
Schema generated by tap-postgres: "options": {"properties": {}, "type": ["object", "null"]}
target-postgres throws an error on validation:
Failed validating 'type' in schema['properties']['options']: {'properties': {}, 'type': ['object', 'null']}
On instance['options']:
[{'label': 'Missing Tracking', 'value': 1},
{'label': 'Wrong label', 'value': 4}]
What should I do?
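One workaround sketch, assuming Meltano's schema extra can widen the discovered type (the stream name here is a placeholder for the actual table):
- name: tap-postgres
  schema:
    public-mytable:
      options:
        type: ["array", "object", "null"]
Whether the target then stores the array values sensibly depends on the target-postgres variant.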
most-eve-59826
09/18/2023, 4:11 PM
tap-mysql (transferwise variant) and loading it into Postgres via target-postgres (also transferwise). I've observed that varchar column values containing newlines get mangled: they're turned into literal "\" and "n" characters. I'm guessing there's an issue with escaping/quoting, but I couldn't find any way to control the behavior of either plugin.
Has anyone experienced this before? Did you find a solution?
late-alligator-51852
09/18/2023, 6:09 PM
target-postgres (meltanolabs variant)
1. Is the batch size fixed? How do I find out what it is and how do I change it? I am noticing a different number in my logs every time.
Target sink for foo is full. Draining..
METRIC: {"type": "counter", "metric": "record_count", "value": 28711,..}
Target sink for foo is full. Draining..
METRIC: {"type": "counter", "metric": "record_count", "value": 40006,..}
2. When I do a full refresh, I notice that my pod (running the pipeline) crashes due to OOM. If we are using batching, that shouldn't happen, right? Or am I missing something?
3. Is the batch size decided by the tap, the target, or both?
4. What strategy is used to write state info? I am noticing that the job writes state very infrequently (once an hour). More details in the second comment.
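On question 1: in SDK-based targets like this one, a sink drains once it holds max_size records, which defaults to a class constant. A sketch of how a custom sink could raise it, assuming singer-sdk's Sink API (the constant name and default are as I recall them; verify against your SDK version):

from singer_sdk.sinks import SQLSink

class BiggerBatchSink(SQLSink):
    # singer-sdk marks a sink as full once it holds max_size records;
    # max_size falls back to MAX_SIZE_DEFAULT (10000) unless overridden.
    MAX_SIZE_DEFAULT = 50000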
gentle-tailor-54214
09/19/2023, 7:54 AM
fresh-river-14739
09/19/2023, 11:54 AM
alert-easter-79768
09/19/2023, 1:31 PM
tap-jira. While the parent stream gets unique records, the child stream is fetching duplicate records. How can I tackle this?
faint-wall-92783
09/19/2023, 6:02 PM
plugins:
  extractors:
  - name: tap-mysql
    variant: transferwise
    pip_url: git+https://github.com/B2tGame/pipelinewise-tap-mysql-ONMO.git@master
Today I had this error while building the Docker image:
[+] Building 2.5s (10/10) FINISHED docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 534B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 193B 0.0s
=> [internal] load metadata for docker.io/meltano/meltano:latest 0.0s
=> [1/6] FROM docker.io/meltano/meltano:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 672B 0.0s
=> CACHED [2/6] WORKDIR /project 0.0s
=> CACHED [3/6] COPY ./requirements.txt . 0.0s
=> CACHED [4/6] RUN pip install -r requirements.txt 0.0s
=> CACHED [5/6] COPY . . 0.0s
=> ERROR [6/6] RUN meltano install 2.5s
------
> [6/6] RUN meltano install:
1.066 Extractor 'tap-mysql' is not known to Meltano. Try running `meltano lock --update --all` to ensure your plugins are up to date.
------
Dockerfile:13
--------------------
11 | # Copy over Meltano project directory
12 | COPY . .
13 | >>> RUN meltano install
14 |
15 | # Don't allow changes to containerized project files
--------------------
ERROR: failed to solve: process "/bin/sh -c meltano install" did not complete successfully: exit code: 1
Not sure of the reason... were there any recent changes to the Meltano image referenced in the Dockerfile?
# registry.gitlab.com/meltano/meltano:latest is also available in GitLab Registry
ARG MELTANO_IMAGE=meltano/meltano:latest
FROM $MELTANO_IMAGE
WORKDIR /project
# Install any additional requirements
COPY ./requirements.txt .
RUN pip install -r requirements.txt
# Copy over Meltano project directory
COPY . .
RUN meltano install
# Don't allow changes to containerized project files
ENV MELTANO_PROJECT_READONLY 1
# Expose default port used by `meltano ui`
EXPOSE 5000
ENTRYPOINT ["meltano"]
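A likely fix, following the error message's own suggestion: regenerate the plugin lock files locally and commit them, so meltano install inside the image can resolve the custom tap-mysql fork (a sketch; the image tag is illustrative):
meltano lock --update --all
git add plugins/
docker build -t my-meltano-project .
Pinning the base image (e.g. FROM meltano/meltano:v3.0.0 instead of latest) also protects the build from behavior changes when the latest tag moves.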
better-diamond-57621
09/19/2023, 6:13 PM
gifted-truck-6667
09/20/2023, 11:27 AM
meltano add extractor tap-surveymonkey --variant danladd
Added extractor 'tap-surveymonkey' to your Meltano project
Variant: danladd
Repository: https://gitlab.com/danladd/tap-surveymonkey
Documentation: https://hub.meltano.com/extractors/tap-surveymonkey--danladd
Installing extractor 'tap-surveymonkey'...
Extractor 'tap-surveymonkey' could not be installed: failed to install plugin 'tap-surveymonkey'.
Need help fixing this problem? Visit http://melta.no/ for troubleshooting steps, or to
join our friendly Slack community.
Failed to install plugin(s)
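A way to surface the underlying pip error, assuming a recent Meltano CLI:
meltano --log-level=debug install extractor tap-surveymonkey
The debug output usually includes the pip install log that the short failure message above omits.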
fresh-river-14739
09/20/2023, 11:32 AM
meltano lock --update --all to ensure your plugins are up to date.
Here is my meltano.yml config:
- name: target-clickhouse
  pip_url: git+https://github.com/usermaven/target-clickhouse.git
  config:
    default_target_schema: default
    flattening_enabled: true
    flattening_max_depth: 3
    add_record_metadata: true
    sqlalchemy_url: clickhouse+http://default:@localhost:8123
acoustic-kilobyte-81969
09/20/2023, 4:29 PM
stream_id from the input stream, so I guess somehow controlling this within the tap would seemingly work: https://github.com/transferwise/pipelinewise-target-redshift/blob/master/target_redshift/db_sync.py#L554C29
flat-bear-81546
09/21/2023, 1:13 AM
meltano install installing a tap/target and failing (even with debug logging on) won't show the error? I swear it used to show what the error was from pip.
gentle-tailor-54214
09/21/2023, 7:56 AM
tap-postgres and target-redshift, and both are behind SSH bastion servers? The Postgres tap supports SSH tunneling configuration, but that's not the case for the Redshift target (which doesn't seem to be actively maintained, BTW). Am I out of luck?
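A common workaround sketch: open the tunnel outside Meltano and point the target at its local end (host names and ports are placeholders):
ssh -N -L 5439:my-cluster.example.us-east-1.redshift.amazonaws.com:5439 user@bastion-host &
# then configure target-redshift with host localhost and port 5439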
damp-vase-67553
09/21/2023, 9:45 AM
meltano add --custom extractor xyz
(sourced using pip -e ../xyz). I would like to have both the custom extractor and the Meltano project using it live in the same repository.
Is there any way to add the custom extractor with all the settings options populated automatically, instead of specifying them all manually in the interactive prompt?
(I tried consulting the docs of both Meltano and the Singer SDK, with no success. I also tried searching existing GitHub issues for both projects, but didn't find anything relevant/helpful.)
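One approach sketch: skip the interactive prompt and declare the custom plugin directly in meltano.yml with its settings listed (the setting names and kinds below are illustrative):
plugins:
  extractors:
  - name: xyz
    namespace: xyz
    pip_url: -e ../xyz
    executable: xyz
    settings:
    - name: api_key
      kind: password
    - name: start_date
      kind: date_iso8601
meltano add --custom only automates writing this block, so maintaining it by hand in the same repository is equivalent.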
future-hospital-39058
09/21/2023, 12:06 PM
tap-spreadsheets-anywhere? I.e., if a field is missing in a CSV file it should default to null:
https://hub.meltano.com/extractors/tap-spreadsheets-anywhere/
I've been looking at "field_names" without luck.
swift-furniture-65258
09/21/2023, 7:51 PM
{"inputRequests":[{"a":"string","b":"string","c":"string"}]}
Those configuration pairs (a, b, c) are stored in a growing list in a SQL database. My idea was to set up a pipeline where Plugin 1 extracts those pairs from SQL and provides them as JSON, and Plugin 2 takes those pairs as JSON input to query the API mentioned in the beginning.
First, I wonder if this is the intended way to go, and how I can achieve handing the data result from one plugin to another as a kind of step.
Second, how do I configure my local extractor plugin to accept a rather complex JSON structure as input?
For the REST API I used the cookiecutter template, going the REST route. Rows -> Row contains the actual results, while the rest is just to cope with unsorted results.
Currently my files are:
streams.py
# imports assumed from the cookiecutter template layout
import typing as t

from singer_sdk import typing as th

from .client import customRestStream  # the tap's own REST stream base class


class CustomResponseStream(customRestStream):
    """Define custom stream."""

    name = "CustomResponse"
    rest_method = "POST"
    records_jsonpath = "$.[*]"
    path = ""

    def prepare_request_payload(
        self, context: t.Optional[dict], next_page_token: t.Optional[t.Any]
    ) -> t.Optional[dict]:
        # POST body: forward the configured (a, b, c) pairs to the API
        return {
            "inputRequests": self.config["inputRequestConfig"],
        }

    schema = th.PropertiesList(
        th.Property("CustomResponse", th.ArrayType(
            th.ObjectType(
                th.Property("a", th.StringType),
                th.Property("b", th.StringType),
                th.Property("c", th.StringType),
                th.Property("Rows", th.ObjectType(  # nested object type
                    th.Property("Row", th.ArrayType(th.StringType)),
                )),
            )
        )),
    ).to_dict()
tap.py
...
config_jsonschema = th.PropertiesList(
    th.Property(
        "auth_token",
        th.StringType,
        required=True,
        secret=True,  # Flag config as protected.
        description="The token to authenticate against the API service",
    ),
    th.Property(
        "inputRequestConfig",
        th.ArrayType(
            th.ObjectType(
                th.Property("a", th.StringType),
                th.Property("b", th.StringType),
                th.Property("c", th.StringType),
            )
        ),
        required=True,
        description="Request configuration (a, b, c) pairs to replicate",
    ),
    th.Property(
        "api_url",
        th.StringType,
        default="https://whateverapi.com/....",
        description="The url for the API service",
    ),
).to_dict()
...
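For the second question, the nested structure can be passed from meltano.yml as ordinary YAML, since Meltano hands plugin config to the tap as JSON; a sketch against the config_jsonschema above (values are placeholders):
config:
  auth_token: ${API_AUTH_TOKEN}
  inputRequestConfig:
  - a: foo
    b: bar
    c: baz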
melodic-jackal-40828
09/22/2023, 5:14 AM
wide-actor-85107
09/22/2023, 2:27 PM
colossal-zebra-32658
09/22/2023, 7:18 PM