hoodox (06/22/2022, 12:10 PM)

visch (06/22/2022, 12:23 PM)
1. Check the target-bigquery variants here https://hub.meltano.com/loaders/target-bigquery to see if one of them loads the data as an external BQ table. I'd assume not, as why wouldn't we just load the data into BigQuery if we have all the data? Maybe you could expand on your use case more?
2. No, there isn't. If you really wanted to do it in Meltano, I'd say write a utility plugin (https://hub.meltano.com/). What I do, however: wherever I'm running Meltano, I tend to write some bash to get things to happen the way I want (GitHub Actions, GitLab CI, SQL Agent runs (PowerShell, batch, etc.), Windows Task Scheduler), and then I put those definitions in ./orchestrate.
alexander_butler (06/22/2022, 1:35 PM)

    plugins:
      transformers:
      - name: dbt-bigquery
        pip_url: dbt-core~=1.1.0 dbt-bigquery~=1.1.0
        commands:
          stage-gcs:
            args: run-operation stage_external_sources
            description: Stage external sources in Google Cloud Storage
          stage-gcs-refresh:
            args: "run-operation stage_external_sources --vars 'ext_full_refresh: true'"
            description: Stage external sources in Google Cloud Storage

Third: Add a job with dbt-bigquery:stage-gcs.
Take this pattern and adjust how you need to.
hoodox (06/22/2022, 2:40 PM)

hoodox (06/22/2022, 2:46 PM)

visch (06/22/2022, 2:47 PM)

visch (06/22/2022, 2:48 PM)

hoodox (06/22/2022, 2:49 PM)

hoodox (06/22/2022, 3:02 PM)

hoodox (06/22/2022, 3:07 PM)

hoodox (06/22/2022, 3:08 PM)

hoodox (06/22/2022, 5:31 PM)
hoodox (06/22/2022, 5:31 PM)

    packages:
      - package: dbt-labs/dbt_external_tables
        version: 0.8.0
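After adding packages.yml, the package still has to be installed; a sketch of one way to do that through the Meltano-managed dbt transformer (assuming the dbt-bigquery plugin from earlier in the thread is installed):

```
# From the Meltano project root: run dbt's package installer
# inside the dbt-bigquery transformer's environment
meltano invoke dbt-bigquery deps
```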
hoodox (06/22/2022, 5:32 PM)

hoodox (06/22/2022, 5:33 PM)
alexander_butler (06/22/2022, 6:20 PM)
install_path is where dbt puts the installed packages, not where you need to put the yaml.
hoodox (06/22/2022, 9:26 PM)

christoph (06/22/2022, 9:31 PM)
transform/models
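For context, a hypothetical layout (the filename under models/ is made up): in a Meltano project the dbt project lives under transform/, so the source yaml with the external: block goes with the models, not in dbt's package install path:

```
transform/
  dbt_project.yml
  packages.yml
  models/
    keycloak_sources.yml   # the source yaml with the external: block
```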
hoodox (06/23/2022, 8:38 AM)

    plugins:
      transformers:
      - name: dbt-bigquery
        pip_url: dbt-core~=1.1.0 dbt-bigquery~=1.1.0
        commands:
          stage-gcs:
            args: run-operation stage_external_sources
            description: Stage external sources in Google Cloud Storage
          stage-gcs-refresh:
            args: "run-operation stage_external_sources --vars 'ext_full_refresh: true'"
            description: Stage external sources in Google Cloud Storage

I run meltano --environment=dev run dbt-bigquery:run but it seems nothing is picked up. My other transform models are called.
alexander_butler (06/23/2022, 3:20 PM)
meltano --environment=dev run dbt-bigquery:stage-gcs ??

hoodox (06/23/2022, 3:24 PM)

hoodox (06/23/2022, 3:28 PM)

alexander_butler (06/23/2022, 3:29 PM)

hoodox (06/23/2022, 3:29 PM)

alexander_butler (06/23/2022, 3:29 PM)

alexander_butler (06/23/2022, 3:30 PM)
06/23/2022, 3:30 PMsource.yml version: 2
sources:
- name: keycloak
database: pre_arc_keycloak_is
loader: gcloud storage
tables:
- name: user_entity_gcs
description: "External table of keycloak user_entity, stored as CSV files in Cloud Storage"
external:
location: '<gs://quester-d-lan-keycloak-is/kubernetes/prod/quester-gke-prod/postgres-backup-jobs/postgres-prod/cluster-postgres-prod/user_entity_refresh_220622T04_00_05.csv>'
options:
format: csv
skip_leading_rows: 1
# if you want a partitioned table, file paths MUST be Hive-style:
# '<gs://bucket/path/collector_hour=2020-01-01/>'
# '<gs://bucket/path/collector_hour=2020-01-02/>' (etc)
# hive_partition_uri_prefix: '<gs://bucket/path/>'
# partitions:
# - name: collector_date
# data_type: date
columns:
- name: id
data_type: varchar(255)
description: "ID"
- name: username
data_type: varchar(255)
description: "Username"
- name: email
data_type: varchar(255)
description: "email"
- name: email_constraint
data_type: varchar(255)
description: "email"
- name: email_verified
data_type: varchar(2)
description: "t for verified, f means not verified"
- name: enabled
data_type: varchar(2)
description: "t account enaabled, f account not enabled"
- name: first_name
data_type: varchar(255)
description: "first_name"
- name: last_name
data_type: varchar(255)
description: "last_name"
- name: created_date
data_type: timestamp
description: "created date"
- name: realm_id
data_type: varchar(255)
description: "Keycloak realm"
- name: service_account_client_link
data_type: varchar(255)
description: "the service account link"
alexander_butler (06/23/2022, 3:31 PM)
The database key, maybe, in the above yaml?

alexander_butler (06/23/2022, 3:31 PM)
hoodox (06/23/2022, 3:34 PM)
dbt-bigquery stage-gcs failed with exit code: 1. How do you turn on debug?

hoodox (06/23/2022, 3:35 PM)
alexander_butler (06/23/2022, 3:36 PM)
Add a schema key in the source.yml; otherwise the dataset should be the same as the source name, in this case keycloak.

alexander_butler (06/23/2022, 3:38 PM)
meltano --environment=dev invoke dbt-bigquery --debug run-operation stage_external_sources

alexander_butler (06/23/2022, 3:39 PM)
meltano --environment=dev --log-level=debug invoke dbt-bigquery --debug run-operation stage_external_sources

hoodox (06/23/2022, 3:40 PM)

hoodox (06/23/2022, 3:42 PM)
quester-d-keycloak-tds-f513.`pre_arc_keycloak...
153953.545568 [debug] [MainThread]: On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.0.8", "profile_name": "meltano", "target_name": "bigquery", "connection_name": "macro_stage_external_sources"} */
create or replace external table quester-d-keycloak-tds-f513
.pre_arc_keycloak_is
.`user_entity_gcs`(
id varchar(255),
username varchar(255),
email varchar(255),
email_constraint varchar(255),
email_verified varchar(2),
enabled varchar(2),
first_name varchar(255),
last_name varchar(255),
created_date timestamp,
realm_id varchar(255),
service_account_client_link varchar(255))
options (
uris = ['gs://quester-d-lan-keycloak-is/kubernetes/prod/quester-gke-prod/postgres-backup-jobs/postgres-prod/cluster-postgres-prod/user_entity_refresh_220622T04_00_05.csv'], format = 'csv', skip_leading_rows = 1)
153953.939308 [debug] [MainThread]: BigQuery adapter: Retry attempt 1 of 1 after error: BadRequest('Type not found: varchar at [6:20]')
153955.379302 [debug] [MainThread]: BigQuery adapter: Unhandled error while running:
macro stage_external_sources
153955.380556 [debug] [MainThread]: BigQuery adapter: Database Error
Type not found: varchar at [6:20]
153955.382140 [error] [MainThread]: Encountered an error while running operation: Database Error
Type not found: varchar at [6:20]
153955.383299 [debug] [MainThread]:
Traceback (most recent call last):
File "/project/.meltano/transformers/dbt-bigquery/venv/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 174, in exception_handler
yield
File "/project/.meltano/transformers/dbt-bigquery/venv/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 549, in _retry_and_handle
return retry.retry_target(
  File "/project/.meltano/transformers/dbt-bigquery/venv/lib/python3.8/site-packages/google/api_core/retry…

hoodox (06/23/2022, 3:43 PM)

hoodox (06/23/2022, 3:43 PM)
alexander_butler (06/23/2022, 3:44 PM)
Use string. Notice that example also has variant, which is Snowflake-specific, so it's definitely wrong.
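Since BigQuery has no varchar type, the columns block from the source.yml above would be rewritten with BigQuery-native types. A sketch covering a few of the columns (not from the thread):

```
columns:
  - name: id
    data_type: string      # was varchar(255); BigQuery only has string
  - name: email_verified
    data_type: string      # 't' / 'f' flags arrive as text in the CSV
  - name: created_date
    data_type: timestamp   # timestamp is valid as-is
```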
hoodox (06/23/2022, 3:45 PM)

hoodox (06/23/2022, 3:45 PM)

alexander_butler (06/23/2022, 3:46 PM)

alexander_butler (06/23/2022, 3:46 PM)

hoodox (06/23/2022, 3:50 PM)

hoodox (06/23/2022, 3:51 PM)

alexander_butler (06/23/2022, 3:51 PM)

hoodox (06/23/2022, 3:56 PM)
hoodox (06/23/2022, 3:56 PM)

    plugins:
      transformers:
      - name: dbt-bigquery
        variant: dbt-labs
        pip_url: dbt-core~=1.0.0 dbt-bigquery~=1.0.0
        commands:
          stage-gcs:
            args: run-operation stage_external_sources
            description: Stage external sources in Google Cloud Storage
          stage-gcs-refresh:
            args: "run-operation stage_external_sources --vars 'ext_full_refresh: true'"
            description: Stage external sources in Google Cloud Storage
alexander_butler (06/23/2022, 3:57 PM)
meltano run dbt-bigquery:stage-gcs dbt-bigquery:run
Obviously throw in any EL tap->target blocks before it if needed.
Then schedule in Airflow, cron, or whatever.
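One way to do that scheduling inside Meltano itself; a sketch where the job and schedule names are made up:

```
# Group the steps into a named job, then schedule it
meltano job add stage-and-transform --tasks "dbt-bigquery:stage-gcs dbt-bigquery:run"
meltano schedule add nightly-stage --job stage-and-transform --interval "@daily"
```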
hoodox (06/23/2022, 4:02 PM)

hoodox (06/23/2022, 4:04 PM)

alexander_butler (06/23/2022, 4:05 PM)

hoodox (06/23/2022, 4:08 PM)

hoodox (06/23/2022, 4:09 PM)

alexander_butler (06/23/2022, 4:10 PM)
hoodox (06/24/2022, 3:53 PM)

    sources:
      - name: keycloak
        # database == project_id in bigquery, which is already set in profiles.yml
        # database: pre_arc_keycloak_is
        schema: pre_arc_keycloak_is
        loader: gcloud storage
        tables:
          - name: user_entity_gcs
            description: "External table of keycloak user_entity, stored as CSV files in Cloud Storage"
            external:
              location: $GCS_LOCATION
alexander_butler (06/24/2022, 4:29 PM)

    sources:
      - name: keycloak
        # database == project_id in bigquery, which is already set in profiles.yml
        # database: pre_arc_keycloak_is
        schema: pre_arc_keycloak_is
        loader: gcloud storage
        tables:
          - name: user_entity_gcs
            description: "External table of keycloak user_entity, stored as CSV files in Cloud Storage"
            external:
              location: "{{ env_var('GCS_LOCATION', 'defaultLocationHere') }}"

https://docs.getdbt.com/reference/dbt-jinja-functions/env_var
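For env_var to resolve, GCS_LOCATION just needs to be set in the environment dbt runs in. One option (a sketch; the bucket path is made up) is Meltano's environment-level env block:

```
# meltano.yml (fragment): export the variable for everything
# run under the dev environment, including the dbt transformer
environments:
  - name: dev
    env:
      GCS_LOCATION: gs://example-bucket/path/user_entity.csv
```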
hoodox (06/24/2022, 4:31 PM)

hoodox (06/24/2022, 9:24 PM)

    [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
    There are 1 unused configuration paths:
    - models.meltano.2_current_copy.selective_daily_registers
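That warning means dbt_project.yml configures a model path that doesn't match anything under models/. A hypothetical illustration of the config path it is complaining about:

```
# dbt_project.yml (sketch): this nesting must mirror a real directory,
# i.e. models/2_current_copy/selective_daily_registers/ must contain
# models, otherwise dbt reports it as an unused configuration path
models:
  meltano:
    2_current_copy:
      selective_daily_registers:
        +materialized: table
```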
alexander_butler (06/24/2022, 10:09 PM)

alexander_butler (06/24/2022, 10:10 PM)

alexander_butler (06/24/2022, 10:12 PM)

hoodox (06/25/2022, 3:19 PM)

hoodox (06/25/2022, 3:20 PM)

hoodox (06/25/2022, 3:20 PM)

hoodox (06/30/2022, 12:29 PM)
meltano invoke dbt-bigquery --debug run-operation stage_external_sources
It was all working fine, then suddenly today it stopped working and I am getting this error.
hoodox (06/30/2022, 12:29 PM)

    12:18:48.538921 [debug] [MainThread]: Acquiring new bigquery connection "macro_stage_external_sources"
    12:18:48.603770 [info ] [MainThread]: 1 of 1 START external source pre_arc_keycloak_is.user_entity_gcs
    12:18:48.625276 [debug] [MainThread]: On "macro_stage_external_sources": cache miss for schema "{self.database}.{self.schema}", this is inefficient
    12:18:48.627855 [debug] [MainThread]: Opening a new connection, currently in state init
    12:18:49.507631 [info ] [MainThread]: 1 of 1 SKIP
    12:18:49.511449 [debug] [MainThread]: Flushing usage events
    12:18:49.511920 [debug] [MainThread]: Connection 'macro_stage_external_sources' was properly closed.

hoodox (06/30/2022, 12:30 PM)
> On "macro_stage_external_sources": cache miss for schema "{self.database}.{self.schema}", this is inefficient
What does this mean?
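Not a confirmed diagnosis, but when stage_external_sources reports SKIP, one thing to try is forcing a full refresh so the macro recreates the external table rather than leaving the existing one in place. This mirrors the stage-gcs-refresh command defined earlier in the thread:

```
meltano --environment=dev invoke dbt-bigquery --debug run-operation stage_external_sources --vars 'ext_full_refresh: true'
```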
alexander_butler (06/30/2022, 12:58 PM)

hoodox (06/30/2022, 12:59 PM)

hoodox (06/30/2022, 2:37 PM)

alexander_butler (06/30/2022, 3:06 PM)

hoodox (06/30/2022, 3:07 PM)