# troubleshooting
r
@here How do you handle post-load deduplication in target BigQuery using DBT?
d
@taylor I think you have some experience with this!
t
@raghav_nayak I’d recommend setting up a view model w/in dbt that dedupes based on either the primary key or some uploaded at field. We do this a bunch at GitLab. https://gitlab.com/gitlab-data/analytics/-/blob/master/transform/snowflake-dbt/mod[…]ab_dotcom_approval_merge_request_rule_sources_dedupe_source.sql this is for Snowflake but the same pattern would hold.
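As a sketch of that pattern, a dedupe view model for BigQuery might look like the following. The dataset, table, key, and timestamp column names here are illustrative assumptions, not taken from GitLab's linked model:

```sql
-- models/tablename_dedupe_source.sql (illustrative sketch)
-- Keep only the most recently extracted row per primary key.
SELECT *
FROM {{ source('my_dataset', 'tablename') }}
-- BigQuery's QUALIFY clause filters on a window function directly
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY id                 -- assumed primary key column
  ORDER BY _time_extracted DESC   -- assumed extraction-timestamp column
) = 1
```

On warehouses without `QUALIFY`, the same result can be had by wrapping the `ROW_NUMBER()` in a subquery or CTE and filtering on it in an outer `WHERE`.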
r
@taylor thank you for your quick response. Let me go through this.
@taylor I went through your file. You have mentioned the below-given lines in the file:
```sql
SELECT *
FROM {{ source('gitlab_dotcom', 'approval_merge_request_rule_sources') }}
```
Can you please tell me whether `gitlab_dotcom` and `approval_merge_request_rule_sources` refer to the source database (in my case, Postgres) or to the destination (BigQuery)? When I run `meltano select tap-postgres --list --all`, I get `public-<tablename>.XYZ`. If the above code refers to Postgres, then I believe my source would look like
```sql
SELECT *
FROM {{ source('public', 'tablename') }}
```
In ELT, I believe the transformation happens on the target (BigQuery). In that case, I would have to change the above source to
```sql
SELECT *
FROM {{ source('bigquery_dataset_name', 'table_name') }}
```
Let me know whether my understanding is correct.
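For context, a dbt `source()` reference resolves through a sources file in the dbt project, which for an ELT setup points at tables already loaded into the destination. A minimal sketch of what such a file might look like for a BigQuery target (the dataset and table names below are placeholders, not taken from this thread):

```yaml
# models/sources/sources.yml (illustrative; adjust names to your project)
version: 2

sources:
  - name: bigquery_dataset_name   # the dataset your loader writes into
    tables:
      - name: table_name          # a table created by the loader
```

With that in place, `{{ source('bigquery_dataset_name', 'table_name') }}` compiles to the fully qualified table in the destination.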
With BigQuery select statements, I am getting the below-given error while running `meltano elt` with debug options:
```
dbt                   | Database Error in model nh (models/sources/nh.sql)
dbt                   |   SELECT without FROM clause cannot use aggregation at [8:27]
dbt                   |   compiled SQL at ../.meltano/transformers/dbt/target/run/my_meltano_project/sources/nh.sql
dbt                   |
dbt                   | Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
meltano               | ERROR Transformation failed (1): Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
meltano               | DEBUG ELT could not be completed: `dbt run` failed
Traceback (most recent call last):
  File "/root/.venv/meltano/lib/python3.8/site-packages/meltano/cli/elt.py", line 242, in _run_elt
    await _run_transform(elt_context, output_logger)
  File "/root/.venv/meltano/lib/python3.8/site-packages/meltano/cli/elt.py", line 314, in _run_transform
    await dbt_runner.run(**kwargs, log=transformer_log_writer)
  File "/root/.venv/meltano/lib/python3.8/site-packages/meltano/core/runner/dbt.py", line 68, in run
    await self.invoke(dbt, log=log, command=cmd)
  File "/root/.venv/meltano/lib/python3.8/site-packages/meltano/core/runner/dbt.py", line 56, in invoke
    raise RunnerError(
meltano.core.runner.RunnerError: `dbt run` failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.venv/meltano/lib/python3.8/site-packages/meltano/cli/elt.py", line 225, in _redirect_output
    yield
  File "/root/.venv/meltano/lib/python3.8/site-packages/meltano/cli/elt.py", line 246, in _run_elt
    raise CliError(f"ELT could not be completed: {err}") from err
meltano.cli.utils.CliError: ELT could not be completed: `dbt run` failed
meltano               | ELT could not be completed: `dbt run` failed
```
This is the query I have:
```sql
select * from `<bigquery_dataset>.<table_name>`
WHERE _time_extracted >= (SELECT MAX(_time_extracted))
```
I tried with `<project>.<bigquery_dataset>.<table_name>` but got the same results.
I am not sure whether I have to set up the dbt profiles.yml file in the transform directory. I installed dbt but have not changed anything in profiles.yml.
t
`source` in the SQL file is referring to a dbt source. So in this case it would be referring to a table in your destination, I’m guessing BigQuery in your case.
Also, I think in your query you’d need a `FROM` statement in the subquery.
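To illustrate that point, a sketch of the query with a `FROM` clause added to the subquery (placeholder names as above; this assumes `_time_extracted` is the extraction-timestamp column added by the loader):

```sql
SELECT *
FROM `<bigquery_dataset>.<table_name>`
WHERE _time_extracted >= (
  -- the subquery needs its own FROM clause to aggregate over the table
  SELECT MAX(_time_extracted)
  FROM `<bigquery_dataset>.<table_name>`
)
```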
r
@taylor In ELT, we load and then transform, so I believe we need to transform the BigQuery data in our case. Correct me if I am wrong.
Do you have any videos on your YouTube channel about dbt integration?
in our case Postgres(Extract) -> BQ(load/target)
t
You’re right about doing the transformation in BigQuery 👍
I don’t know that we have any videos on the Meltano page for doing transformations. But everything in the /transform directory is a regular dbt project, so anything you can do with dbt you can do in there. You’d invoke dbt by doing

```
meltano invoke dbt <dbt_command>
```
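For example (a sketch; this assumes your dbt models live under the /transform directory and the project is already configured):

```
# Run every dbt model in the transform project
meltano invoke dbt run

# Or run a single model, e.g. a dedupe view (model name is a placeholder)
meltano invoke dbt run --models tablename_dedupe_source
```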