# announcements
l
Hi! I'm new to Python and Meltano, but I'm putting effort into learning. I'd like to set up a pipeline from Google Analytics to BigQuery. I added the custom target https://github.com/RealSelf/target-bigquery but when it comes to the authentication I'm lost:
Download the client_secrets.json file for your service account, and place it on the machine where `target-bigquery` will be executed.
Set a `GOOGLE_APPLICATION_CREDENTIALS` environment variable on the machine, where the value is the fully qualified path to client_secrets.json
I don't know how to create this environment variable, nor do I understand how to modify meltano.yml to add credentials (and also dataset and table specs). Any tips or tutorials on how to make it work?
k
Hi Luciano, you can add the env var in `meltano.yml` like the screenshot, then add the value of the env var in `.env` by adding `export GOOGLE_APPLICATION_CREDENTIALS='path/to/secrets.json'` (no spaces around the `=`, or the shell will reject it). To check if you are doing it correctly, you can run `meltano config <plugin> list` to see if `GOOGLE_APPLICATION_CREDENTIALS` is loaded properly.
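One quick way to sanity-check the variable outside Meltano is a tiny Python script run in the same shell; this is a generic illustration (the path, when set, would be whatever you exported), not part of Meltano itself:

```python
import os

# Read the variable the way the google-cloud libraries will: from the
# process environment. If this prints None, the export never reached
# the shell this process was started from.
creds_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
print("GOOGLE_APPLICATION_CREDENTIALS =", creds_path)

if creds_path is not None and not os.path.isfile(creds_path):
    print("Variable is set, but no file exists at that path")
```

Running this in a fresh terminal is also a good test that the `export` line actually lives somewhere persistent (like `.env` or your shell profile) rather than only in the session where you first typed it.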
d
@luciano_pinto Welcome to Meltano! Have you seen the info at https://meltano.com/#meltano-config and https://meltano.com/#meltano-add? Once you've added `target-bigquery` to your project, you'll have a section in your `meltano.yml` like the one @keat shared. You can then use `meltano config target-bigquery list` to view the available settings and their current values, and you can use `meltano config target-bigquery set <setting> <value>` to store a value in `meltano.yml`. If you'd like to store certain config in the environment instead (so that it's not checked into version control, if you plan to use that), you can use `.env` (as Keat suggested) or the native environment variable functionality provided by your shell or orchestration environment. See also https://meltano.slack.com/archives/CFG3C3C66/p1594667270109000?thread_ts=1594434709.063300&cid=CFG3C3C66 🙂
l
Thanks @keat and @douwe_maan, I made some progress with your help. Appreciated. The loader was added and the pipeline was created successfully. However, it's failing to run. I wonder what this means: `TypeError: Object of type Decimal is not JSON serializable`
```
INFO Refreshing access_token
INFO Syncing stream: website_overview
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO Syncing stream: traffic_sources
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
Traceback (most recent call last):
  File "/Users/L/meltano-projects/ga-enc-project/.meltano/loaders/target-google-bigquery/venv/bin/target-google-bigquery", line 11, in <module>
    load_entry_point('target-bigquery==1.4.0', 'console_scripts', 'target-bigquery')()
  File "/Users/L/meltano-projects/ga-enc-project/.meltano/loaders/target-google-bigquery/venv/lib/python3.7/site-packages/target_bigquery.py", line 306, in main
    state = persist_lines_stream(config['project_id'], config['dataset_id'], input, validate_records=validate_records)
  File "/Users/L/meltano-projects/ga-enc-project/.meltano/loaders/target-google-bigquery/venv/lib/python3.7/site-packages/target_bigquery.py", line 231, in persist_lines_stream
    errors[msg.stream] = bigquery_client.insert_rows_json(tables[msg.stream], [msg.record])
  File "/Users/L/meltano-projects/ga-enc-project/.meltano/loaders/target-google-bigquery/venv/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 2678, in insert_rows_json
    timeout=timeout,
  File "/Users/L/meltano-projects/ga-enc-project/.meltano/loaders/target-google-bigquery/venv/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 574, in _call_api
    return call()
  File "/Users/L/meltano-projects/ga-enc-project/.meltano/loaders/target-google-bigquery/venv/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/Users/L/meltano-projects/ga-enc-project/.meltano/loaders/target-google-bigquery/venv/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/Users/L/meltano-projects/ga-enc-project/.meltano/loaders/target-google-bigquery/venv/lib/python3.7/site-packages/google/cloud/_http.py", line 409, in api_request
    data = json.dumps(data)
  File "/Users/L/anaconda3/lib/python3.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/L/anaconda3/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/L/anaconda3/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/L/anaconda3/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Decimal is not JSON serializable
```
d
@luciano_pinto It sounds like one of your extracted records has a decimal value that `target-google-bigquery` is not prepared to handle. What tap are you using?
It's odd that a decimal made its way into the loader in the first place, since the extractor sends records to the loader encoded as JSON, and as the error correctly points out, JSON doesn't have a decimal type — only float.
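The failure is easy to reproduce outside the pipeline: `json.dumps` has no encoder for `Decimal`, though it accepts a fallback. The property name below is invented, and `default=float` is a generic illustration of the workaround, not necessarily how the target should be patched:

```python
import json
from decimal import Decimal

# A record like the ones a tap might emit; the property name is illustrative.
record = {"ga_sessionsPerUser": Decimal("1.25")}

# Plain json.dumps fails exactly like the traceback above.
try:
    json.dumps(record)
except TypeError as err:
    print(err)  # Object of type Decimal is not JSON serializable

# Supplying a fallback encoder converts Decimal to float on the way out.
encoded = json.dumps(record, default=float)
print(encoded)  # {"ga_sessionsPerUser": 1.25}
```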
l
Hi @douwe_maan, I'm using Google Analytics, the one already built in by Meltano
d
To figure out what specific record is causing this issue, would you be comfortable going into `.meltano/loaders/target-google-bigquery/venv/lib/python3.7/site-packages/target_bigquery.py` and putting a `try/except TypeError` block around the `insert_rows_json` call on line 231, so that we can put a `breakpoint()` there and find out why `msg.record` contains a Decimal value?
You may be better off with print-based debugging rather than `breakpoint()`, by the way, since `meltano elt` may not actually let you interact with a debugger running inside the loader process.
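Putting those two suggestions together, here is a self-contained sketch of what the print-based debugging around that call could look like. The helper and the stand-in `fake_insert` are hypothetical; in the real file you would wrap the existing `bigquery_client.insert_rows_json(tables[msg.stream], [msg.record])` call:

```python
import json
from decimal import Decimal

def insert_with_debugging(insert_rows_json, table, record):
    """Wrap an insert_rows_json-style call; on TypeError, print which
    properties hold Decimal values instead of crashing the whole sync."""
    try:
        return insert_rows_json(table, [record])
    except TypeError as err:
        print(f"insert into {table} failed: {err}")
        for key, value in record.items():
            if isinstance(value, Decimal):
                print(f"  property {key!r} is a Decimal: {value}")
        # Record the failure the way the target records per-stream errors.
        return [{"errors": str(err)}]

# Stand-in for bigquery_client.insert_rows_json: it JSON-encodes the rows,
# which is exactly where the real client trips over Decimal too.
def fake_insert(table, rows):
    json.dumps(rows)
    return []

clean = insert_with_debugging(fake_insert, "website_overview", {"users": 34})
broken = insert_with_debugging(fake_insert, "traffic_sources", {"bounce_rate": Decimal("0.5")})
```

The point of the wrapper is purely diagnostic: it tells you which stream and which property carry the Decimal, which is what you need before deciding on a schema fix.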
Once we figure out where the decimal lives (in what property of what stream), we'll likely want to use the `schema` extra on the extractor (https://meltano.com/docs/command-line-interface.html#extractor-extra-schema) to tell the loader to interpret that field as a float instead, assuming it's currently being told to interpret it as a decimal.
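For illustration, a `schema` extra override in `meltano.yml` might look roughly like this, per the docs linked above; the stream and property names here are invented, since we don't yet know which field carries the decimal:

```yaml
# Hypothetical fragment of meltano.yml — stream/property names are examples.
plugins:
  extractors:
    - name: tap-google-analytics
      schema:
        traffic_sources:
          ga_bounce_rate:
            type: ["number", "null"]
```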
l
I can try it for sure! Truly appreciate your help.
d
My pleasure!
l
Hi @douwe_maan I tried adding the try/except as you suggested, and although the process moved on and some tables were created on BigQuery, there's still an error message:
```
Running extract & load...
No state was found, complete import.
INFO Sending version information to stitchdata.com. To disable sending anonymous usage data, set the config parameter "disable_collection" to true
INFO URL being requested: GET https://www.googleapis.com/discovery/v1/apis/analyticsreporting/v4/rest
INFO URL being requested: GET https://www.googleapis.com/discovery/v1/apis/analytics/v3/rest
INFO URL being requested: GET https://www.googleapis.com/analytics/v3/metadata/ga/columns?alt=json
INFO Attempting refresh to obtain initial access_token
INFO Refreshing access_token
INFO Syncing stream: website_overview
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO Syncing stream: traffic_sources
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
Received state is invalid, incremental state has not been updated
(...)
Received state is invalid, incremental state has not been updated
INFO Syncing stream: pages
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
Received state is invalid, incremental state has not been updated
(...)
Received state is invalid, incremental state has not been updated
Received state is invalid, incremental state has not been updated
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
Received state is invalid, incremental state has not been updated
(...)
Received state is invalid, incremental state has not been updated
(...)
Received state is invalid, incremental state has not been updated
Received state is invalid, incremental state has not been updated
INFO Syncing stream: locations
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
Received state is invalid, incremental state has not been updated
(...)
Received state is invalid, incremental state has not been updated
INFO Syncing stream: monthly_active_users
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO Syncing stream: four_weekly_active_users
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO Syncing stream: two_weekly_active_users
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO Syncing stream: weekly_active_users
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO Syncing stream: daily_active_users
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
INFO Syncing stream: devices
INFO URL being requested: POST https://analyticsreporting.googleapis.com/v4/reports:batchGet?alt=json
Received state is invalid, incremental state has not been updated
(...)
Received state is invalid, incremental state has not been updated
INFO Loaded 0 row(s) into ga_enc:website_overview
INFO Loaded 0 row(s) into ga_enc:traffic_sources
INFO Loaded 0 row(s) into ga_enc:pages
INFO Loaded 0 row(s) into ga_enc:locations
INFO Loaded 34 row(s) into ga_enc:monthly_active_users
INFO Loaded 34 row(s) into ga_enc:four_weekly_active_users
INFO Loaded 34 row(s) into ga_enc:two_weekly_active_users
INFO Loaded 34 ro…
```
I don't know what is causing `Received state is invalid, incremental state has not been updated`. All over the code, `state` is set to `None`.
d
@luciano_pinto That error originates here: https://gitlab.com/meltano/meltano/-/blob/2ebb2447fe43a85afe9731a671cf94c36d403933/src/meltano/core/runner/singer.py#L137, but we're not logging any more details than what you're seeing right now. Do you think you could find that file in your Meltano virtual environment and add `logging.error(str(err), exc_info=True)` above that `logging.warning` call? Relative to your virtual environment, the file should be at `lib/python3.7/site-packages/meltano/core/runner/singer.py`.
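As a self-contained illustration of what that extra line buys you: `exc_info=True` makes the log record carry the full traceback, not just the message. The `ValueError` here is a stand-in; the real `err` comes from Meltano's state handling:

```python
import io
import logging

# Capture log output in a string so the effect is visible.
log_output = io.StringIO()
logger = logging.getLogger("singer-runner-demo")
logger.addHandler(logging.StreamHandler(log_output))
logger.setLevel(logging.DEBUG)

try:
    raise ValueError("state payload could not be parsed")  # stand-in for err
except ValueError as err:
    # The suggested addition: log the message plus the full traceback first...
    logger.error(str(err), exc_info=True)
    # ...then the existing warning that Luciano is already seeing.
    logger.warning("Received state is invalid, incremental state has not been updated")

captured = log_output.getvalue()
```

With that in place, the next failing run should show exactly which exception invalidated the state, instead of only the generic warning.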
l
Sorry to insist, but I still get errors. Could you help? BigQuery is getting some data even with the errors. It seems like whenever there is a decimal number, it causes an error; tables with only dates and integers are being loaded just fine. Sorry to upload the entire error log instead of only the problematic parts... I can't tell what's going on.
@douwe_maan Apart from the errors, what would you recommend for me to get a clearer understanding of how to properly define inside meltano.yml: • the reports (I already have the JSON file with dimensions/metrics) • the schemas for tap-google-analytics (I could not use discovery mode)
d
@luciano_pinto To figure out why you're seeing those `Received state is invalid, incremental state has not been updated` errors, did you try logging the state messages as suggested in https://meltano.slack.com/archives/CFG3C3C66/p1596556214389700?thread_ts=1596168228.349000&cid=CFG3C3C66? If you're on the latest version of Meltano, you can also use debug mode to log all messages coming out of the tap and target: https://meltano.com/docs/command-line-interface.html#debugging I'm seeing "ERROR" lines in the output you shared, but they appear to log RECORD messages, not the STATE messages that are causing the issue.
what would you recommend for me to get a clearer understanding of how to properly define  inside meltano.yml:
- the reports (I already have the JSON file with dimensions/metrics)
- get schemas for tap-google-analytics (i could not use discovery mode)
I assume you've seen https://gitlab.com/meltano/tap-google-analytics/blob/master/tap_google_analytics/defaults/default_report_definition.json? If you have a reports definition JSON file of your own, you can store it anywhere in your project and set tap-google-analytics's `reports` setting to the path: `meltano config tap-google-analytics set reports path/to/reports.json`. That will add `reports: path/to/reports.json` to your `meltano.yml` file under `tap-google-analytics`'s `config` object.
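For reference, the resulting fragment of `meltano.yml` would look roughly like this (the path is a placeholder for wherever you store your reports file):

```yaml
plugins:
  extractors:
    - name: tap-google-analytics
      config:
        reports: path/to/reports.json
```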
Once you've specified your own reports file, the tap's discovery mode should work, in `meltano invoke tap-google-analytics --discover` and the features that use it, like `meltano select --list --all tap-google-analytics`. Please let me know if that's not giving the desired result!