# troubleshooting
s
Hey everyone, question on logging with taps. Currently I'm running a custom tap (tap-hibob) into target-bigquery, and am getting an `Emitting State` message containing all my bookmarks that is exceedingly long. What does this message represent? I believe this message is currently polluting my syncing logs, making my job fail in production.
This is linked to a recurrent `INFO Updating state with {'bookmarks': {'employee_time_off': {}, 'employees': {` which keeps getting bigger.
e
Hey!
> getting an `Emitting State` message containing all my bookmarks that is exceedingly long
That message I think is coming from https://github.com/jmriego/pipelinewise-target-bigquery/blob/9738d2c50442e12bd55852aa3612eb0705a51fea/target_bigquery/__init__.py#L56
The reason it's big is probably that there are a lot of partitions in a parent-child set of streams.
> I believe this message is currently polluting my syncing logs, making my job fail in production
Is there more context, like a traceback, around the failure?
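For reference, the helper at that line in pipelinewise-style targets is typically something like this (an illustrative sketch, not the exact source; see the link above for the real implementation):

```python
import json
import sys

from singer import get_logger

logger = get_logger()


def emit_state(state):
    # Serialize the whole state (all bookmarks) and both log it and write it
    # to stdout so the runner can persist it. The logger.info call below is
    # what produces the long "Emitting state ..." line you're seeing.
    if state is not None:
        line = json.dumps(state)
        logger.info("Emitting state {}".format(line))
        sys.stdout.write("{}\n".format(line))
        sys.stdout.flush()
```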
s
Thanks for responding @edgar_ramirez_mondragon! Oh awesome, that's already one mystery solved! No, not really, and that's my issue. The tap generates a few logs in the Airflow webserver, gets slow, and then crashes. One of the things I'm observing is that every round, the `Emitting state` message gets longer and longer.
My guess is that it is generating logs that are too long for the Airflow webserver to handle. Any idea how to deal with this?
e
Perhaps the easiest option is to tune down the logs for target-bigquery with the `LOGGING_CONF_FILE` env var, via the singer-python library. It expects the file in `.ini` format: https://github.com/transferwise/pipelinewise-singer-python/blob/da64a10cdbcad48ab373d4dab3d9e6dd6f58556b/singer/logging.conf
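A minimal sketch of such a file, based on the default config linked above but with the level raised so INFO lines like `Emitting state ...` are dropped:

```ini
[loggers]
keys=root

[handlers]
keys=stderr

[formatters]
keys=default

[logger_root]
; WARNING instead of INFO silences the "Emitting state ..." lines
level=WARNING
handlers=stderr

[handler_stderr]
class=StreamHandler
level=WARNING
formatter=default
args=(sys.stderr,)

[formatter_default]
format=%(levelname)s %(message)s
```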
s
And in the long run? Is this an Airflow issue, or an issue with my stream itself (like the state shouldn't be that long)?
And I'm not sure what you mean by the `.ini`; would this be a file I add in my root?
e
I don't think it's an Airflow issue; I've never encountered an error related to log length. The state message is long because, I assume, there's a large number of contexts for one or more streams, and I don't think there's a way around that without getting rid of incremental replication for those streams. With the `.ini` format I was referring to the logging config file format that you can use to silence the INFO-level logs in the target, which is different from the YAML format used by Meltano. I linked above to the default logging config used by target-bigquery, but you can set the `LOGGING_CONF_FILE` env var to point to a different file, e.g. `export LOGGING_CONF_FILE=path/to/logging.conf`, and the target should pick it up.
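If you're running the target through Meltano, one way to wire that up is to set the env var on the loader in meltano.yml (a sketch assuming a standard project layout; the logging.conf path is illustrative):

```yaml
plugins:
  loaders:
    - name: target-bigquery
      env:
        # Hypothetical path; point this at your own logging.conf
        LOGGING_CONF_FILE: ${MELTANO_PROJECT_ROOT}/logging.conf
```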
s
Thank you so much! And in terms of silencing context (let's say I don't care about state), would simply setting `ignore_parent_replication_keys = True` and `replication_method = "FULL_TABLE"` work?
e
Setting the `replication_key = None` attribute in the stream would make it use full table replication.
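In a Meltano Singer SDK stream class that would look roughly like this (a sketch with hypothetical stream names, paths, and schema):

```python
from singer_sdk import typing as th
from singer_sdk.streams import RESTStream


class EmployeesStream(RESTStream):
    """Hypothetical parent stream; names, paths, and schema are illustrative."""

    name = "employees"
    path = "/people"
    url_base = "https://api.hibob.com/v1"
    primary_keys = ["id"]
    # No replication key means the SDK falls back to FULL_TABLE replication,
    # so no incremental bookmark is tracked for this stream.
    replication_key = None
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("displayName", th.StringType),
    ).to_dict()
```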
s
OK, because it seems that I am generating one partition per key, even with `replication_key = None`. Should I be setting a partitioning key?
I set all my state partitioning keys to None, which should solve my issue for now. This is definitely a negative for using target-bigquery though 😆
e
Ah yes, setting `state_partitioning_keys = []` in the child stream should reduce the noise.
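Continuing the hypothetical streams sketched above, the child stream would look roughly like this:

```python
class EmployeeTimeOffStream(RESTStream):
    """Hypothetical child stream of the EmployeesStream defined earlier."""

    name = "employee_time_off"
    path = "/timeoff/employees/{employee_id}/requests"
    url_base = "https://api.hibob.com/v1"
    primary_keys = ["id"]
    replication_key = None
    # One sync context per parent employee record.
    parent_stream_type = EmployeesStream
    # Don't keep a separate state partition per parent context; this is what
    # stops the emitted STATE message from growing with every employee.
    state_partitioning_keys = []
    schema = th.PropertiesList(
        th.Property("id", th.StringType),
        th.Property("employee_id", th.StringType),
    ).to_dict()
```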