# plugins-general
p
Cross posting this message from @jo_pearson in case someone in this channel knows how to help 🧵
For the meltano variant of the google-analytics tap, we've found that choosing a start date in 2020 and an end date of the current day outputs reports that vary significantly from the GA UI, while one-day increments match up much more closely (example config file below).
```json
{
  "key_file_location": "client_secrets.json",
  "view_id": "123456789",
  "start_date": "2022-01-01T00:00:00Z",
  "end_date": "2022-01-02T00:00:00Z"
}
```
Is there a way to loop through dates in one-day increments, so that the tap pulls a report for 1/1/22 and then systematically runs through each following day until it reaches the current day, without having to manually update the config for each one-day run?
cc @edward_ryan
For additional context, when we do big backfills in GA we trigger sampling, and it degrades the historic data.
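One way to avoid hand-editing the config for each run is a small driver script that generates one config per day and runs the tap once per window. The sketch below is a hypothetical helper, not part of either tap variant: it assumes the tap is installed as an executable named `tap-google-analytics` that accepts the standard Singer `--config` flag, and it mirrors the config above by setting `end_date` to the midnight after `start_date` (whether the tap treats `end_date` as inclusive is not confirmed here).

```python
import json
import subprocess
from datetime import date, timedelta
from pathlib import Path


def daily_windows(start: date, end: date):
    """Yield (start_date, end_date) strings one day apart, from start up to end."""
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        # Same shape as the config above: midnight UTC on consecutive days.
        yield f"{day.isoformat()}T00:00:00Z", f"{nxt.isoformat()}T00:00:00Z"
        day = nxt


def daily_configs(base: dict, start: date, end: date) -> list:
    """Build one tap config per day, copying base and overriding only the dates."""
    return [
        {**base, "start_date": s, "end_date": e}
        for s, e in daily_windows(start, end)
    ]


def run_daily(base: dict, start: date, end: date, out_dir: Path) -> None:
    """Write each day's config, then run the tap once per day, in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cfg in daily_configs(base, start, end):
        day = cfg["start_date"][:10]
        cfg_path = out_dir / f"config_{day}.json"
        cfg_path.write_text(json.dumps(cfg, indent=2))
        # Assumption: the tap is on PATH as "tap-google-analytics" and takes
        # the standard Singer --config flag; its output is saved per day.
        with open(out_dir / f"records_{day}.jsonl", "w") as out:
            subprocess.run(
                ["tap-google-analytics", "--config", str(cfg_path)],
                stdout=out,
                check=True,
            )
```

For the backfill in question, something like `run_daily({"key_file_location": "client_secrets.json", "view_id": "123456789"}, date(2022, 1, 1), date.today(), Path("ga_daily"))` would walk forward from 1/1/22, with the last window ending at midnight today. Because `check=True` stops the loop at the first failing day, the last config written shows where to resume; an orchestrator such as Airflow could run the same per-day step instead of a plain loop.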
This was while using the `meltano` variant, and @taylor suggested trying the SDK-based `meltanolabs` variant. I reimplemented the same functionality with the SDK in the new `meltanolabs` variant, so I think it will unfortunately behave the same way. I believe this is really due to Google Analytics rather than the way the data is retrieved, but for our use cases I think we're just pulling unaggregated data (or data aggregated at the daily level, which is fine-grained enough for us), so we haven't really run into this issue.
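To confirm whether a given pull was actually sampled, rather than comparing it against the UI, the response itself can be checked. In the Analytics Reporting API v4, a report's `data` object only includes `samplesReadCounts` and `samplingSpaceSizes` when the results were sampled. The sketch below assumes access to the raw v4 report dicts; the helper name `sampling_ratio` is hypothetical, and how either tap variant exposes the response is not covered in this thread.

```python
def sampling_ratio(report: dict):
    """Return samples read / sampling space for a v4 report, or None if unsampled.

    In Reporting API v4 responses, data.samplesReadCounts and
    data.samplingSpaceSizes appear only when the report was sampled
    (one entry per date range); int64 values arrive as strings.
    """
    data = report.get("data", {})
    read = data.get("samplesReadCounts")
    space = data.get("samplingSpaceSizes")
    if not read or not space:
        return None  # fields absent: results were not sampled
    # Use the first date range; a daily pull only has one.
    return int(read[0]) / int(space[0])
```

Logging this ratio for each daily window would show which days, if any, still come back sampled, so those days could be re-pulled with a higher sampling level.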
e
Got it! Daily level as in pulling with a start/end date that are one day apart?
p
I'm not sure if it was for GA or something else (maybe Facebook Ads 🤔), but someone in the community set up some sort of Airflow job to iterate and execute individual jobs in 1-day increments. Maybe something like that would help.
e
We need to backfill over a year's worth of GA data with minimal sampling, so it seems like our best bet is to loop through daily pulls?
p
Yeah I'd try something like that. Could you share your report definition? That might be helpful to see
e
@jo_pearson mind sharing the report definition when you get a minute please?
j
Here is one with 5 reports that we're creating
p
This is ours from the squared repo. I haven't seen any issues, but I suspect it's because our report is very simple and the report window is usually small.
```json
[
  {
    "name": "events",
    "dimensions": [
      "ga:date",
      "ga:eventCategory",
      "ga:eventAction",
      "ga:eventLabel"
    ],
    "metrics": [
      "ga:totalEvents"
    ]
  }
]
```
From checking out https://developers.google.com/analytics/devguides/reporting/core/v3/reference#samplingLevel, it looks like we just accept the default sampling parameter in the tap, but you could try adding the `samplingLevel` parameter in `client.py` and setting it to `HIGHER_PRECISION`? Maybe that would be better.
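The `samplingLevel` suggestion can be sketched as a request body. The linked v3 reference lists `DEFAULT`, `FASTER` and `HIGHER_PRECISION`; the Analytics Reporting API v4 instead uses `DEFAULT`, `SMALL` and `LARGE` on each `ReportRequest`, so the right value depends on which API the tap calls (an assumption here, since `client.py` isn't shown). The hypothetical helper below builds a v4 `batchGet` body from a report definition in the same shape as the one shared above.

```python
def build_report_request(view_id: str, start_date: str, end_date: str,
                         report: dict, sampling_level: str = "LARGE") -> dict:
    """Build a Reporting API v4 batchGet body for one report definition.

    `report` follows the report-definition shape shared above: a name plus
    lists of "dimensions" and "metrics" (e.g. "ga:date", "ga:totalEvents").
    Dates use v4's YYYY-MM-DD format.
    """
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "dimensions": [{"name": d} for d in report["dimensions"]],
                "metrics": [{"expression": m} for m in report["metrics"]],
                # v4 accepts DEFAULT, SMALL or LARGE; LARGE trades speed for a
                # bigger sample, roughly like v3's HIGHER_PRECISION.
                "samplingLevel": sampling_level,
            }
        ]
    }
```

With google-api-python-client, such a body would be sent as `analytics.reports().batchGet(body=body).execute()`, where `analytics` is a client built for `analyticsreporting` v4. v4 date ranges include both `startDate` and `endDate`, so a single-day pull uses the same date for both.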
e
Thanks! I think cycling through daily calls is our best bet.
We encountered this issue in our custom GA integrations and resolved it by snaking daily/hourly calls; however, it doesn't seem like there's a way to do this with the tap. Maybe it's something our team can look into down the road.
j
I do think adding the `HIGHER_PRECISION` setting to `client.py` will also help. Thanks @pat_nadolny!