# plugins-general
j
Hello, does anyone understand this? In my configuration file I select a specific field, but in my destination I get a bunch of other stuff, as if I had written
contacts.*
t
does it show what you’d expect if you do
meltano select tap-hubspot --list --all
?
j
@taylor actually it doesn’t
I can see that all the properties are automatic, but I thought that by selecting specific ones, the others would be “turned off”
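A minimal sketch of why selecting one property does not switch off the `automatic` ones, assuming the usual Singer metadata conventions (`inclusion`, `selected`, `selected-by-default`). The function name `is_selected` is hypothetical, and the real tap-hubspot logic may differ:

```python
def is_selected(field_metadata: dict) -> bool:
    """Decide whether a catalog field is emitted (illustrative, not tap-hubspot's code)."""
    inclusion = field_metadata.get("inclusion")
    if inclusion == "automatic":
        # Automatic fields are always emitted; the selection flag is ignored.
        return True
    if inclusion == "unsupported":
        # Unsupported fields can never be emitted.
        return False
    if "selected" in field_metadata:
        # An explicit selection (for example one written by `meltano select`) wins.
        return bool(field_metadata["selected"])
    # Otherwise fall back to the tap's default for this field.
    return bool(field_metadata.get("selected-by-default", False))


# An automatic field stays on even when it is explicitly deselected.
print(is_selected({"inclusion": "automatic", "selected": False}))
# An available field follows the selection flag.
print(is_selected({"inclusion": "available", "selected": False}))
```

Under these assumptions, only fields whose inclusion is `available` respond to select rules.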
t
so if you run
meltano --log-level=debug invoke tap-hubspot
what do you see?
j
Wait, please. I can’t save the log to a file with
meltano --log-level=debug invoke tap-hubspot > log.txt
So I have to scroll for ages
t
oh wow - that’s super annoying. I’ll file an issue about that
j
Here you go
It uses the cached catalog file
t
try deleting
.meltano/run/tap-hubspot/tap.properties.cache_key
j
But it didn’t change the behaviour
It still created all the columns in the table
😞
t
hrm that’s unfortunate. time to call in backup. @aaronsteers? Douwe is in several meetings at the moment unfortunately
What version of Python are you on btw?
j
3.8.5
t
dang - was hoping I could blame python 3.9 😆
j
I admit that the catalog file of hubspot is particularly long and complex
a
Hi, @juan_sebastian_suarez_valencia. Can I confirm the symptom we are diagnosing? It sounds like from the opening comment that you are filtering down the columns list for a specific table, but your target table is nevertheless being created with all columns. Do I have that correct?
j
Hello @aaronsteers, I would like to filter the selection down to a specific property in a specific field. Since the target is BigQuery, it should only create one column in a specific table. It does create the right table, contacts, but it creates almost all the columns of that field instead of just one.
a
Okay, I think I know what is going on here… One of the interesting nuances of the Singer spec is that SCHEMA messages (which are used to create tables) are emitted separately from RECORD messages (which contain the actual data). In some tap implementations, the RECORD data is filtered down perfectly but the SCHEMA message is not trimmed down at all. When this happens, the full table gets created with all columns, but only the selected columns actually get populated. Random coincidence - I was actually just working on this related issue in the SDK, so that this could be automatic for taps that are built on that framework. For other taps, however, such as the Hubspot one, it might take a code fix on their side (or a PR from us) to get that behavior implemented.
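A rough sketch of the trimming described above, assuming simple top-level properties. The helper names `trim_schema` and `trim_record` and the sample field names are illustrative only, not code from tap-hubspot or the SDK:

```python
import json


def trim_schema(schema_message: dict, selected: set) -> dict:
    """Return a copy of a SCHEMA message keeping only selected properties and keys."""
    keep = set(selected) | set(schema_message.get("key_properties", []))
    schema = schema_message["schema"]
    trimmed = {
        name: spec
        for name, spec in schema.get("properties", {}).items()
        if name in keep
    }
    return {**schema_message, "schema": {**schema, "properties": trimmed}}


def trim_record(record_message: dict, keep: set) -> dict:
    """Return a copy of a RECORD message keeping only the given fields."""
    record = {n: v for n, v in record_message["record"].items() if n in keep}
    return {**record_message, "record": record}


# Hypothetical messages for a "contacts" stream.
schema_message = {
    "type": "SCHEMA",
    "stream": "contacts",
    "key_properties": ["vid"],
    "schema": {
        "type": "object",
        "properties": {
            "vid": {"type": "integer"},
            "property_firstname": {"type": ["null", "string"]},
            "property_lastname": {"type": ["null", "string"]},
        },
    },
}
record_message = {
    "type": "RECORD",
    "stream": "contacts",
    "record": {"vid": 1, "property_firstname": "Ada", "property_lastname": "L."},
}

trimmed_schema = trim_schema(schema_message, {"property_firstname"})
trimmed_record = trim_record(record_message, set(trimmed_schema["schema"]["properties"]))
print(json.dumps(trimmed_schema))
print(json.dumps(trimmed_record))
```

A tap that only applies the second step produces the symptom in this thread: a wide table whose unselected columns stay empty.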
What I sometimes see people doing - and what I’ve done in similar cases - is to write a test in DBT or another tool to assert that the expected-to-be-empty columns actually are empty.
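As an illustration of that kind of check, here is a small sketch in Python using an in-memory SQLite table as a stand-in for the real warehouse. The helper `non_empty_columns`, the table, and the column names are all hypothetical, and a DBT test would express the same idea in SQL:

```python
import sqlite3


def non_empty_columns(conn, table, columns):
    """Return the columns that contain at least one non-NULL value."""
    offenders = []
    for column in columns:
        # COUNT(column) counts only non-NULL values.
        (count,) = conn.execute(
            f'SELECT COUNT("{column}") FROM "{table}"'
        ).fetchone()
        if count:
            offenders.append(column)
    return offenders


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (vid INTEGER, firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Juan", None)],
)

# lastname was not selected, so it is expected to be empty.
assert non_empty_columns(conn, "contacts", ["lastname"]) == []
```

Table and column names are interpolated into the SQL here, so they should only ever come from trusted configuration.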
Does this sound like it might explain the behavior you are seeing?
j
Kind of, but not exactly. I see that there’s no column called firstname, although that is the specific property that I’d like. However, I see a column called _property_firstname.value_, which is weird because the API doesn’t send the data like that. None of the columns is populated though.
a
It sounds like the nested column structures may be getting “flattened” on the way into the db… but good to hear they are at least not carrying over the data. The flattening effect does seem to match this screenshot: https://p479.p0.n0.cdn.getcloudapp.com/items/4gu1kNRe/5400d8eb-f89b-4999-bdd0-5204e89ac028.jpg?v=f20b5d0123f5967d0065ee032ad6515d
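To make the “flattening” idea concrete, here is a minimal sketch of how a nested record can turn into dotted column names. This is an assumption about the general technique, not the actual code of the BigQuery target, and the sample record is made up:

```python
def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Collapse nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and prefix their keys.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical nested record: the property is an object with a "value" key.
nested = {"vid": 1, "property_firstname": {"value": "Ada"}}
print(flatten(nested))
```

With this input, the nested `value` key ends up in a column named after its parent plus the separator, which resembles the dotted column name reported above.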
I’m sorry I’m not more familiar with the Hubspot tap. Out of curiosity, what is your target again?
j
BigQuery
Don’t waste your time @aaronsteers
I will rewrite a tap for hubspot
I’m not sure if I’m going to use the SDK because then I don’t know if Meltano will be able to import it correctly
with the executable problem :(
a
No worries at all. Whether rewriting or sending an MR, I’ll give you a link to the issue where we are doing this in the SDK anyway. And you can borrow that code when it’s ready, even if not building entirely on the SDK. Here’s where you can track progress: Draft: Resolve "Smartly manage `selected_properties` in `get_record_generator()`, `RECORD` messages, and `SCHEMA` messages" (!26) · Merge Requests · meltano / Singer SDK · GitLab
j
Ok thank you 🤝
a
Any time 🙂
If you thumbs up or subscribe on that issue, you’ll get notifications as I make progress on that part of the code.
j
I don’t know how to subscribe to an issue 😮
a
Easiest way is to just hit the “thumbs up” (👍), but there’s also a hidden sidebar on the right with a “Notifications” slider.
j
Ok got it 🙂
I did both
(but weirdly after my thumbs up, the notification slider was still off)
a
You’re in 🙂