Hey! I'm using tap-hubspot to tap-postgres and fac...
# troubleshooting
j
Hey! I'm using tap-hubspot to tap-postgres and facing problem with json schema. I checked the doc of singer and saw that schema has been defined, but on postgress all colums is arriving as json. There is a way to normalize and mapping the data to postgres?
p
Hey @joao_vitor_ferreira_silva 👋 - theres two ways to do this, either by using the data flattening setting in the target which will turn nested json into their own columns or its also common to manage denesting the json in your dbt staging models, although some warehouses are better than others at that. I think either way should work for you!
j
Thanks for the answer @pat_nadolny! Helped a lot. But now i'm stuck in another problem 😕 Do you ever saw that? Looks like schema have two columns with the same name and i have no idea how fix that unless mapping all columns by hand.
Do we have any setting to keep the "father object name"? Like "deals.id"
p
@joao_vitor_ferreira_silva if the field isnt needed you could use selection rules to exclude it, that would get you unblocked for now at least
@edgar_ramirez_mondragon do you know of a way to do this? Stream map aliasing can solve it but I dont think either tap or target are SDK based
e
@pat_nadolny I can't think of another way (as you correctly point out neither are SDK-based). I'm curious one can end up with two unnested fields with the same name.
Do we have any setting to keep the "father object name"? Like "deals.id"
@joao_vitor_ferreira_silva do you mean this is a nested array turned into a new table, that by chance got an unnested field with the same name as the parent table?
j
Thanks again @pat_nadolny and @edgar_ramirez_mondragon. I 'll try the solution with selection rules. About the error i got the log
Copy code
ValueError: Duplicate column name produced in schema: phtbccadc__value cmd_type=loader job_id=hubspot-to-postgres name=target-postgres run_id=94cbbb95-ca7f-4c54-9283-b97130176b6b stdio=stderr
so i start to think the problem could be with the columns, right? (Sorry if not, i'm just a begginer with Meltano)
Been more specific, the problem is on file
db_sync.py
of
target-postgres
folder in this piece of code :
def flatten_schema(d, parent_key=[], sep='__', level=0, max_level=0):
items = []
if 'properties' not in d:
return {}
for k, v in d['properties'].items():
new_key = flatten_key(k, parent_key, sep)
if 'type' in v.keys():
if 'object' in v['type'] and 'properties' in v and level < max_level:
items.extend(flatten_schema(v, parent_key + [k], sep=sep, level=level + 1, max_level=max_level).items())
else:
items.append((new_key, v))
else:
if len(v.values()) > 0:
if list(v.values())[0][0]['type'] == 'string':
list(v.values())[0][0]['type'] = ['null', 'string']
items.append((new_key, list(v.values())[0][0]))
elif list(v.values())[0][0]['type'] == 'array':
list(v.values())[0][0]['type'] = ['null', 'array']
items.append((new_key, list(v.values())[0][0]))
elif list(v.values())[0][0]['type'] == 'object':
list(v.values())[0][0]['type'] = ['null', 'object']
items.append((new_key, list(v.values())[0][0]))
key_func = lambda item: item[0]
sorted_items = sorted(items, key=key_func)
for k, g in itertools.groupby(sorted_items, key=key_func):
if len(list(g)) > 1:
raise ValueError('Duplicate column name produced in schema: {}'.format(k))
return dict(sorted_items)
p
@joao_vitor_ferreira_silva yeah that sounds right - maybe sharing some sample data would make it easier to understand whats happening.
meltano invoke tap-hubspot > output.json
will send the output to a file. You can share some of that as long as its non-sensitive data