ashutosh_shanker
10/27/2023, 5:20 AM
name: target-parquet
variant: estrategiahq
does not copy the source schema to the target. Not sure if I am missing something, or how others are using this target.
I am loading a table from Postgres and writing it to multiple Parquet files using target-parquet. The issue is that each Parquet file ends up with a different type for a numeric Postgres column, depending on the values in that file: the schema is not carried over from the source and appears to be generated on the fly.
e.g. an amount numeric(10,2) column from the Postgres source becomes decimal(6,2) in one file and decimal(7,2) in another, based on the precision of the data in each file.
Is my understanding correct?
ashutosh_shanker
10/27/2023, 5:21 AM
ashutosh_shanker
10/27/2023, 5:22 AM
meta:
external_location: "read_parquet('{{ env_var('MELTANO_PROJECT_ROOT') }}/output/loader/parquet/{name}/*.parquet')"
- name: payment
ashutosh_shanker
10/27/2023, 5:35 AM
def create_dataframe(list_dict):
    fields = set()
    for d in list_dict:
        fields = fields.union(d.keys())
    dataframe = pa.table({f: [row.get(f) for row in list_dict] for f in fields})
    return dataframe
ashutosh_shanker
10/27/2023, 12:49 PM
The union_by_name option can be used to unify the schema of files that have different or missing columns. For files that do not have certain columns, NULL values are filled in.
SELECT * FROM read_parquet('flights*.parquet', union_by_name=true);