# troubleshooting
Hi - it is strange that

```yaml
name: target-parquet
variant: estrategiahq
```

does not copy the source schema to the target. Not sure if I am missing something or how others are using this target. I am loading a table from Postgres and writing it to multiple parquet files using target-parquet. The issue is that each parquet file has a different type for a numeric Postgres column based on the values in the data, since the schema is not copied from the source but generated on the fly. E.g. `amount numeric(10,2)` from the Postgres source is converted to `decimal(6,2)` in one file and `decimal(7,2)` in another, based on the precision of the data in that file. Is my understanding correct?
When I load these parquet files in dbt-duckdb, I start seeing conversion errors because each parquet file has a different data type for the same column.
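To see why, here is a minimal sketch of PyArrow's type inference, separate from target-parquet itself: when a column holds Python `Decimal` values, `pa.table` derives the decimal precision from the widest value in that batch, so two batches of the same `numeric(10,2)` column can come out with different Arrow types.

```python
from decimal import Decimal
import pyarrow as pa

# Two batches of the same logical numeric(10,2) column.
batch_1 = pa.table({"amount": [Decimal("1234.56")]})
batch_2 = pa.table({"amount": [Decimal("98765.43")]})

print(batch_1.schema.field("amount").type)  # decimal128(6, 2)
print(batch_2.schema.field("amount").type)  # decimal128(7, 2)
```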
dbt code

```yaml
meta:
          external_location: "read_parquet('{{ env_var('MELTANO_PROJECT_ROOT') }}/output/loader/parquet/{name}/*.parquet')"
      - name: payment
```
target-parquet code - https://github.com/estrategiahq/target-parquet/blob/master/target_parquet/__init__.py

```python
import pyarrow as pa

def create_dataframe(list_dict):
    # Union of all keys seen across the records in this batch.
    fields = set()
    for d in list_dict:
        fields = fields.union(d.keys())
    # pa.table infers each column's type from the values in this batch alone,
    # so the resulting schema can differ from file to file.
    dataframe = pa.table({f: [row.get(f) for row in list_dict] for f in fields})
    return dataframe
```
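One way to avoid the per-batch inference would be to build a fixed Arrow schema once (for example from the Singer SCHEMA message) and cast every batch to it. This is only a hedged sketch, not how target-parquet actually works; `FIXED_SCHEMA` and `create_dataframe_fixed` are hypothetical names:

```python
import pyarrow as pa

# Hypothetical fixed schema, e.g. derived from the Singer SCHEMA message.
FIXED_SCHEMA = pa.schema([("amount", pa.decimal128(10, 2))])

def create_dataframe_fixed(list_dict, schema=FIXED_SCHEMA):
    fields = [f.name for f in schema]
    table = pa.table({f: [row.get(f) for row in list_dict] for f in fields})
    # Casting pins every column to its declared type, regardless of the
    # values that happen to be in this batch.
    return table.cast(schema)
```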
This seems to be solving the issue for me as of now. From the DuckDB docs:

> The union_by_name option can be used to unify the schema of files that have different or missing columns. For files that do not have certain columns, NULL values are filled in.

```sql
SELECT * FROM read_parquet('flights*.parquet', union_by_name=true);
```
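For completeness, a small sketch of the same option through the DuckDB Python API; the path below assumes the loader output directory from the dbt config above, so adjust as needed. The same `union_by_name=true` argument can also go inside the `read_parquet(...)` call in the `external_location` string.

```python
import duckdb

con = duckdb.connect()
# union_by_name unifies the per-file schemas instead of failing on the
# mismatched decimal types.
df = con.execute(
    "SELECT * FROM read_parquet('output/loader/parquet/payment/*.parquet', union_by_name=true)"
).fetchdf()
print(df.dtypes)
```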