# troubleshooting
j
I assume there's something wrong with my entity settings, but I don't know how to change it to create a new schema for every .csv file. Thanks for your help.
j
might need a wildcard selector. i.e.
```yaml
path: ../csv_files/*.csv
```
have you tried that?
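(For context, that `path` line would sit inside a `files` entry; the sketch below is illustrative only — the entity name and key column are placeholders, and whether tap-csv actually expands glob patterns is exactly the open question here.)
```yaml
config:
  files:
  - entity: my_files          # placeholder entity name
    path: ../csv_files/*.csv  # unverified whether the tap expands glob patterns
    keys:
    - id                      # placeholder primary key column
```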
p
@jannis_voigt I haven't used this tap in a while, but I think if you have different files with different schemas then they each might need to be their own entity. Like:
```yaml
config:
  files:
  - entity: customers
    path: ../csv_files/customers.csv
    keys:
    - id
  - entity: orders
    path: ../csv_files/orders.csv
    keys:
    - id
```
If you have many files with the same schema in a directory then it will read all of them as one "stream".
j
Do you know how it works if I want to have different tables in the same schema, one table per csv file?
p
Each one of these file entities will become its own table. So if you ran
meltano run tap-csv target-postgres
with the config above, you should get two tables: customers and orders. And by default they'd go in a schema named after the tap, like `tap_csv.customers` and `tap_csv.orders`.
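(Side note: if you'd rather choose the schema yourself instead of the tap-derived default, target-postgres variants generally expose a setting for that; the sketch below assumes the `default_target_schema` option and uses a placeholder schema name.)
```yaml
loaders:
- name: target-postgres
  variant: transferwise               # assumption; the setting name may differ by variant
  config:
    default_target_schema: my_schema  # placeholder schema name
```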
j
Hm ok. I can have the files in the same schema (I accidentally wrote schema instead of table in my initial post), but I want Meltano to extract / load all files within my folder without having to list each file from the folder in my tap-csv config. Basically the following, but it doesn't work. From the tap-csv Meltano docs:
> `path`: Local path (relative to the project's root) to the file to be ingested. Note that this may be a directory, in which case all files in that directory and any of its subdirectories will be recursively processed
p
Ohhh I see. So you want it to read a bunch of files from a directory as a single table. Are they all the same structure? Like same columns in each?
This works for me when I test
/.../top_directory/my_meltano_project/meltano.yml
```yaml
- name: tap-csv
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
  config:
    files:
    - entity: test_file
      path: ../test_data/
      keys:
      - col1
```
given a directory one level above in
/.../top_directory/test_data/
that contains file1.csv and file2.csv
What commands are you running? Can you share any log output that might be helpful?
j
> Ohhh I see. So you want it to read a bunch of files from a directory as a single table. Are they all the same structure? Like same columns in each?

I want each csv file in a different table. Their structure is not the same, however they are pretty similar. The "problem" is that it runs without errors but does something other than what I expect. Here's my log:
```
2023-03-09T18:48:15.871673Z [info ] Environment 'prod' is active
2023-03-09T18:48:31.819228Z [info ] 2023-03-09 18:48:31,818 | INFO | tap-csv | Beginning full_table sync of 'ARR'... cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:31.820016Z [info ] 2023-03-09 18:48:31,819 | INFO | tap-csv | Tap has custom mapper. Using 1 provided map(s). cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:32.059589Z [info ] 2023-03-09 18:48:32,059 | WARNING | tap-csv | Properties ('ARR Report', 'ARR Sales', 'Diff') were present in the 'ARR' stream but not found in catalog schema. Ignoring. cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:32.120179Z [info ] 2023-03-09 18:48:32,119 | INFO | singer_sdk.metrics | INFO METRIC: {"metric_type": "timer", "metric": "sync_duration", "value": 0.3004000186920166, "tags": {"stream": "ARR", "context": {}, "status": "succeeded"}} cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:32.120926Z [info ] 2023-03-09 18:48:32,120 | INFO | singer_sdk.metrics | INFO METRIC: {"metric_type": "counter", "metric": "record_count", "value": 830, "tags": {"stream": "ARR", "context": {}}} cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:32.691066Z [info ] time=2023-03-09 18:48:32 name=target_postgres level=INFO message=Table '"arr"' exists cmd_type=elb consumer=True name=target-pg-landing producer=False stdio=stderr string_id=target-pg-landing
2023-03-09T18:48:32.803903Z [info ] time=2023-03-09 18:48:32 name=target_postgres level=INFO message=Loading 74 rows into 'landing."arr"' cmd_type=elb consumer=True name=target-pg-landing producer=False stdio=stderr string_id=target-pg-landing
2023-03-09T18:48:32.916883Z [info ] time=2023-03-09 18:48:32 name=target_postgres level=INFO message=Loading into landing."arr": {"inserts": 0, "updates": 74, "size_bytes": 8809} cmd_type=elb consumer=True name=target-pg-landing producer=False stdio=stderr string_id=target-pg-landing
2023-03-09T18:48:33.029453Z [info ] Incremental state has been updated at 2023-03-09 18:48:33.029296.
2023-03-09T18:48:33.076673Z [info ] Block run completed. block_type=ExtractLoadBlocks err=None set_number=0 success=True
```
The command I run is
meltano run tap-csv target-postgres
Thank you for your help.
p
> I want each csv file in a different table. Their structure is not the same, however they are pretty similar.
I think you'll need to define these as separate entities in your config. The tap doesn't know to split the files into different streams/tables unless you define them separately.
That aside, it is weird that it's not finding your other file. I'd suggest testing that you can sync the other file if you explicitly reference it in your config. I'm wondering if there might be an issue reading that file for some reason.
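(For reference, a separate-entities config for this case might look something like the sketch below; the file paths, the second entity's name, and the key columns are placeholders rather than values from this thread.)
```yaml
config:
  files:
  - entity: arr                        # stream name seen in the log above
    path: ../csv_files/arr.csv         # placeholder path
    keys:
    - id                               # placeholder primary key column
  - entity: second_report              # hypothetical second file
    path: ../csv_files/second_report.csv
    keys:
    - id
```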
j
Ok thank you, with different entities it works just fine.