# troubleshooting
j
I assume there's something wrong with my entity settings, but I don't know how to change it to create a new schema for every .csv file. Thanks for your help.
j
might need a wildcard selector. i.e.
```yaml
path: ../csv_files/*.csv
```
have you tried that?
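(For context, that `path` line would sit inside a `files` entry; the sketch below is illustrative only — the entity name and key column are placeholders, and whether tap-csv actually expands glob patterns is exactly the open question here.)
```yaml
config:
  files:
  - entity: my_files          # placeholder entity name
    path: ../csv_files/*.csv  # unverified whether the tap expands glob patterns
    keys:
    - id                      # placeholder primary key column
```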
p
@jannis_voigt I haven't used this tap in a while, but I think if you have different files with different schemas then they each might need to be their own entity. Like:
```yaml
config:
  files:
  - entity: customers
    path: ../csv_files/customers.csv
    keys:
    - id
  - entity: orders
    path: ../csv_files/orders.csv
    keys:
    - id
```
If you have many files with the same schema in a directory then it will read all of them as one "stream".
j
Do you know how it works if I want to have different tables in the same schema, one table per csv file?
p
Each one of these file entities will become its own table. So if you ran
meltano run tap-csv target-postgres
with the config above, you should get two tables: customers and orders. And by default they'd go in a schema named after the tap, like `tap_csv.customers` and `tap_csv.orders`.
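(Side note: if you'd rather choose the schema yourself instead of the tap-derived default, target-postgres variants generally expose a setting for that; the sketch below assumes the `default_target_schema` option and uses a placeholder schema name.)
```yaml
loaders:
- name: target-postgres
  variant: transferwise               # assumption; the setting name may differ by variant
  config:
    default_target_schema: my_schema  # placeholder schema name
```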
j
Hm ok. I can have the files in the same schema (I accidentally wrote schema instead of table in my initial post), but I want Meltano to extract / load all files within my folder without having to list each file from the folder in my tap-csv config. Basically the following, but it doesn't work. From the tap-csv Meltano docs:
> `path`: Local path (relative to the project's root) to the file to be ingested. Note that this may be a directory, in which case all files in that directory and any of its subdirectories will be recursively processed
p
Ohhh I see. So you want it to read a bunch of files from a directory as a single table. Are they all the same structure? Like same columns in each?
This works for me when I test
/.../top_directory/my_meltano_project/meltano.yml
```yaml
- name: tap-csv
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
  config:
    files:
    - entity: test_file
      path: ../test_data/
      keys:
      - col1
```
given a directory one level above in
/.../top_directory/test_data/
that contains file1.csv and file2.csv
What commands are you running? Can you share any log output that might be helpful?
j
> Ohhh I see. So you want it to read a bunch of files from a directory as a single table. Are they all the same structure? Like same columns in each?

I want each csv file in a different table. Their structure is not the same, however they are pretty similar. The "problem" is that it runs without errors but does something other than what I expect. Here's my log:
```
2023-03-09T18:48:15.871673Z [info ] Environment 'prod' is active
2023-03-09T18:48:31.819228Z [info ] 2023-03-09 18:48:31,818 | INFO | tap-csv | Beginning full_table sync of 'ARR'... cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:31.820016Z [info ] 2023-03-09 18:48:31,819 | INFO | tap-csv | Tap has custom mapper. Using 1 provided map(s). cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:32.059589Z [info ] 2023-03-09 18:48:32,059 | WARNING | tap-csv | Properties ('ARR Report', 'ARR Sales', 'Diff') were present in the 'ARR' stream but not found in catalog schema. Ignoring. cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:32.120179Z [info ] 2023-03-09 18:48:32,119 | INFO | singer_sdk.metrics | INFO METRIC: {"metric_type": "timer", "metric": "sync_duration", "value": 0.3004000186920166, "tags": {"stream": "ARR", "context": {}, "status": "succeeded"}} cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:32.120926Z [info ] 2023-03-09 18:48:32,120 | INFO | singer_sdk.metrics | INFO METRIC: {"metric_type": "counter", "metric": "record_count", "value": 830, "tags": {"stream": "ARR", "context": {}}} cmd_type=elb consumer=False name=tap-csv producer=True stdio=stderr string_id=tap-csv
2023-03-09T18:48:32.691066Z [info ] time=2023-03-09 18:48:32 name=target_postgres level=INFO message=Table '"arr"' exists cmd_type=elb consumer=True name=target-pg-landing producer=False stdio=stderr string_id=target-pg-landing
2023-03-09T18:48:32.803903Z [info ] time=2023-03-09 18:48:32 name=target_postgres level=INFO message=Loading 74 rows into 'landing."arr"' cmd_type=elb consumer=True name=target-pg-landing producer=False stdio=stderr string_id=target-pg-landing
2023-03-09T18:48:32.916883Z [info ] time=2023-03-09 18:48:32 name=target_postgres level=INFO message=Loading into landing."arr": {"inserts": 0, "updates": 74, "size_bytes": 8809} cmd_type=elb consumer=True name=target-pg-landing producer=False stdio=stderr string_id=target-pg-landing
2023-03-09T18:48:33.029453Z [info ] Incremental state has been updated at 2023-03-09 18:48:33.029296.
2023-03-09T18:48:33.076673Z [info ] Block run completed. block_type=ExtractLoadBlocks err=None set_number=0 success=True
```
The command I run is
meltano run tap-csv target-postgres
Thank you for your help.
p
> I want each csv file in a different table. Their structure is not the same, however they are pretty similar.
I think you'll need to define these as separate entities in your config. The tap doesn't know to split the files into different streams/tables unless you define them separately.
That aside, it is weird that it's not finding your other file. I'd suggest testing that you can sync the other file if you explicitly reference it in your config. I'm wondering if there might be an issue reading that file for some reason.
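(For reference, a separate-entities config for this case might look something like the sketch below; the file paths, the second entity's name, and the key columns are placeholders rather than values from this thread.)
```yaml
config:
  files:
  - entity: arr                        # stream name seen in the log above
    path: ../csv_files/arr.csv         # placeholder path
    keys:
    - id                               # placeholder primary key column
  - entity: second_report              # hypothetical second file
    path: ../csv_files/second_report.csv
    keys:
    - id
```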
j
Ok thank you, with different entities it works just fine.