# plugins-general
j
Hey all, 👋 I'm running into an issue with the `tap-csv` `--discover` capability.
Question: How do I configure the tap to correctly infer types other than strings using the discover capability?
Context: The types of the CSV columns are all inferred as `["string", "null"]`, which doesn't match the CSV I'm extracting. When running
meltano invoke tap-csv --config ./config.json --discover > ./output.json
where `config.json` is
{
  "files": [
    {
      "entity": "some_entity",
      "path": "./test.csv"
    }
  ]
}
and `test.csv` is
ID,FirstName,Email,Nationality,DateOfBirth,IsMarried,Height,Weight,Address
1,John,johndoe@example.com,American,1990-05-15,true,175.5,68.2,123 Main St
2,Jane,janesmith@example.com,Canadian,1985-09-22,false,163.8,55.7,456 Elm St
3,Michael,michaeljohnson@example.com,British,1992-02-10,true,180.0,72.1,789 Oak Ave
4,Sarah,sarahwilliams@example.com,Australian,1988-11-30,true,155.2,48.9,321 Pine St
Thanks in advance for your expertise! Regards, Jakob
p
Hey @jakob_vestergaard_offersen - the tap is currently built to infer every column as a string (see https://github.com/MeltanoLabs/tap-csv/blob/c9c223ed3889c21bcf5a15909007928d0d858007/tap_csv/client.py#L139), but you have a variety of options:
1. Use mappers to cast types between the tap and the target.
2. Leave everything as strings and cast in the target system. A common pattern is to use dbt to rename and cast types once the data arrives in the warehouse.
3. Use an alternative tap that also supports CSV files: see https://hub.meltano.com/extractors/tap-spreadsheets-anywhere and the newer https://github.com/MeltanoLabs/tap-universal-file
4. Create a PR to make tap-csv smarter at inferring types. There are a fair number of examples out there if you want to do this, but inference is a little tricky to implement.
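For option 4, here is a minimal sketch of what column type inference could look like in plain Python. This is not tap-csv's code or API: `infer_json_type` and `infer_schema` are hypothetical helper names, and the rules (try boolean, integer, number, ISO date, then fall back to string) are assumptions chosen to fit the sample CSV in this thread. Every type keeps `"null"` to mirror what the tap emits today.

```python
import csv
import io
from datetime import date

# Two rows from the sample file in this thread, used only for illustration.
SAMPLE_CSV = """ID,FirstName,DateOfBirth,IsMarried,Height,Address
1,John,1990-05-15,true,175.5,123 Main St
2,Jane,1985-09-22,false,163.8,456 Elm St
"""


def _parse_bool(value):
    # Assumption: only the literals "true"/"false" (any case) count as booleans.
    if value.lower() not in ("true", "false"):
        raise ValueError(value)


def _all_parse(values, parser):
    """Return True if every value is accepted by `parser` without ValueError."""
    try:
        for value in values:
            parser(value)
    except ValueError:
        return False
    return True


def infer_json_type(values):
    """Hypothetical helper: guess a JSON Schema fragment for one column."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        # Nothing to learn from an empty column, so stay with strings.
        return {"type": ["string", "null"]}
    if _all_parse(non_empty, _parse_bool):
        return {"type": ["boolean", "null"]}
    if _all_parse(non_empty, int):
        return {"type": ["integer", "null"]}
    if _all_parse(non_empty, float):
        return {"type": ["number", "null"]}
    if _all_parse(non_empty, date.fromisoformat):
        return {"type": ["string", "null"], "format": "date"}
    return {"type": ["string", "null"]}


def infer_schema(csv_text):
    """Hypothetical helper: build an object schema from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        "type": "object",
        "properties": {
            # Short rows yield None for missing cells; treat those as empty.
            col: infer_json_type([(row[col] or "") for row in rows])
            for col in columns
        },
    }


if __name__ == "__main__":
    for name, fragment in infer_schema(SAMPLE_CSV)["properties"].items():
        print(name, fragment)
```

The tricky parts mentioned above show up quickly: an ID column with leading zeros would be read as an integer and lose them once cast, a single odd value flips a whole column back to string, and inferring from only the first rows of a large file can guess wrong for later rows.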
j
Thanks for the help and quick answer 🙏