alex_b
04/25/2023, 11:14 AM
iris dataset as a CSV file:
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
I have been tinkering with the tap-csv extractor to no avail. Should I define my own custom extractor?
Many thanks!

mert_bakir
04/25/2023, 11:38 AM

visch
04/25/2023, 12:31 PM

alex_b
04/25/2023, 3:10 PM
uuid.uuid4() or an infinite int generator
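
Presumably this refers to generating a surrogate key per row before extraction. A minimal Python sketch of that idea as a preprocessing step (the file names and the csv-module approach are assumptions, not from the thread):

import csv
import itertools
import uuid

# Prepend a surrogate "id" column to every row before handing the file to tap-csv.
counter = itertools.count(start=1)  # the "infinite int generator" alternative

with open("iris.csv", newline="") as src, open("iris_with_id.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    writer.writerow(["id"] + header)
    for row in reader:
        writer.writerow([str(uuid.uuid4())] + row)  # or [next(counter)] + row for sequential ints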

alex_b
04/26/2023, 6:31 AM
awk '{printf "%s,%s\n", NR==1 ? "id" : NR-1, $0}' iris.csv > iris_with_id.csv
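
Applied to the sample rows shared at the top of the thread, that one-liner should produce:

id,sepal_length,sepal_width,petal_length,petal_width,species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3,1.4,0.2,setosa
3,4.7,3.2,1.3,0.2,setosa
4,4.6,3.1,1.5,0.2,setosa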

user
04/26/2023, 9:36 AM

alex_b
04/26/2023, 11:04 AM
tap-csv requires passing the keys parameter for each file. If I use the target-postgres sink to complete the EL pipeline, this parameter is then used as the primary key for the table. This is an issue since this example's values are not unique, so I cannot load the raw data without losing rows.
I tried the mapper approach, but I could not figure out how to generate a unique value per row with the built-in functions. I also thought I could use one of the metadata columns by overriding __key_properties__ in the sink config, but _sdc_extracted_at and the others are not unique either.
I also tried to define the table schema before running the EL pipeline with an autoincrement id column, but then the loader would try to insert null values into it.
At this point, I concluded I would be better off using a bash script.
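
For reference, the keys parameter mentioned above is set per file in tap-csv's files config. A rough sketch of the relevant meltano.yml section, assuming the MeltanoLabs tap-csv variant and the id column produced by the awk one-liner (the entity and path names are illustrative):

plugins:
  extractors:
    - name: tap-csv
      config:
        files:
          - entity: iris            # stream / destination table name
            path: iris_with_id.csv  # CSV with the generated id column
            keys: [id]              # used as the primary key by target-postgres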