# troubleshooting
pedro_carneiro
Hi team! I'm relatively new to Meltano and I set up a simple flow that runs tap-csv followed by target-sqlite. It worked, and I'm quite sold on Meltano! 🙂 The data appeared in my SQLite folder. The original CSV has 640k rows and the process took about 6 minutes. I also received multiple "Incremental state has been updated ..." messages. Does this mean that Meltano performed the process in "batches"? I would expect 640k rows from a single file to be much faster than this; am I doing anything wrong? Thanks in advance for the help! My meltano.yml (most relevant piece of it):
- name: tap-csv
  variant: meltanolabs
  config:
    files:
    - entity: bikes
      path: /Users/pedrocarneiro/peter/Loka/Data Engineering Course/Week_2/datalake/raw/bikes.csv
      keys: ['ride_id', 'rideable_type', 'started_at', 'ended_at', 'start_station_name', 'start_station_id', 'end_station_name', 'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng', 'member_casual']

- name: target-sqlite
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/target-sqlite.git
  config:
    database: my_database
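
(For context, plugin definitions like these normally sit under the `plugins` key of meltano.yml, split into `extractors` and `loaders`. The sketch below shows that assumed surrounding structure; only the two plugin blocks above come from the message, and their config details are elided.)

```yaml
# Sketch of the assumed surrounding structure of meltano.yml;
# only the plugin entries shown above come from the original message.
plugins:
  extractors:
  - name: tap-csv          # config (files, path, keys) as shown above
    variant: meltanolabs
  loaders:
  - name: target-sqlite    # config (database: my_database) as shown above
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/target-sqlite.git
```

A flow like the one described is then typically executed with `meltano run tap-csv target-sqlite` (or `meltano elt tap-csv target-sqlite` on older Meltano versions).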
edgar_ramirez_mondragon
Hi @pedro_carneiro! It does process the records in batches. The default batch_size is 50, but you can set it to a higher value.
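
(For example, assuming the loader's `batch_size` setting mentioned above is set through its `config` block in meltano.yml, raising it could look like the sketch below; the value 10000 is only an illustration.)

```yaml
- name: target-sqlite
  variant: meltanolabs
  pip_url: git+https://github.com/MeltanoLabs/target-sqlite.git
  config:
    database: my_database
    batch_size: 10000  # illustrative value; the default is 50 per the reply above
```

With the default of 50, loading 640k rows means roughly 12,800 separate batches against SQLite, which helps explain the ~6-minute runtime.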
pedro_carneiro
Ohhh, so the bottleneck is on the SQLite side! Thank you @edgar_ramirez_mondragon 🙌