Hi. I'm using tap-spreadsheets-anywhere and doing xlsx ingestion. But the stream contains all rows i...
f
Hi. I'm using tap-spreadsheets-anywhere and doing xlsx ingestion. But the stream contains all rows in
.xlsx
sheet instead of only data rows. Example: My xlsx contains 387 lines of data and 613 blank lines (total 1000). When running the tap to target-jsonl, all the blank lines is being ingested, staying only with metadata columns. How can i solve this problem? My tap configuration:
Copy code
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere--facebook
    inherit_from: tap-spreadsheets-anywhere
    config:
      tables:
      - path: s3://<my-bucket>
        name: facebook_overview
        pattern: social_media/facebook/overview_social_data/facebook_overview.*.xlsx
        start_date: '2000-01-01T00:00:00+00'
        key_properties: ["Post ID"]
        format: excel
        worksheet_name: "FB - Posts Table"
        sample_rate: 10
        max_sampling_read: 2000
        max_sampled_files: 3
        prefer_schema_as_string: true
e
have you tried setting
max_sampling_read
to a lower value?