Hi. I'm using tap-spreadsheets-anywhere to ingest `.xlsx` files, but the stream contains every row in the sheet instead of only the rows that hold data.
For example, my xlsx sheet contains 387 rows of data and 613 blank rows (1000 in total). When I run the tap into target-jsonl, all the blank rows are ingested too, as records that contain only the metadata columns. How can I solve this problem?
My tap configuration:
```yaml
plugins:
  extractors:
  - name: tap-spreadsheets-anywhere--facebook
    inherit_from: tap-spreadsheets-anywhere
    config:
      tables:
      - path: s3://<my-bucket>
        name: facebook_overview
        pattern: social_media/facebook/overview_social_data/facebook_overview.*.xlsx
        start_date: '2000-01-01T00:00:00+00'
        key_properties: ["Post ID"]
        format: excel
        worksheet_name: "FB - Posts Table"
        sample_rate: 10
        max_sampling_read: 2000
        max_sampled_files: 3
        prefer_schema_as_string: true
```
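A possible stopgap, rather than a fix inside the tap, is to drop the metadata-only records from the target-jsonl output after the run. The sketch below is a minimal illustration under stated assumptions. It assumes the tap's metadata columns share the `_smart_` prefix (for example `_smart_source_file`), so adjust `METADATA_PREFIX` if your output differs. It also treats a record as blank when every other field is missing, `None`, or a whitespace-only string. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator

# Assumption: the tap's metadata columns share this prefix
# (e.g. _smart_source_file); change it if your records use other names.
METADATA_PREFIX = "_smart_"


def is_blank_record(record: dict, metadata_prefix: str = METADATA_PREFIX) -> bool:
    """Return True if every non-metadata field is None or an empty/whitespace string.

    A record that has only metadata keys (the blank-row case) is also blank.
    """
    for key, value in record.items():
        if key.startswith(metadata_prefix):
            continue  # ignore metadata columns
        if value is None:
            continue
        if isinstance(value, str) and value.strip() == "":
            continue
        return False  # found a real data value
    return True


def drop_blank_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the JSONL lines whose record has at least one non-empty data field."""
    for line in lines:
        if not line.strip():
            continue  # skip empty lines in the file itself
        if not is_blank_record(json.loads(line)):
            yield line
```

To apply it to a file, something like `dst.writelines(drop_blank_records(src))` with `src` opened on the target-jsonl output and `dst` on a new file would work. Filtering afterwards keeps the tap configuration unchanged, at the cost of still reading the blank rows during extraction.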