# troubleshooting
i
Hi all! I am just starting out with Meltano, so I apologize in advance, but I seem to be having trouble with what should be a standard transfer of files from S3 to Snowflake. I am following this tutorial https://meltano.com/blog/get-your-data-out-of-s3-and-into-snowflake-with-meltano/ but already when I test the extraction from S3 to a local path by running `meltano run tap-s3-csv target-jsonl`, I see from the logs that the connection succeeds and the file is found. The problem is in the extraction, because I get `message=Python int too large to convert to C long`. I have tried with different CSVs, so it should not be a matter of file size. Has anybody experienced this kind of obstacle? Any help is appreciated!
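(For context: that OverflowError is CPython's standard message when an arbitrary-precision Python int is handed to an API expecting a fixed-width C long. A minimal sketch that can trigger the same message, assuming the standard-library `array` module's usual behavior:)
```python
import array

# Python ints have arbitrary precision; a C long is fixed-width
# (typically 64 bits on Linux/macOS, 32 bits on Windows).
too_big = 2**63  # one past the largest signed 64-bit value

array.array("l", [too_big])  # typecode 'l' stores C longs
# OverflowError: Python int too large to convert to C long
```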
j
Heya! If you run
```
meltano invoke --dump=catalog tap-s3-csv > catalog.json
```
do the column types make sense compared to the CSV you are retrieving?
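A quick way to eyeball those types, assuming the standard Singer catalog layout (a top-level `streams` array, each stream carrying a JSON schema):
```python
import json

# Print each stream's declared column types from the dumped catalog.
with open("catalog.json") as f:
    catalog = json.load(f)

for stream in catalog["streams"]:
    print(stream["tap_stream_id"])
    for column, schema in stream["schema"]["properties"].items():
        print(f"  {column}: {schema.get('type')}")
```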
h
Do you have some fairly long integers in your CSVs? From the error, and from what I can tell, Python integers can hold much larger values than the C long type, but for some reason the tap tries to convert one to the other. You could try a different tap, like tap-spreadsheets-anywhere, if you want to try that specific file, or retry with a different CSV file without the long integers if you just want to get a feel for how it works.
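A quick illustration of the mismatch described above, using only the standard library:
```python
import ctypes

# A C long is a fixed-width machine type (commonly 64 bits on Linux/macOS,
# 32 bits on Windows); a Python int is arbitrary-precision.
bits = ctypes.sizeof(ctypes.c_long) * 8
c_long_max = 2 ** (bits - 1) - 1
print(f"C long on this platform: {bits} bits, max {c_long_max}")

big = 10**30             # a perfectly ordinary Python int
print(big > c_long_max)  # True: this value can never fit in a C long
```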
i
Hi! Thanks for the contribution. The thing is, I have even removed the numerical fields from the CSV (25 rows with 5 string columns, just to test the extraction), and the catalog.json suggested in the first answer confirms this, so I can't tell what is being converted to a C long. I have looked at the git repo for the tap and I think it is something going on in the sync.py script, but hopefully I will figure it out.
j
I believe a minimal reproducible example will be the way to go. Would you mind preparing one?
i
Sure! This is the configuration file: https://github.com/isareply/meltanodemo/blob/master/s3-to-snowflake-test/meltano.yml (the test.csv in the same repo is in the S3 bucket, and the AWS keys are in an .env file). It should be pretty straightforward from the docs, but I guess I am missing something. Thank you for your time!
j
I'm not able to replicate the problem running
```
meltano run tap-s3-csv target-jsonl
```
recipe:
```
mkdir -p tmp && cd $_
git clone git@github.com:isareply/meltanodemo.git
cd meltanodemo
python3 -m venv venv && source venv/bin/activate
pip install meltano
aws --profile dataplatform-test s3 cp test.csv s3://dpt-jps-sandbox/tmp/test.csv
cd s3-to-snowflake-test
```
replace the `meltano.yml` with the attached version, then check the selection:
```
❯ meltano select --list tap-s3-csv
2023-07-05T17:22:14.265194Z [info     ] The default environment 'dev' will be ignored for `meltano select`. To configure a specific environment, please use the option `--environment=<environment name>`.
2023-07-05T17:22:15.251298Z [warning  ] A catalog file was found, but it will be ignored as the extractor does not advertise the `catalog` or `properties` capability
Legend:
	SelectionType.SELECTED
	SelectionType.EXCLUDED
	SelectionType.AUTOMATIC

Enabled patterns:
	*.*

Selected attributes:
	[SelectionType.SELECTED] test._sdc_extra
	[SelectionType.SELECTED] test._sdc_source_bucket
	[SelectionType.SELECTED] test._sdc_source_file
	[SelectionType.SELECTED] test._sdc_source_lineno
	[SelectionType.SELECTED] test.field1
	[SelectionType.SELECTED] test.field2
	[SelectionType.SELECTED] test.field3
	[SelectionType.SELECTED] test.field4
	[SelectionType.SELECTED] test.field5
```
run it:
```
❯ meltano run tap-s3-csv target-jsonl
2023-07-05T17:24:44.241256Z [info     ] Environment 'dev' is active
2023-07-05T17:24:46.236458Z [warning  ] No state was found, complete import.
...
2023-07-05T17:24:49.982994Z [info     ] Incremental state has been updated at 2023-07-05 17:24:49.982941.
2023-07-05T17:24:49.987608Z [info     ] Block run completed.           block_type=ExtractLoadBlocks err=None set_number=0 success=True
```
check the first JSONL entry:
```
❯ head -1 output/test.jsonl | jq
{
  "field1": "value1_1",
  "field2": "value1_2",
  "field3": "value1_3",
  "field4": "value1_4",
  "field5": "value1_5",
  "_sdc_source_bucket": "dpt-jps-sandbox",
  "_sdc_source_file": "tmp/test.csv",
  "_sdc_source_lineno": 2
}
```