eric_goddard
12/14/2021, 1:35 PM
pipelinewise-tap-s3-csv extractor. For my PoC I’m loading CSVs into Postgres. When using local CSVs with the tap-csv extractor the data is loaded into my Postgres target, but when using the tap-s3-csv extractor along with the --log-level=debug option I can see that the CSVs on S3 are found, but they aren’t loaded into the Postgres target. The command that works is
meltano --log-level=debug elt tap-csv target-postgres
while
meltano --log-level=debug elt tap-s3-csv target-postgres
doesn’t. If anyone is aware of what I may be missing, the help would be greatly appreciated 🙂. Part of my meltano.yml is in the thread.
eric_goddard
12/14/2021, 1:36 PM
plugins:
  extractors:
  - name: tap-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
    config:
      files:
      - entity: competitors
        path: ../tmp/competitors.csv
        keys:
        - ID
  - name: tap-s3-csv
    namespace: musicleague_s3
    pip_url: git+https://github.com/transferwise/pipelinewise-tap-s3-csv
    executable: tap-s3-csv
    capabilities:
    - catalog
    - discover
    - state
    config:
      bucket: bucket_name
      start_date: "2021-12-13"
      tables:
      - table_name: competitors
        search_prefix: musicleague
        search_pattern: "competitors\\.csv"
        key_properties: ["ID"]
        delimiter: ","
  loaders:
  - name: target-postgres
    variant: transferwise
    pip_url: pipelinewise-target-postgres
  transformers:
  - name: dbt
    pip_url: dbt==0.21.1
  files:
  - name: dbt
    pip_url: git+https://gitlab.com/meltano/files-dbt.git@config-version-2
    update:
      transform/profile/profiles.yml: false
visch
12/14/2021, 2:32 PM
"I can see that the CSVs on S3 are found, but they aren't loaded into the postgres target"
Can you show how you "see" that the CSVs are found?
tables:
- table_name: competitors
  search_prefix: musicleague
  search_pattern: "competitors\\.csv"
  key_properties: ["ID"]
  delimiter: ","
is almost certainly going to be the issue. It's just a matter of finding which thing is off. What jumps out to me is competitors\\.csv
I'd try just using competitors.csv or .csv
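One way to test the pattern theory outside the tap is sketched below. It assumes the tap treats search_pattern as a Python regular expression and searches for it anywhere in each S3 key; that is an assumption about the tap's internals, not something confirmed in this thread. key_matches is a hypothetical helper, and the key is built from the search_prefix and file name in the config above.

```python
import re


def key_matches(search_pattern: str, key: str) -> bool:
    """Hypothetical helper: True if the regex is found anywhere in the key.

    Assumes the tap uses re.search-style matching (not confirmed here).
    """
    return re.search(search_pattern, key) is not None


key = "musicleague/competitors.csv"

# In double-quoted YAML, "competitors\\.csv" parses to the string competitors\.csv
escaped = r"competitors\.csv"
unescaped = "competitors.csv"

print(key_matches(escaped, key))    # True: \. matches the literal dot
print(key_matches(unescaped, key))  # True: . matches any character, including a dot

# The unescaped form is only looser, e.g. it also accepts this key:
print(key_matches(unescaped, "musicleague/competitorsXcsv"))  # True
print(key_matches(escaped, "musicleague/competitorsXcsv"))    # False
```

Under that assumption both spellings match the intended key, so the escaping by itself would not explain rows going missing.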
eric_goddard
12/14/2021, 2:37 PM
competitors.csv first, with the same results. I can see the files were found because the logs contain
2021-12-14T14:35:18.384766Z [info ] time=2021-12-14 08:35:18 name=tap_s3_csv level=INFO message=Checking bucket "<bucket>" for keys matching "competitors\.csv" name=tap-s3-csv stdio=stderr type=discovery
2021-12-14T14:35:18.384848Z [info ] time=2021-12-14 08:35:18 name=tap_s3_csv level=INFO message=Skipping files which have a LastModified value older than 2021-12-13 00:00:00+00:00 name=tap-s3-csv stdio=stderr type=discovery
2021-12-14T14:35:18.634940Z [info ] time=2021-12-14 08:35:18 name=tap_s3_csv level=INFO message=Found 4 files. name=tap-s3-csv stdio=stderr type=discovery
2021-12-14T14:35:18.636623Z [info ] time=2021-12-14 08:35:18 name=tap_s3_csv level=INFO message=Will download key "musicleague/competitors.csv" as it was last modified 2021-12-13 19:15:50+00:00 name=tap-s3-csv stdio=stderr type=discovery
2021-12-14T14:35:18.637040Z [info ] time=2021-12-14 08:35:18 name=tap_s3_csv level=INFO message=Sampling musicleague/competitors.csv (max records: 1000, sample rate: 5) name=tap-s3-csv stdio=stderr type=discovery
2021-12-14T14:35:18.986515Z [info ] time=2021-12-14 08:35:18 name=tap_s3_csv level=INFO message=Sampled 7 rows from musicleague/competitors.csv name=tap-s3-csv stdio=stderr type=discovery
2021-12-14T14:35:19.000312Z [info ] time=2021-12-14 08:35:18 name=tap_s3_csv level=INFO message=Finished discover name=tap-s3-csv stdio=stderr type=discovery
2021-12-14T14:35:19.075018Z [info ] name=tap-s3-csv stdio=stderr type=discovery
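The two timestamps in the discovery log above can be compared directly to confirm that start_date is not what filters the file out. This is a minimal sketch that assumes the tap keeps a file when its LastModified is not older than start_date, as the "Skipping files" log message suggests. is_recent_enough is a hypothetical helper, not part of the tap.

```python
from datetime import datetime, timezone


def is_recent_enough(last_modified: datetime, start_date: datetime) -> bool:
    """Hypothetical helper: keep a file unless it is older than start_date."""
    return last_modified >= start_date


# Values copied from the discovery log lines above
start_date = datetime(2021, 12, 13, 0, 0, 0, tzinfo=timezone.utc)
last_modified = datetime(2021, 12, 13, 19, 15, 50, tzinfo=timezone.utc)

print(is_recent_enough(last_modified, start_date))  # True: the file passes the filter
```

That agrees with the "Will download key" line, so the date filter does not appear to be the reason nothing is loaded.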
eric_goddard
12/14/2021, 2:39 PM
tap-s3-csv extractor are
2021-12-14T14:35:19.421533Z [info ] time=2021-12-14 08:35:19 name=botocore.credentials level=INFO message=Found credentials in environment variables. cmd_type=extractor job_id=2021-12-14T143516--tap-s3-csv--target-postgres name=tap-s3-csv run_id=750da636-c1db-4592-b615-04bb2e09cd45 stdio=stderr
2021-12-14T14:35:19.853839Z [info ] time=2021-12-14 08:35:19 name=tap_s3_csv level=WARNING message=I have direct access to the bucket without assuming the configured role. cmd_type=extractor job_id=2021-12-14T143516--tap-s3-csv--target-postgres name=tap-s3-csv run_id=750da636-c1db-4592-b615-04bb2e09cd45 stdio=stderr
2021-12-14T14:35:19.921320Z [debug ] Deleted configuration at /Users/eric/dev/learning/musicleague-dbt/meltano/.meltano/run/elt/2021-12-14T143516--tap-s3-csv--target-postgres/750da636-c1db-4592-b615-04bb2e09cd45/target.b6758593-d498-4680-8855-37fcea3e1f49.config.json
2021-12-14T14:35:19.921720Z [debug ] Deleted configuration at /Users/eric/dev/learning/musicleague-dbt/meltano/.meltano/run/elt/2021-12-14T143516--tap-s3-csv--target-postgres/750da636-c1db-4592-b615-04bb2e09cd45/tap.83533fcf-6206-435a-81b0-a054d676d220.config.json
2021-12-14T14:35:19.921836Z [info ] Extract & load complete! job_id=2021-12-14T143516--tap-s3-csv--target-postgres name=meltano run_id=750da636-c1db-4592-b615-04bb2e09cd45
2021-12-14T14:35:19.921965Z [info ] Transformation skipped. job_id=2021-12-14T143516--tap-s3-csv--target-postgres name=meltano run_id=750da636-c1db-4592-b615-04bb2e09cd45
When I run the command using the tap-csv extractor, the logs contain all of the inserts into postgres
visch
12/14/2021, 2:57 PM
meltano select --all tap-s3-csv
No select could be it
eric_goddard
12/14/2021, 2:58 PM
eric_goddard
12/14/2021, 2:59 PM
capabilities?
visch
12/14/2021, 3:17 PM
meltano select --all tap-s3-csv
it added a select: *.* to your meltano.yml
visch
12/14/2021, 3:17 PM
meltano select --list tap-s3-csv
eric_goddard
12/14/2021, 3:51 PM
meltano select --list tap-s3-csv
duplicated the
select:
- '*.*'
eric_goddard
12/14/2021, 4:29 PM
meltano select --list tap-s3-csv
now returns:
meltano select --list tap-s3-csv
Legend:
    selected
    excluded
    automatic

Enabled patterns:
    *.*

Selected attributes:
    [automatic] competitors.ID
    [selected ] competitors.Name
    [selected ] competitors._sdc_extra
    [selected ] competitors._sdc_source_bucket
    [selected ] competitors._sdc_source_file
    [selected ] competitors._sdc_source_lineno
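To see why a single *.* pattern selects every attribute listed above, here is a small sketch. It assumes selection patterns behave like shell-style globs applied to entity.attribute names; that is an assumption about how Meltano evaluates them, and is_selected is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def is_selected(attribute: str, patterns: Sequence[str]) -> bool:
    """Hypothetical helper: True if any glob pattern matches entity.attribute."""
    return any(fnmatchcase(attribute, pattern) for pattern in patterns)


# Attribute names copied from the select --list output above
attributes = [
    "competitors.ID",
    "competitors.Name",
    "competitors._sdc_extra",
    "competitors._sdc_source_bucket",
    "competitors._sdc_source_file",
    "competitors._sdc_source_lineno",
]

for attribute in attributes:
    print(attribute, is_selected(attribute, ["*.*"]))  # True for every attribute

# A narrower pattern would leave the other attributes unselected:
print(is_selected("competitors.Name", ["competitors.ID"]))  # False
```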
visch
12/14/2021, 4:32 PM
eric_goddard
12/14/2021, 4:44 PM
edgar_ramirez_mondragon
12/14/2021, 5:13 PM
meltano invoke tap-s3-csv
eric_goddard
12/14/2021, 5:14 PM
❯ meltano invoke tap-s3-csv
time=2021-12-14 11:13:40 name=botocore.credentials level=INFO message=Found credentials in environment variables.
time=2021-12-14 11:13:40 name=tap_s3_csv level=WARNING message=I have direct access to the bucket without assuming the configured role.
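The output above contains only log lines. A Singer tap that is extracting data would also print JSON messages of type SCHEMA, RECORD and STATE. The sketch below tallies message types from captured tap output so that a run with zero RECORD messages is easy to spot. count_singer_messages is a hypothetical helper, and the "healthy" sample lines are invented for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def count_singer_messages(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: tally Singer message types in tap output.

    Lines that are not JSON objects with a "type" field count as "other".
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            counts["other"] += 1
            continue
        if isinstance(message, dict) and "type" in message:
            counts[message["type"]] += 1
        else:
            counts["other"] += 1
    return counts


# The two lines from the invoke output above: logs only, no records
observed = [
    "time=2021-12-14 11:13:40 name=botocore.credentials level=INFO message=Found credentials in environment variables.",
    "time=2021-12-14 11:13:40 name=tap_s3_csv level=WARNING message=I have direct access to the bucket without assuming the configured role.",
]
print(dict(count_singer_messages(observed)))  # {'other': 2}

# Invented example of what a working run could look like
healthy = [
    '{"type": "SCHEMA", "stream": "competitors", "schema": {}, "key_properties": ["ID"]}',
    '{"type": "RECORD", "stream": "competitors", "record": {"ID": "1", "Name": "example"}}',
    '{"type": "STATE", "value": {}}',
]
print(dict(count_singer_messages(healthy)))  # {'SCHEMA': 1, 'RECORD': 1, 'STATE': 1}
```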
eric_goddard
12/14/2021, 5:16 PM
meltano invoke --dump=catalog tap-s3-csv outputs info about the stream and metadata
eric_goddard
12/15/2021, 4:31 PM
plugins:
  extractors:
  - name: tap-s3-csv
    namespace: tap_s3_csv
    variant: fishtown-analytics
    pip_url: git+https://github.com/boggdan95/tap-s3-csv.git
    executable: tap-s3-csv
    capabilities:
    - state
    settings:
    - name: aws_access_key_id
      kind: string
    - name: aws_secret_access_key
      kind: password
    - name: start_date
      kind: string
    - name: bucket
      kind: string
    - name: tables
      kind: object
    config:
      aws_access_key_id: $AWS_ACCESS_KEY_ID
      aws_secret_access_key: $AWS_SECRET_ACCESS_KEY
      bucket: bucket_name
      start_date: '2021-12-13 00:00:00'
      tables:
      - name: competitors
        pattern: musicleague/competitors.csv
        key_properties:
        - ID
        search_prefix: musicleague
        format: csv
        delimiter: ','
Putting this here so that hopefully it can help someone else searching for tap-s3-csv. Thanks everyone!