# troubleshooting
**Leonardo Duarte:**
(HELP NEEDED) hi everyone, i hope you're doing well. name's leonardo, and i was wondering if anyone could help me with (what i think should be) a minor request. i'll keep it short and simple: i need to use tap-csv to extract data from multiple directories that all follow a naming rule set by a previously used csv loader. how do i extract all of these csv files with a single, reusable tap-csv config? they're all there, following a simple pattern (in this case, `postgres/{stream_name}/{datestamp}/data.csv`). ignore the /csv/ one, it's part of another task i already have covered. this is a code challenge for a company whose internship program i'm trying to join (as a data engineer), so any help would be appreciated. i saw other contestants extracting their files manually (one by one), which i find to be extremely... inadequate.
**Edgar Ramírez (Arch.dev):**
Hi @Leonardo Duarte! What have you tried so far?
**Leonardo Duarte:**
hey there, @Edgar Ramírez (Arch.dev)! i haven't tried anything yet, since i don't know exactly how to define this filesystem rule for tap-csv. i've been looking for answers in the tap-csv docs and don't quite know where to start.
here's my meltano.yml. the challenge is to extract the whole pgsql db (which has already been done) with said filesystem rules to my local drive, and then upload the extracted files to another pgsql db. i'm currently at the part where i upload them to the other pgsql db. i also have to extract data from a parallel csv file, which i don't need help with, so feel free to ignore the extra csv in there.
the thing is, i don't know how to pass all of these directories to a single tap-csv plugin. is that possible?
**Edgar Ramírez (Arch.dev):**
you can have multiple entries in `files`:
```yaml
files:
  - entity: categories
    path: /path/to/categories/data.csv
    keys: [...]
  - entity: customers
    path: /path/to/customers/data.csv
    keys: [...]
```
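For your layout specifically, that could look something like this (just a sketch: the stream names, key columns, and datestamp are illustrative, and the datestamp is hard-coded for now):

```yaml
files:
  - entity: categories
    path: postgres/categories/2024-01-15/data.csv  # hypothetical datestamp
    keys: [category_id]                            # hypothetical key column
  - entity: customers
    path: postgres/customers/2024-01-15/data.csv
    keys: [customer_id]
```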
**Leonardo Duarte:**
@Edgar Ramírez (Arch.dev) that's awesome! thank you so much. now, how do i insert both the entity name (which would be {stream_name}) and the current date ({datestamp}) into path? a literal {datestamp} or {stream_name} in the path doesn't seem to work...
manually inserting the current date and entity name works, so the path itself is alright. i just wish i could get the current entity name and yyyy-MM-dd into the directory path...
and yeah, your solution works, multiple entities inside of files certainly do the trick... now all i need is to replace each hard-coded entity name with that block's own entity name, and the hard-coded datestamp with something dynamic, like a CurrentDay variable. sketch of the shape i'm after below.
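for reference, this is roughly what i'm picturing (a sketch, assuming meltano expands environment variables inside meltano.yml config values, and that CURRENT_DATE gets exported before the run; key column is illustrative):

```yaml
files:
  - entity: categories
    # assumes CURRENT_DATE is exported in the shell before `meltano run`
    path: postgres/categories/${CURRENT_DATE}/data.csv
    keys: [category_id]  # hypothetical key column
```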
**r:**
From the README:
> `path`: Local path to the file to be ingested. Note that this may be a directory, in which case all files in that directory and any of its subdirectories will be recursively processed
although, since it looks like each file has its own unique key, there wouldn't be much merit in doing that here.
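For completeness, the directory form would look like this (illustrative path and key; every CSV under the directory is ingested as that one entity):

```yaml
files:
  - entity: categories
    # a directory instead of a file: all CSVs underneath it are
    # recursively processed as the `categories` entity
    path: postgres/categories
    keys: [category_id]  # hypothetical key column
```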
**Leonardo Duarte:**
so, i saw that i can export the current date via bash before executing the task: `export CURRENT_DATE=$(date +'%Y-%m-%d')`, then reference it with `${CURRENT_DATE}` in the path. any idea how i'd be able to schedule this export via, let's say, an airflow DAG? roughly what i'm picturing is sketched below.
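something like this for the DAG task, maybe (a rough sketch, assuming an Airflow BashOperator that runs everything in one shell so the variable is visible to meltano; the `tap-csv target-postgres` pair is just illustrative of my pipeline):

```bash
# hypothetical bash_command for an Airflow BashOperator:
# compute today's date, then run the meltano pipeline in the same
# shell so ${CURRENT_DATE} is set when meltano.yml is rendered
export CURRENT_DATE=$(date +'%Y-%m-%d')
meltano run tap-csv target-postgres
```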