# troubleshooting
**Leonardo Duarte:**
(HELP NEEDED) hi everyone, i hope you're doing well. name's leonardo, and i was wondering if anyone could help me with (what i think should be) a minor request. i'll keep it short and simple: i need to use tap-csv to extract data from multiple directories that all follow a naming rule set by a previously used csv loader. how do i extract all of these csv files with a single, reusable tap-csv config? they're all there, following a simple pattern (in this case, `postgres/{stream_name}/{datestamp}/data.csv`). ignore the /csv/ one, it's part of another task i already have covered. this is a code challenge for a company whose internship program i'm trying to join (as a data engineer), so any help would be appreciated. i saw other contestants extracting their files manually (one by one), which i find to be extremely... inadequate.
**Edgar Ramírez (Arch.dev):**
Hi @Leonardo Duarte! What have you tried so far?
**Leonardo Duarte:**
hey there, @Edgar Ramírez (Arch.dev)! i haven't tried anything yet, since i don't know exactly how to define this filesystem rule for tap-csv. i've been looking for answers in the tap-csv docs and don't quite know where to start.
here's my meltano.yml. the challenge is to extract the whole pgsql db (which has already been done) with said filesystem rules to my local drive, and then upload the extracted files to another pgsql db. i'm currently at the part where i upload them to the other pgsql db. i also have to extract data from a parallel csv file, which i don't need help with, so feel free to ignore the extra csv in there.
the thing is, i don't know how to pass all of these directories to a single tap-csv plugin. is that possible?
**Edgar Ramírez (Arch.dev):**
you can have multiple entries in `files`:
```yaml
files:
  - entity: categories
    path: /path/to/categories/data.csv
    keys: [...]
  - entity: customers
    path: /path/to/customers/data.csv
    keys: [...]
```
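For your layout specifically, that could look something like this (just a sketch: the stream names, key columns, and datestamp are illustrative, and the datestamp is hard-coded for now):

```yaml
files:
  - entity: categories
    path: postgres/categories/2024-01-15/data.csv  # hypothetical datestamp
    keys: [category_id]                            # hypothetical key column
  - entity: customers
    path: postgres/customers/2024-01-15/data.csv
    keys: [customer_id]
```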
**Leonardo Duarte:**
@Edgar Ramírez (Arch.dev) that's awesome! thank you so much. now, how do i insert both the entity name (which would be {stream_name}) and the current date ({datestamp}) into path? a literal {datestamp} or {stream_name} in the path doesn't seem to work...
manually inserting the current date and entity name works, so the path itself is alright. i just wish i could get the current entity name and yyyy-MM-dd into the directory path...
and yeah, your solution works, multiple entities inside of files certainly do the trick... now all i need is to replace each hard-coded entity name with that block's own entity name, and the hard-coded datestamp with something dynamic, like a CurrentDay variable. sketch of the shape i'm after below.
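for reference, this is roughly what i'm picturing (a sketch, assuming meltano expands environment variables inside meltano.yml config values, and that CURRENT_DATE gets exported before the run; key column is illustrative):

```yaml
files:
  - entity: categories
    # assumes CURRENT_DATE is exported in the shell before `meltano run`
    path: postgres/categories/${CURRENT_DATE}/data.csv
    keys: [category_id]  # hypothetical key column
```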
**r:**
From the README:
> `path`: Local path to the file to be ingested. Note that this may be a directory, in which case all files in that directory and any of its subdirectories will be recursively processed
although, since it looks like each file has its own unique key, there wouldn't be much merit in doing that here.
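For completeness, the directory form would look like this (illustrative path and key; every CSV under the directory is ingested as that one entity):

```yaml
files:
  - entity: categories
    # a directory instead of a file: all CSVs underneath it are
    # recursively processed as the `categories` entity
    path: postgres/categories
    keys: [category_id]  # hypothetical key column
```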
**Leonardo Duarte:**
so, i saw that i can export the current date via bash before executing the task: `export CURRENT_DATE=$(date +'%Y-%m-%d')`, then reference it with `${CURRENT_DATE}` in the path. any idea how i'd be able to schedule this export via, let's say, an airflow DAG? roughly what i'm picturing is sketched below.
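something like this for the DAG task, maybe (a rough sketch, assuming an Airflow BashOperator that runs everything in one shell so the variable is visible to meltano; the `tap-csv target-postgres` pair is just illustrative of my pipeline):

```bash
# hypothetical bash_command for an Airflow BashOperator:
# compute today's date, then run the meltano pipeline in the same
# shell so ${CURRENT_DATE} is set when meltano.yml is rendered
export CURRENT_DATE=$(date +'%Y-%m-%d')
meltano run tap-csv target-postgres
```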