# singer-targets
s
Greetings (again)! I am creating a `tap-postgres` → `target-postgres` EL pipeline to read and store a table with 5,500,000+ rows and 45+ columns (and unfortunately, I need them all). With `use_copy: true`, the data ingestion takes an hour to complete, while other solutions finish in under 10 minutes (as expected of a `COPY` statement). Any ideas on how I can optimise this?
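For reference, a minimal `meltano.yml` sketch of the pipeline described above, assuming the MeltanoLabs variants; the connection details and the table/stream names are illustrative placeholders, and `use_copy: true` is the setting mentioned in this thread:

```yaml
plugins:
  extractors:
    - name: tap-postgres
      variant: meltanolabs
      config:
        host: db.example.com        # placeholder connection details
        port: 5432
        user: loader
        database: analytics
      select:
        - public-big_table.*        # hypothetical 5.5M-row, 45-column table
  loaders:
    - name: target-postgres
      variant: meltanolabs
      config:
        use_copy: true              # batch loads via COPY, as discussed above
```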
Additional note: it seems that `tap-postgres` is the real issue. I added a CSV hop in between (so `tap-postgres` → `target-csv`, then `tap-csv` → `target-postgres`). The first step takes almost an hour, while the second took 6-7 minutes. So, any ideas on how to optimise (speed up) the data extraction in `tap-postgres`? I tried the available configuration options but found nothing helpful for this.
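The two-hop experiment described above maps to two Meltano invocations (assuming the stock `tap-csv`/`target-csv` plugins are installed in the project):

```bash
# Hop 1: extract from Postgres into local CSV files -- took ~1 hour here
meltano run tap-postgres target-csv

# Hop 2: load the same CSVs back into Postgres -- took ~6-7 minutes
meltano run tap-csv target-postgres
```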
e
Enabling debug logging might reveal something, but I'm suspecting the tap might be doing some redundant processing of every record.
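One way to capture that debug output with the Meltano CLI (the log file name is just an example; Meltano writes its logs to stderr):

```bash
# Re-run the pipeline with debug-level logging and keep the output for analysis
meltano --log-level=debug run tap-postgres target-postgres 2> elt-debug.log
```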
s
Thank you for the reply! The database server is down for maintenance; I will be able to test next week, and I will post anything I find here.
👍 1
Hello again @Edgar Ramírez (Arch.dev)! I debugged this ELT (a log file with 13+ GB of content), and there are no traces of redundancy: every record appears only once. The `target-postgres` `batch_processing_time` averages ~20 s per batch, which sums to ~6 minutes of total `COPY` operation time; that is acceptable. The `tap-postgres` average extraction interval between records is ~0.001 s, which leads (roughly, at 0.001 s × 5,500,000 records ≈ 5,500 s) to a total of ~90 minutes of data reading.
👀 1
e
This is very useful info @Samuel Nogueira Farrus! Are you using the latest `tap-postgres`?
s
Yes, latest version
👍 1
e