#singer-tap-development

Henning Holgersen

06/29/2023, 11:54 AM
Hi! Instead of doing useful things, I have pursued an LLM-related idea for ingesting pretty much any unstructured document and chunking the data. Combined with something like map-embeddings and target-chroma, this might be useful for people who have a lot of files on a drive somewhere, and a data-science group yelling “we want to do LLMs!”. Lots of the code was stolen from the upcoming tap-file: https://github.com/radbrt/tap-text-anywhere. This is mainly a proof of concept to gauge interest; thoughts are welcome. The Meltano file in the repo is preconfigured with an S3 bucket containing two PDF files, so anyone should be able to fork, install and load from S3.
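
For anyone wondering what "chunking" means here, a minimal sketch in Python follows. It is not the code from the repo: the function name `chunk_text` and the `chunk_size` and `overlap` parameters are made up for illustration, and it assumes simple fixed-size character windows with a small overlap, which is only one of several ways to split text before embedding.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into character windows of at most chunk_size,
    where each window repeats the last `overlap` characters of the previous one.

    Hypothetical helper for illustration; not taken from tap-text-anywhere.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so we never emit a trailing chunk that is pure overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(chunk_text("abcdefghijklmnopqrstuvwxy", chunk_size=10, overlap=2)):
        print(i, chunk)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text downstream.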