# getting-started
Hi everyone, I am looking for a solution to the following problem: I have over 100 different data sources, such as HTTP/SOAP endpoints and FTP servers. From those sources I retrieve files in different formats: JSON, XML, CSV and others. I am looking for a tool that helps me handle the growing number of sources in an easy way, without requiring a custom flow for each source. We need to normalize the data before placing it in our staging database for further processing. So basically: ingest data from N sources and save it in a normalized format to our database, be it Snowflake, Cassandra or even MySQL. Is Meltano a tool that could help me with that?

I am trying to understand ELT flows, how they actually work, and how they actually save the data to the destination storage. To be honest, I am kinda confused about how so many different formats can be saved to the destination storage in a way that allows further processing/transformations. For instance, how does Meltano save a large CSV, say 50 GB, to Cassandra? Does it dynamically create a unique table per source and derive the column names from the CSV headers? And if there are no headers, how does that work? How is a 100 GB XML file saved to Cassandra or Snowflake, an XML which contains, let's say, bus connection details from a bus company, where each record is a separate trip? Does it take the record names and dynamically create the structures?

I am just starting to research the subject. I plan to play with it on my local machine, but I'm not sure whether that would be a waste of time.
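For context, this is roughly what I imagine the setup might look like after skimming the docs: one extractor plugin per source *format* (not per individual source) plus a single loader, all declared in `meltano.yml`. The plugin names and settings below (`tap-csv`, `target-snowflake`, the `files` block) are just my guesses from the documentation, so please correct me if I have the model wrong:

```yaml
# meltano.yml sketch -- my assumption of the layout, not taken from a real project
version: 1
default_environment: dev
plugins:
  extractors:
    # One extractor per source format; each CSV source becomes an entry under `files`.
    - name: tap-csv
      variant: meltanolabs
      config:
        files:
          - entity: bus_trips            # stream name, presumably the destination table name
            path: ftp-downloads/trips.csv  # hypothetical local path after fetching from FTP
            keys: [trip_id]              # primary key column(s), taken from the CSV header
  loaders:
    - name: target-snowflake
      variant: meltanolabs
      config:
        database: STAGING                # hypothetical staging database
        default_target_schema: RAW       # hypothetical schema for the raw/normalized tables
```

My (unverified) understanding is that you would then run something like `meltano run tap-csv target-snowflake`, and the loader creates or evolves the destination table from the stream's schema, which for CSV would be inferred from the header row. Whether that holds for huge files, headerless CSVs, or deeply nested XML is exactly what I am trying to figure out.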