# getting-started
s
Hello everybody! This looks like quite an obvious question, but I cannot find a JSON file extractor (something like tap-jsonl, for example). Please enlighten me :)
a
I don't know if this exists quite yet - would make a fun Speedrun though! You could probably use tap-smoke-test since it does read in jsonl files... although it wasn't really intended for "real" use cases.
Can you say a bit more about your requirements? For instance: do you need to get data from the cloud, do you need incremental support, and how stable is the source schema?
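For reference, JSONL ("JSON Lines") is simply one JSON object per line, which is the shape tap-smoke-test reads. A minimal sketch of writing and reading such a file with Python's standard library (the file name and record fields here are made up for illustration):

```python
import json

# Hypothetical example records (e.g. exported chat messages).
messages = [
    {"id": 1, "text": "hello", "source": "telegram"},
    {"id": 2, "text": "world", "source": "slack"},
]

# Write JSONL: one JSON document per line, no enclosing array.
with open("messages.jsonl", "w", encoding="utf-8") as f:
    for record in messages:
        f.write(json.dumps(record) + "\n")

# Read it back line by line.
with open("messages.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(loaded == messages)  # → True
```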
p
Would tap-spreadsheets-anywhere work? I see JSON listed as one of the formats in the tap's README.
s
@aaronsteers thanks for the insights, I'll keep the tap-smoke-test workaround in mind. I'm a data analyst on vacation :) and finally have time to implement a pet-project idea: curating thousands of incoming everyday messages (Slack, messengers, emails, etc.) via AI (aiming at GPT-3).

I started with a PoC for Telegram: extracting messages via https://github.com/pyrogram/pyrogram, then Python -> JSON -> Power Query (Excel). Already a great relief, I use it every day :)

Now, in the second iteration, I'm moving parts of the solution to Meltano/Airbyte etc. (probably Meltano) + Snowflake/Spark (Bitnami)/Databricks/Redshift (going to test them all) + dbt + Power BI. In this second move the smallest iteration would be to use Meltano to upload the JSON to the DWHs. In further iterations I was thinking about implementing a tap for Telegram with incremental capabilities, yes.
@pat_nadolny that looks like what I need! Thank you. I'll check it this week :)
Yep, that works: JSON -> Snowflake. meltano.yml:

```yaml
plugins:
  extractors:
    - name: tap-spreadsheets-anywhere
      variant: ets
      pip_url: git+https://github.com/ets/tap-spreadsheets-anywhere.git
  loaders:
    - name: target-snowflake
      variant: transferwise
      pip_url: pipelinewise-target-snowflake
environments:
  - name: dev
    config:
      plugins:
        extractors:
          - name: tap-spreadsheets-anywhere
            config:
              tables:
                - path: file://C/temp
                  name: telegram_messages
                  format: json
                  key_properties: []
                  start_date: '2017-05-01T00:00:00Z'
                  pattern: pf500.json
        loaders:
          - name: target-snowflake
            config:
              user: <secretuser>
              account: <secretaccount>
              dbname: raw
              warehouse: COMPUTE_WH
              default_target_schema: raw
              file_format: raw.raw.mlt
              primary_key_required: false
```

I even ran into the same error @aaronsteers was facing 2 years ago 🙂: "Must specify the full search path starting from database for RAW" https://www.giters.com/transferwise/pipelinewise-target-snowflake/issues/75

Thanks to everybody 👍
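For anyone following along: with a config like the one above in a Meltano project, the pipeline can be kicked off from the project root roughly like this (a sketch, not from the thread; the exact invocation depends on your Meltano version):

```shell
# Install the plugins declared in meltano.yml.
meltano install

# Run extract/load in the dev environment, using the plugin
# names from the config above.
meltano --environment=dev run tap-spreadsheets-anywhere target-snowflake
```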
a
Wow! A trip through the wayback machine! 😅
Glad you got it resolved, @sergey_vdovin!