# troubleshooting
e
I've checked over a new Stream, tested it in poetry, and everything seems to run. But when I go to push this into PostgreSQL it gives me an error as if my JSON is no good. Is there any better way to debug this besides eyeballing the output of my tap? It was working just fine; I wiped out the whole thing and redeployed from scratch, but it seems something in PostgreSQL is no longer happy, or the input to it is not valid JSON.
target-postgres | loader    | time=2021-10-26 01:09:51 name=target_postgres level=ERROR message=Unable to parse:
target-postgres | loader    | Database connected, dbname = mydb_name username = juju_meltano
target-postgres | loader    | 
target-postgres | loader    | Traceback (most recent call last):
target-postgres | loader    |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/bin/target-postgres", line 8, in <module>
target-postgres | loader    |     sys.exit(main())
target-postgres | loader    |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 373, in main
target-postgres | loader    |     persist_lines(config, singer_messages)
target-postgres | loader    |   File "/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py", line 101, in persist_lines
target-postgres | loader    |     o = json.loads(line)
target-postgres | loader    |   File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
target-postgres | loader    |     return _default_decoder.decode(s)
target-postgres | loader    |   File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
target-postgres | loader    |     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
target-postgres | loader    |   File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
target-postgres | loader    |     raise JSONDecodeError("Expecting value", s, err.value) from None
target-postgres | loader    | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Has the transferwise variant been supplanted in the move to MeltanoHub? I think my input looks fine, and I've double-checked some of it by copy-pasting it into VSCode and having it lint it.
a
Sounds like the cause of the JSON parsing failure is still somewhat unclear. Have you already tried running with `--log-level=DEBUG` in case that offers another clue?
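One way to avoid eyeballing the tap output is to run it through a small checker before it ever reaches the target. The sketch below is only an illustration: `find_bad_lines` and `report` are hypothetical helper names, and it assumes the target expects every line on stdout to be a JSON object with a `type` key, so an empty line or plain text from a `print` call would be flagged.

```python
import json
import sys
from typing import Iterable, List, Tuple


def find_bad_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, raw_line) for every line that is not a Singer-style JSON object."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            # Empty lines and plain text both end up here.
            bad.append((number, raw))
            continue
        # Assumption: a valid message is a JSON object carrying a "type" key.
        if not isinstance(message, dict) or "type" not in message:
            bad.append((number, raw))
    return bad


def report(stream: Iterable[str]) -> int:
    """Print every offending line to stderr and return how many were found."""
    bad = find_bad_lines(stream)
    for number, raw in bad:
        print(f"line {number}: {raw!r}", file=sys.stderr)
    return len(bad)
```

Saved as a script (say, a hypothetical `check_singer_lines.py`) with `report(sys.stdin)` called at the bottom, it could be fed with `meltano invoke tap-ibkr-news | python check_singer_lines.py`.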
e
I guess I am in the GUI and need to drop back into the CLI first. I keep forgetting to try that.
When developing the tap on the host everything works and data comes out.
One odd behavior I noticed, though: this is my first attempt at using the catalog feature to trigger only 1 of the now 2 streams in a single tap, but I can see both streams are triggering. Perhaps that is the cause: it's expecting one or the other's schema.
  - name: tap-ibkr-tickers
    namespace: tap_ibkr
    catalog: extract/tap-ibkr-tickers.catalog.json
    executable: /stonks-tap-ibkr/tap-ibkr/tap-ibkr.sh
    config:
      host_tws_thrift: 127.0.0.1
      tws_thrift_port: 9090
  - name: tap-ibkr-news
    namespace: tap_ibkr
    executable: /stonks-tap-ibkr/tap-ibkr/tap-ibkr.sh
    catalog: extract/tap-ibkr-news.catalog.json
    config:
      host_tws_thrift: 127.0.0.1
      tws_thrift_port: 9090
      target_host: 1.2.3.4
      target_username: myusername
      target_password: 1234
  loaders:
  - name: target-postgres
    variant: transferwise
    pip_url: git+https://github.com/transferwise/pipelinewise-target-postgres.git
    config:
      host: 1.2.3.4
      port: 5432
      user: myusername
      password: 1234
      dbname: stonks
      schema: public
I think my invoke command is not recognizing the log level, but I went ahead and ran it from the CLI: `meltano invoke tap-ibkr-news`
```
Installing the current project: tap-ibkr (0.0.2)
time=2021-10-26 01:27:25 name=tap-ibkr level=INFO message=tap-ibkr v0.0.2, Meltano SDK v0.3.10)
time=2021-10-26 01:27:25 name=tap-ibkr level=INFO message=Skipping parse of env var settings...
time=2021-10-26 01:27:25 name=tap-ibkr level=INFO message=Config validation passed with 0 errors and 0 warnings.
time=2021-10-26 01:27:25 name=root level=INFO message=Operator '{MAPPER_ELSE_OPTION}=None' was not found. Unmapped streams will be included in output.
time=2021-10-26 01:27:25 name=tap-ibkr level=INFO message=Beginning full_table sync of 'ib_news'...
time=2021-10-26 01:27:25 name=tap-ibkr level=INFO message=Tap has custom mapper. Using 1 provided map(s).
{"type": "SCHEMA", "stream": "ib_news", "schema": {"properties": {"symbol": {"type": ["string", "null"]}, "contract_id": {"type": ["integer", "null"]}, "contract_id_query_datetime_max": {"format": "date-time", "type": ["string", "null"]}, "provider_code": {"type": ["string", "null"]}, "article_id": {"type": ["string", "null"]}, "article_timestamp": {"format": "date-time", "type": ["string", "null"]}, "headline": {"type": ["string", "null"]}, "extra_data": {"type": ["string", "null"]}, "query_start_time": {"format": "date-time", "type": ["string", "null"]}}, "type": "object"}, "key_properties": ["article_id"]}
time=2021-10-26 01:27:25 name=tap-ibkr level=INFO message=Beginning by gathering latest ticker symbols known!
Database connected, dbname = stonks username = juju_meltano
time=2021-10-26 01:27:25 name=tap-ibkr level=INFO message=Now we take those good symbols and query IBKR!
time=2021-10-26 01:27:25 name=tap-ibkr level=INFO message=connected to IBKR, beginning query of news...
with 29816 results
{"type": "RECORD", "stream": "ib_news", "record": {"symbol": "rytm", "contract_id": 291378079, "article_id": "BRFUPDN$0f89bb2b", "headline": "BofA Securities downgraded Rhythm Pharmaceuticals (RYTM) to Underperform with target $17", "query_start_time": "2021-10-25T23:27:25.606646Z"}, "time_extracted": "2021-10-25T23:27:35.723084Z"}
{"type": "STATE", "value": {"bookmarks": {"ib_news": {}}}}
{"type": "RECORD", "stream": "ib_news", "record": {"symbol": "rytm", "contract_id": 291378079, "article_id": "BRFUPDN$0f89bb2d", "headline": "", "query_start_time": "2021-10-25T23:27:25.606646Z"}, "time_extracted": "2021-10-25T23:27:35.723261Z"}
{"type": "RECORD", "stream": "ib_news", "record": {"symbol": "rytm", "contract_id": 291378079, "article_id": "BRFUPDN$0fcc8ed1", "headline": "", "query_start_time": "2021-10-25T23:27:25.606646Z"}, "time_extracted": "2021-10-25T23:27:35.723316Z"}
{"type": "RECORD", "stream": "ib_news", "record": {"symbol": "form", "contract_id": 18416237, "article_id": "BRFUPDN$0f19c9df", "headline": "", "query_start_time": "2021-10-25T23:27:25.606646Z"}, "time_extracted": "2021-10-25T23:27:35.723368Z"}
{"type": "RECORD", "stream": "ib_news", "record": {"symbol": "form", "contract_id": 18416237, "article_id": "BRFUPDN$0f3ac696", "headline": "", "query_start_time": "2021-10-25T23:27:25.606646Z"}, "time_extracted": "2021-10-25T23:27:35.723410Z"}
{"type": "RECORD", "stream": "ib_news", "record": {"symbol": "form", "contract_id": 18416237, "article_id": "BRFUPDN$0f778f32", "headline": "", "query_start_time": "2021-10-25T23:27:25.606646Z"}, "time_extracted": "2021-10-25T23:27:35.723451Z"}
{"type": "RECORD", "stream": "ib_news", "record": {"symbol": "mnkd", "contract_id": 268671846, "article_id": "BRFUPDN$0ef82c1f", "headline": "", "query_start_time": "2021-10-25T23:27:25.606646Z"}, "time_extracted": "2021-10-25T23:27:35.723492Z"}
{"type": "RECORD", "stream": "ib_news", "record": {"symbol": "mpw", "contract_id": 35111040, "article_id": "BRFUPDN$0ee361f7", "headline": "", "query_start_time": "2021-10-25T23:27:25.606646Z"}, "time_extracted": "2021-10-25T23:27:35.723533Z"}
{"type": "RECORD", "stream": "ib_news"…
```
I guess I will recheck my catalog work. So far I output the catalog, then remove whichever stream I do not want to query, or so I thought. But this line here seems to be telling me it's tossing the `article_id` when querying news, so I must need to fix my catalog request.
time=2021-10-26 01:27:35 name=tap-ibkr level=WARNING message=Property 'article_id' was present in the 'ib_tickers' stream but not found in catalog schema. Ignoring.
time=2021-10-26 01:27:35 name=tap-ibkr level=WARNING message=Property 'alphabet_partition' was present in the 'ib_tickers' stream but not found in catalog schema. Ignoring.
v
This problem is frustrating; https://gitlab.com/meltano/sdk/-/issues/228 is in to cover it. Right now the only way I have to test this is to 🤞 and hope that target-jsonl fails as well (normally it does), then patch in a log info on the record right before the failure in target-jsonl. It'll print the record that's the issue, and from there it's normally pretty easy to find the problem. Yeah, it's a pretty crappy way to do it, but it's the best I've got 😄
You can do the same with target-postgres: go in to `/home/ubuntu/meltano_proj_repo/.meltano/loaders/target-postgres/venv/lib/python3.8/site-packages/target_postgres/__init__.py` (line 101, in `persist_lines`) and add a log.info on the record right before line 101.
🤷
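The patch described above might look roughly like the following. This is a sketch only: `loads_with_logging` and the logger name are hypothetical, and the only thing taken from the traceback is that the target calls `json.loads(line)` inside `persist_lines`. Logging with `%r` makes an empty string or a stray newline visible.

```python
import json
import logging

# Hypothetical logger name; a real target would already have its own logger.
logger = logging.getLogger("target_debug")


def loads_with_logging(line: str) -> dict:
    """Parse one line, logging the raw text first so a bad line shows up in the log."""
    logger.info("about to parse line: %r", line)
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Re-raise after logging so the target still fails as it did before.
        logger.error("unparseable line: %r", line)
        raise
```

Swapping the call at line 101 for something like this is a temporary debugging aid; it lives inside the plugin's virtualenv and is lost when the plugin is reinstalled.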
e
Derek, you are awesome and make me want to see how I can contribute to the documentation with things like this. This exact workaround likely needs to be posted or pinned somewhere so I can refer back to it.
I'm in the day job now, will try this later tonight.
Ah, just saw your tip on JSONL. I will do that as part of my workflow. Great link.
Update: printed out the line, and indeed it's `None` or empty.
I must not be using catalogs correctly.
I thought you just output your catalog, remove the stream you don't want, then, under the name of your tap, drop in whichever catalog JSON you want to use.
And as I suspected, the tap's ignoring my 2nd stream and trying to load data, I guess, into my first stream.
I'll review the docs again
If I cannot do that.. I may need to move on and create a 2nd tap altogether
until I can come back to this
for reference, here's my news catalog.. where I attempt to ignore ticker symbols stream and just download news
tap-ibkr-news.catalog.json
v
Normally you put "select": false for the streams you don't want. Meltano helps with that with the `select` attribute: anything selected is used, things not selected are not.
Not sure if that's what you're saying; it's hard to follow.
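For reference, in the Singer catalog convention as I understand it, selection is stored as a `selected` flag in each stream's top-level metadata entry (the one with an empty `breadcrumb`), so deleting a stream from the catalog file is not the same as deselecting it. A minimal sketch of setting that flag by hand, where `select_streams` is a hypothetical helper and not part of Meltano or the SDK:

```python
import copy
from typing import Any, Dict, Set


def select_streams(catalog: Dict[str, Any], wanted: Set[str]) -> Dict[str, Any]:
    """Return a copy of the catalog with the stream-level `selected` flag set per stream."""
    result = copy.deepcopy(catalog)
    for stream in result.get("streams", []):
        name = stream.get("tap_stream_id") or stream.get("stream")
        metadata = stream.setdefault("metadata", [])
        # The stream-level entry is the one whose breadcrumb is empty.
        entry = next((m for m in metadata if m.get("breadcrumb") == []), None)
        if entry is None:
            entry = {"breadcrumb": [], "metadata": {}}
            metadata.append(entry)
        entry.setdefault("metadata", {})["selected"] = name in wanted
    return result
```

Calling `select_streams(catalog, {"ib_news"})` would mark `ib_news` as selected and every other stream as not selected, which is roughly what Meltano's `select` rules do for you when it generates the catalog.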
e
I just read this.. actually in the docs
Meltano makes it easy to select specific entities and attributes for inclusion or exclusion using meltano select and the select extractor extra, which let you specify inclusion and exclusion rules that can contain Unix shell-style wildcards to match multiple entities and/or attributes at once.
So yeah, I was totally missing that and was trying to submit catalogs as my selection.
I think this was the key bit I missed, and I can just erase these manually generated catalogs altogether. I am now reading how to `select`; guessing it's something like this:
  - name: tap-ibkr-news
    namespace: tap_ibkr
    executable: /stonks-tap-ibkr/tap-ibkr/tap-ibkr.sh
    select: "insert some logic to ignore ticker symbols"
    config:
      host_tws_thrift: 127.0.0.1
      tws_thrift_port: 9090
      target_host: 1.2.3.4
      target_username: myusername
      target_password: 1234
v
😄 Yes, the catalog generation piece of Meltano is one of the greatest time savers.
select: "insert some logic to ignore ticker symbols"
Well, it's normally set up so that you select streams; you don't filter out values of data.
v
so something like
select: 
  - "stocks.*"
would select your stocks but not your news
e
so my streams look like this
from singer_sdk import typing as th  # JSON Schema typing helpers
from tap_ibkr.client import IBKRStream, KludgeStream

from tap_ibkr.utils.getAllPossibleSymbols import get_all_tickers


class TickersStream(IBKRStream):

    name = "ib_tickers"
    primary_keys = ["contract_id", "query_start_time"]
    replication_key = None

    partitions = get_all_tickers()

    schema = th.PropertiesList(
        th.Property("symbol", th.StringType),
        th.Property("sec_type", th.StringType),
        th.Property("primary_exchange", th.StringType),
        th.Property("exchange", th.StringType),
        th.Property("currency", th.StringType),
        th.Property("contract_id", th.IntegerType),
        th.Property("query_start_time", th.DateTimeType)
    ).to_dict()


class NewsStream(KludgeStream):

    name = "ib_news"
    primary_keys = ["article_id"]
    replication_key = None

    schema = th.PropertiesList(
        th.Property("symbol", th.StringType),
        th.Property("contract_id", th.IntegerType),
        th.Property("provider_code", th.StringType),
        th.Property("article_id", th.StringType),
        th.Property("article_timestamp", th.DateTimeType),
        th.Property("headline", th.StringType),
        th.Property("extra_data", th.StringType),
        th.Property("query_start_time", th.DateTimeType)
    ).to_dict()
So my select for news will be
select:
  - "ib_news.*"
v
exactly!
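To see what a rule like `ib_news.*` would pick up, here is a rough approximation using Python's `fnmatch`, based only on the docs' statement that the rules use Unix shell-style wildcards. `matching_attributes` is a hypothetical helper, it covers inclusion rules only, and Meltano's real matching may differ in details.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def matching_attributes(patterns: Iterable[str], attributes: Iterable[str]) -> List[str]:
    """Return the 'stream.property' names matched by at least one inclusion pattern."""
    rules = list(patterns)
    return [
        attribute
        for attribute in attributes
        if any(fnmatchcase(attribute, rule) for rule in rules)
    ]


# Names taken from the two stream classes above, written as "stream.property".
attributes = [
    "ib_tickers.symbol",
    "ib_tickers.contract_id",
    "ib_news.article_id",
    "ib_news.headline",
]
selected = matching_attributes(["ib_news.*"], attributes)
```

With these inputs, only the two `ib_news` entries end up in `selected`; the `ib_tickers` properties are left out.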
e
the easiest way to open source this tap.. might be make it compatible with yahoo finance..
I would write an equivalent thrift service or maybe just use REST.. unclear
Awesome @visch, I am so close here to a big leap: essentially I will be downloading all analyst headlines in 15 minutes of the US market,
after which I will just be trying to do some NLP work.
I demo'ed this in the bank and will keep pushing to see how I can best demo it
v
I'll be curious about the NLP stuff. I have no idea past hooking into existing libraries 😄
e
I usually do image recognition.. because NLP is very costly to get done well
but I can share once I know more.. last NLP stuff I did was sentiment analysis.. this will be somewhat similar
I need to parse the headline for the analyst's sentiment, the target price of a stock, and the firm's name.
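Before reaching for a model, a plain regular expression can serve as a first pass at headline parsing. The sketch below is built from the single sample headline in the tap output above, so the pattern (firm, then "upgraded" or "downgraded", then company, ticker in parentheses, rating, and an optional dollar target) is an assumption about how such headlines are phrased. `parse_headline` is a hypothetical helper.

```python
import re
from typing import Dict, Optional, Union

# Assumed shape: "<firm> upgraded|downgraded <company> (<TICKER>) to <rating> [with target $<price>]"
HEADLINE_RE = re.compile(
    r"^(?P<firm>.+?)\s+(?P<action>upgraded|downgraded)\s+"
    r"(?P<company>.+?)\s+\((?P<ticker>[A-Z.]+)\)\s+to\s+"
    r"(?P<rating>.+?)(?:\s+with\s+target\s+\$(?P<target>\d+(?:\.\d+)?))?$"
)


def parse_headline(headline: str) -> Optional[Dict[str, Union[str, float, None]]]:
    """Extract firm, action, company, ticker, rating and target price, or None if no match."""
    match = HEADLINE_RE.match(headline.strip())
    if match is None:
        return None
    fields: Dict[str, Union[str, float, None]] = dict(match.groupdict())
    # Convert the target price to a number when one was present.
    if fields["target"] is not None:
        fields["target"] = float(fields["target"])
    return fields


sample = "BofA Securities downgraded Rhythm Pharmaceuticals (RYTM) to Underperform with target $17"
parsed = parse_headline(sample)
```

Headlines that do not fit the assumed shape return `None`, which gives a quick way to measure how much of the feed still needs real NLP.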
I work mostly in pytorch
I'm going to destroy my Meltano container and start again with this.
In Sweden they say "holding my thumbs"; I think it's because it gets so ****ing cold here. But then again, in MI it's even colder than Stockholm.
v
Ha, it depends in Michigan too, I think, on where you're at. I think you're normally colder than us up in Stockholm. For us the "UP" (Upper Peninsula / north Michigan) is much colder.
Holding my thumbs, I hadn't heard that one. I like it.
e
Hmm, seems after modification it's still not quite getting the hint about just wanting the news. Here's my meltano.yml snippet:
plugins:
  extractors:
  - name: tap-ibkr-tickers
    select:
      - "ib_tickers.*"
    namespace: tap_ibkr
    executable: /home/emcp/Dev/git/JRGEMCP_STONKS/stonks-tap-ibkr/tap-ibkr/tap-ibkr.sh
    config:
      host_tws_thrift: 127.0.0.1
      tws_thrift_port: 9090
  - name: tap-ibkr-news
    select:
      - "ib_news.*"
I'll try to look deeper into the docs..
v
Which stream isn't working? I mean tap: `tap-ibkr-tickers` or `tap-ibkr-news`?
And does `tap-ibkr-news` have `inherit_from` on it?
e
they're actually the same tap
I want to schedule stream A manually.. from Stream B
so I have a manual pipeline for each, referred by name
- name: ibkr-to-postgres-tickers
  extractor: tap-ibkr-tickers
  loader: target-postgres
  transform: skip
  interval: '@once'
  start_date: 2021-09-06 20:53:11.568572
- name: ibkr-to-postgres-news
  extractor: tap-ibkr-news
  loader: target-postgres
  transform: skip
  interval: '@once'
  start_date: 2021-09-06 20:53:11.568572
I'm reading "not all taps support this", so perhaps somewhere in the tap I need to codify this ability.
One thing I sort of dislike about the docs right now is that they're very CLI-heavy or CLI-centric, meaning if I never use or lean into the CLI, there's very little example material to show me how to get what I want using the GUI or the meltano.yml file.
I can understand the need for it though: a CLI to debug the data source.
(meltano) ubuntu@juju-2dd159-248:~/meltano_proj_repo$ meltano select --list tap_ibkr
Extractor 'tap_ibkr' is not known to Meltano
(meltano) ubuntu@juju-2dd159-248:~/meltano_proj_repo$ meltano select --list tap-ibkr
Extractor 'tap-ibkr' is not known to Meltano
(meltano) ubuntu@juju-2dd159-248:~/meltano_proj_repo$ meltano select --list tap-ibkr-news
Cannot list the selected attributes: Could not find catalog. Verify that the tap supports discovery mode and advertises the `discover` capability as well as either `catalog` or `properties`
(meltano) ubuntu@juju-2dd159-248:~/meltano_proj_repo$ meltano select --list tap-ibkr-tickers
Cannot list the selected attributes: Could not find catalog. Verify that the tap supports discovery mode and advertises the `discover` capability as well as either `catalog` or `properties`
(meltano) ubuntu@juju-2dd159-248:~/meltano_proj_repo$
Seems I need to do some work on my tap.
I'll try later today to drop the double quotes.
Seems the docs don't have that in their examples.
Didn't help. I think I need to verify that my tap is (1) using `select` correctly and (2) if it needs custom tap code, how or where do I start.
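Since the error text points at discovery, one starting point is to confirm what the tap executable itself returns in discovery mode, independent of Meltano. The helper below is a hypothetical sketch: it assumes the tap prints a Singer-style catalog (a JSON object with a `streams` list) when run with `--discover`, and it only parses that output.

```python
import json
from typing import List


def stream_names_from_discovery(output: str) -> List[str]:
    """Parse discovery output and return the stream names it advertises."""
    catalog = json.loads(output)
    streams = catalog.get("streams") if isinstance(catalog, dict) else None
    if not isinstance(streams, list):
        raise ValueError("discovery output has no 'streams' list")
    # Fall back to the "stream" key when "tap_stream_id" is absent.
    return [s.get("tap_stream_id") or s.get("stream") for s in streams]
```

The output could be captured with something like `subprocess.run([executable, "--discover"], capture_output=True, text=True, check=True).stdout`, where `executable` is the `tap-ibkr.sh` path from meltano.yml and any `--config` argument the tap needs is added. If both `ib_tickers` and `ib_news` come back, the tap side of discovery is probably fine, and the next thing to check would be whether the custom extractor entries in meltano.yml declare the `discover` and `catalog` capabilities that the error message mentions.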