michael_cooper
08/26/2020, 7:49 PM
I'm trying to get `tap-slack` and `target-snowflake` to work, but I keep getting this error: `target-snowflake | INFO channels: buffer has expired, flushing.` The tap will eventually finish syncing, but then it just stalls.
douwe_maan
08/26/2020, 7:54 PM
`buffer has expired, flushing` isn't an error; it's actually the target working as designed: it saves up records to insert in a buffer, which is flushed when the buffer hits a certain size (the batch size) or when it hasn't been flushed in a while (when it expires).
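(For illustration, the size- and time-based flushing described above boils down to something like the sketch below; the class and parameter names are invented for this example and are not target-snowflake's actual internals.)
```
import time


class RecordBuffer:
    """Illustrative buffer that flushes on batch size or age; not real target-snowflake code."""

    def __init__(self, batch_size=5000, expiry_seconds=30):
        self.batch_size = batch_size
        self.expiry_seconds = expiry_seconds
        self.records = []
        self.last_flush = time.monotonic()

    def add(self, record):
        self.records.append(record)
        if len(self.records) >= self.batch_size:
            self.flush(reason="batch size reached")
        elif time.monotonic() - self.last_flush > self.expiry_seconds:
            self.flush(reason="buffer has expired")

    def flush(self, reason):
        # A real target would insert self.records into the warehouse here.
        print(f"INFO flushing {len(self.records)} records ({reason})")
        self.records.clear()
        self.last_flush = time.monotonic()
```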
michael_cooper
08/26/2020, 8:02 PM
[…] `elt` and `invoke`.
douwe_maan
08/26/2020, 8:32 PM
Can you share the `meltano --log-level=debug elt` output?
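(Spelled out for this pipeline, that would be something like the following, assuming the extractor and loader are named `tap-slack` and `target-snowflake` in the project.)
```
meltano --log-level=debug elt tap-slack target-snowflake
```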
michael_cooper
08/26/2020, 8:43 PM
```
tap-slack | INFO Finished Sync..
tap-slack (out) | {"type": "STATE", "value": {"bookmarks": {"users": {"updated": "2020-08-26T18:01:37.000000Z"}, "messages": {"C0GMF42VB": "2020-08-01T00:00:00", "C0GMKAS5U": "2020-08-01T00:00:00", "C11UGLU0J": "2020-08-01T00:00:00", "C247XTP42": "2020-08-01T00:00:00", "C6Q8YA1K8": "2020-08-01T00:00:00", "C6TUXLR8S": "2020-08-01T00:00:00", "CAS4RGG65": "2020-08-01T00:00:00", "CBRGR0067": "2020-08-01T00:00:00", "CCMTMS5A9": "2020-08-01T00:00:00", "CD59FU3QU": "2020-08-01T00:00:00", "CDBNTD2EL": "2020-08-01T00:00:00", "CJ58P5WHZ": "2020-08-01T00:00:00", "CKBP65GAZ": "2020-08-01T00:00:00", "CKDN39H62": "2020-08-01T00:00:00", "CLN8CB132": "2020-08-01T00:00:00", "CLT52B5DY": "2020-08-01T00:00:00", "CM0TEAAM6": "2020-08-01T00:00:00", "CP83KKPF1": "2020-08-01T00:00:00", "CRC8PNVLG": "2020-08-01T00:00:00", "CSY0793RT": "2020-08-01T00:00:00", "C010RE8P77Y": "2020-08-01T00:00:00", "C0140JDFC56": "2020-08-01T00:00:00", "C0147J5NG95": "2020-08-01T00:00:00", "C014CS4EL74": "2020-08-01T00:00:00", "C015SS4DBV2": "2020-08-01T00:00:00", "C0162DW9399": "2020-08-01T00:00:00", "C016B9CGW30": "2020-08-01T00:00:00", "C016GL175K7": "2020-08-01T00:00:00", "C016J1PDJGJ": "2020-08-01T00:00:00", "C016J7QNR8T": "2020-08-01T00:00:00", "C016S52NXDY": "2020-08-01T00:00:00", "C01728YC7JB": "2020-08-01T00:00:00", "C017Z1XQS76": "2020-08-01T00:00:00", "CQ19GT0RY": "2020-08-01T00:00:00", "CQBP6QYBZ": "2020-08-01T00:00:00", "C0136S1HRQU": "2020-08-01T00:00:00", "C0180Q695A7": "2020-08-01T00:00:00"}, "files": "2020-08-01T00:00:00", "remote_files": "2020-08-01T00:00:00"}}}
```
douwe_maan
08/26/2020, 8:54 PM
Can you try swapping in `target-jsonl` (https://meltano.com/plugins/loaders/jsonl.html) so that we can figure out if this is an issue in target-snowflake or Meltano itself?
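(A sketch of how that swap could look on the command line, assuming the plugin names used above; `target-jsonl` only needs to be added to the project once.)
```
meltano add loader target-jsonl
meltano elt tap-slack target-jsonl
```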
michael_cooper
08/26/2020, 9:05 PM
```
meltano | Loading failed (1): FileNotFoundError: [Errno 2] No such file or directory: 'output/channels.jsonl'
```
douwe_maan
08/26/2020, 9:05 PM
It looks like it expects the `output` directory to already exist; can you briefly create one in your project root?
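(That is, something like the following from the project root.)
```
mkdir output
```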
michael_cooper
08/26/2020, 9:11 PM
Sounds like a new feature to add the directory if it doesn't exist!
michael_cooper
08/26/2020, 9:11 PM
Anyway, it completes with JSONL.
douwe_maan
08/26/2020, 9:12 PM
> Sounds like a new feature to add the directory if it doesn't exist!
I know! I created https://gitlab.com/meltano/meltano/-/issues/2185 a month ago, but I'm not sure if this should be fixed in Meltano or in `target-jsonl` itself.
douwe_maan
08/26/2020, 9:14 PM
> Anyway, it completes with JSONL.
All right, that's interesting. Then I'm curious what's going on in `target-snowflake` that's preventing it from flushing its records and quitting when the tap quits.
Are you comfortable enough with Python to add some print-based debugging statements to the `process_input` function at https://gitlab.com/meltano/target-snowflake/-/blob/master/target_snowflake/__init__.py#L29, so that we can get a better idea of where it's getting stuck? In your project, that file would live at .meltano/loaders/target-snowflake/venv/lib/python3.6/site-packages/target_snowflake/__init__.py
michael_cooper
08/27/2020, 3:12 PM
I added a print statement to `process_input` because I wanted to see what the `lines` parameter was. That print statement only ever triggers once throughout the `tap-slack` sync, and it prints this:
```
target-snowflake (out) | lines: <_io.TextIOWrapper name='<stdin>' encoding='utf-8'>
```
douwe_maan
08/27/2020, 3:16 PM
`for line in lines:` will wait to yield new lines as long as stdin is open. I'm curious if we ever get to the statement after `for line in lines` when the tap quits and stdin is closed: `target.flush_all_cached_records()`, and if that call ever completes, or if that's the actual call we're getting stuck on. I suggest putting log statements before and after that call!
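(The blocking behavior described here can be reproduced with a small standalone script; the file name and messages below are made up for illustration.)
```
# stdin_consumer.py: iterating over sys.stdin only ends once the writing
# process closes its end of the pipe, so anything after the loop runs
# only after the producer (here, the tap) has exited.
import sys

for line in sys.stdin:
    print("Processing Line:", line.rstrip())

# Reached only once stdin is closed by the producer.
print("Flushing all cached records")
```
Running e.g. `printf 'a\nb\n' | python stdin_consumer.py` shows the last line printing only after the producer has finished.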
michael_cooper
08/27/2020, 5:31 PM
Here's `process_input` with the print statements added:
```
def process_input(config, lines):
    print("Starting process_input")
    """
    The core processing loop for any Target
    Loop through the lines sent in sys.stdin, process each one and run DDL and
    batch DML operations.
    """
    target = TargetSnowflake(config)
    # Loop over lines from stdin
    for line in lines:
        print("Processing Line: ", line)
        target.process_line(line)
        print("Finished Line: ", line)
    # If the tap finished its execution, flush the records for any remaining
    # streams that still have records cached (i.e. row_count < batch_size)
    print("Flushing all cached records")
    target.flush_all_cached_records()
    print("Flushed all cached records")
```
And these are the logs right before it fails:
```
meltano | WARNING Received state is invalid, incremental state has not been updated
target-snowflake (out) | Processing Line: {"type": "STATE", "value": {"bookmarks": {"users": {"updated": "2020-08-27T152907.000000Z"}, "messages": {"C0GMF42VB": "2020-08-01T000000", "C0GMKAS5U": "2020-08-01T000000", "C11UGLU0J": "2020-08-01T000000", "C247XTP42": "2020-08-01T000000", "C6Q8YA1K8": "2020-08-01T000000", "C6TUXLR8S": "2020-08-01T000000", "CAS4RGG65": "2020-08-01T000000", "CBRGR0067": "2020-08-01T000000", "CCMTMS5A9": "2020-08-01T000000", "CD59FU3QU": "2020-08-01T000000", "CDBNTD2EL": "2020-08-01T000000", "CJ58P5WHZ": "2020-08-01T000000", "CKBP65GAZ": "2020-08-01T000000", "CKDN39H62": "2020-08-01T000000", "CLN8CB132": "2020-08-01T000000", "CLT52B5DY": "2020-08-01T000000", "CM0TEAAM6": "2020-08-01T000000", "CP83KKPF1": "2020-08-01T000000", "CRC8PNVLG": "2020-08-01T000000", "CSY0793RT": "2020-08-01T000000", "C010RE8P77Y": "2020-08-01T000000", "C0140JDFC56": "2020-08-01T000000", "C0147J5NG95": "2020-08-01T000000", "C014CS4EL74": "2020-08-01T000000", "C015SS4DBV2": "2020-08-01T000000", "C0162DW9399": "2020-08-01T000000", "C016B9CGW30": "2020-08-01T000000", "C016GL175K7": "2020-08-01T000000", "C016J1PDJGJ": "2020-08-01T000000", "C016J7QNR8T": "2020-08-01T000000", "C016S52NXDY": "2020-08-01T000000", "C01728YC7JB": "2020-08-01T000000", "C017Z1XQS76": "2020-08-01T000000", "CQ19GT0RY": "2020-08-01T000000", "CQBP6QYBZ": "2020-08-01T000000", "C0136S1HRQU": "2020-08-01T000000", "C0180Q695A7": "2020-08-01T000000"}}}}
meltano | WARNING Received state is invalid, incremental state has not been updated
target-snowflake (out) |
meltano | WARNING Received state is invalid, incremental state has not been updated
target-snowflake (out) | Finished Line: {"type": "STATE", "value": {"bookmarks": {"users": {"updated": "2020-08-27T152907.000000Z"}, "messages": {"C0GMF42VB": "2020-08-01T000000", "C0GMKAS5U": "2020-08-01T000000", "C11UGLU0J": "2020-08-01T000000", "C247XTP42": "2020-08-01T000000", "C6Q8YA1K8": "2020-08-01T000000", "C6TUXLR8S": "2020-08-01T000000", "CAS4RGG65": "2020-08-01T000000", "CBRGR0067": "2020-08-01T000000", "CCMTMS5A9": "2020-08-01T000000", "CD59FU3QU": "2020-08-01T000000", "CDBNTD2EL": "2020-08-01T000000", "CJ58P5WHZ": "2020-08-01T000000", "CKBP65GAZ": "2020-08-01T000000", "CKDN39H62": "2020-08-01T000000", "CLN8CB132": "2020-08-01T000000", "CLT52B5DY": "2020-08-01T000000", "CM0TEAAM6": "2020-08-01T000000", "CP83KKPF1": "2020-08-01T000000", "CRC8PNVLG": "2020-08-01T000000", "CSY0793RT": "2020-08-01T000000", "C010RE8P77Y": "2020-08-01T000000", "C0140JDFC56": "2020-08-01T000000", "C0147J5NG95": "2020-08-01T000000", "C014CS4EL74": "2020-08-01T000000", "C015SS4DBV2": "2020-08-01T000000", "C0162DW9399": "2020-08-01T000000", "C016B9CGW30": "2020-08-01T000000", "C016GL175K7": "2020-08-01T000000", "C016J1PDJGJ": "2020-08-01T000000", "C016J7QNR8T": "2020-08-01T000000", "C016S52NXDY": "2020-08-01T000000", "C01728YC7JB": "2020-08-01T000000", "C017Z1XQS76": "20…
```
douwe_maan
08/27/2020, 5:34 PM
`LOGGER.info(message)`
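(Presumably this refers to routing the debug output through the target's logger rather than bare print() calls; the sketch below uses the standard-library logging module, which may differ from how target-snowflake actually sets up its LOGGER.)
```
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
LOGGER = logging.getLogger("target_snowflake.debug")

message = "Flushing all cached records"
LOGGER.info(message)  # instead of print(message), so it shows up in the log output
```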
michael_cooper
08/27/2020, 6:37 PM
[…] `teams`
douwe_maan
08/27/2020, 8:07 PM
The error (`CRITICAL Not allowed type update for teams.id: ('FLOAT', 'VARCHAR(16777216)')`) indicates that the `teams.id` column already exists with type `FLOAT`, but that the latest `SCHEMA` message defines the `id` column as a `VARCHAR` instead, which is a change that's not supported: `"id": { "type": ["null", "string"] }`
Do you know where the float `teams.id` column may be coming from? I expect that if you simply drop the table and try again, the target will be able to create the table correctly this time.
michael_cooper
08/27/2020, 9:17 PM
The GitHub tap had already created a `teams` table. When I ran the Slack tap, the two tables for Slack and GitHub then conflicted.
douwe_maan
08/27/2020, 9:22 PM
The default value of the `schema` setting is `$MELTANO_EXTRACTOR_NAMESPACE`, which automatically expands to the ELT pipeline's extractor's namespace (`tap_github`, `tap_slack`, etc.) as documented under https://meltano.com/docs/integration.html#pipeline-environment-variables. Would that suffice?
If you'd like to have more control over the schema, you can add new `preferred_schema: MY_SCHEMA` properties to your extractor definitions in `meltano.yml`, which you can then reference from target-snowflake's `schema` value as `$MELTANO_EXTRACT__PREFERRED_SCHEMA`.
We're going to make that the default behavior in https://gitlab.com/meltano/meltano/-/issues/2282, because it makes it easier to override the schema and makes it clearer what's going on.
douwe_maan
08/27/2020, 9:22 PM
The `preferred_schema` property would be an example of a custom (https://meltano.com/docs/configuration.html#custom-settings) plugin extra (https://meltano.com/docs/configuration.html#plugin-extras).
michael_cooper
08/27/2020, 9:41 PM
Does that mean you don't have to define the `SF_SCHEMA` within a `.env` file? It thus defaults to produce the schema as `tap_<TAP_NAME>` within your target?
And with the new setting, does that mean all you have to do is add `preferred_schema: CUSTOM_SCHEMA_NAME` to the extractor definition, and then it will automatically use the preferred schema without having to add any other settings elsewhere?
douwe_maan
08/27/2020, 9:46 PM
> Does that mean you don't have to define the `SF_SCHEMA` within a `.env` file?
Correct!
> It thus defaults to produce the schema as `tap_<TAP_NAME>` within your target?
By default, it'll use the extractor's `namespace`, as defined in `meltano.yml` for your custom plugins. If you've followed the recommendation, it'll look like `tap_github` for `tap-github`.
> And with the new setting, does that mean all you have to do is add `preferred_schema: CUSTOM_SCHEMA_NAME` to the extractor definition, and then it will automatically use the preferred schema without having to add any other settings elsewhere?
Until we implement https://gitlab.com/meltano/meltano/-/issues/2282, which would make this the default, you'd also have to explicitly add `schema: $MELTANO_EXTRACT__PREFERRED_SCHEMA` to `target-snowflake`'s `config` object in `meltano.yml`.
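(For illustration, the resulting meltano.yml could look roughly like the excerpt below; `preferred_schema` is the custom plugin extra described above, and the GITHUB/SLACK values are placeholders.)
```
plugins:
  extractors:
    - name: tap-github
      preferred_schema: GITHUB
    - name: tap-slack
      preferred_schema: SLACK
  loaders:
    - name: target-snowflake
      config:
        # Until https://gitlab.com/meltano/meltano/-/issues/2282 makes this the
        # default, reference the extra explicitly; during `meltano elt` it expands
        # via the MELTANO_EXTRACT__PREFERRED_SCHEMA environment variable.
        schema: $MELTANO_EXTRACT__PREFERRED_SCHEMA
```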