# singer-target-development
h
What's the difference between the `Per record` and `Per batch` options for `serialization_method` with the Singer target SDK?
d
cc @aaronsteers
a
Hi, @hassan_syyid - The difference is just in how your target prefers to write records. Are you writing one record at a time (such as with a singleton API endpoint), or do you need to write many rows at once in order to get the best performance (such as with Snowflake/Redshift that require loading via CSV)?
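As a rough sketch of how that choice surfaces in code: the SDK's `RecordSink` writes each record as it arrives, while `BatchSink` accumulates records and writes them together. The example below assumes the `singer_sdk.sinks` base classes; `_post_one` and `_bulk_load` are hypothetical placeholders for your own write logic.
```python
from singer_sdk.sinks import BatchSink, RecordSink


class SingletonEndpointSink(RecordSink):
    """Per record: write each record as soon as it arrives."""

    def process_record(self, record: dict, context: dict) -> None:
        # Hypothetical helper -- replace with a call to your API client.
        self._post_one(record)


class BulkLoadSink(BatchSink):
    """Per batch: let records accumulate, then write them in one operation."""

    def process_batch(self, context: dict) -> None:
        # The default process_record() appends incoming records to this list.
        records = context["records"]
        # Hypothetical helper -- e.g. stage a CSV and load it into the warehouse.
        self._bulk_load(records)
```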
h
Ahh makes sense thanks
a
👍
h
Writing an Airtable target currently 🙂
a
Oh cool! Do you know which camp that falls into? Or perhaps it supports both?
h
I think it's better suited for batch
a
Yeah, I see it does support batch... I was curious, so I wanted to check it out. What's weird/interesting is that it seems there are different API contracts depending on how you create the objects in the GUI interface. For a random object in my account, I found these docs. On the one hand, this particular endpoint doesn't accept more than 10 records per POST, but on the other hand it also doesn't allow more than 5 total requests per second. (50 records per second is still much better than only 5 records per second.) Your mileage may differ though, depending on how your objects are set up.
Because rate limiting may become an issue for you, I'll link this issue. We don't have formal rate limit handling yet, but you can use your own custom logic and/or post into this issue with ideas/proposals for improved central SDK-based handling: Formal handling of API rate limits (#140) · Issues · Meltano / Meltano SDK for Singer Taps and Targets · GitLab
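Until the SDK has built-in rate limit handling, a target can chunk and throttle its own requests. Below is a minimal sketch under a few assumptions: Airtable's usual `{"records": [{"fields": ...}]}` payload shape, Bearer-token auth, and an illustrative sleep; the URL and field mapping are placeholders rather than anything confirmed in this thread.
```python
import time

import requests


def post_in_chunks(records: list[dict], url: str, token: str) -> None:
    """POST records in chunks of 10, pausing to stay under ~5 requests/sec."""
    headers = {"Authorization": f"Bearer {token}"}
    for start in range(0, len(records), 10):  # the endpoint caps creates at 10 per request
        chunk = records[start:start + 10]
        payload = {"records": [{"fields": rec} for rec in chunk]}
        response = requests.post(url, json=payload, headers=headers)
        response.raise_for_status()
        time.sleep(0.25)  # crude throttle: roughly 4 requests per second
```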
h
@aaronsteers If I change `DEFAULT_BATCH_SIZE_ROWS`, will the default `process_rows` create batches of max 10 rows?
a
Sorry - you would think so... but actually I think you want `Sink.max_size`. I'll log an issue to clean up the ambiguity.
Setting `max_size = 10` should force `is_full` to report `True` whenever the sink reaches 10 records, which then causes the `process_batch()` method to be called.
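In sink code that would look roughly like the following (a sketch, assuming `max_size` can be overridden on the sink class as described above):
```python
from singer_sdk.sinks import BatchSink


class AirtableSink(BatchSink):
    # Once 10 records are buffered, is_full reports True and the SDK
    # calls process_batch() with those records in context["records"].
    max_size = 10
```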
h
Do I need to empty the `context["records"]` myself?
a
Nope. The context will be disposed of when you're done.
h
```
Uploaded 10 | success=True
Uploaded 20 | success=False
level=ERROR message={"error":{"type":"INVALID_RECORDS","message":"A maximum of 10 records can be created per request but you have provided 20."}}
Uploaded 28 | success=False
level=ERROR message={"error":{"type":"INVALID_RECORDS","message":"A maximum of 10 records can be created per request but you have provided 28."}}
```
Are you certain? Seems like the `records` count is going up
a
That would be a bug. 🐛 Can you go ahead and try resetting the 'records' entry and see if that resolves it?
h
Yup, adding `context["records"] = []` fixed it
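For reference, a sketch of that workaround inside `process_batch()`; `_post_records` is a hypothetical helper standing in for the actual Airtable POST logic:
```python
from singer_sdk.sinks import BatchSink


class AirtableSink(BatchSink):
    max_size = 10

    def process_batch(self, context: dict) -> None:
        records = context["records"]
        self._post_records(records)  # hypothetical helper that POSTs to Airtable
        # Workaround for the accumulation bug discussed above:
        # reset the buffer so the next batch starts empty.
        context["records"] = []
```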
a
Okay, thanks for the real-time feedback. I'm logging that as a bug and will fix in the next release.
Bugs notwithstanding, very cool to see the logs of records being posted! 🙂
Did you end up using a generic REST / requests library approach, or custom airtable library for auth and posting updates?
h
Just used requests for now
Got a little prototype working which is super cool
Barely wrote any code 😅
d
Just how we like it 😄
a
Nice! Looks like <100 lines of code. Maybe a new record? 🙂
s
Hey guys, I just found this thread about the target-airtable. I actually need one, too. How can I use @hassan_syyid's implementation to install it into a Python venv and use it together with a Singer tap?
h
You should be able to use it normally. Usually what I do is create a venv for the tap and one for the target. So you can do:
```
tap-quickbooks --config config.json --catalog catalog.json > data.txt
```
Then switch to the target's venv and run:
```
cat data.txt | target-airtable --config config.json
```