Rob Norman
08/07/2025, 7:02 AMSELECT <list of columns> FROM <table> ORDER BY `created_at` ASC;
My tap configuration is pretty basic, there's nothing weird going on:
plugins:
extractors:
- name: tap-mysql
variant: transferwise
pip_url:
git+<https://github.com/transferwise/pipelinewise.git#subdirectory=singer-connectors/tap-mysql>
select:
- ${TAP_MYSQL_DATABASE}-<table>.*
settings:
- name: engine
value: mysql
config:
session_sqls:
- SET @@session.wait_timeout=28800
- SET @@session.net_read_timeout=3600
- SET @@session.innodb_lock_wait_timeout=3600
metadata:
'*-<table>':
replication-method: INCREMENTAL
replication-key: created_at
Am I just fundamentally missing something because trying to read the entire table in one go seems insane to me? Do I need to set the replication-method to FULL_TABLE for the first load and manually fiddle the state or something?Dylan Sprayberry
08/08/2025, 1:31 PMDo I need to set the replication-method to FULL_TABLE for the first load and manually fiddle the state or something?That wouldn't actually result in any difference in the query that's run against your source db; full table will request the entire contents of the table for every extraction. Similarly I wouldn't expect it to have any difference in terms of how your db responds to the query (eg whether or not it uses your index). The lack of indexing via the query plan feels like a server-side configuration issu that's worth looking into regardless, but some flavors of tap mysql have a
LIMIT
or INCREMENTAL_LIMIT
property that's accepted in the config that would (as you'd expect) add a limit to your query clause (see this code block). The risks with using this approach are that
1. if all of the records yielded from a given query have the same replication key value as the one already stored as a bookmark, it'll never continue advancing.
2. Generally the db is going to need to scan the entire table anyway to order the records before applying the index, so low performance dbs might struggle to get though a real big table anywa.
Seems ilke the pipelinewise version doesn't include that property, but it could be worth looking into