I've recently been doing some Redis scale-down (capacity reduction) work for one of our services, which involves data migration. Migrating Redis data looks quite simple: for a one-to-one migration, you only need to configure two parameters on the slave, masterauth and slaveof. Of course, other special cases can come up during a migration, and they need special handling.
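
For reference, a minimal sketch of what those two settings look like in the slave's redis.conf. The host, port, and password are placeholders, not values from this migration:

```
# redis.conf on the slave (placeholder values)
slaveof <master-host> <master-port>
masterauth <master-password>
```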

After the steps above are done, we wait for the instance switchover. Before switching, though, we still need to check replication status, data consistency, and so on. While checking replication status we found something strange: of the 540 instances being migrated, a few (20 of them) showed a relatively high lag that kept growing, and, stranger still, their offset value was always 0.

After observing for a while, the phenomenon had not gone away: lag kept growing over time and offset stayed at 0. Had the slave not applied the data at all?
Connecting to the slave with the CLI showed that its dbsize was basically the same as the master's; running monitor on the slave showed commands being replicated normally; and a key set on the master could be read correctly on the slave. Next I looked at the slave's log:
[13739] 10 Aug 15:49:46.017 * Connecting to MASTER xxxx.xxxx.xxx:xxx
[13739] 10 Aug 15:49:46.017 * MASTER <-> SLAVE sync started
[13739] 10 Aug 15:49:46.018 * Non blocking connect for SYNC fired the event.
[13739] 10 Aug 15:49:46.032 * Master replied to PING, replication can
[13739] 10 Aug 15:49:46.092 * Partial resynchronization not possible (no cached
[13739] 10 Aug 15:49:46.120 # Unexpected reply to PSYNC from master: -LOADING Redis is loading the dataset in memory
[13739] 10 Aug 15:49:46.120 * Retrying with SYNC...
[13739] 10 Aug 15:49:46.641 * MASTER <-> SLAVE sync: receiving 26845995 bytes
[13739] 10 Aug 15:49:47.276 * MASTER <-> SLAVE sync: Flushing old data
[13739] 10 Aug 15:49:47.276 * MASTER <-> SLAVE sync: Loading DB in memory
[13739] 10 Aug 15:49:47.620 * MASTER <-> SLAVE sync: Finished with success
[13739] 10 Aug 15:49:47.621 * Background append only file rewriting started by pid 22605
[22605] 10 Aug 15:49:48.724 * SYNC append only file rewrite performed
[22605] 10 Aug 15:49:48.725 * AOF rewrite: 0 MB of memory used by copy-on-write
[13739] 10 Aug 15:49:48.822 * Background AOF rewrite terminated with success
[13739] 10 Aug 15:49:48.822 * Parent diff successfully flushed to the rewritten AOF (1148bytes)
[13739] 10 Aug 15:49:48.822 * Background AOF rewrite finished successfully
That looks normal too :(

So here is the question: since our data is replicating properly, why does the master report that the slave is lagging?
Opening the source code shows how lag and offset are computed:
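    /* On the master, when building the INFO line for each slave: */
    if (slave->replstate == SLAVE_STATE_ONLINE)
        lag = time(NULL) - slave->repl_ack_time;

    info = sdscatprintf(info,
        /* ... format string and leading arguments omitted ... */
        slave->repl_ack_off, lag);

    /* Fields of the client structure: */
    long long repl_ack_off; /* Replication ack offset, if this is a slave. */
    long long repl_ack_time;/* Replication ack time, if this is a slave. */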

As you can see, lag is the master's current time minus the time at which it last received an ACK from the slave, and offset is the replication offset (repl_ack_off) carried by that ACK. So we can suspect that the slave has never sent an ACK at all.
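
To make the arithmetic concrete, here is a minimal sketch of that calculation. The slave_view struct and the reported_lag function are hypothetical stand-ins written for this post, not Redis source; only the field names repl_ack_off and repl_ack_time come from the excerpt above.

```c
#include <time.h>

/* Hypothetical, trimmed-down view of what the master tracks per slave. */
typedef struct {
    int online;              /* 1 once the slave has reached the online state */
    long long repl_ack_off;  /* offset carried by the last ACK, 0 if none arrived */
    time_t repl_ack_time;    /* time at which the last ACK arrived */
} slave_view;

/* Lag as the master would report it: seconds since the last ACK.
 * As in the excerpt, lag is only computed for online slaves. */
long reported_lag(const slave_view *s, time_t now) {
    long lag = 0;
    if (s->online) lag = (long)(now - s->repl_ack_time);
    return lag;
}
```

If no ACK ever arrives, repl_ack_time never advances, so the value returned by reported_lag keeps growing with the wall clock while repl_ack_off stays where it started. That is the same pattern as the symptom observed here.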

Running the monitor command on a master whose lag and offset values are normal shows that its slave does send ACK commands back, whereas for the seemingly abnormal nodes no ACK ever comes back.

This seems to be the crux of the problem. So why does this seemingly abnormal slave not send ACKs to the master? To find out, we have to peel back the slave's behavior layer by layer.

Checking the slave's log again and comparing it with the logs of other instances reveals some differences:

As the figure above shows, the log of an instance that sends ACKs to the master normally contains a few more lines, and these lines may be the key to whether ACKs are sent.

Reading the source code gives the following picture:
The replicationCron function runs once per second. If the current instance has masterhost configured, it checks the replication state server.repl_state. If that state is REPL_STATE_CONNECT /* Must connect to master */, the slave tries to establish a connection to the master with connectWithMaster().
Once the connection is established, synchronization with the master proceeds in syncWithMaster(), which first sends a PING to the master and waits for the master to answer with PONG. After the master's response has been received and AUTH has been performed, the slave attempts a partial resynchronization through slaveTryPartialResynchronization(), and falls back to a full resynchronization if the partial one fails.
The command sent in that function is PSYNC. When the master's reply is neither +FULLRESYNC nor +CONTINUE, the slave simply assumes that the master does not understand PSYNC and, for compatibility with the older replication protocol, resends the request to the master using the SYNC command.
    psync_result = slaveTryPartialResynchronization(fd);
    if (psync_result == PSYNC_CONTINUE) {
        redisLog(REDIS_NOTICE, "MASTER <-> SLAVE sync: Master accepted a Partial
        /* ... */
    }

    /* Fall back to SYNC if needed. Otherwise psync_result == PSYNC_FULLRESYNC
     * and the server.repl_master_runid and repl_master_initial_offset are
     * already populated. */
    if (psync_result == PSYNC_NOT_SUPPORTED) {
        redisLog(REDIS_NOTICE,"Retrying with SYNC...");
        /* ... */
    }

Once the SYNC command has been sent successfully, the slave moves on to the remaining steps of a full resynchronization: receiving the RDB, flushing the old data (FLUSHDB), loading the RDB, and so on.
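
The fallback rule described above (any reply other than +FULLRESYNC or +CONTINUE is treated as "PSYNC not supported") can be sketched as follows. This is an illustration, not the Redis implementation: the enum and the classify_psync_reply function are hypothetical, and only the result names echo the excerpt.

```c
#include <string.h>

/* Hypothetical result codes for this sketch; the names echo the excerpt. */
typedef enum {
    PSYNC_CONTINUE,
    PSYNC_FULLRESYNC,
    PSYNC_NOT_SUPPORTED
} psync_result;

/* Classify the master's reply to PSYNC by its prefix.
 * Anything that is not +FULLRESYNC or +CONTINUE ends in the SYNC fallback. */
psync_result classify_psync_reply(const char *reply) {
    if (strncmp(reply, "+FULLRESYNC", 11) == 0) return PSYNC_FULLRESYNC;
    if (strncmp(reply, "+CONTINUE", 9) == 0) return PSYNC_CONTINUE;
    return PSYNC_NOT_SUPPORTED;
}
```

Under this rule the "-LOADING Redis is loading the dataset in memory" reply from the slave log is classified as PSYNC_NOT_SUPPORTED, even though the master does understand PSYNC and was merely still loading its dataset.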

At this point we can explain the difference in log output between the two instances: one instance synchronized with PSYNC and the other with SYNC. For the instance that used SYNC, server.repl_master_initial_offset = -1, while for the instance that used PSYNC, server.repl_master_initial_offset = 1. (You can verify this variable with gdb; gdb is risky on a live instance, so use it with care ~)

Now let's go back to the place where the ACK is sent:
    /* Send ACK to master from time to time.
     * Note that we do not send periodic acks to masters that don't
     * support PSYNC and replication offsets. */
    if (server.masterhost && server.master &&
        !(server.master->flags & CLIENT_PRE_PSYNC))
        replicationSendAck();

    #define CLIENT_PRE_PSYNC (1<<16) /* Instance don't understand PSYNC. */

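In other words, for an instance that synchronized with SYNC, server.master->flags has CLIENT_PRE_PSYNC set (the slave flags its master this way when the initial replication offset is -1), so the condition !(server.master->flags & CLIENT_PRE_PSYNC) is false and the ACK-sending call is never executed.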
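
A minimal sketch of that gate, to show how a single flag bit silences the periodic ACK. The master_client struct and the should_send_ack function are hypothetical stand-ins; only CLIENT_PRE_PSYNC and its value are taken from the excerpt above.

```c
#include <stdbool.h>
#include <stddef.h>

#define CLIENT_PRE_PSYNC (1<<16) /* value as shown in the excerpt */

/* Hypothetical stand-in for the slave's view of its master connection. */
typedef struct {
    int flags;
} master_client;

/* True only when a master is configured, connected, and not flagged
 * as a pre-PSYNC master. */
bool should_send_ack(const char *masterhost, const master_client *master) {
    return masterhost != NULL && master != NULL &&
           !(master->flags & CLIENT_PRE_PSYNC);
}
```

With the flag set, should_send_ack returns false on every tick, which matches the slaves that never reported an ACK.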

So we have found the root cause of why the slave does not send ACKs to the master: the slave synchronized using SYNC.
Does the master then treat PSYNC and SYNC differently? And what is the impact on the replicated data? Show me the code:

    if (!strcasecmp(c->argv[0]->ptr,"psync")) {
        if (masterTryPartialResynchronization(c) == C_OK) {
            return; /* No full resync needed, return. */
        } else {
            char *master_replid = c->argv[1]->ptr;
            if (master_replid[0] != '?') server.stat_sync_partial_err++;
        }
    } else {
        /* If a slave uses SYNC, we are dealing with an old implementation
         * of the replication protocol (like redis-cli --slave). Flag the client
         * so that we don't expect to receive REPLCONF ACK feedbacks. */
        /* Author's note: this also shows that with SYNC the master does not
         * expect the slave to send ACKs back. */
        c->flags |= CLIENT_PRE_PSYNC;
    }

As you can see, on the master PSYNC and SYNC share the same entry point. The difference is that PSYNC can perform a partial resynchronization, whereas SYNC can only perform a full resynchronization.
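
The master-side half of the story can be reduced to a few lines. This is a sketch under the assumption that only "sync" and "psync" reach this code path, as in the excerpt; flags_after_sync_command and master_expects_acks are hypothetical helpers, not Redis functions.

```c
#include <strings.h> /* strcasecmp (POSIX) */

#define CLIENT_PRE_PSYNC (1<<16) /* value as shown in the excerpt */

/* Flags of the slave's client on the master after it issued cmd.
 * Assumes cmd is either "sync" or "psync", in any letter case. */
int flags_after_sync_command(const char *cmd, int flags) {
    if (strcasecmp(cmd, "psync") != 0) flags |= CLIENT_PRE_PSYNC;
    return flags;
}

/* The master only expects REPLCONF ACK feedback from unflagged clients. */
int master_expects_acks(int flags) {
    return (flags & CLIENT_PRE_PSYNC) == 0;
}
```

So both sides agree: the master flags a SYNC slave as one it expects no ACKs from, and the slave flags its master as one it should not send ACKs to.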

Summary:
A Redis slave can synchronize data in two ways, PSYNC and SYNC. When a slave's PSYNC attempt fails, it falls back to SYNC, and a slave that synchronized with SYNC does not send ACKs to the master. As a result, the lag (and offset) that the master reports for that slave is inaccurate.

So lag cannot always be relied on to decide whether a slave is delayed, or by how much. We can instead use master_last_io_seconds_ago (reported by INFO on the slave) to judge whether the slave is delayed, or detect it through external monitoring.
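
As a rough sketch of such a check, the helper below pulls a numeric field out of INFO-style "name:value" text and applies a threshold. The function names, the sample text, and the threshold are assumptions made for illustration; fetching the INFO output from the slave is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Look up a numeric field in INFO-style text made of "name:value" lines.
 * Returns 1 and stores the value in *out if the field is found, else 0. */
int info_field_ll(const char *info, const char *field, long long *out) {
    size_t flen = strlen(field);
    const char *line = info;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            *out = strtoll(line + flen + 1, NULL, 10);
            return 1;
        }
        line = strchr(line, '\n'); /* jump to the start of the next line */
        if (line != NULL) line++;
    }
    return 0;
}

/* Treat the link as stale if the field is missing, negative, or above
 * max_idle_seconds (a threshold the caller has to choose). */
int slave_link_stale(const char *info, long long max_idle_seconds) {
    long long idle;
    if (!info_field_ll(info, "master_last_io_seconds_ago", &idle)) return 1;
    return idle < 0 || idle > max_idle_seconds;
}
```

A monitoring job could feed the slave's INFO output to slave_link_stale and alert on a non-zero result, instead of relying on the lag value reported by the master.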