Network timeout messages - ERR-4213: Timeout (or dropped connection)

Question

In the MINE log, I noticed the following messages for every plog transmission. This is a dedicated network for performance testing, so there should be no issues on the network side.

2013/10/01 23:31:38 INFO> Restarting transfer of /u04/oracle/DBVisit/replicate/prd_345/mine/7282.plog at offset 309474824.
2013/10/01 23:31:39 INFO> Waiting for more data in online redolog 7283 in thread 1...
2013/10/01 23:31:39 ERROR> ERR-4213: Timeout (or dropped connection) trying to send data for command engine apply propagate plog 1 7282 83023366 83067137 83067140 309474824 /u04/oracle/DBVisit/replicate/prd_345/apply/7282.plog (client).
2013/10/01 23:31:41 INFO> Waiting for more data in online redolog 7283 in thread 1...
2013/10/01 23:31:41 WARN> WARN-4502: Timeout waiting for incoming data (timeout 60s) - get command.
2013/10/01 23:31:41 INFO> Send of file to apply needed reconnect to succeed: stored as /u04/oracle/DBVisit/replicate/prd_345/apply/7282.plog, registered: OK: Plog information registered.. [engine apply propagate plog 1 7282 83023366 83067137 83067140 309474824 /u04/oracle/DBVisit/replicate/prd_345/apply/7282.plog]
2013/10/01 23:31:41 INFO> Send of file to apply succesfull, stored as /u04/oracle/DBVisit/replicate/prd_345/apply/7282.plog, registered: OK: Plog information registered.. [engine apply propagate plog 1 7282 83023366 83067137 83067140 309474824 /u04/oracle/DBVisit/replicate/prd_345/apply/7282.plog]
2013/10/01 23:31:41 INFO> Sending file /u04/oracle/DBVisit/replicate/prd_345/mine/7283.plog to apply as 7283.plog (incremental send)
2013/10/01 23:31:43 INFO> Waiting for more data in online redolog 7283 in thread 1...
2013/10/01 23:31:43 ERROR> ERR-4213: Timeout (or dropped connection) trying to send data for command engine fileinfo plog 7283.plog (client).
2013/10/01 23:31:45 INFO> Waiting for more data in online redolog 7283 in thread 1...

Answer

The ERR-4213 messages may point to an issue at the network level, although the messages themselves give no further detail, so you should review the connection with your network admin team. The key point, however, is that Dbvisit Replicate retries after hitting one of these errors, and if processing continues past that point then all is well. The messages that would indicate a real issue (rather than a temporary glitch) are in the APPLY log: there you would see the application waiting for extended periods to receive plog information while you know there is activity on the primary that ought to be replicated.
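To check whether each timeout was followed by a successful resend, the MINE log can be summarized with a short script. The following is only a sketch: it assumes the log line layout shown in the question (`date time LEVEL> message`) and matches on the message text as it appears there, including the product's own spelling "succesfull". The function name `summarize_timeouts` is made up for this example and is not part of Dbvisit Replicate.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt in the question:
#   2013/10/01 23:31:39 ERROR> ERR-4213: Timeout (or dropped connection) ...
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (\w+)> (.*)$")
PLOG_RE = re.compile(r"(\d+)\.plog")

# Message prefixes that report a completed send (spelling as in the log).
DELIVERED_PREFIXES = (
    "Send of file to apply succes",
    "Send of file to apply needed reconnect to succeed",
)


def summarize_timeouts(lines):
    """Count ERR-4213 errors per plog and list plogs with no later successful send."""
    timeouts = Counter()
    delivered = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a timestamped log line
        _, level, message = match.groups()
        plog = PLOG_RE.search(message)
        if plog is None:
            continue  # message does not name a plog file
        seq = int(plog.group(1))
        if level == "ERROR" and message.startswith("ERR-4213"):
            timeouts[seq] += 1
        elif level == "INFO" and message.startswith(DELIVERED_PREFIXES):
            delivered.add(seq)
    unresolved = sorted(seq for seq in timeouts if seq not in delivered)
    return {
        "timeouts": dict(timeouts),
        "delivered": sorted(delivered),
        "unresolved": unresolved,
    }
```

Pass it an open log file, for example `summarize_timeouts(open(path_to_mine_log))`. A plog that stays in `unresolved` while the log keeps growing is worth a closer look; one that appears only because the log excerpt ends early is not.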

Another possible reason is that APPLY is very busy with a large number of inserts (and other changes), so a timeout occurs on the networking thread. If this is the case, a workaround is to increase the timeouts by setting the following variables:

set MINE.NETWORK_QUALITY WAN
set NETWORK_QUALITY WAN
set _TCP_CONNECT_TIMEOUT 300
set _TCP_RECEIVE_TIMEOUT 300
set _TCP_SEND_TIMEOUT 300

The steps to updating these settings are:

  1. Shut down MINE, APPLY and the console.
  2. Add the above settings to the end of the *MINE.ddc.
  3. Restart the MINE process.
  4. Restart the APPLY process once MINE has been restarted.
  5. Start the console and verify that the settings have been updated with the show variable_name command.

No changes are required for APPLY. The changes made in the MINE.ddc take effect in the DDC DB (database), so they apply to both MINE and APPLY.
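Step 2 can be scripted if the same change has to be made on several test systems. The sketch below simply appends whichever of the five lines are not already in the file, after saving a `.bak` copy. The function name `append_timeout_settings` and the `.bak` suffix are choices made for this example, and the path to your *MINE.ddc is something you supply yourself. It does not detect an existing line that sets the same variable to a different value, so review the file afterwards.

```python
import shutil
from pathlib import Path

# The settings listed in the answer above, one per line.
SETTINGS = [
    "set MINE.NETWORK_QUALITY WAN",
    "set NETWORK_QUALITY WAN",
    "set _TCP_CONNECT_TIMEOUT 300",
    "set _TCP_RECEIVE_TIMEOUT 300",
    "set _TCP_SEND_TIMEOUT 300",
]


def append_timeout_settings(ddc_path):
    """Append any missing settings to the end of the DDC file; return the lines added."""
    path = Path(ddc_path)
    text = path.read_text()
    existing = {line.strip() for line in text.splitlines()}
    missing = [setting for setting in SETTINGS if setting not in existing]
    if not missing:
        return []  # nothing to do, file left untouched
    # Keep a copy of the original next to it before modifying anything.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    # Make sure the appended lines start on a fresh line.
    prefix = "" if text.endswith("\n") or not text else "\n"
    with path.open("a") as handle:
        handle.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Run it only while MINE, APPLY and the console are shut down (step 1), then continue with steps 3 to 5.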

Mike Donovan Answered
