Failover not working with the MongoDB Java driver
I have set up a replica set on three machines (192.168.122.21, 192.168.122.147 and 192.168.122.148), and I am using the Java driver to talk to the MongoDB cluster:
ArrayList<ServerAddress> addrs = new ArrayList<ServerAddress>();
addrs.add(new ServerAddress("192.168.122.21", 27017));
addrs.add(new ServerAddress("192.168.122.147", 27017));
addrs.add(new ServerAddress("192.168.122.148", 27017));
this.mongoClient = new MongoClient(addrs);
this.db = this.mongoClient.getDB(this.db_name);
this.collection = this.db.getCollection(this.collection_name);
Once the connection is established, I insert a simple test document many times:
for (int i = 0; i < this.inserts; i++) {
    try {
        this.collection.insert(new BasicDBObject(String.valueOf(i), "test"));
    } catch (Exception e) {
        System.out.println("Error on inserting element: " + i);
        e.printStackTrace();
    }
}
When I simulate a crash of the primary node (power off), the MongoDB cluster fails over successfully:
19:08:03.907+0100 [rsHealthPoll] replSet info 192.168.122.21:27017 is down (or slow to respond):
19:08:03.907+0100 [rsHealthPoll] replSet member 192.168.122.21:27017 is now in state DOWN
19:08:04.153+0100 [rsMgr] replSet info electSelf 1
19:08:04.154+0100 [rsMgr] replSet couldn't elect self, only received -9999 votes
19:08:05.648+0100 [conn15] replSet info voting yea for 192.168.122.148:27017 (2)
19:08:10.681+0100 [rsMgr] replSet not trying to elect self as responded yea to someone else recently
19:08:10.910+0100 [rsHealthPoll] replset info 192.168.122.21:27017 heartbeat failed, retrying
19:08:16.394+0100 [rsMgr] replSet not trying to elect self as responded yea to someone else recently
19:08:22.876+.
19:08:22.912+0100 [rsHealthPoll] replset info 192.168.122.21:27017 heartbeat failed, retrying
19:08:23.623+0100 [SyncSourceFeedbackThread] replset setting syncSourceFeedback to 192.168.122.148:27017
19:08:23.917+0100 [rsHealthPoll] replSet member 192.168.122.148:27017 is now in state PRIMARY
The MongoDB driver on the client side notices this as well:
Dec 01, 2014 7:08:16 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: Read timed out
WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: couldn't connect to [/192.168.122.21:27017] bc:java.net.SocketTimeoutException: connect timed out
Dec 01, 2014 7:08:36 PM com.mongodb.DBTCPConnector setMasterAddress
WARNING: Primary switching from /192.168.122.21:27017 to /192.168.122.148:27017
But it keeps trying to connect to the old node (forever):
Dec 01, 2014 7:08:50 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException - message: couldn't connect to [/192.168.122.21:27017] bc:java.net.NoRouteToHostException: No route to host
.....
Dec 01, 2014 7:10:43 PM com.mongodb.ConnectionStatus$UpdatableNode update
WARNING: Server seen down: /192.168.122.21:27017 - java.io.IOException -message: couldn't connect to [/192.168.122.21:27017] bc:java.net.NoRouteToHostException: No route to host
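The document count in the database stays the same from the moment the primary fails and the secondary becomes primary. Here is the output from the same node during that process:
"rs0":SECONDARY> db.test_collection.find().count()
12260161
"rs0":PRIMARY> db.test_collection.find().count()
12260161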
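Update: With WriteConcern Unacknowledged it works as designed. Inserts are also executed on the new primary, and all operations issued during the election are lost.
With WriteConcern Acknowledged, the operation seems to wait indefinitely for an acknowledgment from the crashed primary. That would explain why the program continues once the crashed server boots again and rejoins the cluster as a secondary. In my case, however, I don't want the driver to wait forever; it should raise an error after some time.
Update: WriteConcern Acknowledged also works as expected when I kill the mongod process on the primary. In that case the failover takes only about 3 seconds. No inserts are executed during that time, and they resume once the new primary has been elected.
So I only have the problem when simulating a node failure (power off / network down). In that case the operation hangs until the failed node comes back up.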
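For illustration only: the hang described above is consistent with a blocking socket read that has no timeout. When a machine loses power or its network, the peer never sends a FIN or RST, so a read can plausibly wait indefinitely, whereas a killed process has its sockets closed by the operating system. The sketch below uses only the JDK, not the MongoDB driver. The class name `SocketTimeoutDemo`, the method `readTimesOut`, and the 200 ms value are made up for this example, and a local server socket that never answers stands in for the unreachable primary.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of the MongoDB driver.
class SocketTimeoutDemo {

    // Returns true if reading from a silent peer fails with SocketTimeoutException.
    static boolean readTimesOut(int timeoutMillis) {
        // The server socket never calls accept() and never writes anything.
        // The OS still completes the TCP handshake, so the client connects.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket()) {

            // Connect timeout: bounds how long establishing the connection may take.
            client.connect(
                    new InetSocketAddress(silentServer.getInetAddress(), silentServer.getLocalPort()),
                    timeoutMillis);

            // Read timeout: a value of 0 would mean "block forever" on read().
            client.setSoTimeout(timeoutMillis);

            try {
                client.getInputStream().read(); // the peer never sends a byte
                return false;                   // only reached if data or EOF arrived
            } catch (SocketTimeoutException expected) {
                return true;                    // the read gave up after timeoutMillis
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

With a read timeout of 0 the same `read()` call would simply block, which matches the symptom of inserts hanging until the failed node comes back.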
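# Answer 1
Does your application still work? Since that server is still in your seed list, the driver will keep trying to connect to it, as far as I know. As long as any other server in your seed list can become primary, your application should still work.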
# Answer 2
Explicitly specifying the connection timeout values resolved the error. See also: http://api.mongodb.org/java/2.7.0/com/mongodb/MongoOptions.html
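As a rough sketch of what setting the timeouts explicitly might look like with the `MongoClient` class used in the question: the fragment below goes through `MongoClientOptions` rather than the older `MongoOptions` class linked above, and it assumes a 2.10 or later driver on the classpath. The timeout values are arbitrary examples, not recommendations, and `addrs` is the seed list built in the question.

```java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;

// Example values only (assumptions); tune them for your environment.
MongoClientOptions options = MongoClientOptions.builder()
        .connectTimeout(5000)   // ms to wait when opening a connection
        .socketTimeout(10000)   // ms to wait on a socket read; 0 means no timeout
        .build();

// addrs is the List<ServerAddress> from the question.
MongoClient mongoClient = new MongoClient(addrs, options);
```

Note that the socket timeout applies to every read on a connection, so a value that is too low can also abort legitimately slow operations.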