这篇文章主要介绍“hadoop 2.4 namenode源码分析”,在日常操作中,相信很多人在hadoop 2.4 namenode源码分析问题上存在疑惑,小编查阅了各式资料,整理出简单好用的操作方法,希望对大家解答”hadoop 2.4 namenode源码分析”的疑惑有所帮助!接下来,请跟着小编一起来学习吧!
在hadoop nn的HA,对于主备节点的选举,是通过ActiveStandbyElector来实现的。源码上有针对该类的解释。
小弟英文不才,翻译一下。该类主要使用了zookeeper实现了主节点的选举,对于成功选举的主节点,会在zookeeper上创建零时节点。如果创建成功,NN会变成active,而其余nn节点会成备用节点。
下面还是来具体分析一下ActiveStandbyElector类的作用,ActiveStandbyElector主要实现了选举,选举流程主要是通过创建零时节点的方式实现,如果创建成功。可以认为是获取到对应的LOCK,该节点可以成为active。如果没有成功创建该节点,可以认为为standby节点,对于standby节点,需要一直监听该LOCK节点的状态。如果发生节点的事件,就去尝试选举。基本流程就是这样。
下面,来看一下ActiveStandbyElector类的主要方法和流程。对于熟悉zookeeper的同学来说,zookeeper的必须要实现watcher接口,其中可以实现自己的各种事件的处理逻辑。
在ActiveStandbyElector中,采用了
内部类来实现Watcher接口,其process方法,调用了processWatchEvent来实现具体的业务处理。
下面来分析该processWatchEvent的具体逻辑:
//处理zk的事件
synchronized void processWatchEvent(ZooKeeper zk, WatchedEvent event) {
Event.EventType eventType = event.getType();
if (isStaleClient(zk)) return;
LOG.debug("Watcher event type: " + eventType + " with state:"
+ event.getState() + " for path:" + event.getPath()
+ " connectionState: " + zkConnectionState
+ " for " + this);
if (eventType == Event.EventType.None) {
//会话本身的时间,如连接。失去连接。
// the connection state has changed
switch (event.getState()) {
case SyncConnected:
LOG.info("Session connected.");
// if the listener was asked to move to safe state then it needs to
// be undone
ConnectionState prevConnectionState = zkConnectionState;
zkConnectionState = ConnectionState.CONNECTED;
if (prevConnectionState == ConnectionState.DISCONNECTED &&
wantToBeInElection) {
monitorActiveStatus();//监控节点
}
break;
case Disconnected:
LOG.info("Session disconnected. Entering neutral mode...");
// ask the app to move to safe state because zookeeper connection
// is not active and we dont know our state
zkConnectionState = ConnectionState.DISCONNECTED;
enterNeutralMode();
break;
case Expired:
// the connection got terminated because of session timeout
// call listener to reconnect
LOG.info("Session expired. Entering neutral mode and rejoining...");
enterNeutralMode();
reJoinElection(0);//参与选举
break;
case SaslAuthenticated:
LOG.info("Successfully authenticated to ZooKeeper using SASL.");
break;
default:
fatalError("Unexpected Zookeeper watch event state: "
+ event.getState());
break;
}
return;
}
// a watch on lock path in zookeeper has fired. so something has changed on
// the lock. ideally we should check that the path is the same as the lock
// path but trusting zookeeper for now
//节点事件
String path = event.getPath();
if (path != null) {
switch (eventType) {
case NodeDeleted:
if (state == State.ACTIVE) {
enterNeutralMode();//该方法目前未实现
}
joinElectionInternal();//开始选举
break;
case NodeDataChanged:
monitorActiveStatus();//继续监控该节点,尝试成为active
break;
default:
LOG.debug("Unexpected node event: " + eventType + " for path: " + path);
monitorActiveStatus();
}
return;
}
// some unexpected error has occurred
fatalError("Unexpected watch error from Zookeeper");
}
而joinElectionInternal,选举的核心方法就是,
选举就是通过对zkLokFilePath节点的创建,来完成。这个采用了zk的异步回调。
从该类的定义,可以看出,本身就是实现了zk的两个接口。
StatCallback需要实现的方法,如下:
对于两个方法的实现,ActiveStandbyElector内部实现几乎是一样的。这里不再贴上源码,有兴趣的可以自己去看源码。
贴上实现方法,有注释。呵呵
public synchronized void processResult(int rc, String path, Object ctx,
String name) {
if (isStaleClient(ctx)) return;
LOG.debug("CreateNode result: " + rc + " for path: " + path
+ " connectionState: " + zkConnectionState +
" for " + this);
Code code = Code.get(rc);//为了方便使用,这里自定义了一组状态
if (isSuccess(code)) {//成功返回,成功创建zklocakpath节点
// we successfully created the znode. we are the leader. start monitoring
if (becomeActive()) {//要将本节点上的NN变成active
monitorActiveStatus();//继续监控节点状态
} else {
reJoinElectionAfterFailureToBecomeActive();//失败,继续选举尝试
}
return;
}
if (isNodeExists(code)) {//节点存在,说明已经有active,wait即可
if (createRetryCount == 0) {
// znode exists and we did not retry the operation. so a different
// instance has created it. become standby and monitor lock.
becomeStandby();
}
// if we had retried then the znode could have been created by our first
// attempt to the server (that we lost) and this node exists response is
// for the second attempt. verify this case via ephemeral node owner. this
// will happen on the callback for monitoring the lock.
monitorActiveStatus();//不过努力成为active的动作不能停
return;
}
String errorMessage = "Received create error from Zookeeper. code:"
+ code.toString() + " for path " + path;
LOG.debug(errorMessage);
if (shouldRetry(code)) {
if (createRetryCount < maxRetryNum) {
LOG.debug("Retrying createNode createRetryCount: " + createRetryCount);
++createRetryCount;
createLockNodeAsync();
return;
}
errorMessage = errorMessage
+ ". Not retrying further znode create connection errors.";
} else if (isSessionExpired(code)) {
// This isn't fatal - the client Watcher will re-join the election
LOG.warn("Lock acquisition failed because session was lost");
return;
}
fatalError(errorMessage);
}
对于becomeStandby,becomeActive这些状态的改变,有ZKFailoverController来实现。
到此,关于“hadoop 2.4 namenode源码分析”的学习就结束了,希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习,快去试试吧!若想继续学习更多相关知识,请继续关注天达云网站,小编会继续努力为大家带来更多实用的文章!