License: (CC 3.0) BY-NC-SA
What is Apache Hadoop?
The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-avaiability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-availabile service on top of a cluster of computers, each of which may be prone to failures.
Single Point Failure
dual-system hot backup
Facebook AvatarNode: active-active mode
- hdfs client use virtual ip add to access AvatarNode
- if primary AvatarNode fail, use standby AvatarNode
- standby AvatarNode set access mode to safe mode (read-only), use NFS to merge FsImage and EditLog commited by primary AvatarNode, finally the virtual ip address is set to standby AvatarNode.
Trouble Shoot
ssh refuse
apt-get install openssh-server
ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
ssh localhost
re-format filesystem aborted
hadoop namenode -format require a unexist dir, remove the prompt dir rm -rf
Does not contain a valid host:port authority
fs.default.name is not set correctly, for me, is just a type error