hdfs-du

Visualize your HDFS cluster usage

  • 所有者: twitter-archive/hdfs-du
  • 平台:
  • 許可證: Apache License 2.0
  • 分類:
  • 主題:
  • 喜歡:
    0
      比較:

Github星跟蹤圖

HDFS-DU Build Status

hdfsdu UI screenshot

HDFS-DU is an interactive visualization of the Hadoop distributed file system. The project aims to monitor different snapshots for the entire HDFS system in an interactive way, showing the size of the folders, the rate at which the size increases / decreases, and to highlight inefficient file storage.

HDFS-DU provides the following in a web UI:

  • A tree-map visualization where each node is a folder in HDFS. The area of each node can be relative to the size or number of descendants.
  • A file-tree visualization showing the topology of the file system.

HDFS-DU is built using the following front-end technologies:

Follow @hdfsdu on Twitter to stay in touch!

Examples

Below is a screenshot of the HDFS-DU UI. The UI is made up of two linked visualizations. The left visualization is a tree-map which shows parent-child relationships through containment. The right visualization is a file-tree, which displays two levels of depth from the current selected node in the file system. The file-tree visualization displays extra information for each node on hover.

hdfsdu UI screenshot

You can drill down clicking on nodes in either the tree-map or the file-tree.

There are two possible layouts for the tree-map. In the first layout the area of a node is proportional to the total file size of its descendants. In the second layout the area of a node is proportional to the count of its descendants.

To compute the color of a node, its size, including all its descendants, is divided by the number of those descendants. The color is assigned using this value such that a lighter color means more files for a given size. This helps to highlight inefficient nodes which contain too many small files.

hdfsdu UI screenshot

Quickstart

To get started with hdfs-du, first clone the hdfs-du GitHub repository:

git clone https://github.com/twitter/hdfs-du.git
cd hdfs-du

Next, you can try running the hdfs-du demo on your local machine. The demo
starts a local web server which serves the front-end client resources and sample
data. Start the demo with the following command and then browse to
http://localhost:20000/index.html:

./demo.sh

Running HDFS-DU with your own data

To visualize your own cluster, you need to generate an HDFS-DU data set. Currently this is a
multi-step process:

  • Export the HDFS fsimage file for offline processing with the
    Offline Image Viewer
  • Process the fsimage export with Pig.
  • Post-process the pig data set.

Simplifying this process is certainly possible (hey, it was hack week :)

First, SSH to your secondary name node, dump the fsimage in delimited format, and copy to HDFS.

hadoop oiv -i /path/to/current/fsimage -o fsimage-delimited.tsv -p Delimited
hadoop fs -copyFromLocal fsimage-delimited.tsv .

Now let's process fsimage export. Uncomment the register statement in
pig/src/test/resources/hdfsdu.pig, build the UDF, and run Pig to process the fsimage export.

cd /path/to/hdfs-du/pig
mvn package
pig -param INPUT=fsimage-delimited.tsv -param OUTPUT=hdfsdu.out pig/src/test/resources/hdfsdu.pig

Lastly we need to copy the dataset local and perform a quick post-processing step.

hadoop fs -getmerge hdfsdu.out .
python src/main/python/leaf.py hdfsdu.out/hdfsdu.out > hdfsdu.data

Now we're ready to start HDFS-DU!

./start.sh /path/to/hdfsdu.data

Point your web browser to http://localhost:20000 and see what your
cluster looks like.

How to contribute

Bug fixes, features, and documentation improvements are welcome! Please fork the project and send us
a pull request on GitHub. You can submit issues on Github
as well.

Here are some high-level goals we'd love to see contributions for:

  • Improve the front-end client
  • Create a new back-end for a different runtime environment

Authors

License

Copyright 2012 Twitter, Inc.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

主要指標

概覽
名稱與所有者twitter-archive/hdfs-du
主編程語言JavaScript
編程語言Java (語言數: 7)
平台
許可證Apache License 2.0
所有者活动
創建於2012-08-07 17:52:22
推送於2020-10-13 07:09:39
最后一次提交2019-06-06 18:02:24
發布數1
最新版本名稱v0.1.0 (發布於 )
第一版名稱v0.1.0 (發布於 )
用户参与
星數228
關注者數135
派生數83
提交數5
已啟用問題?
問題數10
打開的問題數10
拉請求數1
打開的拉請求數5
關閉的拉請求數2
项目设置
已啟用Wiki?
已存檔?
是復刻?
已鎖定?
是鏡像?
是私有?