Hadoop
-
setup hdfs
-
create directory in hdfs as support bundles.
Data preprocessing
-
mount bugzilla attachment nfs share bugs.eng.vmware.com:/ifs/sjc-bugs/bugs
-
Input should be PR numbers, then we scan the bugzilla directory for support bundle files.
-
Support bundle file is extracted using python gzip lib. sample code.
import gzip f = gzip.open('file.txt.gz', 'rb') file_content = f.read() f.close()
-
append the file we need to hdfs.
some other things need to consider??? ignore the duplicate support bundle.
-
for esxi host, A id number is added as the tag at the head of each line. :aaaaaa
-
for virtual machine, A host id and virtual machine id is added. ::aaaaaaa
-
two relation database tables are used to store the information of host id and vm id.
UI data Input
Data Input, range of PR numbers? 1111-19999
UI Qurey Input:
-
Congfiguration query?
file: /etc/vmware/config configuration: vhv.enabled
-
log query:
file: /var/log/vmkernel.log log pattern: "error xxxx"
UI Query result display
configuration query
-
PIE chart
-
BAR chart
log query
For esxi host display
host namme | bugzilla id | occurance times
For vm display
vm name | host name | occurance times