bundleanalytics

Hadoop

Data preprocessing

Mount the Bugzilla attachment NFS share: bugs.eng.vmware.com:/ifs/sjc-bugs/bugs
The input is a set of PR numbers. For each PR, we scan the Bugzilla directory for support bundle files.
Each support bundle file is extracted with the Python gzip library. Sample code:

    import gzip

    # Read the whole decompressed file into memory; the file is closed on exit.
    with gzip.open('file.txt.gz', 'rb') as f:
        file_content = f.read()

Append the files we need to HDFS.
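
The steps above (scan by PR number, extract, append to HDFS) could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the one-directory-per-PR layout, the file suffixes, and the function names `find_bundles`, `gunzip_to`, and `hdfs_append_command` are all hypothetical. The only external tool referenced is the standard `hdfs dfs -appendToFile` shell command, and the sketch only builds that command rather than running it.

```python
import gzip
import os
import shutil


def find_bundles(bugs_root, pr_numbers, suffixes=(".gz", ".tgz")):
    """Yield (pr, path) for each candidate bundle file under bugs_root/<pr>/.

    Assumption: attachments for a PR live in a directory named after the PR.
    """
    for pr in pr_numbers:
        pr_dir = os.path.join(bugs_root, str(pr))
        if not os.path.isdir(pr_dir):
            continue  # no attachments for this PR
        for dirpath, _dirnames, filenames in os.walk(pr_dir):
            for name in sorted(filenames):
                if name.endswith(suffixes):
                    yield pr, os.path.join(dirpath, name)


def gunzip_to(src_path, dst_path):
    """Stream-decompress src_path into dst_path without reading it all into memory."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def hdfs_append_command(local_path, hdfs_path):
    """Build (but do not run) the command that appends a local file to an HDFS file."""
    return ["hdfs", "dfs", "-appendToFile", local_path, hdfs_path]
```

On a machine with a Hadoop client installed, the returned list could be passed to `subprocess.run(cmd, check=True)`.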

Open question: what else needs to be considered? At minimum, ignore duplicate support bundles.
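
The notes do not say how a duplicate is recognised. One possible approach, shown here purely as an assumption, is to treat two bundles as duplicates when their bytes are identical and compare SHA-256 digests. The helper names `file_digest` and `unique_bundles` are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unique_bundles(paths):
    """Yield each path whose content has not been seen earlier in the sequence."""
    seen = set()
    for path in paths:
        digest = file_digest(path)
        if digest in seen:
            continue  # same bytes as an earlier bundle: skip it
        seen.add(digest)
        yield path
```

This only catches byte-identical uploads. Bundles collected from the same host at different times would still count as distinct.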

For an ESXi host, an ID number is added as a tag at the head of each line. :aaaaaa
For a virtual machine, a host ID and a virtual machine ID are added. ::aaaaaaa
Two relational database tables store the host ID and VM ID information.
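
A minimal sketch of the tagging and the two tables, using SQLite for illustration. Everything specific here is an assumption: the tag layout (`<host id>:<line>` and `<host id>:<vm id>:<line>`), the table and column names, and the choice of SQLite are not specified in these notes.

```python
import sqlite3


def create_tables(conn):
    """Create the host and VM lookup tables (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hosts ("
        "host_id INTEGER PRIMARY KEY, host_name TEXT NOT NULL, "
        "bugzilla_id INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vms ("
        "vm_id INTEGER PRIMARY KEY, "
        "host_id INTEGER NOT NULL REFERENCES hosts(host_id), "
        "vm_name TEXT NOT NULL)"
    )


def register_host(conn, host_name, bugzilla_id):
    """Insert a host row and return its generated host_id."""
    cur = conn.execute(
        "INSERT INTO hosts (host_name, bugzilla_id) VALUES (?, ?)",
        (host_name, bugzilla_id),
    )
    return cur.lastrowid


def register_vm(conn, host_id, vm_name):
    """Insert a VM row and return its generated vm_id."""
    cur = conn.execute(
        "INSERT INTO vms (host_id, vm_name) VALUES (?, ?)", (host_id, vm_name)
    )
    return cur.lastrowid


def tag_host_line(host_id, line):
    """Prefix an ESXi host log line with its host ID (assumed layout)."""
    return f"{host_id}:{line}"


def tag_vm_line(host_id, vm_id, line):
    """Prefix a VM log line with its host ID and VM ID (assumed layout)."""
    return f"{host_id}:{vm_id}:{line}"
```

Keeping names in the tables and only numeric IDs in the tagged lines keeps the per-line overhead in HDFS small.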

UI data Input

Data input: a range of PR numbers? For example, 1111-19999.
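
If the input does end up being a range such as 1111-19999, parsing it might look like the sketch below. It assumes both ends are inclusive and that a single number is also accepted; the function name `parse_pr_range` is hypothetical.

```python
def parse_pr_range(text):
    """Parse "START-END" (inclusive) or a single PR number into a range."""
    start_text, sep, end_text = text.strip().partition("-")
    if not sep:
        pr = int(start_text)
        return range(pr, pr + 1)
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"range start {start} is greater than end {end}")
    return range(start, end + 1)
```

Returning a `range` keeps a large span cheap, since the PR numbers are not materialised until the scan iterates over them.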

UI Query Input:

UI Query result display

configuration query

log query

For ESXi host display

host name | Bugzilla ID | occurrence count

For VM display

VM name | host name | occurrence count
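
As an illustration of how the host result rows could be produced, the sketch below counts matching lines per host ID and formats them. It runs in memory and is not a Hadoop job. It assumes host-tagged lines of the form `<host id>:<line>`, a plain substring match as the query, and a `hosts` mapping from host ID to (host name, Bugzilla ID). All of these, and the function names, are hypothetical.

```python
from collections import Counter


def count_host_matches(tagged_lines, needle):
    """Count lines containing needle, grouped by the leading host ID tag."""
    counts = Counter()
    for line in tagged_lines:
        host_id, _sep, rest = line.partition(":")
        if needle in rest:
            counts[host_id] += 1
    return counts


def format_host_rows(counts, hosts):
    """Format counts as display rows, most frequent host first."""
    rows = ["host name | Bugzilla ID | occurrence count"]
    for host_id, n in counts.most_common():
        host_name, bugzilla_id = hosts[host_id]
        rows.append(f"{host_name} | {bugzilla_id} | {n}")
    return rows
```

The VM display would follow the same pattern with a (host ID, VM ID) key, provided VM-tagged lines are processed separately from host-tagged ones, since log text itself may contain colons.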

gsliu/bundleanalytics