Reducing the Garbage Collection Overhead of SSDs with Machine Learning (implemented in MQSim)
We use MQSim, the SSD simulator from CMU, to reach this goal. You can find more information about MQSim and its parameters by following the link.
When the free space falls below the GC threshold, the SSD keeps performing GC operations. The victim block selected by the GC policy may still contain many valid pages, so the GC operation spends a lot of time copying them. Read/write commands are also left pending while the controller is busy with GC. In this thesis, we apply machine learning to the GC mechanism: we collect data in the FTL of the SSD, select and preprocess it, and train a model on it. The model controls the GC mechanism and triggers GC based on its prediction, which is more flexible than the original method of triggering on a fixed threshold. With machine learning deciding when to trigger GC, the GC operation can be delayed, which gives valid pages more chance to become invalid. This reduces both the number of GC executions and the GC overhead.
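To see why delaying GC can help, consider a rough cost model in which reclaiming a block costs one read and one program per valid page, plus one erase. The sketch below is only an illustration: the function name `gc_cost_us` and all latency values are assumptions, not MQSim parameters or measured results.

```python
def gc_cost_us(valid_pages, read_us=50.0, program_us=500.0, erase_us=3000.0):
    """Estimate the time to reclaim one victim block, in microseconds.

    Assumed model: every valid page is read and re-programmed elsewhere,
    then the block is erased once. The default latencies are illustrative.
    """
    if valid_pages < 0:
        raise ValueError("valid_pages must be non-negative")
    return valid_pages * (read_us + program_us) + erase_us


# Hypothetical victim block: 200 valid pages if collected right away,
# 50 valid pages if GC is delayed until more pages have been invalidated.
cost_now = gc_cost_us(200)
cost_delayed = gc_cost_us(50)
print(f"GC now: {cost_now:.0f} us, GC delayed: {cost_delayed:.0f} us")
```

Under these assumed numbers the copy cost dominates, so fewer valid pages in the victim block translates directly into a cheaper GC operation.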
We also set two thresholds: a soft threshold and a hard threshold. When the free space is less than the hard threshold, garbage collection is triggered immediately, because the hard threshold represents the minimum available space. When the free space is less than the soft threshold, we ask the GC detector (trained with a supervised learning method) to predict whether garbage collection should be executed.
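The two-threshold rule can be sketched as follows. This is a minimal Python illustration of the decision logic only, not the C++ code in MQSim; the function name `should_trigger_gc`, the `predictor` callable, and the soft threshold value used in the example are assumptions.

```python
def should_trigger_gc(free_blocks, soft_threshold, hard_threshold, predictor, features):
    """Return True if GC should run now.

    free_blocks    -- current size of the free block pool
    soft_threshold -- below this, the trained detector is consulted
    hard_threshold -- below this, GC always runs (must be <= soft_threshold)
    predictor      -- callable taking `features`, returning 1 (run GC) or 0
    """
    if hard_threshold > soft_threshold:
        raise ValueError("hard_threshold must not exceed soft_threshold")
    if free_blocks < hard_threshold:
        return True                       # minimum available space: GC immediately
    if free_blocks < soft_threshold:
        return bool(predictor(features))  # let the detector decide
    return False                          # enough free space: no GC


# Stand-in detector (assumption): run GC once most pages are invalid.
detector = lambda f: 1 if f["percent_of_invalid"] > 0.5 else 0
print(should_trigger_gc(150, 200, 102, detector, {"percent_of_invalid": 0.2}))
```

In the example call the free space is between the two thresholds, so the result depends only on the detector's prediction.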
- Follow the instructions on the MQSim website (parameter settings).
- Choose the I/O block trace you want (you can find camelab_trace or IOTTA
- In main.cpp, you can find
fs.open("output.txt", fstream::out);
which writes the training data you want to record to output.txt.
- In /ssd/GC_and_WL_Unit_Page_Level.cpp, see the Check_gc_required() function. Whenever a write command arrives, Check_gc_required() is called. You can modify its code to extract your machine learning features from the FTL.
In /ssd/GC_and_WL_Unit_Page_Level.cpp
else if ( free_block_pool_size < 102 ) {
} // else if
else if ( predict_one_C(obj, Simulator->Time() , \
percent_of_invalid, percent_of_valid, percent_of_free, block
} // else if
else // predict don't execute gc and still have more than op space
return;
Comment out this code while you are collecting the training data.
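The feature names percent_of_invalid, percent_of_valid and percent_of_free appear in the call above, but how they are computed and how they are laid out in output.txt is not specified here. As a minimal sketch, assuming they are fractions of the total page count and that each sample is stored as one CSV row, the bookkeeping could look like this in Python (the actual collection happens in C++ inside Check_gc_required(); `gc_features` and `write_feature_rows` are hypothetical helpers):

```python
import csv
import io

FIELDS = ["time", "percent_of_invalid", "percent_of_valid", "percent_of_free"]


def gc_features(timestamp, valid_pages, invalid_pages, free_pages):
    """Build one feature row; the fractions are assumed to lie in [0, 1]."""
    total = valid_pages + invalid_pages + free_pages
    if total <= 0:
        raise ValueError("page counts must sum to a positive number")
    return {
        "time": timestamp,
        "percent_of_invalid": invalid_pages / total,
        "percent_of_valid": valid_pages / total,
        "percent_of_free": free_pages / total,
    }


def write_feature_rows(rows, out):
    """Write the rows as CSV with a header line (assumed layout)."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Example with made-up page counts, written to an in-memory buffer.
buffer = io.StringIO()
write_feature_rows([gc_features(1000, 600, 300, 100)], buffer)
print(buffer.getvalue())
```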
- You can see here
- Install Cython
pip3 install Cython
- In model.pyx, load your machine learning model in the __init__ method of PyClass.
class PyClass(object):
    def __init__(self):
        # load your machine learning model here
        pass

    def predict_one(self, your features ...):
        # you can do data processing here and return the prediction of the model
        pass

# Create the Python object so that it can be held by a C++ pointer
cdef public object createPyClass():
    return PyClass()

# Wrap the Python method so that it can be called from C++
cdef public int predict_one_C(object p, data_type your features ...):
    return p.predict_one(your features ...)
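As a concrete illustration of the Python side of this template, the sketch below fills in PyClass with a stand-in model. It is plain Python, so the cdef wrappers are left out. `ThresholdModel` is a hypothetical placeholder with a scikit-learn-style `predict` method, and the three-feature signature is an assumption; in practice you would load your own trained model in `__init__`.

```python
class ThresholdModel:
    """Hypothetical stand-in for a trained classifier."""

    def __init__(self, invalid_cutoff=0.5):
        self.invalid_cutoff = invalid_cutoff

    def predict(self, rows):
        # rows: list of [percent_of_invalid, percent_of_valid, percent_of_free]
        return [1 if row[0] >= self.invalid_cutoff else 0 for row in rows]


class PyClass(object):
    def __init__(self, model=None):
        # In a real setup, load the trained model from disk here instead.
        self.model = model if model is not None else ThresholdModel()

    def predict_one(self, percent_of_invalid, percent_of_valid, percent_of_free):
        # Preprocess the raw features into the layout the model expects,
        # then return a plain int so the C++ side receives 0 or 1.
        row = [float(percent_of_invalid), float(percent_of_valid), float(percent_of_free)]
        return int(self.model.predict([row])[0])


detector = PyClass()
print(detector.predict_one(0.7, 0.2, 0.1))
```

Returning a plain `int` matters because predict_one_C is declared with an `int` return type, so the prediction has to be convertible to a C integer.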
- Write setup.py to build your extension
from distutils.core import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize('model.pyx'))
# build your model.pyx
python3 setup.py build_ext --inplace
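This generates model.c, model.h, and model.so (the full file name includes the Python version tag, as in model.cpython-35m-x86_64-linux-gnu.so).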
- Use the machine learning model in MQSim:
- I have already added
extern PyObject * obj
in global.h, so you can use obj anywhere.
- In main.cpp
// Cython
PyImport_AppendInittab("model", PyInit_model);
Py_Initialize();
PyImport_ImportModule("model");
obj = createPyClass();
This makes the extern C++ pointer obj refer to the Python object.
- In /ssd/GC_and_WL_Unit_Page_Level.cpp, pass the MQSim features to your machine learning model and perform any extra operations based on its prediction.
// Free block space < Hard Threshold
else if ( free_block_pool_size < your Hard Threshold ) {
} // else if
else if ( predict_one_C(obj, your features ) ) {
} // else if
else // the model predicts that GC should not run, and more than the OP space is still free
return;
Uncomment this code to run GC based on the model's prediction.
- Check your Python include path and version
INCLUDES := $(addprefix -I,$(SRC_DIR)) -I/usr/include/python3.5m -I/usr/include/python3.5m
- Check the Python include path in every .cpp file
#include <python3.5m/Python.h>
- Link the .so file (it must be the file produced by python3 setup.py build_ext --inplace)
MQSim: $(OBJ)
$(LD) $(CC_FLAGS) $^ -o $@ -lpython3.5m -lpthread -ldl -lutil -lm ./model.cpython-35m-x86_64-linux-gnu.so
- If you only want to record the training data with MQSim, you can delete
./model.cpython-35m-x86_64-linux-gnu.so
- After running make, you will get the MQSim executable, which you can run with:
./MQSim -i ssdconfig.xml -w workload.xml