Pinned Repositories
3D_Point_Cloud_hole_repair_filling
Software for repairing holes in 3D point cloud data.
add-noise
Add noise at a specified SNR to audio files.
aiyinyue
alexa-printer-backend
Service that bridges voice assistants with IoT clients over AMQP.
android-webrtc-vad
webrtc-vad (the VAD module extracted on its own from WebRTC, compiled into a .so library and ported for use on Android)
ApproxMVBB
Fast algorithms to compute an approximation of the minimal volume oriented bounding box of a point cloud in 3D.
ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
awesome-3D-vision
3D computer vision, including SLAM, VSLAM, deep learning, structured light, stereo, three-dimensional reconstruction, computer vision, machine learning, and so on
PRIDNet
Code for the paper "Pyramid Real Image Denoising Network"
Speaker_Verification
Academic Project for Speech and Speaker Recognition course DD2119
MingmChen's Repositories
MingmChen/cnn
A CNN implemented in Java, with MNIST and speech (MFCC features) gender recognition examples.
MingmChen/EECS481-Pupil-Tracking
EECS445 Project - Pupil Tracking
MingmChen/Emotion-Detection-in-Videos
The aim of this work is to recognize six emotions (happiness, sadness, disgust, surprise, fear and anger) from human facial expressions extracted from videos. To achieve this, we consider people of different ethnicity, age and gender, each of whom reacts very differently when expressing emotions. We collected a data set of 149 short videos of both females and males expressing each of the emotions listed above. The data set was built by students, each of whom recorded a video expressing all the emotions with no directions or instructions at all. Some videos include more body parts than others; others have objects in the background and even different lighting setups. We wanted the data to be as general as possible, with no restrictions at all, so that it would be a good indicator of our main goal. The code detect_faces.py detects faces in each video, and we saved the result at a resolution of 240x320. This algorithm produces shaky videos, so we then stabilized all of them. This can be done in code, or with free online stabilizers. We then ran the stabilized videos through emotion_classification_videos_faces.py. In that code we developed a method to extract features based on histograms of dense optical flow (HOF), and we used a support vector machine (SVM) classifier to tackle the recognition problem. For each video, we extracted optical flow at each frame. Optical flow measures the motion between two frames, relative to an observer, at each point. Therefore, each point in the image has two values describing the vector that represents the motion between the two frames: the magnitude and the angle. In our case, since the videos have a resolution of 240x320, each frame has a feature descriptor of dimensions 240x320x2, and the final video descriptor has dimensions #frames x 240 x 320 x 2.
To make a video comparable to other inputs (inputs of different lengths are not directly comparable), we need to summarize each video in a single descriptor. We achieve this by computing a histogram of the optical flows: that is, we separate the extracted flows into categories and count the number of flows in each category. In more detail, we split the scene into a grid of s by s cells (s = 10 in this case) to record the location of each feature, and then categorize the direction of each flow as one of the 8 motion directions considered in this problem. We then count, for each grid cell, the number of flows falling in each direction bin. This gives an s by s by 8 descriptor for each frame. The summarizing step for each video can then be either the average of the histograms in each grid cell over all frames (average pooling) or the maximum value of the histograms in each grid cell over all the frames of the video (max pooling). For classification, we used a support vector machine (SVM) with a nonlinear kernel, discussed in class, to recognize new facial expressions. We also considered a Naïve Bayes classifier, but SVMs are generally regarded as outperforming it on computer vision tasks. A confusion matrix can be plotted to present the results more clearly.
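The histogram and pooling steps described above can be sketched roughly as follows. This is a minimal NumPy illustration and not the repository's actual code: the function names `hof_frame` and `video_descriptor`, the `min_mag` threshold for skipping static pixels, and the (magnitude, angle) input layout are all assumptions made for the example.

```python
import numpy as np

def hof_frame(magnitude, angle, s=10, n_dirs=8, min_mag=0.0):
    """Histogram of optical flow for one frame (illustrative sketch).

    magnitude, angle: (H, W) arrays, angle in radians.
    Returns an (s, s, n_dirs) array of flow counts.
    min_mag is a hypothetical threshold: pixels with magnitude <= min_mag
    are treated as static and not counted.
    """
    h, w = magnitude.shape
    # Assign every pixel row/column to one of s grid cells.
    rows = np.minimum(np.arange(h) * s // h, s - 1)
    cols = np.minimum(np.arange(w) * s // w, s - 1)
    cell_r, cell_c = np.meshgrid(rows, cols, indexing="ij")
    # Quantize each flow angle into one of n_dirs equal direction bins.
    wrapped = np.mod(angle, 2 * np.pi)
    dir_bin = (wrapped / (2 * np.pi) * n_dirs).astype(int)
    dir_bin = np.minimum(dir_bin, n_dirs - 1)
    moving = magnitude > min_mag
    hist = np.zeros((s, s, n_dirs), dtype=np.int64)
    # Unbuffered add so repeated (cell, direction) indices accumulate.
    np.add.at(hist, (cell_r[moving], cell_c[moving], dir_bin[moving]), 1)
    return hist

def video_descriptor(flows, pooling="average", s=10, n_dirs=8):
    """Summarize a video as a fixed-length vector of s * s * n_dirs values.

    flows: (T, H, W, 2) array, [..., 0] = magnitude, [..., 1] = angle
    (an assumed layout for this sketch).
    """
    hists = np.stack(
        [hof_frame(f[..., 0], f[..., 1], s, n_dirs) for f in flows]
    )
    if pooling == "average":
        pooled = hists.mean(axis=0)   # average pooling over frames
    elif pooling == "max":
        pooled = hists.max(axis=0)    # max pooling over frames
    else:
        raise ValueError("pooling must be 'average' or 'max'")
    return pooled.reshape(-1).astype(float)
```

Because the result has the same length for every video regardless of its frame count, descriptors from different videos can be stacked into a feature matrix and passed to a classifier such as an SVM with a nonlinear kernel.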
MingmChen/exposure-fusion
Exposure Fusion in Matlab
MingmChen/eye-gaze
Repository for Eye Gaze Detection and Tracking
MingmChen/face-landmark-android
Android AR Camera
MingmChen/halideraw
Basic example of reading and processing RAW files in Halide
MingmChen/hdrnet
An implementation of 'Deep Bilateral Learning for Real-Time Image Enhancement', SIGGRAPH 2017
MingmChen/He-or-She
Pattern Recognition: Detect gender of the speaker based on various features like MFCC, pitch, short-time energy, energy entropy, zero-crossing rate and spectral centroid. Also contains an android application to detect gender based on the pitch of a person's voice. Uses different kinds of filters (mean, median, mode). The former uses a machine learning approach while the latter is a more direct and naive approach.
MingmChen/hybridblurdetector
This project aims to detect blur in images and report a confidence coefficient, i.e., the extent to which an image is blurred.
MingmChen/Mesh-processing-library
C++ library and programs that demonstrate mesh processing techniques in computer graphics published at ACM SIGGRAPH in 1992–1998
MingmChen/OctreeBSplines
Implicit Hierarchical B-Splines Surface Reconstruction based on Octree Distance Field
MingmChen/RandomizedRedundantDCTDenoising
real-time denoising
MingmChen/rokid-blacksiren
Front-end noise reduction module of the Rokid voice service, targeting embedded device environments
MingmChen/sonic-ndk
Android NDK wrapper for libsonic
MingmChen/Speaker-Recognition-Program
This program recognizes the speaker from a pre-existing database with 90% accuracy.
MingmChen/speaker-verification
Implementation of the state-of-the-art d-vector approach for speaker verification
MingmChen/SpeakerVoiceIdentifier
SpeakerVoiceIdentifier can recognize the voice of a speaker by learning.
MingmChen/speech-mfcc
Speech feature extraction and recognition based on MFCC
MingmChen/SPP-2.26.2017
MingmChen/tensorflow-wavenet
A TensorFlow implementation of DeepMind's WaveNet paper
MingmChen/vad
VAD (voice activity detection) implementation, used with Baidu voice recognition