Idea and approach
The original idea was to locate the mouth in an image and classify emotions with a convolutional neural network. However, the lack of data and the short deadline pushed the approach towards something simpler. It was decided to use landmarks of the characteristic parts of the mouth and, based on features extracted from those landmarks, to classify emotions with simpler methods such as Random forest, Naive Bayes, Decision tree and K-nearest neighbours.

The main steps of the project workflow were:

  • Collecting the dataset
  • Detecting mouth landmarks
  • Extracting features
  • Training the algorithms
  • Statistically evaluating the algorithms
  • Building the application


The first step in the workflow was to gather data for algorithm training. This task was done in two steps:

  1. Searching for and downloading existing datasets that contain face images and emotion labels.
  2. Manually searching for and labeling images.
Existing datasets
There are several existing datasets that contain face images and emotion labels, yet only one was available to download for free. It can be downloaded from the following website. It contains 139 labeled images in eight emotion categories. Among those categories are the 4 categories detected in this project:
  • Anger
  • Happiness
  • Sadness
  • Normal resting face
After removing the images that do not contain the required emotions, only 90 images were left from this dataset. Since this was not enough, further images had to be acquired manually.

Manually searching and labeling images
By manually searching for and labeling images, another 110 pictures were collected, bringing the project's dataset to 200 images in total. The process is simple but laborious: searching for images on Google, downloading them and labeling them correctly.


Mouth detection
Mouth detection is a crucial part of this project. After searching the web, we agreed to use the 68-landmark face model from the dlib library. Dlib detects faces with a Histogram of Oriented Gradients (HOG) detector, and its shape predictor, trained on the iBUG 300-W dataset, then locates the landmarks. More about the histogram of oriented gradients can be found here. Of the 68 detected landmarks, only landmarks 48 to 67 were recorded, because those 20 landmarks describe the mouth. The next step was to see how these 20 landmarks describe each emotion and to extract the features that best distinguish between them.


  1. Area - detects an open mouth
  2. MSE - detects an open mouth
  3. Curvature 1 - detects a smiling mouth
  4. Curvature 2 - detects a sad mouth
  5. Corners - detects corner positions for neutral or sad
  6. General shape - detects sad, neutral or angry
  7. Scaled distances - detects distances between landmarks
Based on these features, the algorithms were trained.
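Two of the listed features can be sketched in a few lines. The following is a hypothetical illustration, not the project's actual implementation: the Area feature computed with the shoelace formula, and a corner-based curvature measure, assuming the 20 mouth landmarks arrive as (x, y) tuples in dlib's 48-67 order.

```python
def mouth_area(points):
    """Polygon area via the shoelace formula; a large area suggests an open mouth."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def corner_curvature(points):
    """Signed height of the mouth corners relative to the lip centre.
    Since image y grows downward, negative values (corners above the centre)
    suggest a smile, positive values a sad mouth."""
    left_y = points[0][1]   # landmark 48, left corner
    right_y = points[6][1]  # landmark 54, right corner
    centre_y = sum(y for _, y in points) / float(len(points))
    return ((left_y - centre_y) + (right_y - centre_y)) / 2.0
```

Both functions only assume a consistent landmark ordering, so they run unchanged on any list of (x, y) pairs.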


All coding in this project was done in the Python programming language (v2.7.12). The main reason is that the dlib library used for mouth feature extraction supports Python, C and C++; taking the short project deadline into consideration, Python was a better choice than C or C++. (All source code is available on GitHub.)

During the project several program scripts were created:

  1. webcamDetector.py
  2. extractFeatures.py
  3. trainAlgorithms.py
  4. testIt.py
  5. statistics.py
  6. confusionPlot.py

webcamDetector.py

This is the main application of the project, bringing together everything the project produced. Its goal is to detect the emotions of multiple people in a real-time webcam stream. The application requires three files:

  1. “features.txt” - file containing features for detection algorithm training
  2. “labels.csv” - file containing labels for detection algorithm training
  3. “shape_predictor_68_face_landmarks.dat” - file containing the trained landmark model for mouth detection

extractFeatures.py

This script generates the “features.txt” file. It takes images as input, detects the mouth on each and then calculates the mouth features. The features of each image are saved to the file as one line of comma-separated values. The script requires the file “shape_predictor_68_face_landmarks.dat”.
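The one-line-per-image, comma-separated layout of “features.txt” can be sketched as follows. This is an illustration of the format described above, not the project's code:

```python
def write_features(rows, path="features.txt"):
    """Write one comma-separated line of feature values per image."""
    with open(path, "w") as f:
        for features in rows:
            f.write(",".join("%g" % v for v in features) + "\n")

def read_features(path="features.txt"):
    """Read the file back into a list of float feature vectors."""
    with open(path) as f:
        return [[float(v) for v in line.split(",")] for line in f if line.strip()]
```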

trainAlgorithms.py

This script trains the algorithms and writes their statistics. The trained algorithms are Random forest, Decision tree, Naive Bayes and K-nearest neighbours. Their performance is averaged over 100 runs, with the dataset split in the ratio 80:20 (training:test) anew for each run. A performance statistic is also written for every algorithm. The script requires the “features.txt” and “labels.csv” files.
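The evaluation loop, averaging test accuracy over repeated random 80:20 splits, can be sketched like this. For self-containment this sketch swaps in a minimal hand-written 1-nearest-neighbour classifier instead of the four algorithms the script actually trains; the repeated-split averaging is the point being illustrated.

```python
import random

def knn_predict(train, sample):
    """1-NN by squared Euclidean distance; train is a list of (features, label)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda fl: dist(fl[0], sample))[1]

def average_accuracy(data, runs=100, train_ratio=0.8, seed=0):
    """Average test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        train, test = shuffled[:cut], shuffled[cut:]
        correct = sum(1 for feats, label in test
                      if knn_predict(train, feats) == label)
        total += correct / float(len(test))
    return total / runs
```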

testIt.py

This is a simple application for testing human performance on images of mouths. Its inputs are images, emotion labels for the images and the “shape_predictor_68_face_landmarks.dat” file. The application crops the mouth from each image, displays it to the subject and writes the subject's answers, separated by commas, to a simple “<number of tester>.csv” file. The correct answers for the images are written in the first line of the file.

statistics.py

This script takes the subjects' and the machine's answers and generates confusion matrices for them. The subjects' answers are provided in the “.csv” files generated by the “testIt.py” script, while the machine's answers are calculated ad hoc (the “features.txt” and “labels.csv” files must be provided).
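Building a confusion matrix from two answer lists takes only a few lines. A minimal sketch (an illustration, not the project's code), assuming answers are given as lists of emotion labels, with rows for the true emotion and columns for the predicted one:

```python
EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

def confusion_matrix(true_labels, predicted_labels, classes=EMOTIONS):
    """Count how often each true emotion (row) was predicted as each class (column)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix
```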

confusionPlot.py

This script plots a provided confusion matrix.

Statistics and conclusion


There are two major statistical evaluations in this project:

  1. Evaluation of the algorithms.
  2. Evaluation between subjects’ performance and machines’ performance.

Evaluation of the algorithms
The script “trainAlgorithms.py” provides the accuracy of each of the four trained algorithms: Random forest, Decision tree, K-nearest neighbours and Naive Bayes. The following graph shows their results averaged over 100 training cycles. As can be noticed, the best algorithm is Random forest; it is therefore used in the webcamDetector application and for the further evaluation against the subjects' performance. The confusion matrix for the Random forest algorithm is shown in the next graph:

Evaluation between subjects’ performance and machines’ performance
The comparison between the subjects' and the machine's performance was carried out between 7 test subjects and the previously trained Random forest algorithm, on 19 images expressing all 4 emotions. The next histogram shows that the subjects performed roughly 6% better than the machine. The confusion matrices for the subjects' and the machine's performance are also shown here:


There are three reasons why the algorithm did not perform well:
  • Small dataset: with a bigger dataset the algorithm would probably generalise better.
  • The task is genuinely hard: even humans struggled to detect emotions from the mouth images alone.
  • The features describing the mouth were devised quickly and on the spot; there may well be other features that distinguish between emotions even better.
[Example images: Sad, Angry]


To improve the detection of emotions based on mouth shape, it is important to gather more data and to extract features based not only on shape but also on colour. A convolutional neural network could be a good solution. For the problem of detecting the emotion of a talking person, the proposed solution lies in measuring the emotion over several frames of video and then calculating the average emotion; people never hold an emotion for only a split second.
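The proposed frame-averaging idea can be sketched as a small smoother that keeps a sliding window of per-frame predictions and reports the most frequent emotion. This is an assumption about how it could work, not code from the project:

```python
from collections import Counter, deque

class EmotionSmoother(object):
    """Majority vote over the last `window` per-frame emotion predictions."""

    def __init__(self, window=15):  # e.g. half a second at 30 fps
        self.frames = deque(maxlen=window)

    def update(self, frame_prediction):
        self.frames.append(frame_prediction)
        # Most common label in the current window wins
        return Counter(self.frames).most_common(1)[0][0]
```

A brief flicker of a mis-detected frame is then outvoted by the surrounding frames, which is exactly the behaviour wanted for talking faces.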


Franko Hržić

University of Rijeka Faculty of Engineering

Martin Polzhofer
Medical University of Graz

Szilveszter Domany
University of Szeged, Hungary
Computer Science MSc


