Face recognition attendance system

As part of an R&D project, Itransition developed a face recognition PoC with a 99%+ accuracy and successfully integrated it with our corporate security system.

Context

To keep tabs on innovations and expand its technological competencies, Itransition runs a company-wide R&D program. Within this program, the company supports its employees’ research initiatives, providing the resources needed to discover, analyze, and test innovative tools and methodologies, build prototypes, and document the experience gained. The knowledge accumulated from 30+ R&D projects initiated by our employees is then applied in software production.

A team of Itransition’s employees interested in machine learning and computer vision proposed developing a real-time face recognition attendance system to verify staff entering and leaving Itransition’s office buildings.

On the one hand, the application would support facility access management; on the other, it would enable employees to access the offices without checking in with their ID cards. At the same time, this R&D project promised to be a valuable contribution to our software production knowledge base that could later be applied in the machine learning consulting Itransition provides to its clients globally.

Despite the variety of open-source face recognition frameworks available, there was no ready-made solution we could adopt. The available algorithms processed only high-resolution static shots and performed poorly, delivering just 1-5 FPS with 80-95% accuracy. Our goal was to build on the findings of the global development community, customize and extend the existing implementations, and deliver a proof of concept (PoC) providing near-100% face recognition accuracy in real time.

R&D process

Using a proprietary evaluation system, Itransition’s R&D team researched and compared 40+ combinations of open-source technologies, including face detectors, aligners, and encoders, as well as network architectures and mathematical methods of image comparison. The technologies that met the performance and accuracy requirements during testing were selected for the PoC.

Itransition’s engineers organized the R&D process around the best practices of ML system investigation, scoring, and development.

The process comprised the following stages:

R&D process stages

Vision definition

First, our team visualized the workflow of the future face recognition attendance system. The solution captures employees with a web camera, detects faces and aligns them into bounding boxes, encodes them, and combs through the pre-loaded reference photo base to find similar faces. After calculating the similarity indexes of the recognized and reference faces, the application lists the results from the best to the worst match.

App's flow

Then, we defined the major modules—detectors, encoders, and verifiers—and developed a pluggable architecture that allowed us to switch these building blocks and experiment with various tools and their combinations.

The app’s general logic
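
The project’s actual module interfaces are internal, but a minimal Python sketch of such a pluggable pipeline, with hypothetical class and method names, could look like this:

```python
from abc import ABC, abstractmethod
import numpy as np

class Detector(ABC):
    """Finds face bounding boxes in a video frame."""
    @abstractmethod
    def detect(self, frame: np.ndarray) -> list:   # [(x, y, w, h), ...]
        ...

class Encoder(ABC):
    """Turns a cropped face into an embedding vector."""
    @abstractmethod
    def encode(self, face: np.ndarray) -> np.ndarray:
        ...

class Verifier(ABC):
    """Scores the similarity between two embeddings."""
    @abstractmethod
    def similarity(self, a: np.ndarray, b: np.ndarray) -> float:
        ...

class RecognitionPipeline:
    """Wires interchangeable detector, encoder, and verifier modules together."""

    def __init__(self, detector: Detector, encoder: Encoder,
                 verifier: Verifier, reference_base: dict):
        self.detector = detector
        self.encoder = encoder
        self.verifier = verifier
        self.reference_base = reference_base   # employee name -> reference embedding

    def recognize(self, frame: np.ndarray) -> list:
        """Return, for each detected face, reference matches ranked best to worst."""
        results = []
        for x, y, w, h in self.detector.detect(frame):
            embedding = self.encoder.encode(frame[y:y + h, x:x + w])
            scores = sorted(
                ((name, self.verifier.similarity(embedding, ref))
                 for name, ref in self.reference_base.items()),
                key=lambda item: item[1], reverse=True)
            results.append(scores)
        return results
```

With this kind of structure, swapping a detector, encoder, or verifier comes down to passing a different implementation to the pipeline constructor.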

Evaluation system

Itransition developed a technology evaluation system to assess the quality of the continuously evolving machine learning algorithm.

We collected a reference set of 2,000+ employees’ photos, then segmented and labeled them with the employees’ names using custom tools. Our team installed a web camera at the office entrance to collect test data, so that test shots would closely resemble the conditions under which employees’ faces are actually captured. Finally, we defined performance metrics and developed an output script that calculated the success and failure rates of image recognition based on these metrics.
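
The actual metrics and output script are not published; the sketch below only illustrates the idea of scoring one technology combination against the labeled test set, reusing the hypothetical pipeline from the earlier sketch (the confidence threshold is an assumption):

```python
def evaluate(pipeline, labeled_shots, threshold=0.8):
    """Compute simple success/failure rates for one technology combination.

    labeled_shots: iterable of (frame, true_name) pairs collected by the
    entrance camera and labeled with the custom tools.
    """
    correct = false_positive = missed = 0
    for frame, true_name in labeled_shots:
        matches = pipeline.recognize(frame)
        if not matches:                      # no face detected at all
            missed += 1
            continue
        best_name, best_score = matches[0][0]
        if best_score < threshold:           # too uncertain to accept
            missed += 1
        elif best_name == true_name:
            correct += 1
        else:                                # confidently matched the wrong person
            false_positive += 1
    total = correct + false_positive + missed
    return {
        "accuracy": correct / total,
        "false_positive_rate": false_positive / total,
        "miss_rate": missed / total,
    }
```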

Investigation & PoC development

The team carried out technology research, analysis, and validation while developing the algorithm. From Stanford University reports and lectures to GitHub projects, we sourced and applied the latest know-how, ready-to-use frameworks, pre-trained neural networks, and pre-configured network architectures to find the most suitable technologies. Our requirements included high accuracy, a low level of false positives, maximum speed, and the ability to process video frames in real time with moderate GPU consumption.

We brought together various technology implementations as modules with the same interface, each time testing them against the evaluation script. This approach allowed us to track progress and regressions and discover bottlenecks in each iteration.

Detectors

  • Face recognition
  • DLib CNN
  • DLib HOG
  • Faced
  • TensorFlow face detector
  • OpenCV

Encoders

  • Face recognition DLib encoder
  • VGG16 face encoder
  • Keras Facenet encoder
  • OpenFace
  • Keras VGGFace, etc.

Verifiers

  • Euclidean distance
  • Cosine similarity
  • DNN
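
The two math-based verifiers compare embedding vectors directly; for reference, both fit in a few lines of NumPy:

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Smaller distance means a closer match."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Ranges from -1 to 1; larger means a closer match."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```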

Apart from the traditional math-based vector comparison methods, we investigated a deep learning approach to face verification. Our team trained a deep neural network (DNN) on the open-source VGGFace2 photo set; however, it produced the same results as the cosine similarity approach.

To achieve higher recognition accuracy, the DNN would have required much more training on larger datasets, so we excluded it from our PoC development pipeline.

Our experiments resulted in the following high-performing combination: a TensorFlow model with the YOLO algorithm for face detection, no additional face alignment, the ResNet50 network for face encoding, and an improved cosine similarity metric for verification.
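
The exact ResNet50 weights and preprocessing used in the PoC are not disclosed; purely as an illustration, a generic Keras ResNet50 feature extractor (here with ImageNet weights, which a production face encoder would replace with face-specific ones) can produce an embedding like so:

```python
import cv2
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

# A headless ResNet50 with global average pooling yields a 2048-dimensional
# feature vector per image. ImageNet weights are used here only for
# illustration, not as the PoC's actual configuration.
encoder = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def encode_face(face_bgr: np.ndarray) -> np.ndarray:
    """Resize a cropped face to 224x224 and return its embedding vector."""
    face_rgb = cv2.cvtColor(cv2.resize(face_bgr, (224, 224)), cv2.COLOR_BGR2RGB)
    batch = preprocess_input(np.expand_dims(face_rgb.astype("float32"), axis=0))
    return encoder.predict(batch, verbose=0)[0]
```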

Technologies and their purpose

  • Keras: for wrapping deep learning layers and operations into neat building blocks
  • TensorFlow + YOLO: for real-time face detection
  • CuPy: for mathematical calculations
  • OpenCV: for image processing
  • PyQt5: as a GUI for the demo

PoC beta-testing and improvement

Once the logic was built, we developed a full-screen demo of the face recognition employee attendance system and deployed it at one of Itransition’s offices for beta-testing. Our employees acted as crowdsourced testers and provided feedback on the recognition accuracy. For this purpose, we added a visual interface so that employees could clearly see whether or not the solution recognized them.

To get field performance metrics, our team developed a script that collected and categorized live camera shots, recognition data, and app screenshots for further analysis.
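
The real script also captured recognition data and app screenshots; the minimal sketch below, with assumed folder names, only shows the idea of categorizing live shots for later analysis:

```python
import time
from pathlib import Path
import cv2

def save_shot(frame, recognized_name, score, out_dir="field_data"):
    """Store a camera shot in a category folder for later analysis."""
    category = "recognized" if recognized_name else "unrecognized"
    folder = Path(out_dir) / category
    folder.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    filename = f"{stamp}_{recognized_name or 'unknown'}_{score:.2f}.jpg"
    cv2.imwrite(str(folder / filename), frame)
```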

The app tracks real-time webcam footage and indicates the name of the detected employee, as well as three other people with similar facial features.

During the beta-testing period, we improved the algorithm’s accuracy and processing speed by aligning faces vertically, filtering out blurry pictures, and interpolating the values produced by the validation function. To accelerate image processing, our team moved all the calculations to the video card and split face detection, encoding, and verification into separate threads.
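
The case study does not detail how blurry pictures were filtered; one common approach, assumed here, is to threshold the variance of the Laplacian with OpenCV:

```python
import cv2

def is_too_blurry(frame_bgr, threshold=100.0):
    """Reject frames with few sharp edges (low Laplacian variance).

    The threshold value is an assumption and would have to be tuned on
    real footage from the entrance camera.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold
```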

PoC evolution

As the next step, we integrated the face recognition attendance solution with Itransition’s corporate security system, namely with the video surveillance and electronic locks at the office entrance.

Version 1

We put our experimental solution onto System on Chip (SoC) hardware to make its operation independent of the rest of the infrastructure.

The solution had the following characteristics:

  • Can be placed near the entrance with minimal installation effort
  • Requires minimal changes to the network configuration
  • Supports 2 video streams from entrance monitoring IP cameras
  • Is as affordable as possible

For the hardware, we chose the NVIDIA Jetson Nano, which showed better performance than general-purpose SoCs (like Raspberry Pi or ASUS Tinker Board). However, to improve the solution’s performance further, we made the following significant changes:

  • We applied several memory consumption optimizations to stay within the hardware limitations.
  • We combined the frames from the two video streams into one side-by-side image before passing them for face detection, doubling the processing speed (see the sketch after this list).
  • We increased the confidence threshold to prevent false-positive results.
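
A minimal sketch of the side-by-side trick, assuming two equally sized BGR frames and NumPy:

```python
import numpy as np

def combine_streams(frame_left: np.ndarray, frame_right: np.ndarray) -> np.ndarray:
    """Stack two camera frames side by side so the detector runs once per pair.

    Boxes found in the right half map back to the second stream by
    subtracting the left frame's width from their x coordinates.
    """
    return np.hstack((frame_left, frame_right))
```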

As a result, the performance reached 6-8 FPS on two video streams, with all the decoding and processing steps performed on the $99 SoC. With this solution in place, Itransition’s employees could enter the office using either face recognition or their proxy cards.

Version 2

The “synthetic” performance of the Jetson was above average; however, since many video frames were blurred or captured faces at the wrong angle, multiple negative samples could be returned before an appropriate frame was captured. At 6-8 FPS, the probability of missing such frames was quite high, and the delay could reach 1-3 seconds. Therefore, we decided to move to the more powerful GeForce RTX 2070.

After moving to the new hardware, the solution could process every single video frame. With asynchronous processing of each video stream, the delay dropped from 1-3 seconds to ~250 milliseconds. The hardware capacity also allowed us to add a tablet displaying enhanced video re-streamed from the server. In addition, we rewrote the server on AIOHTTP, added WebRTC support, and split it into several microservices.
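
The server internals are not described in detail; purely as an illustration, a minimal aiohttp service exposing a single recognition endpoint (the route, payload format, and pipeline object are assumptions) could be structured like this:

```python
import cv2
import numpy as np
from aiohttp import web

async def recognize_handler(request: web.Request) -> web.Response:
    """Accept a JPEG frame in the request body and return the best match."""
    payload = await request.read()
    frame = cv2.imdecode(np.frombuffer(payload, dtype=np.uint8), cv2.IMREAD_COLOR)
    matches = request.app["pipeline"].recognize(frame)   # pipeline sketched earlier
    best = matches[0][0] if matches else None
    return web.json_response(
        {"match": best and {"name": best[0], "score": float(best[1])}})

def make_app(pipeline) -> web.Application:
    app = web.Application()
    app["pipeline"] = pipeline
    app.add_routes([web.post("/recognize", recognize_handler)])
    return app

# web.run_app(make_app(pipeline), port=8080)
```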

Version 3

For the best recognition results, users needed to look straight into the camera for 0.5-1 second. To draw their attention to this requirement, we mounted a tablet below the camera that displayed the video stream with real-time face detection and signaled whether the user’s face was recognized. On success, the tablet would display the employee’s name, photo, and latest entry/exit events.

Our designers created a sleek interface for the display, with the tablet housed in a branded aluminum and plexiglass case.

As the next step, we decided to use the tablet’s 8-megapixel front camera to capture the video instead of standalone IP cameras. This allowed us to run basic face detection on the client side (the tablet) and send only the frames with detectable faces to the server, which reduced the bandwidth requirements and minimized the server load.

The browser application

To make the solution portable, we decided to implement it as a browser application built with JavaScript.

After briefly considering the face-api.js library, we went for the PICO algorithm, which is based on decision trees and is capable of processing up to 250 FPS. Using PICO, we achieved a satisfactory 200-230 FPS. The algorithm misses faces turned more than 15 degrees away from the camera, which made the animation choppy; to mitigate this, we interpolated the animation by calculating the estimated intermediate coordinates of the face on the skipped frames. Having combined PICO on the client side with TensorFlow + ResNet on the server side, we were able to make the client UI smoother and more interactive while reducing server and network utilization.
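
The browser client is written in JavaScript, but the interpolation idea itself is language-agnostic; a short Python sketch of estimating the face box on a skipped frame:

```python
def interpolate_box(prev_box, next_box, t):
    """Estimate the face bounding box on a skipped frame.

    prev_box and next_box are (x, y, w, h) tuples from the last and the next
    frames where the detector actually found the face; t in [0, 1] is how far
    the skipped frame lies between them.
    """
    return tuple(p + (n - p) * t for p, n in zip(prev_box, next_box))
```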

Results

During this in-house R&D project, we achieved 99.8% face recognition accuracy. Less than 1% of images were recognized incorrectly because of extreme blurring, so our team continues improving the algorithm.

Our face recognition attendance system achieved 20-30 FPS with a GeForce RTX 2070 graphics card, while similar solutions on the market show 1-5 FPS at most.

Integrated with our internal security system, the developed PoC can recognize faces, compare them to those added to whitelists/blacklists, and notify the security service in case of an unauthorized access attempt. Beyond Itransition, the system can be used in offices, schools, and other organizations. It can work either as a standalone solution or as an addition to an ID-based security system to verify the authenticity of ID holders by comparing the user’s face with the ID information.

Thanks to this initiative, Itransition gained more experience in training machine learning models to make correct decisions across large sets of imperfect data, and added one more computer vision project to its portfolio.