Predictive analytics in HR

October 25, 2021

Independent AI expert

Though it may be too early to say what business is gaining and losing from the surge of remote work scenarios and arrangements that have followed the outbreak of the pandemic, one thing is certain — remote work has brought a previously unthinkable level of digitization into all the interactions that comprise the life-blood of a company and infused HR predictive analytics with an unexpected tranche of data in the corporate space.

Though many organizations have mourned the loss of the 'water-cooler' culture, in-person candidate interviews, real-world scrums, and all the other highly socialized, synergistic advantages of collective action and interaction in office environments, the fact is that these were largely 'analog' experiences and events, with little or no data trail beyond scant meeting notes and shared diary appointments.

Now, with VoIP calls as recordable under typical corporate privacy arrangements as emails and phone calls already were before the pandemic, even meetings are susceptible to analysis — who attended, who spoke, what they said, and what happened afterward are all quantifiable and analyzable facets.

Data-rich communications

Likewise, the increased use of collaboration platforms such as Slack, and of the other tools that make up the digital workplace experience, can no longer be side-stepped through casual face-to-face encounters, while the volume of in-house communication over other analyzable channels (not least email) has risen notably under varying COVID restrictions and precautions.

Further, AI-driven evaluation techniques that were once difficult to apply in later-stage face-to-face interviews are now easy to use on VoIP channel footage, without the need to unnerve applicants by the presence of external monitoring equipment.

Though we can hardly celebrate the cause of the boosted workplace digitization that is transforming HR predictive analytics, an uncertain future constrains us to derive benefit from it, whether remote work is here to stay or whether these new analytics-driven platforms persist into a later era of in-office work.

Here, we'll take a look at a few of the central areas where predictive analytics software can prove of growing benefit to human resources operations in an organization, from candidate procurement through to human network and employee churn analysis.

Candidate sourcing and filtering

When casting an initial net for possible job candidates, AI will almost certainly be involved in the process. Several branches of machine learning have been brought to bear on this area of HR, including natural language processing (NLP), computer vision, audio analysis, and expression recognition.

All of the largest recruitment consultants have availed themselves to some extent of HR software development in one or more of these sectors. For instance, LinkedIn Recruiter uses Gradient Boosted Decision Trees (GBDTs), among many other intensive machine learning approaches, to calculate 'non-obvious' factors that may align a potential recruit with the interests of an employer.

[Figure: LinkedIn Recruiter architecture]

AI head-hunters

Traditionally, candidates have been sourced either by arbitrary responses to job ads or via agencies and employment platforms to which candidates have subscribed. However, AI-based analytics offers more proactive methods for head-hunting prospective employees, so-called passive candidates, 'in the wild'.

HR predictive analytics systems are capable of scouring the web for individuals that are suited to your organization, with a growing number of companies offering AI-based 'broad sweep' candidate retrieval. Such companies not only identify possible candidates from a variety of online sources but also calculate how receptive the person might be to a job offer, and what salary level, perks and secondary considerations might be necessary to bring them into the fold.

For larger companies that are either growing or have higher turnover rates and that require such services on a more regular basis, it's possible to develop in-house AI systems that operate on the same principles. However, it should be considered that the legal status of web-scraping remains in a volatile state, with many of the most fruitful domains blocking or impeding systematic access to all 'bots' except those of the major search engine providers.

Social profile analysis

The extent of the data that's being considered in employee applications is growing: the recruiter platform Arya trawls the internet looking for social network and membership profiles associated with a candidate, seeking potential 'red flags' such as 'extreme political views', content related to drink or drugs, disclosure of confidential information about previous employers, discriminatory comments, and many other possible contraindications for hiring. The system can also make predictions as to the likelihood of an employee leaving the company (see 'Predicting employee churn' below). 

As with automated domain-scraping candidate search systems, live or systematic analysis of social media platforms is subject to site terms and, frequently, to protections against the systematic polling that powers it. This may place some restrictions on automated profile analysis, both for third-party providers and for the development of in-house systems.

Natural language processing (NLP) assessment methods

Surveys and questionnaires, a longtime staple of multi-stage candidate assessment, can now be incorporated into NLP systems that can provide a more interactive and responsive study of a candidate, for instance through chatbots.

VoIP-based interview video footage can be analyzed not only in terms of emotion recognition and image-based evaluation (see below), but also parsed into text, and the text analyzed by NLP systems.
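As a rough illustration of what analyzing transcribed interview text can mean at its simplest, the sketch below scores how closely a transcript overlaps with a job description using bag-of-words cosine similarity. It is not how any named vendor works; the sample strings and the function names `bag_of_words` and `cosine_similarity` are invented for this example, and a production system would likely add stop-word handling, stemming, or learned embeddings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count each word (a crude tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in set(a) & set(b))
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical texts, invented purely for illustration.
job_description = "Looking for an engineer to build data pipelines with Python and SQL"
transcript_a = "I led a team building data pipelines in Python and SQL"
transcript_b = "I enjoy hiking and painting on weekends"

job_vector = bag_of_words(job_description)
score_a = cosine_similarity(job_vector, bag_of_words(transcript_a))
score_b = cosine_similarity(job_vector, bag_of_words(transcript_b))
print(f"transcript_a: {score_a:.3f}, transcript_b: {score_b:.3f}")
```

A score of this kind only measures shared vocabulary, so it is best treated as one weak signal among many rather than as an assessment of the candidate.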

The Utah-based HR tech company HireVue performs this kind of text analysis on candidate interviews, combined with various other signifiers of suitability developed by the company's psychological research teams. It's possible to take into account factors such as commute distance and other pragmatic aspects that can affect a potential employee's likelihood to integrate well into the company.

As is an occupational hazard of leveraging predictive analytics in this sector, the company has faced scrutiny over alleged bias and over failing to disclose its use of facial recognition, in a climate shaped by proposed legislation such as the Algorithmic Accountability Act of 2019. It is therefore important to maintain transparency and full disclosure when developing or outsourcing such assessment systems, and to ensure that the disclosed methods fall in line with local and national regulations.

Facial analysis for job candidates

At the interview stage, the most enduring strengths of HR predictive analytics are proving to lie in what the candidate actually says: the automated collection and analysis of text auto-transcribed from video and audio recordings.

Though the COVID-driven move from face-to-face interviews over to VoIP-based procedures has made applying facial analysis far easier, face-based emotion recognition techniques remain controversial, and their deployment should be carefully considered from both a legal and effectiveness standpoint.

HireVue (see above) initially included facial analysis and emotion recognition in its candidate evaluation software, but announced early in 2021 that it was removing this functionality from its offerings following results of an algorithm audit. The company's chief data scientist maintains that in most cases non-verbal data contributes no more than 0.25% of a model's predictive capabilities. 

Following the announcement that IBM, Microsoft, and Amazon would discontinue various implementations of facial recognition and analysis in response to protest over algorithmic racial discrimination, the Co-Director of the AI Now Institute, Meredith Whittaker, compared the use of facial analysis in candidate review to the long-discredited study of phrenology (discerning personality through a person's head shape), and argued that systems based around it lack a basis of solid scientific consensus.

Facial analysis is a work in progress, underpinned by various highly contested theories, and may prove to be a more useful tool in later years, after further development and consolidation, than it is at present.

Workforce evaluation and monitoring

Once a new employee is inducted into the company, it's important to gauge their level of adaptation and pace of development. The data behind predictive analytics for employee performance will likely feature core expectations based on the performance of previous or current employees, as well as comparative analytics that map the employee's individual development.

Determining metrics

Here, it's important that HR predictive analytics systems contain metrics that suit your company. Beyond fundamental matters such as attendance, sick leave and holiday planning coordination, it will be necessary to develop meaningful benchmarks of success that reflect a positive impact on the company.

These metrics can be difficult to establish, since attendance and good productivity are not inextricably linked, and since participation in surveillable platforms (such as Slack or an internal company channel) does not automatically mean that useful and/or timely work is occurring. In fact, it could well mean the opposite.

Certain sectors have intrinsic metrics (such as social media shares obtained in PR organizations, traffic fluctuations in SEO operations, or units produced in industrial settings) that can be hooked into an employee's predictive analytics profile; but where productivity is a more subjective topic, additional work will be necessary to avoid counter-productive metrics that abrade against employee goodwill while providing minimal proven benefit. 

Employee monitoring

Once the metrics are clear of practical and legal obstacles, HR predictive analytics can provide a wide range of insights into the way employees are furthering the company's aims, as well as warning signs that intervention may be necessary.

Simpler machine learning methods, such as K-Nearest Neighbors, are highly applicable to performance analysis and prediction. K-Means Clustering and Decision Trees — two of the oldest methods of machine learning analysis — can also be used to devise predictive analytics systems capable of ongoing performance evaluation.

[Figure: K-means cluster analysis of employee performance]
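To make the clustering idea concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python, grouping employees by two performance measures. The feature names (tasks closed per week, average review score) and all of the numbers are assumptions invented for this example, not data from any real system. In practice the features would usually be rescaled first so that no single measure dominates the distance.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical (tasks_closed_per_week, avg_review_score) pairs.
points = [(4, 2.1), (5, 2.4), (6, 2.0), (14, 4.2), (15, 4.5), (16, 4.0)]
centroids, labels = kmeans(points, k=2)
for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```

The cluster numbers themselves carry no meaning; it is the grouping that matters, and interpreting a cluster as 'high' or 'low' performance remains a human judgment.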

Meeting analysis

Besides the use of predictive analytics for voice interviews, for instance by analyzing auto-transcription of VoIP-based candidate interviews, it's possible to use machine learning to uncover insights from meetings, whether they took place in the real world or online.

The AI meeting analysis framework Headroom, currently in closed beta, can automate note-taking and uses emotion recognition algorithms to determine what impression a presentation is making on the 'silent' listeners in the room, providing the speaker with real-time feedback in a console window on their screen. The system keys on many factors, including pupil dilation, eyebrow disposition, mouth shape, and other groups of facial landmarks.

However, it's worth reiterating the earlier warning around face-based analytics systems: all participants should ideally be willing participants in such systems, and, under a growing tranche of regulations around automated monitoring, it will usually be necessary to disclose their use in advance.

Predicting employee churn

Predictive analytics is a useful tool for determining the likelihood of an employee leaving your organization. It's possible to use generic trans-company datasets to pick out broad trends that apply to various demographics in your workforce, but it's also possible to leverage HR software that can directly deal with the particular trends in your own business sector or, more specifically, your own organization.

Generic data

The IBM HR Analytics Employee Attrition & Performance dataset can help to develop classification models to predict general employee churn. Though now characterized by IBM as a 'fictional data set', the data was gathered by a real-life survey of 1,470 employees. 

In the light of COVID, it should be noted that any dataset older than one year is unlikely to represent the characteristics of remote working in a meaningful way. At the same time, newer datasets that do take account of COVID-era working environments are probably still immature, with their long-term applicability subject to changes in working culture as the pandemic develops and (hopefully) abates.

Nonetheless, the data features ordinal grade markings (1-4 for most factors, 1-5 for Education) on core issues such as Education, Environment Satisfaction, Job Satisfaction, and Work-Life Balance, among others, and can provide a practical generic basis for employee attrition forecasting.

Running HR predictive analytics on the IBM data immediately reveals certain universal tendencies, such as the fact that overtime, marital status, monthly income and business travel schedules all have an observable influence on employee departures.

[Figure: Employee attrition vs marital status]
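The kind of tendency described above can be surfaced with nothing more than a grouped attrition rate. The sketch below assumes rows shaped like the IBM data, with `Attrition`, `OverTime` and `MaritalStatus` fields; the eight rows themselves are invented for illustration and do not come from the dataset, and `attrition_rate_by` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rows, invented for illustration only.
rows = [
    {"MaritalStatus": "Single", "OverTime": "Yes", "Attrition": "Yes"},
    {"MaritalStatus": "Single", "OverTime": "No", "Attrition": "Yes"},
    {"MaritalStatus": "Single", "OverTime": "No", "Attrition": "No"},
    {"MaritalStatus": "Single", "OverTime": "Yes", "Attrition": "No"},
    {"MaritalStatus": "Married", "OverTime": "No", "Attrition": "No"},
    {"MaritalStatus": "Married", "OverTime": "Yes", "Attrition": "Yes"},
    {"MaritalStatus": "Married", "OverTime": "No", "Attrition": "No"},
    {"MaritalStatus": "Married", "OverTime": "No", "Attrition": "No"},
]


def attrition_rate_by(rows, field):
    """Share of employees who left, per distinct value of `field`."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for row in rows:
        totals[row[field]] += 1
        if row["Attrition"] == "Yes":
            leavers[row[field]] += 1
    return {group: leavers[group] / totals[group] for group in totals}


by_marital_status = attrition_rate_by(rows, "MaritalStatus")
by_overtime = attrition_rate_by(rows, "OverTime")
print(by_marital_status)
print(by_overtime)
```

With a real dataset, differences between groups of this kind are a starting point for investigation rather than evidence of cause, since the factors are often correlated with one another.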

It's possible to run the IBM data against field-matched data from your own company to obtain a thumbnail view of your current retention landscape. A number of other open-source datasets are available with varying volumes of content, and many of these provide a suitable template for generating your own bespoke datasets or experimenting with new analytics frameworks that will eventually be populated by genuine company data.

Manageable predictive analytics for SMEs

It's not always necessary to train a model, at considerable expense, on a large curated corpus of data in order to run machine learning-based predictive analytics on internal company data. So long as your dataset is consistent and old enough to yield meaningful year-on-year statistical trends, many of the 'lighter' approaches will suit a medium-sized company.

In fact, a research initiative studied a number of ML approaches, some more resource-intensive than others, and ultimately favored the K-Nearest Neighbors (KNN) algorithm. KNN was formulated by the US military in 1951 and is known as 'the lazy learner' since it traverses the entire dataset for a 'nearest neighbor' prediction, instead of needing to actually train a machine learning model to make predictions based on historical data.

KNN is efficient on modest datasets and often features as a component even in more sophisticated and expensive analytics systems, and it may be all you need to get the job done. However, since KNN compares each query against every stored data point at prediction time, it's poorly suited to complex, high-volume data, such as the thirty-year history of a major global company. This challenge can be partly addressed by Principal Component Analysis, a technique that reduces the number of features in a dataset while retaining most of its variance, making each comparison lighter.
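The 'lazy learner' behaviour is easy to see in code: there is no training step, only a search through the stored examples at prediction time. The following is a minimal sketch, assuming two invented features (a 1-4 job satisfaction grade and years at the company) and made-up records; `knn_predict` is a hypothetical name rather than a library function.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest stored examples.

    `train` is a list of (feature_tuple, label) pairs. Every stored
    example is compared with the query, which is why cost grows with
    the size of the dataset.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical (job_satisfaction, years_at_company) records.
train = [
    ((1, 1), "left"),
    ((2, 1), "left"),
    ((1, 2), "left"),
    ((4, 6), "stayed"),
    ((3, 7), "stayed"),
    ((4, 8), "stayed"),
]

print(knn_predict(train, (2, 2)))  # nearest records are all "left"
print(knn_predict(train, (4, 7)))  # nearest records are all "stayed"
```

Because the vote depends on raw distances, features measured on very different scales should normally be rescaled before use, and the choice of `k` is best validated against held-out data.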

Other lower-impact approaches can be developed by similarly 'established' machine learning analysis techniques. In 2021, the Academy of Entrepreneurship Journal published a study into an ML-based employee retention framework using basic components such as Support Vector Machine and Random Forest.

[Figure: Employee retention based on academic qualifications]

Using this method, the authors were able to establish retention probabilities based on various interesting factors, such as academic qualifications, contract type, department placement, and even the employee's degree major and degree level.

Three considerations in closing

The pandemic has notably accelerated adoption of HR predictive analytics systems, and this may be the most promising time ever to adopt reliable and established machine learning algorithms into your own HR department's forecasting systems. Even so, it's important to have a clear understanding of what the company is trying to achieve, and of how success or failure can be quantified in unambiguous ways that lend themselves to analytical systems. This is not a problem that machine learning can solve for you.

Additionally, it's necessary to establish whether the means currently exist to generate the data that will drive your predictive analytics system. Though it can be tempting to leverage historical in-house data simply because there are already many years' worth of reports and logs to analyze, it may be better to start fresh with more capable data architectures that will begin to bear fruit in a year or so, and to pay attention to the equipment and methodologies that will constitute a business intelligence pipeline of growing value.

Finally, from a legal standpoint, it's essential to stay on top of the turbulent regulatory environment surrounding employees' rights in relation to workplace monitoring and AI systems; and, as necessary, to take professional advice in regard to protecting the company from legal exposure.