Attrition Predictor

Application to predict employee attrition in an outsourced contact-center environment.

🔗 attrition-pred-app.roboteria.io

ℹ️ Desktop only: the layout is not adapted for small screens.

Overview

Purpose

This machine-learning-based application predicts early employee attrition within the first 30, 60, or 90 days after a new project assignment.

Think of the app as a proof-of-concept. It shows how attrition could be predicted, but you’ll want to feed it with your actual workforce data to make the predictions useful.

Industry Context

  • The app is designed for a large outsourcing contact center, where each client (also referred to as a project) is supported by a group of dedicated support employees (agents).
  • Attrition (sometimes called “early churn”) is any case where an agent voluntarily leaves a staffing assignment or the company.
  • Attrition covers both new hires and agents reassigned internally from another project.

⚠️ Responsible use

This model is designed for workforce planning, for example deciding whether to slightly over-staff a new project or to allocate extra onboarding support.
It should not be used as an automated “hire / no-hire” filter for individual candidates.
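
As a rough illustration of the planning use case, the sketch below turns per-agent attrition probabilities into a project-level staffing buffer. It assumes the model exposes a probability per agent; the function names, the example probabilities, and the "round the expected number of leavers up" rule are hypothetical and not taken from the app.

```python
import math


def expected_leavers(attrition_probabilities):
    """Expected number of early leavers: the sum of per-agent probabilities."""
    return sum(attrition_probabilities)


def staffing_buffer(attrition_probabilities):
    """Extra agents to plan for, rounded up to whole people (a simple heuristic)."""
    return math.ceil(expected_leavers(attrition_probabilities))


# Hypothetical 30-day attrition probabilities for a new 8-agent project.
team = [0.05, 0.10, 0.45, 0.05, 0.25, 0.05, 0.10, 0.05]

print(f"expected leavers: {expected_leavers(team):.2f}")
print(f"suggested buffer: {staffing_buffer(team)} agent(s)")
```

The aggregation is done at project level on purpose: the output informs how many agents to staff, not which individual to hire.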

Training Dataset

ℹ️ Data Disclaimer

The data used in this version of the app is synthetic and was generated for demonstration purposes. Although it simulates reality fairly well, it does not describe any particular company.

Features

| Feature | Description | Type | Values |
|---|---|---|---|
| gender | Employee’s gender. | numerical | female: 0; male: 1 |
| fte | Full-time equivalent grouped by categories (full-timers vs. part-timers). | ordinal | < 0.9 fte: 1; 0.9 fte: 2; 1 fte: 3 |
| language | Primary language skill. | categorical | list of languages |
| country | Country of employee’s residence. | categorical | list of countries |
| employment_days | Number of days since the first day at the company. | numerical | continuous: integer |
| project_date | Date the last project was assigned to the employee. | date → numerical | month_sin, month_cos, quart_sin, quart_cos |
| new_employee | “New” if hired fewer than 10 days ago. | boolean | 0 / 1 |
| industry | Client’s industry. | categorical | E-commerce: ecommerce; Consumer Electronics: manufacturer; Video Gaming: gaming; Mobile Gaming: mob_gaming |
| size | Project size by number of agents. | ordinal | XS (<10 agents): 1; S (10–20 agents): 2; M (20–30 agents): 3; L (30–60 agents): 4; XL (60–120 agents): 5; XXL (>120 agents): 6 |
| channels | Number of support channels (grouped). | ordinal | single channel: 1; two channels: 2; multi-channel: 3 |
| phone | Phone line among support channels. | boolean | 0 / 1 |
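
The project_date feature is expanded into four sine/cosine columns. A minimal sketch of one common way to do this is shown below. The exact formula used by the app is not documented here, so the 1-based offset, the periods (12 months, 4 quarters), and the function names are assumptions; only the output column names come from the table.

```python
import math
from datetime import date


def cyclical(value, period):
    """Map a 1-based position in a cycle to a (sin, cos) pair on the unit circle."""
    angle = 2 * math.pi * (value - 1) / period
    return math.sin(angle), math.cos(angle)


def encode_project_date(project_date):
    """Encode month and quarter of a date as cyclical features (assumed scheme)."""
    quarter = (project_date.month - 1) // 3 + 1
    month_sin, month_cos = cyclical(project_date.month, 12)
    quart_sin, quart_cos = cyclical(quarter, 4)
    return {
        "month_sin": month_sin,
        "month_cos": month_cos,
        "quart_sin": quart_sin,
        "quart_cos": quart_cos,
    }


print(encode_project_date(date(2022, 12, 1)))
```

With this encoding December and January end up as neighbours on the circle, which a plain 1 to 12 month number would not capture.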

Descriptive analysis

Time series analysis

Attrition over time

[Figure: attrition over time]

Monthly attrition seasonality

[Figure: monthly seasonality]

Quarterly attrition seasonality

[Figure: quarterly seasonality]

Attrition by country

[Figure: attrition by country]

Top 3 countries attrition distribution (days)

[Figure: country distribution]

Top 3 languages attrition distribution (days)

[Figure: language distribution]

Attrition per FTE category (full-timer vs. part-timer)

[Figure: FTE distribution]

Attrition by industry

[Figure: industry]

Attrition by project size (in number of agents)

[Figure: client size]

Top 15 features by importance

[Figure: feature importance]

Labels

  • 30-days attrition: the employee left voluntarily within 30 days of the last project assignment.
  • 60-days attrition: the employee left voluntarily within 60 days of the last project assignment (includes 30-days attrition).
  • 90-days attrition: the employee left voluntarily within 90 days of the last project assignment (includes both 30- and 60-days attrition).
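
Because the three labels are nested, they can all be derived from a single number: the days between the last project assignment and the voluntary exit. The sketch below shows this under two assumptions that are not stated in the source: "within N days" is treated as inclusive (days <= N), and the label names `attrition_30d` etc. are made up for illustration.

```python
def attrition_labels(days_to_exit, horizons=(30, 60, 90)):
    """Build nested 0/1 labels from days between assignment and voluntary exit.

    days_to_exit is None when the employee did not leave.
    """
    return {
        f"attrition_{h}d": int(days_to_exit is not None and days_to_exit <= h)
        for h in horizons
    }


print(attrition_labels(45))    # left after 45 days
print(attrition_labels(None))  # still employed
```

An exit after 45 days therefore counts as 60-days and 90-days attrition, but not as 30-days attrition.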

Machine Learning Model

After comparing several linear and non-linear classifiers, the chosen algorithm is a random forest with 1000 estimators (trees) and no limit on the number of leaf nodes. Because the data is highly imbalanced (attrition / non-attrition labels are split roughly 10% / 90%), SMOTE-ENN (a combination of SMOTE oversampling and Edited Nearest Neighbours cleaning) was applied to the training set.
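
The setup described above could be sketched roughly as follows. This is not the app's actual training code: the function names and `random_state` values are assumptions, and the resampling step relies on the third-party imbalanced-learn package, which is not listed in the tech stack. Only the 1000 trees, the unlimited leaves, and the use of SMOTE-ENN on training data come from the text.

```python
from sklearn.ensemble import RandomForestClassifier


def build_forest(n_estimators=1000, random_state=0):
    """Random forest with 1000 trees; max_leaf_nodes=None means unlimited leaves."""
    return RandomForestClassifier(
        n_estimators=n_estimators,
        max_leaf_nodes=None,
        random_state=random_state,
        n_jobs=-1,
    )


def resample_smote_enn(X, y, random_state=0):
    """Rebalance training data with SMOTE-ENN (requires imbalanced-learn)."""
    # Imported lazily because imbalanced-learn is an optional third-party package.
    from imblearn.combine import SMOTEENN

    return SMOTEENN(random_state=random_state).fit_resample(X, y)


def train(X_train, y_train, n_estimators=1000):
    """Resample the training split only, then fit the forest on the result."""
    X_res, y_res = resample_smote_enn(X_train, y_train)
    return build_forest(n_estimators).fit(X_res, y_res)
```

Resampling is applied to the training split only, so that the test split keeps the original 10% / 90% class balance and the reported metrics reflect realistic conditions.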

The model was optimised for recall: the share of actual attrition cases that the model correctly flags. Precision was deliberately traded off, so the model catches most true attrition events at the cost of additional false alarms.

Model Performance

| Metric | 30-days | 60-days | 90-days |
|---|---|---|---|
| Accuracy | 0.970 | 0.938 | 0.900 |
| Precision | 0.683 | 0.657 | 0.618 |
| Recall | 0.915 | 0.870 | 0.831 |
| F1-score | 0.783 | 0.749 | 0.708 |
| AUC | 0.944 | 0.908 | 0.872 |
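
As a quick sanity check on how these metrics relate, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall values; the helper name is hypothetical, and small differences in the last digit are expected because the inputs are already rounded to three decimals.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per horizon, copied from the table above.
reported = {
    "30-days": (0.683, 0.915, 0.783),
    "60-days": (0.657, 0.870, 0.749),
    "90-days": (0.618, 0.831, 0.708),
}

for horizon, (precision, recall, f1) in reported.items():
    recomputed = f1_from_precision_recall(precision, recall)
    print(f"{horizon}: recomputed F1 = {recomputed:.3f}, reported = {f1:.3f}")
```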

Confusion Matrices

30-days attrition

[Figure: confusion matrix, 30 days]

60-days attrition

[Figure: confusion matrix, 60 days]

90-days attrition

[Figure: confusion matrix, 90 days]

Tech Details

App Architecture

[Figure: app architecture]

Tech Stack

  • 🐍 Python 3.10
  • 🧪 Flask 2.2.2
  • 🤖 scikit-learn 1.1.2
  • 🔣 NumPy 1.23

Versions

  • v1.0.0 (01 Dec 2012)