April 2024

ML-based machinery failure prediction

A predictive maintenance framework contrasting feed-forward and sequential ML models to forecast industrial machinery failures, achieving >0.99 AUROC scores with GRU models.


Repo: https://github.com/alex-w-99/ML-predictive-machinery-maintenance

The problem

In industrial settings, unexpected machinery failures can lead to costly downtime, safety hazards, and substantial economic losses. The challenge is predicting when equipment will fail before it actually does, giving maintenance teams time to intervene proactively rather than reactively.

The approach

This project explores how different machine learning architectures handle the temporal complexity of machinery sensor data. Using the Microsoft Azure Predictive Maintenance dataset, which includes a year of hourly telemetry from 100 machines tracking voltage, rotation, pressure, and vibration, I built and compared several predictive models:

  • Logistic regression: A baseline to establish performance expectations
  • Multi-layer perceptrons (MLPs): Feed-forward networks that process summarized time windows
  • Recurrent neural networks (RNNs, LSTMs, GRUs): Sequential models designed to capture temporal patterns
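To make "sequential" concrete, here is a minimal NumPy sketch of how a GRU reads a window one hour at a time and carries a hidden state forward. It is an illustration, not the repo's implementation: the helper names (`init_gru_params`, `gru_step`, `encode_window`), the hidden size of 8, and the random weights are all assumptions, and the update rule follows one common convention for how the update gate mixes old and new state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru_params(n_inputs, n_hidden, rng):
    """Small random weights for the update (z), reset (r), and candidate (h) blocks."""
    def block():
        return (rng.normal(scale=0.1, size=(n_hidden, n_inputs)),   # input weights
                rng.normal(scale=0.1, size=(n_hidden, n_hidden)),   # recurrent weights
                np.zeros(n_hidden))                                 # bias
    return {"z": block(), "r": block(), "h": block()}

def gru_step(x, h_prev, params):
    """One GRU time step: gates decide how much of the old state to keep."""
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Wh, Uh, bh = params["h"]
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)    # candidate state
    return (1.0 - z) * h_prev + z * h_cand               # blend old and new state

def encode_window(window, params, n_hidden):
    """Run the GRU over a (time, sensors) window and return the final hidden state."""
    h = np.zeros(n_hidden)
    for x in window:
        h = gru_step(x, h, params)
    return h

# Hypothetical example: one 24-hour window of 4 sensor channels.
rng = np.random.default_rng(0)
params = init_gru_params(n_inputs=4, n_hidden=8, rng=rng)
window = rng.normal(size=(24, 4))
h_final = encode_window(window, params, n_hidden=8)
```

In a trained model, the final hidden state would feed a small output layer that produces the failure probability. A feed-forward model, by contrast, sees the whole window at once as a fixed-length feature vector.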

The key insight was treating the problem as a windowed forecasting task: given the previous 24 hours of sensor readings, can we predict whether a machine will fail within a fixed horizon? I evaluated two horizons, 6 hours and 24 hours.
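The windowing idea can be sketched in a few lines of NumPy. This is a simplified illustration rather than the repo's code: the function name `make_windows`, the per-hour failure flag array, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def make_windows(telemetry, failure_flags, lookback=24, horizon=6):
    """Build (window, label) pairs for a single machine.

    telemetry:     array of shape (T, n_sensors), one row per hour.
    failure_flags: array of shape (T,), 1 where a failure was recorded that hour.
    A window's label is 1 if any failure occurs in the `horizon` hours
    immediately after the window ends.
    """
    telemetry = np.asarray(telemetry, dtype=float)
    failure_flags = np.asarray(failure_flags, dtype=int)
    X, y = [], []
    # The last usable window must leave `horizon` hours of future to label it.
    for end in range(lookback, len(telemetry) - horizon + 1):
        X.append(telemetry[end - lookback:end])               # past 24 hours
        y.append(int(failure_flags[end:end + horizon].any())) # next 6 hours
    return np.array(X), np.array(y, dtype=int)

# Hypothetical data: 40 hours, 4 sensor channels, one failure at hour 30.
rng = np.random.default_rng(0)
telemetry = rng.normal(size=(40, 4))
flags = np.zeros(40, dtype=int)
flags[30] = 1

X, y = make_windows(telemetry, flags, lookback=24, horizon=6)
```

Changing `horizon` from 6 to 24 gives the second task. Note that neighboring windows overlap heavily, so a train/validation/test split should keep them apart (for example by time or by machine) to avoid leakage.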

The results

The standout performer was the GRU (gated recurrent unit) model, which achieved validation and test AUROC scores exceeding 0.99, i.e. near-perfect classification. With the decision threshold set so that the true positive rate was 80%, the model maintained low false positive rates; see the Model Performance Snapshots section of the repo's README.md for the exact figures.
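For readers unfamiliar with these metrics, the sketch below shows how AUROC and the false positive rate at a fixed true positive rate can be computed from predicted scores. The function names and the toy labels and scores are made up for illustration; they are not results from this project.

```python
import numpy as np

def auroc(y_true, scores):
    """Probability that a random positive scores higher than a random negative."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every positive with every negative; ties count as half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def fpr_at_tpr(y_true, scores, target_tpr=0.80):
    """Pick the highest threshold that reaches the target TPR; report the FPR there."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos_scores = np.sort(scores[y_true])[::-1]  # positives, highest score first
    # Number of positives that must be caught (epsilon guards float round-off).
    k = max(1, int(np.ceil(target_tpr * len(pos_scores) - 1e-9)))
    threshold = pos_scores[k - 1]
    predicted_pos = scores >= threshold
    tpr = predicted_pos[y_true].mean()
    fpr = predicted_pos[~y_true].mean()
    return threshold, tpr, fpr

# Toy example: 5 failing windows followed by 5 healthy ones.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.65, 0.3, 0.1, 0.05, 0.01]

area = auroc(y_true, scores)
threshold, tpr, fpr = fpr_at_tpr(y_true, scores, target_tpr=0.80)
```

The pairwise AUROC here is quadratic in the number of samples and is meant only to show the definition; on real data a library routine such as scikit-learn's `roc_auc_score` is the practical choice.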

Interestingly, the simpler MLP architecture outperformed both RNNs and LSTMs, highlighting an important lesson: more complex models aren't always better. Sometimes thoughtful feature engineering with a simpler architecture beats sophisticated sequential processing, especially when recurrent models have to contend with vanishing gradients over long sequences.
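As a rough illustration of what "summarized time windows" can look like as MLP input, the sketch below collapses each window into per-sensor statistics. The choice of mean, standard deviation, minimum, and maximum is an assumption for the example; the repo may use different features.

```python
import numpy as np

def summarize_windows(X):
    """Collapse the time axis of each window into per-sensor statistics.

    X: array of shape (n_windows, lookback, n_sensors).
    Returns an array of shape (n_windows, 4 * n_sensors) laid out as
    [means | standard deviations | minima | maxima].
    """
    X = np.asarray(X, dtype=float)
    feats = [X.mean(axis=1), X.std(axis=1), X.min(axis=1), X.max(axis=1)]
    return np.concatenate(feats, axis=1)

# Tiny hypothetical input: 2 windows, 3 time steps, 2 sensors.
X_demo = np.arange(12).reshape(2, 3, 2)
features = summarize_windows(X_demo)
```

Because the time axis is gone, the MLP never has to propagate gradients through 24 sequential steps, which is one plausible reason a simple model can hold its own against plain recurrent ones.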

Why it matters

This project demonstrates how machine learning can transform maintenance from a reactive scramble into a proactive strategy. By accurately predicting failures hours in advance, facilities can:

  • Schedule maintenance during planned downtime
  • Reduce safety incidents
  • Extend equipment lifespan
  • Minimize operational disruptions

The techniques explored here apply far beyond industrial machinery. Any domain with time-series sensor data and failure patterns could benefit from similar approaches.