H&E Cell Classification: A Domain-Informed Machine Learning Approach

Project Summary

This project develops a machine learning solution to classify tumour and immune cells in H&E-stained pathology images, with the goal of supporting faster and more consistent clinical decision-making.

I evaluated five models of increasing complexity, from a simple pixel-based baseline to deep learning approaches. The results show that raw pixels and texture-only features are insufficient on their own. Domain-informed colour features provide the strongest signal because they directly reflect H&E staining biology: haematoxylin stains nuclei blue-purple, while eosin stains cytoplasm and extracellular matrix pink.

The final recommended model, Colour Histogram + SVM, achieved 95% accuracy and 96% tumour sensitivity, while remaining lightweight, interpretable, and deployable without specialised hardware.
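To show roughly what a colour-histogram + SVM pipeline involves, here is a minimal sketch assuming RGB uint8 image patches. The helper names (`colour_histogram`, `train_linear_svm`, `synthetic_patches`), the 8-bins-per-channel setting and the synthetic two-class data are hypothetical illustrations, not the project's code or data. To keep the sketch NumPy-only, it swaps in a linear SVM trained by subgradient descent on the L2-regularised hinge loss. The kernel and regularisation the project actually used are not stated here.

```python
import numpy as np

# Hypothetical helper: concatenated, normalised per-channel colour histogram.
# Assumes an RGB uint8 patch of shape (H, W, 3); 8 bins per channel is an
# illustrative choice, not the project's setting.
def colour_histogram(patch: np.ndarray, bins: int = 8) -> np.ndarray:
    feats = []
    for c in range(3):
        counts, _ = np.histogram(patch[..., c], bins=bins, range=(0, 256))
        feats.append(counts / counts.sum())  # each channel's bins sum to 1
    return np.concatenate(feats)


# Hypothetical helper: primal (soft-margin) linear SVM minimising
#   (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
# by full-batch subgradient descent. Labels y must be in {-1, +1}.
def train_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=200):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)


# Synthetic stand-in data (NOT real H&E patches): two classes of 16x16
# patches with different mean RGB colours plus Gaussian noise.
rng = np.random.default_rng(0)

def synthetic_patches(mean_rgb, n, size=16):
    noise = rng.normal(0, 20, size=(n, size, size, 3))
    return np.clip(np.array(mean_rgb) + noise, 0, 255).astype(np.uint8)

patches = np.concatenate([
    synthetic_patches((120, 60, 160), 50),   # class +1: purple-dominant
    synthetic_patches((220, 130, 180), 50),  # class -1: pink-dominant
])
X = np.array([colour_histogram(p) for p in patches])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = train_linear_svm(X, y)
acc = (predict(X, w, b) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

In a real pipeline, the hand-rolled trainer would more likely be replaced by scikit-learn's `sklearn.svm.SVC` (possibly with a non-linear kernel). Performance should be measured on held-out patches rather than on training accuracy as printed here.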

This project illustrates a key insight: when strong domain knowledge is available, simple, interpretable models can outperform complex deep learning systems, especially on small medical datasets.