Anomaly Detection Security


Consultant | Civil Government Client

Summary

In the high-stakes world of cybersecurity, insider threats—whether intentional or accidental—pose unique challenges. Unlike external attacks, they exploit legitimate access, often blending into routine activity. My work centered on building a novel anomaly detection system to safeguard a data warehouse housing sensitive information like SSNs, financial records, and health data, where even a single breach could be catastrophic.

Traditional approaches relied on rigid event-based models, but the dynamic, fluid nature of user behavior demanded a fresh strategy. We developed a model-based solution using statistical anomaly detection and unsupervised learning, training the system to define “normal” behavior and flag deviations. This required engineering robust features from system logs, balancing sensitivity with precision, and refining thresholds through iterative feedback from security teams.
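
A minimal sketch of the "learn normal, flag deviations" idea described above. It is an illustration under stated assumptions, not the project's actual model. The log fields (user, day, rows_read), the sample values, and the function names are hypothetical. The scoring uses a modified z-score (median and MAD) with the conventional 0.6745 scaling and 3.5 cutoff. The cutoff stands in for the kind of threshold that would be tuned with feedback from security teams. The project itself used pandas and scikit-learn; this sketch sticks to the standard library.

```python
from collections import defaultdict
from statistics import median

# Hypothetical log records: (user, day, rows_read). All values are made up.
LOGS = [
    ("alice", 1, 120), ("alice", 2, 135), ("alice", 3, 110),
    ("alice", 4, 128), ("alice", 5, 5000),
    ("bob", 1, 900), ("bob", 2, 950), ("bob", 3, 870),
    ("bob", 4, 910), ("bob", 5, 940),
]


def robust_z(value, history):
    """Modified z-score: distance from the median in units of scaled MAD."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        # No spread in the baseline: any change at all is maximally unusual.
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad


def flag_anomalies(logs, threshold=3.5):
    """Score each user's latest day against that user's own earlier days."""
    per_user = defaultdict(list)
    for user, _day, rows in sorted(logs, key=lambda r: (r[0], r[1])):
        per_user[user].append(rows)

    flagged = {}
    for user, series in per_user.items():
        if len(series) < 3:
            continue  # too little history to define "normal"
        *history, latest = series
        score = robust_z(latest, history)
        if abs(score) > threshold:
            flagged[user] = score
    return flagged


FLAGGED = flag_anomalies(LOGS)
print(FLAGGED)
```

Each user is compared with their own baseline rather than a global one, so a heavy but consistent user is not flagged while a sudden jump for a light user is.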

However, the true challenge lay beyond the algorithm: earning trust. Security teams, reliant on intuition and context, needed clear, interpretable insights. We prioritized transparency, avoiding “black box” models and enabling users to understand how risk scores were calculated. Through training, documentation, and a hybrid approach combining automated analysis with human judgment, we bridged the gap between technical complexity and operational trust.
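
One way to keep a risk score interpretable is to make it a sum of per-feature terms, so an analyst can see which behavior drove it. The sketch below is an assumption about how such a breakdown could look, not the scoring formula used in the project. The feature names, baseline values, and the explain_risk helper are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical per-user baseline: recent daily values for each feature.
BASELINE = {
    "rows_read": [120, 135, 110, 128, 140],
    "off_hours_queries": [0, 1, 0, 0, 1],
    "distinct_tables": [3, 4, 3, 5, 4],
}
# Hypothetical observation for the day being scored.
TODAY = {"rows_read": 131, "off_hours_queries": 6, "distinct_tables": 4}


def explain_risk(baseline, observed):
    """Return (total score, features ranked by their share of the score).

    The score is the sum of squared z-scores, so each feature's share
    is simply its own squared z-score divided by the total.
    """
    z_squared = {}
    for feature, history in baseline.items():
        spread = pstdev(history) or 1.0  # fall back to 1.0 if there is no spread
        z = (observed[feature] - mean(history)) / spread
        z_squared[feature] = z * z

    total = sum(z_squared.values())
    shares = {f: (v / total if total else 0.0) for f, v in z_squared.items()}
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    return total, ranked


TOTAL, RANKED = explain_risk(BASELINE, TODAY)
for feature, share in RANKED:
    print(f"{feature}: {share:.1%} of risk score")
```

Because the total decomposes exactly into the listed shares, the ranking shown to a security analyst is the calculation itself rather than an after-the-fact approximation of it.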

The project underscored the importance of collaboration, adaptability, and ethical design. While the model showed promise, its success hinged on aligning technology with human expertise—a lesson that will shape future work in AI-driven security. The result? A system that doesn’t just detect threats but empowers teams to act with confidence, blending innovation with practicality.

Impact

Developed an unsupervised ML model to detect subtle insider threats in a data warehouse holding millions of sensitive records, where rigid event-based tools fell short.
Prioritized model transparency through feature importance rankings and hands-on training for security teams, so non-technical staff could trust and contextualize risk scores instead of falling back on simplistic heuristics.
Combined ML with event-driven rules in a hybrid framework: rules raise immediate alerts for urgent, well-understood threats, while the model surfaces complex patterns for human review.
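
The hybrid framework in the last point can be sketched as a small triage function. This is a hedged illustration only. The Event fields, the two rules, and the 0.8 review threshold are hypothetical, and the anomaly score is assumed to come from an upstream model scaled to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user: str
    action: str
    rows: int
    anomaly_score: float  # assumed output of an upstream model, in [0, 1]


# Hypothetical event-driven rules for urgent, well-understood threats.
RULES = {
    "bulk_export": lambda e: e.action == "export" and e.rows > 100_000,
    "audit_log_disabled": lambda e: e.action == "disable_audit",
}


def triage(event, review_threshold=0.8):
    """Rules alert immediately; otherwise the model score decides on review."""
    fired = [name for name, rule in RULES.items() if rule(event)]
    if fired:
        return "alert", fired
    if event.anomaly_score >= review_threshold:
        return "review", ["anomaly_score"]
    return "ok", []


print(triage(Event("alice", "export", 500_000, 0.10)))
print(triage(Event("bob", "select", 200, 0.93)))
print(triage(Event("carol", "select", 50, 0.05)))
```

Checking rules first means a known-bad action is never hidden behind a low model score, while everything else is routed to a human only when the model finds it unusual.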

Tech Stack

Python (pandas, NumPy, scikit-learn, CVXOPT)
SQL (PostgreSQL)
Tableau

Read the full story here on my Substack.