Security vendors have spent the last decade embedding machine learning into every layer of the defensive stack. EDR engines classify process behaviour in real time. Email gateways run neural networks against message content and metadata. WAFs apply learned models to HTTP traffic. UEBA platforms build probabilistic baselines around every user and workstation on the network. The implicit assumption is that ML makes these controls harder to evade than their signature-based predecessors. Our red team engagements consistently demonstrate that this assumption is wrong — and that adversarial ML techniques give attackers a systematic, reproducible path through controls that defenders believe are robust.
How Machine Learning Has Reshaped Defensive Security
Understanding where ML fits in the defensive stack is essential before examining how to defeat it. The technology is deployed across four primary control categories, each with distinct model architectures and attack surfaces.
EDR ML engines operate at the endpoint and classify executable behaviour using a combination of static analysis (PE header features, import table entropy, section characteristics) and dynamic analysis (process creation chains, memory allocation patterns, API call sequences). Products from CrowdStrike, SentinelOne, Microsoft Defender for Endpoint, and Carbon Black all use proprietary gradient-boosted or neural network models trained on billions of samples. These models score executables and running processes against a maliciousness probability threshold, triggering alerts or kills when the score crosses a vendor-defined value.
Email security classifiers combine natural language processing on message content with graph-based analysis of sender reputation, domain age, DKIM/DMARC posture, and behavioural patterns such as first-contact frequency and reply-chain anomalies. Microsoft Defender for Office 365 and Proofpoint both publish research indicating their models analyse hundreds of features per message. The attack surface spans every feature the model consumes.
WAF ML rules augment or replace traditional OWASP-based signatures with models trained to detect anomalous request patterns. AWS WAF, Cloudflare, and Imperva all incorporate ML scoring layers that assign risk scores to HTTP requests based on payload structure, request frequency, header anomalies, and deviation from learned application-specific baselines.
UEBA anomaly detection builds statistical models of legitimate user and entity behaviour over time — typically 30- to 90-day training windows — and generates alerts when observed behaviour deviates from the learned baseline beyond a configurable threshold. Splunk UBA, Microsoft Sentinel, and Varonis all fall into this category. The attack surface here is not the model's classification of a single artefact but its accumulated statistical picture of normality.
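A minimal sketch of that mechanism, assuming a per-user Gaussian baseline over a single daily metric (the training values, metric, and threshold here are invented for illustration; production UEBA models use far richer peer-group and sequence features):
# Sketch: per-user statistical baseline with a z-score alert threshold.
# Training values, metric, and threshold are illustrative assumptions only.
import statistics

class UserBaseline:
    def __init__(self, training_values, z_threshold=3.0):
        self.mean = statistics.mean(training_values)
        self.stdev = statistics.stdev(training_values)
        self.z_threshold = z_threshold

    def is_anomalous(self, observed):
        z = (observed - self.mean) / self.stdev
        return z > self.z_threshold, round(z, 2)

# Hypothetical 60 days of files-accessed-per-day for one user
history = [38, 42, 35, 40, 44, 39, 41, 37, 43, 36] * 6
baseline = UserBaseline(history)
print(baseline.is_anomalous(500))  # bulk access in a single day: flagged
print(baseline.is_anomalous(45))   # within the learned envelope: not flagged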
Adversarial Example Generation Against Malware Classifiers
The academic adversarial ML literature focuses primarily on image classifiers, but the underlying principle — that small, carefully crafted perturbations to input features can cause a model to misclassify — applies directly to malware classifiers. The challenge in the binary domain is that perturbations must preserve functionality. You cannot arbitrarily modify bytes in an executable without breaking it. This constraint narrows the feature space, but does not eliminate the attack.
Feature Space Mapping
We begin every engagement targeting an ML-based EDR by building a model of which features the target classifier weights most heavily. In black-box scenarios where we have no access to the vendor model, we train a shadow model on the same class of samples as a local stand-in, and complement it by submitting variants of a known-malicious sample with systematically modified features and observing whether the EDR triggers. This empirical approach lets us map the approximate decision boundary without white-box access.
Common high-weight static features across major EDR vendors include PE section entropy (particularly for packed payloads), the ratio of imported DLLs to total imports, the presence of specific import combinations associated with process injection (VirtualAllocEx, WriteProcessMemory, CreateRemoteThread), debug directory characteristics, and version info resource population. Executables missing version info, with high-entropy .text sections and minimal imports, score disproportionately high across most models.
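As a concrete view of this feature space, the sketch below pulls a handful of those static features out of a binary with the open-source pefile library; the feature selection and naming are ours for illustration, not a reconstruction of any vendor's input schema.
# Sketch: extracting a few static features of the kind EDR classifiers weight.
# Feature selection here is illustrative, not any vendor's actual input schema.
import pefile

def static_features(path):
    pe = pefile.PE(path)
    features = {}
    # Per-section Shannon entropy: packed or encrypted payloads push this towards 8.0
    features["section_entropy"] = {
        section.Name.rstrip(b"\x00").decode(errors="replace"): round(section.get_entropy(), 2)
        for section in pe.sections
    }
    # Import profile: sparse import tables are a common high-weight signal
    imported = []
    if hasattr(pe, "DIRECTORY_ENTRY_IMPORT"):
        for entry in pe.DIRECTORY_ENTRY_IMPORT:
            imported.extend(imp.name.decode() for imp in entry.imports if imp.name)
    features["import_count"] = len(imported)
    features["injection_imports"] = sorted(
        set(imported) & {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"}
    )
    # Version info presence: executables without it score higher in most models
    features["has_version_info"] = hasattr(pe, "VS_VERSIONINFO")
    return features

print(static_features("loader.exe"))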
Feature Manipulation Techniques
The most reliable static evasion technique we use operationally is benign feature injection. Rather than modifying malicious functionality, we add features associated with legitimate software. Populating the PE version info resource with credible values, adding a rich import table that mirrors a common benign application category, padding the overlay with data that reduces overall entropy, and embedding a valid code signature from a leaked certificate all reduce model confidence scores significantly.
A representative workflow for a shellcode loader looks like this:
# Stage 1: Establish baseline detection score via sandbox telemetry
# Stage 2: Add benign overlay content to reduce entropy score
python pe_patcher.py --add-overlay benign_data.bin --target loader.exe
# Stage 3: Populate version info resource
python pe_patcher.py --set-versioninfo \
--product "Microsoft Edge Update" \
--company "Microsoft Corporation" \
--version "1.3.185.31" \
--target loader.exe
# Stage 4: Mirror import table of a known-clean executable
python pe_patcher.py --mirror-imports explorer.exe --target loader.exe
# Stage 5: Validate functionality preserved
wine loader.exe --selftest
For dynamic feature evasion, the target is the behavioural classifier that monitors API call sequences and process relationships at runtime. We address this by fragmenting suspicious API call patterns across multiple execution stages separated by timing delays, interleaving benign API calls between sensitive operations, and using indirect syscalls or direct NT API calls to bypass userland API monitoring hooks entirely. API call sequence models are typically trained on hook-captured data; removing the hooks from the call path degrades the model's visibility to the feature vectors it was trained on.
Evasion of ML-Based Phishing Detection
Email security ML models present a different attack surface because the input domain is text and metadata rather than binary structure. The features consumed by phishing classifiers fall into three broad categories: content features (NLP embeddings, keyword presence, urgency scoring, URL reputation), sender features (domain age, SPF/DKIM/DMARC alignment, sending history, look-alike domain distance), and behavioural features (first contact with recipient, reply-chain absence, sending volume patterns).
Content Feature Evasion
High-scoring phishing content features include urgency language ("your account will be suspended"), financial terminology in combination with action requests, and URLs with mismatched display text and destination domains. NLP models score these features against a learned distribution of phishing versus legitimate message content.
We evade content classifiers by semantic substitution — replacing high-weight phishing vocabulary with semantically equivalent but lower-scoring alternatives drawn from the model's learned distribution of legitimate corporate communication. Instead of "Your account has been compromised — click here immediately," we use language patterns characteristic of internal IT service notifications: "The IT team has initiated a mandatory security review for your account. Please complete your profile verification through the employee self-service portal at your earliest convenience."
A secondary technique is context anchoring — embedding the phishing lure within a broader legitimate-looking email thread. Reply-chain analysis is a significant feature in enterprise email security. We craft initial innocuous contact that establishes a thread history, then deliver the payload in a reply to that thread. The model assigns a substantially lower maliciousness prior to multi-turn conversations with existing reply chains than to cold first-contact messages.
Sender Reputation Evasion
Domain age and reputation are among the strongest features in email classifiers. Newly registered domains used for phishing score very high. We address this through aged domain acquisition — purchasing domains registered several years prior that have clean reputations, or identifying recently expired domains with established sending history. We also configure full SPF, DKIM, and DMARC on all infrastructure, since authentication failures are high-weight signals that raise classifier scores regardless of message content.
WAF ML Bypass Techniques
ML-augmented WAFs apply model scoring to HTTP requests alongside, or in place of, traditional signature matching. The model's input features typically include the normalised request URI, query parameter structure, POST body characteristics, request header anomaly scoring, and deviation from learned application-specific traffic baselines. High-confidence attack patterns such as classic SQL injection and XSS payloads are trivially identified; the model's value is in detecting novel or obfuscated variants that signatures miss.
Encoding and Normalisation Abuse
WAF ML models are trained on normalised representations of request data. If the normalisation pipeline feeding the model differs from the normalisation the backend application performs, a gap opens up: payload representations exist that the model never scores in the form the application ultimately evaluates. We systematically probe these normalisation differentials by submitting the same payload through different encoding schemes — double URL encoding, UTF-8 overlong sequences, mixed case, null byte insertion, comment injection in SQL contexts — and observing which representations reach the application backend.
# Classic WAF-blocked payload
' OR 1=1--
# Double-URL encoded — tests normalisation depth
%2527%2520OR%25201%253D1--
# UTF-8 overlong encoding of apostrophe
%c0%a7 OR 1=1--
# SQL comment fragmentation
'/**/OR/**/1=1--
# Case mixing combined with comment injection
'/**/oR/**/1/**/=/**/1--
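The decode-depth gap behind the double-encoded variant is easy to demonstrate; urllib here simply stands in for whatever decoding each layer performs, so a model that decodes once scores an intermediate string while a backend that decodes twice receives the original payload.
# Illustration of normalisation depth: one decode pass yields an intermediate
# representation, two passes recover the original string.
from urllib.parse import unquote

payload = "%2527%2520OR%25201%253D1--"
print(unquote(payload))            # %27%20OR%201%3D1--
print(unquote(unquote(payload)))   # ' OR 1=1--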
Baseline Deviation Minimisation
ML WAF models that learn application-specific baselines are particularly vulnerable to low-and-slow attack patterns that keep per-request anomaly scores below the alert threshold. Rather than a single obviously malicious request, we distribute the attack across many requests that individually appear to be normal application traffic. For SQL injection, this means using boolean-based blind techniques with one logical condition per request, paced to match normal user interaction timing, with payload characters that do not individually deviate far from the learned request structure baseline.
We also exploit model confidence gaps at class boundaries. ML classifiers assign continuous probability scores; there is always a region near the decision boundary where the model's confidence is low and small perturbations flip the classification. We identify this region through iterative probing and craft payloads that consistently land in it.
HTTP/2 and Protocol-Level Evasion
Many WAF ML models were trained predominantly on HTTP/1.1 traffic. HTTP/2's binary framing, header compression (HPACK), and request multiplexing introduce representation differences that some models handle poorly. We test HTTP/2 request smuggling variants and header injection via HPACK encoding as part of every WAF assessment where the target supports HTTP/2.
UEBA Anomaly Detection Evasion Through Behavioural Mimicry
UEBA systems are qualitatively different from the other controls discussed because they operate on aggregated behaviour over time rather than individual artefact analysis. Evading a UEBA system is not about crafting a single undetected payload — it is about operating in a manner that the system's learned model of normality classifies as ordinary user behaviour. This requires understanding what the model considers normal for the specific user or entity you are operating as.
Baseline Reconnaissance
Before taking any post-exploitation action, we spend time mapping the target user's behavioural baseline. Key dimensions include: working hours and access patterns, systems and shares routinely accessed, typical data volumes transferred, applications launched, authentication patterns (MFA cadence, source IPs), and command-line tool usage. This reconnaissance is conducted passively through access to existing logs if available, or through initial low-noise access to observe artefacts like recently accessed files and shell history.
Most UEBA systems flag on a small set of high-signal behaviours: access to systems never previously accessed, large-volume data staging or exfiltration, access at unusual hours, lateral movement to a high number of distinct hosts in a short window, and privilege escalation attempts. Understanding which of these the target's baseline would flag allows you to operate entirely within the statistical envelope of normality.
Temporal Pacing
UEBA models are sensitive to rate and timing. A user who accesses 500 files in one hour generates a very different anomaly score than a user who accesses the same 500 files over five working days. We pace all post-exploitation activity to match the target user's historical access rate. For data collection operations, this means accepting a longer dwell time in exchange for staying below anomaly thresholds — a trade-off that is always worthwhile in engagements where stealth is a primary objective.
Access Pattern Normalisation
UEBA systems model which resources a user accesses and flag access to resources outside the established peer group. When we need to access systems or data outside the compromised user's typical scope, we do so by pivoting through a second identity whose scope includes the target resource, rather than introducing new access patterns for the original identity. Identity hopping — using one compromised credential to access a resource, establishing a foothold there, and continuing as a new identity appropriate to that context — is a core UEBA evasion technique.
Exfiltration Channel Selection
UEBA data exfiltration detectors model outbound data volume against per-user baselines. We avoid triggering volume anomalies by using slow exfiltration over living-off-the-land channels — sending data through services the user legitimately uses (email, OneDrive, SharePoint, GitHub) in volumes consistent with their normal usage. We also use data compression and selective exfiltration to minimise the raw byte volume that must transit the network, targeting specifically the data required to demonstrate objective completion rather than bulk collection.
Methodology for Testing ML Security Controls
Organisations that want to validate the resilience of their ML security controls should structure testing around four phases. We follow this methodology on every engagement that includes ML control assessment.
Phase 1: Control Inventory and Architecture Mapping
Identify every ML-based control in the environment, its vendor, version, and configuration. Determine which model types are in use (signature augmentation vs. pure ML), what training data vintage the models reflect, and whether any models have been fine-tuned on organisation-specific data. Also identify the integration points — where does the ML control sit in the traffic flow, what normalisation does it apply, and what does it pass to downstream controls?
Phase 2: Feature Enumeration
For each control, enumerate the features the model consumes. For commercial products, vendor research papers, patent filings, and conference presentations often reveal significant architectural detail. Build a feature map that distinguishes high-weight features (those that, if absent or manipulated, would significantly alter model output) from low-weight ones.
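We record the output of this phase as a structured feature map per control; the entry below is hypothetical, with feature names, sources, and weight buckets invented for illustration.
# Hypothetical feature-map entry for one control. Names, sources, and weight
# assignments are illustrative placeholders, not vendor-confirmed detail.
feature_map_entry = {
    "control": "edr_static_classifier",
    "evidence": ["vendor whitepaper", "patent filing", "conference presentation"],
    "high_weight": ["section_entropy", "import_count", "version_info_present",
                    "injection_api_combination"],
    "low_weight": ["file_size", "icon_resource_present"],
    "unknown_weight": ["debug_directory_characteristics", "overlay_size"],
}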
Phase 3: Adversarial Sample Generation
Generate adversarial samples that systematically probe the feature space. Use both gradient-based methods (where white-box access to the model is available) and query-based black-box methods (iterative feedback from the model's output to estimate the gradient). Validate that adversarial modifications preserve the functional properties of the original input — a malware sample that evades the classifier but no longer executes its payload is not a valid adversarial example for offensive purposes.
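A minimal sketch of the query-based approach is shown below, run against a surrogate classifier the tester has trained locally (anything exposing a scikit-learn-style predict_proba); the step size, iteration cap, and the "modifiable feature" mask standing in for functionality-preserving changes are all assumptions for illustration.
# Sketch: greedy black-box probing of a locally trained surrogate classifier.
# `model` is assumed to expose predict_proba; `modifiable` lists the indices of
# features that can change without breaking the sample's functionality.
def greedy_probe(model, x, modifiable, step=0.1, max_iters=50):
    x = list(x)
    for _ in range(max_iters):
        score = model.predict_proba([x])[0][1]
        if score < 0.5:
            return x, score                  # crossed the surrogate's decision boundary
        best_score, best_x = score, None
        for i in modifiable:
            for delta in (step, -step):
                candidate = list(x)
                candidate[i] += delta
                s = model.predict_proba([candidate])[0][1]
                if s < best_score:
                    best_score, best_x = s, candidate
        if best_x is None:                   # no single-feature change helps; stop
            break
        x = best_x
    return x, model.predict_proba([x])[0][1]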
Phase 4: Controlled Evasion Testing and Gap Analysis
Execute the adversarial samples against the live controls in a controlled test environment that mirrors production. Document the evasion rate, the specific feature manipulations that succeeded, and the residual detection capability after evasion. Produce a gap analysis that maps each successfully evaded technique to the compensating controls — or lack thereof — that would catch it at another layer. This is where the real value of adversarial ML testing emerges: not in proving that a single control can be bypassed, but in identifying the gaps in defence-in-depth that a real attacker could exploit.
Defensive Implications and Hardening Recommendations
The techniques described above are not theoretical. We use all of them in operational red team engagements. Defenders who understand the adversarial ML attack surface can take concrete steps to raise the cost of evasion.
- Ensemble and layer your ML controls. A single ML model is more susceptible to adversarial evasion than an ensemble of models trained on different feature sets. Combining static and dynamic analysis, and running multiple independent classifiers whose outputs are aggregated, forces attackers to simultaneously evade multiple decision boundaries.
- Do not rely on ML as the sole detection mechanism. ML controls should complement, not replace, signature-based and rule-based detection. Adversarially evaded malware that passes the ML engine should still generate alerts from behavioural rules, honeytokens, or network detections.
- Retrain models on recent data and adversarial examples. Models trained on data that is 12 to 24 months old are significantly more susceptible to evasion than continuously updated models. Ask vendors about their retraining cadence and whether adversarial training — incorporating known adversarial examples into the training set — is part of the process.
- Test your controls adversarially. Vendor detection rates on standard benchmarks do not measure adversarial robustness. Engage a red team to test your ML controls specifically against adversarial techniques, not just commodity malware. The evasion techniques in this post are known and operationalised; attackers are using them now.
- Implement UEBA with appropriate tuning and peer grouping. Out-of-the-box UEBA deployments with poorly tuned baselines generate high false positive rates that lead security teams to suppress alerts. Invest in proper onboarding, peer group configuration, and alert triage workflows before relying on UEBA as a compensating control.
- Monitor for model probing activity. Repeated, systematic queries to security controls from the same source with incrementally varying payloads form a recognisable pattern. Implement rate limiting and anomaly detection on your own security control query interfaces; a minimal detection sketch follows this list.
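A minimal sketch of that detection, assuming request payloads are logged per source and treating near-duplicate payloads as probe variants (the thresholds and similarity metric are illustrative, not recommendations):
# Sketch: flag sources that submit many near-identical payload variants.
# WINDOW_LIMIT and SIMILARITY_FLOOR are illustrative thresholds only.
from collections import defaultdict
from difflib import SequenceMatcher

WINDOW_LIMIT = 20
SIMILARITY_FLOOR = 0.85

recent_payloads = defaultdict(list)

def looks_like_probing(source_ip, payload):
    variants = sum(
        1 for previous in recent_payloads[source_ip]
        if SequenceMatcher(None, previous, payload).ratio() >= SIMILARITY_FLOOR
    )
    recent_payloads[source_ip].append(payload)
    return variants >= WINDOW_LIMIT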