Home SERVICES
All Services Red Team Operations Active Directory Cloud Security AI Red Teaming
ABOUT US
About Us Certifications FAQ
Process Industries Blog Request a Quote CONTACT
Request a Quote Get Help Now Ask a Question
Services / AI Red Teaming
New Service NEW

AI Red Teaming &
LLM Security Testing

Adversarial testing of AI and machine learning systems by a principal consultant who builds LLM security frameworks — not just runs generic scanners. We think like attackers who have studied your AI stack.

Offensive Security for AI Systems

AI is now embedded in production — customer-facing chatbots, internal copilots, autonomous agentic workflows, RAG-powered search, and LLM-augmented APIs. Each integration introduces a new class of attack surface that traditional penetration testing does not cover.

AI red teaming applies the same adversarial rigour as traditional offensive security — exploit-proven, manually driven, zero scanner dependency — to LLM applications, ML pipelines, and agentic systems. Every finding comes with a working proof-of-concept and a clear remediation path.

Arturs Stay has 15 years of offensive security experience and has been actively building and stress-testing LLM security testing frameworks since the technology reached enterprise adoption. This is not a compliance exercise — it is adversarial testing designed to find what your AI vendor's safety team missed.

Engagements are aligned to the OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF, and EU AI Act security requirements — giving you findings that map directly to recognised standards for audit, board reporting, and regulatory purposes.

10
Attack Categories Covered
4
Recognised Frameworks Aligned
15yr
Principal Offensive Security Experience
0
Scanner Dependencies — Manual Only
Principal Consultant
Every AI red team engagement is led personally by Arturs Stay — OSCP, OSEP, CREST-certified — with direct experience attacking production LLM deployments. Toronto-based, serving enterprises across Canada and globally.
Attack Categories

Ten distinct attack categories mapped to real threat actor techniques — covering the full offensive surface of modern LLM deployments, from inference endpoints to training pipelines.

Direct Prompt Injection
Attacker-controlled input that overrides the model's intended instructions. We craft payloads that force role abandonment, policy bypass, and unintended output — testing every user-facing input surface systematically.
LLM01Input ManipulationRole Override
Indirect Prompt Injection
Malicious instructions embedded in external data sources — web pages, documents, emails, database records — that the model ingests and acts on without the user's knowledge. Critical for agentic systems with broad tool access.
LLM01Data Source PoisoningAgent Hijack
System Prompt Extraction
Techniques to leak confidential system prompt contents — business logic, persona instructions, API keys embedded in context, and proprietary prompt engineering — bypassing "keep this confidential" instructions through multi-turn manipulation.
LLM07Data LeakagePrompt Leak
Jailbreaking & Safety Bypass
Structured attacks to circumvent RLHF-trained safety guardrails, content filters, and policy refusals. We test DAN variants, persona switching, encoding tricks, token smuggling, and context-window exhaustion — documenting every successful bypass with reproducible evidence.
LLM01Guardrail BypassPolicy Evasion
Model Extraction & Inversion
Systematic querying to reconstruct model behaviour, steal fine-tuned capabilities, or reconstruct training data through membership inference and model inversion attacks. Particularly relevant for organisations that have invested heavily in proprietary fine-tuned models.
LLM10IP TheftInference Attacks
Data Poisoning
Adversarial manipulation of training data, fine-tuning datasets, and RLHF feedback loops to embed backdoors or degrade model reliability. We assess your ML data pipeline's resistance to poisoning at ingestion, preprocessing, and training stages.
LLM03Training IntegrityBackdoor Implant
RAG Pipeline Manipulation
Attacks targeting retrieval-augmented generation systems — poisoning the vector database, exploiting embedding collisions, manipulating retrieval ranking, and injecting adversarial documents that the retriever surfaces to the LLM context window.
LLM02Vector StoreEmbedding Attacks
Tool-Call & Function Hijacking
Exploiting LLM function-calling and plugin interfaces to invoke unintended tool actions — exfiltrating data through API calls, triggering state-changing operations, and chaining tool outputs to escalate from LLM access to backend system compromise.
LLM07Plugin SecurityAPI Abuse
Agentic AI Workflow Abuse
Multi-step attack chains against autonomous agents with persistent memory, planning capabilities, and real-world tool access. We test for goal hijacking, memory poisoning, plan manipulation, and the cascading consequences when an agent operates unsupervised over extended task horizons.
LLM06Agent SecurityMulti-Step Chains
Supply Chain Attacks on ML Models
Adversarial evaluation of third-party model dependencies — Hugging Face models, pre-trained checkpoints, open-source adapters, and fine-tuning datasets. We test for serialisation exploits (pickle), embedded backdoors, and typosquatting on model registries.
LLM05Model IntegrityThird-Party Risk
What We Test

AI red teaming covers every component of your AI stack — from the model endpoint to the data pipeline — across all deployment architectures.

LLM Applications & Chatbots Customer-facing and internal chatbots built on GPT-4, Claude, Gemini, Llama, Mistral, and other foundation models. Every input surface, output filter, and system prompt boundary is tested.
Agentic AI Workflows Autonomous agents using LangChain, AutoGen, CrewAI, OpenAI Assistants API, and custom orchestration. Planning loops, memory systems, and tool-call chains are stress-tested for multi-step abuse.
RAG Systems & Knowledge Bases Retrieval-augmented generation pipelines including vector databases (Pinecone, Weaviate, Chroma, pgvector), embedding models, rerankers, and document preprocessing stages.
Embedding Models & Pipelines Adversarial inputs crafted to manipulate embedding space similarity, poisoning semantic search results and downstream LLM context with attacker-controlled content.
LLM API & Plugin Integrations Third-party plugin ecosystems, function-calling interfaces, custom GPT actions, and API gateways that expose LLM capabilities — tested for injection, privilege escalation, and data exfiltration.
Fine-Tuned & Proprietary Models Custom fine-tuned models hosted on-premise or via private cloud inference. Training data integrity, model extraction resistance, and backdoor detection are all assessed.
Cloud-Hosted AI Services Azure OpenAI, AWS Bedrock, Google Vertex AI, and Hugging Face Inference Endpoints — including IAM misconfigurations, tenant isolation, and model access control weaknesses.
ML Training Pipelines Data ingestion, preprocessing, and model training infrastructure — assessed for supply chain integrity, dataset poisoning vectors, and pipeline compromise paths that could corrupt production models.
Standards & Frameworks

Every engagement maps findings to recognised AI security frameworks — enabling direct communication with auditors, boards, and regulators without translation overhead.

OWASP
The primary framework for LLM application security, covering the ten most critical vulnerability classes — from prompt injection (LLM01) and insecure output handling (LLM02) to model theft (LLM10). All attack categories in our methodology map directly to the OWASP LLM Top 10, giving your development team clear remediation guidance tied to an internationally recognised standard.
MITRE
ATLAS (Adversarial Threat Landscape for AI Systems) is the AI-equivalent of MITRE ATT&CK — a knowledge base of real-world adversary tactics and techniques targeting machine learning systems. Findings are mapped to ATLAS tactics including Reconnaissance, ML Attack Staging, Model Access, and Exfiltration, enabling threat-intelligence-grade reporting your security operations team can act on.
NIST
The NIST AI RMF provides a structured approach to managing AI risk across the Govern, Map, Measure, and Manage functions. Our assessments produce findings aligned to the RMF's MEASURE function — providing quantified, evidence-based input to your organisation's AI risk management processes and supporting compliance with emerging AI governance requirements in regulated industries.
EU AI Act
EU AI Act Security Requirements
For organisations subject to the EU AI Act — particularly those deploying high-risk AI systems — our assessments address cybersecurity requirements under Article 15, including robustness against adversarial attacks, error resilience, and accuracy consistency. Findings are documented in a format suitable for conformity assessment and regulatory submission, covering the security obligations that came into force under the Act's phased implementation timeline.
Common Questions
Do we need to give you access to the model weights?
Not for most attack categories. The majority of AI red teaming is black-box or grey-box — we interact through the same interfaces an attacker would use. For supply chain, data poisoning, and model extraction assessments, access to training pipelines or inference infrastructure may be required, which we scope during the pre-engagement call.
How is this different from a standard penetration test?
Traditional penetration testing does not cover LLM-specific attack surfaces — prompt injection, jailbreaking, RAG manipulation, and agentic workflow abuse require specialised techniques and tooling. AI red teaming is a distinct discipline that applies offensive security methodology to the unique trust models, input surfaces, and failure modes of AI systems. We run both — and know where the gap is.
What does the deliverable look like?
You receive a technical report with every finding mapped to OWASP LLM Top 10 and MITRE ATLAS, a working proof-of-concept for each vulnerability, a risk-prioritised remediation roadmap, and an executive summary for board and audit committee presentation. For EU AI Act or NIST AI RMF engagements, findings are additionally structured to align with those framework's reporting requirements.
We use a third-party LLM API — can you still test us?
Yes. Most organisations use foundation models via API (OpenAI, Anthropic, Google, Azure OpenAI). The attack surface in these cases focuses on your application layer — how you pass context, handle outputs, constrain user input, and connect the LLM to backend systems. This is frequently where the most critical vulnerabilities are found, regardless of the underlying model provider.
Request an AI Security Assessment
Tell us about your AI stack — LLMs, agentic workflows, RAG pipelines, or custom ML models. We will scope an adversarial assessment that matches your actual risk exposure, not a generic checklist.
Start the Conversation →
Toronto-based. Serving enterprises across Canada and globally. Engagements are principal-led — you work directly with Arturs Stay.