Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Sizhe Chen*, Arman Zharmagambetov, David Wagner, Chuan Guo* (* for equal technical contributions)

🔥 Meta-SecAlign models are now licensed for commercial use under the Llama community licenses, despite this codebase being licensed for non-commercial use only.

Comparable to GPT-5-high in agentic (tool/web) utility and security, Meta-SecAlign-70B is the first fully open-source LLM with built-in prompt injection defense and commercial-grade performance, unlocking open research on secure agentic applications (downloaded 10K times in 3 months).

Environment Setup

Hardware requirements: Meta-SecAlign-8B requires 4×80 GB A100s for training and one 16 GB GPU for evaluation. Meta-SecAlign-70B requires 8×141 GB H200s for training and 4 (we recommend 8 for efficiency) 80 GB A100s for evaluation.
Install uv (a Python package management tool).
Install Meta-SecAlign package dependencies:

git clone --recurse-submodules https://github.com/facebookresearch/Meta_SecAlign.git
cd Meta_SecAlign
uv venv metasecalign --python 3.13
source metasecalign/bin/activate
uv pip install -r requirements.txt
uv pip install torchtune==0.6.0 --index-url https://download.pytorch.org/whl/cu126

Install Meta-SecAlign data dependencies (including those used for SEP utility evaluation if you have a GPU available):

python setup.py

Configure OpenAI keys (used for utility evaluation) in data/openai_configs.yaml. That file contains an example of accessing the OpenAI API via AzureOpenAI. A more detailed example is available here.
[Optional] Configure Gemini keys in data/gemini_configs.yaml if you want to evaluate Gemini models.

Demo

demo.py contains minimal code to use our two Meta-SecAlign models. Feel free to try new samples and prompt injections, or test the models on your codebase:

python demo.py

Evaluation

run_tests.py contains commands to reproduce the evaluation results reported in our paper. It sequentially invokes tests.py, test_lm_eval.py, test_agentdojo.py, and test_injecagent.py. Results will be logged to [model_path]/summary.tsv.

python run_tests.py -m [model_path] --lora_alpha [lora_alpha]

model_path is the path to the tested model. We support:
- Local models (vLLM inference)
  - meta-llama/Llama-3.1-8B-Instruct_SecAlign (Meta-SecAlign-8B downloaded by setup.py): the first fully open model with state-of-the-art prompt injection defense
  - meta-llama/Llama-3.3-70B-Instruct_SecAlign (Meta-SecAlign-70B downloaded by setup.py): the first fully open model with state-of-the-art prompt injection defense
  - meta-llama/Llama-3.1-8B-Instruct
  - meta-llama/Llama-3.3-70B-Instruct
  - Other Hugging Face open-weight models may also be natively supported.
- OpenAI GPT models
  - gpt-4o-mini: the first commercial model with instruction hierarchy prompt injection defense.
  - gpt-4o: the follow-up flagship model, also with prompt injection defense.
  - gpt-5: the latest and most secure commercial model in our evaluation; change reasoning levels by specifying --gpt5_reasoning_effort (default to high).
- Google Gemini models
  - gemini-2.0-flash: a Google commercial model with a claimed prompt injection defense
  - gemini-2.5-flash: a Google commercial model with a claimed prompt injection defense
  - gemini-2.0-pro: a flagship Google model (not claimed to include a prompt injection defense)
  - gemini-2.5-pro: a flagship Google model (not claimed to include a prompt injection defense)
  - gemini-3-pro-preview: the state-of-the-art Google model with strong prompt injection defense
[Optional] lora_alpha is a test-time hyper-parameter for Meta-SecAlign models. It defaults to 8, which uses the exact Meta-SecAlign models as trained. A lora_alpha value between 0 and 8 interpolates between the undefended model and our defended model to enable a flexible utility–security trade-off. Extrapolating lora_alpha beyond 8 is possible but untested.
We support the following prompt-injection benchmark evaluations for the community:
- 6 security benchmarks
  - instruction following: AlpacaFarm-Hacked, SEP, TaskTracker, CyberSecEval2
  - agentic tool-calling: InjecAgent, AgentDojo
- 8 utility benchmarks
  - general knowledge (from lm_eval): MMLU, MMLU-Pro, BBH, IFEval, GPQA Diamond
  - instruction following: AlpacaEval2, SEP (in SEP, we use AlpacaEval2 prompting to compare against reference responses from meta-llama/Meta-Llama-3-8B-Instruct)
  - agentic tool-calling: AgentDojo

Defensive Fine-Tuning (SecAlign++)

secalign_plus_plus.py provide commands to defensive-fine-tune meta-llama/Llama-3.1-8B-Instruct (default) or meta-llama/Llama-3.3-70B-Instruct (uncomment the specific line to fine-tune it) to a robust LoRA model using our training recipe, SecAlign++.

python secalign_plus_plus.py

Code Acknowledgements

Significantly improved from SecAlign, the majority of the Meta-SecAlign code is licensed under CC-BY-NC. Portions of the project are available under separate license terms: AgentDojo, TaskTracker, and lm-evaluation-harness are licensed under MIT. Code from other repositories includes AgentDojo (agentdojo), TaskTracker (setup.py), and lm_eval_harness (lm_eval_config). This software and/or data was deposited in the BAIR Open Research Commons repository in 2025.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
agentdojo @ d3640b5		agentdojo @ d3640b5
data		data
helpers		helpers
lm_eval_config		lm_eval_config
.gitignore		.gitignore
.gitmodules		.gitmodules
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
config.py		config.py
demo.py		demo.py
requirements.txt		requirements.txt
run_tests.py		run_tests.py
secalign_plus_plus.py		secalign_plus_plus.py
setup.py		setup.py
test.py		test.py
test_agentdojo.py		test_agentdojo.py
test_injecagent.py		test_injecagent.py
test_lm_eval.py		test_lm_eval.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Environment Setup

Demo

Evaluation

Defensive Fine-Tuning (SecAlign++)

Code Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

facebookresearch/Meta_SecAlign

Folders and files

Latest commit

History

Repository files navigation

Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks

Environment Setup

Demo

Evaluation

Defensive Fine-Tuning (SecAlign++)

Code Acknowledgements

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages