Read the IgDesign paper.
We recommend micromamba as the package manager.
$ micromamba env create -f environment.yml
$ micromamba activate igdesign
$ pip install -e .
This is a multi-step process so we've provided a script. We also assume you're using micromamba as the package manager. If not, please edit the script accordingly.
$ chmod +x install_amd_env.sh
$ ./install_amd_env.sh
$ micromamba activate igdesign
In order to download our pretrained model weights please use the provided shell script:
$ chmod +x download_ckpts.sh
$ ./download_ckpts.sh
This script will download the LMDesign and fine-tuned ProteinMPNN weights associated with the split which excluded the ACVR2B target.
To inverse fold sequences, run predict.py
with the environment activated to use the default config with PDB:1N8Z (Trastuzumab-HER2):
$ (igdesign) python predict.py
To use a different config specify the --config_name
parameter, for example using PDB:5NGV (Bimagrumab-ACVR2B)
$ (igdesign) python predict.py --config_name 5ngv.yaml
The output will be a CSV containing sequences and (if specified) cross-entropy losses for all of the specified CDRs. For example, the column "hcdr3"
corresponds to the generated HCDR3s and the column "ce_loss_independent_hcdr3"
corresponds to the cross-entropy scores.
Inference details are specified using YAML configs, examples of which can be found in ./configs
. Each config uses the following parameters:
structure_path
: Path to a PDB file with the desired structurelmdesign_checkpoint
: Path to IgDesign checkpointpmpnn_checkpoint
: Path to IgMPNN checkpointsave_path
: Path to save generated sequences and scoresregion_order
: List of CDRs to be designed and the order they will be designed inlmdesign_num_decoding_orders
: Number of decoding orders used during IgDesign inferencelmdesign_num_pmpnn_seqs
: Number of IgMPNN sequences used during IgDesign inferencelmdesign_num_lm_seqs
: Number of language model samples used during IgDesign inferencelmdesign_pmpnn_logit_temperature
: Sampling temperature for IgMPNNlmdesign_output_logit_temperature
: Sampling temperature for IgDesignindependent_loss
: If true, computes cross-entropy loss using IgDesigncondition_on_light_chain
: If true, conditions on light chain sequencecondition_on_antigen
: If true, conditions on antigen sequenceepitope_idxs_or_all
: Set to"all"
to use full antigen. Set to a list of indices to use a subset of the antigenantigen_chain_id
: Chain ID in PDB file for antigenheavy_chain_id
: Chain ID in PDB file for heavy chainlight_chain_id
: Chain ID in PDB file for light chainregions
: Each key of this should be a CDR inregion_order
mapping topositions
(list of indices corresponding to the CDR) andchain
(either"heavy"
or"light"
)
SPR data generated from validating IgDesign is included in "./data"
with representative sensorgrams in "./data/Sensorgrams"
. Each CSV corresponds to one antibody-antigen system. The columns for each dataset are:
Target
: The target screened againstReference Ab
: The reference antibody usedMethod
: Method for sequence generation. Either"Positive Control"
,"SAbDab"
, or"Inverse Folding"
HCDRs Designed
: Except for positive control, either"HCDR3"
or"HCDR123"
HCDR1
: HCDR1 sequenceHCDR2
: HCDR2 sequenceHCDR3
: HCDR3 sequenceKD (nM)
: Binding affinity measured in SPR (average of replicates)Binding Replicates
: Number of observed binding replicatesReplicates
: Number of observed replicatesRatio Binding Replicates
: Ratio of replicates that show bindingBinder
: True for binders, False for non-binders. Except for positive controls, we call a sequence a binder if it binds in all replicates, otherwise we call it a non-binder.- scRMSD columns: Columns for scRMSD are organized by region (one of
"HCDR1"
,"HCDR2"
,"HCDR3"
,"HCDR123"
, or"Fv"
) followed by"scRMSD"
and followed by the model or strategy in parentheses ("ABB2"
for ABodyBuilder2,"ABB3"
for ABodyBuilder3,"ABB3-LM"
for ABodyBuilder3-LM,"ESMFold"
for ESMFold,"Mean/Min/Max Ensemble"
for mean/min/max ensembles).
Note
: At this time, we have released datasets for 7 out of 8 antibody-antigen systems used to evaluate IgDesign. The dataset for CD40-Ravagalimab has sequences blinded as X's and is included in "./data/Blinded"
.
We've tested this model on Nvidia A100 and H100 and AMD Mi300x gpus. We expect the model to perform equally well on the Nvidia H200 and the upcoming AMD Mi325x. If you encounter any bugs, feel free to open a ticket.
If you find our code, data, or results useful, we ask that you cite our work:
@article{Shanehsazzadeh2023igdesign,
title = {IgDesign: In vitro validated antibody design against multiple therapeutic antigens using inverse folding},
url = {http://dx.doi.org/10.1101/2023.12.08.570889},
DOI = {10.1101/2023.12.08.570889},
publisher = {Cold Spring Harbor Laboratory},
author = {Shanehsazzadeh, Amir and Alverio, Julian and Kasun, George and Levine, Simon and Calman, Ido and Khan, Jibran A and Chung, Chelsea and Diaz, Nicolas and Luton, Breanna K and Tarter, Ysis and McCloskey, Cailen and Bateman, Katherine B and Carter, Hayley and Chapman, Dalton and Consbruck, Rebecca and Jaeger, Alec and Kohnert, Christa and Kopec-Belliveau, Gaelin and Sutton, John M and Guo, Zheyuan and Canales, Gustavo and Ejan, Kai and Marsh, Emily and Ruelos, Alyssa and Ripley, Rylee and Stoddard, Brooke and Caguiat, Rodante and Chapman, Kyra and Saunders, Matthew and Sharp, Jared and Ganini da Silva, Douglas and Feltner, Audree and Ripley, Jake and Bryant, Megan E and Castillo, Danni and Meier, Joshua and Stegmann, Christian M and Moran, Katherine and Lemke, Christine and Abdulhaqq, Shaheed and Klug, Lillian R and Bachas, Sharrol},
year = {2023},
month = dec
}