ESM

Shorthand esm.pretrained. Dataset Description
ESM-2 esm2_t36_3B_UR50D() esm2_t48_15B_UR50D() UR50 (sample UR90) SOTA general-purpose protein language model. Can be used to predict structure, function and other protein properties directly from individual sequences. Released with Lin et al. 2022 (Aug 2022 update).
ESMFold esmfold_v1() PDB + UR50 End-to-end single sequence 3D structure predictor (Nov 2022 update).
ESM-MSA-1b esm_msa1b_t12_100M_UR50S() UR50 + MSA MSA Transformer language model. Can be used to extract embeddings from an MSA. Enables SOTA inference of structure. Released with Rao et al. 2021 (ICML'21 version, June 2021).
ESM-1v esm1v_t33_650M_UR90S_1() ... esm1v_t33_650M_UR90S_5() UR90 Language model specialized for prediction of variant effects. Enables SOTA zero-shot prediction of the functional effects of sequence variations. Same architecture as ESM-1b, but trained on UniRef90. Released with Meier et al. 2021.
ESM-IF1 esm_if1_gvp4_t16_142M_UR50() CATH + UR50 Inverse folding model. Can be used to design sequences for given structures, or to predict functional effects of sequence variation for given structures. Enables SOTA fixed backbone sequence design. Released with Hsu et al. 2022.

Comments