Full Catalog

Benchmarks

Every AI biology and biomedical benchmark we track, with saturation status, scores, and successor information.

The catalog lists 272 benchmarks.
| Name | Domain | Year | Type | Status | Time to Saturation | Best Score | Human Baseline | Successor |
|---|---|---|---|---|---|---|---|---|
| MolQuest | Drug Discovery | 2026 | Agent Workflow | Active |  | SOTA ~50%; most <30% |  |  |
| FLIP2 | Protein | 2026 | Regression | Active |  | Varies |  |  |
| Doctorina MedBench | Clinical | 2026 | Agent Simulation | Active |  | Varies (>1,000 cases) |  |  |
| LABBench2 | Agentic Bio | 2026 | Research Workflow | Active |  | 26-46% drop vs LAB-Bench |  |  |
| MolGenBench | Drug Discovery | 2025 | Structure-based Generation | Active |  | Varies |  |  |
| ToxiMol | Drug Discovery | 2025 | Generation | Active |  | Low success rates |  |  |
| PDBbind CleanSplit | Drug Discovery | 2025 | Regression | Active |  | Varies (leak-free) |  |  |
| PoseX | Drug Discovery | 2025 | Pose Prediction | Active |  | AI > physics-based (post-relaxation) |  |  |
| FoldBench | Protein | 2025 | Structural Prediction | Active |  | Varies by task |  |  |
| AbBiBench | Protein | 2025 | Antibody Design | Active |  | Varies |  |  |
| EC-Bench | Protein | 2025 | Classification | Active |  | Varies |  |  |
| MotifBench | Protein | 2025 | Generation | Active |  | RFdiffusion solves ~16/30 |  |  |
| SHAPES | Protein | 2025 | Generation | Active |  | Most models fail on loops/mixed |  |  |
| PFMBench | Protein | 2025 | Multi-task | Active |  | Varies across 38 tasks |  |  |
| LiveProteinBench | Protein | 2025 | Multi-task | Active |  | Varies |  |  |
| rnaglib | Protein | 2025 | Multi-task (RNA 3D) | Active |  | Varies |  |  |
| PeptoneBench | Protein | 2025 | Conformational Dynamics | Active |  | BioEmu and PepTron lead |  |  |
| PDFBench | Protein | 2025 | Protein Design from Function | Active |  | Varies across 16 metrics |  |  |
| SafeProtein | Biosecurity | 2025 | Red-teaming | Active |  | Varies |  |  |
| Gene-MTEB | Genomics | 2025 | Classification/Clustering | Active |  | 0.59 average |  |  |
| Nullsettes | Genomics | 2025 | Zero-shot Prediction | Active |  | Most models fail |  |  |
| METAGENE-1 | Genomics | 2025 | Classification/Clustering | Active |  | MCC 92.96 (pathogen) |  |  |
| OmniGenBench | Genomics | 2025 | Multi-task | Active |  | Varies across 123+ datasets |  |  |
| DNALongBench | Genomics | 2025 | Classification/Regression | Active |  | Expert models lead |  |  |
| NABench | Genomics | 2025 | Fitness Prediction | Active |  | Varies |  |  |
| AlphaGenome | Genomics | 2025 | Multi-task | Active |  | SOTA on 25/26 VEP evaluations |  |  |
| TraitGym | Genomics | 2025 | Variant Prediction | Active |  | CADD/GPN-MSA best for disease traits |  |  |
| Arc Institute Virtual Cell Challenge | Virtual Cell | 2025 | Perturbation Prediction | Active |  | Statistical baselines competitive with AI |  |  |
| scPerturBench | Virtual Cell | 2025 | Perturbation Prediction | Active |  | No method consistently wins; linear baselines competitive |  |  |
| HLE (Bio/Medicine) | Science QA | 2025 | Short Answer/MCQ | Active |  | 44.7% overall; ~22% bio/med | 98% |  |
| ATLAS (Bio) | Science QA | 2025 | Open-ended | Active |  | Varies |  |  |
| MedHELM | Clinical | 2025 | Multi-task Clinical | Active |  | 66% win rate (DeepSeek R1) |  |  |
| MedAgentBench | Clinical | 2025 | Agent Workflow | Active |  | 69.67% |  |  |
| HealthBench | Clinical | 2025 | Conversational | Active |  | 60% (o3); Hard: 32% |  |  |
| FHIR-AgentBench | Clinical | 2025 | Agent Workflow | Active |  | Varies |  |  |
| LiveMedBench | Clinical | 2025 | Dynamic MCQ | Active |  | Varies (refreshes weekly) |  |  |
| MedXpertQA | Clinical | 2025 | MCQ | Active |  | Varies (4,460 Qs, 17 specialties) |  |  |
| DiagnosisArena | Clinical | 2025 | Diagnostic Reasoning | Active |  | 45.82% |  |  |
| Script Concordance Testing | Clinical | 2025 | Probabilistic Reasoning | Active |  | Varies |  |  |
| AgentClinic | Clinical | 2025 | Diagnostic Agent | Active |  | Varies (120 NEJM cases) |  |  |
| MedBench v4 (China) | Clinical | 2025 | Multi-task | Active |  | Agent 85.3/100 (Claude Sonnet 4.5) |  |  |
| VCT | Biosecurity | 2025 | Multimodal QA | Active |  | 43.8% (o3) | 22.1% (expert virologists) |  |
| ABC-Bench | Biosecurity | 2025 | Agent Workflow | Active |  | 53% (Grok 3) | 24% (PhD experts) |  |
| ABLE | Biosecurity | 2025 | Agent Workflow | Active |  | 7/8 low-level tasks |  |  |
| BBG Framework / B3 | Biosecurity | 2025 | Open-ended | Unknown |  | Pilot phase |  |  |
| Scale AI FORTRESS (CBRNE) | Biosecurity | 2025 | Adversarial Safety | Active |  | Varies |  |  |
| Scale AI PropensityBench | Biosecurity | 2025 | Agent Safety | Active |  | Varies (5,874 tasks) |  |  |
| TroubleshootingBench | Biosecurity | 2025 | Protocol QA | Active |  | No model exceeds 80th-pct expert (36.4%) | 80th-pct expert: 36.4% |  |
| BixBench | Agentic Bio | 2025 | Bioinformatics | Active |  | 17% (Claude 3.5 Sonnet); 9% (GPT-4o) |  |  |
| MedAgentGym | Agentic Bio | 2025 | Training Environment | Active |  | N/A |  |  |
| BioProBench | Agentic Bio | 2025 | Protocol Understanding | Active |  | Varies |  |  |
| CT-Bench | Medical Imaging | 2025 | Multimodal QA | Active |  | 61.8% |  |  |
| ReXVQA | Medical Imaging | 2025 | Visual QA | Active |  | 83.24% (MedGemma) |  |  |
| MatBench Discovery | Drug Discovery | 2025 | Crystal Stability | Nearing | 12 mo | F1 = 0.924 |  |  |
| FairMedQA | Clinical | 2025 | Fairness Evaluation | Active |  | 3-19 percentage-point accuracy disparity |  |  |
| MEDEC | Clinical | 2025 | Error Detection/Correction | Active |  | Varies |  |  |
| MedThink-Bench | Clinical | 2025 | Reasoning Evaluation | Active |  | Varies (500 QA pairs) |  |  |
| CLINIC | Clinical | 2025 | Trustworthiness | Active |  | DeepSeek-R1-LLaMA best robustness |  |  |
| MedAgentsBench | Clinical | 2025 | Multi-agent Reasoning | Active |  | 92.6% (TeamMedAgents) |  |  |
| TDC-2 (Single-cell DTI) | Drug Discovery | 2024 | Classification | Active |  | Varies |  |  |
| TDC-2 Protein-Peptide Binding | Drug Discovery | 2024 | Regression | Active |  | Varies |  |  |
| DrugGym | Drug Discovery | 2024 | Agent Workflow | Active |  | N/A (simulator) |  |  |
| MolScore | Drug Discovery | 2024 | Meta-framework | Unknown |  | N/A |  |  |
| Polaris | Drug Discovery | 2024 | Classification/Regression | Active |  | Varies (blind test) |  |  |
| WelQrate | Drug Discovery | 2024 | Classification | Active |  | Varies |  |  |
| PharmaBench | Drug Discovery | 2024 | Classification/Regression | Active |  | Varies |  |  |
| PLINDER | Drug Discovery | 2024 | Docking/Regression | Active |  | DiffDock drops from 38% to 15% with leak control |  |  |
| CPI2M | Drug Discovery | 2024 | Regression | Active |  | Baseline |  |  |
| PoseBench | Drug Discovery | 2024 | Pose Prediction | Active |  | ~68% (PB); ~33% (DockGen-E) |  |  |
| DockGen | Drug Discovery | 2024 | Pose Prediction | Active |  | ~24-33% top-1 RMSD < 2 Å |  |  |
| MF-PCBA | Drug Discovery | 2024 | Virtual Screening | Active |  | Baselines established |  |  |
| TOMG-Bench | Drug Discovery | 2024 | LLM Generation | Active |  | Fine-tuned Llama > GPT-3.5 by 46.5% |  |  |
| ChemBench | Drug Discovery | 2024 | MCQ/Free-form | Nearing | 24 mo | Claude 3.5 Sonnet > best human chemists | Expert chemists |  |
| CASP16 (Complexes) | Protein | 2024 | Structural Prediction | Active |  | Varies; AB-Ag major failure area |  |  |
| CASP16 (RNA) | Genomics | 2024 | Structural Prediction | Active |  | No TM-score > 0.8 for novel RNAs |  |  |
| CASP16 (Protein-Ligand) | Drug Discovery | 2024 | Binding Affinity | Active |  | Max Kendall tau 0.42 |  |  |
| ProteinBench | Protein | 2024 | Multi-task | Active |  | Varies; drops > 20% at 500+ residues |  |  |
| PPB-Affinity | Protein | 2024 | Regression | Active |  | Varies |  |  |
| BEACON (RNA Tasks) | Genomics | 2024 | Multi-task (13 RNA tasks) | Active |  | PLMs surpass SOTA on 8/13 |  |  |
| ATLAS MD Ensemble Dataset | Protein | 2024 | Conformational Dynamics | Active |  | Varies |  |  |
| Evo / Evo 2 | Genomics | 2024 | Multi-task | Active |  | BRCA1 > 90% AUROC (Evo 2) |  |  |
| DART-Eval | Genomics | 2024 | Multi-task | Active |  | DNALMs inconsistent |  |  |
| GenBench | Genomics | 2024 | Multi-task | Active |  | Varies across 43 datasets |  |  |
| Genomics Long-Range Benchmark | Genomics | 2024 | Classification/Regression | Active |  | Large gaps in VEP |  |  |
| Borzoi | Genomics | 2024 | Gene Expression Prediction | Active |  | 524 kb input, 32 bp resolution |  | AlphaGenome |
| PRIDICT 2.0 (Prime Editing) | Genomics | 2024 | Regression | Active |  | Spearman R = 0.85 |  |  |
| CZI cz-benchmarks | Virtual Cell | 2024 | Classification/Prediction | Active |  | Varies |  |  |
| scPerturb | Virtual Cell | 2024 | Data Resource/QC | Active |  | E-distance metric |  | scPerturBench |
| PerturBench | Virtual Cell | 2024 | Perturbation Prediction | Active |  | Simple models outperform complex ones |  |  |
| MMLU-Pro Medical | Bio NLP | 2024 | MCQ | Saturated | 18 mo | 90.1% |  |  |
| MedCalc-Bench | Clinical | 2024 | Calculation | Active |  | 50.9% |  |  |
| MedS-Bench | Clinical | 2024 | Multi-task | Active |  | Varies across 39 datasets |  |  |
| JAMA Clinical Challenge | Clinical | 2024 | MCQ | Active |  | 88.6% (o1-preview on 70 cases) |  |  |
| AfriMed-QA | Clinical | 2024 | MCQ | Active |  | Varies (15K+ Qs, 32 specialties) |  |  |
| MMedBench | Clinical | 2024 | Multilingual MCQ | Active |  | Varies across 6 languages |  |  |
| MultiADE | Bio NLP | 2024 | Extraction | Active |  | Varies across 6 ADE datasets |  |  |
| WMDP-Bio | Biosecurity | 2024 | MCQ | Saturated | 7 mo | 87% | 60.5% | VCT, ABC-Bench |
| BioLP-Bench | Biosecurity | 2024 | Protocol Error Detection | Active |  | 34% (o4-mini) | ~38% (expert avg) |  |
| LAB-Bench | Agentic Bio | 2024 | Research Workflow | Saturated | 18 mo | 89%; several subtasks at ceiling | ~79% (expert on ProtocolQA) | LABBench2 |
| ScienceAgentBench | Agentic Bio | 2024 | Code Generation | Active |  | 42.2% (o1-preview + self-debug) |  |  |
| BioCoder | Agentic Bio | 2024 | Code Generation | Nearing | 24 mo | ~50% Pass@1 (GPT-4) |  | BixBench |
| LitQA2 / PaperQA2 | Agentic Bio | 2024 | Literature QA | Nearing | 24 mo | ~90% (PaperQA2) | ~67% (human experts) |  |
| OmniMedVQA | Medical Imaging | 2024 | Visual QA | Active |  | LVLMs struggle |  |  |
| ReXrank | Medical Imaging | 2024 | Report Generation | Active |  | 1/RadCliQ-v1 0.98 (ReXGradient) |  |  |
| TDC Clinical Trial Outcome | Drug Discovery | 2023 | Classification | Active |  | Varies |  |  |
| TARTARUS | Drug Discovery | 2023 | Generation/RL | Active |  | Low success on hard objectives |  |  |
| TOXRIC | Drug Discovery | 2023 | Classification | Active |  | Varies |  |  |
| PoseBusters | Drug Discovery | 2023 | Pose Validation | Active |  | ~75% PB-Valid |  |  |
| ProteinGym | Protein Fitness | 2023 | Regression | Active |  | Spearman ~0.52 |  |  |
| ProteinInvBench | Protein | 2023 | Generation | Active |  | Recovery ~66% |  |  |
| Mega-Scale Stability Dataset | Protein | 2023 | Regression | Active |  | PCC 0.72 (ThermoMPNN) |  |  |
| GUE | Genomics | 2023 | Classification | Active |  | F1 0.3-0.95 |  | DART-Eval, GenBench |
| BEND | Genomics | 2023 | Classification/Regression | Active |  | Varies |  |  |
| NT-18 Benchmark | Genomics | 2023 | Classification | Active |  | MCC 0.974 (promoter); 0.983 (splice) |  | NTv3 |
| DeepPrime | Genomics | 2023 | Regression | Active |  | Approaching strong performance for specific edits |  |  |
| GEARS | Virtual Cell | 2023 | Perturbation Prediction | Active |  | 40% higher precision than baselines (claimed); linear models outperform (2025) |  |  |
| GPQA Diamond (Bio) | Science QA | 2023 | MCQ | Saturated | 36 mo | 94.1% | 67% | HLE, ATLAS |
| EHRSHOT | Clinical | 2023 | Few-shot Prediction | Active |  | Varies across 15 tasks |  |  |
| ACI-BENCH | Clinical | 2023 | Note Generation | Active |  | MEDCON 57.78 |  |  |
| TotalSegmentator | Medical Imaging | 2023 | Segmentation | Nearing | 36 mo | Avg Dice > 0.90 for major organs |  |  |
| MedPerf | Clinical | 2023 | Federated Benchmarking | Active |  | N/A (platform) |  |  |
| PMO | Drug Discovery | 2022 | Optimization | Active |  | Varies |  |  |
| DOCKSTRING | Drug Discovery | 2022 | Regression/Generation | Active |  | Varies |  |  |
| PEER | Protein | 2022 | Multi-task | Active |  | Varies across 17 tasks |  |  |
| LRGB Peptides-func | Protein | 2022 | Classification | Active |  | Varies (AP) |  |  |
| LRGB Peptides-struct | Protein | 2022 | Regression | Active |  | Varies (MAE) |  |  |
| GenomicBenchmarks | Genomics | 2022 | Classification | Saturated | 18 mo | 95%+ accuracy |  | GUE, BEND |
| Sei | Genomics | 2022 | Classification | Active |  | AUROC 0.972; AUPRC 0.409 |  | AlphaGenome |
| MedMCQA | Clinical | 2022 | MCQ | Active |  | ~75-80% |  |  |
| BigBIO | Bio NLP | 2022 | Framework | Active |  | N/A (126+ datasets) |  |  |
| BioRED | Bio NLP | 2022 | Relation Extraction | Active |  | Varies |  |  |
| AMOS | Medical Imaging | 2022 | Segmentation | Active |  | Varies (15 organs) |  | AMOS-MM (2024) |
| TDC ADMET (22 leaderboards) | Drug Discovery | 2021 | Classification/Regression | Active |  | Varies by task |  | TDC-2 |
| TDC DrugCombo | Drug Discovery | 2021 | Regression | Active |  | Varies |  |  |
| TDC Docking | Drug Discovery | 2021 | Generation | Active |  | Varies |  |  |
| TDC DTI DG Group | Drug Discovery | 2021 | Regression | Active |  | Varies |  |  |
| PCQM4Mv2 | Drug Discovery | 2021 | Regression | Active |  | MAE 0.0719 eV |  |  |
| FLIP | Protein | 2021 | Regression | Nearing | 60 mo | Varies |  | FLIP2 |
| ATOM3D | Protein | 2021 | Multi-task (3D) | Active |  | Varies across 8 tasks |  |  |
| Enformer | Genomics | 2021 | Gene Expression Prediction | Active |  | Cross-gene CAGE Pearson 0.85 | ~0.94 (experimental replicate ceiling) | Borzoi, AlphaGenome |
| DNABERT evaluation tasks | Genomics | 2021 | Classification | Nearing | 60 mo | F1 0.940 (promoter); MCC 0.871 (splice) |  | DNABERT-2, GUE |
| CRISPR on-target (CRISPRon) | Genomics | 2021 | Regression | Nearing | 60 mo | Spearman ~0.80 | Ceiling ~0.85-0.90 (biological noise) |  |
| Open Problems (single-cell) | Virtual Cell | 2021 | Multi-task | Active |  | Varies across 12 tasks |  |  |
| MedQA (USMLE) | Clinical | 2021 | MCQ | Saturated | 42 mo | 96.5% | 87% | MedXpertQA, HealthBench |
| SLAKE | Medical Imaging | 2021 | Visual QA | Active |  | ~78.7% |  |  |
| RadGraph | Medical Imaging | 2021 | Entity/Relation Extraction | Active |  | Micro F1 0.82 |  |  |
| TDC Caco-2 Permeability | Drug Discovery | 2021 | Regression | Active |  | MAE 0.256 |  | TDC-2 |
| TDC Human Intestinal Absorption | Drug Discovery | 2021 | Classification | Active |  | AUROC 0.993 |  | TDC-2 |
| TDC P-glycoprotein Inhibition | Drug Discovery | 2021 | Classification | Active |  | AUROC 0.938 |  | TDC-2 |
| TDC Bioavailability | Drug Discovery | 2021 | Classification | Active |  | AUROC 0.942 |  | TDC-2 |
| TDC Lipophilicity | Drug Discovery | 2021 | Regression | Active |  | MAE 0.456 |  | TDC-2 |
| TDC Aqueous Solubility | Drug Discovery | 2021 | Regression | Active |  | MAE 0.741 |  | TDC-2 |
| TDC Blood-Brain Barrier | Drug Discovery | 2021 | Classification | Active |  | AUROC 0.924 |  | TDC-2 |
| TDC Plasma Protein Binding | Drug Discovery | 2021 | Regression | Active |  | MAE 7.526 |  | TDC-2 |
| TDC Volume of Distribution | Drug Discovery | 2021 | Regression | Active |  | Spearman 0.713 |  | TDC-2 |
| TDC CYP2C9 Inhibition | Drug Discovery | 2021 | Classification | Active |  | AUPRC 0.859 |  | TDC-2 |
| TDC CYP2D6 Inhibition | Drug Discovery | 2021 | Classification | Active |  | AUPRC 0.79 |  | TDC-2 |
| TDC CYP3A4 Inhibition | Drug Discovery | 2021 | Classification | Active |  | AUPRC 0.916 |  | TDC-2 |
| TDC CYP2C9 Substrate | Drug Discovery | 2021 | Classification | Active |  | AUPRC 0.474 |  | TDC-2 |
| TDC CYP3A4 Substrate | Drug Discovery | 2021 | Classification | Active |  | AUPRC 0.667 |  | TDC-2 |
| TDC Half-Life | Drug Discovery | 2021 | Regression | Active |  | Spearman 0.576 |  | TDC-2 |
| TDC Hepatocyte Clearance | Drug Discovery | 2021 | Regression | Active |  | Spearman 0.536 |  | TDC-2 |
| TDC Microsome Clearance | Drug Discovery | 2021 | Regression | Active |  | Spearman 0.63 |  | TDC-2 |
| TDC Acute Toxicity LD50 | Drug Discovery | 2021 | Regression | Active |  | MAE 0.552 |  | TDC-2 |
| TDC hERG Cardiotoxicity | Drug Discovery | 2021 | Classification | Active |  | AUROC 0.88 |  | TDC-2 |
| TDC AMES Mutagenicity | Drug Discovery | 2021 | Classification | Active |  | AUROC 0.871 |  | TDC-2 |
| TDC Drug-Induced Liver Injury | Drug Discovery | 2021 | Classification | Active |  | AUROC 0.956 |  | TDC-2 |
| TDC DTI BindingDB | Drug Discovery | 2021 | Regression | Active |  | PCC 0.588 |  | TDC-2 |
| BioRED | Bio NLP | 2021 | Relation Extraction | Active |  | F1 89.3% |  |  |
| Montreal Archive of Sleep Studies | Clinical | 2021 | Classification | Active |  | Accuracy 86.8% |  |  |
| LIT-PCBA | Drug Discovery | 2020 | Virtual Screening | Active |  | Varies |  | MF-PCBA |
| CrossDocked2020 | Drug Discovery | 2020 | Scoring/Pose | Active |  | R 0.612 |  |  |
| S669 (Stability blind test) | Protein | 2020 | Regression | Active |  | PCC ~0.43-0.67 |  |  |
| OGB Protein Tasks | Protein | 2020 | Classification/Link Prediction | Active |  | Varies |  |  |
| scIB | Virtual Cell | 2020 | Integration | Nearing | 72 mo | scANVI ~0.8 |  | CZI cz-benchmarks |
| BEELINE (GRN inference) | Virtual Cell | 2020 | GRN Inference | Active |  | Close to random predictor in many cases |  |  |
| MMLU-Bio | Bio NLP | 2020 | MCQ | Saturated | 48 mo | 93%+ | 89.8% | MMLU-Pro |
| BLURB | Bio NLP | 2020 | Multi-task NLP | Nearing | 72 mo | 82.91 BLURB score |  | BigBIO |
| MIMIC-IV Benchmarks | Clinical | 2020 | Prediction | Active |  | Hospitalization AUROC ~0.87 |  |  |
| PathVQA | Medical Imaging | 2020 | Visual QA | Active |  | 50-65% |  | GEMeX |
| PANDA (Prostate Gleason) | Medical Imaging | 2020 | Classification | Active |  | QWK ~0.93+ |  |  |
| OC20 | Drug Discovery | 2020 | Catalyst Prediction | Active |  | EquiformerV2 leads |  | OC22, OC25 |
| OGB-MolHIV | Drug Discovery | 2020 | Classification | Active |  | ROC-AUC 0.835 |  |  |
| LIT-PCBA (ALDH1) | Drug Discovery | 2020 | Classification | Active |  | AUC 0.806 |  |  |
| LIT-PCBA (KAT2A) | Drug Discovery | 2020 | Classification | Active |  | AUC 0.746 |  |  |
| LIT-PCBA (MAPK1) | Drug Discovery | 2020 | Classification | Active |  | AUC 0.743 |  |  |
| LIT-PCBA (ESR1 antagonist) | Drug Discovery | 2020 | Classification | Active |  | AUC 0.666 |  |  |
| BioNLP13-CG | Bio NLP | 2020 | NER | Active |  | F1 87.83% |  |  |
| GuacaMol | Drug Discovery | 2019 | Generation | Saturated | 24 mo | Near-perfect on simple goals |  | PMO, TARTARUS, DrugGym |
| MOSES | Drug Discovery | 2019 | Generation | Saturated | 48 mo | High validity/uniqueness |  | MolScore, TARTARUS |
| TAPE | Protein | 2019 | Multi-task | Saturated | 48 mo | Near-ceiling on most tasks |  | ProteinGym, PEER, FLIP |
| SKEMPI 2.0 | Protein | 2019 | Regression | Active |  | Pearson R ~0.7-0.8 |  |  |
| SpliceAI | Genomics | 2019 | Classification | Nearing | 84 mo | AUPRC 0.98; ~95% top-k |  | OpenSpliceAI, AlphaGenome |
| dynverse (Trajectory inference) | Virtual Cell | 2019 | Trajectory Inference | Active |  | Varies by topology |  |  |
| PubMedQA | Bio NLP | 2019 | MCQ | Active | 72 mo | 81.6% | 78% | MedHELM, BigBIO |
| PhysioNet 2019 (Sepsis) | Clinical | 2019 | Prediction | Active |  | Varies (104 teams) |  |  |
| CheXpert | Medical Imaging | 2019 | Classification | Nearing | 84 mo | AUC ~0.94 | 2.6/3 radiologists | CheXpert Plus |
| MIMIC-CXR | Medical Imaging | 2019 | Report Generation | Active |  | RadCliQ-v1 0.92 |  |  |
| APTOS Diabetic Retinopathy | Medical Imaging | 2019 | Classification | Nearing | 84 mo | QWK 0.967 |  |  |
| MedNLI | Clinical | 2019 | Classification | Active |  | Accuracy 86.59% |  |  |
| DDI Extraction 2013 | Bio NLP | 2019 | Relation Extraction | Active |  | F1 83.35% |  |  |
| MoleculeNet | Drug Discovery | 2018 | Classification/Regression | Nearing | 96 mo | AUROC 0.85-0.95 |  | TDC, WelQrate |
| MedNLI | Bio NLP | 2018 | NLI | Nearing | 96 mo | ~82% accuracy |  | BioNLI |
| VQA-RAD | Medical Imaging | 2018 | Visual QA | Unknown |  | 79.2% |  |  |
| HAM10000 | Medical Imaging | 2018 | Classification | Nearing | 96 mo | ~96%+ accuracy |  | DermaMNIST-E |
| PCam | Medical Imaging | 2018 | Classification | Nearing | 96 mo | ~97%+ accuracy |  |  |
| Medical Segmentation Decathlon | Medical Imaging | 2018 | Segmentation | Active |  | nnU-Net variants lead |  |  |
| RSNA Pneumonia Detection | Medical Imaging | 2018 | Object Detection | Active |  | Varies (1,400+ Kaggle teams) |  |  |
| BBBP | Drug Discovery | 2018 | Classification | Nearing |  | 96.4% ROC-AUC |  |  |
| BACE | Drug Discovery | 2018 | Classification | Active |  | 88.4% ROC-AUC |  |  |
| ClinTox | Drug Discovery | 2018 | Classification | Nearing |  | 99.2% ROC-AUC |  |  |
| SIDER | Drug Discovery | 2018 | Multi-label Classification | Active |  | 91.1% ROC-AUC |  |  |
| ToxCast | Drug Discovery | 2018 | Multi-label Classification | Active |  | 78.2% ROC-AUC |  |  |
| MUV | Drug Discovery | 2018 | Virtual Screening | Nearing |  | 99.8% ROC-AUC |  |  |
| HIV (MoleculeNet) | Drug Discovery | 2018 | Classification | Active |  | AUC 0.851 |  |  |
| EBM-NLP | Bio NLP | 2018 | NER | Active |  | F1 76.01% |  |  |
| USPTO-MIT | Drug Discovery | 2017 | Reaction Prediction | Nearing | 108 mo | > 90% top-1 (forward) |  |  |
| CAMI (Metagenomic) | Genomics | 2017 | Classification/Assembly | Active |  | Good at genus, poor at strain |  | CAMI III |
| CRISPR off-target (CIRCLE-seq) | Genomics | 2017 | Classification | Active |  | AUROC 0.977 (CCLMoff) |  |  |
| ChemProt RE | Bio NLP | 2017 | Relation Extraction | Nearing | 108 mo | F1 90.8% |  |  |
| MIMIC-III Benchmarks | Clinical | 2017 | Prediction | Nearing | 108 mo | AUROC ~0.94 (mortality) |  | MIMIC-IV, YAIB |
| TAC ADR | Bio NLP | 2017 | Extraction | Nearing | 108 mo | F1 ~85.2% |  | MultiADE |
| SMM4H | Bio NLP | 2017 | Social Media Mining | Active |  | ADR detection F1 ~0.65-0.70 |  |  |
| NIH ChestX-ray14 | Medical Imaging | 2017 | Classification | Nearing | 108 mo | AUC ~0.85-0.88 |  | CheXpert, MIMIC-CXR |
| Camelyon17 | Medical Imaging | 2017 | Classification | Active |  | Kappa ~0.89 |  | Camelyon+ |
| CASF-2016 | Drug Discovery | 2016 | Scoring/Ranking/Docking | Nearing | 120 mo | Pearson R ~0.86 |  | PDBbind CleanSplit |
| USPTO-50K | Drug Discovery | 2016 | Retrosynthesis | Nearing | 120 mo | 65% top-1 | ~48.2% avg (forward) |  |
| HoC (Hallmarks of Cancer) | Bio NLP | 2016 | Classification | Nearing | 120 mo | F1 ~90.3% |  |  |
| ISIC Challenges | Medical Imaging | 2016 | Classification/Segmentation | Nearing | 120 mo | Exceeds clinicians | Clinician level | 3D TBP (2024) |
| Camelyon16 | Medical Imaging | 2016 | Classification | Saturated | 72 mo | AUC 0.994 | Pathologist AUC | Camelyon17, Camelyon+ |
| CAMELYON16 | Medical Imaging | 2016 | Pathology Detection | Nearing |  | AUC 0.987 |  |  |
| OGB-MolPCBA | Drug Discovery | 2016 | Classification/Regression | Active |  | Test AP 0.3167 |  |  |
| TS115 | Protein | 2016 | Regression | Active |  | Q3 accuracy 0.87 |  |  |
| ZINC | Drug Discovery | 2015 | Regression/Generation | Active |  | Varies |  | ZINC20, ZINC-22 |
| Schneider 50K | Drug Discovery | 2015 | Reaction Classification | Saturated | 72 mo | > 99% (RXNFP) |  | USPTO 1K TPL |
| QM8 | Drug Discovery | 2015 | Regression | Nearing | 132 mo | Near saturation |  | QM9 |
| DeepSEA | Genomics | 2015 | Classification | Nearing | 132 mo | AUC 0.958 |  | Sei, AlphaGenome |
| BC5CDR-Chemical NER | Bio NLP | 2015 | NER | Saturated | 84 mo | F1 94.2% |  | BioRED |
| BC5CDR-Disease NER | Bio NLP | 2015 | NER | Nearing | 132 mo | F1 ~90% |  |  |
| EyePACS DR | Medical Imaging | 2015 | Classification | Saturated | 48 mo | AUC ~0.99 |  |  |
| BC5CDR | Bio NLP | 2015 | Relation Extraction | Nearing |  | 91.9% F1 |  |  |
| PCBA | Drug Discovery | 2015 | Classification | Active |  | AUC 0.8887 |  |  |
| Tox21 | Drug Discovery | 2014 | Classification | Unknown |  | AUC ~0.85 |  |  |
| KIBA | Drug Discovery | 2014 | Regression | Active |  | CI ~0.898 |  |  |
| QM9 | Drug Discovery | 2014 | Regression | Nearing | 144 mo | Near chemical accuracy |  | PCQM4Mv2 |
| SAbDab (CDR design) | Protein | 2014 | Generation | Active |  | CDR-H3 AAR ~40-50%; RMSD ~2.5-3.5 Å |  |  |
| NCBI-Disease NER | Bio NLP | 2014 | NER | Saturated | 96 mo | F1 ~91% | IAA ~87% |  |
| CAMEO | Protein | 2013 | Structure Evaluation | Saturated | 144 mo | AlphaFold dominant |  | CAMEO complexes/ligands/peptides |
| ClinVar (coding variants) | Genomics | 2013 | Classification | Nearing | 156 mo | AUC ~0.95 |  |  |
| BioASQ | Bio QA | 2013 | Semantic QA | Active |  | F1 ~0.58; yes/no > 80% |  |  |
| DDI RE | Bio NLP | 2013 | Relation Extraction | Saturated | 120 mo | F1 83.3% |  | MultiADE |
| DUD-E | Drug Discovery | 2012 | Virtual Screening | Saturated | 96 mo | Biased high AUC |  | LIT-PCBA, MF-PCBA, WelQrate |
| QM7/QM7b | Drug Discovery | 2012 | Regression | Saturated | 96 mo | Chemical accuracy achieved |  | QM9 |
| RNA-Puzzles | Protein | 2012 | Structural Prediction | Active |  | Best RMSD 3-7 Å | Human experts outperform servers |  |
| BraTS | Medical Imaging | 2012 | Segmentation | Active |  | Dice ~0.88 whole tumor |  | BraTS 2025 Lighthouse |
| HIV-DTI-77 | Drug Discovery | 2012 | Classification | Active |  | F1 68.3% |  |  |
| HIV-fMRI-77 | Medical Imaging | 2012 | Classification | Active |  | F1 72.2% |  |  |
| DAVIS | Drug Discovery | 2011 | Regression | Nearing | 180 mo | CI ~0.903 |  |  |
| CAGI | Genomics | 2011 | Variant Interpretation | Active |  | Varies |  |  |
| CAFA | Protein | 2010 | Function Prediction | Active |  | Fmax 0.4-0.6 |  |  |
| LINNAEUS NER | Bio NLP | 2010 | NER | Saturated | 132 mo | Mid-to-high 90s (%) |  |  |
| BC2GM | Bio NLP | 2008 | Named Entity Recognition | Active |  | 88.8% F1 |  |  |
| BC2GM NER | Bio NLP | 2007 | NER | Nearing | 228 mo | F1 ~90.9% |  |  |
| PDBbind | Drug Discovery | 2004 | Regression | Nearing | 264 mo | Pearson R > 0.86 (standard); < 0.60 (CleanSplit) |  | PDBbind CleanSplit, PLINDER |
| JNLPBA NER | Bio NLP | 2004 | NER | Saturated | 204 mo | F1 ~79.6% |  |  |
| GAD RE | Bio NLP | 2004 | Relation Extraction | Saturated | 204 mo | F1 ~85% |  |  |
| JNLPBA | Bio NLP | 2004 | Named Entity Recognition | Active |  | 82.0% F1 |  |  |
| DB5.5 (Docking Benchmark) | Protein | 2003 | Docking | Active |  | Top-10 success ~38% |  |  |
| CAPRI | Protein | 2001 | Docking/Interaction | Active |  | Varies |  |  |
| CB513 | Protein | 1999 | Structure Prediction | Active |  | 0.763 Q8 accuracy |  |  |
| CATH | Protein | 1997 | Classification | Nearing | 348 mo | High accuracy (known folds) |  |  |
| CASP (Single-Chain) | Protein | 1994 | Structural Prediction | Saturated | 312 mo | GDT-TS 92.4 | ~90 (experimental) | CASP complexes/RNA tracks |