AI Can Learn Scientific Taste

Jingqi Tong1,2,3,*, Mingzhe Li1,2,3,*, Hangcheng Li1,2,3,*, Yongzhuo Yang1,3,‡, Yurong Mou1,2,3,‡, Weijie Ma1,2, Zhiheng Xi1, Hongji Chen1,3, Xiaoran Liu1,2,3, Qinyuan Cheng1,2,3, Ming Zhang1, Qiguang Chen5, Qipeng Guo2, Tianlei Ying1,2, Tianxiang Sun2, Yining Zheng1,2,3, Xinchi Chen1,3,†, Jun Zhao1,†, Ning Ding4, Xuanjing Huang1, Yugang Jiang1, Xipeng Qiu1,2,3,†
1Fudan University, 2Shanghai Innovation Institute, 3OpenMOSS Team, 4Tsinghua University, 5Central South University
* Equal contribution. † Corresponding author. ‡ Core contributors.
TL;DR

We treat scientific taste as a learnable objective and show that
Reinforcement Learning from Community Feedback can train models
to judge and propose higher-impact scientific ideas.

Abstract

Great scientists have strong judgement and foresight, qualities closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most related research focuses on improving an AI scientist's ability to execute research, while enhancing an AI's scientific taste remains underexplored.

We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision and formulates scientific taste learning as a preference modeling and alignment problem. For preference modeling, we train Scientific Judge on 700K field- and time-matched pairs of high- vs. low-citation papers. For preference alignment, using Scientific Judge as a reward model, we train Scientific Thinker to propose research ideas with high potential impact.

Experiments show that Scientific Judge outperforms strong LLM baselines such as GPT-5.2 and Gemini 3 Pro, and generalizes to future-year test sets, unseen fields, and peer-review preference. Scientific Thinker further proposes research ideas with higher potential impact than strong baselines. Our findings show that AI can learn scientific taste, marking a key step toward human-level AI scientists.

๐ŸŒ Why Scientific Taste Matters

  • Scientific taste is more than execution. AI scientists increasingly help with literature search and experimentation, but choosing which ideas are worth pursuing remains a separate capability.
  • Community feedback provides supervision. In science, long-term community judgement is reflected in signals such as citations, which can be turned into matched preference data.
  • Taste can be modeled and aligned. Once preference signals are constructed, a model can learn to judge ideas and then be used as a reward model to improve idea generation itself.

Scientific Judge (small models) outperforms much larger baselines; Scientific Thinker achieves strong win rates.

โš™๏ธ RLCF Formulation

Pipeline

  • Construct community preference. We pair papers from the same field and publication period, then label the higher-citation paper as preferred.
  • Train Scientific Judge. A generative reward model reasons over a pair of papers and predicts which one is more likely to have higher impact.
  • Train Scientific Thinker. Using Scientific Judge as the reward model, a policy model learns to propose follow-up research ideas with higher potential impact through comparison-based GRPO.
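As a minimal sketch, the first pipeline step (constructing community preference pairs) might look like the following. It assumes each paper record carries a field, a publication year, and a citation count; the `min_gap` near-tie filter and the greedy within-bucket pairing are illustrative assumptions, not the paper's exact recipe:

```python
import random
from collections import defaultdict

def build_preference_pairs(papers, min_gap=10, seed=0):
    """Pair papers from the same field and publication year, labeling
    the higher-citation paper as preferred.

    `papers` is a list of dicts with keys: id, field, year, citations.
    `min_gap` drops near-ties whose ordering is likely citation noise.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for p in papers:
        # Field- and time-matching: only compare within the same bucket.
        buckets[(p["field"], p["year"])].append(p)

    pairs = []
    for group in buckets.values():
        rng.shuffle(group)
        # Greedily pair neighbors inside each matched bucket.
        for a, b in zip(group[::2], group[1::2]):
            if abs(a["citations"] - b["citations"]) < min_gap:
                continue  # skip ambiguous near-ties
            chosen, rejected = (a, b) if a["citations"] > b["citations"] else (b, a)
            pairs.append({"chosen": chosen["id"], "rejected": rejected["id"]})
    return pairs
```

Because both papers in a pair share a field and publication period, the label isolates relative impact rather than venue or recency effects.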

The Core Thesis

Scientific taste is not mystical or purely subjective. Large-scale community feedback can be converted into matched preferences that enable models to learn scientific judgement and improve scientific ideation.

๐Ÿ† SciJudgeBench and Main Results

SciJudgeBench is built from 2.1M arXiv papers published through 2024, producing 696,758 field- and time-matched citation-based preference pairs. We evaluate across three settings:

  • In-domain scientific judgement: paired paper preference prediction across Computer Science, Mathematics, Physics, and Other fields.
  • Generalization settings: future-year papers, ICLR peer-review preference, and bioRxiv biology transfer.
  • Ideation evaluation: pairwise win-rate comparisons for Scientific Thinker against its base policy and strong proprietary models.

Leaderboard

๐Ÿ† Scientific Judge on SciJudgeBench and OOD Settings

We report pairwise accuracy with position-swap consistency on in-domain and out-of-domain evaluations.

Models                      In-Domain   Future-Year   ICLR Review   bioRxiv

Base Models
Qwen3-4B-Instruct               60.3        68.3          65.3        56.9
Qwen3-30B-A3B-Instruct          66.3        71.6          76.8        45.0

Scientific Judge Models
SciJudge-Qwen3-4B               75.3        74.5          79.1        57.5
SciJudge-Qwen3-30B              80.6        78.2          87.7        71.2

Strong Baselines
GPT-5.2-Thinking                72.7          -             -           -
GLM-5                           73.6          -             -           -
Gemini-3.0-Pro-Preview          75.7          -             -           -
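The pairwise metric above, accuracy with position-swap consistency, can be sketched as follows. It assumes a `judge(a, b)` callable that returns "A" or "B" for whichever paper it prefers; a pair counts as correct only when the judge picks the preferred paper in both presentation orders:

```python
def positionswap_accuracy(judge, pairs):
    """Pairwise accuracy with position-swap consistency.

    `pairs` is a list of (chosen, rejected) tuples. Each pair is judged
    twice with the order swapped; credit is given only when the judge
    prefers `chosen` in both orders, so position bias cannot inflate
    the score.
    """
    correct = 0
    for chosen, rejected in pairs:
        first = judge(chosen, rejected)   # chosen shown in position A
        second = judge(rejected, chosen)  # chosen shown in position B
        if first == "A" and second == "B":
            correct += 1
    return correct / len(pairs)
```

A judge that flips its answer when the order is swapped scores zero on that pair, which is why this protocol is stricter than single-order accuracy.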

๐Ÿ† Scientific Thinker Win Rates Against Strong Models

Models                      GPT-5.2-high   GLM-5   Gemini 3 Pro   Average

In-Domain
Qwen3-30B-A3B-Thinking              37.5    33.0           20.5      30.3
SciThinker-30B                      61.0    58.5           43.0      54.2

Out-of-Domain
Qwen3-30B-A3B-Thinking              36.0    29.5           18.0      27.8
SciThinker-30B                      59.0    61.0           42.5      54.2
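A win rate like those above can be computed from a list of pairwise judgments. This sketch counts a tie as half a win, which is one common convention; the paper's exact tie handling is an assumption here:

```python
def win_rate(outcomes):
    """Pairwise win rate (percent) of a model against a baseline.

    `outcomes` is a list of "win"/"lose"/"tie" labels, one per
    head-to-head comparison judged between the two models' ideas.
    Ties contribute half a win.
    """
    score = sum(1.0 if o == "win" else 0.5 if o == "tie" else 0.0
                for o in outcomes)
    return 100.0 * score / len(outcomes)
```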

🧪 Key Findings

Settings

  • Preference modeling: Scientific Judge is trained with GRPO on citation-based pairwise supervision and evaluated with position-swap consistency.
  • Preference alignment: Scientific Thinker is trained with comparison-based GRPO using Scientific Judge as the reward model.
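The group-relative advantage at the heart of both GRPO stages can be sketched as follows. Here each reward is assumed to come from the judge's preference score for one sampled idea (or, for the Judge's own training, from correctness of its pairwise prediction); the zero-variance fallback is an implementation choice, not something specified above:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO.

    Each sampled completion's reward is normalized against the other
    samples in the same group (same prompt), so the policy is pushed
    toward ideas the reward model prefers over its own alternatives,
    without needing a learned value function.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples tied: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

Normalizing within the group means only relative quality matters, which is exactly what a pairwise judge like Scientific Judge provides.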

Results

  1. Scientific judgement scales with both data and model size. Larger models and more preference data consistently improve in-domain performance.
  2. Learned judgement transfers across time. Scientific Judge generalizes to papers published after the training period.
  3. Learned judgement transfers across fields and metrics. Gains persist on bioRxiv biology papers and on ICLR peer-review preference.
  4. Scientific Thinker improves ideation quality. The trained policy model strongly outperforms its base policy on both in-domain and out-of-domain settings.
  5. AI can learn scientific taste. Together, the results suggest that scientific judgement and high-potential ideation can both be improved through RLCF.

Scaling trends for scientific judgement across model size and training data.

Scientific Thinker win rates after preference alignment.

Citation

@misc{tong2026ailearnscientifictaste,
    title={AI Can Learn Scientific Taste},
    author={Jingqi Tong and Mingzhe Li and Hangcheng Li and Yongzhuo Yang and Yurong Mou and Weijie Ma and Zhiheng Xi and Hongji Chen and Xiaoran Liu and Qinyuan Cheng and Ming Zhang and Qiguang Chen and Weifeng Ge and Qipeng Guo and Tianlei Ying and Tianxiang Sun and Yining Zheng and Xinchi Chen and Jun Zhao and Ning Ding and Xuanjing Huang and Yugang Jiang and Xipeng Qiu},
    year={2026},
    eprint={2603.14473},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2603.14473},
}