AI Can Learn Scientific Taste

Jingqi Tong1,2,3,*, Mingzhe Li1,2,3,*, Hangcheng Li1,2,3,*, Yongzhuo Yang1,3,‡, Yurong Mou1,2,3,‡, Weijie Ma1,2, Zhiheng Xi1, Hongji Chen1,3, Xiaoran Liu1,2,3, Qinyuan Cheng1,2,3, Ming Zhang1, Qiguang Chen5, Qipeng Guo2, Tianlei Ying1,2, Tianxiang Sun2, Yining Zheng1,2,3, Xinchi Chen1,3,†, Jun Zhao1,†, Ning Ding4, Xuanjing Huang1, Yugang Jiang1, Xipeng Qiu1,2,3,†
1Fudan University, 2Shanghai Innovation Institute, 3OpenMOSS Team, 4Tsinghua University, 5Central South University
* Equal contribution. † Corresponding author. ‡ Core contributors.
TL;DR

We treat scientific taste as a learnable objective and show that
Reinforcement Learning from Community Feedback can train models
to judge and propose higher-impact scientific ideas.

Abstract

Great scientists have strong judgement and foresight, qualities closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most related research focuses on improving an AI scientist's ability to execute research, while enhancing an AI's scientific taste remains underexplored.

We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision and formulates scientific taste learning as a preference modeling and alignment problem. For preference modeling, we train Scientific Judge on 700K field- and time-matched pairs of high- vs. low-citation papers. For preference alignment, using Scientific Judge as a reward model, we train Scientific Thinker to propose research ideas with high potential impact.

Experiments show that Scientific Judge outperforms strong LLM baselines such as GPT-5.2 and Gemini 3 Pro, and generalizes to future-year test sets, unseen fields, and peer-review preference. Scientific Thinker further proposes research ideas with higher potential impact than strong baselines. Our findings show that AI can learn scientific taste, marking a key step toward human-level AI scientists.

๐ŸŒ Why Scientific Taste Matters

  • Scientific taste is more than execution. AI scientists increasingly help with literature search and experimentation, but choosing which ideas are worth pursuing remains a separate capability.
  • Community feedback provides supervision. In science, long-term community judgement is reflected in signals such as citations, which can be turned into matched preference data.
  • Taste can be modeled and aligned. Once preference signals are constructed, a model can learn to judge ideas and then be used as a reward model to improve idea generation itself.

Scientific Judge (small models) outperforms much larger baselines; Scientific Thinker achieves strong win rates.

โš™๏ธ RLCF Formulation

Pipeline

  • Construct community preference. We pair papers from the same field and publication period, then label the higher-citation paper as preferred.
  • Train Scientific Judge. A generative reward model reasons over a pair of papers and predicts which one is more likely to have higher impact.
  • Train Scientific Thinker. Using Scientific Judge as the reward model, a policy model learns to propose follow-up research ideas with higher potential impact through comparison-based GRPO.
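As a minimal sketch, the first pipeline step (constructing community preference pairs) might look like the following. It assumes each paper record carries a field, a publication year, and a citation count; the `min_gap` near-tie filter and the greedy within-bucket pairing are illustrative assumptions, not the paper's exact recipe:

```python
import random
from collections import defaultdict

def build_preference_pairs(papers, min_gap=10, seed=0):
    """Pair papers from the same field and publication year, labeling
    the higher-citation paper as preferred.

    `papers` is a list of dicts with keys: id, field, year, citations.
    `min_gap` drops near-ties whose ordering is likely citation noise.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for p in papers:
        # Field- and time-matching: only compare within the same bucket.
        buckets[(p["field"], p["year"])].append(p)

    pairs = []
    for group in buckets.values():
        rng.shuffle(group)
        # Greedily pair neighbors inside each matched bucket.
        for a, b in zip(group[::2], group[1::2]):
            if abs(a["citations"] - b["citations"]) < min_gap:
                continue  # skip ambiguous near-ties
            chosen, rejected = (a, b) if a["citations"] > b["citations"] else (b, a)
            pairs.append({"chosen": chosen["id"], "rejected": rejected["id"]})
    return pairs
```

Because both papers in a pair share a field and publication period, the label isolates relative impact rather than venue or recency effects.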

The Core Thesis

Scientific taste is not mystical or purely subjective. Large-scale community feedback can be converted into matched preferences that enable models to learn scientific judgement and improve scientific ideation.

๐Ÿ† SciJudgeBench and Main Results

SciJudgeBench is built from 2.1M arXiv papers published through 2024, producing 696,758 field- and time-matched citation-based preference pairs. We evaluate across three settings:

  • In-domain scientific judgement: paired paper preference prediction across Computer Science, Mathematics, Physics, and Other fields.
  • Generalization settings: future-year papers, ICLR peer-review preference, and bioRxiv biology transfer.
  • Ideation evaluation: pairwise win-rate comparisons for Scientific Thinker against its base policy and strong proprietary models.

Leaderboard

๐Ÿ† Scientific Judge on SciJudgeBench and OOD Settings

We report pairwise accuracy with position-swap consistency on in-domain and out-of-domain evaluations.

Models                      In-Domain   Future-Year   ICLR Review   bioRxiv

Base Models
Qwen3-4B-Instruct               60.3        68.3          65.3        56.9
Qwen3-30B-A3B-Instruct          66.3        71.6          76.8        45.0

Scientific Judge Models
SciJudge-Qwen3-4B               75.3        74.5          79.1        57.5
SciJudge-Qwen3-30B              80.6        78.2          87.7        71.2

Strong Baselines
GPT-5.2-Thinking                72.7          -             -           -
GLM-5                           73.6          -             -           -
Gemini-3.0-Pro-Preview          75.7          -             -           -
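The pairwise metric above, accuracy with position-swap consistency, can be sketched as follows. It assumes a `judge(a, b)` callable that returns "A" or "B" for whichever paper it prefers; a pair counts as correct only when the judge picks the preferred paper in both presentation orders:

```python
def positionswap_accuracy(judge, pairs):
    """Pairwise accuracy with position-swap consistency.

    `pairs` is a list of (chosen, rejected) tuples. Each pair is judged
    twice with the order swapped; credit is given only when the judge
    prefers `chosen` in both orders, so position bias cannot inflate
    the score.
    """
    correct = 0
    for chosen, rejected in pairs:
        first = judge(chosen, rejected)   # chosen shown in position A
        second = judge(rejected, chosen)  # chosen shown in position B
        if first == "A" and second == "B":
            correct += 1
    return correct / len(pairs)
```

A judge that flips its answer when the order is swapped scores zero on that pair, which is why this protocol is stricter than single-order accuracy.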

๐Ÿ† Scientific Thinker Win Rates Against Strong Models

Models                      GPT-5.2-high   GLM-5   Gemini 3 Pro   Average

In-Domain
Qwen3-30B-A3B-Thinking              37.5    33.0           20.5      30.3
SciThinker-30B                      61.0    58.5           43.0      54.2

Out-of-Domain
Qwen3-30B-A3B-Thinking              36.0    29.5           18.0      27.8
SciThinker-30B                      59.0    61.0           42.5      54.2
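A win rate like those above can be computed from a list of pairwise judgments. This sketch counts a tie as half a win, which is one common convention; the paper's exact tie handling is an assumption here:

```python
def win_rate(outcomes):
    """Pairwise win rate (percent) of a model against a baseline.

    `outcomes` is a list of "win"/"lose"/"tie" labels, one per
    head-to-head comparison judged between the two models' ideas.
    Ties contribute half a win.
    """
    score = sum(1.0 if o == "win" else 0.5 if o == "tie" else 0.0
                for o in outcomes)
    return 100.0 * score / len(outcomes)
```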

🧪 Key Findings

Settings

  • Preference modeling: Scientific Judge is trained with GRPO on citation-based pairwise supervision and evaluated with position-swap consistency.
  • Preference alignment: Scientific Thinker is trained with comparison-based GRPO using Scientific Judge as the reward model.
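The group-relative advantage at the heart of both GRPO stages can be sketched as follows. Here each reward is assumed to come from the judge's preference score for one sampled idea (or, for the Judge's own training, from correctness of its pairwise prediction); the zero-variance fallback is an implementation choice, not something specified above:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO.

    Each sampled completion's reward is normalized against the other
    samples in the same group (same prompt), so the policy is pushed
    toward ideas the reward model prefers over its own alternatives,
    without needing a learned value function.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples tied: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

Normalizing within the group means only relative quality matters, which is exactly what a pairwise judge like Scientific Judge provides.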

Results

  1. Scientific judgement scales with both data and model size. Larger models and more preference data consistently improve in-domain performance.
  2. Learned judgement transfers across time. Scientific Judge generalizes to papers published after the training period.
  3. Learned judgement transfers across fields and metrics. Gains persist on bioRxiv biology papers and on ICLR peer-review preference.
  4. Scientific Thinker improves ideation quality. The trained policy model strongly outperforms its base policy on both in-domain and out-of-domain settings.
  5. AI can learn scientific taste. Together, the results suggest that scientific judgement and high-potential ideation can both be improved through RLCF.

Scaling trends for scientific judgement across model size and training data.

Scientific Thinker win rates after preference alignment.

Citation

@misc{tong2026ailearnscientifictaste,
    title={AI Can Learn Scientific Taste},
    author={Jingqi Tong and Mingzhe Li and Hangcheng Li and Yongzhuo Yang and Yurong Mou and Weijie Ma and Zhiheng Xi and Hongji Chen and Xiaoran Liu and Qinyuan Cheng and Ming Zhang and Qiguang Chen and Weifeng Ge and Qipeng Guo and Tianlei Ying and Tianxiang Sun and Yining Zheng and Xinchi Chen and Jun Zhao and Ning Ding and Xuanjing Huang and Yugang Jiang and Xipeng Qiu},
    year={2026},
    eprint={2603.14473},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2603.14473},
}