Testing Better Ways for Scientists to Make Science Work Better
Reimagining Peer Review: Feasibility and Value of a Scientist-Designed, Metric-Driven Model
We, a group of immunologists, organized a study (bioRxiv ref ) to test a peer review model designed to improve how we help each other produce stronger, more reliable scientific work, including manuscripts, reports, and other discoveries. This effort was motivated by longstanding issues in the traditional publishing system, which often incentivizes researchers to overstate their results in pursuit of short-term personal gains rather than long-term scientific progress.
The Problem: Assessing what makes science “good” is not simple. It requires evaluating two distinct dimensions: Is the work rigorous? (Quality) and Does it matter? (Impact). Quality reflects how well a study’s conclusions are supported by rigorous, reproducible data. Impact reflects the extent to which the findings advance understanding, fill knowledge gaps, and guide future research. The current system often conflates these dimensions, with Impact frequently inferred from journal prestige rather than intrinsic merit. This misalignment favors novelty over rigor and undervalues replication and incremental contributions. To address this, we designed a review process that evaluates Quality and Impact separately using standardized metrics.
The Hypothesis: The Discovery Stack model will facilitate separate evaluation of a manuscript’s Quality and Impact using standardized, structured review. Additionally, it will foster a “peer-improvement” mindset, encouraging reviewers to provide clear, constructive, and actionable feedback that strengthens the rigor of the study.
The Discovery Stack Pilot: We tested the feasibility and effectiveness of a structured, multi-phase review model that first evaluates Quality and then assesses Impact, using standardized metrics (Fig. 1). Manuscripts enrolled in the pilot were simultaneously under traditional journal review, allowing for direct comparison between the two systems. A total of 162 reviews were completed, and survey data were collected from 86 participants.
Key Finding: Reviewers Successfully Distinguished Quality from Impact
The Discovery Stack model effectively facilitated independent evaluation of Quality and Impact. Quality scores were more consistent across reviewers, whereas Impact scores showed greater variability, reflecting their more subjective nature (Fig. 2).
Key Finding: Strong Support for Core Elements of Discovery Stack
Authors and reviewers overwhelmingly supported the core elements of the Discovery Stack model, reported that it fostered a peer-improvement mindset, and expressed strong enthusiasm for its broader adoption to improve scientific publishing and peer review (Fig. 3).
Additional Observations: The study revealed widespread support for standardized metrics for both Quality and Impact, which could enable researchers to search for and prioritize relevant work. Participants also strongly endorsed the use of an in-line annotation tool to improve clarity, collegiality, and efficiency. Additionally, reviewer identity disclosure was associated with greater perceived transparency and higher Impact scores, suggesting reviewers may be more likely to identify themselves when giving favorable evaluations. Finally, reviewer engagement relied on outreach, familiarity, and trainee participation.
Conclusion: A structured, multi-phase review system such as the Discovery Stack model has the potential to transform how we improve and validate the scientific rigor of novel findings. By separating Quality and Impact and embedding standardized metrics at the core of the process, this model lays the foundation for a more transparent, equitable, and evidence-driven system of peer review. Beyond improving how individual manuscripts are evaluated, the model strengthens the culture of peer review itself. Evaluators can receive targeted feedback, refine their approaches, and build meaningful reputations as contributors to scientific progress, while authors benefit from clearer, more actionable assessments. Most importantly, this framework provides the foundation for a platform that curates peer-reviewed, validated research at scale, accelerates the dissemination of reliable knowledge, elevates scientific rigor over branding, and focuses attention on what matters most: the Quality and Impact of the discovery itself.