References

Aczel, Balazs, Barnabas Szaszi, and Alex O. Holcombe, “A billion-dollar donation: Estimating the cost of researchers’ time spent on peer review,” Research Integrity and Peer Review, 6 (2021), 1–8 (Springer).

Asher, Samuel G. Z., Janet Malzahn, Jessica M. Persano, Elliot J. Paschal, Andrew C. W. Myers, and Andrew B. Hall, “Do Claude Code and Codex p-hack? Sycophancy and statistical analysis in large language models,” 2026.

Asirvatham, Hemanth, Elliott Mokski, and Andrei Shleifer, “GPT as a measurement tool,” NBER Working Paper, 2026 (National Bureau of Economic Research).

Choi, Byungjin, Tae Joon Jun, Joung Won Sung, Il Woo Park, Jeong-Moo Lee, Soo Ick Cho, Hyung Jun Park, Ro Woon Lee, and Jungyo Suh, “Invisible text injection and peer review by AI models,” JAMA Network Open, 9 (2026), e2552099.

Elsevier, “Generative AI policies for journals,” <https://www.elsevier.com/about/policies-and-standards/generative-ai-policies-for-journals> (Feb. 19, 2026).

Hsu, Chao-Chun, and Chenhao Tan, “OpenAIReview: Open-source AI-assisted academic paper reviewing,” 2026.

IsItCredible.com, “Is it credible?” <https://www.isitcredible.com/> (Feb. 19, 2026).

Leung, Tiffany I., “LLMs in peer review—how publishing policies must advance,” JAMA Network Open, 9 (2026), e2552042.

QED Science, “QED science: Critical thinking AI for research,” 2026.

Rajakumar, Hamrish Kumar, Kailash Abhishek Sankaran, Manasi Pillai Ashok, and Srinivas Rachoori, “Peer review in the age of artificial intelligence: A comparative study of human and AI-generated review reports,” Postgraduate Medical Journal (2026), qgag005.

Refine, “FAQ - refine,” <https://www.refine.ink/faq> (Feb. 19, 2026).

Spitzer, Markus Wolfgang Hermann, “The emerging submission crisis in behavioral science,” Trends in Neuroscience and Education, 42 (2026), 100276.

Thomas, Llewellyn D. W., Angelo Kenneth G. Romasanta, and Laia Pujol Priego, “Jagged competencies: Measuring the reliability of generative AI in academic research,” Journal of Business Research, 203 (2026), 115804.

Wang, Yuehan, Jinyan Huang, Lun Du, Yuxin Guo, Ying Liu, and Rong Wang, “Evaluating large language models as raters in large-scale writing assessments: A psychometric framework for reliability and validity,” Computers and Education: Artificial Intelligence, 9 (2025), 100481.

Zhang, Tianmai M., and Neil F. Abernethy, “Reviewing scientific papers for critical problems with reasoning LLMs: Baseline approaches and automatic evaluation,” arXiv preprint arXiv:2505.23824 (2025).