The art and science of Peer Review

Anyone reading, searching or mining the scholarly literature wants assurance that the information they are accessing is reliable – that the ‘facts’ can be trusted or at least flagged as contentious where there is disagreement (and appropriately discussed). Peer review, despite being a relatively recent term (dating from the 1970s), has come to be seen as a guarantor of reliability, even if that wasn’t the original intention of those who came up with the term (Moxham and Fyfe, 2018; Fyfe et al., 2019).

In the past decade, however, public trust in the reliability of the process has been eroding fast. It is not just the growing list of outright fraud and manipulation by authors, editors, reviewers, and publishers that is raising concerns (e.g. as detailed by Retraction Watch) but rather the direction that scholarship itself is taking (especially in biomedicine e.g. Jones and Wilsdon, 2018). Research has become so unreliable to some that the current situation has been dubbed a reproducibility crisis.

Whether you think this is justified or not, the larger question to address is whether it is even feasible for peer review – a process originally put in place when the culture and technology were so different – to provide a reliable filter. And, if so, how would we know it was working?

It is apt therefore that the theme for this week’s focus on peer review is ‘what is quality in peer review’. Anyone who has worked behind the scenes of a journal knows that with the best will in the world peer review can be hit or miss. When it works, it’s exciting and hugely rewarding. I have been inspired by the incredible investment of time, thought, expertise and constructive dialogue among authors, editors and reviewers, where you end up confident that the conclusions are supported sufficiently and the paper is a genuine contribution to the field. Researchers care about the quality of their work and their contributions to a discipline. But I have also seen almost every possible case of malpractice, whether unconscious or deliberate. And there is no publisher or editor, no matter how selective their journals or platforms and regardless of whether they are subscription or open access, that cannot tell similar stories. 

I also know of no one, myself included, who would abandon peer review altogether. To me, there is something intrinsically valuable about feedback from your peers and other experts on the soundness, interest or impact of your work. 

Integrity in peer review

The problem, however, is that peer review is largely hidden behind closed doors. We all think we know what quality is when we see it but, as the many posts on peer review this week have revealed, quality means different things to different people.

Moreover, for those evaluating researchers at universities and funding organisations, ‘high quality’, ‘high impact’ and ‘excellence’, are words that are used almost interchangeably. But all these terms are slippery and ambiguous – part of a rhetoric that is used to signal value of some kind without specifying what that value is and to whom (Moore et al., 2017).

Such rhetoric has also become synonymous with location and the venue of publication rather than the intrinsic quality or merits of the output itself. Rankings of researchers based on their citations (h-index) or on their journals (impact factors), and rankings of universities (e.g. the Times Higher Education World University Rankings), have become the norm. The quality of your work, and likely career progression, is taken as a given if it is done at the right location or published in the right venue.

This has created a hypercompetitive culture in which only very few can win and those that act differently are disadvantaged. Increasing accounts of harassment and bullying and a research environment that bears little relation to the diversity we have in society or the global nature of research are all testament to this. The perverse consequences of this vicious cycle on the culture, practice and outputs of scholarship, as well as on publishing, are justifiably fuelling the drive to find alternative ways to evaluate research, researchers and institutions with the aim of making scholarship and its communication more accountable and trustworthy (e.g. DORA and COKI). 

So if this isn’t what we mean by quality, what is? The Nuffield Council on Bioethics published a report on the ethics and culture of research in 2014. In a survey they conducted as part of that project, the five words that researchers and other scholarly stakeholders used most frequently to describe high quality were rigorous, accurate, original, honest and transparent, of which rigorous was the most frequently quoted (Joynson and Leyser, 2015).

If you take these attributes as a starting point, to what extent can they be applied to peer review? Despite the increase in research on peer review (e.g. the focus of the Peer Review Congress), we don’t yet know. The evidence about the efficacy of peer review is still largely anecdotal (like mine above) or is based on surveys of opinions or on experiments conducted by publishers that are generally not rigorously designed and where the data are not available for independent scrutiny. We also have little evidence that peer review actually improves an article, let alone ensures it has any sort of reliability. 

What the Nuffield Report does show, however, is that to the survey participants quality was as much about integrity of process as it was about the impact of any output. If we apply this criterion to peer review, we need to be looking at the process, practice and responsibilities of publishers, institutions (including funders), learned societies, academies and researchers in authority (for example PIs in the training or modelling they provide to their students), as well as those of authors, editors and reviewers. It should also include an assessment of whether peer review does what we think it is doing, and whether it is done ethically and inclusively.

Putting peer review to the test

Peer review is just another method, within a discipline called scholarly communication. Like other research methods, we need to know what’s involved and explore when it works and when it doesn’t. We need to test it. We also need to make that research, alongside the relevant policies, practices and data, openly available for independent analysis and reuse. And we need to share failure as well as celebrate success so that others can adopt what works and ditch what doesn’t. This is asking no more than what we currently ask of researchers themselves.

The Nuffield Report included transparency as one aspect of quality, but I am not arguing that everything about peer review should be open. Full transparency is not inherently good; it can neither represent truth nor substitute for accountability. There are limits to transparency, as Ananny and Crawford (2016) have described, and these need to be applied sensitively and intelligently to the development of a more open research culture, including that of peer review.

Transparency can be harmful, for example, if it reveals patient identity or the locations of endangered species. But it can also be harmful if it limits honest conversation. Revealing the identity of peer reviewers represents one such tension if reviewers feel they then can’t be honest about their criticisms for fear of retaliation. These fears – whether perceived or real – and a culture of scholarship that engenders such fear have to be addressed as we start to really explore what’s possible and what works in peer review, given the tools and technology now available.

If peer review works at all, it should at least pick up any fundamental flaws or biases in a study. These criticisms should be objectively arrived at and indisputable. Much of peer review, however, is subjective – it is both an art and a science – one that relies on human interactions, insight, experience and expertise as well as common values around the purpose of peer review. The politics at play and an understanding of values and basic human relations, within the context in which they are acting, are therefore also important variables to be explored.

The Nuffield report also cites four other components of quality that emerged from their survey – collaboration, multidisciplinarity, openness and creativity. And peer review is one area of scholarly research where collaboration and the collective expertise of the arts, social sciences and humanities can creatively combine with those of the physical, biomedical and life sciences to fuel rigor and innovation. My hunch is that there are many more similarities than differences between disciplines (for example the need for replicability does not just apply to biomedicine [Peels and Bouter, 2018; Peels 2019]).

But we have yet to work out what effective peer review looks like.

Fortunately, there is an increasing awareness of the need for research about research, such as that done at the Meta-Research Innovation Center at Stanford (METRICS). There are also projects such as PEERE, funded by the European Union, which has been researching the efficiency, transparency and accountability of peer review. Part of the project included a collaboration with publishers to share data on peer review (Squazzoni et al., 2017). Importantly, the data-sharing protocol from that collaboration includes information about the peer review of rejected as well as accepted papers, something that is largely missing from the research on peer review to date.

Outcomes from that collaboration have shown that more reputable authors are less likely to be rejected by editors on the basis of negative reviews (Bravo et al., 2018) but that publishing peer review reports does not significantly affect reviewer behaviour if their identity is not revealed (Bravo et al., 2019). TRANSPOSE (Ross-Hellauer et al., 2018) and FAIRsharing are other initiatives aimed at making the policies of publishers about open peer review or data sharing more transparent.

At Hindawi, we have started to make our policies transparent (e.g. through the FAIRsharing and TRANSPOSE projects); all of our published research papers include a data availability statement; and we encourage authors to post preprints and to make their structured methods independently available for scrutiny through platforms such as protocols.io. And, as a company, we are building community-driven software for the submission and peer review process that is entirely open source, enabling users of our systems to benefit from shared knowledge and expertise.

We continue to look at how we can further improve our peer review processes in a way that benefits our authors and adds value to the research community as a whole. Look out for future announcements.



References

Ananny, Mike, and Kate Crawford. ‘Seeing without Knowing: Limitations of the Transparency Ideal and Its Application to Algorithmic Accountability’. New Media & Society, 13 December 2016, 1461444816676645. https://doi.org/10.1177/1461444816676645.

Bravo, Giangiacomo, Francisco Grimaldo, Emilia López-Iñesta, Bahar Mehmani, and Flaminio Squazzoni. ‘The Effect of Publishing Peer Review Reports on Referee Behavior in Five Scholarly Journals’. Nature Communications 10, no. 1 (18 January 2019): 1–8. https://doi.org/10.1038/s41467-018-08250-2.

Bravo, Giangiacomo, Mike Farjam, Francisco Grimaldo Moreno, Aliaksandr Birukou, and Flaminio Squazzoni. ‘Hidden Connections: Network Effects on Editorial Decisions in Four Computer Science Journals’. Journal of Informetrics 12, no. 1 (1 February 2018): 101–12. https://doi.org/10.1016/j.joi.2017.12.002.

Fyfe, Aileen, Flaminio Squazzoni, Didier Torny, and Pierpaolo Dondio. ‘Managing the Growth of Peer Review at the Royal Society Journals, 1865-1965’. Science, Technology, & Human Values, 15 July 2019, 0162243919862868. https://doi.org/10.1177/0162243919862868.

Jones, Richard, and James Wilsdon. ‘The Biomedical Bubble: Why UK Research and Innovation Needs a Greater Diversity of Priorities, Politics, Places and People’. Nesta, July 2018. https://www.nesta.org.uk/report/biomedical-bubble/.

Joynson, Catherine, and Ottoline Leyser. ‘The Culture of Scientific Research’. F1000Research 4 (13 March 2015). https://doi.org/10.12688/f1000research.6163.1.

Moore, Samuel, Cameron Neylon, Martin Paul Eve, Daniel Paul O’Donnell, and Damian Pattinson. ‘“Excellence R Us”: University Research and the Fetishisation of Excellence’. Palgrave Communications 3 (19 January 2017): 16105. https://doi.org/10.1057/palcomms.2016.105.

Moxham, Noah, and Aileen Fyfe. ‘The Royal Society and the Prehistory of Peer Review, 1665–1965’. The Historical Journal 61, no. 4 (December 2018): 863–89. https://doi.org/10.1017/S0018246X17000334.

Peels, Rik, and Lex Bouter. ‘The Possibility and Desirability of Replication in the Humanities’. Palgrave Communications 4, no. 1 (7 August 2018): 1–4. https://doi.org/10.1057/s41599-018-0149-x.

Peels, Rik. ‘Replicability and Replication in the Humanities’. Research Integrity and Peer Review 4, no. 1 (9 January 2019): 2. https://doi.org/10.1186/s41073-018-0060-4.

Ross-Hellauer, Tony, Samantha Hindle, Gary McDowell, and Jessica Polka. ‘Guest Post: Help TRANSPOSE Bring Journal Policies into the Open’. The Scholarly Kitchen (blog), 1 November 2018. https://scholarlykitchen.sspnet.org/2018/11/01/guest-post-help-transpose-bring-journal-policies-into-the-open/.

Squazzoni, Flaminio, Francisco Grimaldo, and Ana Marušić. ‘Publishing: Journals Could Share Peer-Review Data’. Nature 546, no. 7658 (June 2017): 352–352. https://doi.org/10.1038/546352a.


Catriona MacCallum is Director of Open Science at Hindawi. This blog post is distributed under the Creative Commons Attribution License (CC-BY). The illustration is by Hindawi and is also CC-BY.