Monday 7 September 2015

Preprints in science

I've recently been having a discussion with a colleague on Twitter about preprints in science, and thought it would be good to open the discussion here. For those of you who don't know, a preprint is a version of your manuscript that you make public before it goes to peer review. The most common and well-known way of doing this is to post the manuscript on the arXiv, where most physicists post their papers before submitting to a journal. Some journals don't allow this because they consider a preprint to be prior publication, but most permit it and in some cases even encourage it, as people can comment on the paper before it goes to peer review.

Despite its prominence in physics, biology and other sciences (I'm thinking of palaeontology of course) have been particularly slow on the uptake, for reasons I've never fully understood. My colleague shares that hesitancy and is not convinced by the preprint process. On one hand, I understand some of the hesitations he mentioned. Preprints are essentially the first submission to a journal, fraught with errors (mostly just typos, but sometimes scientific errors as well), which could cause miscommunication to the public or to other scientists when errors get propagated in the literature or media. Also, preprints are not peer reviewed. While I agree that peer review has its issues, I do still believe that it has a place in academia and is important. Articles posted as preprints have not been properly peer reviewed, which could lead to fringe ideas or studies that have not been properly scrutinised being read and cited. I understand both of these concerns, but I think that in the case of preprints, they are not particularly valid for several reasons.

1. The arXiv has been working in physics for over 20 years, since before the World Wide Web was in widespread use. Physicists have had a lot of time to make it work the way they want it to. Once an article is posted on the arXiv, that article has priority. This means that if two (or say 10) groups around the world are working on the same problem, whoever gets it onto the arXiv first gets priority and is recognised as being first, without having to wait for months for it to go through peer review. Of course there will be typos and errors, but the general theme of the paper and description of the experiment or study is still the same. If you're working in a field where other people are working on similar things, this is essential. Why should your work be held up and you be prevented from having priority just because the journal you submitted to is slower? Or a reviewer is away at a conference or on field work? Or a reviewer is one of the other people working on this and wants to slow you down, knowing they have a paper in review? It takes out those human aspects of the system that slow it down. Having gone through this before, I would much rather have had the uncorrected manuscript of my first paper published immediately, rather than waiting the excruciatingly stressful year from acceptance to publication (plus the 6 months before that when it was in review, etc.), waiting for someone else to publish something similar. Preprints allow you to get priority there and then.

2. We know that work in preprints is not peer reviewed or edited, and some people have concerns that any incorrect information will be propagated. First of all, we are scientists; we are not idiots. We know that preprints are not peer reviewed, and for this reason, when reading a preprint, you should not expect it to be perfect, and should take that into account. When reading a preprint on the arXiv, readers are cautious. Papers from the arXiv can be cited, but physicists won't cite specific details or quote lines from a paper on the arXiv. They will, however, cite general ideas of the paper - "Smith et al. (2015) were the first to do an experiment using X, Y, and Z." - because peer reviewed or not, they WERE first.

3. This point is related to the first two. I mentioned earlier that some journals (in physics) actually prefer and actively encourage publication on the arXiv before submitting to the journal. The reasons for this are simple - authors are essentially getting unsolicited peer review, and allowing their paper to be publicly scrutinised long before publication. For a journal, this means that if they've received a paper to review, and find that it's already made it to the press, and been heavily scrutinised by the relevant parties in the world BEFORE even going to peer review, and all the talk is positive, it's a shoo-in for publication. Job done. Or, imagine getting a paper to peer review and finding that it has already been cited. Congrats, your paper gets a gold star. A recent example of this happened a few weeks ago when a big physics paper was posted on the arXiv. I won't go into details because I'm not a physicist (and probably most of you aren't either), but essentially it solved a massive problem in quantum photonics by doing something that had never been managed before. It was such a big story that within days of the preprint going up, it had been covered by New Scientist, Nature, PhysOrg, Science, etc., without even being officially peer reviewed. And of course what did this mean? It meant that groups around the world who work in this field immediately hunkered down for the day, dissecting the paper and experiment, and coming to the conclusion that the authors had done it right. This paper is likely going to be published in Nature or something very similar, and it's already got a huge stamp of approval on it. We're talking Nature, guys. It's not just small journals that take papers that have been posted as preprints - Nature actually prefers physics papers that have been on the arXiv. Less work for them!

4. Finally, for those interested in open access, this means that regardless of where your manuscript ends up in the end, your research is accessible to everyone. You don't need a university ID or have to pay for papers that are preprints. While you could publish your paper in Nature and have the joy of a big fancy Nature paper, anyone in the world can still read your research and see what you did from the preprint.

So what are the downsides? What are the concerns that people have?

  • Someone could steal my work - well actually no, they couldn't, because the preprint would mean that you had priority. If it becomes a big thing in biology, everyone would see you posted it first. No one can take it from you.
  • But what about all the mistakes? It's not peer reviewed - I think I've gone through this pretty well, but essentially, it's more about the big picture than the nitty gritty details. 
  • Why would a journal publish something already online? - because it gives them free, easy, and quick peer review, and a good idea of what the community thinks about the paper. 
  • How does it benefit me? Why not just wait for the final paper instead of an unformatted manuscript? - because this is fast. Peer review can be so slow. And is subject to people with agendas. Your paper shouldn't be delayed because someone doesn't like you or is doing something similar. You should still get priority if you did it first.
  • But how do I know what happened to the preprint afterwards? - on the arXiv, there is a system that lets authors upload additional versions and updates. Once a paper is listed, you can upload new versions when they come back from review, just as manuscript files. This way anyone can see exactly what has changed between versions throughout peer review, and then at the end, a link to the final published paper is given. It's an easy way to track exactly what the authors changed from one version to the next, so you know if it was a major experimental problem, or just some typos.
  • If there are other concerns, please let me know!

This discussion actually started from my colleague being unhappy that his uncorrected, unformatted accepted manuscript had been posted by the journal, rather than waiting 2 weeks for the final to come out. Of course this is a bit different, because it had already been through peer review and was accepted, but I think that the general topic of speed is still relevant here. If you are 100% positive that no one in the world has looked at the same thing as you and won't come out with the same paper in the next 2 weeks, then that's fine. But imagine if someone else out there is working on it? Wouldn't you want it out there ASAP regardless of a few typos?

As far as I can see, there are no downsides to preprints, once the community accepts them. In biology, we are still woefully behind in this regard, in that many journals do not accept manuscripts that have been posted as preprints, and the current nomenclature codes may prevent things like species being named in preprints. But this can change, and it is slowly starting to. PeerJ now offers a preprint server, which people are starting to submit to.

Please please please comment if you have anything related to this to discuss. I'm curious about what other people think about preprints. I know the view of physicists and some biologists, but I'm interested in other views. What do you think about preprints or uncorrected/unformatted proofs?

Thanks Josh - Obviously, I'm not a physicist, but my partner Josh is. Through years of him posting papers on the arXiv I've gained perhaps a better understanding of how it works than many non-physicists, and thanks to him for answering all my questions and pointing me to specific articles I would find useful in this discussion.

EDIT: Previously this post suggested that the journal Science did not allow for their papers to be published elsewhere as preprints. As you can see from the comments below, that was previously their policy, but is not anymore. Papers in Science can indeed appear as preprints elsewhere.