Data manipulation – The Publication Plan
A central online news resource for professionals involved in the development of medical publications, publication planning, and medical writing. https://thepublicationplan.com

Safeguarding scientific image quality and integrity: what more can be done?
https://thepublicationplan.com/2025/10/29/safeguarding-scientific-image-quality-and-integrity-what-more-can-be-done/
Wed, 29 Oct 2025

KEY TAKEAWAYS

  • Scientific image editing serves a vital role in clear communication, but seeking presentation clarity must not compromise data integrity.
  • Combatting image manipulation requires systematic collaboration across the research ecosystem, including standardised guidelines and new verification technologies.

As concerns mount over image manipulation in scientific publishing, the research community has begun developing new strategies to balance visual clarity with data integrity. Writing in Nature, Sara Reardon explores the “fine line between clarifying and manipulating”, highlighting the challenge of making figures both accessible and faithful to original data.

The art and science of visual presentation

Scientific images often require editing for clarity, like adjusting brightness, adding scale bars, or enhancing contrast. While such modifications are essential for effective scientific communication, a 2021 study by Helena Jambor and colleagues revealed that poorly presented figures remain surprisingly common, suggesting researchers need better training in visual data presentation.

When enhancement becomes manipulation

The boundary between legitimate clarification and misconduct can be perilously thin. Science integrity consultant Elisabeth Bik warns that even minor edits – such as cloning image sections to cover dust particles – can undermine data credibility. Echoing a seminal 2004 article, Bik emphasises that “the images are the data”, meaning they should present the results actually observed rather than those the researchers expected. Any undisclosed alteration that changes the scientific message could constitute misconduct. As Reardon notes, the cardinal rule remains to “show your work” – enhancing clarity without obscuring underlying data.

“The boundary between legitimate clarification and misconduct can be perilously thin… the cardinal rule remains to ‘show your work’ – enhancing clarity without obscuring underlying data.”

Detection and prevention strategies

Phill Jones examines potential systemic solutions to what Bik calls science’s “nasty Photoshop problem” in The Scholarly Kitchen. Journals increasingly conduct pre-publication screening using image-integrity specialists or AI tools that have demonstrated substantial promise in identifying manipulated images. Guidelines such as those from the International Association of Scientific, Technical & Medical Publishers aim to standardise best practice, while individual journals are also establishing specific image integrity requirements. Beyond journals:

  • Institutions are urged to provide training and embed image integrity expectations into research culture.
  • Post-publication peer-review platforms also play a role in identifying problematic images after publication.

Looking ahead, technical innovations offer promise. Jones highlights developments such as encrypted hashes and digital ‘signatures’ embedded in images, akin to secure web certificates, that could enable reliable verification of image authenticity. Ongoing collaboration and systematic change across the research ecosystem will be required to ensure scientific images are both clear and credible.
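The safeguards Jones describes rest on cryptographic fingerprinting. As a minimal sketch of the general idea (a plain file hash, not the specific signature scheme discussed in the article), any pixel-level edit to an image changes its digest:

```python
import hashlib

def image_fingerprint(path: str) -> str:
    """Return the SHA-256 digest of an image file's raw bytes.

    Any edit to the file (cloning out dust, contrast tweaks, cropping)
    produces a different digest, so a digest recorded when the image
    was first captured can later be used to verify its integrity.
    """
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large microscopy files don't need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            sha.update(chunk)
    return sha.hexdigest()
```

A real verification scheme would also need the digest to be signed at capture time (hence the article’s analogy to secure web certificates), since an author could otherwise simply rehash an edited file.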

—————————————————

Are current image integrity detection tools sufficient to prevent manipulation in scientific publishing?

Will a cross-publisher integrity hub aid the battle against fake research?
https://thepublicationplan.com/2023/01/27/will-a-cross-publisher-integrity-hub-aid-the-battle-against-fake-research/
Fri, 27 Jan 2023

KEY TAKEAWAYS

  • Publishers and analytics providers are collaborating with the International Association of Scientific, Technical and Medical Publishers in the development of an online integrity hub.
  • Online tools within the hub will scan manuscripts for image alterations and indicators of paper mill submissions.

Falsified research from paper mills – companies that generate manuscripts based on fabricated data – has led to an increased number of retractions from journals, and is a growing challenge for publishers. In a recent Nature News article, Holly Else reported that new software solutions are now being tested that may detect paper mill activity and image manipulation in submitted manuscripts.

The International Association of Scientific, Technical and Medical Publishers (STM), in a joint effort with publishers and scholarly analytics providers, is developing common standards for software tools, which will form part of an STM Integrity Hub. The hub will contain three online tools to detect the following publication ethics violations:

  • submissions from paper mills, based on ~70 indicators
  • duplicate submission (manuscript submission to multiple publishers)
  • image manipulation (potentially fabricated figures).

The Nature News article outlined how large publishers such as Elsevier, Taylor & Francis, and Frontiers are currently testing two of these tools, to help address these important issues.

“The problem is significant not just because of volume, but also because there are different types of paper mill, and they are all highly adaptive.”
– Sabina Alam, Director of Publishing Ethics and Integrity, Taylor & Francis, UK

It is hoped that the first two screening tools will be more widely available early in 2023. To complement the availability of this new technology, STM and the Committee on Publication Ethics (COPE) also plan to issue guidance on handling research integrity breaches, thus further empowering publishers in their fight against fake science.

—————————————————–

What do you think – will standardised tools help combat fake or duplicated manuscript submissions?

Spotting fake images in scientific research: insights from science integrity consultant Elisabeth Bik
https://thepublicationplan.com/2022/11/29/spotting-fake-images-in-scientific-research-insights-from-science-integrity-consultant-elisabeth-bik/
Tue, 29 Nov 2022

Many of us will be familiar with the concept of plagiarised text as a form of misconduct within scientific literature, but perhaps a lesser-known problem, and one which most of us would find much harder to spot, is the publication of manipulated images. Elisabeth Bik is a science integrity consultant who has been described as a super-spotter or image sleuth due to her unique talent for identifying scientific photos that have been tampered with. Elisabeth strives to tackle the issue of scientific misconduct and has a blog dedicated to the topic of science integrity. To date, her scientific detective skills have led to 951 retractions, 122 expressions of concern, and 956 corrections. The Publication Plan spoke to Elisabeth to find out more about her work.

Could you tell us how and why you became involved in investigating fraudulent scientific work and how you discovered your talent for spotting duplicated/manipulated images?

“In 2013 I heard about plagiarism so I took a sentence that I had written and put it into Google Scholar to see if anybody had used my text. I had not expected any results, but by chance the sentence that I had picked randomly had been stolen by somebody else, so I found a paper that had plagiarised my text, and that of many others. I subsequently kept on finding more and more papers that had plagiarised other people’s work. I worked on that for about a year whilst I was working full-time at Stanford, so it was a kind of weekend project. Then in around 2014 I came across a PhD thesis, not one that had stolen my work but one that had plagiarised text, and one that also contained images – western blots. A couple of the figures had panels that had been reused, so the same panel had been used to represent different experiments. The panel had a very distinctive shape and so I realised that I had some talent for spotting these things, and started searching for other papers with similar image issues.”

What do you look for when analysing images, and what are the most common issues you encounter?

“I look for photos specifically because they contain a lot of information, much more than a line graph.”

“I look for photos specifically because they contain a lot of information, much more than a line graph. A line graph could be duplicated but it is very hard to remember, as it’s just a line. Whereas there are features in photos that you can remember at least for a short period, so I compare photos within scientific papers. Because I mainly focus on photos of blots or gels, or microscopy photos of tissues and cells, those are typically the types of images where I find issues, but sometimes I work on photos of plants or mice, visible objects that don’t require a microscope. Occasionally I will find a plot that has been duplicated but as I said plots are hard to find so I don’t focus on those. I look for duplications. There are three main duplication problems: two panels that have been duplicated; two panels that have been duplicated and shifted so that they sort of overlap; and duplication of elements within a photo, for example a group of cells might be visible multiple times. Occasionally I will also find evidence suggestive of tampering with a photo, for example you might see a different background around one particular band in a gel, which indicates that it did not originate from that photo. This example is not a duplication but a sign of potential tampering – that parts of the photo came from somewhere else.”

How common and widespread is the problem of duplicated/manipulated images within the scientific literature and what are the potential consequences of such images going unidentified?

“Duplications are found in around 4% of papers that contain at least one photo. This finding is based on a systematic search I performed for papers that contain the term ‘western blot’ to enrich for papers with molecular biology photos or other figures. In the resulting set of papers, I scanned 20,000, and I found around 800 to contain duplications, so that’s 4% of papers. Those contained one of the three types of duplication I listed, which could result from an honest error or could have been intentionally duplicated with an intention to mislead the reader. The first case, an honest error in a photo, is usually not a big problem. In my opinion it should be corrected, but we all make errors in papers, and so that’s the least concerning. But when images are duplicated with overlaps, or are rotated or stretched, or contain duplicated elements within the same photo, that’s clearly a manipulation of the data. To me those are visible signs of manipulation which cast doubt over all the data in that paper, because if one image has been potentially tampered with or manipulated then so might have other types of data, which are much harder to catch. For example, you cannot really see if values in a table have been fabricated or manipulated so it makes the whole paper less reliable and maybe also other works by those same authors. In some cases, images are manipulated to make the data look better. If a photo contains duplicated elements, then you can’t even be sure that the experiment happened and what the results were. Duplications within the same photo are very suggestive of an intention to mislead and that the results were not obtained as they have been presented. Such fraud in my opinion goes against everything that science should be – science should be about finding the truth and fraud is the opposite of that.”

“Fraud in my opinion goes against everything that science should be – science should be about finding the truth and fraud is the opposite of that.”

What proportion of questionable images do you think could result from honest error and how many are likely to be deliberate acts of misconduct?

“In the study I referred to previously, where I found 800 of 20,000 papers to contain duplicated figures, we estimated that about half of the duplications were deliberate. It is sometimes difficult to know whether a duplication is deliberate in an individual paper, but because we had 800, that was our best guess. It was based on there being roughly an equal distribution of papers over the three duplication categories, so 30% in each category. Since overlapping images could result from honest error, we estimated that about half of the 800 papers had deliberately duplicated or manipulated photos, so 2% of papers overall. Of course the real percentage of manipulation might be much higher because at least photos leave traces if you manipulate them, but as I said, manipulation in other types of data, such as tables or line graphs is much harder to detect so the real percentage of papers with misconduct might be much higher than 2%.”

What systems do journals have in place, if any, to identify problematic images before publication and what are the limitations of these systems?

“Some journals scan all incoming papers for image duplications and others have traditionally hired people like me who can spot these duplications, to scan all their accepted papers for image problems. This might only take a couple of minutes per paper so it’s really not a huge time investment if you know what to look for. After I raised my concerns about 4% of papers having image problems, some other journals upped their game and have hired people to look for these things. This is still mainly being done I believe by humans, but there is now software on the market that is being tested by some publishers to screen all incoming manuscripts. The software will search for duplications but can also search for duplicated elements of photos against a database of many papers, so it’s not just screening within a paper or across two papers or so, but it is working with a database to potentially find many more examples of duplications. I believe one of the software packages that is being tested is Proofig. I have never worked with this software so I don’t know exactly what it does or how good it is, but I would love to test it. Although there have been situations where an editor has informed me that Proofig didn’t find any evidence of a duplication or any evidence of tampering with an image in which I can clearly see a problem. So I think there is a danger if an editor doesn’t really know how to use the software or just blindly relies on the software’s verdict.”

What kind of response do you tend to get from journal editors when you report a potential issue in one of the papers they have published? Your work has resulted in numerous retractions and corrections – is that a common result when you notify a journal of an issue?

“In the past no response was common – I would just not hear anything. Nowadays I specifically write in my email that I keep track of which journals respond to my message, so I usually receive a notification or acknowledgement of receipt or something like that, but then very often I still hear nothing. I reported that initial set of 800 papers in which I found problems to the journals in roughly 2015, and kept track of what happened – two-thirds of those papers have not been retracted after 5 years, some are still being retracted so the number is steadily going down, but around 60% of papers have not been addressed. For the more current papers that I’ve reported, that number is slightly better with half not being addressed after waiting a year or two, but the majority are still not addressed. I get an acknowledgement of receipt but then it seems that nothing happens. When an issue is addressed, the two most common outcomes are a correction or a retraction, which each account for roughly half of cases. There is also a tool called expression of concern, which is very rarely used but I feel should be used more because it provides a very fast way for an editor to flag that they have been alerted to a big problem with the paper and are investigating it, so readers know to proceed with caution if they read that paper. As mentioned, corrections and retractions are the most common outcomes but they are only used in about 40 to 50% of cases – for the majority there is still no outcome after waiting a couple of years.

“Corrections and retractions are the most common outcomes but they are only used in about 40 to 50% of cases – for the majority there is still no outcome after waiting a couple of years.”

But I do feel that the situation is improving, maybe my work has finally earnt some acknowledgement that I’m signalling for positive reasons, not out of malice. In the past I have felt I’ve been ignored a little bit more and I go to social media sometimes too to vent about the lack of response from journals, which I feel has helped so the numbers are getting better but I feel that journals can still do a much better job.”

How important do you think websites such as PubPeer, Retraction Watch and your own blog, Science Integrity Digest, are in creating transparency and raising awareness of possible flawed research? Does the creation of such sites indicate an increasing problem or a greater awareness of the need to check the integrity of science?

“I don’t want to talk about my own blog too much, but I do feel that PubPeer and Retraction Watch have played a huge role in openness about problems in papers. There is no other good website where you can report problems. You may try writing privately to a journal, or sometimes there are comments sections in journals, but very often these comments disappear after a while or they never come out of moderation. I feel PubPeer does a really good job in alerting people that there might be a problem with a paper and it’s the only platform that I know of that we can use. Retraction Watch offers a glimpse of what happens once a paper gets retracted because they provide the background to a retraction. In many cases a retraction notice is very vague, simply stating that the authors or editors decided to retract the paper because of a problem without indicating what the problem was, which is not fair for the reader because parts of the paper may still be good. We want to know why the paper was retracted and what the specific problem was. Retraction Watch go into a little bit more detail, they interview people – the scientists, the authors, the editors – and ask them for their side of the story. Sometimes you learn that a retraction was actually a very good thing because an author found, for example a big problem with their paper due to a mistake in a formula, so they did the right thing in retracting their own paper. To hear people talk about why they retracted a paper is very useful and gives you a lot more information. I feel both Retraction Watch and PubPeer create transparency as a lot of these cases are otherwise hidden by the journals or institutions.

As to whether it is an increasing problem, I do believe it is for several reasons. First, papers are getting more and more complex, which provides more opportunities to fake data. Digital photography also means it is much easier to digitally alter a photo than it used to be – when I did my PhD you would still bring your gel to the photographer; there was no digital photography and subsequent Photoshopping. Another reason is the increasing pressure to publish. Certain countries have really increased their pressure to publish and made it mandatory to publish, for example, a paper when you finish your Master’s degree, or to publish multiple papers when you finish your PhD, or in medical school you need to publish a paper to get a promotion. China in particular has issued a lot of these mandatory publication demands. In some cases they are impossible to fulfil as people do not have the time to do the research, but of course they still want to get a promotion or a position at a hospital so they might just buy a paper. Therefore, there is this whole growing market of paper mills, which are companies that mass produce papers. There are different models but they basically sell fake papers to authors who need them, which was not a problem that existed 20 years ago. If you look at papers from 30 years ago I’m sure there was fraud, but those papers usually only contained one figure and one table, so there were fewer opportunities to commit fraud compared with papers today that have 6 to 8 figures and additional supplementary figures. Although I feel that this is an increasing problem, I believe that there is also a greater awareness of the issue.”

What more could be done to improve research integrity within the scientific literature? How do you think the research integrity landscape will have changed in 5 years?

 

“I hope there is more emphasis on reproducibility in the future because I feel reproducibility is the only way for us to know that an experiment has really been performed and yielded the reported results.”

“I hope there is more emphasis on reproducibility in the future because I feel reproducibility is the only way for us to know that an experiment has really been performed and yielded the reported results. I hope we have less emphasis on output – measuring a scientist’s output by measuring numbers of papers or impact factor – to remove some of that pressure and instead reward reproducibility. Reproducing a study may not be novel and of course there is not a lot of funding for it, but I feel it gives so much more validity to a study than trying to do something new. Pre-registration of clinical trials is a wonderful thing as it requires people to publish their results even if they are negative, which I feel might result in less cheating. I’m also very worried about artificial intelligence (AI) and its potential to create fake papers and images. We’ve seen several examples of what technology can do right now, if you think about dinosaurs in movies, they look more and more real every year, so I think in the next 5 years AI is going to be a huge problem for scientific publishing, because it might generate fake photos, data and text. Distinguishing what is real and what is fake, which may be impossible in 5 years from now, will be a problem for journalists too. We need to think about how we can prove that images, photos or other data are real. The obvious errors that we currently use to determine that a paper is probably faked can be overcome by a very smart fraudster – they can make their images look very realistic and AI is going to help them tremendously, so I’m very worried about that. I’m not quite sure if we can safeguard the integrity of science with the ever-increasing amount of pressure that we put on scientists and the advantages that digital photography and AI can offer fraudsters and so I’m a bit pessimistic there, but I hope we have more funding to look into solutions, technical solutions for that. 
Some of that is solvable – we can maybe look at original images, and ways of proving that they really came from a microscope for example, and were not generated by AI. I’m not quite sure how, that goes beyond my technical comprehension of the issue, but there are hopefully ways to solve that.”

Elisabeth Bik is a science integrity consultant. You can contact Elisabeth via LinkedIn.

—————————————————–

What do you think should be done to combat the issue of fraudulent images?

How to assess the credibility of clinical trial findings?
https://thepublicationplan.com/2022/04/07/how-to-assess-the-credibility-of-clinical-trial-findings/
Thu, 07 Apr 2022

KEY TAKEAWAYS

  • Cochrane provides recommendations on how to assess the trustworthiness of clinical trial data and tackle suspected misconduct.
  • Prof. Lisa Bero calls on journals, publishers, and research institutions to introduce routine data checks on manuscripts to help detect fraud.

Accurate reporting of clinical trials is essential to further our understanding of diseases and evaluate the efficacy and safety of treatments. In a recent World View article, published in Nature, Professor Lisa Bero discusses the approaches used by Cochrane reviewers to assess the trustworthiness of clinical trial findings and tackle suspected data accuracy issues.

Prof. Bero believes that fraudulent studies are widespread in the scientific literature. The problems may not always be intentional and can be linked to:

  • inaccurate description of how interventions were administered
  • use of inappropriate statistical analyses
  • reporting fabricated data
  • false representation of real data.

Cochrane provides tools to help their reviewers detect potential fraud, and templates for asking journals for investigations and retractions. The checks that the reviewers are encouraged to undertake to identify problematic studies include:

  • looking for evidence of prospective clinical trial registration and ethical approval
  • considering the plausibility of baseline and outcome data
  • watching out for overlapping text and other inconsistencies across the article
  • consulting platforms for post-publication peer review, such as PubPeer.

When reviewers find a problem, they are advised to request additional information from the authors and, if the response is not sufficiently reassuring, to contact the journal editor. The journal can then launch its own investigation to decide whether the article should be retracted.

Detecting and removing fake studies from the literature requires coordinated efforts from all parties involved in the publication pipeline.

Prof. Bero does not agree with calls from some reviewers to exclude studies from certain countries or those that have not been prospectively registered. She points out that such measures would reduce global patient representation, and that trial registration does not ensure proper study conduct. Furthermore, prospective registration is uncommon for observational studies.

Prof. Bero emphasises that while the risks of mislabelling legitimate research as fraudulent cannot be ignored, detecting and removing fake studies from the literature is important and requires coordinated efforts from all parties involved in the publication pipeline. The article concludes with a call for research institutions, journals, and publishers to implement routine data checks on manuscripts and share information and technical resources to help identify anomalies.

—————————————————–

What do you think – should research institutions, journals, and publishers implement routine fraud-detection checks on manuscripts?

The elimination of negative study results: reporting and citation bias
https://thepublicationplan.com/2018/10/18/the-elimination-of-negative-study-results-reporting-and-citation-bias/
Thu, 18 Oct 2018

According to a recent study published in Psychological Medicine, reporting and citation biases are eliminating negative study results from the scientific literature. The authors of the research assembled information on 105 trials of antidepressants that had been registered with the FDA and identified the cumulative effect of four specific publishing biases:

  • Study publication bias: The results of half of the trials examined were considered positive by the FDA and 98% of these were published, while the other half were considered negative or questionable and only 48% of these were published.
  • Outcome reporting bias: The authors considered that 10 of the published negative trials were reported as ‘positive’ within the publication, by switching the status of the primary and secondary outcomes or failing to include less favourable data.
  • Spin: Eleven of the remaining published negative trials were written using language the authors felt made the negative results appear positive, such as ‘a trend for efficacy’.
  • Citation bias: Positive trials and those including spin in the abstract were cited more frequently than negative trials.

Interestingly, the authors noted that all of the negative trials that remained unpublished were completed before 2004. Therefore, the more recent requirement for trials to be prospectively registered and to report results following completion may be helping to prevent study publication bias. However, a review conducted by Aaron Carroll in The New York Times delved further into the issue, highlighting the serious effects of various cumulative biases, including how they can skew the results of meta-analyses, a tool critical for evidence-based decision-making. Carroll calls for the scientific community to encourage journals to publish negative results and asks us to “celebrate and elevate negative results, in both our arguments and reporting, as we do positive ones”.


——————————————————–

Summary by Louise Niven, DPhil from Aspire Scientific

——————————————————–

With thanks to our sponsors, Aspire Scientific Ltd and NetworkPharma Ltd


An algorithm to police figure duplication?
https://thepublicationplan.com/2018/04/05/an-algorithm-to-police-figure-duplication/
Thu, 05 Apr 2018

Inappropriate figure duplication in publications is a surprisingly prevalent form of scientific misconduct. Perhaps the most infamous example in recent years was the fraudulent duplication of figure regions in the ‘STAP (stimulus-triggered acquisition of pluripotency)’ cell paper, a story that made headlines worldwide and contributed to the retraction of the paper in question. But how can such malpractice be effectively policed? Some journals manually screen images in submitted manuscripts — a laborious and time-consuming task. However, this process could potentially be automated.

A new study, published on the bioRxiv pre-print server, uses an algorithm to seek out duplicated figure regions, even after manipulation. The authors, Acuna et al, analysed 2 million figures from 760,000 open-access articles. Potential instances of duplication identified by the machine learning algorithm were then reviewed manually by the authors. The authors estimated that 9% of figure duplication was 'suspicious', while 0.6% could be considered fraudulent. Crucially, a substantial share (43%) of inappropriate figure re-use occurred across different articles.
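To illustrate the general idea, one simple way to flag near-identical figure regions is perceptual hashing, which binarises each tile against its own mean intensity so that uniform brightness or contrast edits do not change the fingerprint. This is only a minimal sketch of the technique class; the bioRxiv study's actual pipeline is considerably more sophisticated, and the function names below are illustrative.

```python
def average_hash(tile):
    """Binarise a grayscale tile (a list of pixel rows) against its mean intensity."""
    pixels = [p for row in tile for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p >= mean else 0 for p in pixels]

def hamming(hash_a, hash_b):
    """Count differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

def looks_duplicated(tile_a, tile_b, max_bits=2):
    """Flag tiles whose hashes nearly match, even after a uniform brightness shift."""
    return hamming(average_hash(tile_a), average_hash(tile_b)) <= max_bits
```

Because adding a constant to every pixel shifts the mean by the same amount, a brightness-adjusted copy of a tile produces an identical hash, which is exactly the kind of "manipulated duplicate" a journal screen would want to catch.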

Such technology could offer a more streamlined, rapid and accurate approach to figure screening by journals and aid scientific integrity. Publishers, however, would need to ensure a unified approach to successfully eliminate figure duplication across the literature.

——————————————————–

Summary by Emma Prest PhD from Aspire Scientific


]]>
https://thepublicationplan.com/2018/04/05/an-algorithm-to-police-figure-duplication/feed/ 0 4966
Focusing on big data: is seeing believing? https://thepublicationplan.com/2018/04/03/focusing-on-big-data-is-seeing-believing/ https://thepublicationplan.com/2018/04/03/focusing-on-big-data-is-seeing-believing/#respond Tue, 03 Apr 2018 16:58:19 +0000 https://thepublicationplan.com/?p=4957

Both honest mistakes and the deliberate manipulation of data can affect the quality of published research. In a recent Forbes article, Kalev Leetaru delves into how bad data practice impacts scientific publishing.

The research and publishing communities prioritise new discoveries, but this can come at the expense of full data documentation and validation. Leetaru suggests that this is particularly true in the age of 'big data', where large datasets can be misunderstood in the race to a breakthrough. He classifies bad data practice under five broad themes and suggests possible solutions:

  • Honest statistical/computing error. Even a simple calculation error in a spreadsheet can drastically alter the understanding of a particular dataset. Statistical review processes may identify such errors, but only full disclosure of raw data, software and workflows can ensure they become known.
  • Honest misunderstanding of data. This can include a failure to understand the limitations of particular data sources, such as solely utilising English language Western-origin news sources to study global trends. The conclusions being drawn from such data may be statistically sound, yet largely irrelevant to the question being posed.
  • Honest misapplication of methods. Powerful statistical and analytical software packages may be freely available, but if used by researchers unfamiliar with their applications and limitations, the output may be unreliable. Only full documentation of the specific tools, algorithms and parameters can allow such errors to be identified.
  • Honest failure to normalise. This can be an issue in media analyses; for example, reporting changes in the number of news articles published on a specific topic over time is meaningless without also reporting changes in the total number of published articles over the same time period.
  • Malicious manipulation. Image doctoring and deliberate data falsification are two particularly egregious examples of alleged data fraud, and highlight the need for journals to be more vigilant for the possibility of manipulated data.
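The normalisation point above can be made concrete: a raw count of topic-specific articles can rise simply because total output rises, so the meaningful quantity is the topic's share of all articles. A minimal sketch, using hypothetical counts purely for illustration:

```python
def normalised_share(topic_counts, total_counts):
    """Express topic coverage as a fraction of all articles published each year."""
    return {year: topic_counts[year] / total_counts[year] for year in topic_counts}

# Hypothetical data: articles on a topic doubled, but total output quadrupled.
topic = {2010: 100, 2018: 200}
total = {2010: 10_000, 2018: 40_000}
shares = normalised_share(topic, total)
```

Here the raw count doubles between the two years, yet the normalised share falls from 0.01 to 0.005 — the opposite trend to the one the unnormalised figures would suggest.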

Leetaru notes that errors can also propagate through scientific publishing as authors copy and paste incorrect information from one paper into another. As poor data practice can result in the broad acceptance of questionable conclusions as fact, Leetaru appeals to journals to take action and adopt dedicated data review processes to eliminate these (mostly unintentional) errors.

——————————————————–

Summary by Julia Draper, DPhil

Julia Draper is a biomedical researcher and freelance writer. Her postdoctoral research background is in leukaemia biology and developmental haematopoiesis. Julia is open to being contacted regarding career opportunities in medical communications at julia.draper@gmail.com.


]]>
https://thepublicationplan.com/2018/04/03/focusing-on-big-data-is-seeing-believing/feed/ 0 4957