I was reviewing some originality reports generated by Turnitin for first-year undergraduate work today, and it struck me that the database has grown to such an extent that I no longer trust it. I thought that this would never be the case. I assumed that as the database of harvested web pages and student papers grew, the accuracy of detection would increase. Bigger is better, right? I didn’t think about noise, though. There is some filtering in the Turnitin system, but increasingly it seems it isn’t good enough.
An example: one student has a list of six different sources, all matching small parts of student essays deposited as ‘reference material’ on courseworkinfo.com and related websites, all owned by the same company and in all likelihood sharing the same database. Do I believe that this student copied from this site? I’m not sure. Do I believe that each of the essays on the site may have originally been copied from Wikipedia? More likely. Do I believe that all content on Wikipedia is original? Not really.
Recycling and repurposing of text online is becoming so ubiquitous that the noise is causing a problem in the interpretation of originality reports. They used to save us time in investigating cases of plagiarism; now I’m not so sure.