
Cheating or chatting: is ChatGPT a threat to education?

Julien Grimaud
Assistant Professor of Life Sciences at Sup'Biotech
Pavla Debeljak
Assistant Professor of Bioinformatics at Sup'Biotech
Frank Yates
Director of Research at Sup'Biotech Engineering School
Key takeaways
  • ChatGPT is a chatbot – a computer program designed to simulate conversation with a human – that produces convincing, natural-sounding text.
  • Educators are therefore concerned about the risks of students using chatbots, who may ask ChatGPT to write their essays, for example.
  • Tools exist to identify whether a text has been written by a chatbot or not, but it is currently impossible to be 100% sure.
  • To identify whether a text has been generated by an AI, it is possible to track down strange wording, unnatural syntax, or instances of plagiarism.
  • With the right guidance, chatbots can nevertheless become powerful allies for teaching and studying, but also for the professional world.

Commonly used in customer service and marketing, as well as for gaming and education, chatbots have been around for decades[1,2]. The first ever chatbot, ELIZA, developed in the 1960s at MIT's Artificial Intelligence Laboratory, was designed to simulate a psychotherapist, using natural language processing to respond to user input. Sixty years on, chatbots are becoming increasingly sophisticated, using AI to understand user input and provide more natural and intelligent conversations. As technology continues to progress, chatbots are likely to become even more advanced, allowing for even more natural and personalised conversations, and to be used in a variety of industries, from healthcare to finance[3].

ChatGPT, released to the public on November 30th, 2022, is a chatbot – a computer program designed to simulate conversation with a human – developed by the San Francisco-based company OpenAI. As its name suggests, it relies on GPT (Generative Pre-trained Transformer), a type of artificial intelligence (AI) model trained on a large amount of text data and used to generate new text in response to users' prompts. ChatGPT has become popular because of its ability to generate convincing and engaging text from natural language queries, which has made it a useful and user-friendly tool for tasks like content creation, automated customer support, and natural language processing[4]. As such, educators are questioning whether the use of chatbots by students poses a risk. Moreover, just a few days ago OpenAI released GPT-4, the successor to the model behind ChatGPT. It remains to be seen how much more advanced this new version is than the previous one.

Could students use chatbots in a malicious way?

While cheating is an age-old problem in education[5], AI-based chatbots represent a new route for those willing to cheat by asking questions about assignments or tests. For example, instead of using the reading material provided by the professor, a student might use a chatbot to ask for help with a math problem or to get the answer to a multiple-choice question. Note that this is similar to typing a question into a search engine like Google or Bing (which may soon incorporate ChatGPT[6]). Whether this rather mundane action is considered cheating is up to the teacher.

While cheating is an age-old problem in education, AI-based chatbots represent a new route for those willing to cheat.

Furthermore, some chatbots are specialised in solving certain types of problems. DeepL Translate, for instance, is an online AI-based language translation service, which allows users to translate text, websites, and documents into different languages with high accuracy and speed. Other chatbots specialise in writing computer code, including Codebots and Autocode. While these chatbots were initially designed to assist well-intentioned users with tedious or repetitive tasks, they have the potential to be diverted from their original purpose by students intent on cheating.

Besides answering short questions, pre-trained AI can be used to generate essays with a semblance of erudition. Paraphrasing tools such as Quillbot, Paperpal, or WordAI have already been available for several years and can convincingly turn a poorly written manuscript into a decent academic paper, or indeed alter an original text to escape plagiarism detection. More concerning, however, is the ability of some chatbots to generate lengthy, human-looking essays in seconds, in response to a short prompt.

In ChatGPT, students can very simply adjust various parameters, such as the length of the bot's response, the level of randomness added to the essay, or the AI model variant used by the chatbot. The essay thus generated can then be used as is, or as a starting point that the student can further edit. With this approach, students can easily produce a solid essay in a matter of minutes. By providing a chatbot with the same prompt multiple times, the software will generate multiple versions of the same essay (see Figure 1). This allows students to select the version which best suits their needs, or even copy and paste sections from different versions to create a unique essay. When this method is applied, it is currently impossible to verify with 100% accuracy that the essay was entirely written by a chatbot.

Figure 1. Asking ChatGPT about the theory of evolution. We asked ChatGPT to write a paragraph about the theory of evolution multiple times. For the first three queries, our question was the same – ChatGPT answered slightly differently each time. For the fourth, we also asked the bot to formulate its answer in a way that would be suitable for an expert in the field – which shows the extent of language proficiency attainable by the software.
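In practice, tuning these parameters and requesting several versions of the same essay takes only a few lines of code. Here is a minimal sketch using OpenAI's Python library (its early-2023 ChatCompletion interface); the API key is a placeholder, and the prompt and parameter values are purely illustrative, not a recipe:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; a real key is needed

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the model variant
    messages=[{"role": "user",
               "content": "Write a paragraph about the theory of evolution."}],
    max_tokens=250,    # caps the length of the response
    temperature=0.9,   # higher values add more randomness
    n=3,               # ask for three alternative versions at once
)

for i, choice in enumerate(response.choices, start=1):
    print(f"--- Version {i} ---")
    print(choice.message.content)

Requesting several completions in one call (n=3) reproduces the strategy described above: the user picks, or mixes, whichever version reads best.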

What are the concerns?

Chatbots make it easy for students to plagiarise without even realising it, as they might take the answer generated by a chatbot and submit it as their own work without citing the bot's sources. This type of plagiarism is especially difficult to detect because many chatbots add randomness to their models. Also, while a chatbot may create novel sentences or paragraphs, it can still provide users with ideas and phrases that are close to its original training corpus. It is therefore crucial that users take steps to ensure that they are not plagiarising when using a chatbot. Given that some chatbots specialise in finding references[7], we may soon see text-writing chatbots using referencing chatbots to source their essays!

Unlike humans, chatbots are limited in their ability to understand the context of a conversation, so they can provide incorrect answers to questions or give misleading information. Chatbots may also exhibit a wide range of biases. For example, a chatbot might use language in a way that reinforces stereotypes or gender roles, or it may provide incorrect information about stigmatised or controversial topics[8,9,10]. Microsoft's Tay chatbot, released in 2016, was an artificial intelligence project created to interact with people on Twitter. It was designed to learn from conversations with real people and become smarter over time. Less than a day after its release, Tay was taken offline after it began making controversial and offensive statements[11].

Image generated with DALL-E (OpenAI) using the prompt “An oil painting of a classroom of student robots with a professor in the style of Henri Rovel” © OpenAI.

A particularly distressing concern lies in the possibility that the use of chatbots could lead to a lack of critical thinking skills. As chatbots become more advanced, they may be able to provide students with answers to their questions without requiring them to think for themselves. This could turn students into passive learners, which would not only harm their educational development but could also lead to a decrease in creativity.

Should educators be concerned?

Chatbots may seem new and exciting, but the technology itself has been around for decades. Chances are that you read AI-generated text on a regular basis without knowing it. News agencies such as the Associated Press or the Washington Post, for instance, use chatbots to generate short news articles. While the Associated Press turned to a commercially available solution, Wordsmith, in 2014[12], the Washington Post has been using its own in-house chatbot, Heliograf, since at least 2017[13].

The quality of the answers provided by chatbots has substantially increased in the past few years, and AI-generated texts, even in academic settings, are now difficult to differentiate from human-written texts[14]. Indeed, although frowned upon by the scientific community, ChatGPT has been (albeit provocatively) cited as a full-fledged author on some scientific papers[15].

News agencies use chatbots to generate short news articles.

Also, while chatbots can (and will[16,17]) be used to cheat, they are just one more tool in the student's toolbox. Even without considering the recently gained popularity of ChatGPT, there are several ways students can cheat on their homework, such as copying answers from classmates, looking up and plagiarising answers from online resources, or even hiring someone to do the work for them. In other words: where there is a will to cheat, there is a way.

How can educators act?

One of the very first steps educators may take against the malicious use of chatbots is to adopt new regulations, whether as a course policy or, even better, at school level[18]. Updating the standards of conduct would certainly increase students' and educators' awareness of the issue. It may also discourage many students from trying to cheat, for fear of the consequences. However, it would hardly solve the problem in its entirety.

How about changing the way we test students? One could imagine new, creative types of assignments that may not be easily solved by chatbots. While tempting, this solution raises two issues. On the one hand, AI-based technologies, especially chatbots, are a flourishing field, so a teacher's efforts to adapt their assignments may well be undone by the next chatbot software update. On the other hand, forms of questioning that would be considered “chatbot friendly”, such as written essays and quizzes, are invaluable tools for educators to test skills like comprehension, analysis, or synthesis[19]. New, innovative questioning strategies are always welcome, but they should not be the only solution.

Another solution, yet to be fully explored, is statistical watermarking[20]. Statistical watermarking is a type of digital watermarking technique used to embed a hidden message or data within a digital signal. In the case of chatbots, the watermark would be a set of non-random probabilities of picking certain words or phrases, designed to be undetectable to the human eye, yet still recognisable by computers. Statistical watermarking could thus be used to detect chatbot-generated text.

Statistical watermarking is a type of digital watermarking technique used to embed a hidden message or data within a digital signal.
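As a toy illustration of the principle – loosely inspired by the proposal in Aaronson's lecture[20], not any vendor's actual scheme – the generator could secretly bias its word choices towards a keyed, pseudorandom “green” subset of the vocabulary, and a detector holding the same key could later count how often a text lands in that subset. A minimal Python sketch, with an entirely hypothetical key and simple word-level tokens:

import hashlib
import random

SECRET_KEY = "example-key"  # hypothetical key held by the chatbot vendor

def green_set(prev_word, vocab, fraction=0.5):
    # Pseudorandomly pick a keyed "green" half of the vocabulary,
    # seeded by the previous word, so the split looks random to a reader.
    seed = int(hashlib.sha256((SECRET_KEY + prev_word).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * fraction)))

def green_fraction(text, vocab):
    # Detection: the share of words that fall in their green set.
    # Unwatermarked text should hover near the baseline fraction (0.5);
    # text from a generator biased towards green words will score higher.
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    hits = sum(1 for prev, cur in pairs if cur in green_set(prev, vocab))
    return hits / max(len(pairs), 1)

Because the bias is spread thinly across many word choices, no single sentence gives the watermark away – which is also why, as noted below, heavy editing or very short texts defeat it.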

However, this approach has various drawbacks that severely limit its usage in the classroom. For instance, tech companies may be reluctant to implement statistical watermarking, because of the reputational and legal risks if their chatbot were associated with reprehensible actions such as terrorism or cyberbullying. In addition, statistical watermarking works only if the cheating student copy-pastes a large portion of text. If they edit the chatbot-generated essay, or if the text is too short to run a statistical analysis, then watermarking is useless.

How to detect AI-generated text?

One way to detect AI-generated text is to look for unnatural or awkward phrasing and syntax. AI algorithms are generally limited in their ability to express ideas naturally, so their generated text may contain sentences that are overly long or too short. Additionally, chatbots may lack a natural flow of ideas, and use words or phrases in inappropriate contexts. In other words, their generated content may lack the depth and nuance of human-generated text[21]. This is especially true for long essays. Another concern we raised earlier regarding chatbots was the risk of plagiarism. As such, a simple way to detect AI-generated text is to look for the presence of such plagiarism[22]. Plagiarism-detecting engines are readily available.

In addition, people can detect AI-generated text by looking for the presence of a “statistical signature”. On a basic level, chatbots are all designed to perform one task: they predict the words or phrases that are most likely to follow a user's given prompt. Therefore, at each position within the text, the words or phrases picked by the chatbot are very likely to be there. This is different from humans, who write answers and essays based on their cognitive abilities rather than probability charts, and may hence create uncommon word associations that still make sense. Put simply, a human's answer to a given question should be less predictable, or more creative, than a chatbot's.
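This intuition can be made concrete by asking an openly available language model how predictable each word of a text is. The sketch below – our own illustration, assuming the open-source GPT-2 model and the Hugging Face transformers library are installed – computes, for every position, the rank of the word that actually appears within the model's predicted next-word distribution (rank 1 being the model's top choice):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text):
    # For each position, rank of the token that actually appears next
    # within GPT-2's predicted distribution (rank 1 = top choice).
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)
    ranks = []
    for pos in range(ids.size(1) - 1):
        order = logits[0, pos].argsort(descending=True)
        rank = (order == ids[0, pos + 1]).nonzero().item() + 1
        ranks.append(rank)
    return ranks

Texts dominated by top-ranked words (rank numbers close to 1) carry a chatbot's signature; a greater share of low-ranked words (large rank numbers) points to a human writer.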

This difference in statistical signature may be used to detect whether a sequence of words is more predictable (a statistical signature of chatbots) or more creative (hence likely human). Some programs already exist, such as the Giant Language model Test Room (GLTR), developed jointly by MIT and Harvard University using a previous version of OpenAI's language model, GPT-2. We tested GLTR with short essays either written by some of our own students or generated by ChatGPT. We are happy to report that our students' answers were easily distinguishable from the chatbot's (see box below)!

Since GLTR, other AI-detecting programs have emerged, such as OpenAI-Detector, a program released shortly after GLTR and based on similar principles, or GPTZero, a commercial venture initially created by a college student in 2023. Soon, we hope to see the emergence of new tools to detect chatbot-generated text, more tailored to the needs of educators, similar to readily available plagiarism detection engines.

To cheat or to chat?

To end on a positive note, let's not forget that most students willingly complete their assignments without cheating. The first preventive action should be to motivate students by explaining why the knowledge and skills taught during the course are important, useful, and interesting[23]. Calculators did not put math teachers out of a job. Google did not cause schools to shut down. Likewise, we believe that educators will certainly adapt to chatbots which, despite the legitimate concerns they raise, may soon prove invaluable in many ways. With the proper framework and guidance, chatbots can become powerful teaching and studying assistants, as well as valuable tools for businesses.

As such, educators should take the initiative to familiarise their students with chatbots, help them understand the potential and limits of this technology, and teach them how to use chatbots in an efficient, yet responsible and ethical, way.

A statistical signature could be used to detect chatbot-generated essays.

The experiment: As part of a neuroscience course given at Sup'Biotech in Fall 2022, we gathered the written answers of 51 students to the following question: “Briefly define the term ‘receptive field’, then explain how you would measure the receptive field of a neuron in the somatosensory cortex of a cat.” The question was part of a take-home, open-book, timed quiz taken on the course website. In parallel, we asked ChatGPT to answer the same question 10 times, to obtain 10 different chatbot answers. We used GLTR to compare the statistical signatures of the students' and the chatbot's answers.

How GLTR works: For each position in the text, GLTR looks at which words a chatbot (specifically GPT-2, an older version of the model behind ChatGPT) would have picked, before comparing them to the actual word. For example, in the text “Biology is great!”, the word “great” is ranked 126th among all possible words that the chatbot could have chosen (the top chatbot choice being “a”). GLTR then generates a histogram of all rankings, which may be used as a simple form of statistical signature: GPT-2-generated texts will be dominated by highly ranked words (the model's top choices), while human-written texts will contain a greater proportion of low-ranked, less predictable words.
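A minimal sketch of this aggregation step, reusing the token_ranks helper from the earlier sketch and bucketing ranks the way GLTR colours them (our reading of its top-10/top-100/top-1,000 buckets):

from collections import Counter

def rank_histogram(ranks):
    # Bucket token ranks the way GLTR colours them:
    # top 10 (green), top 100 (yellow), top 1,000 (red), beyond (purple).
    buckets = Counter()
    for r in ranks:
        if r <= 10:
            buckets["top-10"] += 1
        elif r <= 100:
            buckets["top-100"] += 1
        elif r <= 1000:
            buckets["top-1000"] += 1
        else:
            buckets["beyond"] += 1
    return buckets

# A histogram dominated by the "top-10" bucket carries the statistical
# signature of a model; a heavier tail suggests a human writer.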

Panel A: Two example answers, one from an actual student, the other from ChatGPT. The texts are coloured based on GLTR ranking. The histograms on the right show their statistical signatures. Note that the human response contains more low rankings than the chatbot's.

Panel B: We overlaid the histograms obtained from all 51 students' answers and all 10 chatbot answers (in blue and red, respectively). Again, we notice a clear difference between the human and ChatGPT texts. In other words, based on visual inspection of the statistical signatures, we are quite confident that our students did not use ChatGPT to answer the question.

References

1. Ina. The History Of Chatbots – From ELIZA to ChatGPT. In Onlim.com. Published 03-15-2022. Retrieved 01-19-2023.
2. Thorbecke C. Chatbots: A long and complicated history. In CNN Business. Published 08-20-2022. Retrieved 01-19-2023.
3. Marr B. What Does ChatGPT Really Mean For Businesses? In Forbes. Published 12-28-2022. Retrieved 01-19-2023.
4. Timothy M. 11 Things You Can Do With ChatGPT. In MakeUseOf.com. Published 12-20-2022. Retrieved 01-19-2023.
5. Bushway A, Nash WR (1977). School Cheating Behavior. Review of Educational Research, 47(4), 623–632.
6. Holmes A. Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google. In The Information. Published 01-03-2023. Retrieved 01-19-2023.
7. Vincze J (2017). Virtual Reference Librarians (Chatbots). Library Hi Tech News, 34(4), 5–8.
8. Feine J et al. (2020). Gender Bias in Chatbot Design. Conversations 2019. Lecture Notes in Computer Science, vol 11970. Springer, Cham.
9. Haroun O. Racist Chatbots & Sexist Robo-Recruiters: Decoding Algorithmic Bias. In The AI Journal. Published 10-11-2023. Retrieved 01-19-2023.
10. Biddle S. The Internet's New Favorite AI Proposes Torturing Iranians and Surveilling Mosques. In The Intercept. Published 12-08-2022. Retrieved 01-19-2023.
11. Vincent J. Twitter taught Microsoft's AI chatbot to be a racist asshole in less than a day. In The Verge. Published 03-24-2016. Retrieved 01-19-2023.
12. Miller R. AP's ‘robot journalists’ are writing their own stories now. In The Verge. Posted 01-29-2015. Retrieved 01-19-2023.
13. Moses L. The Washington Post's robot reporter has published 850 articles in the past year. In Digiday.com. Posted 09-14-2017. Retrieved 01-19-2023.
14. Else H (2023). Abstracts written by ChatGPT fool scientists. Nature, 613(7944), 423.
15. Stokel-Walker C (2023). ChatGPT listed as author on research papers: many scientists disapprove. Nature (retrieved online ahead of print on 01-23-2023).
16. Gordon B. North Carolina Professors Catch Students Cheating With ChatGPT. In Government Technology. Published 01-12-2023. Retrieved 01-19-2023.
17. Nolan B. Two professors who say they caught students cheating on essays with ChatGPT explain why AI plagiarism can be hard to prove. In Insider. Published 01-14-2023. Retrieved 01-19-2023.
18. Johnson A. ChatGPT In Schools: Here's Where It's Banned—And How It Could Potentially Help Students. In Forbes. Published 01-18-2023. Retrieved 01-19-2023.
19. Krathwohl DR (2002). A revision of Bloom's taxonomy: An overview. Theory into Practice, 41(4), 212–218.
20. Aaronson S. My AI Safety Lecture for UT Effective Altruism. In Shtetl-Optimized, The Blog of Scott Aaronson. Posted 11-29-2022. Retrieved 01-19-2023.
21. Bogost I. ChatGPT Is Dumber Than You Think. In The Atlantic. Published 12-07-2022. Retrieved 01-19-2023.
22. Mollenkamp D. Can Anti-Plagiarism Tools Detect When AI Chatbots Write Student Essays? In EdSurge. Published 12-21-2022. Retrieved 01-19-2023.
23. Shrestha G (2020). Importance of Motivation in Education. International Journal of Science and Research, 9(3), 91–93.
