
Cheating or chatting: is ChatGPT a threat to education?

Julien Grimaud
Assistant Professor of Life Sciences at Sup’Biotech
Pavla Debeljak
Assistant Professor of Bioinformatics at Sup'Biotech
Frank Yates
Director of Research at Sup’Biotech Engineering School
Key takeaways
  • ChatGPT is a chatbot, i.e. a computer program designed to simulate a conversation with a human, which produces convincing and natural texts.
  • Educators are therefore concerned about the risks of students using chatbots, for example asking ChatGPT to write their essays.
  • Tools exist to identify whether a text has been written by a chatbot or not, but it is currently impossible to be 100% sure.
  • To identify whether a text has been generated by an AI, one can look for strange wording, unnatural syntax, or instances of plagiarism.
  • With the right guidance, chatbots can nevertheless become powerful allies for teaching and studying, as well as for the professional world.

Commonly used in customer service and marketing, as well as for gaming and education, chatbots have been around for decades [1,2]. The first ever chatbot, ELIZA, developed in the 1960s at MIT's Artificial Intelligence Laboratory, was designed to simulate a psychotherapist, using natural language processing to respond to user input. Sixty years on, chatbots have become increasingly sophisticated, using AI to understand user input and thus provide more natural and intelligent conversations. As the technology continues to progress, chatbots are likely to become even more advanced, allowing for ever more natural and personalised conversations across a variety of industries, from healthcare to finance [3].

ChatGPT, released to the public on November 30th 2022, is a chatbot (a computer program designed to simulate conversation with a human) developed by the San Francisco-based company OpenAI. As its name suggests, it relies on GPT (Generative Pre-trained Transformer), a type of artificial intelligence (AI) model trained on a large amount of text data and used to generate new text in response to users' prompts. ChatGPT has become popular because of its ability to generate convincing and engaging text from natural language queries, which has made it a useful and user-friendly tool for tasks like content creation, automated customer support, and natural language processing [4]. As such, educators are questioning whether the use of chatbots by students is a risk. Moreover, just a few days ago OpenAI released GPT-4, the successor to the model that powers ChatGPT. It remains to be seen how much more advanced this new version is than the previous one.

Could students use chatbots in a malicious way? 

While cheating is an age-old problem in education [5], AI-based chatbots offer a new route for those willing to cheat by asking them questions about assignments or tests. For example, instead of using the reading material provided by the professor, a student might ask a chatbot for help with a math problem, or for the answer to a multiple-choice question. Note that this is similar to typing a question into a search engine like Google or Bing (which may soon incorporate ChatGPT [6]). Whether this rather mundane action is considered cheating is up to the teacher.


Furthermore, some chatbots even specialise in solving certain types of problems. DeepL Translate, for instance, is an online AI-based translation service that allows users to translate text, websites, and documents into different languages with high accuracy and speed. Other chatbots specialise in writing computer code, including Codebots and Autocode. While these chatbots were initially designed to help well-intentioned users with tedious or repetitive tasks, they can be diverted from their original purpose by students willing to cheat.

Besides answering short questions, pre-trained AI can be used to generate essays with a semblance of erudition. Paraphrasing tools such as Quillbot, Paperpal, or WordAI have been available for several years and can convincingly turn a poorly written manuscript into a decent academic paper, or indeed rewrite an original text to escape plagiarism detection. More concerning, however, is the ability of some chatbots to generate lengthy, human-looking essays in seconds, in response to a short prompt.

In ChatGPT, students can very simply adjust various parameters, such as the length of the bot's response, the level of randomness added to the essay, or the AI model variant used by the chatbot. The essay thus generated can be used as is, or as a starting point that the student then edits further. With this approach, students can easily produce a solid essay in a matter of minutes. By providing a chatbot with the same prompt multiple times, they can obtain multiple versions of the same essay (see Figure 1). This allows students to select the version which best suits their needs, or even copy and paste sections from different versions to create a unique essay. When this method is applied, it is currently impossible to verify with 100% accuracy that the essay was entirely written by a chatbot.
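To illustrate how these knobs map onto the underlying service, here is a minimal sketch using the openai Python package in its early-2023 (pre-v1) interface; the model name, prompt, token limit, and temperature are arbitrary examples rather than settings used in this article, and an API key is assumed to be available in the OPENAI_API_KEY environment variable.

```python
# Minimal sketch: requesting several variants of the same essay prompt.
# Assumes the openai Python package (pre-v1 interface, early 2023) and an
# API key stored in the OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = "Write a short paragraph explaining the theory of evolution."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",   # AI model variant
    messages=[{"role": "user", "content": prompt}],
    max_tokens=300,          # caps the length of each answer
    temperature=0.9,         # higher values add more randomness
    n=3,                     # ask for three versions of the same essay
)

for i, choice in enumerate(response["choices"], start=1):
    print(f"--- Version {i} ---")
    print(choice["message"]["content"].strip())
```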

Figure 1: Asking ChatGPT about the theory of evolution. We asked ChatGPT to write a paragraph about the theory of evolution multiple times. On the first three queries, our question was the same; ChatGPT answered slightly differently each time. For the fourth, we also asked the bot to formulate its answer in a way that would be suitable for an expert in the field, which shows the extent of language proficiency attainable by the software.

What are the concerns?

Chatbots make it easy for students to plagiarise without even realising it, as they might take the answer generated by a chatbot and submit it as their own work without citing the bot's sources. This type of plagiarism is especially difficult to detect because many chatbots add randomness to their models. Also, while a chatbot may create novel sentences or paragraphs, it can still provide users with ideas and phrases that are close to its original training corpus. It is therefore crucial that users take steps to ensure they are not plagiarising when using a chatbot. Given that some chatbots specialise in finding references [7], we may soon see text-writing chatbots using referencing chatbots to source their essays!

Unlike humans, chatbots are limited in their ability to understand the context of a conversation, so they can provide incorrect answers to questions or give misleading information. Chatbots may also show a wide range of biases. For example, a chatbot might use language in a way that reinforces stereotypes or gender roles, or it may provide incorrect information about stigmatised or controversial topics [8,9,10]. Tay, an artificial intelligence chatbot released by Microsoft in 2016, was created to interact with people on Twitter. It was designed to learn from conversations with real people and become smarter over time. Less than a day after its release, Tay was taken offline after it began making controversial and offensive statements [11].

Image generated with DALL-E (OpenAI) using the prompt "An oil painting of a classroom of student robots with a professor in the style of Henri Rovel" © OpenAI.

A particularly distressing concern lies in the possibility that the use of chatbots could lead to a lack of critical thinking skills. As chatbots become more advanced, they may be able to provide students with answers to their questions without requiring them to think for themselves. Students could thus become passive learners, which would harm their educational development and could also lead to a decrease in creativity.

Should educators be concerned?

Chatbots may seem new and exciting, but the technology itself has been around for decades. Chances are that you read AI-generated text on a regular basis without knowing it. News organisations such as the Associated Press or the Washington Post, for instance, use chatbots to generate short news articles. While the Associated Press turned to a commercially available solution, Wordsmith, in 2014 [12], the Washington Post has been using its own in-house chatbot, Heliograf, since at least 2017 [13].

The quality of the answers provided by chatbots has increased substantially in the past few years, and AI-generated texts, even in academic settings, are now difficult to differentiate from human-written texts [14]. Indeed, although frowned upon by the scientific community, ChatGPT has (albeit provocatively) been listed as a full-fledged author on some scientific papers [15].


Also, while chatbots can (and will [16,17]) be used to cheat, they are just one more tool in the student's belt. Even without considering the recently gained popularity of ChatGPT, there are several ways students can cheat on their homework, such as copying answers from classmates, looking up and plagiarising answers from online resources, or even hiring someone to do the work for them. In other words: where there is a will to cheat, there is a way.

How can educators act? 

One of the very first steps educators may take against the malicious use of chatbots is to adopt new regulations, whether as a course policy or, even better, at school level [18]. Updating the standards of conduct would certainly raise students' and educators' awareness of the issue. It may also discourage many students from trying to cheat, for fear of the consequences. However, it would hardly solve the problem in its entirety.

How about changing the way we test students? One could imagine new, creative types of assignments that cannot easily be solved by chatbots. While tempting, this solution poses two problems. On the one hand, AI-based technologies, especially chatbots, are a flourishing field, so a teacher's efforts to adapt their assignments may very well be undone by the next chatbot software update. On the other hand, forms of questioning that would be considered "chatbot friendly", such as written essays and quizzes, are invaluable tools for educators to test skills like comprehension, analysis, or synthesis [19]. New, innovative questioning strategies are always welcome, but they should not be the only solution.

Another solution yet to be explored is statistical watermarking [20]. Statistical watermarking is a digital watermarking technique used to embed a hidden message or data within a digital signal. In the case of chatbots, the watermark would be a set of non-random probabilities for picking certain words or phrases, designed to be undetectable to the human eye yet still recognisable by computers. Statistical watermarking could thus be used to detect chatbot-generated text.
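To make the idea concrete, here is a toy sketch of the principle, not any vendor's actual scheme: a secret key pseudo-randomly marks half the vocabulary as "green", a watermarked generator is nudged towards green words, and a detector simply counts how often green words appear. The key, the function names, and the 50/50 split are illustrative assumptions; real proposals re-partition the vocabulary at every position based on the preceding words.

```python
# Toy illustration of statistical watermarking (deliberately simplified).
# A secret key marks half of all words as "green"; a watermarked generator
# prefers green words, so a long text with far more than ~50% green words
# is statistical evidence of machine generation.
import hashlib

SECRET_KEY = "demo-key"  # hypothetical key shared by generator and detector

def is_green(word: str) -> bool:
    """Pseudo-randomly assign each word to the 'green' half of the vocabulary."""
    digest = hashlib.sha256((SECRET_KEY + word.lower()).encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    """Fraction of words in `text` that fall in the green list."""
    words = [w.strip(".,;:!?\"'") for w in text.split()]
    words = [w for w in words if w]
    return sum(is_green(w) for w in words) / max(len(words), 1)

# Ordinary human text should land near 0.5; heavily watermarked output
# would sit well above that on a sufficiently long passage.
print(round(green_fraction("The theory of evolution explains how species change over time."), 2))
```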


However, this approach has various drawbacks that severely limit its use in the classroom. For instance, tech companies may be reluctant to implement statistical watermarking because of the reputational and legal risks if their chatbot were associated with reprehensible actions such as terrorism or cyberbullying. In addition, statistical watermarking works only if the cheating student copy-pastes a large portion of text. If they edit the chatbot-generated essay, or if the text is too short to run a statistical analysis, then watermarking is useless.

How to detect AI-generated text? 

One way to detect AI-generated text is to look for unnatural or awkward phrasing and syntax. AI algorithms are generally limited in their ability to express ideas naturally, so the text they generate may contain sentences that are overly long or too short. Additionally, chatbots may lack a natural flow of ideas and use words or phrases in inappropriate contexts. In other words, their generated content may lack the depth and nuance of human-written text [21]. This is especially true for long essays. Another concern we raised earlier regarding chatbots was the risk of plagiarism. As such, a simple way to detect AI-generated text is to look for the presence of such plagiarism [22]. Plagiarism-detection engines are readily available.

In addition, people can detect AI-generated text by looking for the presence of a "statistical signature". On a basic level, chatbots are all designed to perform one task: they predict the words or phrases that are most likely to follow a user's given prompt. Therefore, at each position within the text, the words or phrases picked by the chatbot are very likely to be there. This is different from humans, who write answers and essays based on their cognitive abilities rather than probability charts, and hence may create uncommon word associations that still make sense. Put simply, a human's answer to a given question should be less predictable, or more creative, than a chatbot's.

This difference in statistical signature may be used to detect whether a sequence of words is more predictable (a statistical signature of chatbots) or more creative (hence likely human). Some programs already exist, such as the Giant Language model Test Room (GLTR), developed jointly by MIT and Harvard University using an earlier version of OpenAI's language model, GPT-2. We tested GLTR with short essays either written by some of our own students or generated by ChatGPT. We are happy to report that our students' answers were easily distinguishable from the chatbot's (see box below)!

Since GLTR, other AI-detecting programs have emerged, such as OpenAI-Detector, a program released shortly after GLTR and based on similar principles, or GPTZero, a commercial venture initially created by a college student in 2023. Soon, we hope to see the emergence of new tools for detecting chatbot-generated text, more tailored to the needs of educators and similar to the readily available plagiarism detection engines.

To cheat or to chat?

To end on a positive note, let's not forget that most students willingly complete their assignments without cheating. The first preventive action should be to motivate students by explaining why the knowledge and skills taught during the course are important, useful, and interesting [23]. Calculators did not put math teachers out of a job. Google did not cause schools to shut down. Likewise, we believe that educators will adapt to chatbots which, despite the legitimate concerns they raise, may soon prove invaluable in many ways. With the proper framework and guidance, chatbots can become powerful teaching and studying assistants, as well as useful tools for the professional world.

As such, educators should take the initiative to familiarise their students with chatbots, help them understand the potential and limits of this technology, and teach them how to use chatbots in an efficient, yet responsible and ethical, way.

A statistical signature could be used to detect chatbot-generated essays.

The experiment: As part of a neuroscience course given at Sup'Biotech in Fall 2022, we gathered the written answers of 51 students to the following question: "Briefly define the term 'receptive field', then explain how you would measure the receptive field of a neuron in the somatosensory cortex of a cat." The question was part of a take-home, open-book, timed quiz taken on the course website. In parallel, we asked ChatGPT to answer the same question 10 times, to obtain 10 different chatbot answers. We used GLTR to compare the statistical signatures of the students' and chatbot's answers.

How GLTR works: For each position in the text, GLTR looks at what a chatbot (specifically GPT-2, an earlier OpenAI language model) would have picked, and compares it to the actual word. For example, in the text "Biology is great!", the word "great" is ranked 126th among all possible words the chatbot could have chosen (the top chatbot choice being "a"). GLTR then generates a histogram of all rankings, which can be used as a simple form of statistical signature: GPT-2-generated texts will be dominated by top-ranked words (the model's favourite choices), while human texts will contain a greater proportion of lower-ranked, less expected words.
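For readers who want to try this themselves, here is a minimal sketch of a GLTR-style rank computation, assuming the Hugging Face transformers library and the publicly available GPT-2 checkpoint; GLTR's own implementation differs in its details, and this only illustrates the underlying idea.

```python
# GLTR-style check: for each token, how highly did GPT-2 rank the word that
# actually appears, given the text before it? Rank 1 means "the model's top
# choice". Machine-generated text tends to be dominated by small ranks;
# human text contains more large (less predictable) ranks.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_ranks(text: str) -> list:
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]   # (sequence_length, vocab_size)
    ranks = []
    for pos in range(1, len(ids)):
        scores = logits[pos - 1]                     # predictions from the prefix
        actual = ids[pos]
        rank = int((scores > scores[actual]).sum().item()) + 1
        ranks.append(rank)
    return ranks

# A surprising word (like "great" in the example above) gets a large rank number.
print(token_ranks("Biology is great!"))
```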

Panel A: Two example answers, one from an actual student, the other from ChatGPT. The texts are coloured based on GLTR ranking. The histograms on the right show their statistical signatures. Note that the human response contains more low-ranked (less predictable) words than the chatbot's.

Panel B: We overlaid the histograms obtained from all 51 student answers and all 10 chatbot answers (in blue and red, respectively). Again, we notice a clear difference between the human and ChatGPT texts. In other words, based on visual inspection of the statistical signatures, we are quite confident that our students did not use ChatGPT to answer the question.

1. Ina. The History Of Chatbots – From ELIZA to ChatGPT. In Onlim.com. Published 03-15-2022. Retrieved 01-19-2023.
2. Thorbecke C. Chatbots: A long and complicated history. In CNN Business. Published 08-20-2022. Retrieved 01-19-2023.
3. Marr B. What Does ChatGPT Really Mean For Businesses? In Forbes. Published 12-28-2022. Retrieved 01-19-2023.
4. Timothy M. 11 Things You Can Do With ChatGPT. In MakeUseOf.com. Published 12-20-2022. Retrieved 01-19-2023.
5. Bushway A, Nash WR (1977). School Cheating Behavior. Review of Educational Research, 47(4), 623–632.
6. Holmes A. Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google. In The Information. Published 01-03-2023. Retrieved 01-19-2023.
7. Vincze J (2017). Virtual Reference Librarians (Chatbots). Library Hi Tech News, 34(4), 5–8.
8. Feine J et al. (2020). Gender Bias in Chatbot Design. Conversations 2019. Lecture Notes in Computer Science, vol 11970. Springer, Cham.
9. Haroun O. Racist Chatbots & Sexist Robo-Recruiters: Decoding Algorithmic Bias. In The AI Journal. Published 10-11-2023. Retrieved 01-19-2023.
10. Biddle S. The Internet's New Favorite AI Proposes Torturing Iranians and Surveilling Mosques. In The Intercept. Published 12-08-2022. Retrieved 01-19-2023.
11. Vincent J. Twitter taught Microsoft's AI chatbot to be a racist asshole in less than a day. In The Verge. Published 03-24-2016. Retrieved 01-19-2023.
12. Miller R. AP's 'robot journalists' are writing their own stories now. In The Verge. Published 01-29-2015. Retrieved 01-19-2023.
13. Moses L. The Washington Post's robot reporter has published 850 articles in the past year. In Digiday.com. Published 09-14-2017. Retrieved 01-19-2023.
14. Else H (2023). Abstracts written by ChatGPT fool scientists. Nature, 613(7944), 423.
15. Stokel-Walker C (2023). ChatGPT listed as author on research papers: many scientists disapprove. Nature (retrieved online ahead of print on 01-23-2023).
16. Gordon B. North Carolina Professors Catch Students Cheating With ChatGPT. In Government Technology. Published 01-12-2023. Retrieved 01-19-2023.
17. Nolan B. Two professors who say they caught students cheating on essays with ChatGPT explain why AI plagiarism can be hard to prove. In Insider. Published 01-14-2023. Retrieved 01-19-2023.
18. Johnson A. ChatGPT In Schools: Here's Where It's Banned—And How It Could Potentially Help Students. In Forbes. Published 01-18-2023. Retrieved 01-19-2023.
19. Krathwohl DR (2002). A revision of Bloom's taxonomy: An overview. Theory Into Practice, 41(4), 212–218.
20. Aaronson S. My AI Safety Lecture for UT Effective Altruism. In Shtetl-Optimized, The Blog of Scott Aaronson. Published 11-29-2022. Retrieved 01-19-2023.
21. Bogost I. ChatGPT Is Dumber Than You Think. In The Atlantic. Published 12-07-2022. Retrieved 01-19-2023.
22. Mollenkamp D. Can Anti-Plagiarism Tools Detect When AI Chatbots Write Student Essays? In EdSurge. Published 12-21-2022. Retrieved 01-19-2023.
23. Shrestha G (2020). Importance of Motivation in Education. International Journal of Science and Research, 9(3), 91–93.
