
Generative AI: what are the next steps?

Andrew Rogoyski
Innovation Director for the Surrey Institute for People-Centred AI
Key takeaways
  • AI is advancing at breakneck speed, and its pace of development is unlikely to slow.
  • Some developments, like multimodal AI, AI agents, and AI-optimised chips, are just around the corner.
  • However, the development of AI is not yet profitable and is dominated by a few large commercial organisations.
  • Bigger leaps like AI-powered robots and AI mentors are further away, but likely to happen.
  • With these developments, regulatory bodies need to keep up.

AI has been a long time coming. But over the past couple of years, while it has been in the public eye, it has seemed to advance at warp speed. Andrew Rogoyski shares his insights into what to expect next. What powerful new capabilities are just over the horizon for AI?

We should explain that when we use the term "AI", we're currently mostly focusing this discussion on "generative AI" or "GenAI", which platforms like OpenAI's ChatGPT have brought to the world in the last two years. Further big advancements, pushed forward by actors all around the world, are likely to come out soon. These already have a roadmap.

One of these is AI becoming increasingly multimodal. That means that large language models (LLMs) will learn and understand text, video, and sound, and how they relate to each other. Some models are already breaching that barrier and reaching the market. Single-mode AIs like Copilot can generate images from text and vice versa. Sora can generate video from text. Runway and Pika Labs are also offering image-to-video generation. The newer large multimodal models (LMMs) from OpenAI, Meta, Google and others can generate video from an image, text, and other data modes. For example, some GenAI models will answer text questions about the content of videos. Many industries are being affected, with studios in Hollywood rapidly assessing what this could mean for the movie industry. One of the downsides of this powerful technology is that you can create fairly intricate deepfakes on smaller budgets.
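To make that concrete, here is a minimal sketch of asking a text question about video content through a multimodal model, using the OpenAI Python SDK. The model name and frame URLs are illustrative assumptions; direct video input varies by provider, so a common workaround is to sample a few frames and send them as images.

```python
# A minimal sketch of a text question about video content, via sampled
# frames sent to a multimodal model. Model name and URLs are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

frame_urls = [
    "https://example.com/frame_001.jpg",  # hypothetical sampled frames
    "https://example.com/frame_002.jpg",
]

content = [{"type": "text", "text": "What is happening across these frames?"}]
content += [{"type": "image_url", "image_url": {"url": u}} for u in frame_urls]

response = client.chat.completions.create(
    model="gpt-4o",  # assumed multimodal-capable model
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```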

Another big, expected advance will be AI becoming an invisible tool. Instead of having to log on to a dedicated platform on a computer or phone, we'll be able to converse with our cars, phones, and appliances, and get very natural answers. Several companies are working on this: Apple with Apple Intelligence, Google with Google AI, Amazon with Alexa, and others.

The next step then is having AI act as a sort of agent on your behalf, allowing it to book trips, hotel stays, and so on. At this point, GenAI isn't very good at planning. That's what OpenAI and others are working on: getting GenAI that can break down a problem into steps and take action on those steps. The question then is how much authority you give an agent to act on your behalf. It seems likely that such agents will be interacting with other agents, leading to entire AI discussions and negotiations taking place without human intervention.
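As an illustration, here is a minimal plan-and-execute agent sketch in Python. The call_llm placeholder stands in for any chat-completion API, the tools are simulated, and the approval gate is one possible answer to the "how much authority" question; none of this reflects any vendor's actual agent framework.

```python
# A minimal plan-and-execute agent sketch. call_llm stands in for any
# chat-completion API; tools and the approval gate are illustrative.
from typing import Callable

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion call here.
    return "1. search flights\n2. book flight\n3. book hotel"

TOOLS: dict[str, Callable[[], str]] = {
    "search flights": lambda: "found: LHR to CDG, dep 09:05",
    "book flight": lambda: "flight booked (simulated)",
    "book hotel": lambda: "hotel booked (simulated)",
}

def run_agent(goal: str, require_approval: bool = True) -> None:
    # Step 1: ask the model to decompose the goal into numbered steps.
    plan = call_llm(f"Break this goal into numbered steps: {goal}")
    steps = [line.split(". ", 1)[1] for line in plan.splitlines()]

    # Step 2: execute each step, pausing for human sign-off on any
    # step that spends money.
    for step in steps:
        if require_approval and step.startswith("book"):
            if input(f"Approve '{step}'? [y/N] ").strip().lower() != "y":
                print(f"skipped: {step}")
                continue
        print(TOOLS[step]())

run_agent("Plan a two-day trip to Paris")
```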

Another fairly big development will be the improvement of AI retrieval. That may sound boring, but it's really exciting in terms of productivity. Corporations collect thousands of documents containing customer interactions, bids, policies, procedures, and other useful information. However, retrieval of such information is generally poor. GenAI may be the solution to the corporate "knowledge management" problem. Wouldn't it be wonderful to be able to ask your laptop: "What was that big bid we did three years ago where we partnered with that bank?" and have it infer the right answers and give you a summary rather than a string of documents you have to read through?
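The usual pattern behind this is retrieval-augmented generation (RAG): fetch the most relevant documents, then let the model answer from them. A minimal sketch in Python, where a toy word-overlap scorer stands in for a real embedding model and call_llm for a real chat API:

```python
# A minimal retrieval-augmented generation (RAG) sketch. The word-overlap
# scorer stands in for a real embedding model.
def call_llm(prompt: str) -> str:
    return "(model answer would appear here)"  # placeholder for a real API

def score(query: str, doc: str) -> float:
    # Toy relevance: fraction of query words that appear in the document.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank the document store by relevance and keep the top k.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def answer(query: str, docs: list[str]) -> str:
    # Grounding the model in retrieved text is what turns a string of
    # documents into a direct, summarised answer.
    context = "\n".join(retrieve(query, docs))
    return call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {query}")

docs = [
    "2021 bid: we partnered with Northbank on a payments platform bid.",
    "2023 policy update: the travel expense procedure was revised.",
]
print(answer("What was that big bid where we partnered with that bank?", docs))
```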

Of course, before we can do this, we need to tackle AI hallucination, which is the false information generated by AI. We have developed a technology that will hallucinate images, sounds, poetry, and so on. But we are less keen on it hallucinating the company accounts or a medical record. The trick now will be to take that really nice conversational interface and link it to hard facts. Generative AI can create nonsense, which can be a big problem. Recently, Air Canada faced a small claims court case1 from a passenger who tried to retroactively apply for a refund on his ticket after checking the company's bereavement policy on their AI-powered chatbot. The AI hallucinated that passengers could claim back money within 90 days of travel, which isn't in the company's policy. The court sided with the passenger.

Part of the move forward with AI will be limiting its cost, right?

Yes, the cost of running these models today, in terms of energy, cooling, and computing power, makes them unsustainable, both commercially and in the context of the climate crisis. Companies are likely to move from the existing graphics processing units (GPUs) to hardware designed around AI applications.

Apple have a "neural processing unit", Google have a "tensor processing unit", and Microsoft, IBM, Amazon, Samsung and others are all developing specialised hardware that can deliver performance hundreds or thousands of times more efficient than GPUs and CPUs. These chips are massively parallel and optimised for the matrix operations at the heart of machine learning algorithms.
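To see why matrix operations matter so much: a single neural-network layer is essentially one matrix multiplication plus a nonlinearity, repeated millions of times during training and inference. A minimal NumPy illustration, with arbitrary sizes:

```python
# One neural-network layer = matrix multiply + nonlinearity. This is the
# workload that NPUs and TPUs are built to run massively in parallel.
import numpy as np

batch, d_in, d_out = 32, 512, 256
x = np.random.randn(batch, d_in)         # a batch of input activations
W = np.random.randn(d_in, d_out) * 0.02  # learned weight matrix
b = np.zeros(d_out)                      # learned bias

h = np.maximum(0, x @ W + b)  # matmul + ReLU: the core ML operation
print(h.shape)  # (32, 256)
```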

New chip architectures are also being proposed to run these models with very low energy. That's the case for IBM's NorthPole AI chip2, for instance, which promises3 to reduce the power for typical applications by a factor of 25. Google is also working on its Tensor Processing Unit to accelerate AI processing, and Groq's Language Processing Unit is showing promise.

Then there are more esoteric architectures, such as neuromorphic chips. These are designed to support so-called spiking neural networks, computing models that mimic the way human brains work. Those are mostly in the academic domain at the moment, but they are starting to move into other areas.
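A flavour of what those chips run: the leaky integrate-and-fire (LIF) neuron, the simplest building block of a spiking network. A minimal Python sketch with illustrative constants, not tied to any particular hardware:

```python
# A minimal leaky integrate-and-fire (LIF) neuron: it accumulates input
# current with a leak and emits a spike when a threshold is crossed.
def lif_neuron(inputs, threshold=1.0, leak=0.9):
    v, spikes = 0.0, []
    for current in inputs:
        v = leak * v + current  # integrate the input, with leak
        if v >= threshold:      # fire when the potential crosses threshold
            spikes.append(1)
            v = 0.0             # reset after a spike
        else:
            spikes.append(0)
    return spikes

print(lif_neuron([0.3, 0.4, 0.5, 0.1, 0.9]))  # -> [0, 0, 1, 0, 0]
```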

What about the fact that AI is so heavily dominated by a few commercial entities at the moment?

Currently, there is a big debate about opening up LLMs to open source. Due to the scale of operations needed to develop LLMs and LMMs, commercial organisations have been very much at the forefront: around 80–90% of these models are developed commercially. That means the technology has remained mostly in the hands of its proprietors, with some notable exceptions like Meta's LLaMA and Mistral's Large and Codestral, which were made open source early on. There are also open-source community LLMs/LMMs like Platypus, Bloom, and Falcon.

On the one hand, more people experimenting and playing with the technology could trigger new advances, expose vulnerabilities, and so on. On the other hand, there are people who will misuse that technology. There are currently fail-safes built into most of the models so that people can't do whatever they want; however, they're relatively easy to circumvent. And some open-source models are available in their "raw" state with no guardrails. We can expect that open-source GenAI will continue to grow. This goes hand in hand with the push to develop smaller, more sustainable models that don't require hundreds of millions of dollars to run.
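For a sense of how low the barrier already is, here is a minimal sketch of running an open-weights model locally with the Hugging Face transformers library. The model name is illustrative (many open models require accepting a licence), and a capable GPU is assumed.

```python
# A minimal sketch of running an open-weights model locally with Hugging
# Face transformers. device_map="auto" requires the accelerate package.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.3",  # assumed open-weights model
    device_map="auto",  # place weights on available GPU(s), else CPU
)

out = generator(
    "Explain retrieval-augmented generation in one sentence.",
    max_new_tokens=60,
)
print(out[0]["generated_text"])
```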

What issues can we expect in terms of misuse of such new technology?

Cybersecurity will continue to be a huge issue. Criminal organisations are quickly learning to harness this technology for nefarious purposes. They have already started using generative AI to streamline online surveillance, to mine historical data for vulnerabilities, or to automate attacks with fake texts. Scammers are also using deepfakes to swindle money out of companies. The Hong Kong police recently made six arrests4 in relation to an elaborate scam that robbed UK engineering firm Arup5 of $25 million. One of the company's workers was pulled into a video conference call with what he thought was his chief financial officer. This turned out to be a deepfake video. Deepfakes are also targeting voters' intentions with misinformation. It's a very dangerous trend and a real threat this year, with 2024 seeing more elections than any year in human history.

While cyber scammers will continue to improve, defenders on the other side are also learning, using generative AI and other forms of AI to find attackers. There's this constant cycle of attack and defence in the cybersecurity world.

There is also a big discussion around the use of AI in a military context. AI is already used to analyse satellite imagery or provide navigation for drones, but it is not yet known to be used to take human life. At this point, it's still cheaper not to put AI on drones, even if it is technically feasible. And that's a very important line, in my view, not to cross. We don't want to enter a world where you are fighting at machine speed and your adversary is an AI: it's then a short step to the dystopian worlds of James Cameron's Terminator movies or the Wachowskis' Matrix series.

We are seeing some movement from regulatory bodies. Where do you expect that to go?

There is regulation starting to emerge. The European Union AI Act came into force6 in August 2024, its details having been finalised in April of that year. Everyone will be watching what impact the EU legislation has. A US presidential order published7 in October 2023 introduced a long list of controls, including statutory reporting above a certain level of computing and networking power. We can expect more legislation to come out of the US, UK and other countries soon.
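Those compute-based reporting thresholds are easy to sanity-check. As assumptions for illustration: the US order's reporting trigger sits at roughly 10^26 operations of training compute, and a standard rough estimate of transformer training cost is 6 x parameters x tokens.

```python
# Back-of-envelope check against a compute-based reporting threshold.
# Both figures below are illustrative assumptions, not legal advice.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens  # rough forward-plus-backward estimate

THRESHOLD = 1e26  # assumed reporting trigger, in training operations

for name, params, tokens in [
    ("70B model on 2T tokens", 70e9, 2e12),
    ("400B model on 15T tokens", 400e9, 15e12),
]:
    flops = training_flops(params, tokens)
    print(f"{name}: {flops:.1e} FLOPs, report: {flops >= THRESHOLD}")
```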


Still, unless you hold those developing AI accountable, that regulation will only go so far. At the moment, it's free rein. If the technology puts millions of people out of jobs or causes a mental health epidemic, corporations can shrug their shoulders and say they don't control how people use the technology. On the other hand, if large corporates are the only organisations willing or able to invest the tens of billions necessary to develop these AI systems, nobody wants to stall them and risk falling behind other countries.

We need legislation and regulation where organisations and individuals are accountable for the impact of their technologies. That would make them think carefully about how their technology is going to be used and put the onus on them to properly explore and test its impact. You can see this is an area of tension for some of the GenAI companies: OpenAI, for example, has lost several leading people8, each of whom has hinted at the lack of oversight in GenAI development.

Anything else we should be looking out for?

There are advances that are over the horizon, but you can see that they will come. And those will be very significant. I think the convergence of quantum computing and AI will be interesting. Some companies like IBM are now bringing forward their roadmaps on quantum computing. IBM is foreshadowing9 200 qubits and 100 million computing gates by 2029. That is very powerful technology that may allow AI to learn in real time, and that gets really exciting.

Over the past 12 months or so, people have been applying the large language model approach to robotics, in so-called Vision Language Action models, or VLAs. In the same way that we've built foundation models for text and images, we may be able to build them for robotic perception, action, and movement. These aim to get to a place where, for instance, you can tell a robot to pick up a banana and it has enough general knowledge to not only spot the banana with its sensors but figure out what to do with it, without requiring specific algorithmic input. It's quite an interesting advancement in robotics because it also allows the AI to learn from physical, real-world experience.
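Schematically, a VLA control loop looks like a chat model whose output tokens are robot actions. Every function in this Python sketch is a hypothetical stand-in for illustration, not a real robotics API:

```python
# A schematic vision-language-action (VLA) control loop: a foundation
# model maps (image, instruction) to action tokens. All stand-ins.
def get_camera_frame() -> bytes:
    return b"..."  # placeholder for a real sensor image

def vla_model(image: bytes, instruction: str) -> list[str]:
    # A real VLA maps (image, text) to a sequence of action tokens.
    return ["move_arm_to(banana)", "close_gripper()", "lift()"]

def execute(action: str) -> None:
    print(f"executing: {action}")  # placeholder for motor commands

for action in vla_model(get_camera_frame(), "pick up the banana"):
    execute(action)
```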

AI mentors could be another big thing. AIs are already being used to generate learning material, but you can imagine a world where an AI scans your CV and is able to suggest training, reading material, and so on. AIs could also act as tutors, guiding you through education, suggesting ways of learning, doing exams and assessments, and following your development. Schools are already piloting the use of GenAI as tutors: for example, David Game College in London10 is trialling an accelerated GCSE in which students are only taught by AI. You're then getting into, and changing, the entire educational loop.

The question might then be: why would you go to university? Why would you even go to school, apart from its social benefits? It could fundamentally change how we learn and teach. Some may be concerned that we would start to build new education systems that are dependent on US tech companies rather than on in-country qualified human beings.

What kind of timescale are we thinking of for these advancements?

I think if we’ve learned any­thing from the last cou­ple of years, it’s that things can hap­pen real­ly fast. Things are nev­er as far-fetched as we would imag­ine it to be – Sci­ence fic­tion has a dis­turb­ing habit of becom­ing sci­ence fact. I would say much of it is dis­turbing­ly close.

Now we need to start thinking about the consequences of this. What is humanity's role in this future? What do economies look like if humans are taken out of the equation? What do truth and democracy look like when anything can be faked? What does education, the foundation of our modern quality of life, look like in the future? These are very big, fundamental questions that I think no one has the answer to at the moment.

Interview by Marianne Guenot
1. https://www.cbsnews.com/news/aircanada-chatbot-discount-customer/
2. https://research.ibm.com/blog/northpole-ibm-ai-chip
3. https://spectrum.ieee.org/neuromorphic-computing-ibm-northpole
4. https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-scam-hong-kong-intl-hnk/index.html
5. https://www.ft.com/content/b977e8d4-664c-4ae4-8a8e-eb93bdf785ea
6. https://commission.europa.eu/news/ai-act-enters-force-2024-08-01_en
7. https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
8. https://www.ft.com/content/638f67f7-5375-47fc-b3a7-af7c9e05b9e0
9. https://www.ibm.com/roadmaps/quantum.pdf
10. https://www.bbc.co.uk/sounds/play/m0021x2v
