
How open-source AI could modernise public services

Christophe Gaie
Head of the Engineering and Digital Innovation Division at the Prime Minister's Office
Laurent Denis
Technical Architect in the Prime Minister's Office
Key takeaways
  • AI and LLMs represent a major opportunity to transform public action, in particular by improving the quality and efficiency of services.
  • Open-source AI appears to be an interesting option for modernising digital public services, with risks that remain to be assessed.
  • Open-source AI has many advantages, including complete transparency of the source code, reduced costs and the independence of administrations from software publishers.
  • Closed AI models also have advantages, such as lower exposure to tampering with model parameters and better publisher control over how the AI operates.
  • It is essential to conduct an in-depth study of the ethical issues involved in using AI in the public sector, particularly to guard against certain biases.

Artificial intelligence (AI), and more specifically large language models (LLMs), represents a major opportunity to transform public services. AI can be used in many areas to improve efficiency, the quality of services provided to citizens and decision-making. However, implementing AI in public services presents major challenges. First, the chosen solution must guarantee fair treatment and transparency of the decisions and actions taken on a case, and ensure respect for fundamental rights throughout its use. In addition, the rigorous protection of personal data, which is often sensitive in the context of public services, is a significant security issue. Finally, the transparency of decisions is a major factor in the trust placed in the solutions used and their acceptability to citizens. A solution offering a high level of transparency is therefore an asset in the implementation and acceptance of artificial intelligence. Given the complexity of the subject, however, the criteria for ensuring the expected level of transparency are far from easy to define.

The definition of a free AI is still a subject of debate

Large language models are based on neural networks trained on very large amounts of data. Given a sequence of words, they statistically determine the word that best continues the sequence. By applying this principle recursively, LLMs are able to produce structured texts, giving the impression that the machine is analysing and understanding the question asked.
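The recursive, word-by-word principle described above can be sketched as follows. The bigram probability table is an invented toy stand-in for a real neural model; the principle of "predict the most likely continuation, append it, repeat" is the same:

```python
# Minimal sketch of autoregressive generation: repeatedly predict the
# statistically most likely next word given the sequence so far.
# The "model" here is a hypothetical toy bigram table, not a real LLM.
BIGRAM_PROBS = {
    "the": {"citizen": 0.6, "service": 0.4},
    "citizen": {"requests": 0.9, "waits": 0.1},
    "requests": {"help": 0.7, "forms": 0.3},
}

def next_word(words):
    """Pick the most probable continuation of the last word."""
    candidates = BIGRAM_PROBS.get(words[-1], {})
    return max(candidates, key=candidates.get) if candidates else None

def generate(prompt, max_words=5):
    words = prompt.split()
    for _ in range(max_words):
        word = next_word(words)
        if word is None:  # no known continuation: stop
            break
        words.append(word)
    return " ".join(words)

print(generate("the"))  # the citizen requests help
```

A real LLM replaces the lookup table with a neural network that scores every word in its vocabulary, but the generation loop is essentially this one.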

The text produced therefore depends on:

  • the algorithms used, which enable the model to weigh the importance of each word in a sentence in relation to the others. This capacity is provided in particular through “transformer”-type architectures [1];
  • the weights assigned to the different neurons, which determine how the network is activated to produce the output data;
  • the learning corpus, which has a direct impact on the weights learned by the model.
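As a rough illustration of the attention mechanism at the heart of transformer architectures, here is a minimal sketch of scaled dot-product attention for a single query. The vectors are made-up toy values, not real learned embeddings:

```python
import math

# Sketch of the "transformer" attention step: each word weighs every
# other word via a softmax over scaled dot products, then takes a
# weighted sum of value vectors. All numbers are illustrative.
def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # importance of each position
    # Weighted sum of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]])
```

Here the query is closer to the first key, so the first value vector dominates the output; stacking many such layers (with learned projections) is what gives transformers their ability to weigh context.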

The four principles (use, study, modify, share) associated with free software [2] must therefore be applied to all these elements [3]. This is still a subject of debate and therefore a source of much confusion [4]. For example, some AIs claiming to be free have usage restrictions that go against the defined principles [5]. After a long process, the Open Source Initiative (OSI), which brings together researchers, lawyers, policy makers, activists and representatives of large technology companies, has proposed a definition that maps the four freedoms of free software onto the elements on which machine learning models are based.

According to the Open Source Initiative, a free machine learning system must include the following elements [6]:

  • sufficiently detailed information on the data used to train the system, enabling a competent person to build a substantially equivalent system. This information must be available under terms approved by the OSI;
  • the source code of the AI, including the inference code needed to execute the model;
  • all the learned parameters that are superimposed on the model architecture to produce an output from a given input.

Publication of the learning corpus is therefore not compulsory, but a detailed description of it must be provided. It is clear that many models offering excellent performance and describing themselves as open source do not comply with this last point. These are referred to as open-weight models. A comparison of AI models is also available from the Pôle d'Expertise de la Régulation Numérique (PEReN).

What are the risks and benefits associated with the different types of licences?

The source code is human-readable and provides access to the algorithms used. The weights are the result of training and represent the knowledge of the model. In the case of open-weight models, this knowledge can be customised through a fine-tuning process [7].

However, this does not allow for total transparency, such as the detection of bias or “poisoning” attacks, which consist of altering the knowledge of a model without these modifications being easily detectable by standard tests [8][9]. Only a free model that provides access to its learning corpus guarantees total transparency, in particular by allowing complete control of its training. However, rebuilding a model from its sources still requires significant computing resources that few entities can afford.

On 30th October 2023, President Biden issued an executive order entitled Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence to assess the risks and benefits of foundation models whose weights are available. The report resulting from this study [10] recognises the benefits of open access to model weights, such as innovation and research, but also highlights the potential risks, including the possibility of malicious use, the removal of safety mechanisms and the impact on competition. The report concludes that current data is insufficient to determine definitively whether restrictions on open-weight models are justified, and recommends active monitoring of these models.

Closed models, even if they do not offer the same level of transparency and adaptability as their free or open-weight counterparts, are not without advantages. They are less exposed to the manipulation risks mentioned above, because their weights cannot be modified by a third party. The risks relating to the intellectual property of the training data are borne by the model supplier, and the publisher can act quickly on its model in the event of abuse, helping to mitigate potential risks such as the dissemination of inappropriate content [11]. However, all this comes at the expense of the autonomy that can be had over the AI model.

Should priority be given to AI with an open licence?

The use of open AIs as defined by the OSI has many advantages. First of all, the transparency of their operation is guaranteed, since it is possible to directly access and modify their source code and inspect the training data.

This possibility is a fundamental guarantee, since each model used can be subjected to in-depth verification to ensure that its decision-making process complies with current legislation and does not, for example, present any discriminatory bias. On the other hand, when AI is used for retrieval-augmented generation (RAG) [12], the required level of transparency may be lower, because the data used to formulate the responses are supplied by a search algorithm over which it is easier to exercise the expected level of control. As the body of answers is provided by conventional search algorithms, it is relatively easy to give the end user the expected answer, the raw data and their level of confidence. However, this requires a critical eye on the part of the end user.
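A minimal sketch of the RAG pattern: a conventional search step retrieves supporting documents, which are then supplied to the model alongside the question. The documents and the naive keyword-overlap retriever below are hypothetical placeholders for a real search backend:

```python
# RAG sketch: retrieve documents with a conventional search step, then
# assemble a prompt that constrains the model to answer from them.
DOCUMENTS = [
    "Tax declarations must be filed before the end of May.",
    "Passport renewals are handled at the local prefecture.",
    "Public holidays are listed in the official calendar.",
]

def retrieve(question, docs, k=1):
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question, docs):
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\nQuestion: {question}"

prompt = build_prompt("Where are passport renewals handled?", DOCUMENTS)
```

Because the retrieval step is ordinary, auditable code, the raw supporting documents can be shown to the end user alongside the generated answer, which is exactly the transparency property the paragraph above describes.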

Even if the State's missions are by nature relatively specific, many use cases are similar to those found in private companies, namely answering a question from a corpus of documents with the help of classic or vector search algorithms based on the concept of similarity [13]. It is therefore not surprising to see the models used in both worlds converge. For the State, the deciding criterion in the choice of models will be the preservation of personal or sensitive information transmitted to AI models.
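The notion of similarity underpinning vector search can be illustrated with cosine similarity over toy two-dimensional "embeddings". Real systems use learned embeddings with hundreds of dimensions; the vectors and document names here are invented:

```python
import math

# Vector-search sketch: documents and queries are embedded as vectors,
# and cosine similarity ranks how close each document is to the query.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query = [1.0, 0.2]                       # toy embedding of the question
doc_vectors = {
    "tax guide": [0.9, 0.1],             # points in a similar direction
    "holiday list": [0.1, 0.95],         # points elsewhere
}
best = max(doc_vectors, key=lambda d: cosine_similarity(query, doc_vectors[d]))
```

Cosine similarity depends only on direction, not magnitude, which is why it is a common choice for comparing embeddings of texts of different lengths.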

The use of open-source solutions makes it possible to drastically reduce expenses

Beyond the aspects mentioned above, the use of open-source solutions also allows the State to disseminate its work so that it can be reused by the public or private sector. For example, the DGFiP has published work on a model for summarising parliamentary amendments [14][15]. Administrations are thus able to actively share their knowledge within the limits of confidentiality required by sovereign missions.

Finally, the use of open-source solutions makes it possible to drastically reduce expenses by limiting them to technical support, without licence costs.

Are there any difficulties in implementing AI under a free licence?

The use of AI under a free licence also presents various challenges. First of all, implementing free solutions requires a good understanding of how the underlying models work. Added to this complexity is the need for technical skills: adapting the models to business needs, obtaining the data necessary for learning, configuring the model (fine-tuning) if the business application requires it, deploying it in the administration's information system and guaranteeing the highest level of security.

In addition, ongoing and corrective maintenance requires a significant investment of time, both to update the models while ensuring a satisfactory level of non-regression and to verify that they function properly. Although the code is free, using these AIs often also requires IT infrastructure based on specialised computing units, which can represent an indirect cost. Finally, the quality of open-source models can vary considerably, particularly depending on the business cases to be addressed, and there are no absolute guarantees as to their performance. It is therefore essential to define expectations precisely with the business teams and to verify the expected results before any version is put into service.

Conclusion

The integration of artificial intelligence within public services represents a unique opportunity to improve the efficiency and quality of services provided to citizens and of decision-making, in a context of strain on available human resources. Free language models seem particularly well-suited to this challenge.

Despite the challenges, the advantages of free AIs are numerous. They promote innovation, reduce costs and strengthen the autonomy of administrations.

However, it is essential to study in depth the ethical issues related to the use of AI in the public sector. Processes and methods must be put in place to guard against algorithmic bias and to guarantee the reasonable use of these technologies, ensuring that they are monitored by digital and legal experts, or even by citizens themselves.

Disclaimer: The content of this article is the sole responsibility of its authors and is intended solely for information and academic research purposes.

[1] A. Vaswani et al., “Attention Is All You Need”, 2017. [Online]. Available at: https://arxiv.org/abs/1706.03762
[2] “Logiciel libre”, Wikipedia, 14 November 2024. [Online]. Available at: https://fr.wikipedia.org/w/index.php?title=Logiciel_libre&oldid=220293632
[3] B. Doerrfeld, “Be careful with ‘open source’ AI”, LeadDev. [Online]. Available at: https://leaddev.com/technical-direction/be-careful-open-source-ai
[4] R. Williams, “We finally have a definition for open-source AI”, MIT Technology Review. [Online]. Available at: https://www.technologyreview.com/2024/08/22/1097224/we-finally-have-a-definition-for-open-source-ai/
[5] N. Lambert, “The koan of an open-source LLM”, Interconnects. [Online]. Available at: https://www.interconnects.ai/p/an-open-source-llm
[6] “The Open Source AI Definition – 1.0”, Open Source Initiative. [Online]. Available at: https://opensource.org/ai/open-source-ai-definition
[7] S. Le Calme, “L’équilibre délicat entre sécurité et innovation dans l’IA : ‘bannir les modèles open weights serait un désastre’”. [Online]. Available at: https://intelligence-artificielle.developpez.com/actu/356012/The-delicate-balance-between-safety-and-innovation-in-AI-banning-open-weights-models-would-be-a-disaster-according-to-a-researcher-the-Biden-administration-is-considering-blocking-access-to-these-models-to-prevent-les-abus/
[8] “PoisonGPT : des LLM détournés à la racine”, Silicon.fr. [Online]. Available at: https://www.silicon.fr/Thematique/data-ia-1372/Breves/PoisonGPT-des-LLM-detournes-a-la-racine-402783.htm
[9] “LLM03: Training Data Poisoning”, OWASP Top 10 for LLM & Generative AI Security. [Online]. Available at: https://genai.owasp.org/llmrisk/llm03-training-data-poisoning/
[10] NTIA, “Dual-Use Foundation Models with Widely Available Model Weights”, July 2024. [Online]. Available at: https://www.ntia.gov/sites/default/files/publications/ntia-ai-open-model-report.pdf
[11] I. Solaiman, “Generative AI Systems Aren’t Just Open or Closed Source”, Wired. [Online]. Available at: https://www.wired.com/story/generative-ai-systems-arent-just-open-or-closed-source/
[12] “What is Retrieval-Augmented Generation (RAG)? | The Complete Guide”, K2view. [Online]. Available at: https://www.k2view.com/what-is-retrieval-augmented-generation
[13] M. Syed and E. Russi, “Qu’est-ce que la recherche vectorielle ?”, IBM. [Online]. Available at: https://www.ibm.com/fr-fr/topics/vector-search
[14] J. Gesnouin et al., “LLaMandement: Large Language Models for Summarization of French Legislative Proposals”, 2024. [Online]. Available at: https://arxiv.org/abs/2401.16182
[15] “LLaMandement, le LLM open source du gouvernement français”, ActuIA. [Online]. Available at: https://www.actuia.com/actualite/llamandement-le-llm-open-source-du-gouvernement-francais/
