
How open-source AI could modernise public services

Christophe Gaie
Head of the Engineering and Digital Innovation Division at the Prime Minister's Office
Laurent Denis
Technical Architect in the Prime Minister's Office
Key takeaways
  • AI and large language models (LLMs) represent a major opportunity to transform public action, in particular by improving the quality and efficiency of services.
  • Open-source AI appears to be a promising option for modernising digital public services, with risks that remain to be assessed.
  • Open-source AI has many advantages, including complete transparency of the source code, reduced costs and the independence of administrations from software publishers.
  • Closed AI models also have advantages, such as lower exposure to tampering with model parameters and tighter publisher control over how the AI operates.
  • It is essential to conduct an in-depth study of the ethical issues involved in using AI in the public sector, particularly to guard against certain biases.

Artificial intelligence (AI), and more specifically large language models (LLMs), represents a major opportunity to transform public services. AI can be used in many areas to improve efficiency, the quality of services provided to citizens and decision-making. However, implementing AI in public services presents major challenges. First, the chosen solution must guarantee fair treatment, transparency of the decisions and actions taken on a case, and respect for fundamental rights throughout its use. In addition, the rigorous protection of personal data, which is often sensitive in the context of public services, is a significant security issue. Finally, the transparency of decisions is a major factor in the trust placed in the solutions used and in their acceptability to citizens. A solution offering a high level of transparency is therefore an asset when deploying artificial intelligence solutions and gaining acceptance for them. Yet given the complexity of the subject, the criteria for ensuring the expected level of transparency are far from easy to define.

The definition of a free AI is still a subject of debate

Large language models are based on neural networks trained on very large amounts of data. Given a sequence of words, they statistically determine the word that best continues that sequence. By applying this principle recursively, LLMs are able to produce structured texts, giving the impression that the machine is analysing and understanding the question asked.
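As a toy illustration of this recursive next-word principle (not how production LLMs are actually built, which use neural networks rather than counts), the idea can be sketched in a few lines of Python using bigram frequencies from a miniature, made-up corpus:

```python
from collections import Counter, defaultdict

# Toy corpus; every word and sentence here is illustrative only.
corpus = ("public services use data . data improve services . "
          "services help citizens .").split()

# Count bigrams: follow_counts[w] maps each word that follows w to its frequency.
follow_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follow_counts[current][nxt] += 1

def generate(start, max_words=5):
    """Repeatedly append the statistically most frequent next word."""
    words = [start]
    for _ in range(max_words):
        candidates = follow_counts.get(words[-1])
        if not candidates:  # no known continuation: stop
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("public"))  # chains the most likely continuations word by word
```

A real LLM replaces the bigram table with a deep network conditioned on the whole preceding sequence, but the generation loop — predict one word, append it, predict again — is the same.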

The text produced therefore depends on:

  • the algorithms used, which enable the model to weigh the importance of each word in a sentence in relation to the others; this capacity is provided in particular by "transformer" architectures [1];
  • the weights assigned to the different neurons, which determine how the network is activated to produce the output data;
  • the training corpus, which has a direct impact on the weights learned by the model.

The four principles (use, study, modify, share) associated with free software [2] must therefore be applied to all of these elements [3]. This is still a subject of debate and is therefore a source of much confusion [4]. For example, some AIs claiming to be free carry usage restrictions that go against the defined principles [5]. After a long process, the Open Source Initiative (OSI), which brings together researchers, lawyers, policy makers, activists and representatives of large technology companies, has proposed a definition that maps the four freedoms associated with free software onto the elements on which machine learning models are based.

According to the Open Source Initiative, a free machine learning system must include the following elements [6]:

  • sufficiently detailed information on the data used to train the system, enabling a competent person to build a substantially equivalent system; this information must be available under terms approved by the OSI;
  • the source code of the AI, including the inference code used to execute the model;
  • all the learned parameters that are superimposed on the model architecture to produce an output from a given input.

Publication of the training corpus is therefore not compulsory, but a detailed description of it must be included. It is clear that many models offering excellent performance and describing themselves as open source do not comply with this last point. These are referred to as open-weight models. A comparison of AI models is also available from the Pôle d'Expertise de la Régulation Numérique (PEReN).

What are the risks and benefits associated with the different types of licences?

The source code is human-readable and provides access to the algorithms used. The weights are the result of training and represent the knowledge of the model. In the case of open-weight models, this knowledge can be customised through a fine-tuning process [7].
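To give a feel for what fine-tuning means, here is a deliberately simplified numerical analogy (a one-parameter linear model, not an actual LLM; all values are invented): the published weight is taken as a starting point, and training simply continues with gradient descent on new, domain-specific examples:

```python
# Toy analogy for fine-tuning an open-weight model: the "released" weight w
# is the starting point, and we keep training it on fresh (input, target)
# pairs -- here the new domain follows y = 2x, so w should drift towards 2.
pretrained_w = 1.0                       # weight as shipped with the model
domain_data = [(1.0, 2.0), (2.0, 4.0)]   # new domain-specific examples

def fine_tune(w, data, lr=0.1, epochs=50):
    """Continue training: gradient descent on squared error from weight w."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # derivative of (w*x - y)**2 w.r.t. w
            w -= lr * grad
    return w

w = fine_tune(pretrained_w, domain_data)
print(round(w, 3))  # the released weight has been adapted to the new domain
```

Real fine-tuning adjusts billions of such weights at once, which is why it requires the weights to be published — exactly what distinguishes open-weight models from closed ones.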

Open weights alone, however, do not provide total transparency: they do not make it possible, for example, to detect bias or "poisoning" attacks, which consist of altering the knowledge of a model without these modifications being easily detectable by standard tests [8][9]. Only a free model that provides access to its training corpus guarantees a total level of transparency, in particular by allowing complete control over its training. However, this approach of rebuilding the model from its sources still requires significant computing resources that few entities are able to acquire.

On 30th October 2023, President Biden issued an executive order entitled Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, which called for an assessment of the risks and benefits of foundation models whose weights are available. The resulting report [10] recognises the benefits of open access to model weights, such as innovation and research, but also highlights the potential risks, including the possibility of malicious use, the removal of safety mechanisms and the impact on competition. The report concludes that current data is insufficient to determine definitively whether restrictions on open-weight models are justified, and recommends active monitoring of these models.

Closed models, even if they do not offer the same level of transparency and adaptability as their free or open-weight counterparts, are not without advantages. They are less exposed to the manipulation risks mentioned above, because their weights cannot be modified by a third party; the risks to the intellectual property of the training data are borne by the model supplier; and the publisher can act quickly on its model in the event of abuse, thus helping to mitigate the potential risks associated with AI, such as the dissemination of inappropriate content [11]. However, all this comes at the expense of the autonomy that can be exercised over the AI model.

Should priority be given to AI with an open licence?

The use of open AIs as defined by the OSI has many advantages. First of all, the transparency of their operation is guaranteed, since it is possible to access and modify their source code directly and to inspect the training data.

This possibility is a fundamental guarantee, since each model used can be subjected to in-depth verification to ensure, for example, that its decision-making process complies with current legislation and presents no discriminatory bias. On the other hand, when AI is used for retrieval-augmented generation (RAG [12]), the required level of transparency may be lower, because the data used to formulate the responses is supplied by an algorithm over which the expected level of control is easier to obtain. As the body of answers is provided by conventional search algorithms, it is relatively easy to give the end user the expected answer, the raw data and an associated confidence level. However, this still requires a critical eye on the part of the end user.
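The retrieval step described above can be sketched minimally as follows. This is a toy stand-in, not a production system: the documents, the keyword-overlap scoring (replacing a real search engine) and the function names `retrieve` and `build_prompt` are all illustrative assumptions.

```python
# Illustrative document base (invented examples of public-service information).
documents = [
    "Passports are renewed at the town hall within two weeks.",
    "Income tax returns must be filed online before the deadline.",
    "School enrolment requires proof of residence.",
]

def retrieve(question, docs, k=1):
    """Rank documents by word overlap with the question; return (score, doc) pairs."""
    q_words = set(question.lower().split())
    scored = []
    for doc in docs:
        overlap = q_words & set(doc.lower().rstrip(".").split())
        # Crude confidence score: fraction of question words found in the doc.
        scored.append((len(overlap) / max(len(q_words), 1), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(question, docs):
    """Assemble the augmented prompt that would be sent to the language model."""
    hits = retrieve(question, docs)
    context = "\n".join(doc for _, doc in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

print(build_prompt("how are passports renewed", documents))
```

Because the answer material comes from the controlled retrieval step rather than from the model's opaque training data, both the raw source and the score can be shown to the user — which is precisely why the transparency demanded of the model itself can be lower in a RAG setting.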

Even if the State's missions are relatively specific by their very nature, many use cases are similar to those found in private companies, namely providing an answer to a question from a corpus of documents, using classic search algorithms or vector search algorithms based on the concept of similarity [13]. It is therefore not surprising to see a convergence in the models used in both worlds. For the State, the deciding criterion in the choice of models will therefore be the preservation of personal or sensitive information transmitted to AI models.
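Similarity-based vector search can itself be sketched in a few lines. In this toy version the "embeddings" are hand-made three-dimensional vectors standing in for the output of a real embedding model, and documents are ranked by cosine similarity:

```python
import math

# Hypothetical document embeddings (invented values; a real system would
# compute high-dimensional vectors with an embedding model).
doc_vectors = {
    "tax form help":       [0.9, 0.1, 0.0],
    "passport renewal":    [0.0, 0.8, 0.2],
    "school registration": [0.1, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(query_vec, index):
    """Return the document whose embedding is most similar to the query."""
    return max(index, key=lambda name: cosine(query_vec, index[name]))

print(nearest([0.0, 0.9, 0.1], doc_vectors))  # closest to "passport renewal"
```

Note the privacy point made above: in a real deployment both the query vector and the indexed documents may contain personal data, so where this computation runs (and who operates the model that produced the embeddings) is the deciding criterion for an administration.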

The use of open-source solutions makes it possible to drastically reduce expenses

Beyond the aspects mentioned above, the use of open-source solutions also allows the State to disseminate its work so that it can be reused by the public or private sector. For example, the DGFiP has published work on a model for summarising parliamentary amendments [14][15]. Administrations are thus able to share their knowledge actively, within the limits of the confidentiality necessary for sovereign missions.

Finally, the use of open-source solutions makes it possible to reduce expenses drastically, by limiting them to technical support, without licence costs.

Are there any difficulties in implementing AI under a free licence?

The use of AI under a free licence also presents various challenges. First of all, implementing free solutions requires a good understanding of how the underlying models work. Added to this complexity is the need for technical skills: adapting the models to business needs, obtaining the data necessary for training, configuring the model (fine-tuning) if the business application requires it, deploying it in the administration's information system and guaranteeing the highest level of security.

In addition, ongoing and corrective maintenance requires a significant investment of time, both to update the models while ensuring a satisfactory level of non-regression and to verify that they function properly. Although the code is free, using these AIs often also requires IT infrastructure based on specialised computing units, which can represent an indirect cost. Finally, the quality of open-source models can vary considerably, particularly depending on the business cases to be addressed, and there are no absolute guarantees as to their performance. It is therefore essential to define expectations precisely with the business teams and to verify the expected results before any version is put into service.

Conclusion

The integration of artificial intelligence within public services represents a unique opportunity to improve the efficiency and quality of services provided to citizens, and of decision-making, in a context of strain on available human resources. Free language models seem particularly well suited to this challenge.

Despite the challenges, the advantages of free AIs are numerous. They promote innovation, reduce costs and strengthen the autonomy of administrations.

However, it is essential to study in depth the ethical issues related to the use of AI in the public sector. Processes and methods must be put in place to guard against algorithmic bias and to guarantee the reasonable use of these technologies, ensuring that they are monitored by digital and legal experts, or even by citizens themselves.

Disclaimer: The content of this article is the sole responsibility of its authors and is intended solely for information and academic research.

[1] A. Vaswani et al., "Attention Is All You Need", 2023. [Online]. Available at: https://arxiv.org/abs/1706.03762
[2] "Logiciel libre", Wikipedia, 14 November 2024. [Online]. Available at: https://fr.wikipedia.org/w/index.php?title=Logiciel_libre&oldid=220293632
[3] B. Doerrfeld, "Be careful with "open source" AI", LeadDev. [Online]. Available at: https://leaddev.com/technical-direction/be-careful-open-source-ai
[4] R. Williams, "We finally have a definition for open-source AI", MIT Technology Review. [Online]. Available at: https://www.technologyreview.com/2024/08/22/1097224/we-finally-have-a-definition-for-open-source-ai/
[5] N. Lambert, "The koan of an open-source LLM", Interconnects. [Online]. Available at: https://www.interconnects.ai/p/an-open-source-llm
[6] "The Open Source AI Definition – 1.0", Open Source Initiative. [Online]. Available at: https://opensource.org/ai/open-source-ai-definition
[7] S. Le Calme, "L'équilibre délicat entre sécurité et innovation dans l'IA : "bannir les modèles "open weights" serait un désastre"". [Online]. Available at: https://intelligence-artificielle.developpez.com/actu/356012/The-delicate-balance-between-safety-and-innovation-in-AI-banning-open-weights-models-would-be-a-disaster-according-to-a-researcher-the-Biden-administration-is-considering-blocking-access-to-these-models-to-prevent-les-abus/
[8] "PoisonGPT : des LLM détournés à la racine", Silicon.fr. [Online]. Available at: https://www.silicon.fr/Thematique/data-ia-1372/Breves/PoisonGPT-des-LLM-detournes-a-la-racine-402783.htm
[9] "LLM03: Training Data Poisoning", OWASP Top 10 for LLM & Generative AI Security, OWASP. [Online]. Available at: https://genai.owasp.org/llmrisk/llm03-training-data-poisoning/
[10] NTIA, "Dual-Use Foundation Models with Widely Available Model Weights", July 2024. [Online]. Available at: https://www.ntia.gov/sites/default/files/publications/ntia-ai-open-model-report.pdf
[11] I. Solaiman, "Generative AI Systems Aren't Just Open or Closed Source", Wired. [Online]. Available at: https://www.wired.com/story/generative-ai-systems-arent-just-open-or-closed-source/
[12] "What is Retrieval-Augmented Generation (RAG)? | The Complete Guide", K2view. [Online]. Available at: https://www.k2view.com/what-is-retrieval-augmented-generation
[13] M. Syed and E. Russi, "Qu'est-ce que la recherche vectorielle ?", IBM. [Online]. Available at: https://www.ibm.com/fr-fr/topics/vector-search
[14] J. Gesnouin et al., "LLaMandement: Large Language Models for Summarization of French Legislative Proposals", 2024. [Online]. Available at: https://arxiv.org/abs/2401.16182
[15] "LLaMandement, le LLM open source du gouvernement français", ActuIA. [Online]. Available at: https://www.actuia.com/actualite/llamandement-le-llm-open-source-du-gouvernement-francais/
