How open-source AI could modernise public services
- AI and LLMs represent a major opportunity to transform public-sector action, in particular by improving the quality and efficiency of services.
- Open-source AI appears to be a promising option for modernising digital public services, although its risks still need to be assessed.
- Open-source AI has many advantages, including full transparency of the source code, reduced costs and the independence of administrations from software publishers.
- Closed AI models also have advantages: their weights cannot be tampered with by third parties, and their publishers retain closer control over how the AI operates.
- It is essential to conduct an in-depth study of the ethical issues involved in using AI in the public sector, particularly to guard against certain biases.
Artificial intelligence (AI), and more specifically large language models (LLMs), represent a major opportunity to transform public services. AI can be used in many areas to improve efficiency, the quality of services provided to citizens and decision-making. However, implementing AI in public services raises major challenges. First, the chosen solution must guarantee fair treatment, the transparency of the decisions and actions taken on a case, and respect for fundamental rights throughout its use. In addition, the rigorous protection of personal data, which is often sensitive in the context of public services, is a significant security issue. Finally, the transparency of decisions is a major factor in the trust placed in the solutions used and in their acceptability to citizens. A solution offering a high level of transparency is therefore an asset for the implementation and acceptance of artificial intelligence. Yet given the complexity of the subject, the criteria for ensuring the expected level of transparency are far from easy to define.
The definition of free AI is still a subject of debate
Large language models are based on neural networks trained on very large amounts of data. Given a sequence of words, they statistically determine the word most likely to follow it. By applying this principle recursively, LLMs are able to produce structured texts, giving the impression that the machine is analysing and understanding the question asked.
The text produced will therefore depend on:
- the algorithms used, which will enable the model to weigh the importance of each word in a sentence in relation to the others. This capacity is provided in particular through “transformer”-type architectures [1];
- the weight assigned to the different neurons, which will enable the network to be activated in order to produce the output data;
- the learning corpus, which has a direct impact on the determination of the weights used by the model (a minimal sketch of this next-token mechanism is given just after this list).
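To make this next-token mechanism concrete, the sketch below performs greedy generation by hand: at each step the model scores every token in its vocabulary, the most likely one is appended to the sequence, and the process repeats. It assumes the Hugging Face transformers library and uses GPT-2 purely as an illustrative stand-in; neither choice is prescribed by this article.

```python
# Minimal sketch of greedy next-token generation (illustrative only).
# Assumes the Hugging Face transformers library; GPT-2 is a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Public services can use AI to", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits                                 # a score for every token in the vocabulary
        next_id = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)   # most likely next token
        input_ids = torch.cat([input_ids, next_id], dim=-1)              # append it and repeat

print(tokenizer.decode(input_ids[0]))
```

In practice, sampling strategies (temperature, top-p) replace the simple argmax, but the recursive principle remains the same.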
The four principles (use, study, modify, share) associated with free software [2] must therefore be applied to all of these elements [3]. This is still a subject of debate and is therefore a source of much confusion [4]. For example, some AIs claiming to be free have usage restrictions that go against the defined principles [5]. After a long process, the Open Source Initiative (OSI), which brings together researchers, lawyers, policy makers, activists and representatives of large technology companies, has proposed a definition that maps the four freedoms associated with free software onto the elements on which machine learning models are based.
According to the Open Source Initiative, a free machine learning system must include the following elements [6]:
- sufficiently detailed information on the data used to train the system, enabling a competent person to build a substantially equivalent system. This information must be available under terms approved by the OSI;
- the source code of the AI, including the inference code to execute the model;
- all the learned parameters that are applied to the model architecture to produce an output from a given input.
The publication of the learning corpus is therefore not compulsory, but a detailed description of it must be provided. Clearly, many models that offer excellent performance and describe themselves as open source do not comply with this last point; these are referred to as open-weight models. A comparison of AI models is also available from the Pôle d’Expertise de la Régulation Numérique (PEReN).
What are the risks and benefits associated with the different types of licences?
The source code is human-readable and provides access to the algorithms used. The weights are the result of training and represent the knowledge of the model. In the case of open-weight models, this knowledge can be customised through a fine-tuning process [7].
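As an illustration of this customisation step, the hedged sketch below attaches LoRA adapters to an open-weight model so that only a small number of additional weights need to be trained. It assumes the Hugging Face transformers and peft libraries, with GPT-2 again serving as a placeholder for whatever open-weight model an administration would actually choose; LoRA is one common fine-tuning approach among several, not a method endorsed by the article.

```python
# Hedged sketch: customising an open-weight model with LoRA adapters.
# Assumes the Hugging Face transformers and peft libraries; GPT-2 is only
# a placeholder for the open-weight model an administration would choose.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # placeholder open-weight model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Low-rank adapters: the original weights stay frozen and only a small
# number of additional parameters is trained on the domain corpus.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # attention projection module in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

# Training would then proceed on a domain-specific corpus (for instance
# internal documentation) with a standard Trainer or PyTorch loop.
```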

However, this does not provide total transparency: it does not, for example, allow the detection of bias or of “poisoning” attacks, which consist of altering the knowledge of a model in ways that are not easily detectable by standard tests [8][9]. Only a free model that provides access to its learning corpus guarantees full transparency, in particular by allowing complete control over its training. However, this approach of rebuilding the model from its sources still requires significant computing resources that few entities are able to acquire.
On 30 October 2023, President Biden issued an executive order entitled Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, which among other measures called for an assessment of the risks and benefits of foundation models whose weights are openly available. The report resulting from this study [10] recognises the benefits of open access to model weights, such as innovation and research, but also highlights potential risks, including malicious use, the removal of safety mechanisms and the impact on competition. The report concludes that the current evidence is insufficient to definitively determine whether restrictions on open-weight models are justified, and recommends active monitoring of these models.
Closed models, even if they do not offer the same level of transparency and adaptability as their free or open-weight counterparts, are not without advantages. They are less exposed to the manipulation risks mentioned above because their weights cannot be modified by a third party; the intellectual-property risks relating to the training data are borne by the model supplier; and the publisher can act quickly on its model in the event of abuse, helping to mitigate the potential risks associated with AI, such as the dissemination of inappropriate content [11]. However, all this comes at the expense of the autonomy that can be had over the AI model.
Should priority be given to AI with an open licence?
The use of open AI models as defined by the OSI has many advantages. First of all, the transparency of their operation is guaranteed, since it is possible to directly access and modify their source code and to inspect the detailed information provided about their training data.
This possibility is a fundamental guarantee, since each model used can be subjected to in-depth verification to ensure, for example, that its decision-making process complies with current legislation and does not exhibit any discriminatory bias. On the other hand, when AI is used for retrieval-augmented generation (RAG) [12], the required level of transparency may be lower, because the data used to formulate the responses are supplied by an algorithm over which it is easier to exercise the expected level of control. As the body of answers is provided by conventional search algorithms, it is relatively easy to give the end user the expected answer, the underlying raw data and their confidence level. However, this still requires a critical eye on the part of the end user.
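The RAG pattern described above can be sketched in a few lines. In the hypothetical example below, embed() is a deliberately naive bag-of-words placeholder standing in for a real embedding model, the document list stands in for a business corpus, and the final LLM call is left as a comment; every name is illustrative rather than an established API.

```python
# Hypothetical RAG sketch: retrieve the most similar passages, show them
# (with their scores) to the user, and pass them to the LLM for the answer.
# embed() is a naive bag-of-words placeholder for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Placeholder corpus; in practice this would be the administration's documents.
documents = [
    "Form A-12 must be filed before 31 December.",
    "Housing benefit claims are processed within two weeks.",
]

def retrieve(question: str, k: int = 2):
    q = embed(question)
    scored = sorted(((cosine(q, embed(d)), d) for d in documents), reverse=True)
    return scored[:k]

question = "When must form A-12 be filed?"
sources = retrieve(question)
for score, doc in sources:
    print(f"{score:.2f}  {doc}")            # raw passages and their confidence level
# answer = llm_generate(question, sources)  # hypothetical call to the chosen LLM
```

Even in this toy form, the value of the pattern is visible: the passages actually shown to the model, together with their similarity scores, can be returned to the user alongside the generated answer.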
Even if the State’s missions are by nature relatively specific, many use cases are similar to those found in private companies, namely answering a question from a corpus of documents with the help of classic or vector search algorithms based on the concept of similarity [13]. It is therefore not surprising to see a convergence in the models used in both worlds. For the State, the deciding criterion in the choice of models will therefore be the protection of the personal or sensitive information transmitted to the AI models.
The use of open-source solutions makes it possible to drastically reduce expenses
Beyond the aspects mentioned above, the use of open-source solutions also allows the State to disseminate its own work so that it can be reused by the public or private sector. For example, the DGFiP has published its work on a model for summarising parliamentary amendments [14][15]. Administrations are thus able to actively share their knowledge, within the limits of the confidentiality required by sovereign missions.
Finally, the use of open-source solutions makes it possible to drastically reduce expenses, by limiting them to technical support without licence costs.
Are there any difficulties in implementing AI under a free licence?
The use of AI under a free licence also presents various challenges. First of all, implementing free solutions requires a good understanding of how the underlying models work. On top of this complexity comes the need for technical skills: to adapt the models to business needs, to obtain the data necessary for training, to configure the model (fine-tuning) if the business application requires it, to deploy it in the administration’s information system, and to guarantee the highest level of security.
In addition, ongoing and corrective maintenance requires a significant investment of time, both to update the models while ensuring a satisfactory level of non-regression and to verify that they continue to function properly. Although the code is free, using these AIs often also requires IT infrastructure based on specialised computing units, which can represent an indirect cost. Finally, the quality of open-source models can vary considerably, particularly depending on the business cases to be addressed, and there is no absolute guarantee of their performance. It is therefore essential to define expectations precisely with the business teams and to verify the expected results before any version is put into service.
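As an illustration of the kind of pre-release verification mentioned above, the hypothetical sketch below runs a fixed set of business questions against a candidate model and checks that expected keywords appear in the answers. The golden_set contents and model_answer() are placeholders to be replaced by the administration's own evaluation data and model client.

```python
# Hypothetical non-regression check run before a new model version goes
# into service. golden_set and model_answer() are placeholders for the
# administration's own evaluation data and model client.
golden_set = [
    {"question": "What is the filing deadline for form A-12?",
     "expected_keywords": ["31 December"]},
    {"question": "How long does a housing benefit claim take?",
     "expected_keywords": ["two weeks"]},
]

def model_answer(question: str) -> str:
    raise NotImplementedError("call the candidate model here")

def evaluate(answer_fn) -> float:
    """Share of questions whose answer contains every expected keyword."""
    passed = 0
    for case in golden_set:
        answer = answer_fn(case["question"])
        if all(k.lower() in answer.lower() for k in case["expected_keywords"]):
            passed += 1
    return passed / len(golden_set)

# A possible release rule: deploy only if the candidate scores at least as
# well as the version currently in production.
# assert evaluate(model_answer) >= evaluate(production_answer)
```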
Conclusion
The integration of artificial intelligence within public services represents a unique opportunity to improve the efficiency and quality of services provided to citizens and decision-making in a context of strain on available human resources. Free language models seem to be particularly well-suited to this challenge.
Despite the challenges, the advantages of free AIs are numerous. They promote innovation, reduce costs and strengthen the autonomy of administrations.
However, it is essential to study in depth the ethical issues related to the use of AI in the public sector. It is necessary to put in place processes and methods to guard against algorithmic bias and to guarantee the reasonable use of technologies, ensuring that they are monitored by digital and legal experts, or even by citizens themselves.
Disclaimer: The content of this article is the sole responsibility of its authors and has no scope other than that of information and academic research.