Welcome to LLM4All

Open-source LLMs are catching up with ChatGPT. They are also key enabling technologies for research (theorizing about training algorithms requires complete control over them), companies (privacy, technology ownership), governments (sovereignty and dependence), and the large community of individual practitioners who have already enriched the open-source LLM landscape at an unprecedented pace.
In this project, we focus on open-source LLMs that:
- Everyone, preferably with some GPU resources, can deploy on their own computers and fully control.
- Will be fine-tuned to better handle human meetings and conversations (but not chatbots), especially in French.
- Will be incrementally updated with the latest news, emerging lexicon, and events.
- Will be connected to the best speech recognition models (Whisper, MMS...), in particular to handle emergency calls in hospitals.
Funding
LLM4All is a project funded by the French ANR (Agence Nationale de la Recherche): project ANR-23-IAS1-0008
To acknowledge the project in papers, please use the following sentence: "The authors acknowledge the ANR (French National Research Agency) for its financial support of the LLM4ALL project n°ANR-23-IAS1-0008."
Press
Beyond scientific publications, we disseminate the project results to a wider public through invited talks, keynotes, and press coverage:
- Sciences et Avenir n°935, January 2025: "Grands modèles de langage : trois initiatives françaises pour plus d'éthique et de fiabilité" (Large language models: three French initiatives for more ethics and reliability)
Consortium
The consortium is composed of, in alphabetical order:
- The APHP hospitals in Paris
- The Linagora company in Paris, focused on open-source solutions for language
- The LIX laboratory in Paris:
  - The DaSciM team specialized in data analytics and machine learning
- The LORIA laboratory in Nancy:
  - The Multispeech team focused on speech, audio and multimodal signal processing
  - The Synalp team (leader of LLM4All) specialized in Natural Language Processing
with strong support from the Hugging Face company on LLM training.
Planning
- Start date: 1st October 2023
- Duration: 49 months (until end of November 2027)
Companion projects
Work packages

| Nb | Leader | Name |
|---|---|---|
| WP0 | LORIA | Project management |
| WP1 | LORIA | Fine-tuning, continual updating |
| WP2 | LIX | Low-cost LLMs |
| WP3 | Linagora | LLMs for spoken dialogue |
| WP4 | AP-HP | Boosting LLMs with other data |
| WP5 | Linagora | Communication, dissemination, exploitation |
Contact
cerisara at loria dot fr
Deliverables
- T0+6 = 1st April 2024
- T0+12 = 1st October 2024
- T0+24 = 1st October 2025; All M24 deliverables as a single PDF
- T0+36 = 1st October 2026
- T0+42 = 1st April 2027
| D | Date or month (T0+n) | Description |
|---|---|---|
| X | 1 Apr 2024 | DMP |
| X | 1 Oct 2024 | Consortium agreement |
| X | 1 Apr 2025 | 18-month interim report |
| | 1 Oct 2025 | 24-month DMP |
| | 1 Nov 2027 | Final report |
| | 1 Nov 2027 | Final DMP |
| 0.1 | 18 | progress report v1 |
| 0.1 | 40 | progress report v2 |
| 0.2 | 6 | DMP |
| 0.2 | 24 | DMP |
| 0.2 | 42 | DMP |
| 1.1 | 12 | software LLM training and evaluation + report |
| 1.1 | 24 | software LLM training and evaluation + report |
| 1.1 | 40 | software LLM training and evaluation + report |
| 1.2 | 24 | model + release every 2 months |
| 2.1 | 12 | software LLM low-cost inference, training + report |
| 2.1 | 24 | software LLM low-cost inference, training + report |
| 2.1 | 40 | software LLM low-cost inference, training + report |
| 2.2 | 24 | distilled version of model from WP1 |
| 2.2 | 40 | distilled version of model from WP1 |
| 3.1 | 12 | augmented dialogue dataset |
| 3.1 | 24 | augmented dialogue dataset |
| 3.2 | 12 | software + report: adaptation of LLM to dialogue + dialogue summarization |
| 3.2 | 24 | software + report: adaptation of LLM to dialogue + dialogue summarization |
| 3.2 | 40 | software + report: adaptation of LLM to dialogue + dialogue summarization |
| 3.3 | 24 | model for dialogue and dialogue summarization |
| 3.3 | 40 | model for dialogue and dialogue summarization |
| 4.1 | 12 | SimSAMU dataset |
| 4.1 | 24 | SimSAMU dataset |
| 4.1 | 40 | SimSAMU dataset |
| 4.2 | 12 | software + report: ASR domain adaptation |
| 4.2 | 24 | software + report: ASR domain adaptation |
| 4.2 | 40 | software + report: ASR domain adaptation |
| 4.3 | 24 | ASR models for meetings and ER calls |
| 4.3 | 40 | ASR models for meetings and ER calls |
| 4.4 | 12 | software + report: adaptation to ER calls |
| 4.4 | 24 | software + report: adaptation to ER calls |
| 4.4 | 40 | software + report: adaptation to ER calls |
| 4.5 | 24 | LLM for ER calls |
| 4.5 | 40 | LLM for ER calls |
| 5.1 | 30 | Workshop |
| 5.2 | 42 | dissemination report |
| 5.3 | 12 | exploitation plan |
| 5.3 | 36 | exploitation plan |
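
The month offsets used in the table above (T0+n) can be converted to calendar dates with a few lines of Python. This is only a minimal sketch: it assumes T0 is the project start date of 1st October 2023 given in the Planning section, and the helper name `deliverable_date` is hypothetical, not part of any project tooling.

```python
from datetime import date

# T0: project start date, taken from the Planning section.
T0 = date(2023, 10, 1)


def deliverable_date(months_after_t0: int, start: date = T0) -> date:
    """Return the date that falls `months_after_t0` months after `start`.

    Assumes the day of month of `start` exists in every month
    (true here, since T0 is the 1st of the month).
    """
    # Count months from year 0 so that divmod handles the year roll-over.
    total_months = start.year * 12 + (start.month - 1) + months_after_t0
    year, month_index = divmod(total_months, 12)
    return date(year, month_index + 1, start.day)


if __name__ == "__main__":
    for offset in (6, 12, 24, 36, 42):
        print(f"T0+{offset} = {deliverable_date(offset).isoformat()}")
```

For example, `deliverable_date(6)` gives 1st April 2024 and `deliverable_date(42)` gives 1st April 2027, matching the milestone list at the top of this section.
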
PMT meetings
Our PMT meetings take place at 2 PM on the first Thursday of every month at https://jitsi.linagora.com/llm4all
- 2nd November 2023
- 7th December 2023
- 11th January 2024
- 1st February 2024
- 7th March 2024
- 4th April 2024
- 13th June 2024
- 4th July 2024
  - Talk: Imed
- 5th September 2024
  - Talk: Linagora
- 3rd October 2024
- 5th December 2024
  - Talk: Yaya
- 9th January 2025
- 6th March 2025
- 4th September 2025
Workshops
- Kickoff meeting: 11th October 2023 at Linagora's offices, Paris
- ANR Workshop: 2nd February 2024
- ANR Workshop: 25th April 2024
- Y1 meeting: 24th October 2024 at INRIA's offices, Room Floyd, Paris
- Paris Open Source AI Summit
- Paris AI Action Summit
- GDR TAL partners' day (Journée partenaires du GDR TAL), March 2025