Welcome to LLM4All
Open-source LLMs are catching up with ChatGPT. They are also key enabling technologies for research (theorizing about training algorithms requires complete control over the models), companies (privacy, technology ownership), governments (sovereignty and dependence) and the large community of individual practitioners who have already enriched the open-source LLM landscape at an unprecedented pace.
This project focuses on open-source LLMs that:
- Everyone, preferably with some GPU resources, can deploy on their own computers and fully control.
- Will be fine-tuned to better handle human meetings and conversations (but not as chatbots), especially in French.
- Will be incrementally updated with the latest news, emerging lexicon and events.
- Will be connected to the best speech recognition models (Whisper, MMS...), in particular to handle emergency calls in hospitals.
Funding
LLM4All is a project funded by the French ANR (Agence Nationale de la Recherche): project ANR-23-IAS1-0008
Acknowledgements in papers: please use the following sentence: "The authors acknowledge the ANR (French National Research Agency) for its financial support of the LLM4ALL project n°ANR-23-IAS1-0008."
Press
Beyond scientific publications, we also disseminate the project results to a wider audience through invited talks, keynotes, and press coverage:
- Sciences et Avenir no. 935, January 2025 issue: "Grands modèles de langage : trois initiatives françaises pour plus d'éthique et de fiabilité" (Large language models: three French initiatives for more ethics and reliability)
Consortium
The consortium is composed of, in alphabetical order:
- The AP-HP hospitals in Paris
- The Linagora company in Paris, focused on open-source solutions for language
- The LIX laboratory in Paris:
  - The DaSciM team specialized in data analytics and machine learning
- The LORIA laboratory in Nancy:
  - The Multispeech team focused on speech, audio and multimodal signal processing
  - The Synalp team (leader of LLM4All) specialized in Natural Language Processing

with strong support from the Hugging Face company on LLM training.
Planning
- Start date: Oct 1st, 2023
- Duration: 49 months (until end of November 2027)
Companion projects
Work packages
Nb | Leader | Name |
---|---|---|
WP0 | LORIA | Project management |
WP1 | LORIA | Fine-tuning, continual updating |
WP2 | LIX | Low-cost LLMs |
WP3 | Linagora | LLMs for spoken dialogue |
WP4 | AP-HP | Boosting LLMs with other data |
WP5 | Linagora | Communication, dissemination, exploitation |
Contact
cerisara at loria dot fr
Deliverables
- T0+6 = 1st April 2024
- T0+12 = 1st October 2024
- T0+24 = 1st October 2025
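The month offsets used for deliverables (T0+N) can be converted to calendar dates with simple month arithmetic. The following is a minimal sketch, assuming T0 is the project start date of 1st October 2023 given in the Planning section; the function name `milestone` is hypothetical and chosen only for illustration.

```python
from datetime import date

# Assumption: T0 is the project start date stated in the Planning section.
T0 = date(2023, 10, 1)


def milestone(months_after_t0: int, start: date = T0) -> date:
    """Return the calendar date `months_after_t0` months after `start`."""
    # Count whole months since year 0, add the offset, then split the
    # total back into a year and a zero-based month index.
    total = start.year * 12 + (start.month - 1) + months_after_t0
    year, month_index = divmod(total, 12)
    # start.day is 1 for T0, so the resulting date is always valid here.
    return date(year, month_index + 1, start.day)


for offset in (6, 12, 24):
    print(f"T0+{offset} = {milestone(offset).isoformat()}")
# T0+6 = 2024-04-01
# T0+12 = 2024-10-01
# T0+24 = 2025-10-01
```

The same function can be applied to the month column of the deliverables table, for example `milestone(18)` for the first progress report.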
| Done | Date | Description |
|---|---|---|
| X | 1 Apr 2024 | Data management plan (DMP) |
| X | 1 Oct 2024 | Consortium agreement |
| X | 1 Apr 2025 | 18-month interim report |
|   | 1 Oct 2025 | 24-month DMP |
|   | 1 Nov 2027 | Final report |
|   | 1 Nov 2027 | Final DMP |

Deliverable | Month | Description |
---|---|---|
0.1 | 18 | progress report v1 |
0.1 | 40 | progress report v2 |
0.2 | 6 | DMP v1 |
0.2 | 42 | DMP v2 |
1.1 | 12 | software LLM training and evaluation + report |
1.1 | 24 | software LLM training and evaluation + report |
1.1 | 40 | software LLM training and evaluation + report |
1.2 | 24 | model + release every 2 months |
2.1 | 12 | software LLM low-cost inference, training + report |
2.1 | 24 | software LLM low-cost inference, training + report |
2.1 | 40 | software LLM low-cost inference, training + report |
2.2 | 24 | distilled version of model from WP1 |
2.2 | 40 | distilled version of model from WP1 |
3.1 | 12 | augmented dialogue dataset |
3.1 | 24 | augmented dialogue dataset |
3.2 | 12 | soft + report: adaptation of LLM to dialogue + dialogue summarization |
3.2 | 24 | soft + report: adaptation of LLM to dialogue + dialogue summarization |
3.2 | 40 | soft + report: adaptation of LLM to dialogue + dialogue summarization |
3.3 | 24 | model for dialogue and dialogue summarization |
3.3 | 40 | model for dialogue and dialogue summarization |
4.1 | 12 | SimSAMU dataset |
4.1 | 24 | SimSAMU dataset |
4.1 | 40 | SimSAMU dataset |
4.2 | 12 | soft + report: ASR domain adaptation |
4.2 | 24 | soft + report: ASR domain adaptation |
4.2 | 40 | soft + report: ASR domain adaptation |
4.3 | 24 | ASR models for meetings and ER calls |
4.3 | 40 | ASR models for meetings and ER calls |
4.4 | 12 | soft + report: adaptation to ER calls |
4.4 | 24 | soft + report: adaptation to ER calls |
4.4 | 40 | soft + report: adaptation to ER calls |
4.5 | 24 | LLM for ER calls |
4.5 | 40 | LLM for ER calls |
5.1 | 30 | Workshop |
5.2 | 42 | dissemination report |
5.3 | 12 | exploitation plan |
5.3 | 36 | exploitation plan |
PMT meetings
Our PMT meetings take place at 2 PM on the first Thursday of every month at https://jitsi.linagora.com/llm4all
- 2nd November 2023
- 7th December 2023
- 11th January 2024
- 1st February 2024
- 7th March 2024
- 4th April 2024
- 13th June 2024
- 4th July 2024
  - talk Imed
- 5th September 2024
  - talk Linagora
- 3rd October 2024
- 5th December 2024
  - talk Yaya
- 9th January 2025
- 6th March 2025
Workshops
- Kickoff meeting: 11th October 2023 at Linagora's offices, Paris
- ANR Workshop: 2nd February 2024
- ANR Workshop: 25th April 2024
- Y1 meeting: 24th October 2024 at INRIA's offices, Room Floyd, Paris
- Paris Open Source AI Summit
- Paris AI Action Summit
- GDR TAL partners' day (Journée partenaires du GDR TAL), March 2025