Welcome to LLM4All
Open-source LLMs are catching up with ChatGPT, and they are also key enabling technologies for research (developing a theory of training algorithms requires complete control over the models), companies (privacy, technology ownership), governments (sovereignty, reduced dependence) and the large community of individual practitioners who have already enriched the open-source LLM landscape at an unprecedented pace.
In this project, we focus on open-source LLMs that:
- Anyone, preferably with some GPU resources, can deploy on their own computers and fully control.
- Will be fine-tuned to better handle human meetings and conversations (but not as chatbots), especially in French.
- Will be incrementally updated with the latest news, emerging vocabulary and events.
- Will be connected to the best speech recognition models (Whisper, MMS...), in particular to handle emergency calls in hospitals.
Funding
LLM4All is a project funded by the French ANR (Agence Nationale de la Recherche).
Consortium
The consortium is composed of, in alphabetical order:
- The AP-HP hospitals in Paris
- The Linagora company in Paris, focused on open-source solutions for language
- The LIX laboratory in Paris:
  - The DaSciM team, specialized in data analytics and machine learning
- The LORIA laboratory in Nancy:
  - The Multispeech team, focused on speech, audio and multimodal signal processing
  - The Synalp team (leader of LLM4All), specialized in Natural Language Processing
with strong support from the Hugging Face company on LLM training.
Planning
- Start date: Oct 1st, 2023
- Duration: 42 months (until end of March 2027)
Companion projects
Work packages
WP | Leader | Name |
---|---|---|
WP0 | LORIA | Project management |
WP1 | LORIA | Fine-tuning, continual updating |
WP2 | LIX | Low-cost LLMs |
WP3 | Linagora | LLMs for spoken dialogue |
WP4 | AP-HP | Boosting LLMs with other data |
WP5 | Linagora | Communication, dissemination, exploitation |
Contact
cerisara at loria dot fr
Deliverables
- T0+6 = 1st April 2024
- T0+12 = 1st October 2024
Date | Deliverable |
---|---|
1/4/24 | DMP (data management plan) |
1/10/24 | Consortium agreement |
1/4/25 | Interim report at 18 months |
1/10/25 | DMP at 18 months |
31/3/27 | Final report |
31/3/27 | Final DMP |

D | Month | Description |
---|---|---|
0.1 | 18 | progress report v1 |
0.1 | 40 | progress report v2 |
0.2 | 6 | DMP v1 |
0.2 | 42 | DMP v2 |
1.1 | 12 | software LLM training and evaluation + report |
1.1 | 24 | software LLM training and evaluation + report |
1.1 | 40 | software LLM training and evaluation + report |
1.2 | 24 | model + release every 2 months |
2.1 | 12 | software LLM low-cost inference, training + report |
2.1 | 24 | software LLM low-cost inference, training + report |
2.1 | 40 | software LLM low-cost inference, training + report |
2.2 | 24 | distilled version of model from WP1 |
2.2 | 40 | distilled version of model from WP1 |
3.1 | 12 | augmented dialogue dataset |
3.1 | 24 | augmented dialogue dataset |
3.2 | 12 | software + report: adaptation of LLMs to dialogue + dialogue summarization |
3.2 | 24 | software + report: adaptation of LLMs to dialogue + dialogue summarization |
3.2 | 40 | software + report: adaptation of LLMs to dialogue + dialogue summarization |
3.3 | 24 | model for dialogue and dialogue summarization |
3.3 | 40 | model for dialogue and dialogue summarization |
4.1 | 12 | SimSAMU dataset |
4.1 | 24 | SimSAMU dataset |
4.1 | 40 | SimSAMU dataset |
4.2 | 12 | software + report: ASR domain adaptation |
4.2 | 24 | software + report: ASR domain adaptation |
4.2 | 40 | software + report: ASR domain adaptation |
4.3 | 24 | ASR models for meetings and ER calls |
4.3 | 40 | ASR models for meetings and ER calls |
4.4 | 12 | software + report: adaptation to ER calls |
4.4 | 24 | software + report: adaptation to ER calls |
4.4 | 40 | software + report: adaptation to ER calls |
4.5 | 24 | LLM for ER calls |
4.5 | 40 | LLM for ER calls |
5.1 | 30 | Workshop |
5.2 | 42 | dissemination report |
5.3 | 12 | exploitation plan |
5.3 | 36 | exploitation plan |
PMT meetings
Our PMT meetings take place at 2 PM, usually on the first Thursday of the month, at https://jitsi.linagora.com/llm4all
- 2nd November 2023
- 7th December 2023
- 11th January 2024
- 1st February 2024
- 7th March 2024
- 4th April 2024
- 13th June 2024
- 4th July 2024
  - talk by Imed
- 5th September 2024
  - talk by Linagora
- 3rd October 2024
Workshop
- Kickoff meeting: 11th October 2023 at Linagora's offices, Paris
- ANR workshop: 2nd February 2024
- ANR workshop: 25th April 2024
- Y1 meeting: 24th October 2024 at INRIA's offices, Room Floyd, Paris
Start | End | Who | Topic |
---|---|---|---|
10:15 | 10:30 | welcome | |
10:30 | 10:45 | LORIA | WP1 |
10:45 | 11:05 | all | discussions |
11:05 | 11:35 | Linagora | WP3 + WP5 |
11:35 | 11:55 | all | discussions |
12:30 | 13:30 | lunch | |
14:20 | 14:50 | LIX | WP2 |
14:50 | 15:10 | all | discussions |
15:10 | 15:40 | APHP | WP4 |
15:40 | 16:00 | all | discussions |
16:00 | 17:00 | all | future |