Welcome to LLM4All

Open-source LLMs are catching up with ChatGPT. They are also key enabling technologies for research (theorizing about training algorithms requires complete control over the models), for companies (privacy, technology ownership), for governments (sovereignty and independence), and for the large community of individual practitioners who have already enriched the open-source LLM landscape at an unprecedented pace.

In this project, we focus on open-source LLMs that:

  • Anyone, preferably with some GPU resources, can deploy on their own computers and fully control.
  • Will be fine-tuned to better handle human meetings and conversations (but not chatbots), especially in French.
  • Will be incrementally updated with the latest news, emerging lexicon, and events.
  • Will be connected to the best speech recognition models (Whisper, MMS...) to handle, in particular, emergency calls in hospitals.

Funding

LLM4All is a project funded by the French ANR (Agence Nationale de la Recherche).

Consortium

The consortium is composed of, in alphabetical order:

  • AP-HP
  • Linagora
  • LIX
  • LORIA

with strong support from the Hugging Face company on LLM training.

Planning

  • Start date: Oct 1st, 2023
  • Duration: 42 months (until end of March 2027)

Companion projects

Work packages

Nb   Leader    Name
WP0  LORIA     Project management
WP1  LORIA     Fine-tuning, continual updating
WP2  LIX       Low-cost LLMs
WP3  Linagora  LLMs for spoken dialogue
WP4  AP-HP     Boosting LLMs with other data
WP5  Linagora  Communication, dissemination, exploitation

Contact

cerisara at loria dot fr


Deliverables

  • T0+6 = 1st April 2024
  • T0+12 = 1st October 2024
Contractual milestones:

Date     Description
1/4/24   DMP (Data Management Plan)
1/10/24  Consortium agreement
1/4/25   Interim report at 18 months
1/10/25  DMP at 18 months
31/3/27  Final report
31/3/27  Final DMP

Scientific deliverables (month relative to T0):

D    T0+  Description
0.1  18   progress report v1
0.1  40   progress report v2
0.2  6    DMP v1
0.2  42   DMP v2
1.1  12   software: LLM training and evaluation + report
1.1  24   software: LLM training and evaluation + report
1.1  40   software: LLM training and evaluation + report
1.2  24   model + release every 2 months
2.1  12   software: LLM low-cost inference and training + report
2.1  24   software: LLM low-cost inference and training + report
2.1  40   software: LLM low-cost inference and training + report
2.2  24   distilled version of model from WP1
2.2  40   distilled version of model from WP1
3.1  12   augmented dialogue dataset
3.1  24   augmented dialogue dataset
3.2  12   software + report: adaptation of LLM to dialogue + dialogue summarization
3.2  24   software + report: adaptation of LLM to dialogue + dialogue summarization
3.2  40   software + report: adaptation of LLM to dialogue + dialogue summarization
3.3  24   model for dialogue and dialogue summarization
3.3  40   model for dialogue and dialogue summarization
4.1  12   SimSAMU dataset
4.1  24   SimSAMU dataset
4.1  40   SimSAMU dataset
4.2  12   software + report: ASR domain adaptation
4.2  24   software + report: ASR domain adaptation
4.2  40   software + report: ASR domain adaptation
4.3  24   ASR models for meetings and ER calls
4.3  40   ASR models for meetings and ER calls
4.4  12   software + report: adaptation to ER calls
4.4  24   software + report: adaptation to ER calls
4.4  40   software + report: adaptation to ER calls
4.5  24   LLM for ER calls
4.5  40   LLM for ER calls
5.1  30   workshop
5.2  42   dissemination report
5.3  12   exploitation plan
5.3  36   exploitation plan

PMT meetings

Our PMT meetings take place at 2 PM on the first Thursday of every month at https://jitsi.linagora.com/llm4all

  • 2nd November 2023
  • 7th December 2023
  • 11th January 2024
  • 1st February 2024
  • 7th March 2024
  • 4th April 2024
  • 13th June 2024
  • 4th July 2024
    • talk by Imed
  • 5th September 2024
    • talk by Linagora
  • 3rd October 2024

Workshop

Start  End    Who       About
10:15  10:30            welcome
10:30  10:45  LORIA     WP1
10:45  11:05  all       discussions
11:05  11:35  Linagora  WP3 + WP5
11:35  11:55  all       discussions
12:30  13:30            lunch
14:20  14:50  LIX       WP2
14:50  15:10  all       discussions
15:10  15:40  AP-HP     WP4
15:40  16:00  all       discussions
16:00  17:00  all       future plans