Welcome to LLM4All

There are many powerful Large Language Models (LLMs) beyond ChatGPT and Bard. This project focuses on open-source LLMs that:

  • Everyone, preferably with some GPU resources, can deploy on their own computers and control in much greater depth than ChatGPT and Bard.
  • Will be fine-tuned to better handle human meetings and conversations (but no chatbots!), especially in French.
  • Will be incrementally updated with the latest news, emerging lexicon, and events.
  • Will be connected to the best speech recognition models (Whisper, MMS...) to handle in particular emergency calls in hospitals.

Funding

LLM4All is a project funded by the French ANR (Agence Nationale de la Recherche).

Consortium

The consortium is composed of, in alphabetical order:

with strong support from the company Hugging Face on LLM training.

Planning

  • Start date: Oct 1st, 2023
  • Duration: 42 months

Companion projects

Work packages

Nb   Leader    Name
WP0  LORIA     Project management
WP1  LORIA     Fine-tuning, continual updating
WP2  LIX       Low-cost LLMs
WP3  Linagora  LLMs for spoken dialogue
WP4  AP-HP     Boosting LLMs with other data
WP5  Linagora  Communication, dissemination, exploitation

Contact

cerisara at loria dot fr


Kickoff meeting

Deliverables

  • T0+6 = 1st April 2024
  • T0+12 = 1st October 2024
D date desc
1/4/24 DMP
1/10/24 Consortium agreement
1/4/25 18-month interim report
1/10/25 18-month DMP
31/3/27 Final report
31/3/27 Final DMP
0.1 18 progress report v1
0.1 40 progress report v2
0.2 6 DMP v1
0.2 42 DMP v2
1.1 12 software LLM training and evaluation + report
1.1 24 software LLM training and evaluation + report
1.1 40 software LLM training and evaluation + report
1.2 24 model + release every 2 months
2.1 12 software LLM low-cost inference, training + report
2.1 24 software LLM low-cost inference, training + report
2.1 40 software LLM low-cost inference, training + report
2.2 24 distilled version of model from WP1
2.2 40 distilled version of model from WP1
3.1 12 augmented dialogue dataset
3.1 24 augmented dialogue dataset
3.2 12 soft + report: adaptation of LLM to dialogue + dialogue summarization
3.2 24 soft + report: adaptation of LLM to dialogue + dialogue summarization
3.2 40 soft + report: adaptation of LLM to dialogue + dialogue summarization
3.3 24 model for dialogue and dialogue summarization
3.3 40 model for dialogue and dialogue summarization
4.1 12 SimSAMU dataset
4.1 24 SimSAMU dataset
4.1 40 SimSAMU dataset
4.2 12 soft + report: ASR domain adaptation
4.2 24 soft + report: ASR domain adaptation
4.2 40 soft + report: ASR domain adaptation
4.3 24 ASR models for meetings and ER calls
4.3 40 ASR models for meetings and ER calls
4.4 12 soft + report: adaptation to ER calls
4.4 24 soft + report: adaptation to ER calls
4.4 40 soft + report: adaptation to ER calls
4.5 24 LLM for ER calls
4.5 40 LLM for ER calls
5.1 30 Workshop
5.2 42 dissemination report
5.3 12 exploitation plan
5.3 36 exploitation plan
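
The second column of the deliverables table above appears to give a month offset from the project start (T0 = 1st October 2023, per the Planning section). Assuming that reading is correct, the following minimal Python sketch converts an offset into a calendar date. The helper name `deliverable_date` is hypothetical and only serves this illustration.

```python
from datetime import date

# Project start date (T0), taken from the Planning section.
T0 = date(2023, 10, 1)


def deliverable_date(months: int, start: date = T0) -> date:
    """Return the calendar date that falls `months` months after `start`.

    Hypothetical helper: assumes `start.day` exists in the target month,
    which always holds here because T0 is the 1st of a month.
    """
    # Count months since year 0, add the offset, then split back
    # into a year and a zero-based month index.
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, start.day)


if __name__ == "__main__":
    # Offsets that occur in the deliverables table.
    for offset in (6, 12, 18, 24, 30, 36, 40, 42):
        print(f"T0+{offset} = {deliverable_date(offset).isoformat()}")
```

With this convention, T0+6 and T0+12 match the two dates listed at the top of this section, and T0+42 falls on 1st April 2027, the day after the final report date of 31/3/27.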

PMT meetings

Our PMT meetings take place at 2 PM on the first Thursday of every month at https://jitsi.linagora.com/llm4all

  • 2nd November 2023
    • Discussion about progress in fine-tuning (FT) and continual learning (CL) of LLMs
    • Data Management Plan: the DMP is available online from the top menu
    • TODO everyone (for mid-January): complete a first version by editing this markdown file or by sending me your updates by email
  • 7th December 2023
    • Discussion about progress in fine-tuning Claire
  • 11th January 2024
    • Progress per partner
    • ANR video conference on 2nd February with all projects; I'll present LLM4All
    • Organizing a workshop with the 3 other ANR TSIA projects
    • List the corpora in the DMP...
  • 1st February 2024