Mille-Pensées

Model Summary

Mille-Pensées is a French math reasoning model finetuned from Qwen2.5-7B-Instruct on the Mille-Pensées-Dataset. It performs similarly to or better than Qwen2.5-Math-7B-Instruct on most French math benchmarks while reasoning in French instead of English. It also offers superior performance on English math and general benchmarks.

Evaluation results cover three settings: French math data, English math data, and general English data.

Math benchmarks were evaluated with vLLM and math-verify (temperature=0.6, top_p=0.95, top_k=20, min_p=0, presence_penalty=0.5, max_tokens=38192); the general English benchmarks were evaluated with lm-evaluation-harness.
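
For reference, here is a minimal sketch of the math-benchmark sampling setup with vLLM using the parameters quoted above; the prompt handling and the surrounding evaluation harness are assumptions, not the authors' scripts.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "GLauzza/Mille-Pensees"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Sampling settings quoted above for the math benchmarks
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=20, min_p=0, presence_penalty=0.5, max_tokens=38192)

# Build a chat-formatted prompt, then generate with vLLM (example question is hypothetical)
prompt = tokenizer.apply_chat_template([{"role": "user", "content": "Combien vaut 1+1?"}], tokenize=False, add_generation_prompt=True)
llm = LLM(model=model_path)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)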

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "GLauzza/Mille-Pensees"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Example input
messages = [{"role": "user", "content": "Combien vaut 1+1?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to(model.device)

# Generate with sampling settings matching the evaluation configuration
generated_ids = model.generate(input_ids, max_new_tokens=38192, temperature=0.6, top_p=0.95, top_k=20, do_sample=True, eos_token_id=tokenizer.eos_token_id)
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Réponse: {response}")
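
The decoded response above still includes the prompt; a small follow-up sketch for keeping only the newly generated continuation (standard slicing on the prompt length, no additional assumptions):

# Decode only the tokens produced after the prompt
new_tokens = generated_ids[0][input_ids.shape[-1]:]
print(f"Réponse: {tokenizer.decode(new_tokens, skip_special_tokens=True)}")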

Training Details

| Parameter           | Value  |
|---------------------|--------|
| Epochs              | 3.16   |
| Global Batch Size   | 192    |
| Learning Rate       | 6e-5   |
| Scheduler           | Cosine |
| Optimizer           | AdamW  |
| Warmup Steps        | 100    |
| Weight Decay        | 0.0    |
| Max Sequence Length | 18k    |
| Sequence Packing    | No     |
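
The training framework is not stated in this card. As an illustration only, here is a sketch of how these hyperparameters could map onto Hugging Face TrainingArguments; the per-device batch size, gradient accumulation steps, GPU count, output path, and bf16 flag below are assumptions chosen to reach the global batch size of 192.

from transformers import TrainingArguments

# Hypothetical mapping of the hyperparameters above (framework and GPU layout are assumptions):
# global batch 192 = 8 GPUs x per_device_train_batch_size 1 x gradient_accumulation_steps 24
training_args = TrainingArguments(
    output_dir="mille-pensees-sft",   # hypothetical output path
    num_train_epochs=3.16,
    per_device_train_batch_size=1,    # assumption
    gradient_accumulation_steps=24,   # assumption
    learning_rate=6e-5,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
    warmup_steps=100,
    weight_decay=0.0,
    bf16=True,                        # assumption
)
# The 18k max sequence length and the absence of sequence packing would be handled
# by the tokenizer / data collator, not by TrainingArguments.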

⚠️ Licenses

The base model of this finetune, Qwen2.5-7B-Instruct, is licensed under Apache 2.0.

This model was finetuned on the Mille-Pensées-Dataset, which uses multiple datasets licensed under MIT, Apache 2.0, and CC-BY-4.0. However, one of the subsets of the Mille-Pensées-Dataset comes from AM-DeepSeek-R1-0528-Distilled and carries the following limitation: "Developers should strictly limit the use of this project's open-sourced code, data, models, and related artifacts to research purposes only. Commercial use and any applications that could potentially cause harm are strictly prohibited. The content in this dataset does not reflect the views, beliefs, or endorsements of any individual or institution. The authors disclaim any responsibility for consequences arising from the use, misuse, or interpretation of the dataset and associated materials."

Therefore, if you want to use the Mille-Pensées model, you must respect all of these licenses and limitations.

Citation Information

This project was granted AI computing and storage resources by GENCI at IDRIS, under grant 2025-AD011011668R5, on the A100 and H100 partitions of the Jean Zay supercomputer.

The authors are affiliated with LORIA, CNRS, Nancy, France.

If you use this model or the associated dataset, please cite:

@misc{Mille-Pensees,
    title={Mille-Pensees},
    url={https://huggingface.co/datasets/GLauzza/Mille-Pensees},
    author={Gabriel Lauzzana and Imane Ouada and Christophe Cerisara},
    month={December},
    year={2025}
}