Outputs change if re-using KVCache (past_key_values) for model.forward and generation

I found a GitHub issue that may explain this: Possible Bug with KV Caching in Llama (original) model · Issue #25420 · huggingface/transformers · GitHub

In short: using the KV cache can slightly change the logits, especially when the model is loaded in 16-bit precision. The cached and uncached paths perform the same floating-point operations in a different order (one big matmul over the full sequence vs. incremental matmuls per token), and in low precision those orderings round differently.
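A minimal sketch of why operation order matters in low precision (this is an illustration of float16 rounding, not of the transformers API itself): float16 addition is not associative, so accumulating the same values in a different order can produce a different result, just as the cached and uncached attention paths do.

```python
import numpy as np

# The same three numbers, summed in two different orders in float16.
a, b, c = np.float16(2048.0), np.float16(1.0), np.float16(1.0)

# (a + b) + c: 2048 + 1 rounds back to 2048 (spacing is 2 at this
# magnitude in float16), so the second +1 is lost as well.
left = np.float16(np.float16(a + b) + c)

# a + (b + c): 1 + 1 = 2 is exact, and 2048 + 2 = 2050 is representable.
right = np.float16(a + np.float16(b + c))

print(left, right)  # 2048.0 2050.0 — same inputs, different results
```

The divergence per operation is tiny, but across many layers it can flip an argmax and change which token is sampled, which is why generations with and without `past_key_values` can differ.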
