Outputs change if re-using KVCache (past_key_values) for model.forward and generation

I found a GitHub issue that may explain this: Possible Bug with KV Caching in Llama (original) model · Issue #25420 · huggingface/transformers · GitHub

In short: using the KV cache can slightly change the logits, especially when the model is loaded in 16-bit precision. The cached and uncached paths perform the same floating-point operations in a different order (one big matmul over the full sequence vs. incremental matmuls per token), and in low precision those orderings round differently.
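A minimal sketch of why operation order matters in low precision (this is an illustration of float16 rounding, not of the transformers API itself): float16 addition is not associative, so accumulating the same values in a different order can produce a different result, just as the cached and uncached attention paths do.

```python
import numpy as np

# The same three numbers, summed in two different orders in float16.
a, b, c = np.float16(2048.0), np.float16(1.0), np.float16(1.0)

# (a + b) + c: 2048 + 1 rounds back to 2048 (spacing is 2 at this
# magnitude in float16), so the second +1 is lost as well.
left = np.float16(np.float16(a + b) + c)

# a + (b + c): 1 + 1 = 2 is exact, and 2048 + 2 = 2050 is representable.
right = np.float16(a + np.float16(b + c))

print(left, right)  # 2048.0 2050.0 — same inputs, different results
```

The divergence per operation is tiny, but across many layers it can flip an argmax and change which token is sampled, which is why generations with and without `past_key_values` can differ.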
