The wav2vec 2.0 base 960h model never seems to return a beginning-of-sentence (bos) or end-of-sentence (eos) token when I use greedy decoding. So far it hasn't returned ' or the unknown token either. Is that expected? I can't find this discussed anywhere. Or am I just feeding in audio that is too difficult for the model to determine the bos/eos? If so, can someone provide a counter-example?
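To make the question concrete, here is a minimal sketch of one way to count how often greedy (argmax) decoding picks each special token. The id mapping in `SPECIAL_IDS` is an assumption and should be checked against the model's `vocab.json`. The helper names and the toy logits are hypothetical, made up for illustration only.

```python
from collections import Counter

# Assumed special-token ids; verify against the checkpoint's vocab.json.
SPECIAL_IDS = {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3}


def greedy_ids(logits):
    """Return the argmax token id for each frame of per-frame logits."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def special_token_counts(logits, special_ids=SPECIAL_IDS):
    """Count how many frames greedy decoding assigns to each special token."""
    counts = Counter(greedy_ids(logits))
    return {name: counts.get(idx, 0) for name, idx in special_ids.items()}


# Toy logits (4 frames, 6-symbol vocabulary), not real model output.
toy_logits = [
    [5.0, 0.1, 0.1, 0.1, 0.2, 0.3],  # argmax is id 0
    [0.1, 0.1, 0.1, 0.1, 0.2, 4.0],  # argmax is id 5
    [0.1, 0.1, 0.1, 0.1, 3.0, 0.3],  # argmax is id 4
    [6.0, 0.1, 0.1, 0.1, 0.2, 0.3],  # argmax is id 0
]

print(special_token_counts(toy_logits))
```

With a real checkpoint, the same functions could presumably be fed something like `model(input_values).logits[0].tolist()` from a `Wav2Vec2ForCTC` model. I haven't verified that end to end here.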