Qwen3-0.6B-T5-xxl

Model Description

This repository contains a fine-tuned version of Qwen/Qwen3-Embedding-0.6B. The model has been specifically trained to replicate the embedding outputs of the much larger google/t5-v1_1-xxl model.

The primary goal of this project is to provide a lightweight, high-performance model that produces embeddings semantically compatible with T5-xxl, but with a significantly smaller footprint and faster inference speed. The final output dimension is 4096.

This model is provided in bfloat16 format for optimized storage and usage.
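Note that transformers loads checkpoints in float32 by default; to keep the weights in bfloat16 at load time you can pass torch_dtype, as in this minimal sketch (whether the custom projection head runs fully in half precision end to end is an assumption; full usage is shown in the next section).

from transformers import AutoModel
import torch

# Load the checkpoint in its native bfloat16 precision (transformers defaults to float32).
# Assumes the custom projection head also supports bfloat16 inference.
model = AutoModel.from_pretrained(
    "JusteLeo/Qwen3-0.6B-T5-xxl",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)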

How to Use

This model can be used directly with the Hugging Face transformers library. Since it includes a custom architecture, you must use the trust_remote_code=True flag when loading.

from transformers import AutoTokenizer, AutoModel
import torch

# Define the model repository ID
model_id = "JusteLeo/Qwen3-0.6B-T5-xxl"

# Load the tokenizer and model
# trust_remote_code=True is required to load the custom projection head
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

# Move the model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Create embeddings
prompts = [
    "A photorealistic portrait of a medieval knight in shiny armor.",
    "A futuristic cityscape at night, with flying cars and neon lights."
]

# Tokenize the prompts
inputs = tokenizer(prompts, padding=True, truncation=True, return_tensors="pt").to(device)

# Generate the embeddings
with torch.no_grad():
    embeddings = model(**inputs)

print("Embeddings generated successfully!")
print(f"Output shape: {embeddings.shape}")
# Expected output shape: (2, 4096)
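Since the model is trained to approximate T5-xxl embeddings, one way to sanity-check the outputs is to compare them against the reference encoder itself. The sketch below continues from the snippet above and uses the mean-pooled last hidden state of google/t5-v1_1-xxl as the comparison target; this pooling choice is an assumption and may differ from how the JusteLeo/t5-xxl-embedding training targets were built, and loading the reference encoder requires substantial memory.

# Continuing from the snippet above: compare against the reference T5-xxl encoder.
from transformers import T5EncoderModel
import torch.nn.functional as F

t5_tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")
t5_encoder = T5EncoderModel.from_pretrained(
    "google/t5-v1_1-xxl", torch_dtype=torch.bfloat16
).to(device).eval()

t5_inputs = t5_tokenizer(prompts, padding=True, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    t5_hidden = t5_encoder(**t5_inputs).last_hidden_state  # (batch, seq_len, 4096)

# Mean-pool over non-padding tokens (assumed pooling; the pooling used for the
# training targets may differ).
mask = t5_inputs["attention_mask"].unsqueeze(-1)
t5_pooled = (t5_hidden * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the distilled embeddings and the reference ones.
similarity = F.cosine_similarity(embeddings.float(), t5_pooled.float(), dim=-1)
print(similarity)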

Model Details

  • Base Model: Qwen/Qwen3-Embedding-0.6B
  • Target Embedding Space: google/t5-v1_1-xxl
  • Fine-tuning Dataset: JusteLeo/t5-xxl-embedding
  • Output Dimension: 4096

License

This repository is licensed under the Apache License 2.0.
