Mask2Former-Large for 20-Class Semantic Segmentation (DLR SkyScapes)

This repository provides a Mask2Former-Large (Swin-L backbone) model fine-tuned on the DLR SkyScapes aerial imagery dataset for 20-class semantic segmentation.
The model focuses on high-resolution urban and peri-urban environments and produces detailed semantic masks for roads, buildings, vegetation, vehicles, and more.


Model Overview

  • Architecture: Mask2Former (Transformer-based, Swin-L backbone)
  • Base Checkpoint: facebook/mask2former-swin-large-ade-semantic
  • Task: Semantic Segmentation
  • Domain: High-resolution aerial imagery
  • Classes: 20 (DLR SkyScapes subset)
  • Loss Function: CrossEntropyLoss (ignore_index = 255)
  • Preprocessing: Hugging Face AutoImageProcessor
  • Recommended: use_fast=False for full reproducibility
  • Weights: model.safetensors

This model was trained as part of a bachelor thesis benchmarking state-of-the-art segmentation architectures on aerial datasets.
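The `ignore_index = 255` setting above means that pixels labeled 255 (void or unannotated regions) contribute nothing to the loss or its gradient. A minimal toy sketch of this behavior in plain PyTorch (illustrative only, not the actual training code):

```python
import torch
import torch.nn as nn

# Toy example: 2 classes, a 2x2 label map with one ignored pixel.
logits = torch.tensor([[[[2.0, 0.0], [0.0, 2.0]],   # class-0 scores
                        [[0.0, 2.0], [2.0, 0.0]]]])  # class-1 scores
labels = torch.tensor([[[0, 1], [255, 0]]])          # 255 = ignored pixel

loss_fn = nn.CrossEntropyLoss(ignore_index=255)
loss = loss_fn(logits, labels)  # averaged over the 3 non-ignored pixels only
```

Changing the value of an ignored pixel's logits would leave the loss unchanged, which is exactly what keeps unannotated regions from biasing training.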


DLR SkyScapes

SkyScapes is a high-resolution aerial imagery dataset created by the German Aerospace Center (DLR).
It provides high-quality pixel-level annotations for detailed scene understanding in urban environments.

Dataset Characteristics

  • Images: 16
  • Resolution: 5616 × 3744 px
  • Ground Sampling Distance (GSD): 13 cm/pixel
  • Aerial Coverage: ~5.69 km² (urban & rural)
  • Annotated Instances: 70,346
  • Original Classes: 31
  • Used in this model: First 20 semantic classes, listed below.
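At 5616 × 3744 px, the full frames are far larger than typical transformer input sizes, so inference on such imagery is commonly run over overlapping tiles whose predictions are stitched back together. A small helper sketching the tile-coordinate computation (the tile size and overlap are illustrative choices, not values from the training setup):

```python
def tile_coords(width, height, tile=1024, overlap=128):
    """Yield (left, top, right, bottom) boxes covering the image with overlap."""
    stride = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    # Make sure the right and bottom edges are fully covered.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

boxes = tile_coords(5616, 3744)  # tile boxes for one SkyScapes frame
```

Each box can be passed to `PIL.Image.crop` before running the model; in overlap regions, predictions are typically merged by averaging logits or by keeping the tile-center prediction.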

Semantic Classes (1–20)

  1. Low vegetation
  2. Paved road
  3. Non paved road
  4. Paved parking
  5. Non paved parking
  6. Bikeways
  7. Sidewalks
  8. Entrance/exit
  9. Danger area
  10. Lane markings
  11. Building
  12. Car
  13. Trailer
  14. Van
  15. Truck
  16. Long truck
  17. Bus
  18. Clutter
  19. Impervious surface
  20. Tree

These match the class IDs used during training.
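To map predicted IDs back to names, an `id2label` dictionary matching the list above can be built directly. This sketch assumes the 1-based numbering shown here; whether the checkpoint actually stores IDs as 0–19 or 1–20 should be verified against `model.config.id2label`:

```python
SKYSCAPES_CLASSES = [
    "Low vegetation", "Paved road", "Non paved road", "Paved parking",
    "Non paved parking", "Bikeways", "Sidewalks", "Entrance/exit",
    "Danger area", "Lane markings", "Building", "Car", "Trailer",
    "Van", "Truck", "Long truck", "Bus", "Clutter",
    "Impervious surface", "Tree",
]

# 1-based IDs, matching the numbering in the list above.
id2label = {i + 1: name for i, name in enumerate(SKYSCAPES_CLASSES)}
label2id = {name: i for i, name in id2label.items()}
```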


Intended Applications

  • Urban mapping and GIS
  • HD map generation
  • Scene understanding for robotics
  • Autonomous driving (aerial-based priors)
  • Environmental monitoring
  • Remote sensing research

Not suitable for safety-critical deployment without human supervision.


Usage Example

Load the processor and model:

from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
import torch
from PIL import Image

model_id = "RyanQchiqache/mask2former-large-dlr-skyscapes"

processor = AutoImageProcessor.from_pretrained(model_id, use_fast=False)
model = Mask2FormerForUniversalSegmentation.from_pretrained(model_id)
model.eval()

# Convert to RGB in case the source image has an alpha channel or is grayscale.
img = Image.open("example_dlr_image.png").convert("RGB")

inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# PIL reports (width, height); post-processing expects (height, width).
segmentation = processor.post_process_semantic_segmentation(
    outputs,
    target_sizes=[img.size[::-1]]
)[0]
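The resulting `segmentation` tensor holds one class ID per pixel. A sketch of turning it into a color mask with NumPy and Pillow (the random palette is an arbitrary illustration, not an official SkyScapes color map):

```python
import numpy as np
from PIL import Image

def colorize(seg, num_classes=20, seed=0):
    """Map an (H, W) array of class IDs to an RGB image with a fixed random palette."""
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_classes + 1, 3), dtype=np.uint8)
    seg = np.asarray(seg, dtype=np.int64)
    return Image.fromarray(palette[seg])

# mask = colorize(segmentation.cpu().numpy())
# mask.save("segmentation_mask.png")
```

Blending the mask with the original image (e.g. `Image.blend(img, mask, 0.5)` after resizing to matching dimensions) gives a quick qualitative check of the predictions.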