Mask2Former-Large for 20-Class Semantic Segmentation (DLR SkyScapes)

This repository provides a Mask2Former-Large (Swin-L backbone) model fine-tuned on the DLR SkyScapes aerial imagery dataset for 20-class semantic segmentation.
The model focuses on high-resolution urban and peri-urban environments and produces detailed semantic masks for roads, buildings, vegetation, vehicles, and more.


Model Overview

  • Architecture: Mask2Former (Transformer-based, Swin-L backbone)
  • Base Checkpoint: facebook/mask2former-swin-large-ade-semantic
  • Task: Semantic Segmentation
  • Domain: High-resolution aerial imagery
  • Classes: 20 (DLR SkyScapes subset)
  • Loss Function: CrossEntropyLoss (ignore_index = 255)
  • Preprocessing: Hugging Face AutoImageProcessor
  • Recommended: use_fast=False for full reproducibility
  • Weights: model.safetensors

This model was trained as part of a bachelor thesis benchmarking state-of-the-art segmentation architectures on aerial datasets.
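The `ignore_index = 255` setting above means that pixels labeled 255 (void or unannotated regions) contribute nothing to the loss or its gradient. A minimal toy sketch of this behavior in plain PyTorch (illustrative only, not the actual training code):

```python
import torch
import torch.nn as nn

# Toy example: 2 classes, a 2x2 label map with one ignored pixel.
logits = torch.tensor([[[[2.0, 0.0], [0.0, 2.0]],   # class-0 scores
                        [[0.0, 2.0], [2.0, 0.0]]]])  # class-1 scores
labels = torch.tensor([[[0, 1], [255, 0]]])          # 255 = ignored pixel

loss_fn = nn.CrossEntropyLoss(ignore_index=255)
loss = loss_fn(logits, labels)  # averaged over the 3 non-ignored pixels only
```

Changing the value of an ignored pixel's logits would leave the loss unchanged, which is exactly what keeps unannotated regions from biasing training.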


DLR SkyScapes

SkyScapes is a high-resolution aerial imagery dataset created by the German Aerospace Center (DLR).
It provides high-quality pixel-level annotations for detailed scene understanding in urban environments.

Dataset Characteristics

  • Images: 16
  • Resolution: 5616 × 3744 px
  • Ground Sampling Distance (GSD): 13 cm/pixel
  • Aerial Coverage: ~5.69 km² (urban & rural)
  • Annotated Instances: 70,346
  • Original Classes: 31
  • Used in this model: First 20 semantic classes, listed below.
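At 5616 × 3744 px, the full frames are far larger than typical transformer input sizes, so inference on such imagery is commonly run over overlapping tiles whose predictions are stitched back together. A small helper sketching the tile-coordinate computation (the tile size and overlap are illustrative choices, not values from the training setup):

```python
def tile_coords(width, height, tile=1024, overlap=128):
    """Yield (left, top, right, bottom) boxes covering the image with overlap."""
    stride = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    # Make sure the right and bottom edges are fully covered.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

boxes = tile_coords(5616, 3744)  # tile boxes for one SkyScapes frame
```

Each box can be passed to `PIL.Image.crop` before running the model; in overlap regions, predictions are typically merged by averaging logits or by keeping the tile-center prediction.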

Semantic Classes (1–20)

  1. Low vegetation
  2. Paved road
  3. Non paved road
  4. Paved parking
  5. Non paved parking
  6. Bikeways
  7. Sidewalks
  8. Entrance/exit
  9. Danger area
  10. Lane markings
  11. Building
  12. Car
  13. Trailer
  14. Van
  15. Truck
  16. Long truck
  17. Bus
  18. Clutter
  19. Impervious surface
  20. Tree

These match the class IDs used during training.
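To map predicted IDs back to names, an `id2label` dictionary matching the list above can be built directly. This sketch assumes the 1-based numbering shown here; whether the checkpoint actually stores IDs as 0–19 or 1–20 should be verified against `model.config.id2label`:

```python
SKYSCAPES_CLASSES = [
    "Low vegetation", "Paved road", "Non paved road", "Paved parking",
    "Non paved parking", "Bikeways", "Sidewalks", "Entrance/exit",
    "Danger area", "Lane markings", "Building", "Car", "Trailer",
    "Van", "Truck", "Long truck", "Bus", "Clutter",
    "Impervious surface", "Tree",
]

# 1-based IDs, matching the numbering in the list above.
id2label = {i + 1: name for i, name in enumerate(SKYSCAPES_CLASSES)}
label2id = {name: i for i, name in id2label.items()}
```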


Intended Applications

  • Urban mapping and GIS
  • HD map generation
  • Scene understanding for robotics
  • Autonomous driving (aerial-based priors)
  • Environmental monitoring
  • Remote sensing research

Not suitable for safety-critical deployment without human supervision.


Usage Example

Load the processor and model:

from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
import torch
from PIL import Image

model_id = "RyanQchiqache/mask2former-large-dlr-skyscapes"

processor = AutoImageProcessor.from_pretrained(model_id, use_fast=False)
model = Mask2FormerForUniversalSegmentation.from_pretrained(model_id)
model.eval()

# Convert to RGB in case the source image has an alpha channel or is grayscale.
img = Image.open("example_dlr_image.png").convert("RGB")

inputs = processor(images=img, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# PIL reports (width, height); post-processing expects (height, width).
segmentation = processor.post_process_semantic_segmentation(
    outputs,
    target_sizes=[img.size[::-1]]
)[0]
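The resulting `segmentation` tensor holds one class ID per pixel. A sketch of turning it into a color mask with NumPy and Pillow (the random palette is an arbitrary illustration, not an official SkyScapes color map):

```python
import numpy as np
from PIL import Image

def colorize(seg, num_classes=20, seed=0):
    """Map an (H, W) array of class IDs to an RGB image with a fixed random palette."""
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_classes + 1, 3), dtype=np.uint8)
    seg = np.asarray(seg, dtype=np.int64)
    return Image.fromarray(palette[seg])

# mask = colorize(segmentation.cpu().numpy())
# mask.save("segmentation_mask.png")
```

Blending the mask with the original image (e.g. `Image.blend(img, mask, 0.5)` after resizing to matching dimensions) gives a quick qualitative check of the predictions.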