Mask2Former-Large for 20-Class Semantic Segmentation (DLR SkyScapes)
This repository provides a Mask2Former-Large (Swin-L backbone) model fine-tuned on the DLR SkyScapes aerial imagery dataset for 20-class semantic segmentation.
The model focuses on high-resolution urban and peri-urban environments and produces detailed semantic masks for roads, buildings, vegetation, vehicles, and more.
Model Overview
- Architecture: Mask2Former (Transformer-based, Swin-L backbone)
- Base Checkpoint: facebook/mask2former-swin-large-ade-semantic
- Task: Semantic Segmentation
- Domain: High-resolution aerial imagery
- Classes: 20 (DLR SkyScapes subset)
- Loss Function: CrossEntropyLoss (ignore_index = 255)
- Preprocessing: Hugging Face AutoImageProcessor
- Recommended: use_fast=False for full reproducibility
- Weights: model.safetensors
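The ignore_index = 255 setting above means pixels labeled 255 (unlabeled/void regions) contribute nothing to the loss. A minimal sketch of that behavior (hypothetical tensor shapes, not the thesis's actual training loop):

```python
import torch
import torch.nn as nn

# CrossEntropyLoss as configured for fine-tuning: ignore_index=255
# masks out unlabeled pixels so they do not affect the gradient.
criterion = nn.CrossEntropyLoss(ignore_index=255)

logits = torch.randn(2, 20, 64, 64)         # (batch, num_classes, H, W)
labels = torch.randint(0, 20, (2, 64, 64))  # per-pixel ground-truth class IDs
labels[:, :8, :] = 255                      # e.g. an unlabeled border strip

loss = criterion(logits, labels)            # scalar loss over labeled pixels only
```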
This model was trained as part of a bachelor thesis benchmarking state-of-the-art segmentation architectures on aerial datasets.
DLR SkyScapes
SkyScapes is a high-resolution aerial imagery dataset created by the German Aerospace Center (DLR).
It provides high-quality pixel-level annotations for detailed scene understanding in urban environments.
Dataset Characteristics
- Images: 16
- Resolution: 5616 × 3744 px
- Ground Sampling Distance (GSD): 13 cm/pixel
- Aerial Coverage: ~5.69 km² (urban & rural)
- Annotated Instances: 70,346
- Original Classes: 31
- Used in this model: First 20 semantic classes, listed below.
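The stated coverage is consistent with the image count, resolution, and GSD; a quick sanity check:

```python
# Total area = images × pixels per image × (GSD in meters)²
images = 16
width_px, height_px = 5616, 3744
gsd_m = 0.13  # 13 cm/pixel

area_km2 = images * width_px * height_px * gsd_m**2 / 1e6
print(round(area_km2, 2))  # → 5.69
```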
Semantic Classes (1–20)
1. Low vegetation
2. Paved road
3. Non paved road
4. Paved parking
5. Non paved parking
6. Bikeways
7. Sidewalks
8. Entrance/exit
9. Danger area
10. Lane markings
11. Building
12. Car
13. Trailer
14. Van
15. Truck
16. Long truck
17. Bus
18. Clutter
19. Impervious surface
20. Tree
These match the class IDs used during training.
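For mapping predicted IDs back to names, the classes above can be wired into an id2label dictionary. This is only a sketch assuming the 1-based numbering in the heading; the checkpoint's own config.id2label is the authoritative mapping, and if the checkpoint uses 0-based IDs the +1 shift should be dropped:

```python
CLASSES = [
    "Low vegetation", "Paved road", "Non paved road", "Paved parking",
    "Non paved parking", "Bikeways", "Sidewalks", "Entrance/exit",
    "Danger area", "Lane markings", "Building", "Car", "Trailer", "Van",
    "Truck", "Long truck", "Bus", "Clutter", "Impervious surface", "Tree",
]

# Assumed 1-based IDs, matching the numbered list above.
id2label = {i + 1: name for i, name in enumerate(CLASSES)}
label2id = {name: i for i, name in id2label.items()}

print(id2label[11])  # → Building
```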
Intended Applications
- Urban mapping and GIS
- HD map generation
- Scene understanding for robotics
- Autonomous driving (aerial-based priors)
- Environmental monitoring
- Remote sensing research
Not suitable for safety-critical deployment without human supervision.
Usage Example
Load the processor and model:
```python
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation
import torch
from PIL import Image

model_id = "RyanQchiqache/mask2former-large-dlr-skyscapes"
processor = AutoImageProcessor.from_pretrained(model_id, use_fast=False)
model = Mask2FormerForUniversalSegmentation.from_pretrained(model_id)

img = Image.open("example_dlr_image.png").convert("RGB")
inputs = processor(images=img, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Post-process to a (height, width) tensor of per-pixel class IDs.
segmentation = processor.post_process_semantic_segmentation(
    outputs,
    target_sizes=[img.size[::-1]],  # PIL size is (width, height); reverse it
)[0]
```
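The returned segmentation is a (height, width) tensor of class IDs. A sketch of turning it into a viewable color mask (the random palette and blend alpha here are arbitrary illustration choices, not part of the model):

```python
import numpy as np
from PIL import Image

def colorize_mask(segmentation, num_classes=20, seed=0):
    """Map a (H, W) array of class IDs to an RGB image via a random palette."""
    rng = np.random.default_rng(seed)
    # One color per class ID (index 0 included in case of a background ID).
    palette = rng.integers(0, 256, size=(num_classes + 1, 3), dtype=np.uint8)
    mask = np.asarray(segmentation)
    return Image.fromarray(palette[mask])

# Example: blend the mask over the input image from the snippet above.
# overlay = Image.blend(img, colorize_mask(segmentation), alpha=0.5)
```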