[Pipelines] Mask Generation Parameters

John6666 · December 10, 2025, 8:26am

The lack of parameters seems intentional.

Hugging Face’s MaskGenerationPipeline is not a thin wrapper around Meta’s SamAutomaticMaskGenerator / SAM2AutomaticMaskGenerator. It is a separate automatic-mask-generation implementation with a different set of knobs, designed to work uniformly across multiple SAM-style models. Parameters that the HF implementation does not actually use, like box_nms_thresh and points_per_side, are not exposed.

I will unpack that in concrete terms.

1. What each implementation actually exposes

Meta: `SamAutomaticMaskGenerator`

From the official SAM repo:

Constructor parameters include:

Sampling:
- points_per_side
- points_per_batch
- point_grids
Crop schedule:
- crop_n_layers
- crop_overlap_ratio
- crop_n_points_downscale_factor
Quality and stability:
- pred_iou_thresh
- stability_score_thresh
- stability_score_offset
NMS:
- box_nms_thresh (within a crop)
- crop_nms_thresh (between crops)
Small-region cleanup:
- min_mask_region_area
Output format:
- output_mode (binary_mask, uncompressed_rle, coco_rle)

So Meta’s AMG is a fairly “expert” interface with distinct knobs for:

- grid density
- crop strategy
- per-crop and cross-crop NMS
- morphological cleanup
- output encoding

SAM2 and derivatives keep essentially the same parameter style (plus SAM2-specific flags such as use_m2m). For example, geospatial wrappers around SAM2 still configure points_per_side, box_nms_thresh, crop_n_points_downscale_factor, min_mask_region_area, and use_m2m. (samgeo.gishub.org)

Hugging Face: `MaskGenerationPipeline.__call__`

From the Transformers mask_generation.py you attached:

Parameters picked up by _sanitize_parameters:

Preprocess kwargs (grid and crops):
- points_per_batch
- points_per_crop
- crops_n_layers
- crop_overlap_ratio
- crop_n_points_downscale_factor
- timeout
Forward kwargs (quality and cleanup):
- pred_iou_thresh
- stability_score_thresh
- stability_score_offset
- mask_threshold
- max_hole_area
- max_sprinkle_area
Postprocess kwargs:
- crops_nms_thresh
- output_rle_mask
- output_bboxes_mask

Any other keyword argument, such as points_per_side or box_nms_thresh, is silently ignored by the pipeline because it is not added in _sanitize_parameters.

The HF tasks page for mask generation and the SAM/SAM2 docs show the same public surface: you configure points_per_batch, points_per_crop, crops_n_layers, pred_iou_thresh, stability_score_thresh, crops_nms_thresh, etc., not Meta’s full set. (Hugging Face)
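Since unknown keyword arguments are dropped without an error, it can help to check them yourself before calling the pipeline. The following is a minimal sketch, assuming the parameter names listed above are current for your transformers version. `split_kwargs` and `generate_masks` are hypothetical helper names, not part of the library, and `facebook/sam-vit-base` is just one example checkpoint.

```python
import warnings

# Names copied from the list above. Treat them as a snapshot that may
# differ between transformers versions.
PREPROCESS_KEYS = {
    "points_per_batch", "points_per_crop", "crops_n_layers",
    "crop_overlap_ratio", "crop_n_points_downscale_factor", "timeout",
}
FORWARD_KEYS = {
    "pred_iou_thresh", "stability_score_thresh", "stability_score_offset",
    "mask_threshold", "max_hole_area", "max_sprinkle_area",
}
POSTPROCESS_KEYS = {"crops_nms_thresh", "output_rle_mask", "output_bboxes_mask"}
SUPPORTED = PREPROCESS_KEYS | FORWARD_KEYS | POSTPROCESS_KEYS


def split_kwargs(kwargs):
    """Separate kwargs the pipeline is expected to use from the rest."""
    supported = {k: v for k, v in kwargs.items() if k in SUPPORTED}
    ignored = sorted(k for k in kwargs if k not in SUPPORTED)
    if ignored:
        warnings.warn(f"Not used by the mask-generation pipeline: {ignored}")
    return supported, ignored


def generate_masks(image, model_id="facebook/sam-vit-base", **kwargs):
    """Run the HF pipeline, passing only the kwargs it should consume."""
    # Imported lazily: needs transformers and torch installed.
    from transformers import pipeline

    supported, _ = split_kwargs(kwargs)
    generator = pipeline("mask-generation", model=model_id)
    return generator(image, **supported)
```

Calling `generate_masks(img, points_per_side=32, pred_iou_thresh=0.9)` would then warn about `points_per_side` instead of quietly dropping it.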

2. Concrete parameter mismatches

2.1 `points_per_side` vs `points_per_crop`

Meta:
- points_per_side controls how many points are sampled along each axis. Total points per crop are points_per_side**2.
- There is also point_grids for custom grids.
HF:
- Uses points_per_crop directly, interpreted as “how many points to sample in this crop” when calling image_processor.generate_crop_boxes.
- Grid construction is handled internally by the SamImageProcessor / Sam2ImageProcessorFast, and the pipeline does not expose a point_grids concept. (GitHub)

So HF reparameterizes the sampling density:

Same idea (uniform point grid over each crop), but:
- No explicit points_per_side.
- No ability to pass custom point_grids through the pipeline.
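If you want equivalent density, check how your installed processor turns points_per_crop into a grid before converting. The parameter description reads like a total per crop, which would suggest points_per_crop ≈ points_per_side**2, but the SAM image processor appears to pass the value to its grid builder as a per-side count (its default of 32 matches Meta’s points_per_side default), in which case points_per_crop ≈ points_per_side is the closer match. Either way, the exact layout is controlled by the processor, not the pipeline.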
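To make the density arithmetic concrete, here is a small sketch of a Meta-style uniform grid built from a per-side count. It is written in plain Python for illustration (Meta’s code uses NumPy), and whether the HF processor lays points out the same way is an assumption you should verify against your installed version. `scale_to_crop` is a hypothetical helper.

```python
def build_point_grid(n_per_side):
    """Return n_per_side**2 (x, y) points, evenly spaced in the unit square."""
    offset = 1.0 / (2 * n_per_side)
    step = (1.0 - 2 * offset) / (n_per_side - 1) if n_per_side > 1 else 0.0
    axis = [offset + i * step for i in range(n_per_side)]
    # Row-major order: y varies slowest, x fastest.
    return [(x, y) for y in axis for x in axis]


def scale_to_crop(grid, crop_box):
    """Map unit-square points into a crop given as (x0, y0, x1, y1) pixels."""
    x0, y0, x1, y1 = crop_box
    return [(x0 + px * (x1 - x0), y0 + py * (y1 - y0)) for px, py in grid]


grid = build_point_grid(4)
print(len(grid))  # 16 points for a per-side count of 4
print(grid[0])    # (0.125, 0.125)
```

The point to take away is only that a per-side count of n produces n**2 prompts per crop, so the cost grows quadratically with that knob.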

2.2 `box_nms_thresh` + `crop_nms_thresh` vs `crops_nms_thresh`

Meta:
- box_nms_thresh = per-crop NMS IoU threshold. Removes duplicates within one crop.
- crop_nms_thresh = cross-crop NMS IoU threshold. Removes duplicates between different crops.
Logic:
1. Generate masks for each crop.
2. NMS within each crop (box_nms_thresh).
3. NMS across all crops (crop_nms_thresh).
HF:
- Only exposes crops_nms_thresh.
- After processing all batches, the pipeline calls a single image_processor.post_process_for_mask_generation(all_masks, all_scores, all_boxes, crops_nms_thresh).
- There is no separate per-crop NMS stage; the processor runs one global NMS pass over all candidate masks.

Because HF does not implement per-crop NMS, there is no meaningful place to plug in box_nms_thresh. Adding a parameter that the algorithm never uses would be confusing, so it is omitted instead.
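The difference between the two schemes can be sketched with a toy greedy NMS over boxes. This is a simplified illustration, not either library’s code: the real implementations work on tensors, and Meta’s cross-crop stage ranks candidates differently from the plain score ordering used here. All function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh):
    """Return indices of kept boxes, visiting the highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept


def two_stage_nms(crops, box_nms_thresh, crop_nms_thresh):
    """Meta-style: NMS inside each crop, then NMS across the survivors."""
    boxes, scores = [], []
    for crop_boxes, crop_scores in crops:
        for i in greedy_nms(crop_boxes, crop_scores, box_nms_thresh):
            boxes.append(crop_boxes[i])
            scores.append(crop_scores[i])
    return [boxes[i] for i in greedy_nms(boxes, scores, crop_nms_thresh)]


def single_pass_nms(crops, crops_nms_thresh):
    """HF-style: pool every candidate and run one NMS pass."""
    boxes = [b for crop_boxes, _ in crops for b in crop_boxes]
    scores = [s for _, crop_scores in crops for s in crop_scores]
    return [boxes[i] for i in greedy_nms(boxes, scores, crops_nms_thresh)]


# Two crops as (boxes, scores); crop A holds two overlapping candidates.
crops = [
    ([(0, 0, 10, 10), (1, 1, 11, 11)], [0.9, 0.8]),
    ([(0, 0, 10, 10), (50, 50, 60, 60)], [0.85, 0.7]),
]
print(len(two_stage_nms(crops, box_nms_thresh=0.5, crop_nms_thresh=0.9)))  # 2
print(len(single_pass_nms(crops, crops_nms_thresh=0.9)))                   # 3
```

With one threshold, you cannot be strict inside a crop and lenient across crops at the same time, which is exactly the tuning freedom that box_nms_thresh provides in Meta’s version.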

2.3 `min_mask_region_area` vs `max_hole_area` / `max_sprinkle_area`

Meta:
- min_mask_region_area controls postprocessing that removes small islands and small holes in masks, via remove_small_regions.
HF:
- _forward optionally calls image_processor.post_process_masks with max_hole_area and max_sprinkle_area, then calls it again to resize and optionally binarize.
- These parameters control:
  - Filling holes up to a given area.
  - Removing tiny “sprinkles” up to a given area.

So HF splits the single “minimum region area” concept into two more explicit morphological thresholds. The effect is similar (remove small artifacts), but it is not a direct 1:1 mapping, and the parameter name changes.
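As an illustration of what the two thresholds mean, here is a plain-Python sketch of hole filling and sprinkle removal on a binary mask. It is an assumption-laden toy (4-connected regions, lists instead of tensors, a hypothetical `clean_mask` helper) and not the processors’ actual implementation.

```python
from collections import deque


def _components(grid, target):
    """Return the 4-connected regions of cells equal to target."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and grid[ny][nx] == target:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def clean_mask(mask, max_hole_area=0, max_sprinkle_area=0):
    """Fill small enclosed holes, then drop small foreground islands."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    # Holes: background regions that do not touch the image border.
    for region in _components(mask, 0):
        on_border = any(y in (0, rows - 1) or x in (0, cols - 1) for y, x in region)
        if not on_border and len(region) <= max_hole_area:
            for y, x in region:
                out[y][x] = 1
    # Sprinkles: small foreground regions, measured after hole filling.
    for region in _components(out, 1):
        if len(region) <= max_sprinkle_area:
            for y, x in region:
                out[y][x] = 0
    return out


mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = clean_mask(mask, max_hole_area=1, max_sprinkle_area=1)
print(cleaned[2])  # [0, 1, 1, 1, 0, 0, 0]
```

Meta’s single min_mask_region_area applies the same area limit to both operations, so setting both HF thresholds to the same value is the nearest approximation, not an exact equivalent.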

2.4 Other AMG-only arguments

Meta’s AMG and SAM2’s SAM2AutomaticMaskGenerator also expose parameters that have no stable meaning across all HF-supported mask-generation models:

- point_grids
- output_mode (binary vs RLE variants)
- SAM2-specific knobs such as use_m2m, multimask_output, etc. (samgeo.gishub.org)

HF’s MaskGenerationPipeline does not use these concepts at all. Instead it offers:

- output_rle_mask and output_bboxes_mask flags to optionally return extra outputs.
- A fixed internal representation for masks and bounding boxes.

Again, parameters that have no effect on the HF algorithm are not exposed.

3. Why Hugging Face omits those SAM/SAM2 parameters

There is no official HF comment that says “we deliberately removed box_nms_thresh and points_per_side because X”, but the design is clear from the code and docs.

Reasoning, point by point:

3.1 HF pipeline is a separate implementation, not a wrapper

The HF task “mask generation” defines its own 3-stage pipeline:
1. preprocess: generate_crop_boxes + point grid + cropping.
2. forward: run model, get pred_masks and iou_scores.
3. postprocess: mask resizing, filtering, and a single NMS step.
That logic lives in MaskGenerationPipeline and in the vision processors (SamImageProcessor, Sam2ImageProcessorFast). It is not calling Meta’s SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator anywhere. (GitHub)

Because it is an independent implementation, it:

- keeps the general idea (grid of points, crops, thresholds, NMS, small-region cleanup), and
- is free to choose its own parameterization and internal steps.

3.2 Single API across multiple SAM-like models

HF wants one mask-generation task that works for:

- SAM v1 models (facebook/sam-vit-*).
- SAM-HQ, MedSAM, other fine-tuned variants.
- SAM2-based models. (Hugging Face)

To do that, they define a small set of operations that all supporting processors can implement:

- generate_crop_boxes
- post_process_masks
- filter_masks
- post_process_for_mask_generation (GitHub)

Then they expose only those parameters that make sense for every supported backend:

- Grid and crop density in terms of points_per_crop, crops_n_layers, etc.
- Quality thresholds (pred_iou_thresh, stability_score_thresh).
- One NMS threshold (crops_nms_thresh).
- Generic cleanup knobs (max_hole_area, max_sprinkle_area).

Things that are specific to one particular implementation (e.g. SAM’s per-crop NMS, SAM2’s use_m2m, custom point grids) would complicate that common API, so they are left out.

3.3 Pipeline philosophy: “few powerful knobs”

Hugging Face pipelines are designed as high-level, opinionated interfaces. The docs for the mask-generation task describe it as “Automatic mask generation for images using SamForMaskGeneration”, with only a small set of configuration options. (Hugging Face)

This fits the general Transformers design:

- Pipelines expose a minimal parameter set.
- For advanced control, you drop down to:
  - the model + processor level, or
  - the original third-party library (Meta’s SAM/SAM2 repos in this case). (Hugging Face Forums)

So from HF’s perspective:

- Parameters like points_per_batch, points_per_crop, crops_n_layers, and the filtering thresholds cover most of what typical users need.
- Extra “power user” knobs from Meta’s AMG are better handled in custom code or directly in the SAM/SAM2 repos.

4. What this means for SAM2 and other derivatives

You are correct that SAM2’s own automatic mask generator still uses the full AMG-style set of parameters. For example, geospatial SAM2 wrappers show configuration like this (samgeo.gishub.org):

from samgeo import SamGeo2  # provided by the segment-geospatial package

sam2 = SamGeo2(
    model_id="sam2-hiera-large",
    points_per_side=64,
    points_per_batch=128,
    pred_iou_thresh=0.7,
    stability_score_thresh=0.92,
    stability_score_offset=0.7,
    crop_n_layers=1,
    box_nms_thresh=0.7,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=25,
    use_m2m=True,
)

So:

- Meta SAM2 code path: still has points_per_side, box_nms_thresh, min_mask_region_area, use_m2m, etc.
- HF mask-generation pipeline for SAM2: uses the same model weights but its own sampling and postprocessing pipeline, which ignores those SAM2-specific AMG knobs and uses HF’s parameter set instead. (Hugging Face)

This is why you see the discrepancy:

- The models are SAM derivatives.
- The automatic mask generator logic is HF’s own, not SAM2’s SAM2AutomaticMaskGenerator.

5. If you need the “missing” knobs

If you actually require per-crop NMS tuning or explicit points_per_side, there are only a few real options:

1. Use Meta’s SAM/SAM2 repos directly
   - Call SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator with all their original kwargs.
2. Use a wrapper that ports Meta’s AMG to HF checkpoints
   - Several projects load HF model weights into SAM/SAM2 code and keep the full AMG parameterization (for example in geospatial and research code). (samgeo.gishub.org)
3. Reimplement the AMG logic on top of HF models
   - Use SamModel / SAM2 model classes plus the HF processors.
   - Port the grid, crop, NMS, and region-cleanup logic from automatic_mask_generator.py.
   - Then you can expose exactly the parameter set you want.

If you stay on pipeline("mask-generation"), you are constrained to HF’s reduced parameter surface by design.
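If you do stay on the pipeline, a Meta-style config can only be translated partially. The sketch below applies the renames and approximations discussed in this post and reports what gets dropped. `meta_to_hf_kwargs` is a hypothetical helper, the min_mask_region_area mapping is an approximation, and points_per_side is deliberately left untranslated because the right conversion depends on how the processor builds its grid.

```python
# Parameters that keep their name in the HF pipeline (per the lists above).
SAME_NAME = {
    "points_per_batch", "crop_overlap_ratio", "crop_n_points_downscale_factor",
    "pred_iou_thresh", "stability_score_thresh", "stability_score_offset",
}
# Parameters that exist in both but are spelled differently.
RENAMED = {
    "crop_n_layers": "crops_n_layers",
    "crop_nms_thresh": "crops_nms_thresh",
}


def meta_to_hf_kwargs(meta_kwargs):
    """Translate Meta AMG kwargs to HF pipeline kwargs where a mapping exists.

    Returns (hf_kwargs, dropped_names). Everything without a counterpart,
    such as points_per_side, box_nms_thresh or use_m2m, ends up in dropped.
    """
    hf, dropped = {}, []
    for name, value in meta_kwargs.items():
        if name in SAME_NAME:
            hf[name] = value
        elif name in RENAMED:
            hf[RENAMED[name]] = value
        elif name == "min_mask_region_area":
            # Approximation: one Meta threshold feeds both HF thresholds.
            hf["max_hole_area"] = value
            hf["max_sprinkle_area"] = value
        else:
            dropped.append(name)
    return hf, sorted(dropped)


# Values taken from the SamGeo2 example earlier in this post.
meta_config = dict(
    points_per_side=64,
    points_per_batch=128,
    pred_iou_thresh=0.7,
    stability_score_thresh=0.92,
    stability_score_offset=0.7,
    crop_n_layers=1,
    box_nms_thresh=0.7,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=25,
    use_m2m=True,
)
hf_kwargs, dropped = meta_to_hf_kwargs(meta_config)
print(dropped)  # ['box_nms_thresh', 'points_per_side', 'use_m2m']
```

The dropped list is the practical answer to the original question: those are the settings you lose by choosing the pipeline over Meta’s generator.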

Summary

The Hugging Face MaskGenerationPipeline re-implements automatic mask generation on top of SamForMaskGeneration and the SAM/SAM2 image processors. It does not call Meta’s SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator.
Because the internal algorithm is different, some Meta AMG knobs have no direct counterpart:
- points_per_side is replaced by points_per_crop and internal grid generation.
- Per-crop box_nms_thresh is gone; HF uses a single global crops_nms_thresh.
- min_mask_region_area is replaced by max_hole_area and max_sprinkle_area.
HF also wants one stable pipeline API that works for SAM, SAM-HQ, SAM2, and other derivatives, so it exposes only parameters that are meaningful across all of them. (Hugging Face)
SAM2 and many downstream projects still use the full AMG parameter set, including points_per_side and box_nms_thresh. If you need those, you must use Meta’s AMG implementation or a custom wrapper, not the HF mask-generation pipeline. (samgeo.gishub.org)
