[Pipelines] Mask Generation Parameters

The missing parameters appear to be intentional.
Hugging Face’s MaskGenerationPipeline is not a thin wrapper around Meta’s SamAutomaticMaskGenerator / SAM2AutomaticMaskGenerator. It is a separate automatic-mask-generation implementation with a different set of knobs, designed to work uniformly across multiple SAM-style models. Parameters that the HF implementation does not actually use, like box_nms_thresh and points_per_side, are not exposed.

I will unpack that in concrete terms.


1. What each implementation actually exposes

Meta: SamAutomaticMaskGenerator

From the official SAM repo:

Constructor parameters include:

  • Sampling:

    • points_per_side
    • points_per_batch
    • point_grids
  • Crop schedule:

    • crop_n_layers
    • crop_overlap_ratio
    • crop_n_points_downscale_factor
  • Quality and stability:

    • pred_iou_thresh
    • stability_score_thresh
    • stability_score_offset
  • NMS:

    • box_nms_thresh (within a crop)
    • crop_nms_thresh (between crops)
  • Small-region cleanup:

    • min_mask_region_area
  • Output format:

    • output_mode (binary_mask, uncompressed_rle, coco_rle)

So Meta’s AMG is a fairly “expert” interface with distinct knobs for:

  • grid density
  • crop strategy
  • per-crop and cross-crop NMS
  • morphological cleanup
  • and output encoding.

SAM2 and derivatives keep essentially the same parameter style (plus SAM2-specific flags such as use_m2m). For example, geospatial wrappers around SAM2 still configure points_per_side, box_nms_thresh, crop_n_points_downscale_factor, min_mask_region_area, and use_m2m. (samgeo.gishub.org)

Hugging Face: MaskGenerationPipeline.__call__

From the Transformers mask_generation.py you attached:

Parameters picked up by _sanitize_parameters:

  • Preprocess kwargs (grid and crops):

    • points_per_batch
    • points_per_crop
    • crops_n_layers
    • crop_overlap_ratio
    • crop_n_points_downscale_factor
    • timeout
  • Forward kwargs (quality and cleanup):

    • pred_iou_thresh
    • stability_score_thresh
    • stability_score_offset
    • mask_threshold
    • max_hole_area
    • max_sprinkle_area
  • Postprocess kwargs:

    • crops_nms_thresh
    • output_rle_mask
    • output_bboxes_mask

Any other keyword argument, such as points_per_side or box_nms_thresh, is silently ignored by the pipeline because it is not added in _sanitize_parameters.
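To make the "silently ignored" behavior concrete, here is a minimal sketch of that kind of keyword routing. It is not the Transformers source: route_kwargs and ignored_kwargs are hypothetical names, and the key lists are simply copied from the lists above.

```python
# Hypothetical stand-in for the routing done in _sanitize_parameters.
# The key names come from the lists above; the function names are made up.
PREPROCESS_KEYS = ("points_per_batch", "points_per_crop", "crops_n_layers",
                   "crop_overlap_ratio", "crop_n_points_downscale_factor", "timeout")
FORWARD_KEYS = ("pred_iou_thresh", "stability_score_thresh", "stability_score_offset",
                "mask_threshold", "max_hole_area", "max_sprinkle_area")
POSTPROCESS_KEYS = ("crops_nms_thresh", "output_rle_mask", "output_bboxes_mask")


def route_kwargs(**kwargs):
    """Split kwargs into the three stage dicts; unknown keys are never copied."""
    preprocess = {k: kwargs[k] for k in PREPROCESS_KEYS if k in kwargs}
    forward = {k: kwargs[k] for k in FORWARD_KEYS if k in kwargs}
    postprocess = {k: kwargs[k] for k in POSTPROCESS_KEYS if k in kwargs}
    return preprocess, forward, postprocess


def ignored_kwargs(**kwargs):
    """Return the names that would be dropped, so a caller can warn about them."""
    known = set(PREPROCESS_KEYS) | set(FORWARD_KEYS) | set(POSTPROCESS_KEYS)
    return sorted(k for k in kwargs if k not in known)
```

A check like ignored_kwargs(points_per_side=32, points_per_crop=32) before calling the pipeline is a cheap way to catch Meta-style names that would otherwise have no effect.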

The HF tasks page for mask generation and the SAM/SAM2 docs show the same public surface: you configure points_per_batch, points_per_crop, crops_n_layers, pred_iou_thresh, stability_score_thresh, crops_nms_thresh, etc., not Meta’s full set. (Hugging Face)
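For reference, a call that stays inside that public surface might look like the sketch below. The values are illustrative rather than tuned, generate_masks and HF_MASK_KWARGS are names I made up, and the checkpoint is only an example; running it needs transformers and torch installed plus a model download.

```python
# Only keyword names from the lists above; values are illustrative, not tuned.
HF_MASK_KWARGS = {
    "points_per_batch": 64,
    "points_per_crop": 32,
    "crops_n_layers": 0,
    "pred_iou_thresh": 0.88,
    "stability_score_thresh": 0.95,
    "crops_nms_thresh": 0.7,
}


def generate_masks(image, model_id="facebook/sam-vit-base"):
    """Run the HF mask-generation pipeline on a PIL image, local path, or URL."""
    # Imported lazily so this module still loads without transformers installed.
    from transformers import pipeline

    generator = pipeline("mask-generation", model=model_id)
    return generator(image, **HF_MASK_KWARGS)
```

Note that nothing in HF_MASK_KWARGS uses a Meta-only name such as points_per_side or box_nms_thresh.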


2. Concrete parameter mismatches

2.1 points_per_side vs points_per_crop

  • Meta:

    • points_per_side controls how many points are sampled along each axis. Total points per crop are points_per_side**2.
    • There is also point_grids for custom grids.
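  • HF:

    • Uses points_per_crop, which the pipeline forwards to image_processor.generate_crop_boxes.
    • Grid construction is handled internally by the SamImageProcessor / Sam2ImageProcessorFast, and the pipeline does not expose a point_grids concept. (GitHub)

So HF reparameterizes the sampling density:

  • Same idea (uniform point grid over each crop), but:

    • No explicit points_per_side.
    • No ability to pass custom point_grids through the pipeline.
  • Despite the name, do not assume points_per_crop is a total point count. In the SamImageProcessor code I have read, the value is handed to the grid builder as a per-side count, and its default of 32 matches Meta’s points_per_side default. If that holds for your installed version, points_per_crop=32 already yields a 32 by 32 grid, so the closest equivalent is points_per_crop ≈ points_per_side, not points_per_side**2. Check the processor source before porting values, because the exact layout is controlled by the processor, not the pipeline.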
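To make the density arithmetic concrete, here is a small sketch of a uniform grid: n points per side gives n**2 points per crop, and crop_n_points_downscale_factor shrinks the per-side count on deeper crop layers. build_point_grid and per_side_by_layer are names I made up, and the half-cell centering is how I recall Meta's grid builder working, so treat the exact coordinates as an assumption.

```python
def build_point_grid(n_per_side):
    """Evenly spaced (x, y) points in the unit square, n_per_side along each axis."""
    # Each point sits at the center of its grid cell: (i + 0.5) / n.
    coords = [(i + 0.5) / n_per_side for i in range(n_per_side)]
    return [(x, y) for y in coords for x in coords]


def per_side_by_layer(n_per_side, crop_layers, downscale_factor):
    """Per-side point count for layer 0 (full image) and each crop layer."""
    return [int(n_per_side / (downscale_factor ** i)) for i in range(crop_layers + 1)]
```

For example, a per-side count of 32 with one crop layer and a downscale factor of 2 gives 32 per side on the full image and 16 per side on each crop.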

2.2 box_nms_thresh + crop_nms_thresh vs crops_nms_thresh

  • Meta:

    • box_nms_thresh = per-crop NMS IoU threshold. Removes duplicates within one crop.
    • crop_nms_thresh = cross-crop NMS IoU threshold. Removes duplicates between different crops.

    Logic:

    1. Generate masks for each crop.
    2. NMS within each crop (box_nms_thresh).
    3. NMS across all crops (crop_nms_thresh).
  • HF:

    • Only exposes crops_nms_thresh.
    • After processing all batches, the pipeline calls a single image_processor.post_process_for_mask_generation(all_masks, all_scores, all_boxes, crops_nms_thresh).
    • There is no separate per-crop NMS stage; the processor runs one global NMS pass over all candidate masks.

Because HF does not implement per-crop NMS, there is no meaningful place to plug in box_nms_thresh. A parameter that the algorithm never reads would only confuse users, which is presumably why it is omitted.
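The difference between the two flows can be sketched with a toy greedy NMS over boxes. This is a simplification under stated assumptions: candidates are (box, score, crop_id) tuples, ranking is by score in both stages (Meta's cross-crop stage, as far as I recall, ranks by crop size instead), and all function names are hypothetical.

```python
def iou(a, b):
    """IoU of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(candidates, thresh):
    """Greedy NMS over (box, score, crop_id) tuples, highest score first."""
    kept = []
    for cand in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(iou(cand[0], k[0]) <= thresh for k in kept):
            kept.append(cand)
    return kept


def two_stage(candidates, box_nms_thresh, crop_nms_thresh):
    """Meta-style flow: NMS inside each crop, then NMS across the survivors."""
    survivors = []
    for crop_id in sorted({c[2] for c in candidates}):
        survivors += nms([c for c in candidates if c[2] == crop_id], box_nms_thresh)
    return nms(survivors, crop_nms_thresh)


def single_stage(candidates, crops_nms_thresh):
    """HF-style flow as described above: one pass over every candidate."""
    return nms(candidates, crops_nms_thresh)
```

With two overlapping boxes in the same crop at IoU 0.6, two_stage(..., box_nms_thresh=0.5, crop_nms_thresh=0.7) drops one of them, while single_stage(..., crops_nms_thresh=0.7) keeps both. To get the stricter within-crop behavior from a single pass you would have to lower the one global threshold, which also affects cross-crop duplicates.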

2.3 min_mask_region_area vs max_hole_area / max_sprinkle_area

  • Meta:

    • min_mask_region_area controls postprocessing that removes small islands and small holes in masks, via remove_small_regions.
  • HF:

    • _forward optionally calls image_processor.post_process_masks with max_hole_area and max_sprinkle_area, then calls it again to resize and optionally binarize.

    • These parameters control:

      • Filling holes up to a given area.
      • Removing tiny “sprinkles” up to a given area.

So HF splits the single “minimum region area” concept into two more explicit morphological thresholds. The effect is similar (remove small artifacts), but it is not a direct 1:1 mapping, and the parameter name changes.
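As an illustration of what two separate thresholds buy you, here is a pure-Python sketch that fills enclosed background holes and removes small foreground specks on a 0/1 grid. It is not the processor's implementation: clean_mask and components are hypothetical names, connectivity is 4-neighbor, and treating background that touches the image border as "not a hole" is my assumption.

```python
def components(grid, value):
    """4-connected components of cells equal to `value`, as lists of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen, found = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or (r, c) in seen:
                continue
            stack, comp = [(r, c)], []
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                comp.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and (nr, nc) not in seen and grid[nr][nc] == value:
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            found.append(comp)
    return found


def clean_mask(mask, max_hole_area=0, max_sprinkle_area=0):
    """Fill enclosed holes and drop foreground specks up to the given areas."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    # Pass 1: fill background regions that are enclosed and small enough.
    for comp in components(mask, 0):
        touches_border = any(r in (0, rows - 1) or c in (0, cols - 1) for r, c in comp)
        if not touches_border and len(comp) <= max_hole_area:
            for r, c in comp:
                out[r][c] = 1
    # Pass 2: remove foreground regions that are small enough.
    for comp in components(out, 1):
        if len(comp) <= max_sprinkle_area:
            for r, c in comp:
                out[r][c] = 0
    return out
```

With both thresholds left at 0 the mask is returned unchanged, and each cleanup can be enabled independently, which a single min_mask_region_area value does not allow.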

2.4 Other AMG-only arguments

Meta’s AMG and SAM2’s SAM2AutomaticMaskGenerator also expose parameters that have no stable meaning across all HF-supported mask-generation models:

  • point_grids
  • output_mode (binary vs RLE variants)
  • SAM2-specific knobs such as use_m2m, multimask_output, etc. (samgeo.gishub.org)

HF’s MaskGenerationPipeline does not use these concepts at all. Instead it offers:

  • output_rle_mask and output_bboxes_mask flags to optionally return extra outputs.
  • A fixed internal representation for masks and bounding boxes.

Again, parameters that have no effect on the HF algorithm are not exposed.


3. Why Hugging Face omits those SAM/SAM2 parameters

There is no official HF comment that says “we deliberately removed box_nms_thresh and points_per_side because X”, but the design is clear from the code and docs.

Reasoning, point by point:

3.1 HF pipeline is a separate implementation, not a wrapper

  • The HF task “mask generation” defines its own 3-stage pipeline:

    1. preprocess: generate_crop_boxes + point grid + cropping.
    2. forward: run model, get pred_masks and iou_scores.
    3. postprocess: mask resizing, filtering, and a single NMS step.
  • That logic lives in MaskGenerationPipeline and in the vision processors (SamImageProcessor, Sam2ImageProcessorFast). It is not calling Meta’s SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator anywhere. (GitHub)

Because it is an independent implementation, it:

  • Keeps the general idea (grid of points, crops, thresholds, NMS, small region cleanup).
  • But it is free to choose its own parameterization and internal steps.

3.2 Single API across multiple SAM-like models

HF wants one mask-generation task that works for:

  • SAM v1 models (facebook/sam-vit-*).
  • SAM-HQ, MedSAM, other fine-tuned variants.
  • SAM2-based models. (Hugging Face)

To do that, they define a small set of operations that all supporting processors can implement:

  • generate_crop_boxes
  • post_process_masks
  • filter_masks
  • post_process_for_mask_generation (GitHub)

Then they expose only those parameters that make sense for every supported backend:

  • Grid and crop density in terms of points_per_crop, crops_n_layers, etc.
  • Quality thresholds (pred_iou_thresh, stability_score_thresh).
  • One NMS threshold (crops_nms_thresh).
  • Generic cleanup knobs (max_hole_area, max_sprinkle_area).

Things that are specific to one particular implementation (e.g. SAM’s per-crop NMS, SAM2’s use_m2m, custom point grids) would complicate that common API, so they are left out.

3.3 Pipeline philosophy: “few powerful knobs”

Hugging Face pipelines are designed as high-level, opinionated interfaces. The docs for the mask-generation task describe it as:

  • “Automatic mask generation for images using SamForMaskGeneration.”
  • With a small set of configuration options. (Hugging Face)

This fits the general Transformers design:

  • Pipelines expose a minimal parameter set.

  • For advanced control, you drop down to:

    • the model + processor level, or
    • the original third-party library (Meta’s SAM/SAM2 repos in this case). (Hugging Face Forums)

So from HF’s perspective:

  • Parameters like points_per_batch, points_per_crop, crops_n_layers, and the filtering thresholds cover the common use cases.
  • Extra “power user” knobs from Meta’s AMG are better handled in custom code or directly in the SAM/SAM2 repos.

4. What this means for SAM2 and other derivatives

You are correct that SAM2’s own automatic mask generator still uses the full AMG-style set of parameters. For example, geospatial SAM2 wrappers show configuration like: (samgeo.gishub.org)

from samgeo import SamGeo2

# AMG-style keyword arguments, as used by the SAM2 automatic mask generator
sam2 = SamGeo2(
    model_id="sam2-hiera-large",
    points_per_side=64,
    points_per_batch=128,
    pred_iou_thresh=0.7,
    stability_score_thresh=0.92,
    stability_score_offset=0.7,
    crop_n_layers=1,
    box_nms_thresh=0.7,
    crop_n_points_downscale_factor=2,
    min_mask_region_area=25,
    use_m2m=True,
)

So:

  • Meta SAM2 code path: still has points_per_side, box_nms_thresh, min_mask_region_area, use_m2m, etc.
  • HF mask-generation pipeline for SAM2: uses the same model weights but its own sampling and postprocessing pipeline, which ignores those SAM2-specific AMG knobs and uses HF’s parameter set instead. (Hugging Face)

This is why you see the discrepancy:

  • The models are SAM derivatives.
  • The automatic mask generator logic is HF’s own, not SAM2’s SAM2AutomaticMaskGenerator.
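If you are porting a Meta-style configuration to the HF pipeline, a small translation helper makes the discrepancy explicit. This is a sketch based only on the parameter lists in this post: translate_amg_kwargs is a hypothetical name, and anything without a clear counterpart (including points_per_side, whose density has to be ported by hand) is reported rather than guessed.

```python
# Names that differ only in spelling between the two APIs, per the lists above.
RENAMED = {"crop_n_layers": "crops_n_layers", "crop_nms_thresh": "crops_nms_thresh"}
# Names that appear unchanged in both parameter lists.
SHARED = {"points_per_batch", "crop_overlap_ratio", "crop_n_points_downscale_factor",
          "pred_iou_thresh", "stability_score_thresh", "stability_score_offset"}


def translate_amg_kwargs(amg_kwargs):
    """Return (hf_kwargs, unmapped_names) for a dict of Meta-style AMG kwargs."""
    hf_kwargs, unmapped = {}, []
    for name, value in amg_kwargs.items():
        if name in RENAMED:
            hf_kwargs[RENAMED[name]] = value
        elif name in SHARED:
            hf_kwargs[name] = value
        else:
            # No direct counterpart: surface it instead of silently dropping it.
            unmapped.append(name)
    return hf_kwargs, sorted(unmapped)
```

Feeding it the SamGeo2 configuration above returns box_nms_thresh, min_mask_region_area, points_per_side, and use_m2m as unmapped, which is exactly the set of knobs this thread is about.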

5. If you need the “missing” knobs

If you actually require per-crop NMS tuning or explicit points_per_side, there are only a few real options:

  1. Use Meta’s SAM/SAM2 repos directly

    • Call SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator with all their original kwargs.
  2. Use a wrapper that ports Meta’s AMG to HF checkpoints

    • Several projects load HF model weights into SAM/SAM2 code and keep the full AMG parameterization (for example in geospatial and research code). (samgeo.gishub.org)
  3. Reimplement the AMG logic on top of HF models

    • Use SamModel / SAM2 model classes plus the HF processors.
    • Port the grid, crop, NMS, and region-cleanup logic from automatic_mask_generator.py.
    • Then you can expose exactly the parameter set you want.

If you stay on pipeline("mask-generation"), you are constrained to HF’s reduced parameter surface by design.


Summary

  • The Hugging Face MaskGenerationPipeline re-implements automatic mask generation on top of SamForMaskGeneration and the SAM/SAM2 image processors. It does not call Meta’s SamAutomaticMaskGenerator or SAM2AutomaticMaskGenerator.

  • Because the internal algorithm is different, some Meta AMG knobs have no direct counterpart:

    • points_per_side is replaced by points_per_crop and internal grid generation.
    • Per-crop box_nms_thresh is gone; HF uses a single global crops_nms_thresh.
    • min_mask_region_area is replaced by max_hole_area and max_sprinkle_area.
  • HF also wants one stable pipeline API that works for SAM, SAM-HQ, SAM2, and other derivatives, so it exposes only parameters that are meaningful across all of them. (Hugging Face)

  • SAM2 and many downstream projects still use the full AMG parameter set, including points_per_side and box_nms_thresh. If you need those, you must use Meta’s AMG implementation or a custom wrapper, not the HF mask-generation pipeline. (samgeo.gishub.org)
