AI Engine Reference

The @snapotter/ai package bridges Node.js to a persistent Python sidecar for all ML operations. The dispatcher process stays alive between requests for fast warm-start performance. GPU is auto-detected at startup and used when available.

The engine provides 19 Python sidecar AI tools across four modalities (image, audio, video, document), plus 2 tools with optional AI capabilities. All models run locally -- no internet connection is required after the initial model download.

Architecture

Node.js Tool Route
      |
      v
 @snapotter/ai bridge.ts
      | (stdin/stdout JSON + stderr progress events)
      v
 Python dispatcher (persistent process, "ai" profile)
      |
      |-- remove_bg.py        (rembg / BiRefNet)
      |-- upscale.py          (RealESRGAN)
      |-- inpaint.py          (LaMa ONNX)
      |-- outpaint.py         (LaMa canvas expansion)
      |-- ocr.py              (PaddleOCR / Tesseract)
      |-- ocr_pdf.py          (page-by-page document OCR)
      |-- ocr_preprocess.py   (image enhancement for OCR)
      |-- detect_faces.py     (MediaPipe)
      |-- face_landmarks.py   (MediaPipe landmarks)
      |-- enhance_faces.py    (GFPGAN / CodeFormer)
      |-- colorize.py         (DDColor)
      |-- noise_removal.py    (SCUNet / tiered denoising)
      |-- red_eye_removal.py  (landmark + color analysis)
      |-- restore.py          (scratch repair + enhancement + denoising)
      |-- transcribe.py       (faster-whisper speech-to-text)
      +-- install_feature.py  (on-demand bundle installer)

A separate "docs" dispatcher profile replaces the AI allowlist with document-processing scripts (doc_pagecount, doc_health, doc_flatten, doc_redact, doc_text, doc_to_word, doc_metadata, doc_html_pdf) and skips heavy ML imports.
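As a rough illustration of the flow in the diagram, the loop below reads one JSON request per line on stdin, checks the script name against the active profile's allowlist, emits progress on stderr, and writes the result on stdout. The message fields (`id`, `script`, `args`, `ok`, `result`, `progress`) and the shortened allowlists are assumptions made for this sketch; the actual wire format used by bridge.ts may differ.

```python
import json
import sys

# Hypothetical, shortened allowlists; the real profiles list more scripts.
PROFILES = {
    "ai": {"remove_bg", "upscale", "ocr", "transcribe"},
    "docs": {"doc_pagecount", "doc_text", "doc_metadata"},
}


def serve(profile, handlers, stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Handle one JSON request per input line until stdin closes."""
    allowed = PROFILES[profile]
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        request_id = request.get("id")
        script = request.get("script")
        if script not in allowed or script not in handlers:
            reply = {"id": request_id, "ok": False,
                     "error": f"script not allowed in {profile!r} profile: {script}"}
        else:
            # Progress events go to stderr so stdout carries only results.
            print(json.dumps({"id": request_id, "progress": 0}), file=stderr, flush=True)
            result = handlers[script](request.get("args", {}))
            reply = {"id": request_id, "ok": True, "result": result}
        print(json.dumps(reply), file=stdout, flush=True)
```

Keeping results and progress on separate streams lets the Node.js side parse stdout strictly as responses while treating stderr as a best-effort event feed.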

Timeouts: 300 s default; OCR and BiRefNet background removal get 600 s.
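The timeout rule can be expressed as a small lookup. This is only a sketch: treating both `ocr` and `ocr-pdf` as "OCR", and recognizing BiRefNet by a `birefnet` name prefix (or by the model being left at its default), are assumptions rather than documented behavior.

```python
DEFAULT_TIMEOUT_S = 300
EXTENDED_TIMEOUT_S = 600


def timeout_for(tool, model=None):
    """Return the request timeout in seconds for a tool route."""
    if tool in {"ocr", "ocr-pdf"}:
        return EXTENDED_TIMEOUT_S
    if tool == "remove-background":
        # Assumption: no explicit model means the BiRefNet default is used.
        if model is None or model.lower().startswith("birefnet"):
            return EXTENDED_TIMEOUT_S
    return DEFAULT_TIMEOUT_S
```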

Feature Bundles

Each AI tool requires a model bundle to be installed before use. Bundles are installed on demand via the admin UI or install_feature.py.

| Bundle | Size | Tools |
| --- | --- | --- |
| background-removal | 4-5 GB | remove-background, passport-photo, transparency-fixer, background-replace, blur-background |
| face-detection | 200-300 MB | blur-faces, red-eye-removal, smart-crop |
| object-eraser-colorize | 1-2 GB | erase-object, colorize, ai-canvas-expand |
| upscale-enhance | 4-5 GB | upscale, enhance-faces, noise-removal |
| photo-restoration | 800 MB - 1 GB | restore-photo |
| ocr | 3-4 GB | ocr, ocr-pdf |
| transcription | ~600 MB | transcribe-audio, auto-subtitles |
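A client can use the bundle table to check whether a tool is ready before calling it. The sketch below copies the tool-to-bundle mapping from the table; the function names and the idea of passing in a set of installed bundles are hypothetical, since the document does not describe how installation state is queried.

```python
BUNDLES = {
    "background-removal": ["remove-background", "passport-photo", "transparency-fixer",
                           "background-replace", "blur-background"],
    "face-detection": ["blur-faces", "red-eye-removal", "smart-crop"],
    "object-eraser-colorize": ["erase-object", "colorize", "ai-canvas-expand"],
    "upscale-enhance": ["upscale", "enhance-faces", "noise-removal"],
    "photo-restoration": ["restore-photo"],
    "ocr": ["ocr", "ocr-pdf"],
    "transcription": ["transcribe-audio", "auto-subtitles"],
}


def bundle_for(tool):
    """Return the bundle that provides a tool route, or None if no bundle lists it."""
    for bundle, tools in BUNDLES.items():
        if tool in tools:
            return bundle
    return None


def missing_bundle(tool, installed):
    """Return the bundle that must be installed before using a tool, or None."""
    bundle = bundle_for(tool)
    if bundle is not None and bundle not in installed:
        return bundle
    return None
```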

Background Removal

Tool route: remove-background
Model: rembg with BiRefNet (default) or U2-Net variants

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | - | Model variant (optional override) |
| backgroundType | string | "transparent" | One of: transparent, color, gradient, blur, image |
| backgroundColor | string | - | Hex color for solid background |
| gradientColor1 | string | - | First gradient color |
| gradientColor2 | string | - | Second gradient color |
| gradientAngle | number | - | Gradient angle in degrees |
| blurEnabled | boolean | - | Enable background blur effect |
| blurIntensity | number (0-100) | - | Blur intensity |
| shadowEnabled | boolean | - | Enable drop shadow on subject |
| shadowOpacity | number (0-100) | - | Shadow opacity |
| outputFormat | string | - | Output format: png, webp, or avif |
| edgeRefine | integer (0-3) | - | Edge refinement level |
| decontaminate | boolean | - | Remove color bleed from edges |

Background Replace

Tool route: background-replace
Model: rembg / BiRefNet (shared with remove-background)

Removes the background and replaces it with a solid color or gradient.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| backgroundType | "color" \| "gradient" | "color" | Background mode |
| color | string | "#ffffff" | Background hex color (when backgroundType is color) |
| gradientColor1 | string | - | First gradient hex color |
| gradientColor2 | string | - | Second gradient hex color |
| gradientAngle | integer (0-360) | 180 | Gradient angle in degrees |
| feather | integer (0-20) | 0 | Edge feathering radius |
| format | "png" \| "webp" | "png" | Output format |

Blur Background

Tool route: blur-background
Model: rembg / BiRefNet (shared with remove-background)

Blurs the background while keeping the subject sharp.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| intensity | integer (1-100) | 50 | Blur intensity |
| feather | integer (0-20) | 0 | Edge feathering radius |
| format | "png" \| "webp" | "png" | Output format |

Image Upscaling

Tool route: upscale
Model: RealESRGAN (with Lanczos fallback when unavailable)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scale | number | 2 | Upscale factor |
| model | string | "auto" | Model variant |
| faceEnhance | boolean | false | Apply GFPGAN face enhancement pass |
| denoise | number | 0 | Denoising strength |
| format | string | "auto" | Output format override |
| quality | number | 95 | Output quality (1-100) |

OCR / Text Extraction

Tool route: ocr
Models: Tesseract (fast), PaddleOCR PP-OCRv5 (balanced), PaddleOCR-VL 1.5 (best)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| quality | "fast" \| "balanced" \| "best" | "balanced" | Processing tier |
| language | string | "auto" | Language: auto, en, de, fr, es, zh, ja, ko |
| enhance | boolean | true | Pre-process image to improve OCR accuracy |
| engine | string | - | Deprecated. Maps tesseract to fast, paddleocr to balanced |

Returns structured results with bounding boxes, confidence scores, and extracted text blocks.
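The deprecated `engine` parameter can be folded into `quality` with a small resolver. This sketch assumes that an explicit `quality` takes precedence over `engine` and that unknown engine names fall back to the default tier; neither rule is stated in the parameter table.

```python
LEGACY_ENGINE_TO_QUALITY = {"tesseract": "fast", "paddleocr": "balanced"}
VALID_QUALITIES = {"fast", "balanced", "best"}


def resolve_ocr_quality(settings):
    """Return the processing tier for an OCR settings dict."""
    quality = settings.get("quality")
    if quality is None:
        # Assumption: the legacy engine is consulted only when quality is absent.
        quality = LEGACY_ENGINE_TO_QUALITY.get(settings.get("engine"), "balanced")
    if quality not in VALID_QUALITIES:
        raise ValueError(f"unknown quality tier: {quality!r}")
    return quality
```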

PDF OCR

Tool route: ocr-pdf
Models: Same tier system as image OCR

Extracts text from scanned PDF documents using AI-powered OCR, page by page.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| quality | "fast" \| "balanced" \| "best" | "balanced" | Processing tier |
| language | string | "auto" | Language: auto, en, de, fr, es, zh, ja, ko |
| pages | string | "all" | Page selection: "all", "1-3", "1,3,5" |
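The `pages` string can be expanded into page numbers before requesting OCR, for example to validate user input on the client. The parser below is a minimal sketch; accepting mixed forms such as "1-2,5" and rejecting out-of-range pages are assumptions, as the table only shows the three pure forms.

```python
def parse_pages(spec, page_count):
    """Expand a page selection string into sorted 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```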

Face / PII Blur

Tool route: blur-faces
Model: MediaPipe face detection

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| blurRadius | number (1-100) | 30 | Gaussian blur radius |
| sensitivity | number (0-1) | 0.5 | Detection confidence threshold |

Face Enhancement

Tool route: enhance-faces
Models: GFPGAN, CodeFormer

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | "auto" \| "gfpgan" \| "codeformer" | "auto" | Enhancement model |
| strength | number (0-1) | 0.8 | Enhancement strength |
| sensitivity | number (0-1) | 0.5 | Face detection threshold |
| onlyCenterFace | boolean | false | Enhance only the most central face |

AI Colorization

Tool route: colorize
Model: DDColor (with OpenCV DNN fallback)

Converts black-and-white or grayscale photos to full color.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| intensity | number (0-1) | 1.0 | Color saturation strength |
| model | "auto" \| "ddcolor" \| "opencv" | "auto" | Model variant |

Noise Removal

Tool route: noise-removal
Model: SCUNet (tiered denoising pipeline)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| tier | "quick" \| "balanced" \| "quality" \| "maximum" | "balanced" | Processing tier |
| strength | number (0-100) | 50 | Denoising strength |
| detailPreservation | number (0-100) | 50 | How much detail to preserve; higher keeps more texture |
| colorNoise | number (0-100) | 30 | Color noise reduction strength |
| format | string | "original" | Output format: original, png, jpeg, webp, avif, jxl |
| quality | number (1-100) | 90 | Output encoding quality |

Red Eye Removal

Tool route: red-eye-removal

Detects face landmarks, locates eye regions, and corrects red-channel oversaturation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sensitivity | number (0-100) | 50 | Red pixel detection threshold |
| strength | number (0-100) | 70 | Correction strength |
| format | string | - | Output format override (optional) |
| quality | number (1-100) | 90 | Output quality |

Photo Restoration

Tool route: restore-photo

Multi-step pipeline for old or damaged photos: scratch/tear detection and repair, face enhancement, denoising, and optional colorization.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| scratchRemoval | boolean | true | Detect and repair scratches, tears |
| faceEnhancement | boolean | true | Apply face enhancement pass |
| fidelity | number (0-1) | 0.7 | Face enhancement strength (higher = more conservative) |
| denoise | boolean | true | Apply denoising pass |
| denoiseStrength | number (0-100) | 25 | Denoising strength |
| colorize | boolean | false | Colorize after restoration |
| colorizeStrength | number (0-100) | 85 | Colorization intensity |

Passport Photo

Tool route: passport-photo
Models: MediaPipe face landmarks + BiRefNet background removal

Two-phase workflow: analyze (detect face + remove background) then generate (crop, resize, tile). Supports 37+ countries across 6 regions.

Phase 1: Analyze

POST /api/v1/tools/image/passport-photo/analyze

Accepts an image file (multipart). Returns face landmark data, a base64 preview, and image dimensions.

Phase 2: Generate

POST /api/v1/tools/image/passport-photo/generate

Accepts a JSON body with the Phase 1 results plus generation settings:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| jobId | string | (required) | Job ID from Phase 1 |
| filename | string | (required) | Original filename from Phase 1 |
| countryCode | string | (required) | ISO country code (e.g., US, GB, IN) |
| documentType | string | "passport" | Document type |
| bgColor | string | "#FFFFFF" | Background color hex |
| printLayout | string | "none" | Print layout: none, 4x6, a4, letter |
| maxFileSizeKb | number | 0 | Max file size in KB (0 = no limit) |
| dpi | number (72-1200) | 300 | Output DPI |
| customWidthMm | number | - | Custom width in mm (overrides country spec) |
| customHeightMm | number | - | Custom height in mm (overrides country spec) |
| zoom | number (0.5-3) | 1 | Zoom factor |
| adjustX | number | 0 | Horizontal position adjustment |
| adjustY | number | 0 | Vertical position adjustment |
| landmarks | object | (required) | Landmarks from Phase 1 |
| imageWidth | number | (required) | Image width from Phase 1 |
| imageHeight | number | (required) | Image height from Phase 1 |
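To connect the two phases, a client carries the Phase 1 results into the Phase 2 JSON body. The sketch below assumes the analyze response exposes those values under the same key names that generate expects (`jobId`, `filename`, `landmarks`, `imageWidth`, `imageHeight`); the response schema is not documented here, so adjust the keys to the real payload. The helper name is hypothetical.

```python
import json

PHASE1_KEYS = ("jobId", "filename", "landmarks", "imageWidth", "imageHeight")


def build_generate_body(analysis, country_code, **overrides):
    """Return the JSON body for the generate call from a Phase 1 result dict."""
    missing = [key for key in PHASE1_KEYS if key not in analysis]
    if missing:
        raise KeyError(f"analysis result is missing: {', '.join(missing)}")
    body = {key: analysis[key] for key in PHASE1_KEYS}
    body["countryCode"] = country_code
    body.update(overrides)
    # Range checks mirror the documented limits for dpi and zoom.
    if not 72 <= body.get("dpi", 300) <= 1200:
        raise ValueError("dpi must be between 72 and 1200")
    if not 0.5 <= body.get("zoom", 1) <= 3:
        raise ValueError("zoom must be between 0.5 and 3")
    return json.dumps(body)
```

Optional settings such as `printLayout` or `bgColor` can be passed as keyword arguments, for example `build_generate_body(analysis, "US", printLayout="4x6")`.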

Object Erasing (Inpainting)

Tool route: erase-object
Model: LaMa via ONNX Runtime

The mask is sent as a second file part (fieldname mask), not as base64. White pixels in the mask indicate areas to erase. The format and quality settings are sent as top-level form fields.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| file | file | (required) | Source image (multipart) |
| mask | file | (required) | Mask image (multipart, fieldname mask, white = erase) |
| format | string | "auto" | Output format: auto, png, jpg, jpeg, webp, tiff, gif, avif, heic, heif, jxl |
| quality | integer (1-100) | 95 | Output quality |

GPU-accelerated when an NVIDIA GPU is available.
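For testing the mask convention (white = erase), a rectangular mask can be generated without any imaging library. The sketch below writes an 8-bit grayscale PNG using only the standard library; whether the endpoint accepts grayscale masks, and that the mask should match the source image's dimensions, are assumptions. The function names are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def rect_mask_png(width, height, box):
    """Return PNG bytes for a black mask with a white (left, top, right, bottom) box."""
    left, top, right, bottom = box
    rows = bytearray()
    for y in range(height):
        rows.append(0)  # filter type 0 (None) precedes every scanline
        for x in range(width):
            inside = left <= x < right and top <= y < bottom
            rows.append(255 if inside else 0)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(rows))) + _chunk(b"IEND", b""))
```

The resulting bytes can be uploaded as the `mask` file part alongside the source image.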

AI Canvas Expand

Tool route: ai-canvas-expand
Model: LaMa-based outpainting

Expands the canvas of an image in any direction and fills new areas with AI-generated content that matches the existing image.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| extendTop | integer | 0 | Pixels to extend at the top |
| extendRight | integer | 0 | Pixels to extend at the right |
| extendBottom | integer | 0 | Pixels to extend at the bottom |
| extendLeft | integer | 0 | Pixels to extend at the left |
| tier | "fast" \| "balanced" \| "high" | "balanced" | Quality tier |
| format | string | "auto" | Output format: auto, png, jpg, jpeg, webp, tiff, gif, avif, heic, heif, jxl |
| quality | integer (1-100) | 95 | Output quality |

At least one extend direction must be greater than 0.
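The constraint above, together with the resulting canvas size, can be checked client-side before uploading. This is a small sketch; rejecting negative values is an assumption, since the table does not state a lower bound.

```python
def expanded_size(width, height, extend_top=0, extend_right=0,
                  extend_bottom=0, extend_left=0):
    """Validate the extend settings and return the (width, height) of the new canvas."""
    extents = (extend_top, extend_right, extend_bottom, extend_left)
    if any(value < 0 for value in extents):
        raise ValueError("extend values must not be negative")
    if not any(value > 0 for value in extents):
        raise ValueError("at least one extend direction must be greater than 0")
    return (width + extend_left + extend_right, height + extend_top + extend_bottom)
```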

Smart Crop

Tool route: smart-crop
Model: MediaPipe face detection (face mode only)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "subject" | Crop strategy: subject, face, trim |
| strategy | "attention" \| "entropy" | "attention" | Strategy for subject mode |
| width | integer | - | Output width |
| height | integer | - | Output height |
| padding | integer (0-50) | 0 | Padding percentage around subject |
| facePreset | string | "head-shoulders" | Preset framing when mode=face |
| sensitivity | number (0-1) | 0.5 | Face detection threshold |
| threshold | integer (0-255) | 30 | Background detection threshold (trim mode) |
| padToSquare | boolean | false | Pad trimmed result to a square |
| padColor | string | "#ffffff" | Background color for square padding |
| targetSize | integer | - | Target size for padded output (pixels) |
| quality | integer (1-100) | - | Output quality |

Legacy mode values attention and content are accepted and mapped to subject and trim respectively.
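The legacy mapping amounts to a lookup applied before validation. The sketch below shows one way to normalize the value; raising on unknown modes is an assumption about how invalid input is handled.

```python
LEGACY_MODES = {"attention": "subject", "content": "trim"}
VALID_MODES = {"subject", "face", "trim"}


def normalize_mode(mode="subject"):
    """Map legacy smart-crop mode names to their current equivalents."""
    mode = LEGACY_MODES.get(mode, mode)
    if mode not in VALID_MODES:
        raise ValueError(f"unknown crop mode: {mode!r}")
    return mode
```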

Face presets:

| Preset | Best for |
| --- | --- |
| closeup | Headshots |
| head-shoulders | Profile photos |
| upper-body | LinkedIn / formal |
| half-body | Full upper body |

Transcribe Audio

Tool route: transcribe-audio
Model: faster-whisper

Converts speech to text. Supports plain text, SRT, and VTT output formats.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | "auto" | Language: auto, en, de, fr, es, zh, ja, ko, id, th, vi |
| outputFormat | "txt" \| "srt" \| "vtt" | "txt" | Output format |

Auto Subtitles

Tool route: auto-subtitles
Model: faster-whisper (extracts audio from video, then transcribes)

Generates subtitle files from a video's audio track.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | "auto" | Language: auto, en, de, fr, es, zh, ja, ko, id, th, vi |
| format | "srt" \| "vtt" | "srt" | Output subtitle format |
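The two subtitle formats differ mainly in the millisecond separator (comma in SRT, period in VTT), the numbered cues in SRT, and the `WEBVTT` header in VTT. The sketch below renders timed segments either way; the `(start_seconds, end_seconds, text)` tuple shape is an assumption for illustration and not the tool's internal representation.

```python
def format_timestamp(seconds, fmt):
    """Format seconds as HH:MM:SS,mmm (srt) or HH:MM:SS.mmm (vtt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    separator = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def render_cues(segments, fmt="srt"):
    """Render (start_seconds, end_seconds, text) tuples as SRT or VTT text."""
    if fmt not in ("srt", "vtt"):
        raise ValueError(f"unsupported subtitle format: {fmt!r}")
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}"
        if fmt == "srt":
            blocks.append(f"{index}\n{timing}\n{text}")  # SRT cues are numbered
        else:
            blocks.append(f"{timing}\n{text}")
    body = "\n\n".join(blocks) + "\n"
    return body if fmt == "srt" else "WEBVTT\n\n" + body
```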

PNG Transparency Fixer

Tool route: transparency-fixer
Model: BiRefNet HR-matting (2048x2048 resolution)

Fixes "fake transparent" PNGs where the background was removed but left behind fringing, halos, or semi-transparent artifacts. Uses BiRefNet's high-resolution matting model to produce a clean alpha channel, then applies configurable defringe processing to remove color contamination along edges.

OOM fallback chain: If BiRefNet HR-matting exceeds available memory, the tool automatically falls back to birefnet-general, then to u2net.
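The fallback chain can be pictured as trying each model in order and moving on when memory runs out. In this sketch `run_model` is a hypothetical callable, `birefnet-hr-matting` is a placeholder identifier for the HR-matting model, and Python's built-in `MemoryError` stands in for whatever out-of-memory error the actual inference runtime raises.

```python
# The first name is a placeholder; the last two appear in the text above.
FALLBACK_CHAIN = ("birefnet-hr-matting", "birefnet-general", "u2net")


def run_with_fallback(run_model, image, chain=FALLBACK_CHAIN):
    """Try each model in order; return (model_name, result) for the first that fits."""
    last_error = None
    for name in chain:
        try:
            return name, run_model(name, image)
        except MemoryError as exc:
            # Out of memory: remember the error and try the next, lighter model.
            last_error = exc
    raise RuntimeError("all models in the fallback chain ran out of memory") from last_error
```

Returning the model name alongside the result lets the caller report which tier actually produced the output.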

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| defringe | number (0-100) | 30 | Edge defringe strength to remove color contamination |
| outputFormat | "png" \| "webp" | "png" | Output image format |
| removeWatermark | boolean | false | Apply watermark removal pre-processing (median filter) |

```bash
curl -X POST http://localhost:1349/api/v1/tools/image/transparency-fixer \
  -H "Authorization: Bearer <token>" \
  -F "file=@fake-transparent.png" \
  -F 'settings={"defringe":30,"outputFormat":"png"}'
```
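The same request can be built from Python with only the standard library. This sketch mirrors the curl call above: a `file` part plus a `settings` form field holding JSON. The helper names are hypothetical, and the `image/png` content type for the file part is an assumption.

```python
import json
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode text fields and (filename, bytes, content_type) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, (filename, data, content_type) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
        ).encode()
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(base_url, tool, token, image_name, image_bytes, settings):
    """Build (but do not send) a POST request for an image tool route."""
    body, content_type = encode_multipart(
        {"settings": json.dumps(settings)},
        {"file": (image_name, image_bytes, "image/png")},
    )
    return urllib.request.Request(
        f"{base_url}/api/v1/tools/image/{tool}",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
```

Passing the returned object to `urllib.request.urlopen(request, timeout=300)` sends it; the response format is not described in this section, so handle the body according to what the server actually returns.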

Tools with Optional AI Capabilities

The following tools are not Python sidecar tools but use AI features when certain options are enabled.

Image Enhancement

Tool route: image-enhancement
Engine: Analysis-based (Sharp histogram and statistics)

Analyzes the image and applies automatic corrections for exposure, contrast, white balance, saturation, sharpness, and noise. Supports scene-specific modes.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mode | "auto" \| "portrait" \| "landscape" \| "low-light" \| "food" \| "document" | "auto" | Scene mode for tuning corrections |
| intensity | number (0-100) | 50 | Overall correction strength |
| corrections.exposure | boolean | true | Apply exposure correction |
| corrections.contrast | boolean | true | Apply contrast correction |
| corrections.whiteBalance | boolean | true | Apply white balance correction |
| corrections.saturation | boolean | true | Apply saturation correction |
| corrections.sharpness | boolean | true | Apply sharpness correction |
| corrections.denoise | boolean | true | Apply denoising |
| deepEnhance | boolean | false | Enable AI noise removal via SCUNet (requires upscale-enhance bundle) |

An additional analysis endpoint is available at POST /api/v1/tools/image/image-enhancement/analyze which returns the detected corrections without applying them.

Content-Aware Resize (Seam Carving)

Tool route: content-aware-resize
Engine: Go caire binary (not Python -- no GPU benefit)

Intelligently resizes images by removing low-energy seams, preserving important content.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| width | number | - | Target width |
| height | number | - | Target height |
| protectFaces | boolean | false | Protect detected face regions (requires face-detection bundle) |
| blurRadius | number (0-20) | 4 | Pre-blur for energy calculation |
| sobelThreshold | number (1-20) | 2 | Edge sensitivity threshold |
| square | boolean | false | Force square output |
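For readers unfamiliar with seam carving, the core step is a dynamic program that finds the connected top-to-bottom path with the lowest total energy and removes it. The sketch below shows that step on a plain list-of-lists energy map. It illustrates the general technique only and is not the caire binary's implementation, which also handles energy computation (the Sobel and blur settings above), face protection, and horizontal seams.

```python
def find_vertical_seam(energy):
    """Return one column index per row for the minimum-energy vertical seam."""
    height, width = len(energy), len(energy[0])
    # cost[y][x] = cheapest total energy of any seam ending at (x, y).
    cost = [list(energy[0])]
    for y in range(1, height):
        previous = cost[-1]
        row = []
        for x in range(width):
            lo, hi = max(0, x - 1), min(width, x + 2)
            row.append(energy[y][x] + min(previous[lo:hi]))
        cost.append(row)
    # Backtrack from the cheapest bottom cell through the cheapest neighbors above.
    x = min(range(width), key=cost[-1].__getitem__)
    seam = [x]
    for y in range(height - 1, 0, -1):
        lo, hi = max(0, x - 1), min(width, x + 2)
        x = min(range(lo, hi), key=cost[y - 1].__getitem__)
        seam.append(x)
    seam.reverse()
    return seam


def remove_seam(rows, seam):
    """Drop the seam's pixel from every row, narrowing the image by one column."""
    return [row[:x] + row[x + 1:] for row, x in zip(rows, seam)]
```

Repeating find-and-remove once per column of width reduction shrinks the image while leaving high-energy regions mostly intact.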