AI Engine Reference

The @snapotter/ai package bridges Node.js to a persistent Python sidecar for all ML operations. The dispatcher process stays alive between requests for fast warm-start performance. GPU is auto-detected at startup and used when available.

The engine provides 19 Python sidecar AI tools across four modalities (image, audio, video, document), plus 2 tools with optional AI capabilities. All models run locally; no internet connection is required after the initial model download.

Architecture

Node.js Tool Route
      |
      v
 @snapotter/ai bridge.ts
      | (stdin/stdout JSON + stderr progress events)
      v
 Python dispatcher (persistent process, "ai" profile)
      |
      |-- remove_bg.py        (rembg / BiRefNet)
      |-- upscale.py          (RealESRGAN)
      |-- inpaint.py          (LaMa ONNX)
      |-- outpaint.py         (LaMa canvas expansion)
      |-- ocr.py              (PaddleOCR / Tesseract)
      |-- ocr_pdf.py          (page-by-page document OCR)
      |-- ocr_preprocess.py   (image enhancement for OCR)
      |-- detect_faces.py     (MediaPipe)
      |-- face_landmarks.py   (MediaPipe landmarks)
      |-- enhance_faces.py    (GFPGAN / CodeFormer)
      |-- colorize.py         (DDColor)
      |-- noise_removal.py    (SCUNet / tiered denoising)
      |-- red_eye_removal.py  (landmark + color analysis)
      |-- restore.py          (scratch repair + enhancement + denoising)
      |-- transcribe.py       (faster-whisper speech-to-text)
      +-- install_feature.py  (on-demand bundle installer)

A separate "docs" dispatcher profile replaces the AI allowlist with document-processing scripts (doc_pagecount, doc_health, doc_flatten, doc_redact, doc_text, doc_to_word, doc_metadata, doc_html_pdf) and skips heavy ML imports.
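
The profile allowlist can be pictured with a small sketch. This is not the actual dispatcher: the request shape (`script`, `params`), the `dispatch` function, and the error payloads are assumptions made for illustration. Only the script names come from the diagram and the docs profile list above.

```python
import json

# Script names taken from the architecture diagram and the "docs" profile list.
PROFILES = {
    "ai": {
        "remove_bg", "upscale", "inpaint", "outpaint", "ocr", "ocr_pdf",
        "ocr_preprocess", "detect_faces", "face_landmarks", "enhance_faces",
        "colorize", "noise_removal", "red_eye_removal", "restore",
        "transcribe", "install_feature",
    },
    "docs": {
        "doc_pagecount", "doc_health", "doc_flatten", "doc_redact",
        "doc_text", "doc_to_word", "doc_metadata", "doc_html_pdf",
    },
}


def dispatch(line: str, profile: str) -> str:
    """Handle one JSON request line and return one JSON response line.

    The {"script": ..., "params": ...} request shape is hypothetical.
    """
    try:
        request = json.loads(line)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "invalid JSON"})
    if not isinstance(request, dict):
        return json.dumps({"ok": False, "error": "request must be an object"})
    script = request.get("script")
    if script not in PROFILES[profile]:
        return json.dumps({"ok": False, "error": f"script not allowed: {script}"})
    # A real dispatcher would import and run the script here.
    return json.dumps({"ok": True, "script": script})
```

Under these assumptions, a request for `ocr` succeeds in the "ai" profile and is rejected in the "docs" profile, because each profile only accepts the scripts on its own allowlist.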

Timeouts: 300 s default; OCR and BiRefNet background removal get 600 s.
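
As a rough illustration of the timeout rule, the sketch below picks a timeout per request. The function name `timeout_for`, the decision to treat both `ocr` and `ocr_pdf` as "OCR", and the prefix match on the model name are assumptions; the bridge's real selection logic is not shown in this reference.

```python
from typing import Optional

DEFAULT_TIMEOUT_S = 300
EXTENDED_TIMEOUT_S = 600


def timeout_for(script: str, model: Optional[str] = None) -> int:
    """Return the timeout in seconds for a sidecar request (illustrative only)."""
    # Assumption: "OCR" covers both the image and the PDF OCR scripts.
    if script in {"ocr", "ocr_pdf"}:
        return EXTENDED_TIMEOUT_S
    # Assumption: BiRefNet variants are recognisable by a "birefnet" name prefix.
    if script == "remove_bg" and model is not None and model.lower().startswith("birefnet"):
        return EXTENDED_TIMEOUT_S
    return DEFAULT_TIMEOUT_S
```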

Feature Bundles

Each AI tool requires a model bundle to be installed before use. Bundles are installed on demand via the admin UI or install_feature.py.

Bundle	Size	Tools
`background-removal`	4-5 GB	remove-background, passport-photo, transparency-fixer, background-replace, blur-background
`face-detection`	200-300 MB	blur-faces, red-eye-removal, smart-crop
`object-eraser-colorize`	1-2 GB	erase-object, colorize, ai-canvas-expand
`upscale-enhance`	4-5 GB	upscale, enhance-faces, noise-removal
`photo-restoration`	800 MB - 1 GB	restore-photo
`ocr`	3-4 GB	ocr, ocr-pdf
`transcription`	~600 MB	transcribe-audio, auto-subtitles
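
The table can also be read in the other direction, from tool route to required bundle. The sketch below encodes the table as data and adds a hypothetical `missing_bundle` helper; the helper and the idea of passing a set of installed bundle names are illustrative assumptions, not part of the package API.

```python
# Bundle -> tool routes, copied from the table above.
BUNDLES = {
    "background-removal": [
        "remove-background", "passport-photo", "transparency-fixer",
        "background-replace", "blur-background",
    ],
    "face-detection": ["blur-faces", "red-eye-removal", "smart-crop"],
    "object-eraser-colorize": ["erase-object", "colorize", "ai-canvas-expand"],
    "upscale-enhance": ["upscale", "enhance-faces", "noise-removal"],
    "photo-restoration": ["restore-photo"],
    "ocr": ["ocr", "ocr-pdf"],
    "transcription": ["transcribe-audio", "auto-subtitles"],
}

# Inverted index: tool route -> bundle name.
TOOL_TO_BUNDLE = {tool: bundle for bundle, tools in BUNDLES.items() for tool in tools}


def missing_bundle(tool: str, installed: set) -> "str | None":
    """Return the bundle that must be installed before `tool` can run, if any."""
    bundle = TOOL_TO_BUNDLE.get(tool)
    if bundle is None or bundle in installed:
        return None
    return bundle
```

For example, `missing_bundle("upscale", set())` reports `upscale-enhance`, while a tool that is not listed in the table yields `None`.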

Background Removal

Tool route: remove-background
Model: rembg with BiRefNet (default) or U2-Net variants

Parameter	Type	Default	Description
`model`	string	-	Model variant (optional override)
`backgroundType`	string	`"transparent"`	One of: `transparent`, `color`, `gradient`, `blur`, `image`
`backgroundColor`	string	-	Hex color for solid background
`gradientColor1`	string	-	First gradient color
`gradientColor2`	string	-	Second gradient color
`gradientAngle`	number	-	Gradient angle in degrees
`blurEnabled`	boolean	-	Enable background blur effect
`blurIntensity`	number (0-100)	-	Blur intensity
`shadowEnabled`	boolean	-	Enable drop shadow on subject
`shadowOpacity`	number (0-100)	-	Shadow opacity
`outputFormat`	string	-	Output format: `png`, `webp`, or `avif`
`edgeRefine`	integer (0-3)	-	Edge refinement level
`decontaminate`	boolean	-	Remove color bleed from edges

Background Replace

Tool route: background-replace
Model: rembg / BiRefNet (shared with remove-background)

Removes the background and replaces it with a solid color or gradient.

Parameter	Type	Default	Description
`backgroundType`	`"color"` \| `"gradient"`	`"color"`	Background mode
`color`	string	`"#ffffff"`	Background hex color (when `backgroundType` is `color`)
`gradientColor1`	string	-	First gradient hex color
`gradientColor2`	string	-	Second gradient hex color
`gradientAngle`	integer (0-360)	`180`	Gradient angle in degrees
`feather`	integer (0-20)	`0`	Edge feathering radius
`format`	`"png"` \| `"webp"`	`"png"`	Output format

Blur Background

Tool route: blur-background
Model: rembg / BiRefNet (shared with remove-background)

Blurs the background while keeping the subject sharp.

Parameter	Type	Default	Description
`intensity`	integer (1-100)	`50`	Blur intensity
`feather`	integer (0-20)	`0`	Edge feathering radius
`format`	`"png"` \| `"webp"`	`"png"`	Output format

Image Upscaling

Tool route: upscale
Model: RealESRGAN (with Lanczos fallback when unavailable)

Parameter	Type	Default	Description
`scale`	number	`2`	Upscale factor
`model`	string	`"auto"`	Model variant
`faceEnhance`	boolean	`false`	Apply GFPGAN face enhancement pass
`denoise`	number	`0`	Denoising strength
`format`	string	`"auto"`	Output format override
`quality`	number	`95`	Output quality (1-100)

OCR / Text Extraction

Tool route: ocr
Models: Tesseract (fast), PaddleOCR PP-OCRv5 (balanced), PaddleOCR-VL 1.5 (best)

Parameter	Type	Default	Description
`quality`	`"fast"` \| `"balanced"` \| `"best"`	`"balanced"`	Processing tier
`language`	string	`"auto"`	Language: `auto`, `en`, `de`, `fr`, `es`, `zh`, `ja`, `ko`
`enhance`	boolean	`true`	Pre-process image to improve OCR accuracy
`engine`	string	-	Deprecated. Maps `tesseract` to `fast`, `paddleocr` to `balanced`

Returns structured results with bounding boxes, confidence scores, and extracted text blocks.
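
The deprecated `engine` mapping can be sketched as a small resolver. The function name `resolve_quality` and the precedence rule (an explicit `quality` wins over `engine`) are assumptions; the reference only states the mapping itself and the `balanced` default.

```python
QUALITY_TIERS = ("fast", "balanced", "best")

# Mapping stated in the parameter table for the deprecated `engine` setting.
LEGACY_ENGINE_TO_QUALITY = {"tesseract": "fast", "paddleocr": "balanced"}


def resolve_quality(settings: dict) -> str:
    """Return the processing tier for an OCR request (illustrative only)."""
    quality = settings.get("quality")
    if quality is not None:
        # Assumption: an explicit quality takes precedence over engine.
        if quality not in QUALITY_TIERS:
            raise ValueError(f"unknown quality tier: {quality!r}")
        return quality
    engine = settings.get("engine")
    if engine is not None:
        if engine not in LEGACY_ENGINE_TO_QUALITY:
            raise ValueError(f"unknown engine: {engine!r}")
        return LEGACY_ENGINE_TO_QUALITY[engine]
    return "balanced"  # documented default
```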

PDF OCR

Tool route: ocr-pdf
Models: Same tier system as image OCR

Extracts text from scanned PDF documents using AI-powered OCR, page by page.

Parameter	Type	Default	Description
`quality`	`"fast"` \| `"balanced"` \| `"best"`	`"balanced"`	Processing tier
`language`	string	`"auto"`	Language: `auto`, `en`, `de`, `fr`, `es`, `zh`, `ja`, `ko`
`pages`	string	`"all"`	Page selection: `"all"`, `"1-3"`, `"1,3,5"`
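
The three `pages` forms can be parsed along the following lines. This is a sketch, not the sidecar's parser: the function name `parse_pages`, 1-based page numbering, support for mixing ranges and single pages in one value, and raising on out-of-range pages are all assumptions.

```python
def parse_pages(spec: str, page_count: int) -> list:
    """Expand a page selection such as "all", "1-3" or "1,3,5" into page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Assumption: pages are 1-based and must exist in the document.
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"page selection out of bounds: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```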

Face / PII Blur

Tool route: blur-faces
Model: MediaPipe face detection

Parameter	Type	Default	Description
`blurRadius`	number (1-100)	`30`	Gaussian blur radius
`sensitivity`	number (0-1)	`0.5`	Detection confidence threshold

Face Enhancement

Tool route: enhance-faces
Models: GFPGAN, CodeFormer

Parameter	Type	Default	Description
`model`	`"auto"` \| `"gfpgan"` \| `"codeformer"`	`"auto"`	Enhancement model
`strength`	number (0-1)	`0.8`	Enhancement strength
`sensitivity`	number (0-1)	`0.5`	Face detection threshold
`onlyCenterFace`	boolean	`false`	Enhance only the most central face

AI Colorization

Tool route: colorize
Model: DDColor (with OpenCV DNN fallback)

Converts black-and-white or grayscale photos to full color.

Parameter	Type	Default	Description
`intensity`	number (0-1)	`1.0`	Color saturation strength
`model`	`"auto"` \| `"ddcolor"` \| `"opencv"`	`"auto"`	Model variant

Noise Removal

Tool route: noise-removal
Model: SCUNet (tiered denoising pipeline)

Parameter	Type	Default	Description
`tier`	`"quick"` \| `"balanced"` \| `"quality"` \| `"maximum"`	`"balanced"`	Processing tier
`strength`	number (0-100)	`50`	Denoising strength
`detailPreservation`	number (0-100)	`50`	How much detail to preserve; higher keeps more texture
`colorNoise`	number (0-100)	`30`	Color noise reduction strength
`format`	string	`"original"`	Output format: `original`, `png`, `jpeg`, `webp`, `avif`, `jxl`
`quality`	number (1-100)	`90`	Output encoding quality

Red Eye Removal

Tool route: red-eye-removal

Detects face landmarks, locates eye regions, and corrects red-channel oversaturation.

Parameter	Type	Default	Description
`sensitivity`	number (0-100)	`50`	Red pixel detection threshold
`strength`	number (0-100)	`70`	Correction strength
`format`	string	-	Output format override (optional)
`quality`	number (1-100)	`90`	Output quality

Photo Restoration

Tool route: restore-photo

Multi-step pipeline for old or damaged photos: scratch/tear detection and repair, face enhancement, denoising, and optional colorization.

Parameter	Type	Default	Description
`scratchRemoval`	boolean	`true`	Detect and repair scratches, tears
`faceEnhancement`	boolean	`true`	Apply face enhancement pass
`fidelity`	number (0-1)	`0.7`	Fidelity of face enhancement to the input (higher = more conservative)
`denoise`	boolean	`true`	Apply denoising pass
`denoiseStrength`	number (0-100)	`25`	Denoising strength
`colorize`	boolean	`false`	Colorize after restoration
`colorizeStrength`	number (0-100)	`85`	Colorization intensity

Passport Photo

Tool route: passport-photo
Models: MediaPipe face landmarks + BiRefNet background removal

Two-phase workflow: analyze (detect face + remove background) then generate (crop, resize, tile). Supports 37+ countries across 6 regions.

Phase 1: Analyze

POST /api/v1/tools/image/passport-photo/analyze

Accepts an image file (multipart). Returns face landmark data, a base64 preview, and image dimensions.

Phase 2: Generate

POST /api/v1/tools/image/passport-photo/generate

Accepts a JSON body with the Phase 1 results plus generation settings:

Parameter	Type	Default	Description
`jobId`	string	(required)	Job ID from Phase 1
`filename`	string	(required)	Original filename from Phase 1
`countryCode`	string	(required)	ISO country code (e.g., `US`, `GB`, `IN`)
`documentType`	string	`"passport"`	Document type
`bgColor`	string	`"#FFFFFF"`	Background color hex
`printLayout`	string	`"none"`	Print layout: `none`, `4x6`, `a4`, `letter`
`maxFileSizeKb`	number	`0`	Max file size in KB (0 = no limit)
`dpi`	number (72-1200)	`300`	Output DPI
`customWidthMm`	number	-	Custom width in mm (overrides country spec)
`customHeightMm`	number	-	Custom height in mm (overrides country spec)
`zoom`	number (0.5-3)	`1`	Zoom factor
`adjustX`	number	`0`	Horizontal position adjustment
`adjustY`	number	`0`	Vertical position adjustment
`landmarks`	object	(required)	Landmarks from Phase 1
`imageWidth`	number	(required)	Image width from Phase 1
`imageHeight`	number	(required)	Image height from Phase 1
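
One way to assemble the Phase 2 body from a Phase 1 result is sketched below. It assumes the analyze response exposes `jobId`, `filename`, `landmarks`, `imageWidth`, and `imageHeight` under exactly those keys, which this reference does not confirm; `build_generate_body` is a hypothetical helper.

```python
REQUIRED_FROM_PHASE_1 = ("jobId", "filename", "landmarks", "imageWidth", "imageHeight")

# Defaults copied from the parameter table above.
DEFAULTS = {
    "documentType": "passport",
    "bgColor": "#FFFFFF",
    "printLayout": "none",
    "maxFileSizeKb": 0,
    "dpi": 300,
    "zoom": 1,
    "adjustX": 0,
    "adjustY": 0,
}
OPTIONAL_WITHOUT_DEFAULT = {"customWidthMm", "customHeightMm"}


def build_generate_body(analysis: dict, country_code: str, **overrides) -> dict:
    """Build the JSON body for the generate phase (illustrative only)."""
    missing = [key for key in REQUIRED_FROM_PHASE_1 if key not in analysis]
    if missing:
        raise ValueError(f"analysis result is missing: {', '.join(missing)}")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_WITHOUT_DEFAULT
    if unknown:
        raise ValueError(f"unknown settings: {', '.join(sorted(unknown))}")
    # Copy only the fields Phase 2 needs; the preview image is not sent back.
    body = {key: analysis[key] for key in REQUIRED_FROM_PHASE_1}
    body["countryCode"] = country_code
    body.update(DEFAULTS)
    body.update(overrides)
    if not 72 <= body["dpi"] <= 1200:
        raise ValueError("dpi must be between 72 and 1200")
    if not 0.5 <= body["zoom"] <= 3:
        raise ValueError("zoom must be between 0.5 and 3")
    return body
```

The resulting dictionary can be serialized with `json.dumps` and posted to the generate endpoint.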

Object Erasing (Inpainting)

Tool route: erase-object
Model: LaMa via ONNX Runtime

The mask is sent as a second file part (field name `mask`), not as base64. White pixels in the mask indicate areas to erase. The `format` and `quality` settings are sent as top-level form fields.

Parameter	Type	Default	Description
`file`	file	(required)	Source image (multipart)
`mask`	file	(required)	Mask image (multipart, fieldname `mask`, white = erase)
`format`	string	`"auto"`	Output format: `auto`, `png`, `jpg`, `jpeg`, `webp`, `tiff`, `gif`, `avif`, `heic`, `heif`, `jxl`
`quality`	integer (1-100)	`95`	Output quality

GPU-accelerated when an NVIDIA GPU is available.

AI Canvas Expand

Tool route: ai-canvas-expand
Model: LaMa-based outpainting

Expands the canvas of an image in any direction and fills new areas with AI-generated content that matches the existing image.

Parameter	Type	Default	Description
`extendTop`	integer	`0`	Pixels to extend at the top
`extendRight`	integer	`0`	Pixels to extend at the right
`extendBottom`	integer	`0`	Pixels to extend at the bottom
`extendLeft`	integer	`0`	Pixels to extend at the left
`tier`	`"fast"` \| `"balanced"` \| `"high"`	`"balanced"`	Quality tier
`format`	string	`"auto"`	Output format: `auto`, `png`, `jpg`, `jpeg`, `webp`, `tiff`, `gif`, `avif`, `heic`, `heif`, `jxl`
`quality`	integer (1-100)	`95`	Output quality

At least one extend direction must be greater than 0.
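
The rule above, together with the resulting canvas size, can be checked before a request is sent. The helper name `expanded_size` and the rejection of negative values are assumptions made for this sketch.

```python
def expanded_size(width: int, height: int, settings: dict) -> tuple:
    """Validate the extend settings and return the expanded (width, height)."""
    top = settings.get("extendTop", 0)
    right = settings.get("extendRight", 0)
    bottom = settings.get("extendBottom", 0)
    left = settings.get("extendLeft", 0)
    # Assumption: negative extents are invalid (the table only gives defaults of 0).
    if min(top, right, bottom, left) < 0:
        raise ValueError("extend values must not be negative")
    if top == right == bottom == left == 0:
        raise ValueError("at least one extend direction must be greater than 0")
    return width + left + right, height + top + bottom
```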

Smart Crop

Tool route: smart-crop
Model: MediaPipe face detection (face mode only)

Parameter	Type	Default	Description
`mode`	string	`"subject"`	Crop strategy: `subject`, `face`, `trim`
`strategy`	`"attention"` \| `"entropy"`	`"attention"`	Strategy for subject mode
`width`	integer	-	Output width
`height`	integer	-	Output height
`padding`	integer (0-50)	`0`	Padding percentage around subject
`facePreset`	string	`"head-shoulders"`	Preset framing when `mode=face`
`sensitivity`	number (0-1)	`0.5`	Face detection threshold
`threshold`	integer (0-255)	`30`	Background detection threshold (trim mode)
`padToSquare`	boolean	`false`	Pad trimmed result to a square
`padColor`	string	`"#ffffff"`	Background color for square padding
`targetSize`	integer	-	Target size for padded output (pixels)
`quality`	integer (1-100)	-	Output quality

Legacy `mode` values `attention` and `content` are accepted and mapped to `subject` and `trim`, respectively.
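
The legacy mapping amounts to a lookup before validation, as in this sketch. The name `normalize_mode` is hypothetical. Note that `attention` as a legacy mode value is distinct from `attention` as a value of the `strategy` parameter.

```python
LEGACY_MODES = {"attention": "subject", "content": "trim"}
VALID_MODES = {"subject", "face", "trim"}


def normalize_mode(mode: str = "subject") -> str:
    """Map legacy mode values to current ones and reject unknown modes."""
    mode = LEGACY_MODES.get(mode, mode)
    if mode not in VALID_MODES:
        raise ValueError(f"unknown crop mode: {mode!r}")
    return mode
```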

Face presets:

Preset	Best for
`closeup`	Headshots
`head-shoulders`	Profile photos
`upper-body`	LinkedIn / formal
`half-body`	Full upper body

Transcribe Audio

Tool route: transcribe-audio
Model: faster-whisper

Converts speech to text. Supports plain text, SRT, and VTT output formats.

Parameter	Type	Default	Description
`language`	string	`"auto"`	Language: `auto`, `en`, `de`, `fr`, `es`, `zh`, `ja`, `ko`, `id`, `th`, `vi`
`outputFormat`	`"txt"` \| `"srt"` \| `"vtt"`	`"txt"`	Output format

Auto Subtitles

Tool route: auto-subtitles
Model: faster-whisper (extracts audio from video, then transcribes)

Generates subtitle files from a video's audio track.

Parameter	Type	Default	Description
`language`	string	`"auto"`	Language: `auto`, `en`, `de`, `fr`, `es`, `zh`, `ja`, `ko`, `id`, `th`, `vi`
`format`	`"srt"` \| `"vtt"`	`"srt"`	Output subtitle format

PNG Transparency Fixer

Tool route: transparency-fixer
Model: BiRefNet HR-matting (2048x2048 resolution)

Fixes "fake transparent" PNGs where the background was removed but left behind fringing, halos, or semi-transparent artifacts. Uses BiRefNet's high-resolution matting model to produce a clean alpha channel, then applies configurable defringe processing to remove color contamination along edges.

OOM fallback chain: If BiRefNet HR-matting exceeds available memory, the tool automatically falls back to birefnet-general, then to u2net.
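
A fallback chain of this kind can be sketched as a loop over model names. Several details are assumptions: out-of-memory conditions are modelled as Python's `MemoryError` (GPU runtimes typically raise their own exception types), `birefnet-hr-matting` is a placeholder label for the HR-matting model, and `run_with_fallback` is a hypothetical helper rather than the tool's actual code.

```python
from typing import Callable, Sequence, Tuple

# "birefnet-hr-matting" is a placeholder label; the other two names appear above.
FALLBACK_CHAIN = ("birefnet-hr-matting", "birefnet-general", "u2net")


def run_with_fallback(
    run_model: Callable[[str], object],
    chain: Sequence[str] = FALLBACK_CHAIN,
) -> Tuple[str, object]:
    """Try each model in order and return (model_name, result) for the first success."""
    last_error = None
    for name in chain:
        try:
            return name, run_model(name)
        except MemoryError as error:
            # Out of memory: remember the error and try the next, lighter model.
            last_error = error
    raise MemoryError(f"all models ran out of memory: {', '.join(chain)}") from last_error
```

Errors other than `MemoryError` propagate immediately, so only memory exhaustion triggers a downgrade to a lighter model.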

Parameter	Type	Default	Description
`defringe`	number (0-100)	`30`	Edge defringe strength to remove color contamination
`outputFormat`	`"png"` \| `"webp"`	`"png"`	Output image format
`removeWatermark`	boolean	`false`	Apply watermark removal pre-processing (median filter)

```bash
curl -X POST http://localhost:1349/api/v1/tools/image/transparency-fixer \
  -H "Authorization: Bearer <token>" \
  -F "file=@fake-transparent.png" \
  -F 'settings={"defringe":30,"outputFormat":"png"}'
```

Tools with Optional AI Capabilities

The following tools are not Python sidecar tools but use AI features when certain options are enabled.

Image Enhancement

Tool route: image-enhancement
Engine: Analysis-based (Sharp histogram and statistics)

Analyzes the image and applies automatic corrections for exposure, contrast, white balance, saturation, sharpness, and noise. Supports scene-specific modes.

Parameter	Type	Default	Description
`mode`	`"auto"` \| `"portrait"` \| `"landscape"` \| `"low-light"` \| `"food"` \| `"document"`	`"auto"`	Scene mode for tuning corrections
`intensity`	number (0-100)	`50`	Overall correction strength
`corrections.exposure`	boolean	`true`	Apply exposure correction
`corrections.contrast`	boolean	`true`	Apply contrast correction
`corrections.whiteBalance`	boolean	`true`	Apply white balance correction
`corrections.saturation`	boolean	`true`	Apply saturation correction
`corrections.sharpness`	boolean	`true`	Apply sharpness correction
`corrections.denoise`	boolean	`true`	Apply denoising
`deepEnhance`	boolean	`false`	Enable AI noise removal via SCUNet (requires `upscale-enhance` bundle)

An additional analysis endpoint, `POST /api/v1/tools/image/image-enhancement/analyze`, returns the detected corrections without applying them.

Content-Aware Resize (Seam Carving)

Tool route: content-aware-resize
Engine: Go caire binary (not Python; no GPU benefit)

Intelligently resizes images by removing low-energy seams, preserving important content.

Parameter	Type	Default	Description
`width`	number	-	Target width
`height`	number	-	Target height
`protectFaces`	boolean	`false`	Protect detected face regions (requires `face-detection` bundle)
`blurRadius`	number (0-20)	`4`	Pre-blur for energy calculation
`sobelThreshold`	number (1-20)	`2`	Edge sensitivity threshold
`square`	boolean	`false`	Force square output
