🚀 Quick Start — 30-Second Setup
- Click Select Window Source and choose your VN window.
- Drag a rectangle around the dialogue text area in the mirrored preview.
- OCR runs instantly — text appears below and is copied to clipboard.
Note: The RE-CAPTURE button has a 300ms cooldown to prevent accidental double-clicks.
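For the curious, a cooldown like this can be sketched in a few lines. This is only an illustration, not the app's actual code: `withCooldown` and its injectable `now` clock are hypothetical names, and only the 300ms figure comes from the note above.

```typescript
// Hypothetical helper: wraps an action so that calls arriving inside the
// cooldown window are ignored. Returns true when the action actually ran.
function withCooldown<T extends unknown[]>(
  action: (...args: T) => void,
  cooldownMs: number,
  now: () => number = () => Date.now(), // injectable clock, handy for testing
): (...args: T) => boolean {
  let lastRun = Number.NEGATIVE_INFINITY;
  return (...args: T): boolean => {
    const t = now();
    if (t - lastRun < cooldownMs) {
      return false; // still cooling down: drop this click
    }
    lastRun = t;
    action(...args);
    return true;
  };
}
```

Wrapping a click handler as `withCooldown(handler, 300)` would make a second click within 300ms a no-op.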
Essential Controls:
- 🔄 RE-CAPTURE: Manually re-scan the same region anytime.
- AUTO Button: Enable hands-free continuous reading (next to RE-CAPTURE).
- Debug Thumbnail (Dev only): Advanced diagnostic view showing the exact capture sent to OCR — useful for engineering troubleshooting.
- Scaling & Engine: Adjust in the header for best results on small or complex fonts.
- TTS & History: Hear lines spoken aloud (🔊) or browse the sidebar for past logs.
The OCR Engine reads the text. The Image Processing Mode prepares the image for the engine. You can mix and match them depending on the media source.
🧩 Recommended Extensions
Important: This application does not provide built-in translations. For the best reading experience, install browser extensions such as Google Translate, Yomitan, or your preferred text‑to‑speech tool.
These extensions can instantly translate or read aloud the extracted text, making the OCR output much easier to follow.
🛡️ Browser Compatibility
For the definitive Gold v3.8 experience, we recommend:
- Chrome / Edge 113+: Full WebGPU and multi-threading support.
- Firefox: Stable, but may run in ⚠️ Compatibility Mode.
- Incognito Mode: Not recommended (may block model caching).
If you see ⚠️ instead of 🔥, your hardware or browser is restricting performance.
⚡ Performance & Diagnostics
Check the Performance Icon next to the status pill in the header to see your hardware status:
- 🔥 High-Performance: WebGPU acceleration and Multi-threading are active. OCR will be lightning-fast.
- ⚠️ Compatibility Mode: The browser is running in a restricted sandbox. Performance may be slower as it falls back to CPU-only processing.
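Tip: To unlock 🔥 mode, make sure your server sends both cross-origin isolation headers: Cross-Origin-Opener-Policy (COOP) and Cross-Origin-Embedder-Policy (COEP). Browsers require them before enabling the shared memory that multi-threading depends on.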
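If you host the app yourself, the sketch below shows the header values commonly used for cross-origin isolation. It assumes the `require-corp` variant of COEP. The helper names `withIsolationHeaders` and `missingIsolationHeaders` are made up for this example and are not part of the app.

```typescript
// Header values that opt a page into cross-origin isolation
// (assumption: the strict "require-corp" variant of COEP).
const ISOLATION_HEADERS: Readonly<Record<string, string>> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Hypothetical helper: merges the isolation headers into the headers a
// server already sends for the page.
function withIsolationHeaders(existing: Record<string, string>): Record<string, string> {
  return { ...existing, ...ISOLATION_HEADERS };
}

// Hypothetical helper: lists which isolation headers are absent or carry a
// different value. Header names are compared case-insensitively.
function missingIsolationHeaders(responseHeaders: Record<string, string>): string[] {
  const seen = new Map<string, string>();
  for (const name of Object.keys(responseHeaders)) {
    seen.set(name.toLowerCase(), responseHeaders[name].trim().toLowerCase());
  }
  return Object.keys(ISOLATION_HEADERS).filter(
    (name) => seen.get(name.toLowerCase()) !== ISOLATION_HEADERS[name],
  );
}
```

In the browser console, `self.crossOriginIsolated` reports whether the page actually ended up isolated.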
System Integrity
| Model Assets (Hybrid) | INIT… |
🔮 Image Processing Modes
These modes prepare the image for the OCR engine. They are NOT the OCR models themselves:
- Default Mini: Balanced speed and accuracy. Smart thresholding with light denoise. Great for clean, modern media.
- Default Full: Maximum fidelity. High-contrast normalization and strong sharpening. Best for classic or low-res captures.
- Adaptive: Adaptive thresholding. Computes the black/white cutoff per local tile instead of once for the whole image, so gradients and semi-transparent text boxes binarize cleanly.
- Multi-Pass: The Analyst. Runs 5 preprocessing passes and uses "Consensus Voting" to select the most Japanese-like result. Includes a real-time diagnostic overlay with an Engine Indicator to show exactly what is being analyzed.
- Last Resort: The Nuclear Option. Performs advanced textbox isolation and stroke reconstruction. Use when all else fails.
- Contrast: Hard black/white. Best for flat, clean backgrounds.
- Grayscale: Preserved tonal detail. Good for colorful or gradient backgrounds.
- Raw: No digital preprocessing. Direct capture.
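To make the difference between Grayscale, Contrast, and Adaptive more concrete, here is a rough sketch of the three ideas. It is not the app's implementation: the `GrayImage` type, the function names, the Rec. 601 luma weights, the cutoff of 128, the 16-pixel tile, and the bias of 5 are all assumptions chosen for illustration.

```typescript
// One byte per pixel, row-major (hypothetical type for this sketch).
interface GrayImage {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Grayscale: convert RGBA bytes (the canvas ImageData layout) to luma.
function toGrayscale(rgba: Uint8ClampedArray, width: number, height: number): GrayImage {
  const data = new Uint8ClampedArray(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    data[i] = 0.299 * r + 0.587 * g + 0.114 * b; // rounded by the typed array
  }
  return { width, height, data };
}

// Contrast: one global cutoff, every pixel becomes pure black or pure white.
function hardThreshold(img: GrayImage, cutoff = 128): GrayImage {
  return { ...img, data: img.data.map((v) => (v >= cutoff ? 255 : 0)) };
}

// Adaptive: each tile is thresholded against its own mean brightness, so a
// dark corner and a bright corner of the same text box are judged separately.
// `bias` keeps flat regions white instead of flipping on tiny variations.
function adaptiveThreshold(img: GrayImage, tile = 16, bias = 5): GrayImage {
  const { width, height, data } = img;
  const out = new Uint8ClampedArray(width * height);
  for (let ty = 0; ty < height; ty += tile) {
    for (let tx = 0; tx < width; tx += tile) {
      const yEnd = Math.min(ty + tile, height);
      const xEnd = Math.min(tx + tile, width);
      let sum = 0;
      for (let y = ty; y < yEnd; y++) {
        for (let x = tx; x < xEnd; x++) sum += data[y * width + x];
      }
      const mean = sum / ((yEnd - ty) * (xEnd - tx));
      for (let y = ty; y < yEnd; y++) {
        for (let x = tx; x < xEnd; x++) {
          out[y * width + x] = data[y * width + x] > mean - bias ? 255 : 0;
        }
      }
    }
  }
  return { width, height, data: out };
}
```

On a strip that fades from dark to bright, the global cutoff loses everything on the dark side, while the per-tile version still separates lighter pixels from darker ones inside each tile.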
🧹 VN‑Optimized Text Validator
Toggle the VN Text Cleaner in the Side Menu (≡) to enable an eight‑layer deterministic cleaning pipeline that removes UI artifacts, normalizes punctuation, and applies visual‑novel‑specific spacing rules. This post‑processing step runs after OCR and significantly improves readability for Japanese visual‑novel text.
Off by default; toggle On for cleaner output.
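The general shape of a layered, deterministic cleaner can be sketched as a list of small text transforms applied in a fixed order. The three layers below are invented examples and are not the app's eight layers; `CleanLayer`, `cleanVnText`, and the character ranges are assumptions made for this illustration.

```typescript
// A layer is a pure function from text to text (hypothetical type).
type CleanLayer = (text: string) => string;

// Hiragana, katakana, and common kanji, written as a character-class body.
const JP = "\\u3040-\\u30FF\\u4E00-\\u9FFF";

const exampleLayers: CleanLayer[] = [
  // Example layer: normalize runs of dots to a single ellipsis character.
  (t) => t.replace(/\.{3,}|・{3,}/g, "…"),
  // Example layer: remove spaces that OCR inserted between Japanese characters.
  (t) => t.replace(new RegExp(`([${JP}])[ \\t]+(?=[${JP}])`, "g"), "$1"),
  // Example layer: drop lines with no Japanese or alphanumeric characters,
  // which are likely UI artifacts such as "next page" arrows.
  (t) =>
    t
      .split("\n")
      .filter((line) => new RegExp(`[${JP}A-Za-z0-9]`).test(line))
      .join("\n"),
];

// Runs every layer in order. The same input always gives the same output.
function cleanVnText(raw: string, pipeline: CleanLayer[] = exampleLayers): string {
  return pipeline.reduce((text, layer) => layer(text), raw).trim();
}
```

Because each layer is independent, a pipeline like this is easy to reorder or extend one rule at a time.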
🔍 Scaling & OCR Engine
Adjust the Scaling slider to upscale the capture before OCR:
- 1x – 2x: Standard 1080p content.
- 3x – 4x: Small or highly packed dialogue boxes.
Choose your OCR Engine:
- Tesseract: Classic engine. Optimized for low-latency standard text.
- PaddleOCR: High-precision neural recognizer. Powered by WebGPU Acceleration and Zero-Allocation memory pooling.
- MangaOCR (Japanese): Premium Transformers model. Optimized with WebGPU Shader Pre-Warming to eliminate stutter.
When PaddleOCR or MangaOCR is selected, the Image Processing Mode dropdown is intentionally bypassed: these engines use their own high-fidelity pipelines optimized for neural network recognition. Switch back to Tesseract to use preprocessing modes again.
👁️ Autonomous Eye (Auto-Capture)
Use the AUTO button next to RE-CAPTURE to enable hands-free OCR.
- AUTO: ON — Indicator turns green
- AUTO: OFF — Indicator turns gray
- Clicking toggles automation instantly
- Mirrors the same setting found in the side menu
How it works:
- Scans the selected region every 500ms for pixel-level changes.
- Waits 800ms after stabilization to avoid mid-transition frames.
- Works best with a tight capture region around the dialogue text.
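The scan-then-wait behavior described above can be pictured as a small state machine. The sketch below is an approximation under assumptions, not the app's code: `ChangeWatcher`, `sameFrame`, and the byte-for-byte comparison are hypothetical, while the 800ms settle delay comes from the list above.

```typescript
const SETTLE_DELAY_MS = 800; // wait this long after the last change

// Hypothetical watcher: feed it one frame per poll (for example every 500ms).
class ChangeWatcher {
  private lastFrame: Uint8ClampedArray | null = null;
  private lastChangeAt = 0;
  private pending = false;

  // Returns true once the region has changed and then stayed identical
  // for at least SETTLE_DELAY_MS, which is the moment to run OCR.
  tick(frame: Uint8ClampedArray, nowMs: number): boolean {
    if (this.lastFrame === null || !sameFrame(this.lastFrame, frame)) {
      this.lastFrame = frame.slice(); // remember a copy of the new content
      this.lastChangeAt = nowMs;
      this.pending = true;
      return false; // still changing, possibly mid-transition
    }
    if (this.pending && nowMs - this.lastChangeAt >= SETTLE_DELAY_MS) {
      this.pending = false; // fire once per settled change
      return true;
    }
    return false;
  }
}

// Pixel-level comparison: any differing byte counts as a change.
function sameFrame(a: Uint8ClampedArray, b: Uint8ClampedArray): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}
```

A tight capture region helps here because fewer pixels means fewer unrelated animations resetting the settle timer.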
🔊 Text-to-Speech
- Select a Japanese voice from the TTS Voice dropdown in the header.
- Click 🔊 next to the latest line to hear it spoken aloud.
- Every history entry also has its own 🔊 button for replaying past lines.
- Select 🔇 TTS Off to disable speech entirely.
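If no Japanese voice shows up in the dropdown, it usually means the browser or operating system has none installed. As a rough illustration of how a page can pick Japanese voices from the browser's list, here is a sketch: `VoiceLike` and `japaneseVoices` are hypothetical names, and the filter only looks at the language tag.

```typescript
// Minimal shape shared with the browser's SpeechSynthesisVoice objects
// (only the fields this sketch needs).
interface VoiceLike {
  name: string;
  lang: string; // language tag such as "ja-JP" (some platforms report "ja_JP")
  default: boolean;
}

// Keeps voices whose language tag starts with "ja", default voice first,
// then alphabetical by name.
function japaneseVoices<T extends VoiceLike>(voices: readonly T[]): T[] {
  return voices
    .filter((v) => /^ja([-_]|$)/i.test(v.lang))
    .sort((x, y) => Number(y.default) - Number(x.default) || x.name.localeCompare(y.name));
}

// In a browser, usage might look like this (not executed here):
//   const voice = japaneseVoices(speechSynthesis.getVoices())[0];
//   const utterance = new SpeechSynthesisUtterance("こんにちは");
//   utterance.voice = voice;
//   speechSynthesis.speak(utterance);
```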
📋 High-Speed Clipping (Auto-Copy)
Extract and copy specific text segments from the results tray with zero friction:
- Enable: Turn on Auto-copy selected text in the Side Menu (≡).
- Clip: Simply highlight any text in the transcription area using your mouse.
- Confirm: An accent outline will flash to confirm the text is safely on your clipboard.
💡 Pro Tips
- Deterministic Reset: Use Reset to Defaults in the Side Menu (≡) to clear all cache and restore the v3.1 Gold factory state.
- UI Customization: Use the Side Menu (≡) to adjust the Text Area Size and Text Size to match your device.
- Manga Focus: MangaOCR is strictly optimized for comic panels. For standard Visual Novel UI text, PaddleOCR or Tesseract will almost always yield better results.
- History: The sidebar logs up to 100 lines, restored on next session. Use the 📋 icon next to any entry to copy the full line.
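A capped, persistent history like the one in the last tip can be sketched as follows. The 100-line limit comes from the tip. The storage key, the `HistoryLog` class, and the choice to drop the oldest lines first are assumptions made for this example.

```typescript
const MAX_HISTORY = 100;
const STORAGE_KEY = "ocr-history"; // hypothetical key name

// Subset of the Web Storage API, so localStorage or a test double fits.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

class HistoryLog {
  private readonly storage: StorageLike;
  private lines: string[];

  constructor(storage: StorageLike) {
    this.storage = storage;
    this.lines = HistoryLog.load(storage); // restore the previous session
  }

  // Reads saved lines, tolerating missing or corrupted data.
  private static load(storage: StorageLike): string[] {
    try {
      const parsed: unknown = JSON.parse(storage.getItem(STORAGE_KEY) ?? "[]");
      if (!Array.isArray(parsed)) return [];
      return parsed.filter((x): x is string => typeof x === "string").slice(-MAX_HISTORY);
    } catch {
      return [];
    }
  }

  // Appends a line, drops the oldest entries beyond the cap, and saves.
  add(line: string): void {
    this.lines.push(line);
    if (this.lines.length > MAX_HISTORY) {
      this.lines = this.lines.slice(-MAX_HISTORY);
    }
    this.storage.setItem(STORAGE_KEY, JSON.stringify(this.lines));
  }

  entries(): readonly string[] {
    return this.lines;
  }
}
```

In a browser, `new HistoryLog(localStorage)` would give a log that survives a page reload.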