Merged testing.

2025-08-22 00:31:32 -04:00
parent a044bfaade 1fed5922c3
commit ad6de50f9b
8 changed files with 728 additions and 2 deletions
@@ -23,5 +23,5 @@
 # Fork of Orca Screen Reader (GNOME)
 # Original source: https://gitlab.gnome.org/GNOME/orca

-version = "2025.08.19"
+version = "2025.08.22"
 codeName = "master"
@@ -0,0 +1,210 @@
+# OCR Plugin for Cthulhu Screen Reader
+
+An OCR (Optical Character Recognition) plugin that lets Cthulhu users extract text from visual content, including windows, the desktop, and clipboard images. It is based on the ocrdesktop project by Chrys and announces results through Cthulhu's speech system.
+
+## Features
+
+- **Window OCR**: Extract text from the currently active window
+- **Desktop OCR**: Extract text from the entire desktop screen
+- **Clipboard OCR**: Extract text from images copied to the clipboard
+- **Voice Announcements**: Clear audio feedback about OCR operations
+- **Busy Guard**: Ignores a new OCR request while a previous one is still being processed
+- **Text Cleanup**: Automatic post-processing to improve OCR text quality
+
+## Keybindings
+
+| Key Combination | Action | Description |
+|----------------|--------|-------------|
+| `Cthulhu+Control+W` | OCR Active Window | Performs OCR on the currently focused window |
+| `Cthulhu+Control+D` | OCR Desktop | Performs OCR on the entire desktop screen |
+| `Cthulhu+Control+Shift+C` | OCR Clipboard | Performs OCR on image data from clipboard |
+
+## Dependencies
+
+### Required Dependencies
+- **python3-pillow** (PIL) - Image processing library
+- **python-pytesseract** - Python wrapper for Tesseract OCR
+- **tesseract** - OCR engine (with language packs)
+- **GTK3/GDK/Wnck** - For screenshot capture (usually pre-installed)
+
+### Installation Commands
+
+#### Arch Linux
+```bash
+sudo pacman -S python-pillow python-pytesseract tesseract tesseract-data-eng
+```
+
+#### Ubuntu/Debian
+```bash
+sudo apt install python3-pil python3-pytesseract tesseract-ocr tesseract-ocr-eng
+```
+
+#### Fedora
+```bash
+sudo dnf install python3-pillow python3-pytesseract tesseract tesseract-langpack-eng
+```
+
+### Additional Language Support
+To add support for other languages, install additional Tesseract language packs:
+
+```bash
+# Examples for different distributions:
+# Arch: sudo pacman -S tesseract-data-fra tesseract-data-deu tesseract-data-spa
+# Ubuntu: sudo apt install tesseract-ocr-fra tesseract-ocr-deu tesseract-ocr-spa
+# Fedora: sudo dnf install tesseract-langpack-fra tesseract-langpack-deu tesseract-langpack-spa
+```
+
+## Usage
+
+1. **Enable the Plugin**: The OCR plugin is enabled by default in Cthulhu. If disabled, you can enable it through:
+   - Cthulhu Preferences → Plugins → Check "OCR"
+   - Or ensure `'OCR'` is in the `activePlugins` list in settings.py
+
+2. **Basic OCR Workflow**:
+   - Navigate to content you want to OCR
+   - Press the appropriate key combination
+   - Listen for "Performing OCR on [window/desktop/clipboard]"
+   - Wait for processing to complete
+   - OCR results will be announced via speech
+
+3. **Best Practices**:
+   - Ensure good contrast between text and background for better results
+   - Use window OCR for focused content (faster processing)
+   - Use desktop OCR for content spanning multiple windows
+   - Use clipboard OCR for images from web browsers or image viewers
+
+## Configuration
+
+### OCR Settings
+The plugin uses the following default settings (configurable in plugin.py):
+
+```python
+self._languageCode = 'eng'          # Tesseract language code
+self._scaleFactor = 3               # Image scaling for better OCR
+self._grayscaleImg = False          # Convert to grayscale
+self._invertImg = False             # Invert image colors
+self._blackWhiteImg = False         # Convert to black/white
+self._blackWhiteImgValue = 200      # B/W threshold value
+```
+
+### Changing OCR Language
+To change the default OCR language, modify `self._languageCode` in the plugin's `__init__` method:
+
+```python
+# Examples:
+self._languageCode = 'fra'  # French
+self._languageCode = 'deu'  # German
+self._languageCode = 'spa'  # Spanish
+```
+
+## Troubleshooting
+
+### Common Issues
+
+#### "No text found in OCR scan"
+- **Cause**: Poor image quality, unsupported language, or no text in captured area
+- **Solutions**:
+  - Try different OCR mode (window vs desktop)
+  - Ensure text has good contrast
+  - Check if correct language pack is installed
+  - Verify text is actually visible in the captured area
+
+#### "Missing dependencies" message
+- **Cause**: Required Python packages or Tesseract not installed
+- **Solution**: Install the missing packages using the commands under "Installation Commands" above
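To see which dependency triggers the message, a standalone check along these lines may help. It is only a sketch: the plugin itself detects missing modules with `try`/`except ImportError`, and the helper name `find_missing` is made up for this example. The module names `PIL`, `pytesseract`, and `gi` are the ones `plugin.py` imports.

```python
import importlib.util
import shutil


def find_missing(modules, executables):
    """Return the module and executable names that cannot be located."""
    # find_spec() returns None for a top-level module that is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which() returns None when the executable is not on PATH.
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing


# Modules imported by plugin.py, plus the Tesseract binary pytesseract calls.
print(find_missing(["PIL", "pytesseract", "gi"], ["tesseract"]))
```

An empty list means everything was found. Any names printed are the ones to install.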
+
+#### OCR taking too long
+- **Cause**: Large desktop screenshots or complex images
+- **Solutions**:
+  - Use window OCR instead of desktop OCR when possible
+  - Close unnecessary windows before desktop OCR
+  - Consider adjusting `_scaleFactor` (lower = faster)
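The effect of `_scaleFactor` can be estimated with simple arithmetic. Both dimensions are multiplied by the factor, so the number of pixels handed to Tesseract grows with its square. The 1920x1080 capture size below is an assumed example, and OCR time is only roughly related to pixel count.

```python
def scaled_pixel_count(width: int, height: int, scale_factor: int) -> int:
    """Pixels in the image after both dimensions are multiplied by scale_factor."""
    return (width * scale_factor) * (height * scale_factor)


# Assumed 1920x1080 desktop capture, compared at three scale factors.
for factor in (1, 2, 3):
    print(factor, scaled_pixel_count(1920, 1080, factor))
```

Going from the default factor of 3 down to 2 cuts the pixel count by more than half.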
+
+#### No speech output
+- **Cause**: Cthulhu speech settings or audio issues
+- **Solutions**:
+  - Check Cthulhu speech settings
+  - Test other Cthulhu speech functions
+  - Verify audio system is working
+
+### Debug Information
+OCR plugin debug messages are logged to Cthulhu's debug output. To enable debug logging:
+
+```bash
+cthulhu --debug > ocr_debug.log 2>&1
+```
+
+Look for messages starting with "OCRDesktop:" in the log file.
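A small filter can pull those messages out of a large log. This is a sketch only: the helper name and the timestamp prefix in the sample lines are assumptions, which is why it matches the marker anywhere in the line rather than only at the start.

```python
def ocr_debug_lines(log_text: str, marker: str = "OCRDesktop:"):
    """Return the log lines that contain the plugin's debug marker."""
    return [line for line in log_text.splitlines() if marker in line]


# Hypothetical sample; a real log would be read from ocr_debug.log instead.
sample_log = "\n".join([
    "12:00:01 - OCRDesktop: Plugin activated successfully",
    "12:00:02 - some unrelated message",
    "OCRDesktop: OCR completed",
])
matches = ocr_debug_lines(sample_log)
print("\n".join(matches))
```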
+
+## Technical Details
+
+### Architecture
+- **Base Class**: Extends `cthulhu.plugin.Plugin`
+- **Concurrency**: Each request runs in its keybinding handler; an `_is_processing` flag ignores further requests until it finishes
+- **Image Processing**: PIL/Pillow for image manipulation and enhancement
+- **OCR Engine**: Tesseract via pytesseract wrapper
+- **Integration**: Uses Cthulhu's speech system for output
+
+### Image Processing Pipeline
+1. **Capture**: Screenshot via GDK pixbuf system
+2. **Scale**: Enlarge image by scale factor (default 3x)
+3. **Transform**: Apply filters (grayscale, invert, etc.) if enabled
+4. **OCR**: Process with Tesseract OCR engine
+5. **Cleanup**: Remove extra whitespace and format text
+6. **Present**: Announce results via Cthulhu speech
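Steps 2 and 3 can be sketched without any imaging library by treating a grayscale image as a list of pixel rows. This is an illustration only: the plugin does this work with Pillow and uses bicubic resampling, while the sketch swaps in nearest-neighbour scaling to stay dependency-free. Capture, Tesseract, and speech are left out, and all function names are made up for the example.

```python
def scale(pixels, factor):
    """Nearest-neighbour upscaling: repeat every pixel and every row `factor` times."""
    return [[value for value in row for _ in range(factor)]
            for row in pixels for _ in range(factor)]


def invert(pixels):
    """Flip dark and light values."""
    return [[255 - value for value in row] for row in pixels]


def threshold(pixels, cutoff=200):
    """Map values above the cutoff to white (255) and the rest to black (0)."""
    return [[255 if value > cutoff else 0 for value in row] for row in pixels]


def preprocess(pixels, factor=3, do_invert=False, do_threshold=False, cutoff=200):
    """Scale first, then apply the optional filters, as in the pipeline above."""
    out = scale(pixels, factor)
    if do_invert:
        out = invert(out)
    if do_threshold:
        out = threshold(out, cutoff)
    return out


tiny = [[0, 210], [120, 255]]
result = preprocess(tiny, factor=2, do_threshold=True)
print(result)
```

The default cutoff of 200 mirrors `_blackWhiteImgValue`, so only fairly bright pixels survive as white.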
+
+### Text Post-Processing
+The plugin automatically cleans OCR output by:
+- Collapsing runs of consecutive spaces into a single space
+- Eliminating empty lines
+- Trimming whitespace before line breaks and at the start and end of the text
+- Removing trailing newlines
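The cleanup steps can be sketched as a standalone function. The regular expressions follow the ones in `_cleanOCRText()`, written as raw strings; the function name and sample text are made up for illustration.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Tidy raw OCR output using the steps listed above."""
    # Collapse runs of spaces or tabs (whitespace other than line breaks).
    text = re.sub(r"[^\S\r\n]{2,}", " ", text)
    # Drop empty or whitespace-only lines.
    text = re.sub(r"\n\s*\n", "\n", text)
    # Strip whitespace that sits directly before a line break.
    text = re.sub(r"\s*\n", "\n", text)
    # Trim the start and end of the text, including trailing newlines.
    return text.strip()


raw = "Hello    world  \n\n\nsecond   line \n\n"
cleaned = clean_ocr_text(raw)
print(repr(cleaned))
```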
+
+## Development
+
+### Plugin Structure
+```
+src/cthulhu/plugins/OCR/
+├── __init__.py          # Package import
+├── plugin.py            # Main plugin implementation
+├── plugin.info          # Plugin metadata
+├── meson.build          # Build system integration
+└── README.md            # This documentation
+```
+
+### Key Methods
+- `_ocrActiveWindow()`: Captures and OCRs active window
+- `_ocrDesktop()`: Captures and OCRs entire desktop
+- `_ocrClipboard()`: OCRs image from clipboard
+- `_performOCR()`: Core OCR processing logic
+- `_presentOCRResult()`: Announces results via speech
+
+### Extending the Plugin
+To add new OCR modes or features:
+
+1. Add new keybinding in `_registerKeybindings()`
+2. Create handler method following pattern `_ocrNewMode()`
+3. Implement image capture logic for new mode
+4. Use existing `_performOCR()` and `_presentOCRResult()` methods
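The handler pattern from steps 2 to 4 might look like the sketch below. `SketchPlugin` is a stand-in class so the example runs on its own; `_captureRegion()` is a hypothetical capture step, and the stubbed `_performOCR()` and `_presentOCRResult()` only imitate the real methods.

```python
class SketchPlugin:
    """Stand-in for the OCR plugin with just the pieces the handler needs."""

    def __init__(self):
        self._is_processing = False
        self._img = []
        self._OCRText = ""
        self.spoken = []  # collects messages instead of speaking them

    def _captureRegion(self):
        """Hypothetical capture step for the new mode; returns True on success."""
        self._img = ["fake image"]
        return True

    def _performOCR(self):
        """Stub: the real method runs Tesseract over self._img."""
        self._OCRText = "sample text"

    def _presentOCRResult(self):
        """Stub: the real method announces the text via speech."""
        self.spoken.append(self._OCRText)

    def _ocrNewMode(self, script=None, inputEvent=None):
        """Handler for the new mode, following the existing _ocr* handlers."""
        if self._is_processing:
            return True  # a request is already running; ignore this one
        self._is_processing = True
        try:
            if self._captureRegion():
                self._performOCR()
                self._presentOCRResult()
        finally:
            self._is_processing = False  # always release the busy flag
        return True


plugin = SketchPlugin()
plugin._ocrNewMode()
print(plugin.spoken)
```

In the real plugin the handler would be registered in `_registerKeybindings()` with `self.registerGestureByString(...)`, in the same way as the three existing bindings.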
+
+## Credits
+
+- **Original ocrdesktop**: Created by Chrys (chrys87@users.noreply.github.com)
+- **Cthulhu Integration**: Adapted by Storm Dragon for Cthulhu plugin system
+- **Cthulhu Screen Reader**: https://git.stormux.org/storm/cthulhu
+- **Tesseract OCR**: https://github.com/tesseract-ocr/tesseract
+
+## License
+
+This plugin is distributed under the GNU Lesser General Public License (LGPL) version 2.1 or later, consistent with the Cthulhu screen reader project.
+
+## Support
+
+For issues, questions, or contributions:
+- **Cthulhu Repository**: https://git.stormux.org/storm/cthulhu
+- **Community**: IRC #stormux on irc.stormux.org
+- **Email**: storm_dragon@stormux.org
+
+---
+
+*Part of the Cthulhu Screen Reader project - Making the desktop accessible for everyone.*
@@ -0,0 +1,23 @@
+#!/usr/bin/env python3
+#
+# Copyright (c) 2025 Stormux
+# Copyright (c) 2022 Chrys (original ocrdesktop)
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the
+# Free Software Foundation, Inc., Franklin Street, Fifth Floor,
+# Boston MA  02110-1301 USA.
+
+"""OCRDesktop plugin package."""
+
+from .plugin import OCRDesktop
@@ -0,0 +1,14 @@
+ocrdesktop_python_sources = files([
+  '__init__.py',
+  'plugin.py'
+])
+
+python3.install_sources(
+  ocrdesktop_python_sources,
+  subdir: 'cthulhu/plugins/OCRDesktop'
+)
+
+install_data(
+  'plugin.info',
+  install_dir: python3.get_install_dir() / 'cthulhu' / 'plugins' / 'OCRDesktop'
+)
@@ -0,0 +1,8 @@
+name = OCR Desktop
+version = 4.0.0
+description = OCR accessibility tool for reading inaccessible windows and dialogs using Tesseract OCR
+authors = Storm Dragon <storm_dragon@stormux.org>
+website = https://github.com/chrys87/ocrdesktop
+copyright = Copyright 2022 Chrys, Copyright 2025 Stormux
+builtin = false
+hidden = false
@@ -0,0 +1,470 @@
+#!/usr/bin/env python3
+#
+# Copyright (c) 2025 Stormux
+# Copyright (c) 2022 Chrys (original ocrdesktop)
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+"""OCRDesktop plugin for Cthulhu screen reader."""
+
+import logging
+import os
+import sys
+import locale
+import time
+import re
+import tempfile
+import threading
+from mimetypes import MimeTypes
+
+from cthulhu.plugin import Plugin, cthulhu_hookimpl
+from cthulhu import debug
+
+# Note: Removed complex beep system - simple announcements work perfectly!
+
+# PIL
+try:
+    from PIL import Image
+    from PIL import ImageOps
+    PIL_AVAILABLE = True
+except ImportError:
+    PIL_AVAILABLE = False
+
+# pytesseract
+try:
+    import pytesseract
+    from pytesseract import Output
+    PYTESSERACT_AVAILABLE = True
+except ImportError:
+    PYTESSERACT_AVAILABLE = False
+
+# pdf2image
+try:
+    from pdf2image import convert_from_path
+    PDF2IMAGE_AVAILABLE = True
+except ImportError:
+    PDF2IMAGE_AVAILABLE = False
+
+# scipy
+try:
+    from scipy.spatial import KDTree
+    SCIPY_AVAILABLE = True
+except ImportError:
+    SCIPY_AVAILABLE = False
+
+# webcolors
+try:
+    from webcolors import CSS3_HEX_TO_NAMES
+    from webcolors import hex_to_rgb
+    WEBCOLORS_AVAILABLE = True
+except ImportError:
+    WEBCOLORS_AVAILABLE = False
+
+# GTK/GDK/Wnck
+try:
+    import gi
+    gi.require_version("Gtk", "3.0")
+    gi.require_version("Gdk", "3.0")
+    gi.require_version("Wnck", "3.0")
+    from gi.repository import Gtk, Gdk, Wnck
+    GTK_AVAILABLE = True
+except ImportError:
+    GTK_AVAILABLE = False
+
+logger = logging.getLogger(__name__)
+
+class OCRDesktop(Plugin):
+    """OCR Desktop accessibility plugin for reading inaccessible windows."""
+    
+    def __init__(self, *args, **kwargs):
+        """Initialize the plugin."""
+        super().__init__(*args, **kwargs)
+        debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Plugin initialized", True)
+        
+        # Keybinding storage
+        self._kb_binding_window = None
+        self._kb_binding_desktop = None
+        self._kb_binding_clipboard = None
+        
+        # OCR settings
+        self._languageCode = 'eng'
+        self._scaleFactor = 3
+        self._grayscaleImg = False
+        self._invertImg = False
+        self._blackWhiteImg = False
+        self._blackWhiteImgValue = 200
+        self._colorCalculation = False
+        self._colorCalculationMax = 3
+        
+        # Internal state
+        self._img = []
+        self._modifiedImg = []
+        self._OCRText = ''
+        self._offsetXpos = 0
+        self._offsetYpos = 0
+        self._activated = False
+        
+        # Progress feedback
+        self._is_processing = False
+        
+        # Color analysis
+        self._kdtDB = None
+        self.colorNames = []
+        self.colorCache = {}
+        
+        # Set locale for tesseract
+        locale.setlocale(locale.LC_ALL, 'C')
+        
+        # Check dependencies
+        self._checkDependencies()
+    
+    def _checkDependencies(self):
+        """Check if required dependencies are available."""
+        missing_deps = []
+        
+        if not PIL_AVAILABLE:
+            missing_deps.append("python3-pillow")
+        if not PYTESSERACT_AVAILABLE:
+            missing_deps.append("python-pytesseract")
+        if not GTK_AVAILABLE:
+            missing_deps.append("GTK3/GDK/Wnck")
+            
+        if missing_deps:
+            debug.printMessage(debug.LEVEL_INFO, 
+                f"OCRDesktop: Missing dependencies: {', '.join(missing_deps)}", True)
+            return False
+        return True
+    
+    @cthulhu_hookimpl
+    def activate(self, plugin=None):
+        """Activate the plugin."""
+        if plugin is not None and plugin is not self:
+            return
+            
+        if self._activated:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Already activated", True)
+            return
+            
+        try:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Plugin activation starting", True)
+            
+            if not self.app:
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: ERROR - No app reference", True)
+                return
+                
+            if not self._checkDependencies():
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Cannot activate - missing dependencies", True)
+                return
+            
+            # Register keybindings
+            self._registerKeybindings()
+            
+            self._activated = True
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Plugin activated successfully", True)
+            
+        except Exception as e:
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error activating: {e}", True)
+            import traceback
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: {traceback.format_exc()}", True)
+    
+    @cthulhu_hookimpl
+    def deactivate(self, plugin=None):
+        """Deactivate the plugin."""
+        if plugin is not None and plugin is not self:
+            return
+            
+        self._activated = False
+        debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Plugin deactivated", True)
+    
+    def _registerKeybindings(self):
+        """Register plugin keybindings."""
+        try:
+            # OCR active window
+            self._kb_binding_window = self.registerGestureByString(
+                self._ocrActiveWindow,
+                "OCR read active window",
+                'kb:cthulhu+control+w'
+            )
+            
+            # OCR entire desktop
+            self._kb_binding_desktop = self.registerGestureByString(
+                self._ocrDesktop,
+                "OCR read entire desktop",
+                'kb:cthulhu+control+d'
+            )
+            
+            # OCR from clipboard
+            self._kb_binding_clipboard = self.registerGestureByString(
+                self._ocrClipboard,
+                "OCR read image from clipboard",
+                'kb:cthulhu+control+shift+c'
+            )
+            
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Keybindings registered", True)
+            
+        except Exception as e:
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error registering keybindings: {e}", True)
+    
+    
+    def _announceOCRStart(self, ocr_type):
+        """Announce the start of OCR operation."""
+        try:
+            message = f"Performing OCR on {ocr_type}"
+            if self.app:
+                state = self.app.getDynamicApiManager().getAPI('CthulhuState')
+                if state and state.activeScript:
+                    state.activeScript.presentMessage(message, resetStyles=False)
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: {message}", True)
+        except Exception as e:
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error announcing OCR start: {e}", True)
+    
+    def _ocrActiveWindow(self, script=None, inputEvent=None):
+        """OCR the active window."""
+        try:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: OCR active window requested", True)
+            
+            if self._is_processing:
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Already processing, ignoring request", True)
+                return True
+                
+            self._is_processing = True
+            self._announceOCRStart("window")
+            
+            try:
+                if self._screenShotWindow():
+                    self._performOCR()
+                    self._presentOCRResult()
+            finally:
+                self._is_processing = False
+                
+            return True
+        except Exception as e:
+            self._is_processing = False
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error in OCR window: {e}", True)
+            return False
+    
+    def _ocrDesktop(self, script=None, inputEvent=None):
+        """OCR the entire desktop."""
+        try:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: OCR desktop requested", True)
+            
+            if self._is_processing:
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Already processing, ignoring request", True)
+                return True
+                
+            self._is_processing = True
+            self._announceOCRStart("desktop")
+            
+            try:
+                if self._screenShotDesktop():
+                    self._performOCR()
+                    self._presentOCRResult()
+            finally:
+                self._is_processing = False
+                
+            return True
+        except Exception as e:
+            self._is_processing = False
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error in OCR desktop: {e}", True)
+            return False
+    
+    def _ocrClipboard(self, script=None, inputEvent=None):
+        """OCR image from clipboard."""
+        try:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: OCR clipboard requested", True)
+            
+            if self._is_processing:
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Already processing, ignoring request", True)
+                return True
+                
+            self._is_processing = True
+            self._announceOCRStart("clipboard")
+            
+            try:
+                if self._readClipboard():
+                    self._performOCR()
+                    self._presentOCRResult()
+            finally:
+                self._is_processing = False
+                
+            return True
+        except Exception as e:
+            self._is_processing = False
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error in OCR clipboard: {e}", True)
+            return False
+    
+    def _screenShotWindow(self):
+        """Take screenshot of active window."""
+        if not GTK_AVAILABLE:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: GTK not available for screenshots", True)
+            return False
+            
+        try:
+            time.sleep(0.3)  # Brief delay
+            gdkCurrDesktop = Gdk.get_default_root_window()
+            
+            currWnckScreen = Wnck.Screen.get_default()
+            currWnckScreen.force_update()
+            currWnckWindow = currWnckScreen.get_active_window()
+            
+            if not currWnckWindow:
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: No active window found", True)
+                return False
+                
+            self._offsetXpos, self._offsetYpos, wnckWidth, wnckHeight = currWnckWindow.get_geometry()
+            pixBuff = Gdk.pixbuf_get_from_window(gdkCurrDesktop, self._offsetXpos, self._offsetYpos, wnckWidth, wnckHeight)
+            
+            if pixBuff:
+                self._img = [self._pixbuf2image(pixBuff)]
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Window screenshot captured", True)
+                return True
+            else:
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Failed to capture window screenshot", True)
+                return False
+                
+        except Exception as e:
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error taking window screenshot: {e}", True)
+            return False
+    
+    def _screenShotDesktop(self):
+        """Take screenshot of entire desktop."""
+        if not GTK_AVAILABLE:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: GTK not available for screenshots", True)
+            return False
+            
+        try:
+            time.sleep(0.3)  # Brief delay
+            currDesktop = Gdk.get_default_root_window()
+            pixBuff = Gdk.pixbuf_get_from_window(currDesktop, 0, 0, currDesktop.get_width(), currDesktop.get_height())
+            
+            if pixBuff:
+                self._img = [self._pixbuf2image(pixBuff)]
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Desktop screenshot captured", True)
+                return True
+            else:
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Failed to capture desktop screenshot", True)
+                return False
+                
+        except Exception as e:
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error taking desktop screenshot: {e}", True)
+            return False
+    
+    def _readClipboard(self):
+        """Read image from clipboard."""
+        if not GTK_AVAILABLE:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: GTK not available for clipboard", True)
+            return False
+            
+        try:
+            clipboardObj = Gtk.Clipboard.get(Gdk.SELECTION_CLIPBOARD)
+            pixBuff = clipboardObj.wait_for_image()
+            
+            if pixBuff:
+                self._img = [self._pixbuf2image(pixBuff)]
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Image read from clipboard", True)
+                return True
+            else:
+                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: No image found in clipboard", True)
+                return False
+                
+        except Exception as e:
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error reading clipboard: {e}", True)
+            return False
+    
+    def _pixbuf2image(self, pix):
+        """Convert GdkPixbuf to PIL Image."""
+        data = pix.get_pixels()
+        w = pix.props.width
+        h = pix.props.height
+        stride = pix.props.rowstride
+        mode = "RGB"
+        if pix.props.has_alpha:
+            mode = "RGBA"
+        im = Image.frombytes(mode, (w, h), data, "raw", mode, stride)
+        return im
+    
+    def _scaleImg(self, img):
+        """Scale image for better OCR results."""
+        width_screen, height_screen = img.size
+        width_screen = width_screen * self._scaleFactor
+        height_screen = height_screen * self._scaleFactor
+        scaledImg = img.resize((width_screen, height_screen), Image.Resampling.BICUBIC)
+        return scaledImg
+    
+    def _transformImg(self, img):
+        """Transform image with various filters for better OCR."""
+        modifiedImg = self._scaleImg(img)
+        
+        if self._invertImg:
+            modifiedImg = ImageOps.invert(modifiedImg)
+        if self._grayscaleImg:
+            modifiedImg = ImageOps.grayscale(modifiedImg)
+        if self._blackWhiteImg:
+            lut = [255 if v > self._blackWhiteImgValue else 0 for v in range(256)]
+            modifiedImg = modifiedImg.point(lut)
+            
+        return modifiedImg
+    
+    def _performOCR(self):
+        """Perform OCR on captured images."""
+        if not PYTESSERACT_AVAILABLE:
+            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: pytesseract not available", True)
+            return
+            
+        debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Starting OCR", True)
+        self._OCRText = ''
+        
+        for img in self._img:
+            modifiedImg = self._transformImg(img)
+            try:
+                # Simple text extraction
+                text = pytesseract.image_to_string(modifiedImg, lang=self._languageCode, config='--psm 4')
+                self._OCRText += text + '\n'
+            except Exception as e:
+                debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: OCR error: {e}", True)
+        
+        # Clean up text
+        self._cleanOCRText()
+        debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: OCR completed", True)
+    
+    def _cleanOCRText(self):
+        """Clean up OCR text output."""
+        # Collapse runs of spaces/tabs; raw strings avoid invalid-escape warnings
+        regexSpace = re.compile(r'[^\S\r\n]{2,}')
+        self._OCRText = regexSpace.sub(' ', self._OCRText)
+        
+        # Remove empty lines
+        regexSpace = re.compile(r'\n\s*\n')
+        self._OCRText = regexSpace.sub('\n', self._OCRText)
+        
+        # Remove whitespace before line breaks
+        regexSpace = re.compile(r'\s*\n')
+        self._OCRText = regexSpace.sub('\n', self._OCRText)
+        
+        # Remove leading whitespace at the start of the text
+        regexSpace = re.compile(r'^\s+')
+        self._OCRText = regexSpace.sub('', self._OCRText)
+        
+        # Remove trailing newlines
+        self._OCRText = self._OCRText.strip()
+    
+    def _presentOCRResult(self):
+        """Present OCR result to user via speech."""
+        try:
+            if not self._OCRText.strip():
+                message = "No text found in OCR scan"
+            else:
+                message = f"OCR result: {self._OCRText}"
+            
+            if self.app:
+                state = self.app.getDynamicApiManager().getAPI('CthulhuState')
+                if state and state.activeScript:
+                    state.activeScript.presentMessage(message, resetStyles=False)
+                    
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Presented result: {len(self._OCRText)} characters", True)
+            
+        except Exception as e:
+            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error presenting result: {e}", True)
@@ -5,6 +5,7 @@ subdir('Clipboard')
 subdir('DisplayVersion')
 subdir('HelloCthulhu')
 subdir('IndentationAudio')
+subdir('OCR')
 subdir('PluginManager')
 subdir('SimplePluginSystem')
 subdir('hello_world')
@@ -431,7 +431,7 @@ presentChatRoomLast = False
 presentLiveRegionFromInactiveTab = False

 # Plugins
-activePlugins = ['AIAssistant', 'DisplayVersion', 'PluginManager', 'HelloCthulhu', 'ByeCthulhu']
+activePlugins = ['AIAssistant', 'DisplayVersion', 'OCR', 'PluginManager', 'HelloCthulhu', 'ByeCthulhu']

 # AI Assistant settings (disabled by default for opt-in behavior)
 aiAssistantEnabled = True