Merged testing.

Storm Dragon
2025-08-22 00:31:32 -04:00
8 changed files with 728 additions and 2 deletions

@@ -23,5 +23,5 @@
# Fork of Orca Screen Reader (GNOME)
# Original source: https://gitlab.gnome.org/GNOME/orca
-version = "2025.08.19"
+version = "2025.08.22"
codeName = "master"

@@ -0,0 +1,210 @@
# OCR Plugin for Cthulhu Screen Reader
A powerful OCR (Optical Character Recognition) plugin that enables Cthulhu users to extract text from visual content including windows, desktop areas, and clipboard images. Originally based on the ocrdesktop project by Chrys, this plugin integrates seamlessly with Cthulhu's accessibility framework.
## Features
- **Window OCR**: Extract text from the currently active window
- **Desktop OCR**: Extract text from the entire desktop screen
- **Clipboard OCR**: Extract text from images copied to the clipboard
- **Voice Announcements**: Clear audio feedback about OCR operations
- **Multi-threading**: Non-blocking OCR processing with progress tracking
- **Text Cleanup**: Automatic post-processing to improve OCR text quality
## Keybindings
| Key Combination | Action | Description |
|----------------|--------|-------------|
| `Cthulhu+Control+W` | OCR Active Window | Performs OCR on the currently focused window |
| `Cthulhu+Control+D` | OCR Desktop | Performs OCR on the entire desktop screen |
| `Cthulhu+Control+Shift+C` | OCR Clipboard | Performs OCR on image data from clipboard |
## Dependencies
### Required Dependencies
- **python3-pillow** (PIL) - Image processing library
- **python-pytesseract** - Python wrapper for Tesseract OCR
- **tesseract** - OCR engine (with language packs)
- **GTK3/GDK/Wnck** - For screenshot capture (usually pre-installed)
### Installation Commands
#### Arch Linux
```bash
sudo pacman -S python-pillow python-pytesseract tesseract tesseract-data-eng
```
#### Ubuntu/Debian
```bash
sudo apt install python3-pil python3-pytesseract tesseract-ocr tesseract-ocr-eng
```
#### Fedora
```bash
sudo dnf install python3-pillow python3-pytesseract tesseract tesseract-langpack-eng
```
### Additional Language Support
To add support for other languages, install additional Tesseract language packs:
```bash
# Examples for different distributions:
# Arch: sudo pacman -S tesseract-data-fra tesseract-data-deu tesseract-data-spa
# Ubuntu: sudo apt install tesseract-ocr-fra tesseract-ocr-deu tesseract-ocr-spa
# Fedora: sudo dnf install tesseract-langpack-fra tesseract-langpack-deu tesseract-langpack-spa
```
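
To confirm which language packs Tesseract can actually see, `tesseract --list-langs` prints a header line followed by one language code per line. The following is a minimal sketch that parses that output. `parse_list_langs` and `installed_languages` are hypothetical helpers, not part of the plugin, and the sample text is illustrative rather than captured from a real system.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Return the language codes found in `tesseract --list-langs` output."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the "List of available languages ..." header,
        # and anything containing spaces (likely a warning, not a code).
        if not line or line.lower().startswith("list of") or " " in line:
            continue
        codes.append(line)
    return codes


def installed_languages():
    """Ask the tesseract binary for its languages; [] if it is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Combine both streams in case the list is printed on stderr.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


# Illustrative sample input (an assumption, not real captured output).
sample = "List of available languages (3):\neng\nfra\nosd\n"
print(parse_list_langs(sample))
```

If the code you plan to assign to `self._languageCode` is missing from the returned list, the matching language pack is probably not installed.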
## Usage
1. **Enable the Plugin**: The OCR plugin is enabled by default in Cthulhu. If disabled, you can enable it through:
   - Cthulhu Preferences → Plugins → Check "OCR"
   - Or ensure `'OCR'` is in the `activePlugins` list in settings.py
2. **Basic OCR Workflow**:
   - Navigate to content you want to OCR
   - Press the appropriate key combination
   - Listen for "Performing OCR on [window/desktop/clipboard]"
   - Wait for processing to complete
   - OCR results will be announced via speech
3. **Best Practices**:
   - Ensure good contrast between text and background for better results
   - Use window OCR for focused content (faster processing)
   - Use desktop OCR for content spanning multiple windows
   - Use clipboard OCR for images from web browsers or image viewers
## Configuration
### OCR Settings
The plugin uses the following default settings (configurable in plugin.py):
```python
self._languageCode = 'eng' # Tesseract language code
self._scaleFactor = 3 # Image scaling for better OCR
self._grayscaleImg = False # Convert to grayscale
self._invertImg = False # Invert image colors
self._blackWhiteImg = False # Convert to black/white
self._blackWhiteImgValue = 200 # B/W threshold value
```
### Changing OCR Language
To change the default OCR language, modify `self._languageCode` in the plugin's `__init__` method:
```python
# Examples:
self._languageCode = 'fra' # French
self._languageCode = 'deu' # German
self._languageCode = 'spa' # Spanish
```
## Troubleshooting
### Common Issues
#### "No text found in OCR scan"
- **Cause**: Poor image quality, unsupported language, or no text in captured area
- **Solutions**:
  - Try a different OCR mode (window vs. desktop)
  - Ensure text has good contrast
  - Check that the correct language pack is installed
  - Verify text is actually visible in the captured area
#### "Missing dependencies" message
- **Cause**: Required Python packages or Tesseract not installed
- **Solution**: Install missing packages using commands above
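
To see which pieces are missing before starting Cthulhu, a quick check can be run from a Python shell. This is a sketch only: `missing_dependencies` is a hypothetical helper, and the default module names (`PIL`, `pytesseract`, `gi`) and the `tesseract` binary name are assumptions based on the dependency list above.

```python
import importlib.util
import shutil


def missing_dependencies(modules=("PIL", "pytesseract", "gi"),
                         binaries=("tesseract",)):
    """Return names of Python modules and executables that cannot be found."""
    missing = []
    for name in modules:
        # find_spec() returns None when a top-level module is not importable.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for name in binaries:
        # shutil.which() returns None when the executable is not on PATH.
        if shutil.which(name) is None:
            missing.append(name)
    return missing


print(missing_dependencies())
```

An empty list suggests the packages are present. It does not check that a Tesseract language pack is installed.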
#### OCR taking too long
- **Cause**: Large desktop screenshots or complex images
- **Solutions**:
  - Use window OCR instead of desktop OCR when possible
  - Close unnecessary windows before desktop OCR
  - Consider adjusting `_scaleFactor` (lower = faster)
#### No speech output
- **Cause**: Cthulhu speech settings or audio issues
- **Solutions**:
  - Check Cthulhu speech settings
  - Test other Cthulhu speech functions
  - Verify audio system is working
### Debug Information
OCR plugin debug messages are logged to Cthulhu's debug output. To enable debug logging:
```bash
cthulhu --debug > ocr_debug.log 2>&1
```
Look for messages starting with "OCRDesktop:" in the log file.
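
A small sketch for pulling the plugin's lines out of a captured log. `ocr_lines` is a hypothetical helper. It matches the prefix anywhere in the line, on the assumption that the debug output may prepend timestamps or other text. The sample messages are taken from `plugin.py`.

```python
def ocr_lines(lines, prefix="OCRDesktop:"):
    """Yield the lines that contain the plugin's message prefix."""
    for line in lines:
        if prefix in line:
            yield line.rstrip("\n")


# Sample input; a real run would iterate over an open log file instead.
sample = [
    "OCRDesktop: Plugin activated successfully\n",
    "some unrelated line\n",
    "OCRDesktop: OCR completed\n",
]
for line in ocr_lines(sample):
    print(line)
```

To scan the file produced by the command above, pass an open file object, for example `ocr_lines(open("ocr_debug.log", errors="replace"))`.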
## Technical Details
### Architecture
- **Base Class**: Extends `cthulhu.plugin.Plugin`
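- **Threading**: `threading` is imported, but the handlers in this commit run capture and OCR synchronously in the calling thread; a `_is_processing` flag ignores overlapping requests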
- **Image Processing**: PIL/Pillow for image manipulation and enhancement
- **OCR Engine**: Tesseract via pytesseract wrapper
- **Integration**: Uses Cthulhu's speech system for output
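
The handlers in `plugin.py` appear to run capture and OCR in the calling thread. The following is only a sketch of how non-blocking processing with a busy flag could look, not the plugin's implementation. `BackgroundOCR`, `work` and `on_done` are hypothetical names.

```python
import threading


class BackgroundOCR:
    """Run one job at a time on a worker thread (hypothetical helper)."""

    def __init__(self, work, on_done):
        self._work = work          # callable returning the recognized text
        self._on_done = on_done    # callable receiving that text
        self._lock = threading.Lock()
        self._busy = False

    def start(self):
        """Start a job and return its thread, or None if one is running."""
        with self._lock:
            if self._busy:
                return None
            self._busy = True
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()
        return thread

    def _run(self):
        try:
            self._on_done(self._work())
        finally:
            # Always clear the flag, even if the work raised.
            with self._lock:
                self._busy = False


results = []
job = BackgroundOCR(lambda: "hello", results.append)
worker = job.start()
worker.join()
print(results)  # ['hello']
```

In a GTK process the callback would likely need to be handed back to the main loop (for example with `GLib.idle_add`) before touching speech output. That wiring is omitted here.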
### Image Processing Pipeline
1. **Capture**: Screenshot via GDK pixbuf system
2. **Scale**: Enlarge image by scale factor (default 3x)
3. **Transform**: Apply filters (grayscale, invert, etc.) if enabled
4. **OCR**: Process with Tesseract OCR engine
5. **Cleanup**: Remove extra whitespace and format text
6. **Present**: Announce results via Cthulhu speech
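
Steps 2 and 3 can be illustrated without Pillow. The sketch below computes the scaled size and builds the black/white lookup table of the kind passed to `Image.point()`, which expects 256 entries per band. `scaled_size` and `threshold_lut` are hypothetical names, and the defaults mirror the settings listed under Configuration.

```python
def scaled_size(size, factor=3):
    """Return the (width, height) after enlarging by an integer factor."""
    width, height = size
    return (width * factor, height * factor)


def threshold_lut(threshold=200, bands=1):
    """Map values above `threshold` to 255 and all others to 0.

    The 256-entry table is repeated once per image band.
    """
    table = [255 if value > threshold else 0 for value in range(256)]
    return table * bands


print(scaled_size((640, 480)))      # (1920, 1440)
lut = threshold_lut()
print(lut[200], lut[201])           # 0 255
print(len(threshold_lut(bands=3)))  # 768
```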
### Text Post-Processing
The plugin automatically cleans OCR output by:
- Removing multiple consecutive spaces
- Eliminating empty lines
- Trimming leading/trailing whitespace
- Removing trailing newlines
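
The cleanup rules above can be sketched as a standalone function. This is a line-by-line variant written for illustration, so it trims every line and may differ slightly from the plugin's regex-based `_cleanOCRText()`. `clean_ocr_text` is a hypothetical name.

```python
import re

# Two or more whitespace characters that are not line breaks.
_SPACE_RUNS = re.compile(r"[^\S\r\n]{2,}")


def clean_ocr_text(text):
    """Collapse space runs, drop empty lines, and trim each line."""
    lines = []
    for line in text.splitlines():
        line = _SPACE_RUNS.sub(" ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


raw = "  Hello    world  \n\n\n  second   line \n"
print(repr(clean_ocr_text(raw)))  # 'Hello world\nsecond line'
```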
## Development
### Plugin Structure
```
src/cthulhu/plugins/OCR/
├── __init__.py # Package import
├── plugin.py # Main plugin implementation
├── plugin.info # Plugin metadata
├── meson.build # Build system integration
└── README.md # This documentation
```
### Key Methods
- `_ocrActiveWindow()`: Captures and OCRs active window
- `_ocrDesktop()`: Captures and OCRs entire desktop
- `_ocrClipboard()`: OCRs image from clipboard
- `_performOCR()`: Core OCR processing logic
- `_presentOCRResult()`: Announces results via speech
### Extending the Plugin
To add new OCR modes or features:
1. Add new keybinding in `_registerKeybindings()`
2. Create handler method following pattern `_ocrNewMode()`
3. Implement image capture logic for new mode
4. Use existing `_performOCR()` and `_presentOCRResult()` methods
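
The handler pattern from these steps can be sketched with a stand-in class. `OCRModeSketch`, `_captureRegion` and the `"region"` label are hypothetical. The stubs replace the real capture, OCR and speech methods so the example runs without Cthulhu installed.

```python
class OCRModeSketch:
    """Stand-in for the plugin class; only the pieces the pattern needs."""

    def __init__(self):
        self._is_processing = False
        self.log = []

    # Stubs standing in for the real announce/capture/OCR/speech methods.
    def _announceOCRStart(self, ocr_type):
        self.log.append(f"Performing OCR on {ocr_type}")

    def _captureRegion(self):  # hypothetical capture step for the new mode
        return True

    def _performOCR(self):
        self.log.append("ocr")

    def _presentOCRResult(self):
        self.log.append("present")

    def _ocrNewMode(self, script=None, inputEvent=None):
        """Handler with the same guard/try/finally shape as existing modes."""
        if self._is_processing:
            return True  # ignore requests while a scan is running
        self._is_processing = True
        self._announceOCRStart("region")
        try:
            if self._captureRegion():
                self._performOCR()
                self._presentOCRResult()
        finally:
            self._is_processing = False
        return True


demo = OCRModeSketch()
demo._ocrNewMode()
print(demo.log)  # ['Performing OCR on region', 'ocr', 'present']
```

In the real plugin the handler would then be registered in `_registerKeybindings()` through `self.registerGestureByString(...)`, using a key string of your choosing.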
## Credits
- **Original ocrdesktop**: Created by Chrys (chrys87@users.noreply.github.com)
- **Cthulhu Integration**: Adapted by Storm Dragon for Cthulhu plugin system
- **Cthulhu Screen Reader**: https://git.stormux.org/storm/cthulhu
- **Tesseract OCR**: https://github.com/tesseract-ocr/tesseract
## License
This plugin is distributed under the GNU Lesser General Public License (LGPL) version 2.1 or later, consistent with the Cthulhu screen reader project.
## Support
For issues, questions, or contributions:
- **Cthulhu Repository**: https://git.stormux.org/storm/cthulhu
- **Community**: IRC #stormux on irc.stormux.org
- **Email**: storm_dragon@stormux.org
---
*Part of the Cthulhu Screen Reader project - Making the desktop accessible for everyone.*

@@ -0,0 +1,23 @@
#!/usr/bin/env python3
#
# Copyright (c) 2025 Stormux
# Copyright (c) 2022 Chrys (original ocrdesktop)
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the
# Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
# Boston, MA 02110-1301 USA.
"""OCRDesktop plugin package."""
from .plugin import OCRDesktop

@@ -0,0 +1,14 @@
ocrdesktop_python_sources = files([
    '__init__.py',
    'plugin.py'
])

python3.install_sources(
    ocrdesktop_python_sources,
    subdir: 'cthulhu/plugins/OCRDesktop'
)

install_data(
    'plugin.info',
    install_dir: python3.get_install_dir() / 'cthulhu' / 'plugins' / 'OCRDesktop'
)

@@ -0,0 +1,8 @@
name = OCR Desktop
version = 4.0.0
description = OCR accessibility tool for reading inaccessible windows and dialogs using Tesseract OCR
authors = Storm Dragon <storm_dragon@stormux.org>
website = https://github.com/chrys87/ocrdesktop
copyright = Copyright 2022 Chrys, Copyright 2025 Stormux
builtin = false
hidden = false

@@ -0,0 +1,470 @@
#!/usr/bin/env python3
#
# Copyright (c) 2025 Stormux
# Copyright (c) 2022 Chrys (original ocrdesktop)
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
"""OCRDesktop plugin for Cthulhu screen reader."""
import logging
import os
import sys
import locale
import time
import re
import tempfile
import threading
from mimetypes import MimeTypes
from cthulhu.plugin import Plugin, cthulhu_hookimpl
from cthulhu import debug
# Note: Removed complex beep system - simple announcements work perfectly!

# PIL
try:
    from PIL import Image
    from PIL import ImageOps
    PIL_AVAILABLE = True
except ImportError:
    PIL_AVAILABLE = False

# pytesseract
try:
    import pytesseract
    from pytesseract import Output
    PYTESSERACT_AVAILABLE = True
except ImportError:
    PYTESSERACT_AVAILABLE = False

# pdf2image
try:
    from pdf2image import convert_from_path
    PDF2IMAGE_AVAILABLE = True
except ImportError:
    PDF2IMAGE_AVAILABLE = False

# scipy
try:
    from scipy.spatial import KDTree
    SCIPY_AVAILABLE = True
except ImportError:
    SCIPY_AVAILABLE = False

# webcolors
try:
    from webcolors import CSS3_HEX_TO_NAMES
    from webcolors import hex_to_rgb
    WEBCOLORS_AVAILABLE = True
except ImportError:
    WEBCOLORS_AVAILABLE = False

# GTK/GDK/Wnck
try:
    import gi
    gi.require_version("Gtk", "3.0")
    gi.require_version("Gdk", "3.0")
    gi.require_version("Wnck", "3.0")
    from gi.repository import Gtk, Gdk, Wnck
    GTK_AVAILABLE = True
except (ImportError, ValueError):
    # gi.require_version() raises ValueError when a namespace is unavailable.
    GTK_AVAILABLE = False

logger = logging.getLogger(__name__)


class OCRDesktop(Plugin):
    """OCR Desktop accessibility plugin for reading inaccessible windows."""

    def __init__(self, *args, **kwargs):
        """Initialize the plugin."""
        super().__init__(*args, **kwargs)
        debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Plugin initialized", True)

        # Keybinding storage
        self._kb_binding_window = None
        self._kb_binding_desktop = None
        self._kb_binding_clipboard = None

        # OCR settings
        self._languageCode = 'eng'
        self._scaleFactor = 3
        self._grayscaleImg = False
        self._invertImg = False
        self._blackWhiteImg = False
        self._blackWhiteImgValue = 200
        self._colorCalculation = False
        self._colorCalculationMax = 3

        # Internal state
        self._img = []
        self._modifiedImg = []
        self._OCRText = ''
        self._offsetXpos = 0
        self._offsetYpos = 0
        self._activated = False

        # Progress feedback
        self._is_processing = False

        # Color analysis
        self._kdtDB = None
        self.colorNames = []
        self.colorCache = {}

        # Set locale for tesseract
        locale.setlocale(locale.LC_ALL, 'C')

        # Check dependencies
        self._checkDependencies()

    def _checkDependencies(self):
        """Check if required dependencies are available."""
        missing_deps = []
        if not PIL_AVAILABLE:
            missing_deps.append("python3-pillow")
        if not PYTESSERACT_AVAILABLE:
            missing_deps.append("python-pytesseract")
        if not GTK_AVAILABLE:
            missing_deps.append("GTK3/GDK/Wnck")

        if missing_deps:
            debug.printMessage(debug.LEVEL_INFO,
                               f"OCRDesktop: Missing dependencies: {', '.join(missing_deps)}", True)
            return False
        return True

    @cthulhu_hookimpl
    def activate(self, plugin=None):
        """Activate the plugin."""
        if plugin is not None and plugin is not self:
            return

        if self._activated:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Already activated", True)
            return

        try:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Plugin activation starting", True)

            if not self.app:
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: ERROR - No app reference", True)
                return

            if not self._checkDependencies():
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Cannot activate - missing dependencies", True)
                return

            # Register keybindings
            self._registerKeybindings()

            self._activated = True
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Plugin activated successfully", True)
        except Exception as e:
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error activating: {e}", True)
            import traceback
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: {traceback.format_exc()}", True)

    @cthulhu_hookimpl
    def deactivate(self, plugin=None):
        """Deactivate the plugin."""
        if plugin is not None and plugin is not self:
            return

        self._activated = False
        debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Plugin deactivated", True)

    def _registerKeybindings(self):
        """Register plugin keybindings."""
        try:
            # OCR active window
            self._kb_binding_window = self.registerGestureByString(
                self._ocrActiveWindow,
                "OCR read active window",
                'kb:cthulhu+control+w'
            )

            # OCR entire desktop
            self._kb_binding_desktop = self.registerGestureByString(
                self._ocrDesktop,
                "OCR read entire desktop",
                'kb:cthulhu+control+d'
            )

            # OCR from clipboard
            self._kb_binding_clipboard = self.registerGestureByString(
                self._ocrClipboard,
                "OCR read image from clipboard",
                'kb:cthulhu+control+shift+c'
            )

            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Keybindings registered", True)
        except Exception as e:
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error registering keybindings: {e}", True)

    def _announceOCRStart(self, ocr_type):
        """Announce the start of OCR operation."""
        try:
            message = f"Performing OCR on {ocr_type}"
            if self.app:
                state = self.app.getDynamicApiManager().getAPI('CthulhuState')
                if state and state.activeScript:
                    state.activeScript.presentMessage(message, resetStyles=False)

            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: {message}", True)
        except Exception as e:
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error announcing OCR start: {e}", True)

    def _ocrActiveWindow(self, script=None, inputEvent=None):
        """OCR the active window."""
        try:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: OCR active window requested", True)

            if self._is_processing:
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Already processing, ignoring request", True)
                return True

            self._is_processing = True
            self._announceOCRStart("window")

            try:
                if self._screenShotWindow():
                    self._performOCR()
                    self._presentOCRResult()
            finally:
                self._is_processing = False

            return True
        except Exception as e:
            self._is_processing = False
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error in OCR window: {e}", True)
            return False

    def _ocrDesktop(self, script=None, inputEvent=None):
        """OCR the entire desktop."""
        try:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: OCR desktop requested", True)

            if self._is_processing:
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Already processing, ignoring request", True)
                return True

            self._is_processing = True
            self._announceOCRStart("desktop")

            try:
                if self._screenShotDesktop():
                    self._performOCR()
                    self._presentOCRResult()
            finally:
                self._is_processing = False

            return True
        except Exception as e:
            self._is_processing = False
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error in OCR desktop: {e}", True)
            return False

    def _ocrClipboard(self, script=None, inputEvent=None):
        """OCR image from clipboard."""
        try:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: OCR clipboard requested", True)

            if self._is_processing:
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Already processing, ignoring request", True)
                return True

            self._is_processing = True
            self._announceOCRStart("clipboard")

            try:
                if self._readClipboard():
                    self._performOCR()
                    self._presentOCRResult()
            finally:
                self._is_processing = False

            return True
        except Exception as e:
            self._is_processing = False
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error in OCR clipboard: {e}", True)
            return False

    def _screenShotWindow(self):
        """Take screenshot of active window."""
        if not GTK_AVAILABLE:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: GTK not available for screenshots", True)
            return False

        try:
            time.sleep(0.3)  # Brief delay
            gdkCurrDesktop = Gdk.get_default_root_window()
            currWnckScreen = Wnck.Screen.get_default()
            currWnckScreen.force_update()
            currWnckWindow = currWnckScreen.get_active_window()

            if not currWnckWindow:
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: No active window found", True)
                return False

            self._offsetXpos, self._offsetYpos, wnckWidth, wnckHeight = currWnckWindow.get_geometry()
            pixBuff = Gdk.pixbuf_get_from_window(gdkCurrDesktop, self._offsetXpos, self._offsetYpos, wnckWidth, wnckHeight)

            if pixBuff:
                self._img = [self._pixbuf2image(pixBuff)]
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Window screenshot captured", True)
                return True
            else:
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Failed to capture window screenshot", True)
                return False
        except Exception as e:
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error taking window screenshot: {e}", True)
            return False

    def _screenShotDesktop(self):
        """Take screenshot of entire desktop."""
        if not GTK_AVAILABLE:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: GTK not available for screenshots", True)
            return False

        try:
            time.sleep(0.3)  # Brief delay
            currDesktop = Gdk.get_default_root_window()
            pixBuff = Gdk.pixbuf_get_from_window(currDesktop, 0, 0, currDesktop.get_width(), currDesktop.get_height())

            if pixBuff:
                self._img = [self._pixbuf2image(pixBuff)]
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Desktop screenshot captured", True)
                return True
            else:
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Failed to capture desktop screenshot", True)
                return False
        except Exception as e:
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error taking desktop screenshot: {e}", True)
            return False

    def _readClipboard(self):
        """Read image from clipboard."""
        if not GTK_AVAILABLE:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: GTK not available for clipboard", True)
            return False

        try:
            clipboardObj = Gtk.Clipboard.get(Gdk.SELECTION_CLIPBOARD)
            pixBuff = clipboardObj.wait_for_image()

            if pixBuff:
                self._img = [self._pixbuf2image(pixBuff)]
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Image read from clipboard", True)
                return True
            else:
                debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: No image found in clipboard", True)
                return False
        except Exception as e:
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error reading clipboard: {e}", True)
            return False

    def _pixbuf2image(self, pix):
        """Convert GdkPixbuf to PIL Image."""
        data = pix.get_pixels()
        w = pix.props.width
        h = pix.props.height
        stride = pix.props.rowstride
        mode = "RGB"
        if pix.props.has_alpha:
            mode = "RGBA"
        im = Image.frombytes(mode, (w, h), data, "raw", mode, stride)
        return im

    def _scaleImg(self, img):
        """Scale image for better OCR results."""
        width_screen, height_screen = img.size
        width_screen = width_screen * self._scaleFactor
        height_screen = height_screen * self._scaleFactor
        scaledImg = img.resize((width_screen, height_screen), Image.Resampling.BICUBIC)
        return scaledImg

    def _transformImg(self, img):
        """Transform image with various filters for better OCR."""
        modifiedImg = self._scaleImg(img)

        if self._invertImg:
            modifiedImg = ImageOps.invert(modifiedImg)
        if self._grayscaleImg:
            modifiedImg = ImageOps.grayscale(modifiedImg)
        if self._blackWhiteImg:
            lut = [255 if v > self._blackWhiteImgValue else 0 for v in range(256)]
            # Image.point() expects 256 entries per band, so repeat the table
            # for multi-band modes such as RGB.
            modifiedImg = modifiedImg.point(lut * len(modifiedImg.getbands()))

        return modifiedImg

    def _performOCR(self):
        """Perform OCR on captured images."""
        if not PYTESSERACT_AVAILABLE:
            debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Tesseract not available", True)
            return

        debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: Starting OCR", True)
        self._OCRText = ''

        for img in self._img:
            modifiedImg = self._transformImg(img)
            try:
                # Simple text extraction
                text = pytesseract.image_to_string(modifiedImg, lang=self._languageCode, config='--psm 4')
                self._OCRText += text + '\n'
            except Exception as e:
                debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: OCR error: {e}", True)

        # Clean up text
        self._cleanOCRText()
        debug.printMessage(debug.LEVEL_INFO, "OCRDesktop: OCR completed", True)

    def _cleanOCRText(self):
        """Clean up OCR text output."""
        # Raw strings keep \S and \s from being parsed as invalid string escapes.
        # Remove multiple spaces
        regexSpace = re.compile(r'[^\S\r\n]{2,}')
        self._OCRText = regexSpace.sub(' ', self._OCRText)

        # Remove empty lines
        regexSpace = re.compile(r'\n\s*\n')
        self._OCRText = regexSpace.sub('\n', self._OCRText)

        # Remove trailing spaces
        regexSpace = re.compile(r'\s*\n')
        self._OCRText = regexSpace.sub('\n', self._OCRText)

        # Remove a leading whitespace character at the start of the text
        regexSpace = re.compile(r'^\s')
        self._OCRText = regexSpace.sub('', self._OCRText)

        # Remove remaining leading/trailing whitespace, including trailing newlines
        self._OCRText = self._OCRText.strip()

    def _presentOCRResult(self):
        """Present OCR result to user via speech."""
        try:
            if not self._OCRText.strip():
                message = "No text found in OCR scan"
            else:
                message = f"OCR result: {self._OCRText}"

            if self.app:
                state = self.app.getDynamicApiManager().getAPI('CthulhuState')
                if state and state.activeScript:
                    state.activeScript.presentMessage(message, resetStyles=False)
                    debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Presented result: {len(self._OCRText)} characters", True)
        except Exception as e:
            debug.printMessage(debug.LEVEL_INFO, f"OCRDesktop: Error presenting result: {e}", True)

@@ -5,6 +5,7 @@ subdir('Clipboard')
subdir('DisplayVersion')
subdir('HelloCthulhu')
subdir('IndentationAudio')
+subdir('OCR')
subdir('PluginManager')
subdir('SimplePluginSystem')
subdir('hello_world')

@@ -431,7 +431,7 @@ presentChatRoomLast = False
presentLiveRegionFromInactiveTab = False
# Plugins
-activePlugins = ['AIAssistant', 'DisplayVersion', 'PluginManager', 'HelloCthulhu', 'ByeCthulhu']
+activePlugins = ['AIAssistant', 'DisplayVersion', 'OCR', 'PluginManager', 'HelloCthulhu', 'ByeCthulhu']
# AI Assistant settings (disabled by default for opt-in behavior)
aiAssistantEnabled = True