AI capabilities added. Working 90 percent with ollama, more providers and functionality coming soon.
411
CLAUDE.md
@@ -335,10 +335,417 @@ subprojects/spiel.wrap # Subproject integration
3. **Plugin System**: How to maintain Cthulhu's plugin advantage while integrating Orca improvements?
4. **Version Strategy**: Selective feature backporting vs. major version sync?

## AI Assistant Integration

### **NEW FEATURE**: AI-Powered Accessibility Assistant
Cthulhu now includes an optional AI assistant plugin for enhanced accessibility support:

- **Vision Analysis**: Screenshots + AT-SPI data for understanding unlabeled UI elements
- **Safe Actions**: Confirmed element clicking and navigation assistance
- **Multi-Provider Support**: Claude, ChatGPT, Gemini, and Ollama backends
- **Privacy-First**: Disabled by default; requires explicit opt-in, plus an API key for the cloud providers

### AI Assistant Configuration
```bash
# Access via Cthulhu Preferences
~/.local/bin/cthulhu -s  # Opens preferences dialog
# Navigate to "AI Assistant" tab
# 1. Check "Enable AI Assistant"
# 2. Select provider (Claude, ChatGPT, Gemini, Ollama)
# 3. Set API key file path (not needed for Ollama)
# 4. Configure safety and quality settings
```

### AI Provider Setup

#### 1. Claude (Anthropic) - **Recommended**
```bash
# Get API key from: https://console.anthropic.com/
# 1. Sign up/login → "Get API Keys" → Create new key
# 2. Copy the key (starts with "sk-ant-...")
# 3. Save to file:
mkdir -p ~/.config/cthulhu
echo "sk-ant-your-actual-key-here" > ~/.config/cthulhu/claude-api-key
chmod 600 ~/.config/cthulhu/claude-api-key

# Pricing: ~$3 per million input tokens, ~$15 per million output tokens
# Best vision capabilities and safety for accessibility use
```

#### 2. ChatGPT (OpenAI)
```bash
# Get API key from: https://platform.openai.com/api-keys
# 1. Sign up/login → "Create new secret key"
# 2. Copy immediately (can't view again, starts with "sk-...")
# 3. Save to file:
mkdir -p ~/.config/cthulhu
echo "sk-your-actual-openai-key" > ~/.config/cthulhu/openai-api-key
chmod 600 ~/.config/cthulhu/openai-api-key

# Pricing: ~$2.50 per million input tokens, ~$10 per million output tokens
# Good vision capabilities, widely supported
```

#### 3. Gemini (Google)
```bash
# Get API key from: https://aistudio.google.com/app/apikey
# 1. Sign up/login → "Create API key"
# 2. Copy the generated key
# 3. Save to file:
mkdir -p ~/.config/cthulhu
echo "your-actual-gemini-key" > ~/.config/cthulhu/gemini-api-key
chmod 600 ~/.config/cthulhu/gemini-api-key

# Pricing: Free tier (15 requests/min), then ~$1.25 per million tokens
# Good for testing, has generous free allowance
```

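To get a feel for what the quoted prices mean per request, here is a rough sketch of the arithmetic. The prices are copied from the comments for the three cloud providers above; the request size (2,000 input tokens, 300 output tokens) is a made-up illustration, not a measurement of what the assistant actually sends. Gemini's figure is quoted as a single rate, so the sketch assumes it applies in both directions.

```python
# Approximate USD prices per million tokens, taken from the provider notes above.
# Assumption: Gemini's single quoted rate is applied to both input and output.
PRICES_PER_MILLION = {
    "claude": {"input": 3.00, "output": 15.00},
    "chatgpt": {"input": 2.50, "output": 10.00},
    "gemini": {"input": 1.25, "output": 1.25},
}


def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate cost in USD of a single request."""
    price = PRICES_PER_MILLION[provider]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size, for illustration only.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 2000, 300):.6f}")
```

Under these assumptions a single query costs about a cent or less on any of the three providers, so the free tier or a small prepaid balance should cover a lot of testing.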
#### 4. Ollama (Local) - **Privacy-Focused**
```bash
# Install Ollama (no API key needed!)
sudo pacman -S ollama  # Arch Linux
# OR: curl -fsSL https://ollama.ai/install.sh | sh

# Start service
systemctl --user enable ollama
systemctl --user start ollama

# Download vision-capable model (required for AI assistant)
ollama pull llama3.2-vision  # 7.9GB download
# OR smaller model: ollama pull moondream  # 1.7GB

# Verify installation
ollama list  # Should show downloaded models

# No API key needed - runs entirely offline once the model is downloaded!
# Free to use, privacy-focused, but slower than cloud providers
```

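The same reachability check can be done from Python, which is handy when debugging from inside a plugin. This is a minimal sketch using only the standard library. It assumes the default local endpoint `http://localhost:11434/api/version` and a JSON reply with a `version` field. The names `http_get` and `ollama_version` are illustrative and are not part of Cthulhu.

```python
import json
import urllib.request
from typing import Callable, Optional

OLLAMA_VERSION_URL = "http://localhost:11434/api/version"


def http_get(url: str, timeout: float = 2.0) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def ollama_version(fetch: Callable[[str], bytes] = http_get) -> Optional[str]:
    """Return the Ollama version string, or None if the server is unreachable
    or the reply is not the expected JSON object."""
    try:
        payload = json.loads(fetch(OLLAMA_VERSION_URL))
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers malformed JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("version")


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama {version}" if version else "Ollama is not reachable")
```

Passing the fetch function as a parameter keeps the check testable without a running server.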
### AI Assistant Usage Patterns
- **Information Queries**: "What does this unlabeled button do?"
- **Navigation Help**: "Where is the login form?"
- **Action Assistance**: "Click the submit button" (with confirmation)
- **Layout Understanding**: "Describe the main sections of this page"

### Safety Framework
- **Confirmation Required**: All actions require user approval by default
- **Action Descriptions**: Clear explanation of what will happen before an action runs
- **Safe Defaults**: Conservative timeouts and quality settings
- **Privacy Protection**: API keys kept in files readable only by the user (mode 600), no data logging

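The confirmation rule above amounts to a small gate in front of every action. The sketch below shows the idea in isolation. `run_with_confirmation` and its parameters are hypothetical names, not the AIAssistant plugin's actual API, and the confirmation callback stands in for whatever dialog or spoken prompt the plugin uses.

```python
from typing import Callable


def run_with_confirmation(
    description: str,
    action: Callable[[], object],
    confirm: Callable[[str], bool],
) -> bool:
    """Run `action` only if the user approves its description.

    Returns True if the action ran, False if the user declined.
    """
    if not confirm(description):
        return False
    action()
    return True


if __name__ == "__main__":
    def ask(description: str) -> bool:
        # Simple terminal prompt standing in for a real confirmation dialog.
        return input(f"{description} [y/N] ").strip().lower() == "y"

    ran = run_with_confirmation(
        "Click the 'Submit' button",
        lambda: print("(clicking)"),
        ask,
    )
    print("Action performed" if ran else "Action cancelled")
```

Keeping the description and the action side by side means the user is always told what will happen before anything runs.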
### Troubleshooting AI Assistant Setup

#### Common Issues
```bash
# Check if AI settings loaded correctly
~/.local/bin/cthulhu -s  # Open preferences, check AI Assistant tab

# Verify API key file permissions and format
ls -la ~/.config/cthulhu/*-api-key  # Should show -rw------- (mode 600)
cat ~/.config/cthulhu/claude-api-key  # Should contain only the API key

# Test Ollama connection
curl http://localhost:11434/api/version  # Should return Ollama version
ollama ps  # Should show running models

# Check dependencies
python3 -c "import requests, PIL; print('Dependencies OK')"

# Test screenshot capability (requires X11/Wayland)
python3 -c "
import gi
gi.require_version('Gdk', '3.0')
from gi.repository import Gdk
window = Gdk.get_default_root_window()
print('Screenshot capability available' if window else 'No root window found')
"
```

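The key-file checks above (`ls -la` and `cat`) can also be scripted without printing the secret to the terminal. The following is a sketch, assuming the file layout used in the provider setup sections. `check_key_file` is an illustrative helper, not something Cthulhu ships.

```python
import stat
from pathlib import Path
from typing import List


def check_key_file(path: Path) -> List[str]:
    """Return a list of problems found with an API key file (empty if OK)."""
    if not path.is_file():
        return [f"{path} does not exist"]

    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        # Any group/other permission bit means the file is not private.
        problems.append(f"{path} has mode {mode:o}; expected 600")

    # The file should hold exactly one whitespace-free token: the key itself.
    if len(path.read_text().split()) != 1:
        problems.append(f"{path} should contain only the API key")
    return problems


if __name__ == "__main__":
    config_dir = Path.home() / ".config" / "cthulhu"
    for key_file in sorted(config_dir.glob("*-api-key")):
        issues = check_key_file(key_file)
        print(f"{key_file.name}: {'OK' if not issues else '; '.join(issues)}")
```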
#### Required Permissions
- **File Access**: API key files in `~/.config/cthulhu/`
- **Screen Access**: Screenshot capture (automatic on most setups)
- **Network Access**: HTTP requests to AI providers (except Ollama)
- **AT-SPI Access**: Accessibility tree traversal (enabled by default)

## Cthulhu Plugin System - Developer Reference

### **Plugin Architecture Overview**

Cthulhu uses a **pluggy-based plugin system** with the following components:

1. **Plugin Manager**: `src/cthulhu/plugin_system_manager.py` - Central plugin loading/management
2. **Base Plugin Class**: `src/cthulhu/plugin.py` - Provides common functionality
3. **Hook System**: Uses `@cthulhu_hookimpl` decorators for lifecycle management
4. **Plugin Discovery**: Automatic scanning of `src/cthulhu/plugins/` and `~/.local/share/cthulhu/plugins/`

### **Plugin Directory Structure**

Every plugin must follow this exact structure:
```
src/cthulhu/plugins/YourPlugin/
├── __init__.py      # Import: from .plugin import YourPlugin
├── plugin.py        # Main plugin class
├── plugin.info      # Metadata (name, version, description)
└── Makefile.am      # Build system integration
```

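Because a missing file in this layout tends to fail silently, a quick pre-flight check can save time. This is a small sketch that only tests for the four files listed above. The helper name `missing_plugin_files` is illustrative and not part of Cthulhu's plugin manager.

```python
import sys
from pathlib import Path
from typing import List

# The four files every plugin directory is expected to contain (from the tree above).
REQUIRED_FILES = ("__init__.py", "plugin.py", "plugin.info", "Makefile.am")


def missing_plugin_files(plugin_dir: Path) -> List[str]:
    """Return the required file names that are absent from plugin_dir."""
    return [name for name in REQUIRED_FILES if not (plugin_dir / name).is_file()]


if __name__ == "__main__":
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    missing = missing_plugin_files(target)
    if missing:
        print(f"{target}: missing {', '.join(missing)}")
    else:
        print(f"{target}: layout OK")
```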
### **Essential Plugin Files**

#### **`__init__.py`** - Package Import
```python
from .plugin import YourPlugin
```

#### **`plugin.info`** - Metadata
```ini
name = Your Plugin Name
version = 1.0.0
description = What your plugin does
authors = Your Name <email@example.com>
website = https://example.com
copyright = Copyright 2025
builtin = false
hidden = false
```

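The `plugin.info` format shown above is a flat list of `key = value` pairs with no section header. As a way to sanity-check a metadata file, here is a minimal parser sketch. It is not the loader Cthulhu itself uses, and the handling of blank lines and `#` comments is an assumption.

```python
from typing import Dict


def parse_plugin_info(text: str) -> Dict[str, str]:
    """Parse flat `key = value` lines into a dictionary of strings."""
    info: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once so URLs with '=' survive
        info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    sample = """\
name = Your Plugin Name
version = 1.0.0
builtin = false
hidden = false
"""
    info = parse_plugin_info(sample)
    print(info["name"], info["version"])
    # Values stay strings; convert flags explicitly where needed.
    print("builtin:", info["builtin"].lower() == "true")
```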
#### **`Makefile.am`** - Build Integration
```makefile
cthulhu_python_PYTHON = \
	__init__.py \
	plugin.info \
	plugin.py

cthulhu_pythondir=$(pkgpythondir)/plugins/YourPlugin
```

### **Plugin Class Template**

```python
#!/usr/bin/env python3
import logging
from cthulhu.plugin import Plugin, cthulhu_hookimpl

logger = logging.getLogger(__name__)

class YourPlugin(Plugin):
    """Your plugin description."""

    def __init__(self, *args, **kwargs):
        """Initialize the plugin."""
        super().__init__(*args, **kwargs)
        logger.info("YourPlugin initialized")

        # Keybinding storage - use individual variables, NOT dictionaries
        self._kb_binding = None

    @cthulhu_hookimpl
    def activate(self, plugin=None):
        """Activate the plugin."""
        if plugin is not None and plugin is not self:
            return

        try:
            logger.info("=== YourPlugin activation starting ===")

            # Register keybindings
            self._register_keybinding()

            logger.info("YourPlugin activated successfully")
            return True

        except Exception as e:
            logger.error(f"Error activating YourPlugin: {e}")
            return False

    @cthulhu_hookimpl
    def deactivate(self, plugin=None):
        """Deactivate the plugin."""
        if plugin is not None and plugin is not self:
            return

        logger.info("Deactivating YourPlugin")
        self._kb_binding = None
        return True

    def _register_keybinding(self):
        """Register plugin keybindings."""
        try:
            # Keep the gesture in a variable so it can be logged below
            gesture_string = 'kb:cthulhu+your+keys'

            # CRITICAL: Use this exact parameter order!
            self._kb_binding = self.registerGestureByString(
                self._your_handler_method,  # Handler method (first)
                "Description of action",    # Description (second)
                gesture_string              # Gesture string (third)
            )

            if self._kb_binding:
                logger.info(f"Registered keybinding: {gesture_string}")
            else:
                logger.error("Failed to register keybinding")

        except Exception as e:
            logger.error(f"Error registering keybinding: {e}")

    def _your_handler_method(self, script=None, inputEvent=None):
        """Handle the keybinding activation."""
        try:
            logger.info("Keybinding triggered")

            # Your plugin logic here

            return True
        except Exception as e:
            logger.error(f"Error in handler: {e}")
            return False
```

### **🚨 CRITICAL Keybinding Patterns**

#### **✅ CORRECT Pattern (What Works)**
```python
# Individual binding storage (NOT dictionaries)
self._kb_binding = None
self._kb_binding_action1 = None
self._kb_binding_action2 = None

# Correct registerGestureByString parameter order
self._kb_binding = self.registerGestureByString(
    self._handler_method,    # 1st: Handler method
    "Action description",    # 2nd: Description
    'kb:cthulhu+your+keys'   # 3rd: Gesture string
)
```

#### **❌ INCORRECT Patterns (What Fails)**
```python
# DON'T use dictionaries for keybinding storage
self._kb_bindings = {}  # ❌ WRONG
self._kb_bindings['action'] = self.registerGestureByString(...)  # ❌ WRONG

# DON'T pass the gesture string first and the handler last
self.registerGestureByString(
    'kb:cthulhu+keys',  # ❌ WRONG ORDER
    "Description",
    self._handler_method
)

# DON'T swap the description and the gesture string
self.registerGestureByString(
    self._handler_method,
    'kb:cthulhu+keys',  # ❌ WRONG ORDER
    "Description"
)
```

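Since a wrong argument order is easy to write and hard to spot, one option is a small guard called before `registerGestureByString`. The sketch below is a hypothetical helper, not part of Cthulhu. It assumes that gesture strings always start with `kb:`, as they do in the examples above, and that descriptions never do.

```python
def check_gesture_args(handler, description, gesture):
    """Raise if the arguments are not in (handler, description, gesture) order.

    Assumption: gesture strings start with 'kb:' and descriptions do not.
    """
    if not callable(handler):
        raise TypeError("first argument must be the handler method")
    if not isinstance(description, str) or description.startswith("kb:"):
        raise ValueError("second argument must be the human-readable description")
    if not isinstance(gesture, str) or not gesture.startswith("kb:"):
        raise ValueError("third argument must be a gesture string such as 'kb:cthulhu+your+keys'")


if __name__ == "__main__":
    def handler(script=None, inputEvent=None):
        return True

    # Correct order passes silently.
    check_gesture_args(handler, "Action description", "kb:cthulhu+your+keys")

    # Swapped description and gesture string is rejected.
    try:
        check_gesture_args(handler, "kb:cthulhu+keys", "Description")
    except ValueError as error:
        print(f"Rejected: {error}")
```

A check like this turns a silently ignored keybinding into an immediate, readable error during development.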
### **Plugin Registration & Activation**

#### **Add to Build System**
1. **Add to `src/cthulhu/plugins/Makefile.am`**:
   ```makefile
   SUBDIRS = YourPlugin OtherPlugin1 OtherPlugin2 ...
   ```

2. **Add to `configure.ac`**:
   ```
   src/cthulhu/plugins/YourPlugin/Makefile
   ```

#### **Add to Default Active Plugins**
In `src/cthulhu/settings.py`:
```python
activePlugins = ['YourPlugin', 'DisplayVersion', 'PluginManager', ...]
```

### **Plugin Lifecycle Events**

1. **`__init__`**: Plugin instance created
2. **`activate`**: Plugin enabled (register keybindings, connect events)
3. **`deactivate`**: Plugin disabled (cleanup, disconnect)

**Note**: `activate()` may be called multiple times for different script contexts.

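Because `activate()` can run more than once, any setup that must happen only once needs its own guard. The sketch below shows one way to make activation idempotent, using a stand-alone class rather than the real `Plugin` base class. Whether Cthulhu needs keybindings re-registered per script context is not stated here, so treat this as one option rather than the required pattern. `ActivationGuard` is a hypothetical name.

```python
from typing import Callable


class ActivationGuard:
    """Run a one-time setup callable on the first activate() only."""

    def __init__(self, setup: Callable[[], None]):
        self._setup = setup
        self._active = False

    def activate(self) -> bool:
        if self._active:
            return True  # Already set up; repeated calls are harmless.
        self._setup()
        self._active = True
        return True

    def deactivate(self) -> bool:
        # Allow a later activate() to run the setup again.
        self._active = False
        return True


if __name__ == "__main__":
    registrations = []
    guard = ActivationGuard(lambda: registrations.append("keybinding"))
    guard.activate()
    guard.activate()  # second call does not register again
    print(len(registrations))
```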
### **Common Plugin Patterns**

#### **Settings Integration**
```python
from cthulhu import settings_manager
from cthulhu.plugin import Plugin, cthulhu_hookimpl

class YourPlugin(Plugin):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._settings_manager = settings_manager.getManager()

    @cthulhu_hookimpl
    def activate(self, plugin=None):
        # Check if plugin should be active
        enabled = self._settings_manager.getSetting('yourPluginEnabled')
        if not enabled:
            return
```

#### **Message Presentation**
```python
def _present_message(self, message):
    """Present a message to the user via speech."""
    try:
        if self.app:
            state = self.app.getDynamicApiManager().getAPI('CthulhuState')
            if state and state.activeScript:
                state.activeScript.presentMessage(message, resetStyles=False)
    except Exception as e:
        logger.error(f"Error presenting message: {e}")
```

#### **Sound Generation**
```python
from cthulhu import sound
from cthulhu.sound_generator import Tone

def _play_sound(self):
    player = sound.getPlayer()
    tone = Tone(duration=0.15, frequency=400, volumeMultiplier=0.7)
    player.play(tone, interrupt=False)
```

### **Debugging Plugin Issues**

#### **Common Debug Techniques**
1. **Add debug output to both logger and print**:
   ```python
   logger.info("Plugin message")
   print("DEBUG: Plugin message")  # Shows in terminal
   ```

2. **Check plugin loading**:
   ```python
   # In __init__
   with open('/tmp/your_plugin_debug.log', 'a') as f:
       f.write("Plugin loaded\n")
   ```

3. **Verify keybinding registration**:
   ```python
   if self._kb_binding:
       print(f"DEBUG: Keybinding registered: {self._kb_binding}")
   else:
       print("DEBUG: Keybinding registration FAILED")
   ```

#### **Common Issues & Solutions**

| Issue | Symptom | Solution |
|-------|---------|----------|
| Plugin not loading | No `__init__` debug output | Check `activePlugins` list |
| Keybindings not working | "stored for later registration" | Use correct parameter order |
| Import errors | Plugin fails to activate | Check module imports and dependencies |
| Settings not loading | Default values used | Verify settings key names |

### **Working Plugin Examples**
- **`DisplayVersion`**: Simple keybinding + message
- **`PluginManager`**: GUI dialog + settings management
- **`IndentationAudio`**: Event listening + sound generation
- **`AIAssistant`**: Complex settings + multi-keybinding + external APIs

## D-Bus Remote Controller Integration

### **EXISTING FEATURE**: D-Bus Service for Remote Control
Cthulhu includes a D-Bus service (ported from Orca v49.alpha) for external control and automation:

- **Service Name**: `org.stormux.Cthulhu.Service`
- **Object Path**: `/org/stormux/Cthulhu/Service`