AI capabilities added. Working 90 percent with ollama, more providers and functionality coming soon.

Storm Dragon
2025-08-03 00:07:59 -04:00
parent 9ead764b2e
commit a8672165d8
14 changed files with 1893 additions and 35 deletions

411
CLAUDE.md

@@ -335,10 +335,417 @@ subprojects/spiel.wrap # Subproject integration
3. **Plugin System**: How to maintain Cthulhu's plugin advantage while integrating Orca improvements?
4. **Version Strategy**: Selective feature backporting vs. major version sync?
## AI Assistant Integration
### **NEW FEATURE**: AI-Powered Accessibility Assistant
Cthulhu now includes an optional AI assistant plugin for enhanced accessibility support:
- **Vision Analysis**: Screenshots + AT-SPI data for understanding unlabeled UI elements
- **Safe Actions**: Confirmed element clicking and navigation assistance
- **Multi-Provider Support**: Claude, ChatGPT, Gemini, and Ollama backends
- **Privacy-First**: Disabled by default, requires explicit opt-in and API key configuration
### AI Assistant Configuration
```bash
# Access via Cthulhu Preferences
~/.local/bin/cthulhu -s # Opens preferences dialog
# Navigate to "AI Assistant" tab
# 1. Check "Enable AI Assistant"
# 2. Select provider (Claude, ChatGPT, Gemini, Ollama)
# 3. Set API key file path
# 4. Configure safety and quality settings
```
### AI Provider Setup
#### 1. Claude (Anthropic) - **Recommended**
```bash
# Get API key from: https://console.anthropic.com/
# 1. Sign up/login → "Get API Keys" → Create new key
# 2. Copy the key (starts with "sk-ant-...")
# 3. Save to file:
mkdir -p ~/.config/cthulhu
echo "sk-ant-your-actual-key-here" > ~/.config/cthulhu/claude-api-key
chmod 600 ~/.config/cthulhu/claude-api-key
# Pricing: ~$3 per million input tokens, ~$15 per million output tokens
# Best vision capabilities and safety for accessibility use
```
#### 2. ChatGPT (OpenAI)
```bash
# Get API key from: https://platform.openai.com/api-keys
# 1. Sign up/login → "Create new secret key"
# 2. Copy immediately (can't view again, starts with "sk-...")
# 3. Save to file:
mkdir -p ~/.config/cthulhu
echo "sk-your-actual-openai-key" > ~/.config/cthulhu/openai-api-key
chmod 600 ~/.config/cthulhu/openai-api-key
# Pricing: ~$2.50 per million input tokens, ~$10 per million output tokens
# Good vision capabilities, widely supported
```
#### 3. Gemini (Google)
```bash
# Get API key from: https://aistudio.google.com/app/apikey
# 1. Sign up/login → "Create API key"
# 2. Copy the generated key
# 3. Save to file:
mkdir -p ~/.config/cthulhu
echo "your-actual-gemini-key" > ~/.config/cthulhu/gemini-api-key
chmod 600 ~/.config/cthulhu/gemini-api-key
# Pricing: Free tier (15 requests/min), then ~$1.25 per million tokens
# Good for testing, has generous free allowance
```
#### 4. Ollama (Local) - **Privacy-Focused**
```bash
# Install Ollama (no API key needed!)
sudo pacman -S ollama # Arch Linux
# OR: curl -fsSL https://ollama.com/install.sh | sh
# Start service (the Arch package and the install script are expected to install a system unit, not a user unit)
sudo systemctl enable --now ollama
# Download vision-capable model (required for AI assistant)
ollama pull llama3.2-vision # 7.9GB download
# OR smaller model: ollama pull moondream # 1.7GB
# Verify installation
ollama list # Should show downloaded models
# No API key needed - runs entirely offline!
# Free to use, privacy-focused, but slower than cloud providers
```
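For orientation, a vision query against a local Ollama server can be sketched with the standard library alone. This is a minimal sketch, not the plugin's actual code: the function names `build_payload` and `describe_image` are hypothetical, and it assumes the default endpoint on port 11434 and the `llama3.2-vision` model pulled above.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, image_bytes, model="llama3.2-vision"):
    """Build the JSON body for a single, non-streaming vision request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(prompt, image_bytes, timeout=60):
    """Send the request to a running Ollama server and return its text reply."""
    body = json.dumps(build_payload(prompt, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_image("What does this button do?", png_bytes)` needs the service running; `build_payload` can be inspected offline.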
### AI Assistant Usage Patterns
- **Information Queries**: "What does this unlabeled button do?"
- **Navigation Help**: "Where is the login form?"
- **Action Assistance**: "Click the submit button" (with confirmation)
- **Layout Understanding**: "Describe the main sections of this page"
### Safety Framework
- **Confirmation Required**: All actions require user approval by default
- **Action Descriptions**: Clear explanation of what will happen
- **Safe Defaults**: Conservative timeouts and quality settings
- **Privacy Protection**: API keys are kept in files readable only by the user (`chmod 600`), and no data is logged
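The confirmation rule above can be illustrated with a small gate around any action. This is only a sketch of the idea: `run_with_confirmation` and its `ask` callback are hypothetical names, and the plugin's real confirmation flow may differ.

```python
def run_with_confirmation(description, action, ask):
    """Run `action` only if the user approves.

    `ask` is any callable that takes a prompt string and returns True or False.
    Returns (ran, result); result is None when the action was declined.
    """
    prompt = f"The assistant wants to: {description}. Allow?"
    if not ask(prompt):
        return False, None
    return True, action()
```

In a terminal, `ask` could be `lambda p: input(p + " [y/N] ").strip().lower() == "y"`; in a screen reader it would more likely speak the prompt and wait for a key press.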
### Troubleshooting AI Assistant Setup
#### Common Issues
```bash
# Check if AI settings loaded correctly
~/.local/bin/cthulhu -s # Open preferences, check AI Assistant tab
# Verify API key file permissions and format
ls -la ~/.config/cthulhu/*-api-key # Should show 600 permissions
cat ~/.config/cthulhu/claude-api-key # Should contain only the API key
# Test Ollama connection
curl http://localhost:11434/api/version # Should return Ollama version
ollama ps # Should show running models
# Check dependencies
python3 -c "import requests, PIL; print('Dependencies OK')"
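# Test screenshot capability (requires X11/Wayland and the GTK 3 bindings)
python3 -c "
import gi
gi.require_version('Gdk', '3.0')
from gi.repository import Gdk
window = Gdk.get_default_root_window()
print('Screenshot capability available' if window else 'No root window found')
"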
```
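The permission and format checks above can also be scripted. The helper below is a sketch; `check_key_file` is a hypothetical name and not part of Cthulhu. It flags a key file that is missing, accessible to group or others, empty, or holding more than the key itself.

```python
import stat
from pathlib import Path


def check_key_file(path):
    """Return a list of problems found with an API key file (empty if none)."""
    path = Path(path).expanduser()
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    mode = stat.S_IMODE(path.stat().st_mode)
    # Any group/other permission bit means the file is not private.
    if mode & 0o077:
        problems.append(f"permissions are {mode:o}; expected 600")
    content = path.read_text(encoding="utf-8")
    if not content.strip():
        problems.append("file is empty")
    elif len(content.split()) != 1:
        problems.append("file should contain only the key")
    return problems
```

For example, `check_key_file("~/.config/cthulhu/claude-api-key")` returns an empty list when the file was created as shown in the provider setup.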
#### Required Permissions
- **File Access**: API key files in `~/.config/cthulhu/`
- **Screen Access**: Screenshot capture (automatic on most setups)
- **Network Access**: HTTP requests to AI providers (except Ollama)
- **AT-SPI Access**: Accessibility tree traversal (enabled by default)
## Cthulhu Plugin System - Developer Reference
### **Plugin Architecture Overview**
Cthulhu uses a **pluggy-based plugin system** with the following components:
1. **Plugin Manager**: `src/cthulhu/plugin_system_manager.py` - Central plugin loading/management
2. **Base Plugin Class**: `src/cthulhu/plugin.py` - Provides common functionality
3. **Hook System**: Uses `@cthulhu_hookimpl` decorators for lifecycle management
4. **Plugin Discovery**: Automatic scanning of `src/cthulhu/plugins/` and `~/.local/share/cthulhu/plugins/`
### **Plugin Directory Structure**
Every plugin must follow this exact structure:
```
src/cthulhu/plugins/YourPlugin/
├── __init__.py # Import: from .plugin import YourPlugin
├── plugin.py # Main plugin class
├── plugin.info # Metadata (name, version, description)
└── Makefile.am # Build system integration
```
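A quick way to catch an incomplete plugin directory is to scan for the required files. The sketch below is an illustration rather than the plugin manager's discovery logic; `find_plugin_dirs` is a hypothetical helper, and it leaves `Makefile.am` out of the check on the assumption that it only matters for the in-tree build.

```python
from pathlib import Path

# Files every plugin directory is expected to contain (Makefile.am omitted, see above).
REQUIRED_FILES = ("__init__.py", "plugin.py", "plugin.info")


def find_plugin_dirs(root):
    """Yield (plugin_name, missing_files) for each subdirectory of `root`."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        missing = [name for name in REQUIRED_FILES if not (entry / name).is_file()]
        yield entry.name, missing
```

Running `list(find_plugin_dirs("~/.local/share/cthulhu/plugins"))` lists each plugin with the files it still lacks.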
### **Essential Plugin Files**
#### **`__init__.py`** - Package Import
```python
from .plugin import YourPlugin
```
#### **`plugin.info`** - Metadata
```ini
name = Your Plugin Name
version = 1.0.0
description = What your plugin does
authors = Your Name <email@example.com>
website = https://example.com
copyright = Copyright 2025
builtin = false
hidden = false
```
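Because `plugin.info` is a flat `key = value` file with no section header, it can be read with Python's `configparser` by supplying a dummy section. This is a sketch under that assumption; how Cthulhu itself parses the file is not shown here, and `read_plugin_info` is a hypothetical name.

```python
import configparser


def read_plugin_info(text):
    """Parse plugin.info text into a dict, converting the boolean flags."""
    parser = configparser.ConfigParser(interpolation=None)
    # The file has no [section] header, so supply one for configparser.
    parser.read_string("[plugin]\n" + text)
    section = parser["plugin"]
    info = dict(section)
    for flag in ("builtin", "hidden"):
        info[flag] = section.getboolean(flag, fallback=False)
    return info
```

Passing the contents of the example file above yields `info["name"] == "Your Plugin Name"` with `builtin` and `hidden` as real booleans.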
#### **`Makefile.am`** - Build Integration
```makefile
cthulhu_python_PYTHON = \
__init__.py \
plugin.info \
plugin.py
cthulhu_pythondir=$(pkgpythondir)/plugins/YourPlugin
```
### **Plugin Class Template**
```python
#!/usr/bin/env python3
import logging

from cthulhu.plugin import Plugin, cthulhu_hookimpl

logger = logging.getLogger(__name__)


class YourPlugin(Plugin):
    """Your plugin description."""

    def __init__(self, *args, **kwargs):
        """Initialize the plugin."""
        super().__init__(*args, **kwargs)
        logger.info("YourPlugin initialized")
        # Keybinding storage - use individual variables, NOT dictionaries
        self._kb_binding = None

    @cthulhu_hookimpl
    def activate(self, plugin=None):
        """Activate the plugin."""
        if plugin is not None and plugin is not self:
            return
        try:
            logger.info("=== YourPlugin activation starting ===")
            # Register keybindings
            self._register_keybinding()
            logger.info("YourPlugin activated successfully")
            return True
        except Exception as e:
            logger.error(f"Error activating YourPlugin: {e}")
            return False

    @cthulhu_hookimpl
    def deactivate(self, plugin=None):
        """Deactivate the plugin."""
        if plugin is not None and plugin is not self:
            return
        logger.info("Deactivating YourPlugin")
        self._kb_binding = None
        return True

    def _register_keybinding(self):
        """Register plugin keybindings."""
        gesture_string = 'kb:cthulhu+your+keys'
        try:
            # CRITICAL: Use this exact parameter order!
            self._kb_binding = self.registerGestureByString(
                self._your_handler_method,  # Handler method (first)
                "Description of action",    # Description (second)
                gesture_string              # Gesture string (third)
            )
            if self._kb_binding:
                logger.info(f"Registered keybinding: {gesture_string}")
            else:
                logger.error(f"Failed to register keybinding: {gesture_string}")
        except Exception as e:
            logger.error(f"Error registering keybinding: {e}")

    def _your_handler_method(self, script=None, inputEvent=None):
        """Handle the keybinding activation."""
        try:
            logger.info("Keybinding triggered")
            # Your plugin logic here
            return True
        except Exception as e:
            logger.error(f"Error in handler: {e}")
            return False
```
### **🚨 CRITICAL Keybinding Patterns**
#### **✅ CORRECT Pattern (What Works)**
```python
# Individual binding storage (NOT dictionaries)
self._kb_binding = None
self._kb_binding_action1 = None
self._kb_binding_action2 = None

# Correct registerGestureByString parameter order
self._kb_binding = self.registerGestureByString(
    self._handler_method,    # 1st: Handler method
    "Action description",    # 2nd: Description
    'kb:cthulhu+your+keys'   # 3rd: Gesture string
)
```
#### **❌ INCORRECT Patterns (What Fails)**
```python
# DON'T use dictionaries for keybinding storage
self._kb_bindings = {}  # ❌ WRONG
self._kb_bindings['action'] = self.registerGestureByString(...)  # ❌ WRONG

# DON'T put the gesture string first and the handler last
self.registerGestureByString(
    'kb:cthulhu+keys',  # ❌ WRONG ORDER
    "Description",
    self._handler_method
)

# DON'T swap the description and the gesture string
self.registerGestureByString(
    self._handler_method,
    'kb:cthulhu+keys',  # ❌ WRONG ORDER
    "Description"
)
```
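Since a wrong argument order fails quietly, a small pre-check can surface the mistake at registration time. The helper below is a hypothetical sketch, not part of the Cthulhu API, and it assumes gesture strings always begin with `kb:` as in the examples above.

```python
def check_gesture_args(handler, description, gesture):
    """Raise early if registerGestureByString arguments look misordered.

    Returns the three arguments unchanged so the result can be unpacked
    directly into the real call.
    """
    if not callable(handler):
        raise TypeError("first argument must be the handler method")
    if not isinstance(description, str) or description.startswith("kb:"):
        raise ValueError("second argument must be the description, not a gesture")
    # Assumption: every gesture string starts with the 'kb:' prefix.
    if not isinstance(gesture, str) or not gesture.startswith("kb:"):
        raise ValueError("third argument must be a gesture string starting with 'kb:'")
    return handler, description, gesture
```

Inside a plugin it could wrap the call as `self.registerGestureByString(*check_gesture_args(self._handler_method, "Action description", 'kb:cthulhu+your+keys'))`.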
### **Plugin Registration & Activation**
#### **Add to Build System**
1. **Add to `src/cthulhu/plugins/Makefile.am`**:
```makefile
SUBDIRS = YourPlugin OtherPlugin1 OtherPlugin2 ...
```
2. **Add to `configure.ac`**:
```
src/cthulhu/plugins/YourPlugin/Makefile
```
#### **Add to Default Active Plugins**
In `src/cthulhu/settings.py`:
```python
activePlugins = ['YourPlugin', 'DisplayVersion', 'PluginManager', ...]
```
### **Plugin Lifecycle Events**
1. **`__init__`**: Plugin instance created
2. **`activate`**: Plugin enabled (register keybindings, connect events)
3. **`deactivate`**: Plugin disabled (cleanup, disconnect)
**Note**: `activate()` may be called multiple times for different script contexts.
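If one-time setup should not repeat when `activate()` is called again, a simple guard is one option. This is a standalone sketch with hypothetical names; whether repeated registration is actually harmful depends on Cthulhu internals not covered here.

```python
class ActivationGuard:
    """Track whether one-time setup has already run for a plugin."""

    def __init__(self):
        self._active = False

    def activate(self, setup):
        """Run `setup` on the first call only; return True if it ran."""
        if self._active:
            return False
        setup()
        self._active = True
        return True

    def deactivate(self, teardown):
        """Run `teardown` only if currently active; return True if it ran."""
        if not self._active:
            return False
        teardown()
        self._active = False
        return True
```

A plugin could hold one guard in `__init__` and call `self._guard.activate(self._register_keybinding)` from its `activate` hook.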
### **Common Plugin Patterns**
#### **Settings Integration**
```python
from cthulhu import settings_manager
from cthulhu.plugin import Plugin, cthulhu_hookimpl


class YourPlugin(Plugin):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._settings_manager = settings_manager.getManager()

    @cthulhu_hookimpl
    def activate(self, plugin=None):
        # Check if plugin should be active
        enabled = self._settings_manager.getSetting('yourPluginEnabled')
        if not enabled:
            return
```
#### **Message Presentation**
```python
def _present_message(self, message):
    """Present a message to the user via speech."""
    try:
        if self.app:
            state = self.app.getDynamicApiManager().getAPI('CthulhuState')
            if state and state.activeScript:
                state.activeScript.presentMessage(message, resetStyles=False)
    except Exception as e:
        logger.error(f"Error presenting message: {e}")
```
#### **Sound Generation**
```python
from cthulhu import sound
from cthulhu.sound_generator import Tone


def _play_sound(self):
    player = sound.getPlayer()
    tone = Tone(duration=0.15, frequency=400, volumeMultiplier=0.7)
    player.play(tone, interrupt=False)
```
### **Debugging Plugin Issues**
#### **Common Debug Techniques**
1. **Add debug output to both logger and print**:
```python
logger.info("Plugin message")
print("DEBUG: Plugin message") # Shows in terminal
```
2. **Check plugin loading**:
```python
# In __init__
with open('/tmp/your_plugin_debug.log', 'a') as f:
    f.write("Plugin loaded\n")
```
3. **Verify keybinding registration**:
```python
if self._kb_binding:
    print(f"DEBUG: Keybinding registered: {self._kb_binding}")
else:
    print("DEBUG: Keybinding registration FAILED")
```
#### **Common Issues & Solutions**
| Issue | Symptom | Solution |
|-------|---------|----------|
| Plugin not loading | No `__init__` debug output | Check `activePlugins` list |
| Keybindings not working | "stored for later registration" | Use correct parameter order |
| Import errors | Plugin fails to activate | Check module imports and dependencies |
| Settings not loading | Default values used | Verify settings key names |
### **Working Plugin Examples**
- **`DisplayVersion`**: Simple keybinding + message
- **`PluginManager`**: GUI dialog + settings management
- **`IndentationAudio`**: Event listening + sound generation
- **`AIAssistant`**: Complex settings + multi-keybinding + external APIs
## D-Bus Remote Controller Integration
### **EXISTING FEATURE**: D-Bus Service for Remote Control
Cthulhu includes a D-Bus service (ported from Orca v49.alpha) for external control and automation:
- **Service Name**: `org.stormux.Cthulhu.Service`
- **Object Path**: `/org/stormux/Cthulhu/Service`