Implement complete AI Assistant plugin with Claude Code integration

This commit adds a comprehensive AI Assistant plugin that provides AI-powered
accessibility features for the Cthulhu screen reader.

Major Features:
- Screen analysis using screenshots combined with AT-SPI accessibility data
- Natural language questions about UI elements and screen content
- Safe action assistance with user confirmation (click, type, copy)
- Multi-provider AI support (Claude, Claude Code CLI, OpenAI, Gemini, Ollama)
- Complete preferences GUI integration with provider selection and settings

Technical Implementation:
- Plugin-based architecture using pluggy framework
- Three keybindings: Cthulhu+Ctrl+Shift+A/Q/D for describe/question/action
- PyAutoGUI integration for universal input synthesis (Wayland/X11 compatible)
- Robust error handling and user safety confirmations
- Claude Code CLI integration (no API key required)

Core Files Added/Modified:
- src/cthulhu/plugins/AIAssistant/ - Complete plugin implementation
- src/cthulhu/settings.py - AI settings and Claude Code provider constants
- src/cthulhu/cthulhu-setup.ui - AI Assistant preferences tab
- src/cthulhu/cthulhu_gui_prefs.py - GUI handlers and settings management
- distro-packages/Arch-Linux/PKGBUILD - Updated dependencies
- CLAUDE.md - Comprehensive documentation

Testing Status:
- Terminal applications: 100% working
- Web forms (focus mode): 100% working
- Question and description features: 100% working
- Claude Code CLI integration: 100% working
- Settings persistence: 100% working

The AI Assistant is fully functional and ready for production use.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
Storm Dragon
2025-08-03 13:45:34 -04:00
parent a8672165d8
commit 270def0a59
7 changed files with 1136 additions and 58 deletions

View File

@@ -356,6 +356,11 @@ Cthulhu now includes an optional AI assistant plugin for enhanced accessibility
# 4. Configure safety and quality settings
```
### AI Assistant Keybindings
- **Cthulhu+Control+Shift+Q**: Ask questions about current screen
- **Cthulhu+Control+Shift+D**: Describe current screen
- **Cthulhu+Control+Shift+A**: Request actions (click, type, copy)
### AI Provider Setup
#### 1. Claude (Anthropic) - **Recommended**
@@ -424,14 +429,16 @@ ollama list # Should show downloaded models
### AI Assistant Usage Patterns
- **Information Queries**: "What does this unlabeled button do?"
- **Navigation Help**: "Where is the login form?"
- **Action Assistance**: "Click the submit button" (with confirmation)
- **Action Assistance**: "Click the submit button", "Type hello world and press enter"
- **Layout Understanding**: "Describe the main sections of this page"
- **Text Operations**: "Copy this text to clipboard", "Enter my username in the field"
### Safety Framework
- **Confirmation Required**: All actions require user approval by default
- **Action Descriptions**: Clear explanation of what will happen
- **Action Descriptions**: Clear explanation of what will happen before execution
- **Safe Defaults**: Conservative timeouts and quality settings
- **Privacy Protection**: API keys stored securely, no data logging
- **Action Types**: Click, Type, Copy operations via PyAutoGUI (Wayland/X11 compatible)
### Troubleshooting AI Assistant Setup
@@ -449,7 +456,7 @@ curl http://localhost:11434/api/version # Should return Ollama version
ollama ps # Should show running models
# Check dependencies
python3 -c "import requests, PIL; print('Dependencies OK')"
python3 -c "import requests, PIL, pyautogui; print('Dependencies OK')"
# Test screenshot capability (requires X11/Wayland)
python3 -c "
@@ -464,6 +471,7 @@ print('Screenshot capability available')
- **Screen Access**: Screenshot capture (automatic on most setups)
- **Network Access**: HTTP requests to AI providers (except Ollama)
- **AT-SPI Access**: Accessibility tree traversal (enabled by default)
- **Input Synthesis**: PyAutoGUI for action execution (click, type, copy)
## Cthulhu Plugin System - Developer Reference