w3m/javascript.txt
2025-08-16 19:43:11 -04:00

# w3m JavaScript Integration: Comprehensive Implementation Plan
## Executive Summary
This document provides a detailed implementation roadmap for adding JavaScript support to w3m, a text-based web browser. The integration focuses on essential web functionality while maintaining w3m's lightweight nature and terminal-based interface.
**FEASIBILITY ASSESSMENT: CHALLENGING BUT ACHIEVABLE**
Adding JavaScript to w3m is ambitious but technically feasible with careful planning and phased implementation. The key is balancing functionality with w3m's core philosophy of being a lightweight, fast, text-based browser.
---
## 1. ARCHITECTURE ANALYSIS: W3M CODEBASE STRUCTURE
### Current w3m Architecture
w3m's architecture provides several integration points for JavaScript:
#### Core Data Structures (fm.h:455-522)
```c
typedef struct _Buffer {
    Line *firstLine, *topLine, *currentLine, *lastLine;  // Document structure
    AnchorList *href, *name, *img, *formitem;            // Interactive elements
    FormList *formlist;                                  // Forms
    ParsedURL currentURL, *baseURL;                      // Navigation
    // ... 70+ additional fields
} Buffer;
typedef struct _Line {    // Text line representation
    char *lineBuf;        // Text content
    Lineprop *propBuf;    // Character properties
    Linecolor *colorBuf;  // Color information (if USE_ANSI_COLOR)
    // Position and formatting data
} Line;
```
#### HTML Processing Pipeline (file.c)
1. **HTML Parsing**: `loadGeneralFile()` → `loadHTMLBuffer()` → HTML tag processing
2. **Document Building**: `struct readbuffer` → `flushline()` → Buffer/Line creation
3. **Rendering**: `displayBuffer()` → terminal output
#### Event System (main.c:1300)
- **Input Processing**: `keyPressEventProc()` handles all user input
- **Event Loop**: `getch()` → keymap lookup → function dispatch
- **Form Interaction**: Form elements trigger buffer modifications
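The input path above (read a key, look it up, dispatch) can be sketched as a table of function pointers. This is a simplified illustration, not w3m's actual keymap code: `KeyHandler`, `keymap`, `bind_key`, `dispatch_key`, and `js_key_hook` are hypothetical names, and consulting a script-side hook before the built-in binding is an assumption about one possible integration order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type: returns 1 if the key was consumed. */
typedef int (*KeyHandler)(int key);

#define KEYMAP_SIZE 128

KeyHandler keymap[KEYMAP_SIZE];  /* one slot per 7-bit key code */
KeyHandler js_key_hook;          /* optional hook consulted first (assumption) */

/* Bind a handler to a key code; out-of-range keys are ignored. */
void bind_key(int key, KeyHandler fn)
{
    if (key >= 0 && key < KEYMAP_SIZE)
        keymap[key] = fn;
}

/* Look up and run the handler for one key press.
 * Returns 1 if something handled the key, 0 otherwise. */
int dispatch_key(int key)
{
    if (key < 0 || key >= KEYMAP_SIZE)
        return 0;
    /* Give a script-side hook the chance to consume the key first. */
    if (js_key_hook != NULL && js_key_hook(key))
        return 1;
    if (keymap[key] == NULL)
        return 0;
    return keymap[key](key);
}

/* Example handlers used to exercise the sketch. */
int last_key;
int record_key(int key) { last_key = key; return 1; }
int swallow_q(int key) { return key == 'q'; }
```

With `record_key` bound to `'j'`, `dispatch_key('j')` runs the binding; once `js_key_hook` is set to `swallow_q`, a `'q'` press is consumed by the hook and never reaches the table.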
### Integration Points for JavaScript
1. **HTML Tag Processing** (parsetagx.c): Hook JavaScript parsing into HTML_SCRIPT tags
2. **Buffer Management** (buffer.c): Extend Buffer structure for DOM representation
3. **Event System** (main.c): Integrate JS event handlers into existing input system
4. **Form Processing** (form.c): Connect JS to form validation and submission
5. **Network Layer** (file.c): Handle XMLHttpRequest and fetch operations
---
## 2. JAVASCRIPT ENGINE RECOMMENDATION: QuickJS
### Selected Engine: QuickJS
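- **Rationale**: Small footprint (upstream cites about 367 KiB of x86 code for a hello-world program), ES2023 support, and an MIT license that is compatible with w3m's own licensing
- **Integration**: A handful of C source files with no external dependencies and a plain C API
- **Performance**: Interpreter only (no JIT) with fast startup, which suits the light scripting a text browser needs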
### QuickJS Integration Plan
```c
// Core JavaScript context structure
typedef struct {
    JSRuntime *runtime;
    JSContext *context;
    JSValue global_obj;
    JSValue document_obj;
    JSValue window_obj;
} W3MJSContext;
// Per-buffer JavaScript state
typedef struct {
    W3MJSContext *js_ctx;
    JSValue *script_objects;  // Array of script element objects
    int script_count;
    char *pending_scripts;    // Scripts to execute on load
} BufferJSState;
```
---
## 3. DOM REPRESENTATION REQUIREMENTS
### Minimal DOM Implementation
w3m needs a simplified DOM focused on essential web functionality:
#### Core DOM Objects
```c
// Document Object Model structures
typedef struct W3MElement {
    char *tagName;                // Element tag name
    char *id;                     // Element ID
    char *className;              // CSS classes
    struct W3MElement *parent;    // Parent element
    struct W3MElement *firstChild;
    struct W3MElement *nextSibling;
    // w3m-specific mappings
    Line *line;                   // Associated line in buffer
    int line_pos;                 // Position within line
    Anchor *anchor;               // If element is interactive
    FormItem *form_item;          // If element is form control
    // Attributes and content
    HashMap *attributes;
    char *textContent;
    char *innerHTML;
} W3MElement;
typedef struct W3MDocument {
    W3MElement *documentElement;  // <html> element
    W3MElement *body;             // <body> element
    W3MElement *head;             // <head> element
    Buffer *buffer;               // Associated w3m buffer
    char *title;
    char *URL;
} W3MDocument;
```
#### DOM-to-Buffer Mapping
- **Element-to-Line Mapping**: Each DOM element maps to specific Line/position
- **Dynamic Updates**: Changes to DOM trigger buffer regeneration
- **Interactive Elements**: Form controls and links maintain two-way sync
### Essential DOM APIs
1. **Element Selection**: `getElementById()`, `getElementsByTagName()`
2. **Content Manipulation**: `innerHTML`, `textContent`, `setAttribute()`
3. **Form Access**: `document.forms[]`, form validation
4. **Navigation**: `location.href`, `location.reload()`
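As an illustration of how element selection could walk the `firstChild`/`nextSibling` links, here is a minimal sketch. It uses a reduced stand-in struct (`Element`) rather than the full `W3MElement`, and the function names `find_by_id` and `count_by_tag` are hypothetical; a real binding would wrap the results in JavaScript objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for the W3MElement structure (assumption). */
typedef struct Element {
    const char *tagName;
    const char *id;               /* NULL when the element has no id */
    struct Element *firstChild;
    struct Element *nextSibling;
} Element;

/* Pre-order (document order) search for the first element with a matching id. */
Element *find_by_id(Element *root, const char *id)
{
    assert(id != NULL);
    for (Element *e = root; e != NULL; e = e->nextSibling) {
        if (e->id != NULL && strcmp(e->id, id) == 0)
            return e;
        Element *hit = find_by_id(e->firstChild, id);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}

/* Count the elements that getElementsByTagName() would collect. */
size_t count_by_tag(const Element *root, const char *tag)
{
    size_t n = 0;
    for (const Element *e = root; e != NULL; e = e->nextSibling) {
        if (strcmp(e->tagName, tag) == 0)
            n++;
        n += count_by_tag(e->firstChild, tag);
    }
    return n;
}
```

A linear walk is probably adequate for the page sizes a text browser handles; an id-to-element hash table would be the natural next step if lookups become a bottleneck.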
---
## 4. EVENT HANDLING INTEGRATION
### Event System Architecture
```c
typedef struct W3MJSEvent {
    char *type;                   // "click", "submit", "load", etc.
    W3MElement *target;           // Target element
    JSValue callback;             // JavaScript callback function
    struct W3MJSEvent *next;      // Linked list
} W3MJSEvent;
typedef struct {
    W3MJSEvent *event_listeners;  // Registered event listeners
    int event_queue_size;
    W3MJSEvent **event_queue;     // Pending events for processing
} W3MEventSystem;
```
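To make the listener list concrete, the following sketch registers and fires listeners on a linked list shaped like `W3MJSEvent`. It is an illustration under stated assumptions: plain C function pointers stand in for `JSValue` callbacks, targets are opaque pointers, there is no bubbling, and all names (`add_listener`, `fire_event`, `free_listeners`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in event object; a real binding would expose this to scripts. */
typedef struct {
    const char *type;
    void *target;
    int default_prevented;  /* set by a handler, like preventDefault() */
} Event;

/* Stand-in for the JSValue callback: a plain C function pointer. */
typedef void (*Listener)(Event *ev);

typedef struct ListenerNode {
    const char *type;
    void *target;
    Listener callback;
    struct ListenerNode *next;
} ListenerNode;

/* Register a listener; returns 0 on success, -1 on allocation failure.
 * Appends at the tail so listeners fire in registration order. */
int add_listener(ListenerNode **head, const char *type, void *target, Listener cb)
{
    ListenerNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->type = type;
    node->target = target;
    node->callback = cb;
    node->next = NULL;
    while (*head != NULL)
        head = &(*head)->next;
    *head = node;
    return 0;
}

/* Fire every matching listener; returns 1 if the default action may proceed. */
int fire_event(ListenerNode *head, const char *type, void *target)
{
    Event ev = { type, target, 0 };
    for (ListenerNode *n = head; n != NULL; n = n->next) {
        if (n->target == target && strcmp(n->type, type) == 0)
            n->callback(&ev);
    }
    return !ev.default_prevented;
}

/* Release all nodes (the type strings are not owned by the list). */
void free_listeners(ListenerNode *head)
{
    while (head != NULL) {
        ListenerNode *next = head->next;
        free(head);
        head = next;
    }
}

/* Example handlers used to exercise the sketch. */
int calls;
void count_call(Event *ev) { (void)ev; calls++; }
void prevent(Event *ev) { ev->default_prevented = 1; }
```

The return value of `fire_event` is what a caller such as link following or form submission would check before carrying out the default action.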
### Event Integration Points
#### 1. Form Events (form.c integration)
```c
// In form submission handling
void processFormSubmit(Buffer *buf, FormList *form) {
    // Existing form processing...
    // NEW: Fire JavaScript submit event
    if (buf->js_state && buf->js_state->js_ctx) {
        JSValue result = fireJSEvent(buf->js_state->js_ctx, "submit",
                                     form->target_element);
        if (JSEventDefaultPrevented(result)) {
            return;  // Skip submission if preventDefault() called
        }
    }
    // Continue with normal submission
}
```
#### 2. Click Events (main.c:keyPressEventProc integration)
```c
// In anchor click processing (buf is the buffer that owns the anchor)
void followAnchor(Buffer *buf, Anchor *anchor) {
    // NEW: Check for JavaScript click handlers
    if (buf->js_state && anchor->element &&
        hasJSEventListener(anchor->element, "click")) {
        JSValue result = fireJSEvent(buf->js_state->js_ctx, "click", anchor->element);
        if (JSEventDefaultPrevented(result)) {
            return;  // Don't follow link if preventDefault() called
        }
    }
    // Existing anchor following logic...
}
```
#### 3. Page Load Events
```c
// In buffer loading completion
void completeBufferLoad(Buffer *buf) {
    // Existing completion logic...
    // NEW: Execute pending scripts and fire load event
    if (buf->js_state) {
        executeBufferScripts(buf);
        fireJSEvent(buf->js_state->js_ctx, "load", buf->js_document->body);
    }
}
```
---
## 5. JAVASCRIPT-HTML BRIDGE ARCHITECTURE
### Bridge Implementation Strategy
#### 1. HTML Script Tag Processing
```c
// Enhanced script tag handler (attribute lookup via parsetagx.c)
void processScriptTag(struct parsed_tag *tag, Buffer *buf) {
    char *src = NULL;
    char *script_content = NULL;
    // parsedtag_get_value() maps the attribute ID to its slot in tag->value
    if (parsedtag_get_value(tag, ATTR_SRC, &src) && src) {
        // External script - fetch via existing URL loading
        script_content = loadExternalScript(src, buf);
    } else {
        // Inline script - extract from HTML content
        script_content = extractInlineScript(tag);
    }
    if (script_content) {
        addPendingScript(buf->js_state, script_content);
    }
}
```
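One way `addPendingScript()` could store scripts until load completes is a growable array that preserves document order. The sketch below is an assumption about that storage, not code from w3m: `ScriptQueue`, `script_queue_push`, and `script_queue_run` are hypothetical names, and it uses `malloc`/`free` where w3m code would more likely use its garbage-collected allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;      /* owned copies of script source text */
    size_t count;
    size_t capacity;
} ScriptQueue;

/* Append a copy of `source`; returns 0 on success, -1 on allocation failure. */
int script_queue_push(ScriptQueue *q, const char *source)
{
    assert(q != NULL && source != NULL);
    if (q->count == q->capacity) {
        size_t new_cap = q->capacity ? q->capacity * 2 : 4;
        char **grown = realloc(q->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        q->items = grown;
        q->capacity = new_cap;
    }
    size_t len = strlen(source) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, source, len);
    q->items[q->count++] = copy;
    return 0;
}

/* Run `eval` on each script in document order, then empty the queue.
 * Returns the number of scripts for which `eval` reported success (0). */
size_t script_queue_run(ScriptQueue *q, int (*eval)(const char *source))
{
    size_t ok = 0;
    for (size_t i = 0; i < q->count; i++) {
        if (eval(q->items[i]) == 0)
            ok++;
        free(q->items[i]);
    }
    free(q->items);
    q->items = NULL;
    q->count = q->capacity = 0;
    return ok;
}

/* Example evaluator: treats empty scripts as failures. */
int eval_nonempty(const char *source) { return source[0] != '\0' ? 0 : -1; }
```

In a real integration the `eval` callback would hand each source string to the JavaScript engine; a failing script is counted but does not stop later scripts from running, which matches how browsers treat independent `<script>` elements.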
#### 2. DOM Object Binding
```c
// Bind w3m objects to JavaScript context
void bindDOMObjects(W3MJSContext *js_ctx, Buffer *buf) {
    // Create document object
    JSValue document = JS_NewObject(js_ctx->context);
    // Bind document methods
    JS_SetPropertyStr(js_ctx->context, document, "getElementById",
        JS_NewCFunction(js_ctx->context, js_getElementById, "getElementById", 1));
    JS_SetPropertyStr(js_ctx->context, document, "createElement",
        JS_NewCFunction(js_ctx->context, js_createElement, "createElement", 1));
    // Bind to global object (JS_SetPropertyStr takes ownership of the value)
    JS_SetPropertyStr(js_ctx->context, js_ctx->global_obj, "document", document);
    // Create window object with location
    JSValue window = JS_NewObject(js_ctx->context);
    JSValue location = JS_NewObject(js_ctx->context);
    // ParsedURL has no flat string field; parsedURL2Str() builds one
    JS_SetPropertyStr(js_ctx->context, location, "href",
        JS_NewString(js_ctx->context, parsedURL2Str(&buf->currentURL)->ptr));
    JS_SetPropertyStr(js_ctx->context, window, "location", location);
    JS_SetPropertyStr(js_ctx->context, js_ctx->global_obj, "window", window);
}
```
#### 3. Form Integration Bridge
```c
// JavaScript form validation integration
JSValue js_validateForm(JSContext *ctx, JSValueConst this_val, int argc, JSValueConst *argv) {
    if (argc < 1) {
        return JS_ThrowTypeError(ctx, "validateForm requires a form argument");
    }
    // Get form reference from arguments
    FormList *form = getFormFromJSValue(ctx, argv[0]);
    // Perform validation
    int valid = validateFormData(form);
    return JS_NewBool(ctx, valid);
}
```
### Network Integration
```c
// Simplified fetch() binding built on w3m's existing network layer
JSValue js_fetch(JSContext *ctx, JSValueConst this_val, int argc, JSValueConst *argv) {
    if (argc < 1) {
        return JS_ThrowTypeError(ctx, "fetch requires a URL argument");
    }
    const char *url = JS_ToCString(ctx, argv[0]);
    if (!url) {
        return JS_EXCEPTION;  // JS_ToCString has already raised an exception
    }
    // Blocking load; loadURL() stands for a wrapper around w3m's loader
    // (for example loadGeneralFile)
    URLFile *response = loadURL(url, current_buffer);
    // Convert response to JavaScript Promise/response object
    JSValue result = createJSResponse(ctx, response);
    JS_FreeCString(ctx, url);
    return result;
}
```
---
## 6. IMPLEMENTATION ROADMAP
### Phase 1: Foundation (Months 1-2)
**Goal**: Basic JavaScript execution infrastructure
#### Milestones:
1. **QuickJS Integration**
- Add QuickJS source files to w3m build system
- Modify configure.ac and Makefile for conditional compilation
- Create basic JavaScript context management
2. **Buffer Extension**
- Extend Buffer structure with JavaScript state
- Add memory management for JS contexts
- Implement cleanup in buffer destruction
3. **Script Tag Recognition**
- Modify HTML parser to recognize `<script>` tags
- Store script content for later execution
- Handle both inline and external scripts
#### Deliverables:
- Modified build system with `--enable-javascript` configure option
- Extended Buffer structure with JavaScript support
- Basic script extraction from HTML
#### Code Changes:
```c
// In fm.h, extend Buffer structure:
typedef struct _Buffer {
    // ... existing fields ...
#ifdef USE_JAVASCRIPT
    BufferJSState *js_state;
    W3MDocument *js_document;
#endif
} Buffer;
// New files to create:
// - js/w3m_javascript.h - JavaScript integration headers
// - js/w3m_javascript.c - Core JavaScript functionality
// - js/quickjs/         - QuickJS engine files
```
### Phase 2: DOM Foundation (Months 3-4)
**Goal**: Basic DOM representation and document object
#### Milestones:
1. **DOM Structure Creation**
- Build minimal DOM tree from parsed HTML
- Map DOM elements to Buffer/Line structures
- Implement basic element traversal
2. **Document Object Implementation**
- Create JavaScript `document` object
- Implement `getElementById()`, `getElementsByTagName()`
- Basic property access (title, URL)
3. **Element Property Access**
- innerHTML, textContent getters/setters
- Attribute manipulation (getAttribute, setAttribute)
- Basic style property access
#### Deliverables:
- DOM tree construction from HTML
- Working `document` object in JavaScript
- Basic element manipulation APIs
### Phase 3: Event System (Months 5-6)
**Goal**: JavaScript event handling integration
#### Milestones:
1. **Event Listener Registration**
- `addEventListener()` implementation
- Event listener storage and management
- Event object creation
2. **Form Event Integration**
- Submit event firing
- Input validation hooks
- Form data access from JavaScript
3. **Click Event Integration**
- Link click event handling
- Event propagation and preventDefault()
- Mouse event coordinate mapping
#### Deliverables:
- Complete event system integration
- Form validation via JavaScript
- Interactive link behavior control
### Phase 4: Form and Network Integration (Months 7-8)
**Goal**: Complete form handling and basic network requests
#### Milestones:
1. **Advanced Form Support**
- Form element access and manipulation
- Dynamic form creation and modification
- Form submission control
2. **Network Request Support**
- Basic XMLHttpRequest implementation
- Simple fetch() API using w3m's HTTP layer
- Response handling and parsing
3. **URL and Navigation Control**
- Location object implementation
- History manipulation (basic)
- Page redirection control
#### Deliverables:
- Full form manipulation from JavaScript
- Basic AJAX request capability
- URL and navigation control
### Phase 5: Performance and Compatibility (Months 9-10)
**Goal**: Optimization and modern web compatibility
#### Milestones:
1. **Performance Optimization**
- JavaScript execution optimization
- Memory usage optimization
- Lazy script loading
2. **Enhanced Compatibility**
- Modern JavaScript features support
- Better error handling and debugging
- Cross-platform compatibility testing
3. **Security Implementation**
- Same-origin policy enforcement
- Script execution sandboxing
- Content Security Policy basics
#### Deliverables:
- Optimized JavaScript performance
- Enhanced web compatibility
- Basic security measures
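The same-origin check listed under Security Implementation comes down to comparing scheme, host, and port. A minimal sketch follows, assuming the URL has already been split into those parts; the `Origin` struct and the functions `default_port` and `same_origin` are hypothetical, and only the `http` and `https` default ports are handled.

```c
#include <assert.h>
#include <strings.h>  /* strcasecmp (POSIX) */

typedef struct {
    const char *scheme;  /* "http", "https", ... */
    const char *host;
    int port;            /* 0 means "use the scheme's default port" */
} Origin;

/* Map a scheme to its default port; returns 0 for unknown schemes. */
int default_port(const char *scheme)
{
    if (strcasecmp(scheme, "http") == 0)
        return 80;
    if (strcasecmp(scheme, "https") == 0)
        return 443;
    return 0;
}

/* Two origins match when scheme, host, and effective port are all equal.
 * Scheme and host compare case-insensitively. */
int same_origin(const Origin *a, const Origin *b)
{
    int port_a = a->port ? a->port : default_port(a->scheme);
    int port_b = b->port ? b->port : default_port(b->scheme);
    return strcasecmp(a->scheme, b->scheme) == 0
        && strcasecmp(a->host, b->host) == 0
        && port_a == port_b;
}
```

A script-initiated request would be allowed only when `same_origin()` holds between the page's origin and the request target; anything more permissive (such as CORS) is outside this sketch.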
### Phase 6: Advanced Features (Months 11-12)
**Goal**: Advanced web functionality and polish
#### Milestones:
1. **Advanced DOM Manipulation**
- Dynamic element creation/removal
- Complex event handling
- CSS selector support (basic)
2. **Modern Web APIs**
- localStorage/sessionStorage (file-based)
- setTimeout/setInterval
- JSON object
3. **Testing and Documentation**
- Comprehensive test suite
- Performance benchmarking
- User documentation and examples
#### Deliverables:
- Production-ready JavaScript support
- Comprehensive testing
- Complete documentation
---
## 7. TECHNICAL CHALLENGES AND SOLUTIONS
### Challenge 1: Memory Management
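**Problem**: QuickJS requires careful memory management in a C environment
**Solution**:
- Wrap all JavaScript operations in error-handling contexts
- Implement automatic cleanup on buffer destruction
- Let QuickJS manage JavaScript objects through its own reference counting and cycle collector: release every held `JSValue` with `JS_FreeValue()`, and avoid storing such handles in memory that only w3m's Boehm GC reclaims, since the collector will not run those releases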
### Challenge 2: DOM-Buffer Synchronization
**Problem**: Keeping DOM state synchronized with w3m's text buffer
**Solution**:
- Implement two-way mapping between DOM elements and buffer positions
- Use buffer modification hooks to update DOM
- Cache DOM state and regenerate only when necessary
### Challenge 3: Event System Integration
**Problem**: Integrating JavaScript events with w3m's input handling
**Solution**:
- Hook into existing input processing pipeline
- Queue JavaScript events for processing during idle cycles
- Implement event priority system for critical operations
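A small fixed-size queue is enough to illustrate queuing events for idle-time processing with a priority order. The sketch below is hypothetical (`EventQueue`, `queue_push`, `queue_pop`, `queue_drain` are not w3m names); it assumes event identifiers are non-negative, because `-1` is used as the "empty" return value.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 32

typedef struct {
    int event_id;  /* opaque, non-negative identifier for the pending event */
    int priority;  /* larger value = more urgent */
} QueuedEvent;

typedef struct {
    QueuedEvent items[QUEUE_MAX];
    size_t count;
} EventQueue;

/* Enqueue an event; returns 0 on success, -1 if the queue is full. */
int queue_push(EventQueue *q, int event_id, int priority)
{
    if (q->count == QUEUE_MAX)
        return -1;
    q->items[q->count].event_id = event_id;
    q->items[q->count].priority = priority;
    q->count++;
    return 0;
}

/* Remove and return the most urgent event; ties go to the oldest entry.
 * Returns -1 when the queue is empty. */
int queue_pop(EventQueue *q)
{
    if (q->count == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < q->count; i++) {
        if (q->items[i].priority > q->items[best].priority)
            best = i;
    }
    int id = q->items[best].event_id;
    /* Shift later entries down to preserve insertion order. */
    for (size_t i = best + 1; i < q->count; i++)
        q->items[i - 1] = q->items[i];
    q->count--;
    return id;
}

/* Drain at most `budget` events; intended to be called when input is idle. */
size_t queue_drain(EventQueue *q, size_t budget, void (*handle)(int event_id))
{
    size_t done = 0;
    while (done < budget && q->count > 0) {
        handle(queue_pop(q));
        done++;
    }
    return done;
}

/* Example handler that records the order in which events were processed. */
int handled[QUEUE_MAX];
size_t handled_count;
void record_event(int event_id) { handled[handled_count++] = event_id; }
```

The `budget` argument bounds how much script work happens per idle cycle, so a burst of queued events cannot starve keyboard input.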
### Challenge 4: Network Request Handling
**Problem**: Handling asynchronous network requests in a synchronous text browser
**Solution**:
- Implement pseudo-asynchronous requests using w3m's existing network layer
- Use callback queuing for request completion
- Integrate with w3m's existing download management
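Callback queuing for request completion might look like the sketch below: the blocking fetch records its result, and the completion callback runs later from the main loop, which gives scripts the ordering they expect from asynchronous APIs. All names here (`RequestTable`, `request_start`, `request_finish`, `request_poll`) are hypothetical, and the actual network call is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQUESTS 8

typedef void (*Completion)(int status, void *user_data);

typedef struct {
    int in_use;
    int done;            /* set once the blocking fetch has finished */
    int status;          /* for example an HTTP status code */
    Completion on_done;
    void *user_data;
} Request;

typedef struct {
    Request slots[MAX_REQUESTS];
} RequestTable;

/* Mark every slot as free. */
void request_table_init(RequestTable *t)
{
    for (int i = 0; i < MAX_REQUESTS; i++)
        t->slots[i].in_use = 0;
}

/* Reserve a slot for a request; returns its index, or -1 if the table is full. */
int request_start(RequestTable *t, Completion cb, void *user_data)
{
    for (int i = 0; i < MAX_REQUESTS; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (Request){ 1, 0, 0, cb, user_data };
            return i;
        }
    }
    return -1;
}

/* Record the result of a finished fetch without running the callback yet. */
void request_finish(RequestTable *t, int id, int status)
{
    assert(id >= 0 && id < MAX_REQUESTS && t->slots[id].in_use);
    t->slots[id].done = 1;
    t->slots[id].status = status;
}

/* Called from the main loop: run callbacks for finished requests.
 * Returns the number of callbacks invoked. */
int request_poll(RequestTable *t)
{
    int fired = 0;
    for (int i = 0; i < MAX_REQUESTS; i++) {
        if (t->slots[i].in_use && t->slots[i].done) {
            Request finished = t->slots[i];  /* copy, then free the slot */
            t->slots[i].in_use = 0;
            finished.on_done(finished.status, finished.user_data);
            fired++;
        }
    }
    return fired;
}

/* Example completion callback: stores the status in an int. */
void store_status(int status, void *user_data) { *(int *)user_data = status; }
```

The slot is copied and released before the callback runs, so a callback that starts a new request cannot corrupt the entry being completed.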
### Challenge 5: Cross-Platform Compatibility
**Problem**: Ensuring JavaScript works across all platforms w3m supports
**Solution**:
- Conditional compilation with feature detection
- Platform-specific implementations where necessary
- Extensive testing on target platforms
---
## 8. BUILD SYSTEM INTEGRATION
### Configure Script Modifications
```bash
# Add to configure.ac
AC_ARG_ENABLE([javascript],
    AS_HELP_STRING([--enable-javascript], [Enable JavaScript support]),
    [javascript_enabled=$enableval],
    [javascript_enabled=no])
if test "x$javascript_enabled" = "xyes"; then
    AC_DEFINE([USE_JAVASCRIPT], [1], [Define to enable JavaScript support])
    JAVASCRIPT_LIBS="-lquickjs -lm"
    JAVASCRIPT_OBJS="js/w3m_javascript.o js/w3m_dom.o js/w3m_events.o"
fi
AC_SUBST([JAVASCRIPT_LIBS])
AC_SUBST([JAVASCRIPT_OBJS])
```
### Makefile Modifications
```makefile
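# Add to Makefile.in
# configure substitutes these values; both are empty when JavaScript is
# disabled, and USE_JAVASCRIPT reaches the C code through config.h, so no
# make-level conditional is needed
JAVASCRIPT_LIBS = @JAVASCRIPT_LIBS@
JAVASCRIPT_OBJS = @JAVASCRIPT_OBJS@
LIBS += $(JAVASCRIPT_LIBS)
OBJS += $(JAVASCRIPT_OBJS)
js/%.o: js/%.c
	$(CC) $(CFLAGS) -c $< -o $@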
```
---
## 9. TESTING STRATEGY
### Unit Testing
- Test JavaScript engine integration
- Test DOM manipulation functions
- Test event handling mechanisms
- Test memory management and cleanup
### Integration Testing
- Test with real-world websites
- Test form submission and validation
- Test network request handling
- Test performance with large pages
### Compatibility Testing
- Test with modern JavaScript frameworks (basic functionality)
- Test with common web libraries
- Cross-platform compatibility verification
- Memory usage and performance benchmarking
### Test Websites for Validation
1. **Basic JavaScript**: Simple form validation, alert dialogs
2. **DOM Manipulation**: Dynamic content updates, element creation
3. **Event Handling**: Click handlers, form events
4. **Network Requests**: Simple AJAX calls, form submissions
5. **Modern Features**: JSON parsing, local storage simulation
---
## 10. RISK ASSESSMENT AND MITIGATION
### High-Risk Areas
1. **Memory Leaks**: JavaScript objects not properly cleaned up
- *Mitigation*: Comprehensive cleanup procedures, automated testing
2. **Performance Impact**: JavaScript execution slowing down text rendering
- *Mitigation*: Lazy loading, execution time limits, profiling
3. **Security Vulnerabilities**: Untrusted JavaScript execution
- *Mitigation*: Sandboxing, same-origin policy, content filtering
4. **Compatibility Issues**: Modern websites expecting full browser capabilities
- *Mitigation*: Clear documentation of supported features, graceful degradation
### Medium-Risk Areas
1. **Build Complexity**: Integration with existing build system
- *Mitigation*: Thorough testing, multiple platform validation
2. **Code Maintenance**: Additional complexity in codebase
- *Mitigation*: Modular design, comprehensive documentation
---
## 11. CONFIGURATION OPTIONS
### Runtime Configuration
```c
// New configuration options for w3m
global int EnableJavaScript init(FALSE);
global int JSExecutionTimeout init(5000);     // 5 second timeout (milliseconds)
global int JSMemoryLimit init(8*1024*1024);   // 8MB memory limit
global int JSNetworkRequests init(TRUE);      // Allow network requests
global int JSFormValidation init(TRUE);       // Enable form validation
global int JSConsoleLog init(FALSE);          // Show console.log() output
```
### User Configuration (.w3m/config)
```
# JavaScript configuration options
javascript_enabled 1
javascript_timeout 5000
javascript_memory_limit 8388608
javascript_network 1
javascript_forms 1
javascript_console_log 0
```
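For illustration, the "key value" lines above could be applied to a settings struct as follows. This is a standalone sketch, not w3m's actual option table in its rc handling; `JSConfig` and `js_config_apply_line` are hypothetical, and only three of the listed keys are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    int enabled;
    long timeout_ms;
    long memory_limit;
} JSConfig;

/* Apply one "key value" line to cfg.
 * Returns 1 if a known key was applied, 0 if the line was skipped
 * (comment, blank, or unknown key), and -1 if it could not be parsed. */
int js_config_apply_line(JSConfig *cfg, const char *line)
{
    char key[64];
    long value;

    assert(cfg != NULL && line != NULL);
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;
    if (sscanf(line, "%63s %ld", key, &value) != 2)
        return -1;
    if (strcmp(key, "javascript_enabled") == 0)
        cfg->enabled = (value != 0);
    else if (strcmp(key, "javascript_timeout") == 0)
        cfg->timeout_ms = value;
    else if (strcmp(key, "javascript_memory_limit") == 0)
        cfg->memory_limit = value;
    else
        return 0;
    return 1;
}
```

Unknown keys are skipped rather than rejected, so a config file written for a newer build would still load on an older one.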
---
## 12. USER INTERFACE CONSIDERATIONS
### JavaScript Console (Optional)
- Simple text-based console for debugging
- `console.log()` output to status line or separate buffer
- Error reporting and debugging information
### Status Indicators
- JavaScript execution indicator in status line
- Error notifications for script failures
- Performance warnings for slow scripts
### User Controls
- Ability to disable JavaScript per-site
- Manual script execution control
- Performance monitoring display
---
## 13. DOCUMENTATION REQUIREMENTS
### Developer Documentation
1. **JavaScript Integration Guide**: How to extend JavaScript functionality
2. **DOM API Reference**: Complete API documentation for supported features
3. **Event System Documentation**: Event handling and custom events
4. **Build Instructions**: Compilation with JavaScript support
### User Documentation
1. **JavaScript Support Overview**: What works and what doesn't
2. **Configuration Guide**: How to configure JavaScript options
3. **Troubleshooting Guide**: Common issues and solutions
4. **Performance Tips**: Optimizing JavaScript performance
---
## 14. CONCLUSION AND FEASIBILITY ASSESSMENT
### Feasibility: **CHALLENGING BUT ACHIEVABLE**
Adding JavaScript support to w3m is technically feasible but represents a significant undertaking:
#### Pros:
- ✅ **Existing Architecture**: w3m's buffer and event system provide good integration points
- ✅ **Suitable Engine**: QuickJS offers the right balance of features and size
- ✅ **Incremental Development**: Can be implemented in phases with useful milestones
- ✅ **Community Value**: Would significantly enhance w3m's utility for modern web
#### Cons:
- ⚠️ **Complexity**: Adds significant complexity to a historically simple codebase
- ⚠️ **Performance**: May impact w3m's speed and memory efficiency
- ⚠️ **Maintenance**: Requires ongoing maintenance and security updates
- ⚠️ **Compatibility Gap**: Will never achieve full modern browser compatibility
### Recommended Approach:
1. **Start with Phase 1**: Basic infrastructure and script recognition
2. **Validate Early**: Test with real websites to ensure practical value
3. **Community Feedback**: Engage w3m community throughout development
4. **Performance Focus**: Maintain w3m's speed and efficiency as top priority
### Success Metrics:
- Popular websites work with basic JavaScript functionality
- No significant performance regression for non-JavaScript pages
- Positive community reception and adoption
- Maintainable codebase that doesn't compromise w3m's core values
### Estimated Timeline: **12 months** for full implementation
### Estimated Effort: **2000-3000 hours** of development time
### Code Size Impact: **+30-50% codebase size** (with QuickJS integration)
**FINAL ASSESSMENT**: This is an ambitious but worthwhile project that could significantly enhance w3m's relevance for modern web browsing while maintaining its text-based, efficient character. Success depends on careful implementation that preserves w3m's core strengths while adding essential modern web functionality.