2025-08-16 19:43:11 -04:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

w3m is a text-based web browser and pager for terminal environments. It's a C codebase originally developed by Akinori Ito and currently maintained as a Debian fork. The browser can display HTML documents in text mode, follow links, handle forms, and display images using external viewers.

Build System

This project uses autotools (autoconf/automake) with a traditional Makefile-based build system:

# Configure the build (generates Makefile from Makefile.in)
./configure

# Build the project
make

# Install (requires root privileges)
make install

# Clean build artifacts
make clean

Dependencies

  • GC library (version 6.1 or later): Boehm garbage collector for memory management
  • Standard C development tools: gcc/clang, make, autoconf
  • Optional: Various image libraries for image display support

Testing

Basic regression tests are available:

# Run the test suite
cd tests
./run_tests

The test suite compares w3m HTML rendering output against expected results for various HTML test cases.

Code Architecture

Core Components

  • main.c: Entry point and main event loop
  • fm.h: Central header file containing core data structures and definitions
  • buffer.c: Buffer management for document content and display
  • display.c: Terminal display and rendering logic with color support
  • html.c: HTML parsing and tag processing
  • file.c: File and URL handling, protocol support (HTTP, FTP, etc.)
  • form.c: HTML form processing and interaction
  • table.c: HTML table rendering and layout
  • frame.c: HTML frame support

Key Data Structures

  • Buffer: Central data structure for document content, defined in fm.h
  • TabBuffer: Tab management for multiple documents
  • BufferPos: Position tracking within documents

Character Encoding Support

The libwc/ directory contains comprehensive character encoding support:

  • Multi-byte character handling (UTF-8, EUC-JP, Big5, etc.)
  • Character set detection and conversion
  • Wide character support for international text
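As a rough illustration of what multi-byte handling involves, the sketch below classifies a UTF-8 lead byte and counts code points in a string. It is not taken from libwc/; the function names `utf8_seq_len` and `utf8_count` are hypothetical, and the real library covers many more encodings and edge cases.

```c
#include <stddef.h>

/* Hypothetical helper, not libwc code: number of bytes in the UTF-8
 * sequence that starts with `lead`, or 0 if `lead` cannot start one
 * (continuation byte, overlong lead 0xC0/0xC1, or value above 0xF4). */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;                    /* 0xxxxxxx: ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;   /* 110xxxxx */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;   /* 1110xxxx */
    if (lead >= 0xF0 && lead <= 0xF4) return 4;   /* 11110xxx */
    return 0;
}

/* Counts code points in a NUL-terminated string.
 * Returns (size_t)-1 if the input is malformed or truncated. */
static size_t utf8_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        size_t len = utf8_seq_len(*p);
        if (len == 0) return (size_t)-1;
        /* Every following byte must look like 10xxxxxx. A premature NUL
         * fails this test, so the loop never reads past the terminator. */
        for (size_t i = 1; i < len; i++)
            if ((p[i] & 0xC0) != 0x80) return (size_t)-1;
        p += len;
        n++;
    }
    return n;
}
```

This sketch checks only sequence shape; it does not reject surrogates or every overlong three- and four-byte form, which a production decoder would need to do.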

Image Support

The w3mimg/ directory provides image display capabilities:

  • fb/: Framebuffer image display
  • x11/: X11 image display
  • win/: Windows image display

Internationalization

  • po/: Translation files for multiple languages (German, Japanese, Chinese, etc.)
  • Multi-language documentation in doc/, doc-jp/, doc-de/

Configuration

  • Configuration is handled through autoconf-generated config.h
  • Runtime configuration via rc.c and various RC files
  • Menu and keymap configurations in doc/ directories

Development Notes

  • The codebase uses the Boehm GC for memory management
  • Heavy use of custom string handling via Str.c/Str.h
  • Terminal capabilities handled through terms.c
  • Mouse support available through GPM and other terminal mouse protocols
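To make the string-handling note concrete, here is a minimal sketch of a length-tracked, growable string in the spirit of Str.c. The type and function names (`DemoStr`, `demo_str_new`, `demo_str_cat`, `demo_str_free`) are hypothetical, and the sketch uses `malloc`/`realloc` so that it stays self-contained, whereas w3m allocates through the Boehm GC.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative only: names and layout are assumptions, not w3m's Str. */
typedef struct {
    char  *ptr;     /* always NUL-terminated */
    size_t length;  /* bytes in use, excluding the NUL */
    size_t area;    /* bytes allocated */
} DemoStr;

static DemoStr *demo_str_new(void)
{
    DemoStr *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->area = 16;
    s->ptr = malloc(s->area);
    if (!s->ptr) { free(s); return NULL; }
    s->ptr[0] = '\0';
    s->length = 0;
    return s;
}

/* Appends n bytes from src. Returns 0 on success, -1 on allocation
 * failure or if the new size would overflow size_t. */
static int demo_str_cat(DemoStr *s, const char *src, size_t n)
{
    if (n > SIZE_MAX - s->length - 1) return -1;  /* size overflow guard */
    size_t need = s->length + n + 1;

    if (need > s->area) {
        size_t new_area = s->area;
        while (new_area < need) {
            if (new_area > SIZE_MAX / 2) { new_area = need; break; }
            new_area *= 2;                         /* geometric growth */
        }
        char *p = realloc(s->ptr, new_area);
        if (!p) return -1;
        s->ptr = p;
        s->area = new_area;
    }
    memcpy(s->ptr + s->length, src, n);
    s->length += n;
    s->ptr[s->length] = '\0';
    return 0;
}

static void demo_str_free(DemoStr *s)
{
    if (s) { free(s->ptr); free(s); }
}
```

With a garbage collector the explicit `demo_str_free` step would not be needed; the point of the sketch is that every append checks capacity before writing.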

Screen Reader Navigation Features

Screen reader-style navigation commands have been successfully implemented to improve accessibility:

New Navigation Commands:

  • d - Move to next heading (NEXT_HEADING)
  • e - Move to previous heading (PREV_HEADING)
  • f - Move to next form element (NEXT_FORM)
  • p - Move to previous form element (PREV_FORM)

Implementation Details:

  • Heading navigation: Uses heuristic text analysis to distinguish actual headings from paragraphs, links, and other non-heading content
  • Form navigation: Leverages existing formitem anchor system for reliable form element traversal
  • Functions: Implemented in main.c as _nextHeading(), _prevHeading(), _nextForm(), _prevForm()
  • Key bindings: Integrated into hardcoded keymap array in keybind.c for reliable key processing
  • Status: WORKING - Heading navigation fully functional, form navigation ready for testing
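The heading heuristic itself is not spelled out here, so the following is only a guess at what such a filter could look like. It treats a line as heading-like when it is short, starts with an uppercase letter or digit, and does not end in sentence punctuation. The names `demo_looks_like_heading` and `demo_next_heading` and the 60-character cutoff are assumptions, not the rules used by _nextHeading().

```c
#include <ctype.h>
#include <string.h>

#define DEMO_MAX_HEADING_LEN 60   /* assumed cutoff, not from w3m */

/* Hypothetical heuristic: returns 1 if the line looks like a heading. */
static int demo_looks_like_heading(const char *line)
{
    /* Ignore leading and trailing whitespace. */
    while (*line && isspace((unsigned char)*line)) line++;
    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1])) len--;

    if (len == 0 || len > DEMO_MAX_HEADING_LEN) return 0;

    /* Headings usually start with a capital letter or a section number. */
    if (!isupper((unsigned char)line[0]) && !isdigit((unsigned char)line[0]))
        return 0;

    /* Running text tends to end in punctuation; headings usually do not. */
    char last = line[len - 1];
    if (last == '.' || last == ',' || last == ';' || last == ':') return 0;

    return 1;
}

/* Returns the index of the first heading-like line after `cur`,
 * or -1 if there is none. Pass cur = -1 to search from the top. */
static int demo_next_heading(const char *const *lines, int n, int cur)
{
    for (int i = cur + 1; i < n; i++)
        if (demo_looks_like_heading(lines[i])) return i;
    return -1;
}
```

A text-only heuristic like this will misfire on short link labels and list items, which is presumably why the real implementation applies additional filtering.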

Build Notes

Modernization Status

The codebase has been partially modernized to compile with modern GCC versions:

FIXED:

  • Signal handler type compatibility issues in main.c, terms.c, istream.c
  • Function pointer type issues in parsetagx.c (function dispatch table)
  • Input keymap function call issues in linein.c
  • GPM mouse library compatibility (Gpm_Wgetch vs Gpm_Getch)

⚠️ REMAINING ISSUES:

  • Function pointer compatibility in libwc/ (character encoding library)
  • Various other function pointer signature mismatches throughout codebase

BUILD COMMAND:

make WARNINGS="-Wall -Wnull-dereference -Wno-incompatible-pointer-types -Wno-pointer-sign"

The core w3m functionality, including the new screen reader navigation, compiles successfully. The remaining issues are in the character encoding subsystem and would require systematic function pointer signature updates throughout libwc/.
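The usual shape of a function pointer signature fix is to give every entry in a dispatch table one exact prototype, so that no cast or warning suppression is needed. The sketch below shows that pattern in isolation; the handler type, table, and names (`demo_handler`, `demo_table`, `demo_dispatch`) are hypothetical and do not mirror the actual tables in parsetagx.c or libwc/.

```c
#include <stddef.h>
#include <string.h>

/* One uniform prototype for every handler in the table. Handlers that do
 * not need an argument still accept it and ignore it. */
typedef int (*demo_handler)(void *ctx, const char *arg);

struct demo_stats { int opened; int closed; };

static int demo_on_open(void *ctx, const char *arg)
{
    (void)arg;                                  /* unused by this handler */
    ((struct demo_stats *)ctx)->opened++;
    return 0;
}

static int demo_on_close(void *ctx, const char *arg)
{
    (void)arg;
    ((struct demo_stats *)ctx)->closed++;
    return 0;
}

static const struct demo_entry {
    const char  *name;
    demo_handler fn;
} demo_table[] = {
    { "open",  demo_on_open  },
    { "close", demo_on_close },
};

/* Looks up `name` and calls its handler. Returns -1 if no entry matches. */
static int demo_dispatch(const char *name, void *ctx, const char *arg)
{
    for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
        if (strcmp(demo_table[i].name, name) == 0)
            return demo_table[i].fn(ctx, arg);
    return -1;
}
```

Once every handler matches the typedef exactly, the table compiles without `-Wno-incompatible-pointer-types`, which matters because recent GCC releases treat that diagnostic as an error by default.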

Security Considerations

Recent security fixes have addressed buffer overflow vulnerabilities (CVE-2023-38252, CVE-2023-38253). When modifying string handling or buffer operations, pay careful attention to bounds checking.
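As a reminder of what careful bounds checking looks like in practice, here is a minimal sketch of a copy routine that never writes past the destination and always NUL-terminates. The function name `demo_copy_bounded` is hypothetical; it is not part of w3m.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst without exceeding dst_size.
 * Returns 0 if all of src fit, -1 if it was truncated or an argument
 * was invalid. On truncation dst still holds a valid, shorter string. */
static int demo_copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0) return -1;

    size_t n = strlen(src);
    if (n >= dst_size) {
        memcpy(dst, src, dst_size - 1);   /* leave room for the NUL */
        dst[dst_size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, n + 1);              /* includes the NUL */
    return 0;
}
```

Returning a distinct status on truncation lets the caller decide whether a shortened value is acceptable rather than silently continuing with it.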

JavaScript Integration Project

⚠️ CRITICAL SESSION REMINDERS

  • UPDATE THIS CLAUDE.md FILE FREQUENTLY during each session with progress, problems, and discoveries
  • SECURITY FIRST: Be extra security-conscious - we're adding script execution to a browser
  • TEST THOROUGHLY: Each feature must be tested before moving to next milestone
  • BRANCH: Work in javascript-integration branch, merge to master when stable

Project Status: 🚀 ACTIVE DEVELOPMENT

Current Branch: javascript-integration
Current Phase: Phase 1 - Foundation (completed 2025-01-16; Phase 2 planned)
Started: 2025-01-16

Phase Progress Tracking

Phase 1: Foundation (Months 1-2) - COMPLETED

Goal: Basic JavaScript execution infrastructure

Milestones:

  • QuickJS Integration: Add QuickJS source files to w3m build system
  • Buffer Extension: Extend Buffer structure with JavaScript state
  • Script Tag Recognition: Modify HTML parser to recognize <script> tags

Status: COMPLETED - All Phase 1 objectives achieved!
Completion Date: 2025-01-16
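For the script tag recognition milestone, the core check is a case-insensitive match on the tag name followed by a delimiter. The sketch below shows one way to write that check in isolation; `demo_is_script_open` is a hypothetical name, and w3m's parser identifies tags through its own tag tables rather than a standalone function like this.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `p` points at the start of an opening
 * <script> tag in any letter case, 0 otherwise. */
static int demo_is_script_open(const char *p)
{
    static const char name[] = "script";

    if (*p != '<') return 0;
    p++;

    /* Compare the tag name one byte at a time. A NUL in the input never
     * matches a letter, so the loop stops before reading past the end. */
    for (size_t i = 0; name[i] != '\0'; i++)
        if (tolower((unsigned char)p[i]) != name[i]) return 0;

    /* The name must be followed by '>', '/', or whitespace, so that
     * tags such as <scripted> are not mistaken for <script>. */
    char next = p[sizeof name - 1];
    return next == '>' || next == '/' || isspace((unsigned char)next);
}
```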

Phase 2: DOM Foundation (Months 3-4) - PLANNED

  • DOM Structure Creation
  • Document Object Implementation
  • Element Property Access

Phase 3: Event System (Months 5-6) - PLANNED

  • Event Listener Registration
  • Form Event Integration
  • Click Event Integration

Phase 4: Form and Network Integration (Months 7-8) - PLANNED

  • Advanced Form Support
  • Network Request Support
  • URL and Navigation Control

Phase 5: Performance and Compatibility (Months 9-10) - PLANNED

  • Performance Optimization
  • Enhanced Compatibility
  • Security Implementation

Phase 6: Advanced Features (Months 11-12) - PLANNED

  • Advanced DOM Manipulation
  • Modern Web APIs
  • Testing and Documentation

Implementation Notes

Architecture Decisions:

  • JavaScript Engine: QuickJS (367kB, ES2023, MIT license)
  • Integration Points: Buffer system, HTML parsing, event handling
  • Build System: Conditional compilation with --enable-javascript

Security Requirements:

  • ⚠️ NEVER execute untrusted code without sandboxing
  • ⚠️ ALWAYS validate JavaScript input and DOM manipulation
  • ⚠️ IMPLEMENT same-origin policy enforcement
  • ⚠️ LIMIT execution time and memory usage
  • ⚠️ SANITIZE all user input before JavaScript processing
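Same-origin enforcement comes down to comparing scheme, host, and port of two URLs. The following is a minimal sketch of that comparison under simplifying assumptions: it rejects URLs with userinfo or bracketed IPv6 hosts instead of parsing them, and it knows default ports only for http and https. The names `demo_origin`, `demo_parse_origin`, and `demo_same_origin` are hypothetical; w3m has its own URL parsing that a real implementation would reuse.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct demo_origin {
    char scheme[16];
    char host[256];
    int  port;       /* 0 means "unknown default" */
};

/* Parses scheme://host[:port] from url. Returns 0 on success, -1 if the
 * URL is malformed or uses a form this sketch does not support. */
static int demo_parse_origin(const char *url, struct demo_origin *o)
{
    const char *sep = strstr(url, "://");
    if (!sep) return -1;

    size_t slen = (size_t)(sep - url);
    if (slen == 0 || slen >= sizeof o->scheme) return -1;
    for (size_t i = 0; i < slen; i++)
        o->scheme[i] = (char)tolower((unsigned char)url[i]);
    o->scheme[slen] = '\0';

    const char *host = sep + 3;
    size_t hlen = strcspn(host, ":/?#");
    if (hlen == 0 || hlen >= sizeof o->host) return -1;
    if (host[0] == '[' || memchr(host, '@', hlen)) return -1; /* unsupported */
    for (size_t i = 0; i < hlen; i++)
        o->host[i] = (char)tolower((unsigned char)host[i]);
    o->host[hlen] = '\0';

    if (host[hlen] == ':') {
        const char *digits = host + hlen + 1;
        char *end;
        if (!isdigit((unsigned char)*digits)) return -1;
        long p = strtol(digits, &end, 10);
        if (p < 1 || p > 65535) return -1;
        if (*end != '\0' && *end != '/' && *end != '?' && *end != '#')
            return -1;
        o->port = (int)p;
    } else if (strcmp(o->scheme, "http") == 0) {
        o->port = 80;
    } else if (strcmp(o->scheme, "https") == 0) {
        o->port = 443;
    } else {
        o->port = 0;
    }
    return 0;
}

/* Returns 1 only if both URLs parse and share scheme, host, and port. */
static int demo_same_origin(const char *a, const char *b)
{
    struct demo_origin oa, ob;
    if (demo_parse_origin(a, &oa) != 0 || demo_parse_origin(b, &ob) != 0)
        return 0;
    return strcmp(oa.scheme, ob.scheme) == 0
        && strcmp(oa.host, ob.host) == 0
        && oa.port == ob.port;
}
```

Failing closed on anything the parser does not understand is deliberate: a URL that cannot be classified is treated as a different origin.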

Key Files:

  • javascript.txt: Comprehensive implementation roadmap (12-month plan)
  • fm.h: Core data structures (Buffer, Line, etc.)
  • file.c: HTML parsing and loading
  • main.c: Event handling and input processing

Session Progress Log

Session 2025-01-16:

  • Completed comprehensive analysis of w3m architecture
  • Researched JavaScript engines, selected QuickJS
  • Created detailed 12-month implementation roadmap
  • Created javascript-integration branch
  • Updated CLAUDE.md with project tracking
  • COMPLETED Phase 1 - Full JavaScript foundation implemented!

Phase 1 Implementation Details:

  • Integrated QuickJS 2024-01-13 into w3m build system
  • Added --enable-javascript configure option with autotools
  • Extended Buffer structure with JavaScript state fields
  • Implemented JavaScript context management (w3m_javascript.c)
  • Created DOM foundation framework (w3m_dom.c)
  • Built event system infrastructure (w3m_events.c)
  • Added HTML <script> tag recognition in parser
  • Resolved complex C type compatibility issues
  • Fixed QuickJS compilation with CONFIG_VERSION
  • Successfully tested - w3m compiles and runs with JavaScript support!

Troubleshooting & Discoveries

Build System Notes:

  • w3m uses autotools (autoconf/automake)
  • Configure script needs --enable-javascript option
  • QuickJS requires C99 compiler support
  • Memory management integration with Boehm GC

Code Integration Challenges:

  • Buffer structure extension without breaking ABI
  • JavaScript context lifecycle management
  • DOM-to-Buffer synchronization complexity
  • Event system integration with existing input handling

Performance Considerations:

  • JavaScript execution must not slow down text rendering
  • Memory usage must remain reasonable for terminal browser
  • Script execution timeouts essential for responsiveness
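A script timeout can be sketched as a time budget plus a callback that the engine polls during execution. QuickJS exposes `JS_SetInterruptHandler()` for this purpose, and `JS_SetMemoryLimit()` for the memory side; the block below does not link against QuickJS, so the runtime pointer is a plain `void *` and the names `demo_budget`, `demo_budget_start`, `demo_budget_exceeded`, and `demo_interrupt_check` are hypothetical.

```c
#include <time.h>

/* Time budget for one script run, measured with clock(), which reports
 * processor time used by the program rather than wall-clock time. */
struct demo_budget {
    clock_t start;
    clock_t limit;   /* allowed ticks after start */
};

static void demo_budget_start(struct demo_budget *b, double seconds)
{
    b->start = clock();
    b->limit = (clock_t)(seconds * (double)CLOCKS_PER_SEC);
}

/* Pure check, separated out so it can be tested with fixed tick values. */
static int demo_budget_exceeded(const struct demo_budget *b, clock_t now)
{
    return (now - b->start) > b->limit;
}

/* Callback in the general shape of an engine interrupt handler: it
 * returns nonzero to request that the running script be aborted.
 * The `runtime` argument is unused in this sketch. */
static int demo_interrupt_check(void *runtime, void *opaque)
{
    (void)runtime;
    return demo_budget_exceeded((const struct demo_budget *)opaque, clock());
}
```

In an integration, the budget would be started just before evaluating a script and the callback registered once per runtime, so a runaway loop in page script cannot freeze the terminal.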

Development Commands

# Switch to JavaScript development branch
git checkout javascript-integration

# Build with warnings suitable for development
make WARNINGS="-Wall -Wnull-dereference -Wno-incompatible-pointer-types -Wno-pointer-sign"

# Build with JavaScript support (uses the --enable-javascript option added in Phase 1)
./configure --enable-javascript
make

# Run tests
cd tests && ./run_tests

🚨 REMEMBER: UPDATE THIS FILE EVERY SESSION! 🚨