214 lines
9.3 KiB
HTML
214 lines
9.3 KiB
HTML
<html>
|
|
<head>
|
|
<title>History of w3m</title>
|
|
</head>
|
|
<body>
|
|
<h1>History of w3m</h1>
|
|
<div align=right>
|
|
1999/2/18<br>
|
|
1999/3/8 revised<br>
|
|
1999/6/11 translated into English<br>
|
|
Akinori Ito<br>
|
|
aito@fw.ipsj.or.jp
|
|
</div>
|
|
<h2>Introduction</h2>
|
|
W3m is a text-based pager and WWW browser.
|
|
It is similar application to the famous text-based
|
|
browser <a href="http://www.lynx.browser.org/">Lynx</a>.
|
|
However, w3m has several advantages against Lynx. For example,
|
|
<UL>
|
|
<LI>W3m can render tables.
|
|
<LI>W3m can render frame (by converting frame into table).
|
|
<LI>As w3m is a pager, it can read document from standard input.
|
|
(I heard Lynx also can display standard-input-given document, like this:
|
|
<pre>
|
|
lynx /dev/fd/0 > file
|
|
</pre>
|
|
Hmm, it works on Linux. )
|
|
<LI>W3m is small. Its stripped binary for Sparc (compiled with
|
|
gcc -O2, version beta-990217) is only 260kbyte, while binary size
|
|
of Lynx is beyond 1.8Mbyte. (Actually, lynx it 800K on my i386 system, w3m is 200K + libgc.)
|
|
</UL>
|
|
It is true that Lynx is an excellent browser, who have many
|
|
features w3m doesn't have. For example,
|
|
<UL>
|
|
<LI>Lynx can handle cookies.
|
|
<LI>Lynx has many options.
|
|
<LI>Lynx is multilingual. (W3m is Japanese-English bilingual)
|
|
</UL>
|
|
etc. It is also a great advantage that Lynx has a lot of
|
|
documentation.
|
|
<P>
|
|
<b>I don't intend w3m to be a substitute of any other browsers,
|
|
including Netscape and Lynx.</b> Why did I wrote w3m?
|
|
Because I felt inconvenient with conventional browsers
|
|
to `take a look' at web pages.
|
|
I am browsing web pages in LAN environment. When I want to take
|
|
a glance at a web page, I don't want to wait to start up Netscape.
|
|
Lynx also takes a few seconds to start up (you can get lynx startup time to almost zero when you rm /etc/mailcap). On the other hand,
|
|
w3m starts immediately with little load to the host machine.
|
|
After looking at the information using w3m, I use other browser
|
|
if I want to read the the page in detail. As for me, however,
|
|
w3m is enough to read most of web pages.
|
|
|
|
<h2>The birth of w3m</h2>
|
|
<P>
|
|
w3m was derived from a pager named `fm'. Fm was written before
|
|
1991 (I don't remember the exact date) when WWW was not popular.
|
|
At that time, the word `browser' meant a file browser like
|
|
`more' or `less'.
|
|
<P>
|
|
I wrote fm to debug a program for my research. To trace the status
|
|
of the program, it dumped megabytes of values of variables into a file,
|
|
and I debugged it by checking the dumped file. The program dumped
|
|
information at a certain time in one line, which made the dumped line
|
|
several hundred characters long. When I looked the file using `more' or
|
|
`less', one line was folded into several lines and it was very hard
|
|
to read it. Therefore, I wrote fm, which didn't fold a line. Fm displayed
|
|
one logical line as one physical line. When seeing the hidden
|
|
part of a line, fm shifted entire screen. As I used 80x24 terminal at that
|
|
time, fm was very useful for the debugging.
|
|
<P>
|
|
Several years later, I got to know WWW and began to use it.
|
|
I used XMosaic and Chimera. I liked Chimera because it was light.
|
|
As I was interested in the mechanism of WWW, I learned HTML and
|
|
HTTP, and I felt it simpler than I expected. The earlier version
|
|
of HTTP was very similar to Gopher protocol. HTML 2.0 was
|
|
simple enough to render. All I have to do seemed to be line folding
|
|
and itemized display. Then I made a little modification to fm
|
|
and made a web browser. It was the first version of w3m.
|
|
The name `w3m' was an abbreviation of Japanese phrase `WWW wo miru',
|
|
which means `see WWW'. It was an inheritance from `fm', which
|
|
was an abbreviation of `File wo miru'. The first version of w3m
|
|
was released at the beginning of 1995.
|
|
|
|
<h2>Death and rebirth of w3m</h2>
|
|
<p>
|
|
I had used w3m as a pager to read files, E-mails and online manuals.
|
|
It was a substitute of less. Sometimes I used w3m as a web browser,
|
|
but there were many pages w3m couldn't display correctly, most of
|
|
which used table for page layout. Once I tried to implement table
|
|
renderer, but I gave up because it seemed to be too difficult for me.
|
|
<P>
|
|
It was 1998 when I tried to modify w3m again. There were two reasons.
|
|
The first is that I had some time to do it. I stayed Boston University
|
|
as a visiting researcher at that time. The second reason is that
|
|
I wanted to use table in my personal web page. I had written research
|
|
log using HTML, and I wanted to write a table in it. At first I used
|
|
<pre>..</pre> to describe table, but it was not cool at all.
|
|
One day I used <table> tag, which made me to use Netscape to
|
|
read the research log. Then I decided to implement a table renderer
|
|
into w3m.
|
|
<P>
|
|
I didn't intend to write a perfect table renderer because tables
|
|
I used was not very complicated. However, incomplete table rendering
|
|
made the display of table-layout pages horrible. I realized that
|
|
it required almost-perfect table renderer
|
|
to do well both in `rendering (real) table' and `fine display of
|
|
table-layout page.' It was a thorn path.
|
|
<P>
|
|
After taking several months, I finished `fair' table renderer.
|
|
Then I implemented form into w3m. Finally, w3m was reborn as a
|
|
practical web browser.
|
|
|
|
<h2>Table rendering algorithm in w3m</h2>
|
|
|
|
HTML table rendering is difficult. Tabular environment
|
|
of LaTeX is not very difficult, which makes the width of a column
|
|
either a specified value or the maximum width to put items into it.
|
|
On the other hand, HTML table renderer has to decide
|
|
the width of a column so that the entire table can fit into the
|
|
display appropriately, and fold the contents of the table according
|
|
to the column width. Inappropriate column width decision makes
|
|
the table ugly. Moreover, table can be nested, which makes the algorithm
|
|
more complicated.
|
|
|
|
<OL>
|
|
<LI>First, calculate the maximum and minimum width of each column.
|
|
The maximum width is the width required to display the column
|
|
without folding the contents. Generally, it is the length of
|
|
paragraph delimited by <BR> or <P>.
|
|
The minimum width is the lower limit to display the contents.
|
|
If the column contains the word `internationalization', the minimum
|
|
width will be 20. If the column contains
|
|
<pre>..</pre>, the maximum width of the preformatted
|
|
text will be the minimum width of the column.
|
|
|
|
<LI>If the width of the column is specified by WIDTH attribute,
|
|
fix the column width using that value. If the specified width is
|
|
smaller than the minimum width of the column, fix the column width
|
|
to the minimum width.
|
|
|
|
<LI>Calculate the sum of the maximum width (or fixed width) of
|
|
each column and check if the sum exceeds the screen width.
|
|
If it is smaller than screen width, these values are used for
|
|
width of each column.
|
|
|
|
<LI>If the sum is larger than the screen width, determine the widths
|
|
of each column according to the following steps.
|
|
<OL>
|
|
<LI>Let W be the screen width subtracted by the sum of widths of
|
|
fixed-width columns.
|
|
<LI>Distribute W into the columns whose width are not decided,
|
|
in proportion to the logarithm of the maximum width of each column.
|
|
<li>If the distributed width of a column is smaller than the minimum width,
|
|
then fix the width of the column to the minimum width, and
|
|
do the distribution again.
|
|
</OL>
|
|
</OL>
|
|
|
|
In this process, distributed width is proportion to logarithm of
|
|
maximum width, but I am not sure that this heuristic is the best.
|
|
It can be, for example, square root of the maximum width.
|
|
<P>
|
|
The algorithm above assumes that the screen width is known.
|
|
But it is not true for nested table. According the algorithm above,
|
|
the column width of the outer table have to be known to render
|
|
the inner table, while the total width of the inner table have to
|
|
be known to determine the column width of the outer table.
|
|
If WIDTH attribute exists there are no problems. Otherwise, w3m
|
|
assumes that the inner table is 0.8 times as wide as the outer
|
|
table. It works fine, but if there are two tables side by side in an outer
|
|
table, the width of the outer table always exceeds the screen width.
|
|
To render this kind of table correctly, one have to render the table once,
|
|
check the width of outmost table, and then render the entire table again.
|
|
Netscape might employ this kind of algorithm.
|
|
|
|
<h2>Libraries</h2>
|
|
|
|
w3m uses
|
|
<a href="http://reality.sgi.com/boehm/gc.html">Boehm GC</a>
|
|
library. This library was written by H. Boehm and A. Demers.
|
|
I could distribute w3m without this library because one can
|
|
get the library separately, but I decided to contain it in the
|
|
w3m distribution for the convenience of an installer.
|
|
<P>
|
|
# Boehm GC library is no longer included into w3m packages
|
|
# after w3m-0.4.2.
|
|
<P>
|
|
W3m doesn't use libwww.
|
|
<P>
|
|
Boehm GC is a garbage collector for C and C++. I began to use this
|
|
library when I implemented table, and it was great. I couldn't
|
|
implement table and form without this library.
|
|
<P>
|
|
Older version than beta-990304 used
|
|
<a href="http://home.cern.ch/~orel/libftp/libftp/libftp.html">LIBFTP</a>
|
|
because I felt tired of writing codes to handle FTP protocol.
|
|
But I rewrote the FTP code by myself to make w3m completely free.
|
|
It made w3m slightly smaller.
|
|
<P>
|
|
By the way, w3m doesn't use UNIX standard regexp library and curses library.
|
|
It is because I want to use Japanese. When I wrote fm, there were
|
|
no free regexp/curses libraries that can treat Japanese. Now both libraries
|
|
are available and they looks faster than w3m code.
|
|
|
|
<h2>Future work</h2>
|
|
|
|
...Nothing. As w3m's virtues are its small size and rendering speed,
|
|
adding more features might lose these advantages. On the other hand,
|
|
w3m is still known to have many bugs, and I will continue fixing them.
|
|
|
|
</body>
|
|
</html>
|