Adding upstream version 0.5.1
doc/STORY.html
@@ -0,0 +1,209 @@
<html>
<head>
<title>History of w3m</title>
</head>
<body>
<h1>History of w3m</h1>
<div align=right>
1999/2/18<br>
1999/3/8 revised<br>
1999/6/11 translated into English<br>
Akinori Ito<br>
aito@fw.ipsj.or.jp
</div>
<h2>Introduction</h2>
W3m is a text-based pager and WWW browser.
It is similar to the famous text-based
browser <a href="http://www.lynx.browser.org/">Lynx</a>.
However, w3m has several advantages over Lynx. For example,
<UL>
<LI>W3m can render tables.
<LI>W3m can render frames (by converting them into tables).
<LI>As w3m is a pager, it can read a document from standard input.
(I have heard that Lynx can also display a document given on standard input, like this:
<pre>
lynx /dev/fd/0 &gt; file
</pre>
Hmm, it works on Linux.)
<LI>W3m is small. Its stripped binary for Sparc (compiled with
gcc -O2, version beta-990217) is only 260 Kbytes, while the binary
of Lynx exceeds 1.8 Mbytes. (Actually, lynx is 800K on my i386 system, and w3m is 200K + libgc.)
</UL>
It is true that Lynx is an excellent browser, which has many
features w3m doesn't have. For example,
<UL>
<LI>Lynx can handle cookies.
<LI>Lynx has many options.
<LI>Lynx is multilingual. (W3m is Japanese-English bilingual.)
</UL>
etc. It is also a great advantage that Lynx has a lot of
documentation.
<P>
<b>I don't intend w3m to be a substitute for any other browser,
including Netscape and Lynx.</b> Why did I write w3m?
Because I found conventional browsers inconvenient
for `taking a look' at web pages.
I browse web pages in a LAN environment. When I want to take
a glance at a web page, I don't want to wait for Netscape to start up.
Lynx also takes a few seconds to start up (you can get lynx startup time down to almost zero if you rm /etc/mailcap). On the other hand,
w3m starts immediately with little load on the host machine.
After looking at the information using w3m, I use another browser
if I want to read the page in detail. As for me, however,
w3m is enough to read most web pages.

<h2>The birth of w3m</h2>
<P>
w3m was derived from a pager named `fm'. Fm was written before
1991 (I don't remember the exact date), when the WWW was not popular.
At that time, the word `browser' meant a file browser like
`more' or `less'.
<P>
I wrote fm to debug a program for my research. To trace the status
of the program, it dumped megabytes of variable values into a file,
and I debugged it by checking the dumped file. The program dumped
the information for each point in time on one line, which made each dumped line
several hundred characters long. When I looked at the file using `more' or
`less', one line was folded into several lines and it was very hard
to read. Therefore, I wrote fm, which didn't fold lines. Fm displayed
one logical line as one physical line. To show the hidden
part of a line, fm shifted the entire screen. As I used an 80x24 terminal at that
time, fm was very useful for debugging.
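<P>
The no-folding display described above can be sketched roughly as follows.
This is only a minimal Python illustration of the idea, not fm's actual code;
the function name <code>visible_window</code> and its parameters are hypothetical.

```python
def visible_window(lines, top, left, rows=24, cols=80):
    """Return the part of `lines` visible on a rows x cols terminal.

    Each logical line occupies exactly one screen row. Nothing is folded;
    instead, the whole screen is shifted horizontally by `left` columns.
    """
    window = lines[top:top + rows]
    return [line[left:left + cols] for line in window]


# Hypothetical dump file: 100 lines, each several hundred characters long.
dump = ["t=%03d " % t + "x" * 300 for t in range(100)]

screen = visible_window(dump, top=0, left=0)    # leftmost 80 columns
shifted = visible_window(dump, top=0, left=80)  # screen shifted right by 80
```

Shifting every row by the same offset keeps the values dumped at different
times aligned in the same screen column, which is what makes long dump lines
readable.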
<P>
Several years later, I got to know the WWW and began to use it.
I used XMosaic and Chimera. I liked Chimera because it was light.
As I was interested in the mechanism of the WWW, I learned HTML and
HTTP, and I found them simpler than I had expected. The early version
of HTTP was very similar to the Gopher protocol. HTML 2.0 was
simple enough to render. All I had to do seemed to be line folding
and itemized display. So I made a few modifications to fm
and made a web browser. It was the first version of w3m.
The name `w3m' is an abbreviation of the Japanese phrase `WWW wo miru',
which means `see WWW'. It was inherited from `fm', which
was an abbreviation of `File wo miru'. The first version of w3m
was released at the beginning of 1995.

<h2>Death and rebirth of w3m</h2>
<p>
I used w3m as a pager to read files, e-mails and online manuals.
It was a substitute for less. Sometimes I used w3m as a web browser,
but there were many pages w3m couldn't display correctly, most of
which used tables for page layout. Once I tried to implement a table
renderer, but I gave up because it seemed too difficult for me.
<P>
It was 1998 when I tried to modify w3m again. There were two reasons.
The first is that I had some time to do it. I was staying at Boston University
as a visiting researcher at that time. The second reason is that
I wanted to use tables in my personal web page. I had been writing my research
log in HTML, and I wanted to put a table in it. At first I used
&lt;pre&gt;..&lt;/pre&gt; to describe tables, but it was not cool at all.
One day I used the &lt;table&gt; tag, which forced me to use Netscape to
read the research log. Then I decided to implement a table renderer
in w3m.
<P>
I didn't intend to write a perfect table renderer because the tables
I used were not very complicated. However, incomplete table rendering
made the display of table-layout pages horrible. I realized that
an almost-perfect table renderer was required
to do well at both `rendering (real) tables' and `fine display of
table-layout pages.' It was a thorny path.
<P>
After several months, I finished a `fair' table renderer.
Then I implemented forms in w3m. Finally, w3m was reborn as a
practical web browser.

<h2>Table rendering algorithm in w3m</h2>

HTML table rendering is difficult. The tabular environment
of LaTeX is not very difficult: it makes the width of a column
either a specified value or the maximum width needed to fit the items in it.
On the other hand, an HTML table renderer has to decide
the width of each column so that the entire table fits into the
display appropriately, and fold the contents of the table according
to the column widths. A poor choice of column widths makes
the table ugly. Moreover, tables can be nested, which makes the algorithm
more complicated.

<OL>
<LI>First, calculate the maximum and minimum width of each column.
The maximum width is the width required to display the column
without folding the contents. Generally, it is the length of the longest
paragraph delimited by &lt;BR&gt; or &lt;P&gt;.
The minimum width is the lower limit needed to display the contents.
If the column contains the word `internationalization', the minimum
width will be 20. If the column contains
&lt;pre&gt;..&lt;/pre&gt;, the maximum width of the preformatted
text will be the minimum width of the column.

<LI>If the width of the column is specified by the WIDTH attribute,
fix the column width to that value. If the specified width is
smaller than the minimum width of the column, fix the column width
to the minimum width.

<LI>Calculate the sum of the maximum widths (or fixed widths) of
all columns and check whether the sum exceeds the screen width.
If the sum does not exceed the screen width, these values are used as the
width of each column.

<LI>If the sum is larger than the screen width, determine the width
of each column according to the following steps.
<OL>
<LI>Let W be the screen width minus the sum of the widths of the
fixed-width columns.
<LI>Distribute W among the columns whose widths are not yet decided,
in proportion to the logarithm of the maximum width of each column.
<li>If the distributed width of a column is smaller than its minimum width,
then fix the width of the column to the minimum width, and
do the distribution again.
</OL>
</OL>

In this process, the distributed width is proportional to the logarithm of the
maximum width, but I am not sure that this heuristic is the best.
It could be, for example, the square root of the maximum width.
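<P>
The steps above can be sketched in a few lines of Python. This is a simplified
illustration under stated assumptions, not w3m's actual code: the function names
<code>text_widths</code> and <code>column_widths</code> are hypothetical, widths are
counted in characters, borders and cell padding are ignored, and the weight is taken
as log(width + 1) so that it is always positive.

```python
import math


def text_widths(paragraphs):
    """Step 1 for plain text: `paragraphs` are the pieces between breaks."""
    max_w = max((len(p) for p in paragraphs), default=0)  # no folding at all
    min_w = max((len(w) for p in paragraphs for w in p.split()), default=0)
    return max_w, min_w                                   # min = longest word


def column_widths(max_w, min_w, screen_width, fixed=None):
    """Decide column widths from per-column maximum and minimum widths.

    `fixed` maps a column index to its WIDTH attribute value (assumed API).
    """
    n = len(max_w)
    widths = [None] * n
    # Step 2: honor WIDTH, but never go below the minimum width.
    for i, w in (fixed or {}).items():
        widths[i] = max(w, min_w[i])
    # Step 3: if everything fits at maximum (or fixed) width, we are done.
    natural = [max_w[i] if widths[i] is None else widths[i] for i in range(n)]
    if sum(natural) <= screen_width:
        return natural
    # Step 4: distribute the remaining width W by log of the maximum width.
    while True:
        free = [i for i in range(n) if widths[i] is None]
        if not free:
            return widths
        W = screen_width - sum(w for w in widths if w is not None)
        weight = {i: math.log(max(max_w[i], 1) + 1) for i in free}
        total = sum(weight.values())
        share = {i: int(W * weight[i] / total) for i in free}
        too_narrow = [i for i in free if share[i] < min_w[i]]
        if not too_narrow:
            for i in free:
                widths[i] = share[i]
            return widths
        # Fix those columns to their minimum and distribute again.
        for i in too_narrow:
            widths[i] = min_w[i]


print(text_widths(["support for internationalization", "short"]))
print(column_widths([10, 100, 1000], [5, 5, 5], 80))
```

Each pass of the loop either finishes or fixes at least one more column, so the
redistribution always terminates. Replacing <code>math.log</code> with
<code>math.sqrt</code> gives the square-root variant mentioned above.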
<P>
The algorithm above assumes that the screen width is known.
But this is not true for nested tables. According to the algorithm above,
the column widths of the outer table have to be known to render
the inner table, while the total width of the inner table has to
be known to determine the column widths of the outer table.
If the WIDTH attribute exists, there is no problem. Otherwise, w3m
assumes that the inner table is 0.8 times as wide as the outer
table. This usually works fine, but if there are two tables side by side in an outer
table, the width of the outer table always exceeds the screen width.
To render this kind of table correctly, one has to render the table once,
check the width of the outermost table, and then render the entire table again.
Netscape might employ this kind of algorithm.
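<P>
The side-by-side problem is simple arithmetic, shown here as a small Python sketch.
It assumes that each inner table is given exactly 0.8 times the width available to
the outer table and that borders are ignored; the function name
<code>estimated_outer_width</code> is hypothetical and does not come from w3m.

```python
def estimated_outer_width(available_width, inner_tables_side_by_side):
    """Width the outer table ends up with under the 0.8 assumption."""
    inner_width = available_width * 4 // 5   # 0.8 times the outer width
    return inner_width * inner_tables_side_by_side


screen = 80
one = estimated_outer_width(screen, 1)   # a single inner table fits
two = estimated_outer_width(screen, 2)   # two inner tables overflow
print(one, two, two > screen)
```

With two inner tables the estimate is 1.6 times the available width, so the outer
table overflows whatever the screen width is.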

<h2>Libraries</h2>

w3m uses the
<a href="http://reality.sgi.com/boehm/gc.html">Boehm GC</a>
library. This library was written by H. Boehm and A. Demers.
I could distribute w3m without this library because one can
get the library separately, but I decided to include it in the
w3m distribution for the convenience of those who install it.
W3m doesn't use libwww.
<P>
Boehm GC is a garbage collector for C and C++. I began to use this
library when I implemented tables, and it was great. I couldn't have
implemented tables and forms without this library.
<P>
Versions older than beta-990304 used
<a href="http://home.cern.ch/~orel/libftp/libftp/libftp.html">LIBFTP</a>
because I was tired of writing code to handle the FTP protocol.
But I rewrote the FTP code myself to make w3m completely free.
It made w3m slightly smaller.
<P>
By the way, w3m doesn't use the standard UNIX regexp and curses libraries.
This is because I want to use Japanese. When I wrote fm, there were
no free regexp/curses libraries that could handle Japanese. Now both kinds of library
are available, and they look faster than the w3m code.

<h2>Future work</h2>

...Nothing. As w3m's virtues are its small size and rendering speed,
adding more features might lose these advantages. On the other hand,
w3m is still known to have many bugs, and I will continue fixing them.

</body>
</html>