<html>
<head>
<title>History of w3m</title>
</head>
<body>
<h1>History of w3m</h1>
<div align=right>
1999/2/18<br>
1999/3/8 revised<br>
1999/6/11 translated into English<br>
Akinori Ito<br>
aito@fw.ipsj.or.jp
</div>
<h2>Introduction</h2>
W3m is a text-based pager and WWW browser.
It is similar to the famous text-based
browser <a href="http://www.lynx.browser.org/">Lynx</a>.
However, w3m has several advantages over Lynx. For example,
<UL>
<LI>W3m can render tables.
<LI>W3m can render frames (by converting frames into tables).
<LI>As w3m is a pager, it can read a document from standard input.
(I heard Lynx can also display a document given on standard input, like this:
<pre>
   lynx /dev/fd/0 &lt; file
</pre>
Hmm, it works on Linux.)
<LI>W3m is small. Its stripped binary for SPARC (compiled with
gcc -O2, version beta-990217) is only 260 KB, while the Lynx
binary is over 1.8 MB. (Actually, lynx is 800K on my i386 system, and w3m is 200K + libgc.)
</UL>
It is true that Lynx is an excellent browser with many
features w3m doesn't have. For example,
<UL>
<LI>Lynx can handle cookies.
<LI>Lynx has many options.
<LI>Lynx is multilingual. (W3m is Japanese-English bilingual.)
</UL>
and so on. Lynx also has the great advantage of extensive
documentation.
<P>
<b>I don't intend w3m to be a substitute for any other browser,
including Netscape and Lynx.</b> Why did I write w3m?
Because I found conventional browsers inconvenient
for `taking a look' at web pages.
I browse web pages in a LAN environment. When I want to take
a glance at a web page, I don't want to wait for Netscape to start up.
Lynx also takes a few seconds to start up (you can cut the Lynx startup time to almost zero by removing /etc/mailcap). On the other hand,
w3m starts immediately and puts little load on the host machine.
After looking at the information with w3m, I use another browser
if I want to read the page in detail. For me, however,
w3m is enough to read most web pages.

<h2>The birth of w3m</h2>
<P>
w3m was derived from a pager named `fm'. Fm was written before
1991 (I don't remember the exact date), when WWW was not yet popular.
At that time, the word `browser' meant a file browser like
`more' or `less'.
<P>
I wrote fm to debug a program for my research. To let me trace its state,
the program dumped megabytes of variable values into a file,
and I debugged it by checking the dumped file. The program dumped
the information for each point in time on a single line, which made each dumped line
several hundred characters long. When I looked at the file using `more' or
`less', each line was folded into several lines, and it was very hard
to read. Therefore, I wrote fm, which didn't fold lines: fm displayed
one logical line as one physical line. To see the hidden
part of a line, fm shifted the entire screen horizontally. As I used an 80x24 terminal at that
time, fm was very useful for the debugging.
<P>
Several years later, I came to know WWW and began to use it.
I used XMosaic and Chimera. I liked Chimera because it was light.
As I was interested in the mechanism of WWW, I learned HTML and
HTTP, and found them simpler than I had expected. The early version
of HTTP was very similar to the Gopher protocol. HTML 2.0 was
simple enough to render: all I had to do seemed to be line folding
and itemized display. So I made a small modification to fm
and turned it into a web browser. That was the first version of w3m.
The name `w3m' is an abbreviation of the Japanese phrase `WWW wo miru',
which means `see WWW'. It was inherited from `fm', which
was an abbreviation of `File wo miru'. The first version of w3m
was released at the beginning of 1995.

<h2>Death and rebirth of w3m</h2>
<p>
I used w3m as a pager to read files, e-mails and online manuals;
it was my substitute for less. Sometimes I used w3m as a web browser,
but there were many pages w3m couldn't display correctly, most of
which used tables for page layout. Once I tried to implement a table
renderer, but I gave up because it seemed too difficult for me.
<P>
It was 1998 when I tried to modify w3m again, for two reasons.
The first is that I had some time to do it: I was staying at Boston University
as a visiting researcher at that time. The second is that
I wanted to use tables in my personal web page. I had been writing a research
log in HTML, and I wanted to put a table in it. At first I used
&lt;pre&gt;..&lt;/pre&gt; to describe tables, but it was not cool at all.
One day I used the &lt;table&gt; tag, which forced me to use Netscape to
read the research log. Then I decided to implement a table renderer
in w3m.
<P>
I didn't intend to write a perfect table renderer because the tables
I used were not very complicated. However, incomplete table rendering
made table-layout pages look horrible. I realized that
an almost-perfect table renderer was required
to do well at both `rendering (real) tables' and `displaying
table-layout pages nicely.' It was a thorny path.
<P>
After several months, I finished a `fair' table renderer.
Then I implemented forms in w3m. Finally, w3m was reborn as a
practical web browser.

<h2>Table rendering algorithm in w3m</h2>

HTML table rendering is difficult. The tabular environment
of LaTeX is not very difficult: it makes the width of a column
either a specified value or the maximum width needed to fit its items.
On the other hand, an HTML table renderer has to decide
the width of each column so that the entire table fits into the
display appropriately, and fold the contents of the table according
to the column widths. Badly chosen column widths make
the table ugly. Moreover, tables can be nested, which makes the algorithm
more complicated. w3m decides the column widths as follows.

<OL>
<LI>First, calculate the maximum and minimum widths of each column.
The maximum width is the width required to display the column
without folding the contents. Generally, it is the length of the longest
paragraph delimited by &lt;BR&gt; or &lt;P&gt;.
The minimum width is the lower limit for displaying the contents.
If the column contains the word `internationalization', the minimum
width will be 20. If the column contains
&lt;pre&gt;..&lt;/pre&gt;, the maximum width of the preformatted
text will be the minimum width of the column.

<LI>If the width of a column is specified by the WIDTH attribute,
fix the column width to that value. If the specified width is
smaller than the minimum width of the column, fix the column width
to the minimum width.

<LI>Calculate the sum of the maximum widths (or fixed widths) of
the columns and check whether the sum exceeds the screen width.
If it does not, these values are used as the
widths of the columns.

<LI>If the sum is larger than the screen width, determine the width
of each column by the following steps.
<OL>
<LI>Let W be the screen width minus the sum of the widths of the
fixed-width columns.
<LI>Distribute W among the columns whose widths are not yet decided,
in proportion to the logarithm of the maximum width of each column.
<li>If the distributed width of a column is smaller than its minimum width,
fix the width of that column to its minimum width, and
do the distribution again.
</OL>
</OL>

In this process, the distributed width is proportional to the logarithm of the
maximum width, but I am not sure that this heuristic is the best.
It could be, for example, the square root of the maximum width.
<P>
The algorithm above assumes that the screen width is known,
which is not true for nested tables. According to the algorithm above,
the column widths of the outer table have to be known to render
the inner table, while the total width of the inner table has to
be known to determine the column widths of the outer table.
If a WIDTH attribute is given, there is no problem. Otherwise, w3m
assumes that the inner table is 0.8 times as wide as the outer
table. This works fine, but if there are two tables side by side in an outer
table, the width of the outer table always exceeds the screen width.
To render this kind of table correctly, one has to render the table once,
check the width of the outermost table, and then render the entire table again.
Netscape might employ this kind of algorithm.
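<P>
The overflow in the side-by-side case follows directly from the 0.8 estimate. Writing S for the width of an outer table that exactly fits the screen (a symbol introduced here only for illustration) and ignoring borders and cell padding, each inner table is estimated at 0.8S, so two of them in one row already need more than the whole screen:

```latex
w_{\mathrm{inner},1} + w_{\mathrm{inner},2} \approx 0.8\,S + 0.8\,S = 1.6\,S > S
```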

<h2>Libraries</h2>

w3m uses the
<a href="http://reality.sgi.com/boehm/gc.html">Boehm GC</a>
library, written by H. Boehm and A. Demers.
I could have distributed w3m without this library because one can
get it separately, but I decided to include it in the
w3m distribution for the convenience of installers.
<P>
# The Boehm GC library is no longer included in w3m packages
# after w3m-0.4.2.
<P>
W3m doesn't use libwww.
<P>
Boehm GC is a garbage collector for C and C++. I began to use this
library when I implemented tables, and it was great. I couldn't have
implemented tables and forms without it.
<P>
Versions older than beta-990304 used
<a href="http://home.cern.ch/~orel/libftp/libftp/libftp.html">LIBFTP</a>
because I was tired of writing code to handle the FTP protocol.
But I rewrote the FTP code myself to make w3m completely free.
This made w3m slightly smaller.
<P>
By the way, w3m uses neither the UNIX standard regexp library nor the curses library.
This is because I want to use Japanese. When I wrote fm, there were
no free regexp/curses libraries that could handle Japanese. Now both kinds of libraries
are available, and they look faster than w3m's code.

<h2>Future work</h2>

...Nothing. As w3m's virtues are its small size and rendering speed,
adding more features might sacrifice these advantages. On the other hand,
w3m is still known to have many bugs, and I will continue fixing them.

</body>
</html>