Initial revision
This commit is contained in:
		
							
								
								
									
										209
									
								
								doc/STORY.html
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										209
									
								
								doc/STORY.html
									
									
									
									
									
										Normal file
									
								
							| @@ -0,0 +1,209 @@ | ||||
| <html> | ||||
| <head> | ||||
| <title>History of w3m</title> | ||||
| </head> | ||||
| <body> | ||||
| <h1>History of w3m</h1> | ||||
| <div align=right> | ||||
| 1999/2/18<br> | ||||
| 1999/3/8 revised<br> | ||||
| 1999/6/11 translated into English<br> | ||||
| Akinori Ito<br> | ||||
| aito@ei5sun.yz.yamagata-u.ac.jp | ||||
| </div> | ||||
| <h2>Introduction</h2> | ||||
| W3m is a text-based pager and WWW browser. | ||||
| It is similar application to the famous text-based | ||||
| browser <a href="http://www.lynx.browser.org/">Lynx</a>. | ||||
| However, w3m has several advantages against Lynx. For example, | ||||
| <UL> | ||||
| <LI>W3m can render tables. | ||||
| <LI>W3m can render frame (by converting frame into table). | ||||
| <LI>As w3m is a pager, it can read document from standard input. | ||||
| (I heard Lynx also can display standard-input-given document, like this: | ||||
| <pre> | ||||
|    lynx /dev/fd/0 > file | ||||
| </pre> | ||||
| Hmm, it works on Linux. ) | ||||
| <LI>W3m is small. Its stripped binary for Sparc (compiled with | ||||
| gcc -O2, version beta-990217) is only 260kbyte, while binary size | ||||
| of Lynx is beyond 1.8Mbyte. (Actually, lynx it 800K on my i386 system, w3m is 200K + libgc.) | ||||
| </UL> | ||||
| It is true that Lynx is an excellent browser, who have many | ||||
| features w3m doesn't have. For example, | ||||
| <UL> | ||||
| <LI>Lynx can handle cookies. | ||||
| <LI>Lynx has many options. | ||||
| <LI>Lynx is multilingual. (W3m is Japanese-English bilingual) | ||||
| </UL> | ||||
| etc. It is also a great advantage that Lynx has a lot of | ||||
| documentation. | ||||
| <P> | ||||
| <b>I don't intend w3m to be a substitute of any other browsers, | ||||
| including Netscape and Lynx.</b> Why did I wrote w3m? | ||||
| Because I felt inconvenient with conventional browsers  | ||||
| to `take a look' at web pages. | ||||
| I am browsing web pages in LAN environment. When I want to take | ||||
| a glance at a web page, I don't want to wait to start up Netscape. | ||||
| Lynx also takes a few seconds to start up (you can get lynx startup time to almost zero when you rm /etc/mailcap). On the other hand, | ||||
| w3m starts immediately with little load to the host machine. | ||||
| After looking at the information using w3m, I use other browser | ||||
| if I want to read the the page in detail. As for me, however, | ||||
| w3m is enough to read most of web pages. | ||||
|  | ||||
| <h2>The birth of w3m</h2> | ||||
| <P> | ||||
| w3m was derived from a pager named `fm'. Fm was written before | ||||
| 1991 (I don't remember the exact date) when WWW was not popular. | ||||
| At that time, the word `browser' meant a file browser like | ||||
| `more' or `less'. | ||||
| <P> | ||||
| I wrote fm to debug a program for my research. To trace the status | ||||
| of the program, it dumped megabytes of values of variables into a file, | ||||
| and I debugged it by checking the dumped file. The program dumped | ||||
| information at a certain time in one line, which made the dumped line | ||||
| several hundred characters long. When I looked the file using `more' or | ||||
| `less', one line was folded into several lines and it was very hard | ||||
| to read it. Therefore, I wrote fm, which didn't fold a line. Fm displayed | ||||
| one logical line as one physical line. When seeing the hidden | ||||
| part of a line, fm shifted entire screen. As I used 80x24 terminal at that | ||||
| time, fm was very useful for the debugging. | ||||
| <P> | ||||
| Several years later, I got to know WWW and began to use it. | ||||
| I used XMosaic and Chimera. I liked Chimera because it was light. | ||||
| As I was interested in the mechanism of WWW, I learned HTML and | ||||
| HTTP, and I felt it simpler than I expected. The earlier version | ||||
| of HTTP was very similar to Gopher protocol. HTML 2.0 was | ||||
| simple enough to render. All I have to do seemed to be line folding | ||||
| and itemized display. Then I made a little modification to fm | ||||
| and made a web browser. It was the first version of w3m. | ||||
| The name `w3m' was an abbreviation of Japanese phrase `WWW wo miru', | ||||
| which means `see WWW'. It was an inheritance from `fm', which | ||||
| was an abbreviation of `File wo miru'. The first version of w3m | ||||
| was released at the beginning of 1995. | ||||
|  | ||||
| <h2>Death and rebirth of w3m</h2> | ||||
| <p> | ||||
| I had used w3m as a pager to read files, E-mails and online manuals.  | ||||
| It was a substitute of less. Sometimes I used w3m as a web browser, | ||||
| but there were many pages w3m couldn't display correctly, most of | ||||
| which used table for page layout. Once I tried to implement table | ||||
| renderer, but I gave up because it seemed to be too difficult for me. | ||||
| <P> | ||||
| It was 1998 when I tried to modify w3m again. There were two reasons. | ||||
| The first is that I had some time to do it. I stayed Boston University | ||||
| as a visiting researcher at that time. The second reason is that | ||||
| I wanted to use table in my personal web page.  I had written research | ||||
| log using HTML, and I wanted to write a table in it. At first I used  | ||||
| <pre>..</pre> to describe table, but it was not cool at all. | ||||
| One day I used <table> tag, which made me to use Netscape to | ||||
| read the research log. Then I decided to implement a table renderer | ||||
| into w3m. | ||||
| <P> | ||||
| I didn't intend to write a perfect table renderer because tables | ||||
| I used was not very complicated. However, imcomplete table rendering | ||||
| made the display of table-layout pages horrible. I realized that | ||||
| it required almost-perfect table renderer  | ||||
| to do well both in `rendering (real) table' and `fine display of | ||||
| table-layout page.' It was a thorn path. | ||||
| <P> | ||||
| After taking several months, I finished `fair' table renderer. | ||||
| Then I implemented form into w3m. Finally, w3m was reborn as a | ||||
| practical web browser. | ||||
|  | ||||
| <h2>Table rendering algorithm in w3m</h2> | ||||
|  | ||||
| HTML table rendering is difficult. Tabular environment | ||||
| of LaTeX is not very difficult, which makes the width of a column | ||||
| either a specified value or the maximum width to put items into it. | ||||
| On the other hand, HTML table renderer has to decide | ||||
| the width of a column so that the entire table can fit into the | ||||
| display appropriately, and fold the contents of the table according | ||||
| to the column width. Inappropriate column width decision makes | ||||
| the table ugly. Moreover, table can be nested, which makes the algorithm | ||||
| more complicated. | ||||
|  | ||||
| <OL> | ||||
| <LI>First, calculate the maximum and minimum width of each column. | ||||
| The maximum width is the width required to display the column | ||||
| without folding the contents. Generally, it is the length of  | ||||
| paragraph delimited by <BR> or <P>. | ||||
| The minimum width is the lower limit to display the contents. | ||||
| If the column contains the word `internationalization', the minimum | ||||
| width will be 20. If the column contains  | ||||
| <pre>..</pre>, the maximum width of the preformatted | ||||
| text will be the minimum width of the column. | ||||
|  | ||||
| <LI>If the width of the column is specified by WIDTH attribute, | ||||
| fix the column width using that value. If the specified width is | ||||
| smaller than the minimum width of the column, fix the column width | ||||
| to the minimum width. | ||||
|  | ||||
| <LI>Calculate the sum of the maximum width (or fixed width) of | ||||
| each column and check if the sum exceeds the screen width. | ||||
| If it is smaller than screen width, these values are used for | ||||
| width of each column. | ||||
|  | ||||
| <LI>If the sum is larger than the screen width, determine the widths | ||||
| of each column according to the following steps. | ||||
| <OL> | ||||
| <LI>Let W be the screen width subtracted by the sum of widths of  | ||||
| fixed-width columns. | ||||
| <LI>Distribute W into the columns whose width are not decided, | ||||
| in proportion to the logarithm of the maximum width of each column. | ||||
| <li>If the distributed width of a column is smaller than the minimum width, | ||||
| then fix the width of the column to the minimum width, and  | ||||
| do the distribution again. | ||||
| </OL> | ||||
| </OL> | ||||
|  | ||||
| In this process, distributed width is proportion to logarithm of | ||||
| maximum width, but I am not sure that this heuristic is the best. | ||||
| It can be, for example, square root of the maximum width. | ||||
| <P> | ||||
| The algorithm above assumes that the screen width is known. | ||||
| But it is not true for nested table. According the algorithm above, | ||||
| the column width of the outer table have to be known to render | ||||
| the inner table, while the total width of the inner table have to | ||||
| be known to determine the column width of the outer table. | ||||
| If WIDTH attribute exists there are no problems. Otherwise, w3m | ||||
| assumes that the inner table is 0.8 times as wide as the outer | ||||
| table. It works fine, but if there are two tables side by side in an outer | ||||
| table, the width of the outer table always exceeds the screen width. | ||||
| To render this kind of table correctly, one have to render the table once, | ||||
| check the width of outmost table, and then render the entire table again. | ||||
| Netscape might employ this kind of algorithm. | ||||
|  | ||||
| <h2>Libraries</h2> | ||||
|  | ||||
| w3m uses | ||||
| <a href="http://reality.sgi.com/boehm/gc.html">Boehm GC</a> | ||||
| library. This library was written by H. Boehm and A. Demers. | ||||
| I could distribute w3m without this library because one can | ||||
| get the library separately, but I decided to contain it in the | ||||
| w3m distribution for the convenience of an installer. | ||||
| W3m doesn't use libwww. | ||||
| <P> | ||||
| Boehm GC is a garbage collector for C and C++. I began to use this | ||||
| library when I implemented table, and it was great. I couldn't | ||||
| implement table and form without this library.  | ||||
| <P> | ||||
| Older version than beta-990304 used  | ||||
| <a href="http://home.cern.ch/~orel/libftp/libftp/libftp.html">LIBFTP</a> | ||||
| because I felt tired of writing codes to handle FTP protocol. | ||||
| But I rewrote the FTP code by myself to make w3m completely free. | ||||
| It made w3m slightly smaller. | ||||
| <P> | ||||
| By the way, w3m doesn't use UNIX standard regexp library and curses library. | ||||
| It is because I want to use Japanese. When I wrote fm, there were | ||||
| no free regexp/curses libraries that can treat Japanese. Now both libraries | ||||
| are available and they looks faster than w3m code. | ||||
|  | ||||
| <h2>Future work</h2> | ||||
|  | ||||
| ...Nothing. As w3m's virtues are its small size and rendering speed, | ||||
| adding more features might lose these advantages. On the other hand, | ||||
| w3m is still known to have many bugs, and I will continue fixing them. | ||||
|  | ||||
| </body> | ||||
| </html> | ||||
		Reference in New Issue
	
	Block a user