Enjoying a Free Lunch: Computational Economics with Linux
Draft, June 1997
This is an expanded version of the talk given at the Third Conference on
Computing in Economics and Finance at Stanford, CA, on July 1, 1997. It
describes the Linux operating system, both in general terms and more
specifically how it can be a most valuable computing environment for
economists and econometricians working in computational economics and
finance.
Linux is a Unix operating system. It has been written entirely from scratch
since 1991 by Linus Torvalds and many others in a decentralised effort which
uses the Internet as the primary means of communication and dissemination.
Linux contains no proprietary code and is distributed freely, including all
source code, under the terms of the GNU General Public License (or GPL
for short). It should be noted that Linux is not the only free Unix system;
the three BSD variants FreeBSD, NetBSD and OpenBSD are also free but have not
gained the same momentum as Linux.
Linux implements the POSIX operating system specification, along with
extensions from the System V and BSD variants of Unix. This means it looks,
feels and acts like Unix, but does not come from the same source code base
--- and, therefore, is not restricted by the source licenses of the commercial
Unices. Linux has all of the features one would expect in a modern Unix
system, including multitasking, virtual memory, shared libraries, demand
loading, efficient memory management and complete TCP/IP networking.
Further, Linux runs on a variety of hardware platforms. It was first
developed on IBM PC compatibles with a 386 cpu. PCs are still the most common
hardware platform. Linux also runs on Motorola m68k platforms such as the Amiga
and Atari computers (provided they have an MMU along with the m68k cpu, i.e.
68030/68040/68060). Linux has also been ported to Unix workstation platforms
such as the Digital Alpha and Sun Sparc stations. Support for PowerPC, MIPS and
ARM processors as well as Macs using the m68k is currently under development.
Linux is powerful, fast (as shown by benchmarks comparing Linux on a Sparc to
SunOS and Solaris on the exact same hardware) and extremely reliable. It can be
employed for almost any computational purpose. As a true Unix system, Linux
provides excellent multitasking and multiuser support. Moreover, the
availability of the X11 windowing system, (La)TeX, all the networking
standards and protocols around which the Internet is built as well as the
availability of a great number of powerful applications make Linux an
excellent choice for both network server and client machines.
Linux is part of the evolving history of Unix systems. The history of Unix is
briefly outlined below; it shows that Unix itself is strongly linked to free
source code and to licensing decisions.
- 1969
Ken Thompson wrote the first version of what was to be Unix on a
PDP-7.
- 1970
Thompson and Dennis Ritchie ported this to the PDP-11 whereby
Ritchie wrote the first C compiler.
- 1973
Ritchie and Thompson rewrote the kernel in C.
- 1974-1977
ATT gave the Unix source code freely to US universities.
- 1978
ATT released Version 7 and started to charge license fees.
- 1979
In response to the ATT license fee, UC Berkeley created BSD. Most of
the DARPA sponsored work involving the construction and expansion of the
Internet was carried out on BSD, and commercial Unices such as SunOS and Ultrix
(DEC) are largely based on BSD.
- 1983
ATT released System V. BSD released 4.2.
- 1987
ATT released System V, release 3, which is at the basis of HP-UX
and AIX. BSD released 4.3.
- 1990
ATT released System V, release 4, as a new unifying Unix
resulting from a Sun and ATT cooperation. DEC, HP, IBM create the
Open Software Foundation.
- 1991
Work on the Unix clones Linux and FreeBSD began.
The starting point for Linux was Minix, a small Unix clone
developed for teaching purposes by Andrew Tanenbaum. Minix is a complete
rewrite of a core Unix system. While it is copyrighted by the publisher, it
can still be used freely for education and research. This, and the low
hardware requirements which allow it to run on Intel 80286 (or better) cpus,
made Minix a very popular choice for universities since its introduction in
the late 1980s. Linus Torvalds started to play with some task-switching
features of the 386 cpu in protected-mode --- and recognised that he had
started to write the core parts of a kernel. The development of Linux can be
traced back as follows (see also an account by Linus and an interview with
Linus about the early stages).
- Linux 0.01, released in August 1991, contained a disk driver and a
small filesystem but could only be compiled under Minix and did not really do
anything useful. It should be noted, however, that this was only seven months
after Linus obtained his first 386, and four months after he received the
Minix floppies.
- Linux 0.02, the first official version,
was released on October 5, 1991 and already ran the GNU tools bash, gcc, make
and sed (which others had ported).
- Linux 0.10 was released in
November 1991.
- Linux 0.95 was released in March 1992. The jump in version number was
meant to signal that a stable 1.0 version would arrive `soon', but
getting there took a lot longer, and many 0.99 patchlevels. However, the very
stable 0.99 kernels began to build a growing reputation for Linux.
- Linux 1.0, released on March 14, 1994, formed the first official
stable version. Its final release was 1.0.9 from April 17, 1994.
- Linux 1.1 was developed from April 6, 1994 until March 4, 1995.
- Linux 1.2, released March 7, 1995, included disk access speedups, TTY
improvements, virtual memory enhancements, multiple platform support,
quotas, and more. Its final version was 1.2.13 dated August 2, 1995.
- Linux 1.3 was developed from June 12, 1995 to June 9, 1996.
- Linux 2.0, released June 9, 1996, is the current stable version and
brings even more enhancements, including many performance improvements, several
new networking protocols and one of the fastest TCP/IP implementations
in the world. It supports symmetric multiprocessing and multiple
hardware architectures. Even higher performance, more
networking protocols, and more device drivers will be available in
Linux 2.2.
- The development of Linux 2.1 has been ongoing since September 30, 1996.
The sheer size of the kernel releases speaks volumes: Linux 1.0 was shipped as
a tar.gz file of 1 259 161 bytes. By 1.2, this had almost doubled to 2 301 256
bytes. For the truly multiplatform Linux 2.0, the size had grown further to
5 843 677 bytes. The current `bleeding edge' version 2.1.43 comprises a
staggering 8 128 328 bytes.
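The growth quoted above can be put into relative terms with a few lines of Python. This is only an illustrative sketch: the byte counts are the ones given in the text, while the names `KERNEL_TARBALL_BYTES` and `growth_factors` are made up for this example.

```python
# Sizes of the kernel tar.gz files in bytes, as quoted in the text.
KERNEL_TARBALL_BYTES = {
    "1.0": 1_259_161,
    "1.2": 2_301_256,
    "2.0": 5_843_677,
    "2.1.43": 8_128_328,
}


def growth_factors(sizes, base="1.0"):
    """Return each release's size as a multiple of the base release."""
    base_size = sizes[base]
    return {version: size / base_size for version, size in sizes.items()}


if __name__ == "__main__":
    for version, factor in growth_factors(KERNEL_TARBALL_BYTES).items():
        print(f"Linux {version}: {factor:.2f} times the size of 1.0")
```

Expressed this way, the 1.2 archive is a little under twice the size of 1.0, and the 2.1.43 archive is roughly six and a half times as large.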
A discussion of operating system features is necessarily of a technical
nature. The following points summarise some of the main strengths of Linux.
- Multitasking: several programs can run at once.
- Multiuser: several users can work on the same machine at once.
- Multiplatform: runs on many different cpu types.
- Multiprocessor: SMP is available for Intel and SPARC cpus.
- Memory protection: no process can influence other running processes by
out-of-segment writes.
- Demand loading: only those parts of an executable that are actually
used are loaded.
- Shared copy-on-write pages among executables:
multiple processes can share the same memory; if one process tries
to write to a jointly used segment, that page is copied to another
location.
- Virtual memory is implemented using paging (rather than swapping of whole
processes) to disk; either a separate swap partition or a swap file, or both,
can form swap space and more can be added during runtime.
- Unified memory pool which user programs and disk cache share
so that all free memory can be used for caching.
- Dynamically linked shared libraries.
- Core dumps for post-mortem analysis.
- Mostly source-level compatible with POSIX, System V, and BSD.
- Mostly binary-level compatible with SCO, SVR3, and SVR4 through an
iBCS2-compliant emulation module.
- All source code is available, and freely distributable, including the
whole kernel and all drivers, the development tools and many user
programs; many commercial programs are being provided for Linux
without source, but everything that has been free, including the
entire base operating system, is still free and will remain free.
- Multiple virtual consoles and several independent login sessions.
- Very good filesystem support: many filesystems including minix, Xenix,
and all the common System V filesystems are supported, and an
advanced and extremely robust filesystem of its own offers
filesystems of up to 4 TB; transparent access to MS-DOS
partitions (or OS/2 FAT partitions); VFAT (Windows NT, Windows 95) support
is available in Linux 2.0; UMSDOS special filesystem which allows
Linux to be installed on a DOS filesystem; read-only HPFS-2 support
for OS/2 2.1; HFS (Macintosh) file system support is available
separately as a module.
- CD-ROM filesystem which reads all standard formats of CD-ROMs.
- Linux is very fast, and makes good use of hardware resources.
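Memory protection and copy-on-write sharing, two of the points listed above, can be observed from user space with a short Python program. The following is a minimal sketch, assuming a Unix system where `os.fork` is available; the function name `child_write_is_private` is invented for this example. The copying itself happens inside the kernel and is invisible to the program; what can be seen is its effect, namely that a write made by the child after the fork never shows up in the parent.

```python
import os


def child_write_is_private():
    """Fork, let the child overwrite a buffer, and report what each side sees."""
    data = bytearray(b"parent")
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: after fork() its pages are initially shared with the
        # parent. Writing to the buffer makes the kernel give the child a
        # private copy of the affected page before the write takes effect.
        os.close(read_fd)
        data[:] = b"child!"
        os.write(write_fd, bytes(data))
        os.close(write_fd)
        os._exit(0)
    # Parent process: read what the child saw, then inspect our own copy.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        seen_by_child = pipe.read()
    os.waitpid(pid, 0)
    return bytes(data), seen_by_child


if __name__ == "__main__":
    parent_view, child_view = child_write_is_private()
    print("parent still sees:", parent_view)
    print("child saw:        ", child_view)
```

The pipe is used only to carry the child's view back to the parent; the buffer itself is never passed between the two processes.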
Concerning standard packages, economists will appreciate the availability of
many commonly used commercial programs such as:
Furthermore, excellent free (or at least free for academic use) programs are
available for mathematical applications, for example:
More general commercial applications such as databases, spreadsheets, word
processors (including WordPerfect) and publishing programs (including Corel
Draw) complete the selection of available software. This makes Linux an
excellent workstation for economists and econometricians. Programs written
for other operating systems can also be used, at least on PCs running
Linux. The free iBCS emulator allows one to run binaries for other
Intel-based Unices derived from releases 3 and 4 of System V such as SCO or
Coherent. It also allows Linux/Sparc to run native Sparc programs.
Commercial emulators such as WABI and Executor allow one to run (16-bit)
MS-Windows and Macintosh software, respectively. A free emulator for MS-DOS
allows one to run DOS applications, and a free Windows emulator is in the
works.
For scientific text processing, TeX and LaTeX, the choice for many
researchers, are of course available. Other text processing tools that are
available are groff (for the typesetting macros used in manual pages and most
of the original Unix documentation), ez (from the Andrew project at CMU),
LyX (a LaTeX WYSIWYG system), Lout and SGML (used for this document via the
linuxdoc variant).
A great many editors have been developed for and under Unix. They can be
broadly classified into three camps. First, there are the emacs (or
emacs-like) editors such as GNU Emacs, XEmacs (formerly known as Lucid
Emacs), MicroEmacs, jove, and jed. Second, there are vi editors such as nvi
(port of the BSD vi), elvis (GNU vi), vim and vile. Finally, there are
editors that fall into neither of the above camps, such as ae, axe, ee, fte,
joe, pico, ez, epoch, sam, sex, wily and xcoral.
A comprehensive list of commercial applications is provided at
http://www.linux.org.uk/LxCommercial.html and in the Commercial
HOWTO. Scientific applications for Linux, both free and commercial, are listed
in the excellent and fully searchable page at http://sal.kachinatech.com which
features over 1150 entries. Releases of Linux software are commonly announced
on the moderated newsgroup comp.os.linux.announce (which also has an archive)
and are entered into the searchable Linux Software Map.
The applications listed in the previous section show that Linux can be used
for a wide variety of tasks. Linux, as a Unix system, is even more suited
for programming and development work. There are free compilers or
interpreters for well-known programming languages such as
- Ada (gnat)
- APL (j1)
- Assembler (as, nasm)
- Basic (ybasic)
- C (gcc)
- C++ (g++)
- Fortran (f2c, g77, ratfor77)
- Forth (gforth, yforth)
- Java (guavac, jdk, kaffe)
- Lisp (gcl)
- Logo (ucblogo)
- Pascal (gpc, p2c)
- Prolog (swi-prolog)
- Scheme (bigloo, elk, rscheme, scm, slib).
Commercial compilers are also available; most noteworthy is the
NAG Fortran 90 compiler, a variant of which
can be used for distributed work on Linux workstation clusters.
Furthermore, as many current computer science research projects are (at least
partially) implemented using Linux, newer languages are available such as
- guile
- icon
- ilu
- intercal
- ocaml
- mercury.
Naturally, all of the classic Unix development tools are available:
- awk (gawk, nawk)
- lex (flex)
- perl
- python
- sed
- tcl/tk
- yacc (bison).
Moreover, excellent editors (see the previous section), debuggers
and development tools such as
- cvs
- checker
- ddd
- electric fence
- gdb
- gprof
- make
- rcs
make for a superb programming and development environment. Unix was
developed by programmers for programmers, and it remains the environment of
choice for most research-oriented programmers.
It can be argued that the real strength of Unix is in networking. As we
already pointed out above, the Internet itself is built using Unix
machines. This means that each and every protocol used on the Internet is
implemented for Unix servers and clients.
Linux fits right in there. To the best of my knowledge, each and every
protocol of TCP/IP networking has been implemented for Linux and can be
served by a Linux machine for a network:
- automounter (amd,automount)
- domain name servers
- firewall
- file transfer (ftp)
- internet relay chat
- news
- mail
- network information service (nis)
- network file systems (nfs)
- network printer
- network time (xntp)
- proxy server
- routers
- telnet
- world wide web.
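All of the services listed above rest on the same client and server pattern over TCP/IP. As a small illustration, and not a description of any particular daemon named in the list, the following Python sketch starts a one-shot echo server on the loopback interface and connects to it as a client. The function name `echo_once` and the choice of an echo service are assumptions made for this example only.

```python
import socket
import threading


def echo_once(message, host="127.0.0.1"):
    """Run a one-shot TCP echo server on loopback and send it `message`."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the kernel pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        # Accept a single client, read until it stops sending, echo it back.
        conn, _ = server.accept()
        with conn:
            chunks = []
            while True:
                chunk = conn.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)
            conn.sendall(b"".join(chunks))

    worker = threading.Thread(target=serve)
    worker.start()

    reply = b""
    with socket.create_connection((host, port)) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(echo_once(b"hello from a Linux client"))
```

Server and client run in one process here purely for brevity; on a real network they would be separate programs on separate machines, with the server bound to a well-known port.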
Moreover, the networking code is extremely reliable. Rumour has it that a
Linux box running a 0.99 release is still running with an uptime of well
over 600 days. And since the 1.3 kernels, Linux has had some of the fastest
known networking implementations (as shown by benchmarks comparing Linux on a
Sparc to SunOS and Solaris on the exact same hardware).
Another strong point of Linux is that it integrates very well into
heterogeneous environments. Standard networking protocols included in the
stable kernels include TCP, IPv4, IPX, DDP, and AX.25. Whatever operating
system the other machines in your LAN have, Linux will probably be able to
run either a server or a client to mount filesystems or share files and
printers. Examples include
- Appletalk server
- Lan Manager (SMB) client and server (samba)
- Netware client and server.
Both Usenet news and mail are supported by several programs:
- News transport via cnews, leafnode, inews, innd, suck
- News readers such as gnus, knews, trn, nn, slrn, tin
- Mail transport agents such as smail, sendmail, qmail and exim
- Mail readers such as af, elm, exmh, mh, mutt, pine, vm, xmailtool, xmh
- Mail utilities such as berolist, mailagent, majordomo, mpack,
metamail, procmail, smartlist
- Mail POP servers and clients such as cucipop, imapd, qpopper, fetchmail.
Finally, another strong point of Linux is telecommunication. Linux provides
Taylor uucp, slip, cslip, ppp, isdn, kermit and szrz. Also, several terminal
emulators such as minicom and seyon, several fax and voice-mail packages
and several BBS packages are available.
As the ongoing development of Linux itself is undertaken almost exclusively
via the Internet, it is fitting that a large number of resources can be found
on the Net, and the World Wide Web in particular. There are hundreds of
sites that gather information on Linux, including many that are aimed
specifically at academic and scientific purposes. Several dozen mailing lists
are available, both for beginners and developers, as well as about ten Usenet
newsgroups that are exclusively devoted to Linux.
A canonical starting point is the Linux Documentation Project as it lists
hundreds of other resources. The Linux Documentation Project also coordinates
several books such as Installation and Getting Started, the Linux Kernel
Hackers' Guide, the Linux Network Administrators' Guide, the Linux
Programmers' Guide, and the Linux System Administrators' Guide. As these
books are released under the GPL, they can be downloaded freely along with
their sources. Installation and Getting Started has been published by O'Reilly
in a slightly augmented version under the title Running Linux and can be
recommended highly. The Linux Network Administrators' Guide is also available
from O'Reilly.
Moreover, the Linux Documentation Project coordinates a series of
HOWTOs and mini-HOWTOs. By now, over 70 HOWTOs and over 90 mini-HOWTOs have been
written. They each address a particular topic, and while they vary greatly in
sophistication and detail, they can usually be relied upon as a succinct
and reliable source of first-hand information. Finally, the Linux
Documentation Project coordinates the Linux manual pages, a small series of
FAQs and the Linux Gazette, an on-line periodical.
There are two standard archives. The first, sunsite.unc.edu,
hosts all major distributions, applications and the complete kernel
sources. The second, tsx-11.mit.edu, is more devoted to current development
projects. Both machines have many mirror sites throughout the world.
Usenet carries ten different newsgroups in the
comp.os.linux.* hierarchy which are all fairly
high-traffic:
Searching these newsgroups with AltaVista or DejaNews is often an efficient
way to find answers to specific questions.
In print, besides the aforementioned Running Linux book by
Welsh and Kaufman, over 50 books have now been published and are
available in local bookstores. Also, the monthly Linux Journal magazine
provides good and detailed information on different aspects of Linux.
As for installing Linux, several Linux distributions are available
which are all described in detail in a special Distribution-HOWTO. The older
Slackware distribution used to be widespread, but currently the two dominant
Linux distributions are Red Hat and Debian. While Red Hat is a
commercial enterprise, it still has to abide by the licensing terms of the GNU
GPL and allow for free downloads of its product from ftp sites. Debian is
the only non-profit organization among the Linux distributions. It is
produced by 200 volunteers with a focus on both technical excellence and
stability. For those two reasons, it is, for example, used in the Space
Shuttle mission STS-94 that launches into orbit on the day of this
presentation.
Linux is a very powerful operating system. It can benefit computational
economists for a wide variety of tasks as it
- can run all applications that are common in research
- can be used to code in almost any programming language
- can handle every networking job as client or server
- can interoperate with almost every other operating system
- is very efficient in its use of resources, fast and reliable
- is a lot of fun to work with and learn about.
This paper, as well as my personal and professional use of Linux over the
last three years, has profited from the excellent information provided by
both the Linux Documentation Project, and the Linux HOWTOs in
particular. Thanks go also to the Linux community at large, and specifically
the other members of the Debian Project. It should be noted that the
information that is provided here about the Debian Project is not impartial
as I have been contributing to Debian as a package maintainer since July
1995.