Enjoying a Free Lunch: Computational Economics with Linux

Dirk Eddelbüttel, edd@rosebud.sps.queensu.ca

Draft, June 1997


This is an expanded version of the talk given at the Third Conference Computing in Economics and Finance in Stanford, CA, on July 1, 1997. It describes the Linux operating system, both in general terms and more specifically how it can be a most valuable computing environment for economists and econometricians working in computational economics and finance.

1. Overview

Linux is a Unix operating system. It has been written entirely from scratch since 1991 by Linus Torvalds and many others in a decentralised effort which uses the Internet as the primary means of communication and dissemination. Linux contains no proprietary code and is distributed freely, together with all source code, under the terms of the GNU General Public License (or GPL for short). Linux is not the only free Unix system: the three BSD variants FreeBSD, NetBSD and OpenBSD are also free, but have not gained the same momentum as Linux.

Linux implements the POSIX operating system specification, along with extensions from the System V and BSD variants of Unix. This means it looks, feels and acts like Unix, but does not come from the same source code base --- and, therefore, is not restricted by the source licenses of the commercial Unices. Linux has all of the features one would expect in a modern Unix system, including multitasking, virtual memory, shared libraries, demand loading, efficient memory management and complete TCP/IP networking.

Further, Linux runs on a variety of hardware platforms. It was first developed on IBM PC compatibles with a 386 cpu, and PCs are still the most common hardware platform. Linux also runs on Motorola m68k platforms such as the Amiga and Atari computers (provided they have an MMU along with the m68k cpu, i.e. 68030/68040/68060). Linux has also been ported to Unix workstation platforms such as the Digital Alpha and Sun Sparc stations. Support for PowerPC, MIPS and ARM processors, as well as for Macs using the m68k, is currently under development.

Linux is powerful, fast (as shown by benchmarks comparing Linux on a Sparc to SunOS and Solaris on the exact same hardware) and extremely reliable. It can be employed for almost any computational purpose. As a true Unix system, Linux provides excellent multitasking and multiuser support. Moreover, the X11 windowing system, (La)TeX, all the networking standards and protocols around which the Internet is built, and a great number of powerful applications make Linux an excellent choice for both network server and client machines.

2. History

Linux is part of the evolving history of Unix systems. The brief outline of Unix history below shows that Unix itself is strongly linked to free source code and to licensing decisions.

1969

Ken Thompson wrote the first version of what was to be Unix on a PDP-7.

1970

Thompson and Dennis Ritchie ported this to the PDP-11; in the process, Ritchie wrote the first C compiler.

1973

Ritchie and Thompson rewrote the kernel in C.

1974-1977

ATT gave the Unix source code freely to US universities.

1978

ATT released Version 7 and started to charge license fees.

1979

In response to the ATT license fee, UC Berkeley created BSD. Most of the DARPA sponsored work involving the construction and expansion of the Internet was carried out on BSD, and commercial Unices such as SunOS and Ultrix (DEC) are largely based on BSD.

1983

ATT released System V. BSD released 4.2.

1987

ATT released System V, release 3, which is at the basis of HP-UX and AIX. BSD released 4.3.

1990

ATT released System V, release 4, as a new unifying Unix resulting from a Sun and ATT cooperation. DEC, HP and IBM created the Open Software Foundation.

1991

Work on the Unix clones Linux and FreeBSD began.

The starting point for Linux was Minix, a small Unix clone developed for teaching purposes by Andrew Tanenbaum. Minix is a complete rewrite of a core Unix system. While it is copyrighted by the publisher, it can still be used freely for education and research. This, and the low hardware requirements which allow it to run on Intel 80286 (or better) cpus, made Minix a very popular choice for universities since its introduction in the late 1980s. Linus Torvalds started to play with some task-switching features of the 386 cpu in protected-mode --- and recognised that he had started to write the core parts of a kernel. The development of Linux can be traced back as follows (see also an account by Linus and an interview with Linus about the early stages).

  • Linux 0.01, released in August 1991, contained a disk-driver and a small filesystem but could only be compiled under Minix and did not really do anything useful. However, it should be noted that this was, respectively, seven months after Linus obtained his first 386, and four months after he received the Minix floppies.
  • Linux 0.02, the first official version, was released on October 5, 1991 and already ran the GNU tools bash, gcc, make and sed (which others had ported).
  • Linux 0.10 was released in November 1991.
  • Linux 0.95 was released in March 1992; the jump in version number signalled that a stable 1.0 version would arrive `soon', but that took a lot longer and many 0.99 patchlevels. However, the very stable 0.99 kernels began to build a growing reputation for Linux.
  • Linux 1.0, released on March 14, 1994, formed the first official stable version. Its final release was 1.0.9 from April 17, 1994.
  • Linux 1.1 was developed from April 6, 1994 until March 4, 1995.
  • Linux 1.2, released March 7, 1995, included disk access speedups, TTY improvements, virtual memory enhancements, multiple platform support, quotas, and more. Its final version was 1.2.13 dated August 2, 1995.
  • Linux 1.3 was developed from June 12, 1995 until June 9, 1996.
  • Linux 2.0, released June 9, 1996, is the current stable version and has even more enhancements, including many performance improvements, several new networking protocols and one of the fastest TCP/IP implementations in the world. It supports symmetric multiprocessing and multiple hardware architectures. Even higher performance, more networking protocols, and more device drivers will be available in Linux 2.2.
  • The development of Linux 2.1 has been ongoing since September 30, 1996.
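The release list above follows a numbering convention: from 1.0 onwards, kernels with an even second number (1.0, 1.2, 2.0) are stable series, while an odd second number (1.1, 1.3, 2.1) marks a development series. A minimal Python sketch of that rule follows; the function names are hypothetical and chosen only for this illustration, and the rule is not meant to apply to the pre-1.0 releases.

```python
def parse_version(version):
    """Split a kernel version string such as '2.1.43' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_development(version):
    """Return True if the second (minor) number is odd, i.e. a development series.

    Assumption: the even/odd convention is only meaningful from 1.0 onwards.
    """
    parts = parse_version(version)
    if len(parts) < 2:
        raise ValueError("need at least major.minor, got %r" % version)
    return parts[1] % 2 == 1


if __name__ == "__main__":
    # Versions taken from the release list above.
    for v in ("1.0.9", "1.1", "1.2.13", "1.3", "2.0", "2.1.43"):
        kind = "development" if is_development(v) else "stable"
        print(v, kind)
```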

The mere size of the kernel releases speaks volumes: Linux 1.0 was shipped as a tar.gz file of 1 259 161 bytes. By 1.2, this had nearly doubled to 2 301 256 bytes. For the truly multiplatform Linux 2.0, the size had grown further to 5 843 677 bytes. The current `bleeding edge' version 2.1.43 comprises a staggering 8 128 328 bytes.
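To make the growth explicit, the following short Python sketch computes the growth factors implied by the byte counts quoted above. Only the four sizes come from the text; the variable and function names are illustrative.

```python
# Compressed source sizes in bytes, as quoted in the text.
KERNEL_SIZES = [
    ("1.0", 1259161),
    ("1.2", 2301256),
    ("2.0", 5843677),
    ("2.1.43", 8128328),
]


def growth_factors(sizes):
    """Return (older, newer, ratio) for each pair of consecutive releases."""
    return [
        (old_name, new_name, new_size / old_size)
        for (old_name, old_size), (new_name, new_size) in zip(sizes, sizes[1:])
    ]


if __name__ == "__main__":
    for old_name, new_name, ratio in growth_factors(KERNEL_SIZES):
        print("%s -> %s: factor %.2f" % (old_name, new_name, ratio))
    overall = KERNEL_SIZES[-1][1] / KERNEL_SIZES[0][1]
    print("1.0 -> 2.1.43 overall: factor %.2f" % overall)
```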

3. Features

A discussion of operating system features is necessarily of a technical nature. The following points summarise some of the main strengths of Linux.

  • Multitasking: several programs can run at once.
  • Multiuser: several users can work on the same machine at once.
  • Multiplatform: runs on many different cpu types.
  • Multiprocessor: SMP is available for Intel and SPARC cpus.
  • Memory protection: no process can influence other running processes by out-of-segment writes.
  • Demand loading: only those parts of an executable that are actually used are loaded.
  • Shared copy-on-write pages among executables: multiple processes can share the same memory; if one process tries to write to a jointly used segment, that page is copied to another location.
  • Virtual memory is implemented using paging to disk rather than swapping of whole processes; either a separate swap partition or a swap file, or both, can form swap space, and more can be added during runtime.
  • Unified memory pool which user programs and disk cache share so that all free memory can be used for caching.
  • Dynamically linked shared libraries.
  • Core dumps for post-mortem analysis.
  • Mostly source-level compatible with POSIX, System V, and BSD.
  • Mostly binary-level compatible with SCO, SVR3, and SVR4 through an iBCS2-compliant emulation module.
  • All source code is available, and freely distributable, including the whole kernel and all drivers, the development tools and many user programs; many commercial programs are being provided for Linux without source, but everything that has been free, including the entire base operating system, is still free and will remain free.
  • Multiple virtual consoles and several independent login sessions.
  • Very good filesystem support: many filesystems including minix, Xenix, and all the common System V filesystems are supported, and an advanced and extremely robust filesystem of its own offers filesystems of up to 4 TB; transparent access to MS-DOS partitions (or OS/2 FAT partitions); VFAT (WNT, Windows 95) support is available in Linux 2.0; the special UMSDOS filesystem allows Linux to be installed on a DOS filesystem; read-only HPFS-2 support for OS/2 2.1; HFS (Macintosh) filesystem support is available separately as a module.
  • CD-ROM filesystem which reads all standard formats of CD-ROMs.
  • Linux is very fast, and makes good use of hardware resources.
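The memory protection and copy-on-write points above can be observed from user space. The Python sketch below, which assumes a Unix system providing fork(), starts a child process that overwrites a value inherited from its parent and then checks that the parent's copy is unchanged. It only demonstrates the visible behaviour (each process sees its own memory after a write); it does not inspect how the kernel shares or copies pages. The function name is hypothetical.

```python
import os


def fork_and_modify():
    """Fork a child that overwrites inherited data; return (parent_view, child_view)."""
    data = ["parent value"]
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child process: it starts with the parent's memory contents, but any
        # write affects only the child's own copy.
        os.close(read_fd)
        data[0] = "child value"
        os.write(write_fd, data[0].encode())
        os.close(write_fd)
        os._exit(0)

    # Parent process: read what the child reports, then wait for it to exit.
    os.close(write_fd)
    child_view = os.read(read_fd, 1024).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data[0], child_view


if __name__ == "__main__":
    parent_view, child_view = fork_and_modify()
    print("parent sees:", parent_view)
    print("child saw:  ", child_view)
```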

4. Applications

Concerning standard packages, economists will appreciate the availability of many commonly used commercial programs such as:

Furthermore, excellent free (or at least free for academic use) programs are available for mathematical applications as for example,

More general commercial applications such as databases, spreadsheets, word processors (including WordPerfect) and publishing programs (including Corel Draw) complete the selection of available software. This makes Linux an excellent workstation for economists and econometricians. Programs written for other operating systems can also be used, at least on PCs running Linux. The free iBCS emulator allows one to run binaries for other Intel-based Unices derived from releases 3 and 4 of System V, such as SCO or Coherent. It also allows Linux/Sparc to run native Sparc programs. Commercial emulators such as WABI and Executor allow one to run (16-bit) MS-Windows and Macintosh software, respectively. A free emulator for MS-DOS allows one to run DOS applications, and a free Windows emulator is in the works.

For scientific text processing, TeX and LaTeX, the choice of many researchers, are of course available. Other available text processing tools are groff (for the typesetting macros used in manual pages and most of the original Unix documentation), ez (from the Andrew project at CMU), LyX (a LaTeX WYSIWYG system), Lout and SGML (used for this document via the linuxdoc variant).

A great many editors have been developed for and under Unix. They can be broadly classified into three camps. First, there are the emacs (or emacs-alike) editors such as GNU Emacs, XEmacs (formerly known as Lucid Emacs), MicroEmacs, jove, and jed. Second, there are vi editors such as nvi (port of the BSD vi), elvis (GNU vi), vim and vile. Finally, there are editors that fall into neither of the above camps, such as ae, axe, ee, fte, joe, pico, ez, epoch, sam, sex, wily and xcoral.

A comprehensive list of commercial applications is provided at http://www.linux.org.uk/LxCommercial.html and in the Commercial HOWTO. Scientific applications for Linux, both free and commercial, are listed in the excellent and fully searchable page at http://sal.kachinatech.com which features over 1150 entries. Releases of Linux software are commonly announced on the moderated newsgroup comp.os.linux.announce (which also has an archive) and are entered into the searchable Linux Software Map.

5. Programming

The applications listed in the previous section show that Linux can be used for a wide variety of tasks. Linux, as a Unix system, is even more suited for programming and development work. There are free compilers or interpreters for well-known programming languages such as

  • Ada (gnat)
  • APL (j1)
  • Assembler (as, nasm)
  • Basic (ybasic)
  • C (gcc)
  • C++ (g++)
  • Fortran (f2c, g77, ratfor77)
  • Forth (gforth, yforth)
  • Java (guavac, jdk, kaffe)
  • Lisp (gcl)
  • Logo (ucblogo)
  • Pascal (gpc, p2c)
  • Prolog (swi-prolog)
  • Scheme (bigloo, elk, rscheme, scm, slib).

Commercial compilers are also available; most noteworthy is the NAG Fortran 90 compiler, a variant of which can be used for distributed work on Linux workstation clusters.

Furthermore, as many current computer science research projects are (at least partially) implemented using Linux, newer languages are available such as

  • guile
  • icon
  • ilu
  • intercal
  • ocaml
  • mercury.

Naturally, all of the classic Unix development tools are available:

  • awk (gawk, nawk)
  • lex (flex)
  • perl
  • python
  • sed
  • tcl/tk
  • yacc (bison).

Moreover, excellent editors (see the previous section), debuggers and development tools such as

  • cvs
  • checker
  • ddd
  • electric fence
  • gdb
  • gprof
  • make
  • rcs

make for a superb programming and development environment. Unix was developed by programmers for programmers, and it remains the environment of choice for most research-oriented programmers.

6. Networking

It can be argued that the real strength of Unix is in networking. As we already pointed out above, the Internet itself is built using Unix machines. This means that each and every protocol used on the Internet is implemented for Unix servers and clients.

Linux fits right in there. To the best of my knowledge, each and every protocol of TCP/IP networking has been implemented for Linux and can be served by a Linux machine for a network:

  • automounter (amd,automount)
  • domain name servers
  • firewall
  • file transfer (ftp)
  • internet relay chat
  • news
  • mail
  • network information service (nis)
  • network file systems (nfs)
  • network printer
  • network time (xntp)
  • proxy server
  • routers
  • telnet
  • world wide web.
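Because every service in this list is built on the same TCP/IP socket interface, a single machine can act as server and client at once. The following Python sketch illustrates this with a one-shot echo service on the loopback address; it is a toy example under stated assumptions (short messages, one connection, hypothetical function names) and does not correspond to any of the server programs named above.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept a single connection and echo back what it sends (short messages only)."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_roundtrip(message):
    """Start a one-shot echo server on loopback, send `message`, return the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the kernel to pick any free port.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        port = server.getsockname()[1]

        worker = threading.Thread(target=serve_once, args=(server,))
        worker.start()

        # The same machine now acts as the client.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)

        worker.join()
    return reply


if __name__ == "__main__":
    print(echo_roundtrip(b"hello from the client"))
```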

Moreover, the networking code is extremely reliable. Rumour has it that a Linux box running a 0.99 release is still running with an uptime of well over 600 days. And since the 1.3 kernels, Linux has had one of the fastest known networking implementations, as shown by benchmarks comparing Linux on a Sparc to SunOS and Solaris on the exact same hardware.

Another strong point of Linux is that it integrates very well into heterogeneous environments. Standard networking protocols supported by the stable kernels include TCP, IPv4, IPX, DDP, and AX.25. Whatever operating system the other machines in your LAN run, Linux will probably be able to run either a server or a client to mount filesystems or share files and printers. Examples include

  • Appletalk server
  • Lan Manager (SMB) client and server (samba)
  • Netware client and server.

Both Usenet news and mail are supported by several programs:

  • News transport via cnews, leafnode, inews, innd, suck
  • News readers such as gnus, knews, trn, nn, slrn, tin
  • Mail transport agents such as smail, sendmail, qmail and exim
  • Mail readers such as af, elm, exmh, mh, mutt, pine, vm, xmailtool, xmh
  • Mail utilities such as berolist, mailagent, majordomo, mpack, metamail, procmail, smartlist
  • Mail POP servers and clients such as cucipop, imapd, qpopper, fetchmail.

Finally, another strong point of Linux is telecommunication. Linux provides Taylor uucp, slip, cslip, ppp, isdn, kermit and szrz. Also, several terminal emulators such as minicom and seyon, several fax and voice-mail packages and several bbs packages are available.

7. Documentation

As the ongoing development of Linux itself is undertaken almost exclusively via the Internet, it is fitting that a large number of resources can be found on the Net, and on the World Wide Web in particular. There are hundreds of sites that collect information on Linux, including many that are aimed specifically at academic and scientific purposes. Several dozen mailing lists are available, both for beginners and developers, as well as about ten Usenet newsgroups that are exclusively devoted to Linux.

A canonical starting point is the Linux Documentation Project as it lists hundreds of other resources. The Linux Documentation Project also coordinates several books such as Installation and Getting Started, the Linux Kernel Hackers' Guide, the Linux Network Administrators' Guide, the Linux Programmers' Guide, and the Linux System Administrators' Guide. As these books are released under the GPL, they can be downloaded freely along with their sources. Installation and Getting Started has been published by O'Reilly in a slightly augmented version under the title Running Linux and can be recommended highly. The Linux Network Administrators' Guide is also available from O'Reilly.

Moreover, the Linux Documentation Project coordinates a series of HOWTOs and mini-HOWTOs. By now, over 70 HOWTOs and over 90 mini-HOWTOs have been written. They each address a particular topic, and while they vary greatly in sophistication and detail, they can usually be relied upon as a succinct and reliable source of first-hand information. Finally, the Linux Documentation Project coordinates the Linux manual pages, a small series of FAQs and the Linux Gazette, an on-line periodical.

There are two standard archives. The first, sunsite.unc.edu, hosts all major distributions, applications and the complete kernel sources. The second, tsx-11.mit.edu, is more devoted to current development projects. Both machines have many mirror sites throughout the world.

Usenet carries ten different newsgroups in the comp.os.linux.* hierarchy which are all fairly high-traffic:

Searching these newsgroups with Altavista or DejaNews is often an efficient way to find answers to specific questions.

On paper, and other than the aforementioned Running Linux book by Welsh and Kaufman, a total of over 50 books have now been published and are available in local bookstores. Also, the monthly Linux Journal magazine provides good and detailed information on different aspects of Linux.

As for installations of Linux, several Linux distributions are available, all of which are described in detail in a special Distribution-HOWTO. The older Slackware distribution used to be widespread, but currently the two dominant Linux distributions are Red Hat and Debian. While Red Hat is a commercial enterprise, it still has to abide by the licensing terms of the GNU GPL and allow free downloads of its product from ftp sites. Debian is the only non-profit organization among the Linux distributions. It is produced by 200 volunteers with a focus on both technical excellence and stability. For those two reasons it is, for example, used in the Space Shuttle mission STS-94, which launches on the day of this presentation.

8. Summary

Linux is a very powerful operating system. It can benefit computational economists for a wide variety of tasks as it

  • can run all applications that are common in research
  • can be used to code in almost any programming language
  • can handle every networking job as client or server
  • can interoperate with almost every other operating system
  • is very efficient in its use of resources, fast and reliable
  • is a lot of fun to work with and learn about.

9. Acknowledgements

This paper, as well as my personal and professional use of Linux over the last three years, has profited from the excellent information provided by the Linux Documentation Project, and by the Linux HOWTOs in particular. Thanks also go to the Linux community at large, and specifically to the other members of the Debian Project. It should be noted that the information provided here about the Debian Project is not impartial, as I have been contributing to Debian as a package maintainer since July 1995.