SE Developer's Guide

Introduction

This document describes se from the point of view of the programmer who has the source code and wishes to make changes. The changes could be anything from minor cosmetic alterations to major porting onto new machines.

Familiarity with the C language is essential; familiarity with se from the user's point of view is also very important. Knowledge of C compilers, operating systems and programming tools is assumed.

The Distribution Disks

Se is distributed in many forms on many media. One of the most popular forms is on IBM PC disks, either in plain text files or in an ARC file. Atari ST computers can read the 3.5" IBM disks.

The VAX version is usually distributed on a TK50 tape cartridge. The Unix versions are often distributed on DC600 tape cartridges or via E-mail.

You Will Need

In order to recompile se you will need one of the following:

If you're trying to build any version of se apart from the PC and Unix versions, you'll probably need a great deal of patience and perseverance as well. The code worked on all the above hosts once, really it did!

In order to rebuild the se documentation, you will need, in addition, all of the following:
A copy of the PD M4 program.
A copy of fmt, the Software Tools Text Formatter.
A copy of pg, the paginator.
An fmt post-processor suitable for your printer: fx for the Epson FX, lj for the HP LaserJet Series II, or ps for PostScript.
If you have the LaTeX system, it is possible to post-process the fmt documents into a form that LaTeX will accept. The resulting document will look nicer than the fmt version, but then not everyone can run LaTeX. To do this you will also need a copy of the stream editor, sed.

It's also possible to post-process the fmt documents into HTML, which nowadays nearly everybody can use. The resulting documents look OK in any browser and can be viewed at the same time as running se in another window. At present, however, there are no hyperlinks within the documents. This feature is high on the list of things to add.

Of course, you will also need an editor, such as se itself!

SE Design Philosophy

Se was not designed to be an easy-to-use word processor for idiots. Its command language is very similar to the Unix ed line editor, which is famed for its terseness. It has been said that an ed command line resembles line noise; this of course assumes that the reader can remember using dial-up teletypes over 300 baud modems...

Unix users will know the vi screen editor. Although vi has a command line, it starts up in full-screen mode. Unfortunately, there is no way of knowing from looking at the screen whether one is in “append” mode or “command” mode. There is no status line, and line numbers have to be turned on with an option.

Some favour the emacs approach: no command line, everything done with control keys and function keys. One of the great advantages of emacs is its ability to redefine almost any command key. However, emacs cannot do any of the tricks se can do with “global on pattern” or “global on markname” commands. This can result in laborious editing sessions doing repetitive changes throughout a file.

Se attempts to be friendlier than vi while retaining the power of a command-line driven editor. The screen layout includes a status line, so that the user can see what state se is in at a glance. Line numbers are displayed for reference against, say, a compiler error message. On machines with colour displays, se supports a fully configurable colour option. On machines with a mouse, it supports the mouse. But if you really want to do a substitute command on all lines that begin with a dot and don't contain the letter “A”, you can.

Better still, se is free and comes with full source code. It is portable between widely differing machines, so once you've learned to use it, you can take it with you to another system. It has full documentation, so users can't complain (but probably will).

OK, now for the bad news. On micros, se is best used with a hard disk — although it will work on floppies if you're desperate. It needs at least half a meg of memory to run in (the more the better). It takes time to learn how to use se.

If a choice has to be made between “easy to use” and “powerful”, se usually goes for “powerful”.

Structure of the Code

Se is as modular as possible, both for good structure and for ease of compilation. The following sections describe each of the source files that make up se.

main.c

This file contains all of the global data that se uses to store information about its option settings, the screen display and of course the text buffer itself.

The function ‘main’ sets up all the internal state that the editor needs. Certain modules have private data that must be initialised and functions are called to do this. On some systems, the terminal must be switched into an unbuffered mode (“cbreak” on Unix, for instance) and this is done by a function called from ‘main ()’. Once the terminal is set up, the text buffer is created with ‘mkbuf ()’ and then ‘edit ()’ is called. When the user decides to quit, ‘edit ()’ returns and the buffer is destroyed, the terminal is reset and se exits to the operating system.

edit.c

In ‘edit.c’ the major function is ‘edit ()’. It parses the command line and enters the main command loop. ‘getcmd ()’ is called to get a command line, followed by ‘docmd ()’ to execute it. Various other things are done once per iteration of the command loop: messages are polled by ‘mswait ()’, breaks are polled by ‘intrpt ()’ and the command prompt is displayed and erased by ‘prompt ()’.

docmd.c

Commands are parsed in ‘docmd.c’ by a function called ‘docmd (uchar lin[], int i, bool glob, int *status)’. The letters used for command names are defined in the header file ‘cmds.h’ and are included here.

docmd1.c

This file and one other, ‘docmd2.c’, contain most of the code to actually execute editor commands.

getcmd.c

‘getcmd.c’ contains just one major function, ‘getcmd ()’. A number of smaller functions deal with scanning the command line for tabs or for specific characters; they are declared static. An important static function here is ‘stuff_string ()’, which inserts strings into the command line in response to certain function tokens such as ‘curln’, ‘date’ and ‘filename’.

pat.c

‘pat.c’ contains the Software Tools Pattern Library. The preprocessor symbol ‘SWT’ is tested to build two different versions with different metacharacters. Unfortunately, this once-useful feature complicates not only the code but also the documentation; it adds to the testing problems and creates difficulties for the user when the compiled binary and the on-line help files get out of sync. Worse still is the confusion that results when a Unix user runs a SWT version of se without realising it. All in all, this compile-time option is more trouble than it's worth and will be removed from the code in the next version.

Most of the pattern-matching code is, in fact, a C translation of the original Software Tools Ratfor code. Virtually the only addition is the use of a global flag to control case-sensitive matching.

bind.c

The main purpose of this file is to manage key binding. All keystrokes are mapped into integer tokens by the machine-dependent video driver. The function of the bind module is to map single keystroke tokens into a stream of cursor editing functions and text.

A cursor editing function is represented by a token, a negative code that triggers an editing function (such as cursor motion). The user sets up mappings with the ‘ob’ option, which results in a call to ‘dobind ()’.
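
The mapping described above can be pictured as a table indexed by keystroke token, where each entry is a sequence of ints: negative values are cursor-function tokens and non-negative values are literal characters. This is a hypothetical sketch, not the code in ‘bind.c’: the names ‘Binding’, ‘bind_key ()’ and ‘expand_key ()’, the token values and the table sizes are all assumptions, and it assumes keystroke tokens fall in the range 0 to MAXKEY - 1.

```c
#include <stddef.h>
#include <string.h>

#define MAXKEY 512   /* assumed range of keystroke tokens */
#define MAXSEQ 32    /* assumed maximum length of one binding */

/* Hypothetical cursor-function tokens: negative codes, as in se. */
#define CURSOR_UP   (-1)
#define CURSOR_DOWN (-2)

/* One binding: a sequence of cursor-function tokens and characters. */
typedef struct {
    int seq[MAXSEQ];
    size_t len;      /* 0 means "unbound" */
} Binding;

static Binding Bindings[MAXKEY];

/* Bind keystroke token 'key' to 'n' tokens.
 * Returns 0 on success, -1 if the key or the length is out of range. */
int bind_key(int key, const int *tokens, size_t n)
{
    if (key < 0 || key >= MAXKEY || n > MAXSEQ)
        return -1;
    memcpy(Bindings[key].seq, tokens, n * sizeof tokens[0]);
    Bindings[key].len = n;
    return 0;
}

/* Expand one keystroke into 'out' (room for MAXSEQ ints) and return
 * the number of tokens produced.  An unbound key expands to itself. */
size_t expand_key(int key, int *out)
{
    if (key >= 0 && key < MAXKEY && Bindings[key].len > 0) {
        memcpy(out, Bindings[key].seq, Bindings[key].len * sizeof out[0]);
        return Bindings[key].len;
    }
    out[0] = key;
    return 1;
}
```

For example, binding a function-key token to { CURSOR_DOWN, CURSOR_DOWN, 'x' } makes that key move down two lines and type an “x”; the editing loop then simply consumes the expanded stream as if the user had typed it.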

doopt.c

Options are parsed and set by ‘doopt.c’. The file ‘cmds.h’ is again included here to define the option letters. The main function is ‘doopt (uchar lin[], int *i)’, which is simply a big switch statement.

Other, static, functions in this file deal with more complex options such as colour and tab settings.

screen.c

Most of the code to generate the screen display is in this file, although some of it is in ‘display.c’.

It calls other routines in the video driver to do the actual output. All the code to recognise reserved words is also in this module: the functions ‘kwlookup ()’, ‘addkw ()’ and ‘clearkws ()’.
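
A minimal sketch of a reserved-word table in the spirit of ‘kwlookup ()’, ‘addkw ()’ and ‘clearkws ()’. The signatures, the fixed-size table and the linear search are assumptions made for illustration; the real module may use a hash table or different arguments.

```c
#include <stdlib.h>
#include <string.h>

#define MAXKWS 256   /* assumed capacity of the keyword table */

static char *Keywords[MAXKWS];
static int Nkws = 0;

/* Add a reserved word; returns 0 on success, -1 if full or out of memory. */
int addkw(const char *word)
{
    char *copy;

    if (Nkws >= MAXKWS)
        return -1;
    copy = malloc(strlen(word) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, word);
    Keywords[Nkws++] = copy;
    return 0;
}

/* Return 1 if the first 'len' characters of 'text' form a reserved word. */
int kwlookup(const char *text, size_t len)
{
    int i;

    for (i = 0; i < Nkws; i++)
        if (strlen(Keywords[i]) == len && strncmp(Keywords[i], text, len) == 0)
            return 1;
    return 0;
}

/* Forget all reserved words, e.g. before loading a new set. */
void clearkws(void)
{
    while (Nkws > 0)
        free(Keywords[--Nkws]);
}
```

Passing a pointer and a length to ‘kwlookup ()’ lets the display code test a word in the middle of a text line without copying it into a separate buffer first.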

display.c

The four prompt strings are kept in static arrays in this file.

The function ‘watch (void)’, which displays the time on the status line, is also in this module, but the code to read the OS clock is not; that is in ‘os.c’.

The function ‘scroll_window’ is used by the code for command mode and overlay mode to handle vertical motion of the cursor. Cursor functions such as ‘page_up’ and ‘scroll_up’ are implemented here.

The minor function ‘hwinsdel’ just deals with the hardware line insert/delete flag.

scratch.c

The text buffer structure is maintained by code in the files ‘scratch.c’ and ‘buffer.c’. Two different organisations are possible for the line descriptors, depending on the setting of the ‘OLD_SCRATCH’ macro. If ‘OLD_SCRATCH’ is defined, se maintains a linked list of line descriptors; if it is undefined, a simple array is used.

‘scratch.c’ also maintains an integer variable called ‘Scratch’, which is initially set to MEMORY. If ‘Scratch’ is set to DISK, the text is held in a temporary file instead of in memory obtained from ‘malloc ()’. The function ‘create_scratch (void)’ is called to switch over from memory-based to disk-based text. The third value for ‘Scratch’, EMS, is only used with the brain-damaged MS-DOS operating system on the PC. It allows EMS memory to be used for storing the text while the LINEDESC structures are kept in normal RAM. An EMS allocation library is in ‘emslib.c’.
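
The difference between the two descriptor layouts can be sketched as one accessor with two implementations. This is a hypothetical illustration only: the fields of ‘LINEDESC’, the names ‘append_line ()’, ‘get_line ()’, ‘Buf’ and ‘Lastln’, and the fixed capacity are all assumptions, and for brevity both layouts draw their descriptors from a static array. The real descriptors also record where each line's text lives (memory, disk or EMS) according to ‘Scratch’.

```c
#include <stddef.h>

/* Where line text is stored; mirrors the three values of 'Scratch'. */
enum { MEMORY, DISK, EMS };

#ifdef OLD_SCRATCH
/* Linked list: each descriptor points to the next line. */
typedef struct linedesc {
    struct linedesc *next;
    const char *text;        /* assumed: in-memory text only */
} LINEDESC;
#else
/* Array: line n is simply element n - 1. */
typedef struct {
    const char *text;        /* assumed: in-memory text only */
} LINEDESC;
#endif

#define MAXLINES 1000        /* assumed fixed capacity for the sketch */

static LINEDESC Buf[MAXLINES];
static int Lastln = 0;       /* number of lines in the buffer */
int Scratch = MEMORY;

/* Append a line of text; returns 0 on success, -1 if the buffer is full. */
int append_line(const char *text)
{
    if (Lastln >= MAXLINES)
        return -1;
    Buf[Lastln].text = text;
#ifdef OLD_SCRATCH
    Buf[Lastln].next = NULL;
    if (Lastln > 0)
        Buf[Lastln - 1].next = &Buf[Lastln];
#endif
    Lastln++;
    return 0;
}

/* Return the descriptor for line n (1-based), or NULL if out of range. */
LINEDESC *get_line(int n)
{
    if (n < 1 || n > Lastln)
        return NULL;
#ifdef OLD_SCRATCH
    {
        LINEDESC *p = &Buf[0];
        while (--n > 0)
            p = p->next;     /* walk the list: time grows with n */
        return p;
    }
#else
    return &Buf[n - 1];      /* direct index: constant time */
#endif
}
```

The array form makes random access to a line number cheap, which matters for commands that jump around the buffer; the list form makes inserting and deleting descriptors cheaper, since nothing has to be shifted.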

os.c

All the operating system dependent code is in this module except that which concerns the screen output. The file is big enough without the complexity of terminal drivers...

Two routines, ‘shell_open ()’ and ‘shell_close ()’, deal with reading or writing to shell commands. On Unix, this is done via a pipe; on MS-DOS a temporary file is used. Another function, ‘call_shell (const uchar *)’ handles shell escapes.

The function ‘getflen ()’ is used in just one place, to determine the size of the message file and therefore how much memory to allocate before reading it in.
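
A portable way to find the file size is to seek to the end and ask for the position. The sketch below assumes a signature of ‘getflen (const char *)’ returning a ‘long’; the real function's arguments are not documented here. It relies on ‘fseek ()’ with SEEK_END on a binary stream, which works on all mainstream systems although ISO C does not strictly guarantee it.

```c
#include <stdio.h>

/* Return the length in bytes of the named file, or -1 on error.
 * Opening in binary mode makes ftell () return a true byte count;
 * on a text stream its value is only meaningful to fseek (). */
long getflen(const char *name)
{
    FILE *fp = fopen(name, "rb");
    long len;

    if (fp == NULL)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1L;
    }
    len = ftell(fp);
    fclose(fp);
    return len;
}
```

The result is a safe upper bound for the allocation even if the message file is then read in text mode: on MS-DOS each CR/LF pair shrinks to a single newline when read, so fewer characters arrive than the file size suggests. Allocate one extra byte for a terminating NUL.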

The functions ‘getreadonly ()’ and ‘setreadonly ()’ are used to read and set a file's read-only flag, respectively.

If the symbol ‘LOG_USAGE’ is defined, the function ‘log_usage ()’ is compiled in and called whenever se starts up. Its purpose is to write a line into a log file that identifies the date and time when the editor was used, the version that was used and the name of the user. This log is only really useful on multi-user systems.
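
A minimal sketch of such a logging function, assuming a signature of ‘log_usage (const char *logname, const char *version)’ and a simple one-line format; both are illustrative, not the real interface. On Unix the user name is taken from the USER or LOGNAME environment variable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Append one line "date time version user" to the log file 'logname'.
 * Returns 0 on success, -1 if the log cannot be written.  A failure
 * here should never stop the editor from starting. */
int log_usage(const char *logname, const char *version)
{
    FILE *fp;
    time_t now = time(NULL);
    struct tm *tp = localtime(&now);
    char stamp[32];
    const char *user = getenv("USER");

    if (user == NULL)
        user = getenv("LOGNAME");
    if (user == NULL)
        user = "unknown";
    if (tp == NULL || strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M", tp) == 0)
        return -1;

    fp = fopen(logname, "a");     /* append, creating the file if needed */
    if (fp == NULL)
        return -1;
    fprintf(fp, "%s %s %s\n", stamp, version, user);
    return fclose(fp) == 0 ? 0 : -1;
}
```

Opening the log in append mode for each entry keeps the file closed while se runs, so several users can start the editor without holding the log open.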

filecmds.c

Commands that read the buffer from files or write it to files are handled here. ‘filecmds.c’ also contains Unix code to open a pipe to crypt, the Unix data encryption program. Unfortunately, crypt has no equivalent on other systems and, besides, the US government won't let anyone have crypt anyway.

dowind.c

Second window commands are parsed and executed in ‘dowind.c’.

global.c

Global prefixes are parsed and executed in ‘global.c’. You will also find the global prefix characters included from ‘cmds.h’ here.

misccmds.c

Miscellaneous commands, those prefixed with ‘z’, are parsed and executed in ‘misccmds.c’. At present there are only a few, such as ‘zb’ and ‘zv’.

markcmds.c

Commands dealing with marknames are in ‘markcmds.c’. These are ‘k’ and ‘n’.

misc.c

Several miscellaneous support routines are in ‘misc.c’. In particular, if a certain function that se requires is missing from the C library on some systems, it is usually implemented here. ‘strcmpi ()’ is a good example.
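
For a C library that lacks ‘strcmpi ()’, a replacement along these lines is enough. This sketch uses plain ‘char’ pointers; se's own declaration may use ‘uchar’ instead. Many libraries offer the same function under other names, such as ‘stricmp ()’ on MS-DOS compilers or the POSIX ‘strcasecmp ()’.

```c
#include <ctype.h>

/* Case-insensitive string comparison.
 * Returns <0, 0 or >0, like strcmp (). */
int strcmpi(const char *s1, const char *s2)
{
    int c1, c2;

    do {
        /* Cast to unsigned char: passing a negative char value
         * to tolower () is undefined behaviour. */
        c1 = tolower((unsigned char) *s1++);
        c2 = tolower((unsigned char) *s2++);
    } while (c1 == c2 && c1 != '\0');
    return c1 - c2;
}
```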

The Machine Dependent Video Drivers

All the remaining C files are Machine Dependent Video Drivers. They can be broadly divided into terminal drivers and memory-mapped display drivers.

The file ‘ibmpc.c’ drives various IBM PC video displays via a small assembler routine in ‘mvaddch.asm’. Some of these displays are particularly obscure...

The file ‘sirius.c’ deals with Bob Green's Sirius and also uses some assembler code, ‘sirtty.asm’. Anybody who still runs a Sirius is welcome to compile up the code and see if it still works.

The Atari ST driver currently drives the bit-mapped screen via the ROM-based VT-52 emulator; a more direct A-line driver might have been useful if A-line code hadn't been deprecated by Atari Corp. Well, the whole machine's deprecated now...

On a VAX running VMS, ‘vaxtty.c’ drives the terminal via the ‘SMG$’ library.

On Unix systems, the driver file is ‘uxterm.c’. This driver uses either the ‘termcap’ library or the ‘terminfo’ library according to the flavour of Unix that you're running. Linux is just another flavour of Unix, as far as this driver is concerned. Minix V1.50 systems also have ‘termcap’, so this file suits them too.

An additional driver, ‘hardterm.c’, is provided for those rare systems that require control sequences for terminals but provide no terminal-handling library. A notable example is the ICL PERQ running certain versions of PNX.

Initialisation and Termination

Several functions in the video driver deal with setting things up and closing them down again. One reason for the apparent over-design here is that certain video modules require extra set-up steps and corresponding close-down steps. An example is the X windows driver, which allocates various data structures at start-up and must free them again cleanly on exit.

Displaying Text

Two functions deal with displaying text at particular row and column coordinates on the screen. The function ‘load ()’ places a single character at a given coordinate and in a given screen zone. It is used to draw the “bar” at the left margin and also for writing the status line. Most of the other parts of the display are drawn with ‘loadstr ()’, which places a string of characters on the screen.
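
The relationship between ‘load ()’, ‘loadstr ()’ and the private screen image can be sketched as follows. The argument order, the ‘Cell’ type, the integer zone and the 25 by 80 screen are assumptions for illustration; a real driver would also send each character to the terminal or to video memory, which is omitted here. The read-back routine ‘mvinch ()’ is included to show why the image is kept.

```c
#define ROWS 25   /* assumed screen size */
#define COLS 80

/* One screen cell: the character and the zone it was drawn in
 * (the zone selects the colour used for that part of the screen). */
typedef struct {
    char ch;
    int zone;
} Cell;

static Cell Screen_image[ROWS][COLS];

/* Place one character at (row, col) in the given zone. */
void load(char ch, int row, int col, int zone)
{
    if (row < 0 || row >= ROWS || col < 0 || col >= COLS)
        return;                     /* clip off-screen output */
    Screen_image[row][col].ch = ch;
    Screen_image[row][col].zone = zone;
}

/* Place a string starting at (row, col), stopping at the right margin.
 * Returns the number of characters placed. */
int loadstr(const char *str, int row, int col, int zone)
{
    int n = 0;

    while (str[n] != '\0' && col + n < COLS) {
        load(str[n], row, col + n, zone);
        n++;
    }
    return n;
}

/* Read back the character at (row, col), as mvinch () does. */
char mvinch(int row, int col)
{
    return Screen_image[row][col].ch;
}
```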

Controlling Colours

Data Abstraction

Se holds a copy of the video screen in memory, conventionally in an array called ‘Screen_image’. This data structure is declared within the video driver and is private, i.e. declared static.

Certain routines in the main part of se need to read back from the screen image and ‘mvinch ()’ is provided for this. Usually, it is simply implemented by a single line of code that returns a character from ‘Screen_image’.

The code that deals with the .RC file needs to know the name of the terminal and calls ‘term_name ()’ to get it from the video driver. Normally the terminal name is passed in as an argument to ‘set_term ()’; ‘term_name ()’ exists mainly for systems where the terminal type is fixed, such as the IBM PC, Atari ST and X windows. On such systems, the value passed into ‘set_term ()’ will be NULL and is ignored.

When se operates on a terminal that cannot do hardware line insert and delete, the behaviour of append mode is altered. This involves special code to copy a row of the screen from the bottom to the top of the text window. The copying is done by the driver function ‘cprow ()’, since only the video driver has access to the screen image.

Finally, the function ‘restore_screen ()’ is called in response to the ‘fix_screen’ cursor function. Its job is to rebuild the display from se's internal copy in ‘Screen_image’. The most efficient way to do this is to clear the screen and then redraw all the characters that are either non-blank or have non-default colours.
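
The redraw rule can be sketched as a loop over the screen image. The helpers ‘clear_physical_screen ()’ and ‘put_cell ()’, the ‘Cell’ type, ‘clear_image ()’ and the colour value 0 for “default” are all hypothetical; they are stubbed here (the stub counts cells) so that the sketch compiles and the saving can be measured.

```c
#define ROWS 25
#define COLS 80
#define DEFAULT_COLOUR 0   /* assumed colour index of a cleared cell */

typedef struct {
    char ch;
    int colour;
} Cell;

static Cell Screen_image[ROWS][COLS];

/* Hypothetical driver primitives, stubbed so the sketch compiles:
 * a real driver would send control sequences or write video memory. */
static int Cells_sent = 0;
static void clear_physical_screen(void) { }
static void put_cell(int row, int col, const Cell *c)
{
    (void) row; (void) col; (void) c;
    Cells_sent++;
}

/* Fill the image with blanks in the default colour, as after clrscreen (). */
void clear_image(void)
{
    int row, col;

    for (row = 0; row < ROWS; row++)
        for (col = 0; col < COLS; col++) {
            Screen_image[row][col].ch = ' ';
            Screen_image[row][col].colour = DEFAULT_COLOUR;
        }
}

/* Rebuild the display: clear the screen, then redraw only the cells
 * that a freshly cleared screen would not already show correctly. */
void restore_screen(void)
{
    int row, col;

    clear_physical_screen();
    for (row = 0; row < ROWS; row++)
        for (col = 0; col < COLS; col++) {
            const Cell *c = &Screen_image[row][col];
            if (c->ch != ' ' || c->colour != DEFAULT_COLOUR)
                put_cell(row, col, c);
        }
}
```

On a typical editing screen much of the window is blank, so skipping default blanks can cut the amount of output considerably on a slow serial line.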

Controlling the Cursor

The function ‘position_cursor ()’ is called by the main se code only when doing user input, i.e. in ‘getcmd ()’. On memory-mapped systems, the screen can be updated at any location without the need to move the cursor; on terminal-based systems, the video driver must call ‘position_cursor ()’ itself before sending characters to the terminal. The terminal-based code for ‘position_cursor ()’ is usually optimised to send as few characters as possible and therefore increase the speed of screen updates. The memory-mapped driver code need not be as efficient since all it does is reposition the cursor when the user moves it.

The ‘show_cursor ()’ function is currently unused but reserved for future expansion.

The ‘shape_cursor ()’ function is used to change the appearance of the cursor when in insert mode. Not many terminals support this, but it is useful as a hook for future use.

Screen Control Functions

One of the first things that se must do to the terminal is clear the screen. The function ‘clrscreen ()’ does this and also initialises the cursor position variables to the top left corner of the screen (row = 0, col = 0).

If the terminal is capable of performing line insert and delete, as mentioned above, two functions will be called whenever this operation is required. ‘inslines ()’ is called to insert lines and ‘dellines ()’ to delete them.

There are several routines in se that need to make an audible alert signal. This is generally referred to as “ringing the terminal's bell”, but nowadays is unlikely to involve a large brass bell being struck inside a teletype! Some computers can control the volume, pitch or duration of the bleep that they produce and ‘ringbell ()’ accepts a parameter that can be used to distinguish various classes of bell sounds.

Reading Keyboard and Mouse Input

A single function, ‘readkey ()’, performs all of se's input operations. In the case of simple character-based systems, this means reading ASCII codes from the terminal and mapping some of them into key tokens. On more complex systems, it may also involve scan-code translation and mouse button detection. The most complicated implementations of ‘readkey ()’ are complete event dispatch loops — this is the case for GEM and X windows.

Porting SE

Se can be made to operate with little more than the C Standard Library but for full functionality (shell escapes in particular) you will need a fairly rich set of operating system functions.

The obvious functions are ‘fopen ()’, ‘fclose ()’, ‘fgets ()’ and ‘fputs ()’. Hopefully your C library contains these! Se will open a file during ‘e’, ‘r’ and ‘w’ commands and close it as soon as it has been read or written. The scratch file is opened when se is invoked and remains open until either a garbage collect is done, a new file is edited or the user quits. The file in the second window is opened when the window is opened and remains open until the window is closed. If message polling is enabled, se may attempt to open a file after each command; if this takes significant time, conditionally compile out the code.

Se also needs to read characters from the keyboard with no echo and no waiting for a carriage return. In Unix, this is done by setting “cbreak” mode. On the IBM PC and Atari ST, a BIOS call is available that does the job. Some machines assume all input is buffered until return is pressed, while others throw away certain control characters whether you like it or not. This type of system may be a problem.
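
On modern Unix systems the effect of “cbreak” mode is obtained through the POSIX termios interface (older BSD systems used the sgtty ‘CBREAK’ flag instead). The function names ‘set_cbreak ()’ and ‘reset_terminal ()’ below are hypothetical; the termios calls and flags are standard POSIX.

```c
#include <termios.h>
#include <unistd.h>

/* Put terminal 'fd' into a cbreak-like mode: no line buffering, no echo,
 * and read () returns as soon as one character is available.  The old
 * settings are saved in *saved so they can be restored on exit.
 * Returns 0 on success, -1 if 'fd' is not a terminal or on error. */
int set_cbreak(int fd, struct termios *saved)
{
    struct termios t;

    if (tcgetattr(fd, saved) != 0)
        return -1;
    t = *saved;
    t.c_lflag &= ~(ICANON | ECHO);   /* no line editing, no echo */
    t.c_cc[VMIN] = 1;                /* wait for at least one byte */
    t.c_cc[VTIME] = 0;               /* with no inter-byte timeout */
    return tcsetattr(fd, TCSAFLUSH, &t) == 0 ? 0 : -1;
}

/* Restore the settings saved by set_cbreak (). */
int reset_terminal(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSAFLUSH, saved) == 0 ? 0 : -1;
}
```

Unlike full raw mode, this leaves ISIG set, as classic cbreak mode did, so the interrupt key still generates a signal. The saved settings must be restored on every exit path, or the user's shell is left without echo.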

Output to the screen is the most versatile part of se's links with the hardware. The interface is well-defined via a few function calls, all of which hide the implementation of the screen driver itself. See the next section for a description of the current video driver modules.

In “os.c” you will find a number of operating system routines which are conditionally compiled according to the target machine. The simpler functions do things like reading the system clock and polling for messages.

A function called “sysname (void)” returns a pointer to a string that gives the current machine name for the “l” command. Two functions, “isreadonly (const uchar *)” and “setreadonly (const uchar *, bool)”, deal with reading and setting a file's read-only flag, respectively. Most of these functions are usually available, but they can be faked if necessary.

‘call_shell ()’ is the real problem. By no means all systems can invoke a new level of shell, and even those that can may well not do it reliably. The Atari ST falls into the latter category, with a variety of shells available but little in the way of standardisation. The functions ‘shell_open ()’ and ‘shell_close ()’ allow se to read from or write into a pipe connected to another program. Again, not many systems can do this. The MS-DOS version fakes it by means of a temporary file.

If you cannot do shell escapes at all, arrange to conditionally compile out all references to the shell in the source code and in the documentation. If you can call the shell, but with limitations, make sure se gives a reasonable error message when something goes wrong and document the problem. Refer to the error codes ENOSHELL, ESHELLERR and ECANTFORK. If there is a strong chance of losing the buffer due to a problem with shell escapes, do not provide them.

The Video Drivers

Unix machines can be broadly divided into BSD and System V. In each case, there is a database of terminal types and their control sequences. Se looks up the current terminal in the database and behaves accordingly. All terminal output is buffered. Only primitive visual attributes are available as a substitute for colour. Unfortunately, not all terminals render the attributes in the same way, and which attributes are available depends on the terminal's entry in the database.

“ibmpc.c” consists of the C routines required to drive a Monochrome Display and Printer Adaptor (MDPA), a Colour Graphics Adaptor (CGA), an Enhanced Graphics Adaptor (EGA) or a Video Graphics Array (VGA). Each of these adaptors is driven in 80 column text mode — mode 7 on MDPAs, mode 3 otherwise. If the PC is found to be in a graphics mode when se is started, it is immediately put into mode 3; the graphics mode is restored on exit.

On the PC, either BIOS calls or direct memory access are used, depending on the setting of the boolean variable ‘Use_bios’. If the symbol “ODDVID” is defined, se knows about the Wyse 700 monochrome display and can drive it in 160 column, 50 row mode. In order to drive the Wyse, se has to use BIOS, so ‘Use_bios’ is set to YES. BIOS is also useful when running under Microsoft Windows, so setting the environment variable ‘SE_BIOS’ also sets the ‘Use_bios’ flag. Otherwise, all characters are placed directly into video memory by an assembler routine called “mvaddch ()”. The reason, of course, is speed of screen updates; se is usable even on a slow 8088 system.

For certain VGAs, it is possible to set a video mode that gives a 132 column display. In this case, a call to “setncols ()” sets the memory width to (132 * 2) bytes instead of (80 * 2). In this way, the low-level code is aware of the layout of the screen and we can still use the fast “mvaddch ()”.
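
In PC text modes each screen cell occupies two bytes of video memory, the character followed by its attribute byte, so the address of a cell depends on the row width. The sketch below shows that calculation; the names ‘cell_offset ()’ and ‘Ncols’, and the exact signature of ‘setncols ()’, are assumptions for illustration.

```c
/* Bytes per character cell in PC text modes: character, then attribute. */
#define BYTES_PER_CELL 2

static unsigned int Ncols = 80;   /* current screen width in columns */

/* Record the screen width, e.g. 132 after switching to a wide VGA mode. */
void setncols(unsigned int ncols)
{
    Ncols = ncols;
}

/* Byte offset of the cell at (row, col) from the start of video memory. */
unsigned long cell_offset(unsigned int row, unsigned int col)
{
    return ((unsigned long) row * Ncols + col) * BYTES_PER_CELL;
}
```

With 80 columns, row 1 starts 160 bytes into video memory; after ‘setncols (132)’ it starts at byte 264. Any routine that computes addresses this way keeps working in the wide mode as long as it reads the width from one place.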

If anyone adds non-IBM video adaptors to the IBM PC version of se, please make the code conditional on “ODDVID” so that the size of the EXE file is not increased by unwanted display driver code. When distributing binary copies of se try to give both variants: “ODDVID” and standard IBM.

On the Atari ST, BIOS is used for all screen output at present, although it may be modified to use A-line instructions instead. The ST has a VT52 emulator built-in which is the only way to do cursor motion from BIOS. Therefore, the “atari.c” file bears a remarkable similarity to the parts of “hardterm.c” that drive a VT52.
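
Cursor motion on a VT52, and on the ST's built-in emulator, uses the direct cursor address sequence ESC Y followed by the row and column, each offset by 32 so that they are printable characters. The function name ‘vt52_move ()’ and its buffer-based interface are hypothetical; only the escape sequence itself is standard.

```c
/* Build the VT52 direct cursor address sequence ESC Y <row+32> <col+32>
 * for a 0-based (row, col).  'buf' must hold at least 5 bytes.
 * Returns the length of the sequence, not counting the terminating NUL. */
int vt52_move(char *buf, int row, int col)
{
    buf[0] = '\033';              /* ESC */
    buf[1] = 'Y';
    buf[2] = (char) (row + 32);
    buf[3] = (char) (col + 32);
    buf[4] = '\0';
    return 4;
}
```

Because the offsets keep every byte printable, the sequence never contains a NUL and can be sent with an ordinary string output routine.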

The Sirius driver goes right ahead and reprograms the 6845 CRT controller. Cursor size and position are both controlled by 6845 registers on the Sirius. Like the IBM PC driver, an assembler routine is used for most of the speed-critical stuff. Code to perform line insert-delete is also coded in the assembler file “sirtty.asm”.

Error codes and messages

 0  SNULL         V2.08   /* Special null string (not in file) */
 1  EBACKWARD     Line numbers in backward order