
CHAPTER 2 Installation of a Sketcher Web Service


Theory of Operation

The sketcher may be run in various modes. The exact mode of operation must be configured at set-up and cannot be changed without terminating the sketcher server(s) and doing a re-configuration.

All interactive functionality of the sketcher is controlled by a single server program. This server may be run either as a stand-alone process or daemon, or as a standard FCGI application controlled by the main Web server.

Operation as Stand-Alone Process

Operation as a stand-alone daemon or process is primarily useful for debugging. In this mode the software can write trace output to standard output, it can run under any developer account, and it can easily be killed without root privileges or the permissions of the Web server process, so customizing the application is much more convenient. The major disadvantage of running the sketcher as a separate process is that it must use a different port for communication with the client than the Web server that delivers the auxiliary HTML pages. This leads to cross-site JavaScript execution problems even if the main Web server and the sketcher process run on the same computer. Under a strict interpretation of the JavaScript permission rules, scripting across windows is disallowed when the window contents were loaded via different ports, even if they came from the same host. Browser developers disagree on whether content from the same host but different ports should be blocked from cross-window scripting: MS Internet Explorer still allows it, while most other browsers block JavaScript access in this environment. In practice, this means that development and debugging of new sketcher features require Internet Explorer. Operation on other browsers can only be verified once the updated version has been re-configured to run as an FCGI.
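The port rule can be made concrete by reducing each URL to its origin (scheme, host, port) and comparing the results. The following is a minimal sketch under simplifying assumptions: the function names are hypothetical, and the parser ignores user info and IPv6 literals. It shows why pages from the main Web server and a stand-alone sketcher on another port count as different origins.

```shell
# Hypothetical helper: reduce a URL to its origin (scheme, host, port),
# filling in the default port for http and https. Simplified parser:
# it ignores user info ("user@host") and IPv6 literals.
origin() {
    scheme=$(printf '%s' "${1%%://*}" | tr '[:upper:]' '[:lower:]')
    rest=${1#*://}
    hostport=${rest%%/*}
    case $hostport in
        *:*) host=${hostport%:*}; port=${hostport##*:} ;;
        *)   host=$hostport
             case $scheme in
                 http)  port=80 ;;
                 https) port=443 ;;
                 *)     port= ;;
             esac ;;
    esac
    host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')
    printf '%s://%s:%s\n' "$scheme" "$host" "$port"
}

# Print "same" if two URLs share scheme, host and port, else "different".
same_origin() {
    if [ "$(origin "$1")" = "$(origin "$2")" ]; then
        echo same
    else
        echo different
    fi
}
```

For example, `same_origin http://www.example.org/edit/ http://www.example.org:8080/` prints `different`. The host name and port here are placeholders. A strict browser therefore blocks cross-window scripting between the two windows, even though both are served by the same machine.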

Operation as FCGI Process

FCGI processes are controlled by a Web server. Both input and output are routed through the Web server, so all data transfer appears to involve only the Web server and its port. Therefore, there are no cross-window scripting problems.

In order to run the sketcher as an FCGI, the main Web server must have been configured to support FCGI. For Apache, this means that standard module mod_fcgi.so needs to be present and loaded. In addition, the sketcher installation directory must be set up in the Web server configuration to allow CGI execution. Currently, all sketcher components need to be served from a common installation directory. There is no provision for a split into the FCGI server component and supporting static files.
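One way to check whether a FastCGI module is loaded is to list Apache's loaded modules with `apachectl -M` (or `httpd -M`). The sketch below reads that list on stdin and reports whether a FastCGI module appears in it. The function name is hypothetical. The module identifiers it matches are assumptions that may need adjusting for a given Apache build: `fcgid_module` is mod_fcgid, `fastcgi_module` is mod_fastcgi, and `fcgi_module` is only a guess for the mod_fcgi.so named above.

```shell
# Reads Apache's loaded-module list (as printed by "apachectl -M") on stdin
# and prints "yes" if a FastCGI module is listed, otherwise "no".
# fcgid_module = mod_fcgid, fastcgi_module = mod_fastcgi; fcgi_module is an
# assumed name for the mod_fcgi.so mentioned in the text.
has_fcgi_module() {
    if grep -Eq '^[[:space:]]*(fcgid|fastcgi|fcgi)_module[[:space:]]'; then
        echo yes
    else
        echo no
    fi
}
```

Typical usage would be `apachectl -M 2>/dev/null | has_fcgi_module`. This check covers module loading only. Permission to execute CGI programs in the sketcher installation directory must still be verified in the Web server configuration.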

The FCGI program is fairly large, and initialization after start-up takes roughly half a second. Since sketching events need to be processed much faster than this, running the application as a standard CGI, which is started anew for every request and terminates afterwards, is not a viable option.
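Expressed as per-event response time, a plain CGI pays the start-up cost on every event, while a persistent FCGI process pays it only once. The symbols below are introduced for illustration: $t_{\mathrm{start}}$ is the start-up time (roughly 0.5 s according to the estimate above) and $t_{\mathrm{proc}}$ is the time needed to process one drawing event.

```latex
t_{\mathrm{CGI}} = t_{\mathrm{start}} + t_{\mathrm{proc}} \ge t_{\mathrm{start}} \approx 0.5\,\mathrm{s},
\qquad
t_{\mathrm{FCGI}} = t_{\mathrm{proc}}
```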

Storage of Structure Information

The sketcher obviously needs to remember what a specific user is currently drawing. The client JavaScript code only reports drawing events, not the current structure, so this information must be stored somewhere between events.

The simplest method is to keep structure data in the memory of the sketcher FCGI program. This mode is easy to use and probably the most useful for the majority of application scenarios. Since clients transmit a unique session ID, multiple users can use the same sketcher process (seemingly) simultaneously. The sketcher server can keep multiple structures in memory that are being edited in different sessions by the same or different users. By looking up the session ID in its internal storage manager, the server selects the correct structure among those in memory, edits it, and sends the results back to the client. There is no hard-coded limit on the number of parallel editing sessions. On an average Web server host, a couple of dozen parallel editing sessions are easily feasible. As long as no events are received from a client, its session requires only a few kilobytes of memory and places no further load on the server, because no computation or traffic is associated with it.
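Keying structures by session ID can be illustrated with a minimal in-memory store held in a single shell variable, with one `SESSION_ID DATA` line per session. This is a sketch only. The function names are hypothetical, and structures are shown as SMILES strings purely for illustration. The sketcher's actual internal storage manager is not described in this manual.

```shell
# Hypothetical in-memory session store: one "SESSION_ID DATA" line per
# session, all kept in the shell variable STORE. Session IDs must not
# contain spaces; data must not contain newlines.
STORE=

# store_put SESSION_ID DATA: replace any existing entry for SESSION_ID.
# ($1 "") forces string comparison, so IDs like "1e5" are not read as numbers.
store_put() {
    STORE=$(printf '%s\n' "$STORE" | awk -v id="$1" 'NF && ($1 "") != (id "")')
    STORE=$(printf '%s\n%s %s\n' "$STORE" "$1" "$2" | awk 'NF')
}

# store_get SESSION_ID: print the stored data, or nothing if unknown.
store_get() {
    printf '%s\n' "$STORE" |
        awk -v id="$1" '($1 "") == (id "") { sub(/^[^ ]+ /, ""); print; exit }'
}

# store_count: number of sessions currently held in memory.
store_count() {
    printf '%s\n' "$STORE" | awk 'NF' | wc -l | tr -d ' '
}
```

As in the text, an idle session costs only its stored line. Nothing is computed for it until the next event with that session ID arrives.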

In addition to internal memory storage, the sketcher application supports storing state information in external storage managers. The default application script supports the NCBI NetCache network data caching system and the PubChem QueueManager queueing system. If either of these systems is enabled in the sketcher configuration, structure state is swapped out to the external storage manager immediately after every operation, and all internally memorized state information is deleted. When a client event is received, the session ID is used to re-load the current structure data before the actual processing of the event begins. The overhead for retrieving and storing structure state is small, only a few milliseconds.
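The swap-out/re-load cycle can be sketched as follows. A directory with one file per session serves as a hypothetical stand-in for NetCache or QueueManager, whose client interfaces are not shown here. `handle_event`, `STATE_DIR` and the "append the event to the state" processing step are all illustrative assumptions. Each event runs in its own subshell, so no state can survive in process memory between events.

```shell
# Hypothetical stand-in for an external storage manager: one file per
# session ID in $STATE_DIR (the real sketcher uses NetCache or QueueManager).
handle_event() {  # usage: handle_event SESSION_ID EVENT
    : "${STATE_DIR:?STATE_DIR must be set}"
    session=$1
    event=$2
    # 1. Re-load the current structure for this session (empty if new).
    structure=
    if [ -f "$STATE_DIR/$session" ]; then
        structure=$(cat "$STATE_DIR/$session")
    fi
    # 2. Process the event. Illustration only: append it to the state.
    structure="$structure$event;"
    # 3. Swap the state out (write a temp file, then rename it) and
    #    discard the in-process copy.
    printf '%s' "$structure" > "$STATE_DIR/$session.tmp" &&
        mv "$STATE_DIR/$session.tmp" "$STATE_DIR/$session"
    unset structure
}

# Demo: each event is handled in a separate subshell, mimicking
# independent sketcher processes that share only the external store.
STATE_DIR=$(mktemp -d)
( handle_event s1 'add:C' )
( handle_event s2 'add:N' )
( handle_event s1 'add:O' )
```

After the demo, the file for session `s1` contains `add:C;add:O;`, although the two events were handled by different processes. The write-then-rename step is a design choice of this sketch, not a documented property of the real storage managers. It keeps a concurrent reader from ever seeing a half-written state file.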

The big advantage of external storage of structure state is that multiple sketcher processes can then operate in parallel, on one or more physical servers. This makes the software both scalable and robust against failures of single servers. At PubChem, the site http://pubchem.ncbi.nlm.nih.gov/edit/ is actually served by two physical computers, named pubchem3 and pubchem4. Access to these two systems is dynamically load-balanced, and each server runs multiple sketcher FCGI instances of its own. During an editing session, the session is typically bounced between hosts multiple times. This happens completely transparently and without losing any sketcher content, because every sketcher server process knows where to obtain the most current structure information before processing the next drawing event. The storage managers themselves are typically mirrored for performance and reliability.

This manual assumes that any external storage manager which is going to be used is already installed and configured properly.

If in-memory state storage is used, the main Web server must be configured so that it does not run multiple instances of the sketcher FCGI. Since state is stored only in a single process and no inter-process communication is supported, a switch to a second FCGI process by the Web server leads to a loss of the current structure information. Multiple FCGI instances may be run on a single machine if external state storage is used. This is primarily useful on multi-processor hosts.
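The underlying reason is ordinary process isolation, which the following sketch demonstrates. Each `( ... )` subshell is a separate child process standing in for one FCGI instance, and a shell variable stands in for that process's in-memory structure store. The function name and values are illustrative only.

```shell
# Two requests for the same session, routed to two different processes.
simulate_two_processes() {
    structure='(empty)'
    # Request 1 reaches process A, which updates its own memory only.
    ( structure='C'; echo "process A sees: $structure" )
    # Request 2 reaches process B, which never saw that update.
    ( echo "process B sees: $structure" )
}
simulate_two_processes
# prints:
# process A sees: C
# process B sees: (empty)
```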

Preparing for Installation

To prepare for installation, create an empty temporary staging directory. Download the Sketcher installation package, or obtain it in another way, and unpack it in this staging directory. The standard package format is a simple .tgz file, so the commands should be something like

mkdir stage
cd stage
# decompress the .tgz package and unpack the tar archive it contains
gunzip < /my/path/edit.tgz | tar xvf -

After installation, the contents of the staging directory can be deleted.

Configurable Template Directories

After a sketcher distribution has been unpacked, two of its subdirectories contain data that may be customized.

The img directory contains small images used as buttons on the sketcher interface pages. Examples are element symbols and bond type icons. These images were not hand-drawn but can be regenerated programmatically by running the mkicons.tcl script with a sketcher FCGI interpreter, as in

csweb -f mkicons.tcl 

The mkicons.tcl script may be edited to use different fonts and backgrounds, etc. Normal installations will probably stick with the images in their standard supplied form.

The second directory, tpl, contains script code and HTML templates that need to be adapted to set up the service in its current location. Files from this directory are processed by the supplied Makefile and stored in modified form in the target directory, overwriting any previously processed interface files there.
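The template processing step can be pictured as placeholder substitution. The sketch below is a hypothetical illustration only. The placeholder names `@INSTALL_DIR@` and `@BASE_URL@` and the function names are invented, and the real Makefile's variables and substitution mechanism may differ. It copies every file from a template directory to a target directory, substituting values and overwriting older processed copies.

```shell
# Escape a value for the replacement side of a sed "s|...|...|g" command.
sed_escape() {
    printf '%s\n' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# process_templates SRC_DIR DEST_DIR INSTALL_DIR BASE_URL
# Copies every regular file from SRC_DIR to DEST_DIR, replacing the
# hypothetical placeholders @INSTALL_DIR@ and @BASE_URL@ and overwriting
# any previously processed copies.
process_templates() {
    src=$1
    dest=$2
    dir_esc=$(sed_escape "$3")
    url_esc=$(sed_escape "$4")
    mkdir -p "$dest" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        sed -e "s|@INSTALL_DIR@|$dir_esc|g" \
            -e "s|@BASE_URL@|$url_esc|g" \
            "$f" > "$dest/$(basename "$f")" || return 1
    done
}
```

Escaping the values matters because URLs often contain `&`, which sed would otherwise replace with the matched placeholder text.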

Selecting a Script Interpreter

The standard sketcher package does not contain the script interpreter which runs the application script. For default installations, a suitable interpreter is the csweb executable found in academic and commercial CACTVS toolkit packages.

There are two interpreters commonly used with this application:

To prepare for an installation, either copy the selected interpreter executable into the unpacked installation directory, or at least make sure that it can be found in the standard search path.
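A quick pre-installation check can look for the interpreter in both places. The sketch below is hypothetical. The function name is invented, and the rule that a copy in the installation directory takes precedence over one found in PATH is an assumption of this sketch, not a documented sketcher behavior.

```shell
# find_interpreter INSTALL_DIR [NAME]
# Prints the interpreter to use: a copy inside INSTALL_DIR takes precedence
# (an assumption of this sketch), otherwise the first NAME found in PATH.
# NAME defaults to csweb.
find_interpreter() {
    name=${2:-csweb}
    if [ -f "$1/$name" ] && [ -x "$1/$name" ]; then
        printf '%s\n' "$1/$name"
    elif command -v "$name" >/dev/null 2>&1; then
        command -v "$name"
    else
        echo "error: $name found neither in $1 nor in PATH" >&2
        return 1
    fi
}
```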

Configuring the Installation

The key to setting up a custom installation is the Makefile found in the staging directory. The first lines of the Makefile contain variable definitions which are used to process template files and update their internal references etc. so that the whole set-up works in the location of the target directory.

The following variables need to be set:

Performing an Installation

After the Makefile in the unpacked staging directory has been edited, perform the following steps:

Cleanup

After a successful installation, the staging directory can be deleted.

