CHAPTER 2 Installation of a Sketcher Web Service

The sketcher may be run in various modes. The exact mode of operation must be configured at set-up and cannot be changed without terminating the sketcher server(s) and reconfiguring the installation.

All interactive functionality of the sketcher is controlled by a single server program. This server may be run either as a stand-alone process or daemon, or as a standard FCGI application controlled by the main Web server.

Operation as Stand-Alone Process

Operation as a stand-alone daemon or process is primarily useful for debugging. In this mode the software can write trace output to standard output, can run under any developer account, and can easily be killed without root privileges or the permissions of the Web server process, so customizing the application is much more convenient.

The major disadvantage of running the sketcher as a separate process is that it must use a different port for communication with the client than the Web server that transmits the auxiliary HTML pages. This leads to cross-site JavaScript execution problems even if the main Web server and the sketcher process run on the same computer. Under a strict interpretation of JavaScript permission rules, JavaScript access across windows is disallowed when the window contents were loaded via different ports, even if the contents came from the same host. Browser developers disagree on whether different ports on the same host of origin should block cross-window scripting: MS Internet Explorer still allows it, while most other browsers block JavaScript access in this situation. In practice, development and debugging of new Sketcher features therefore requires Internet Explorer. Operation on other browsers can only be verified once the updated version has been re-configured to run as an FCGI.

Operation as FCGI Process

FCGI processes are controlled by a Web server, and both input and output are routed through the Web server so that all data transfer appears to involve only the Web server and its port. Therefore, there are no cross-window scripting problems.

To run the sketcher as an FCGI, the main Web server must be configured to support FCGI. For Apache, this means that the standard module mod_fcgi.so needs to be present and loaded. In addition, the sketcher installation directory must be set up in the Web server configuration to allow CGI execution. Currently, all sketcher components need to be served from a common installation directory. There is no provision for splitting the installation into the FCGI server component and supporting static files.
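As a rough aid for checking the first requirement, the following sketch filters a module listing for an FCGI module. The helper name has_fcgi_module and the matching pattern are assumptions made for illustration; the actual module name depends on the Apache build, and the pattern would also match unrelated modules that merely contain "fcgi" in their name.

```shell
# Hypothetical helper: succeeds if a module listing read from standard
# input mentions an FCGI or FastCGI module (case-insensitive match).
has_fcgi_module() {
    grep -Eqi 'f(ast)?cgi'
}

# On a host with Apache installed, the listing printed by "apachectl -M"
# could be piped into the helper, for example:
#   apachectl -M 2>/dev/null | has_fcgi_module && echo "FCGI module loaded"
```

On some distributions the control script is named differently, so the commented invocation may need to be adapted.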

The FCGI program is not small, and initialization after start-up takes roughly half a second. Since sketching events need to be processed much faster than this, running the application as a standard CGI, which is started anew for every processing request and then terminates, is not a viable option.

Storage of Structure Information

The sketcher needs to remember what a specific user is currently drawing. The client JavaScript code only reports drawing events and sends no information about the current structure. This information must therefore be stored somewhere between events.

The simplest method to store structure data is to keep it in the memory of the sketcher FCGI program. This mode is simple to use and probably the most useful for the majority of application scenarios. Since clients transmit a unique session ID, multiple users can use the same sketcher process (seemingly) simultaneously. The sketcher server can keep multiple structures in memory that are edited in different sessions by the same or different users. By correlating the session ID with its internal storage manager, the correct structure among the current memory set is selected and edited, and the results are sent back to the client. There is no hard-coded limit on the number of parallel editing sessions which can be maintained. On an average Web server host, a couple of dozen parallel editing sessions are easily feasible. As long as no events are received from a client, a session requires only a few kilobytes of memory and puts no further load on the server, because there is no computation or traffic associated with it.

In addition to internal memory storage, the sketcher application supports storage of state information in external storage managers. The default application script supports the NCBI NetCache network data caching system and the PubChem QueueManager queueing system. If the use of either of these systems is enabled in the sketcher configuration, structure state is swapped out to the external storage manager immediately after every operation, and all internally memorized state information is deleted. When a client event is received, the session ID is used to first re-load the current structure data before the actual processing of the event begins. The overhead for retrieving and storing structure state is small, just a few milliseconds.
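The load, process, and store cycle described above can be illustrated with a small sketch. It is not the sketcher's implementation: a temporary directory stands in for the external storage manager, the function names load_state, store_state and handle_event are hypothetical, and the "editing" step merely appends the event name to the stored state. Session IDs are assumed to be safe to use as file names.

```shell
# Directory standing in for the external storage manager (assumption).
STORE_DIR=${STORE_DIR:-$(mktemp -d)}

# Print the stored state for a session ID, or nothing for a new session.
load_state() {
    if [ -f "$STORE_DIR/$1" ]; then
        cat "$STORE_DIR/$1"
    fi
}

# Write the state (second argument) for a session ID (first argument).
store_state() {
    printf '%s' "$2" > "$STORE_DIR/$1"
}

# Process one event for one session without keeping anything in memory.
handle_event() {
    session=$1
    event=$2
    state=$(load_state "$session")     # re-load the current state
    state="${state}${event};"          # placeholder for the real edit
    store_state "$session" "$state"    # swap the state out again
    printf '%s\n' "$state"             # result returned to the client
}
```

Because each call reads and writes the shared store, consecutive events of one session could be handled by different processes, which is the property that external storage provides.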

The big advantage of external storage of structure state information is that multiple sketcher processes can operate in parallel, on one or more physical servers. This way, the software becomes both scalable and robust against failures of single servers. At PubChem, the site http://pubchem.ncbi.nlm.nih.gov/edit/ is actually served by two physical computers, named pubchem3 and pubchem4. Access to these two systems is dynamically load-balanced, and both servers run their own multiple sketcher FCGI instances. During the course of an editing session, the session is typically bounced between hosts multiple times, but in a completely transparent fashion and without losing any sketcher content, because every sketcher server process knows where to obtain the most current structure information for the next drawing event to be processed. The storage managers are themselves typically mirrored for performance and reliability.

This manual assumes that any external storage manager which is going to be used is already installed and configured properly.

If in-memory state storage is used, the Web server must be prevented from running multiple instances of the sketcher FCGI by appropriate main Web server configuration options. Since state is stored only in a single process and no inter-process communication is supported, a switch to a second FCGI process by the Web server will lead to a loss of the current structure information. Multiple FCGI instances may be run on a single machine if external state storage is used. This is primarily useful for multi-processor hosts.

Preparing for Installation

To prepare for installation, create an empty temporary staging directory. Download the Sketcher installation package, or obtain it in another way, and unpack it in this staging directory. The standard package format is a simple .tgz file, which can be unpacked with the standard tar utility.
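A minimal sketch of this step follows. The function name unpack_package, the package file name sketcher.tgz and the staging path in the usage comment are placeholders for illustration only; substitute the names of the package actually obtained.

```shell
# Hypothetical helper: unpack a .tgz package into an empty staging directory.
unpack_package() {
    package=$1
    staging=$2
    mkdir -p "$staging" || return 1
    # Refuse to unpack into a directory that already has content.
    if [ -n "$(ls -A "$staging")" ]; then
        echo "staging directory $staging is not empty" >&2
        return 1
    fi
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$package" -C "$staging"
}

# Example call with placeholder names:
#   unpack_package sketcher.tgz /tmp/sketcher-staging
```

The emptiness check is a design choice of this sketch, made so that files from an older package cannot get mixed into the staging directory.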

Configurable Template Directories

After unpacking a sketcher distribution in an installation directory, there are two subdirectories which contain data which may be customized.

The img directory contains small images used as buttons on the sketcher interface pages. Examples are element symbols and bond type icons. These images were not hand-drawn; they can be regenerated programmatically by running the mkicons.tcl script with a sketcher FCGI interpreter.

The mkicons.tcl script may be edited to use different fonts and backgrounds, etc. Normal installations will probably stick with the images in their standard supplied form.

The second directory named tpl contains script code and HTML templates which need to be adapted to set up the service in the current location. Files from this directory are processed by the supplied Makefile and stored in modified form in the target directory, overwriting any older processed interface files in that location.

Selecting a Script Interpreter

The standard sketcher package does not contain the script interpreter which runs the application script. For default installations, a suitable interpreter is the csweb executable found in academic and commercial CACTVS toolkit packages.

csweb

This is a standard CACTVS Toolkit interpreter, without any NCBI extensions. It will only work with the in-memory storage system for sketcher structure data. It will also not be able to import, for example, PubChem structures via CID reference or perform any other tasks which require interaction with NCBI-specific systems or encodings.

csweb_nlm_static

This is the default CACTVS Toolkit interpreter used in a variety of PubChem Web services. Its functionality is a superset of the standard csweb interpreter. It has support for interacting with the PubChem database, knows how to talk to the qman and netcache storage manager systems, supports PubChem ASN.1 encoding and decoding of structure data, and provides various other enhancements for operating in the PubChem environment.

To prepare for an installation, either copy the selected interpreter executable into the unpacked installation directory, or at least make sure that it is found in the standard path.

Configuring the Installation

The key to setting up a custom installation is the Makefile found in the staging directory. The first lines of the Makefile contain variable definitions which are used to process template files and update their internal references etc. so that the whole set-up works in the location of the target directory.

TARGETDIR

The full path name of the target directory for the installation, as seen in the file system. Usually it is a subdirectory of htdocs. An attempt will be made to create the directory if it is not yet present.

INTERPRETER

The name of the interpreter executable, without a path.

SERVER

Allowable values are standalone for the user process/daemon version, and fcgi for FCGI. A standard non-developer installation will use fcgi.

STANDALONEPORT

This variable is only used for the standalone version. It defines the port the sketcher process will be operating on. It must not collide with the port of any other network-based service on the system. In addition, using a port below 1024 will probably require special permissions.
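The port-range part of this rule can be checked with a small sketch like the one below. The function name is_unprivileged_port is hypothetical, and the check does not detect whether another service already uses the port.

```shell
# Hypothetical helper: succeeds if the argument is a port number in the
# range 1024-65535, which normally does not require special permissions.
is_unprivileged_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}

# Example:
#   is_unprivileged_port 8080 && echo "usable without special permissions"
```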

STORAGE

This is the selected storage system. It can be memory, netcache or qman. The netcache and qman options require that the respective service has been set up and is running.

CACHEPORT, CACHEHOST and CACHESERVICE

These parameters are of relevance only if the netcache storage system is used. Please refer to the netcache documentation for details.

HOST

This is the fully qualified name (computer name and domain) of the server host this application is being installed on, as visible from the outside Internet. If a virtual host is used which comprises more than one physical host, use the virtual host name.

DOMAIN

This variable defines the document domain the sketcher will be operating in. It can be set either to the same value as the HOST variable, or to any more generic Internet domain. For example, if the host is pubchem.ncbi.nlm.nih.gov, it may be set to the same value, or to ncbi.nlm.nih.gov, or to nlm.nih.gov. This parameter effectively controls from which sites the sketcher functionality can be used. If the fully qualified host name is used, only pages served from the same host can receive sketcher data in forms and other page elements. If a more generic domain is configured, any host which resides in that domain can use the sketcher. The problem of using document domains and its impact on Web pages wanting to receive sketcher data is discussed in Chapter 3 of this manual.
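The relationship between HOST and DOMAIN can be sketched as a simple suffix test. The function name domain_covers_host is hypothetical; the check only mirrors the rule stated above and is not taken from the sketcher Makefile.

```shell
# Hypothetical helper: succeeds if the domain (second argument) is either
# identical to the host (first argument) or a parent domain of it.
domain_covers_host() {
    host=$1
    domain=$2
    case $host in
        "$domain"|*."$domain") return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   domain_covers_host pubchem.ncbi.nlm.nih.gov nlm.nih.gov && echo "ok"
```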

WEBDIR

The name of the installation directory, without leading or trailing slashes, as seen from the Web server root. For example, if the installation directory is called /www/htdocs/edit in the local filesystem, it may appear as http://$(HOST)/edit on the Web. In that case, the Web directory name set here would just be edit.
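The example can be reproduced with a short sketch that derives the WEBDIR value from a file system path. It assumes that the Web server document root (here /www/htdocs) maps directly to the root of the site; the function name webdir_from_path is hypothetical.

```shell
# Hypothetical helper: print the path of the target directory relative to
# the document root, without leading or trailing slashes.
webdir_from_path() {
    docroot=${1%/}     # drop one trailing slash, if any
    target=${2%/}
    case $target in
        "$docroot"/*)
            printf '%s\n' "${target#"$docroot"/}" ;;
        *)
            echo "target directory is not below the document root" >&2
            return 1 ;;
    esac
}

# Example:
#   webdir_from_path /www/htdocs /www/htdocs/edit    # prints: edit
```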

TRANSFER

This variable controls which types of data the sketcher will attempt to send to an opener page after every editing operation that changed the structure. The mechanism used to receive sketcher data on form pages etc. is described in Chapter 3 of this manual. The configuration here only selects which data is made available; it does not require that any opening form page actually makes use of it. The parameter is expected to be a space-separated list of the desired transfer formats. The available formats include smiles (SMILES or SMARTS string, depending on sketcher window controls), sln (Sybyl line notation string), jme (JME Java molecule editor string), inchi (IUPAC InChI string), keys (NetCache session keys, only if the NetCache storage manager is used), sessionkey (the current session key), blob (CACTVS compressed serialized structure blob in base64 encoding), molfile (MDL Molfile image) and formula (molecular formula of the edited structure, with implied hydrogen).

CACTVS blobs and Molfile images can have a size of a few kilobytes each, while all other data is roughly 100 bytes per item. Since this data is transferred from the sketcher server to the client browser after every drawing event that modified the structure, transfer of items in these larger data formats can have a noticeable impact on sketcher usability over slower connections. They should therefore only be enabled if a format is actually needed by one of the Web applications receiving sketcher data from an installation. Use of lossless CACTVS blobs or MDL Molfile record images has a place, though, because there are, for example, query attributes which can be set in the sketcher and which cannot be transmitted in other formats.
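A candidate TRANSFER value can be checked before it is entered in the Makefile with a sketch like the following. The function name check_transfer is hypothetical, and the sketch assumes that the formats named above are the complete set, which the text does not guarantee.

```shell
# Hypothetical helper: validate a space-separated list of transfer formats.
# Fails on unknown names and prints a note for the two larger formats.
check_transfer() {
    status=0
    for fmt in $1; do          # intentional word splitting on spaces
        case $fmt in
            smiles|sln|jme|inchi|keys|sessionkey|formula)
                ;;
            blob|molfile)
                echo "note: $fmt can be a few kilobytes per event" >&2 ;;
            *)
                echo "unknown transfer format: $fmt" >&2
                status=1 ;;
        esac
    done
    return $status
}

# Example:
#   check_transfer "smiles inchi formula"
```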

DEBUG

Set this value to 1 if you want debug output on the processing of individual sketcher events on standard output. This option can only be used with the standalone server mode, because otherwise standard output is needed for data transfer with the main Web server.

IMGW and IMGH

These variables set the width and height of the template depictions in the template panel, not the size of the main sketcher drawing area. You probably do not want to change them unless you have replaced the default templates in file tpl/tpl.tpl with a set of significantly different size characteristics.

Performing an Installation

After the Makefile in the unpacked staging directory has been edited, perform the following steps:

In case any sketcher server processes are running that access the target directory, kill them. Killing Web server-controlled FCGI processes may require special permissions.

Execute the command

make install

in the staging directory. This will fetch a fresh set of configuration-dependent files from the tpl subdirectory, process them, and copy them to the installation directory. Any old files from earlier installs will be overwritten. Some static HTML files and the script interpreter will also be copied.

In case the operation mode is standalone, an executable called editsrv is assembled in the installation directory. It can either be started manually from there, or via some kind of custom /etc/init.d init script. In any case, the Web server will not be able to start it automatically when a sketcher page with dynamic content is loaded. To verify proper operation of the sketcher, the server process must be started manually before any sketcher pages are loaded.

In case the operation mode is fcgi, an executable called editsrv.fcgi is assembled in the installation directory. A properly configured Web server should start it automatically if a sketcher page with dynamic content is loaded. If this does not happen, consult the error log of the Web server for information about the problem. In case of an earlier start-up failure or crash due to a misconfiguration, many Web servers impose a blackout period of 10 minutes or more before they will attempt any additional automatic restart of an FCGI. In that case, even a corrected set-up may appear to fail for a period of time. If it is not a production system, restarting the main Web server may be an option.

Verify that the sketcher is working by loading index.html into a client Web browser. Use the Web path of the target directory. You can also simply use the target directory URL if index.html is configured in the Web server as the default directory view file. Next, try a few structure drawing operations. Any failure is likely to become immediately obvious as a broken image in the canvas drawing area. If a virtual server with multiple physical servers is used, it is rather unpredictable when any given physical server will be contacted. In that case we can only suggest a longer sketching test session. We have seen cases where a switch to a specific physical server with a problem only happened after a few minutes, and until that moment no problems were apparent because the faulty server was never contacted.