CHAPTER 3 Linking the Sketcher to Web Pages

The Sketcher program was designed to be easy to use in a variety of Web linking scenarios, without the need to customize separate installations for each application, and with minimum impact on the design of Web pages making use of the sketcher.

General Integration Method

Usually, the sketcher is invoked by clicking a button which opens a separate sketcher window through a JavaScript open() call.

The default size of the sketcher window is 880 pixels wide by 440 pixels high. This displays well in standard browsers with default font settings. For the sake of users with non-standard font size settings, the subwindow should be opened as resizable.
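A minimal sketch of such an opening fragment, assuming the sketcher is installed at the relative path ../edit/index.html used in the sample routine later in this chapter. The names openSketcher, SKETCHER_URL and SKETCHER_FEATURES are illustrative and not part of the sketcher itself.

```javascript
// Window features: default sketcher size, resizable for unusual font sizes.
var SKETCHER_FEATURES =
  "height=440,width=880,scrollbars=no,status=yes,location=no," +
  "menubar=no,toolbar=no,resizable=yes";

// Assumed installation path, taken from the sample routine in this chapter.
var SKETCHER_URL = "../edit/index.html";

// Opens the sketcher subwindow. `win` is the browser window object;
// passing it in keeps the function usable outside a browser for testing.
function openSketcher(win) {
  return win.open(SKETCHER_URL, "editor", SKETCHER_FEATURES);
}
```

On a page, the function could be attached to a button, for example with onclick="openSketcher(window)".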

In theory, it is also possible to load the sketcher into a frame or iframe in the context of a larger window. However, the required window size of the sketcher is rather large, and pages which embed the editor thus tend to appear crowded.

Pre-loading Sketcher Content

The editor can be opened with initial content. This is done by appending CGI parameters to the URL in the opening statement. The following parameters are recognized:

smiles A SMILES string of a chemical structure. The structure will be decoded without implicit hydrogens. When providing SMILES and SMARTS strings as initialization data, look out for the # character which needs to be escaped in order to prevent it from being misread as a page location URL component.

smarts A SMARTS string of a query structure. The query will be fully decoded, but not all SMARTS query features can be displayed in the sketcher, and not all possible atom and bond query attributes can be edited. However, as long as atoms or bonds with extended SMARTS attributes are not touched, these extra attributes will be preserved and exported in suitable formats.

jme An editor string for the popular JME editor¹ applet.

sln A Sybyl line notation string of a query structure. Support for SLN query attributes is not complete.

minimol A CACTVS V2 base64-encoded minimol string

blob A CACTVS ensemble block in zlib-compressed base64-encoded format

key_cur An identifier of an existing NCBI Netcache session. This will only work with sketcher installations which use NCBI Netcache to store state. NCBI Queueman sessions cannot be resumed for technical reasons.

sid A PubChem database record identified by its Structure identifier.

cid A PubChem database record identified by its Compound identifier.

did A PubChem deposition system record identified by its Deposition identifier. Access will only succeed if a cookie in environment variable HTTP_COOKIE contains the session ID of a registered depositor and when that session ID grants access to the specific deposition identifier. It is not possible to view arbitrary deposition IDs.

qmid A PubChem queue manager ID. This identifier is only used for the purpose of revisiting a previous query on the original PubChem system.

vid A PubChem structure upload vetting ID. This is a space-separated list of a Queueman request ID and the name of a blob associated with that request. Please remember that in parameter URLs the space character separating these two parts must be escaped as a plus sign. The content of the blob is a binary, un-compressed CACTVS structure blob. In contrast to all other initial sketcher content identifiers, this parameter establishes a continuous database connection. Every change of the loaded structure is automatically and immediately reflected in the Queueman database. All other initialization data identifiers are read-only, i.e. the PubChem database is not updated when a pre-loaded CID is edited in the sketcher.

vhadd This parameter is only useful in combination with a VID identifier. It is a boolean value which determines whether the structure written back into the database blob identified by the VID parameter will undergo an automatic hydrogen addition step before it is stored or whether it is transmitted as it appears in the sketcher. The visible sketcher content is not changed regardless of the value of this parameter. If it is not set, the default value for this parameter is 0.

In case more than one of these parameters is specified, only the first one that is successfully decoded will be used. In case a preload identifier cannot be loaded, it is silently ignored.
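The parameter conventions above can be sketched as a small URL builder. This is an illustration only: the function name buildSketcherUrl, the request ID 12345 and the blob name myblob are hypothetical.

```javascript
// Builds a sketcher URL with preload parameters.
// `base` is the sketcher page URL; `params` maps parameter names
// (smiles, smarts, cid, vid, vhadd, ...) to string values.
function buildSketcherUrl(base, params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      // encodeURIComponent protects '#', '+', '&' and similar characters;
      // the escaped space (%20) is then rewritten as a plus sign.
      var value = encodeURIComponent(String(params[name])).replace(/%20/g, "+");
      parts.push(encodeURIComponent(name) + "=" + value);
    }
  }
  return parts.length ? base + "?" + parts.join("&") : base;
}

// A SMILES string containing '#':
var smilesUrl = buildSketcherUrl("../edit/index.html", { smiles: "C#N" });
// A vetting ID (request ID and blob name, separated by a space) with vhadd:
var vidUrl = buildSketcherUrl("../edit/index.html", { vid: "12345 myblob", vhadd: "1" });
```

Since only the first successfully decoded preload parameter is used, a caller would normally pass a single structure identifier, plus vhadd when a VID is given.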

It is common practice to construct the URL to open the sketcher window with contents of form fields and other dynamic data. Here is a sample JavaScript routine:

function startEditor()
{
	// Window features shared by all three cases
	var features = "height=440,width=880,scrollbars=no,status=yes,location=no,menubar=no,toolbar=no,resizable=yes";
	var form = document.forms["query"];
	var cid = form.elements["simple_cid"].value;
	var smarts = form.elements["simple_searchdata"].value;
	var query = "";
	if (cid != "") {
		// A CID in the form takes precedence over the SMARTS field
		query = "cid=" + encodeURIComponent(cid) + "&";
	} else if (smarts != "") {
		// Protect characters such as # which have special meaning in URLs
		query = "smarts=" + encodeURIComponent(smarts) + "&";
	}
	// eventcnt and editorwin are global variables maintained elsewhere on the page
	editorwin = open("../edit/index.html?" + query + "cnt=" + eventcnt, "editor", features, true);
}

Note the encoding step for the SMARTS value which is needed to protect characters such as # which frequently occur in SMARTS strings but have special meaning within URLs.
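The effect of the encoding step can be checked with the standard URL class. The host name sketcher.example.org below is a placeholder for illustration.

```javascript
// Hypothetical host name, for illustration only.
var base = "http://sketcher.example.org/edit/index.html";

// Unescaped: everything from '#' onwards is parsed as the page location
// fragment, so the sketcher would receive a truncated string.
var raw = new URL(base + "?smarts=C#N");
var rawValue = raw.searchParams.get("smarts");
var rawFragment = raw.hash;

// Escaped: the full string arrives as the parameter value.
var escaped = new URL(base + "?smarts=" + encodeURIComponent("C#N"));
var escapedValue = escaped.searchParams.get("smarts");
var escapedFragment = escaped.hash;
```

Here rawValue is only "C" and rawFragment is "#N", while escapedValue is the complete string "C#N" and escapedFragment is empty.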

Receiving Sketcher Data

The sketcher server transmits structure updates in a near-continuous fashion. There is no specific user act which initiates the transfer of sketcher content to a receiving form or other recipient.

At the installation time of a sketcher instance, the administrator selects a set of supported transfer formats. When a sketcher window connected to a server is running, it will attempt to transfer its current content in all installed formats to recipients. It does this by trying to call a sequence of JavaScript functions on the opener page, passing the current structure data in each of the installed encodings as argument. The names of these transfer functions are fixed. A page using the sketcher as data provider simply needs to define one or more of these functions in its <script> sections. Within these functions, page-specific custom JavaScript code can, for example, copy the received data to a form element and simultaneously perform any other related functions. The advantage of this generic mechanism is that the sketcher application does not need to know anything about the data destinations and storage mechanisms of any form it is sending data to.

It is neither required nor normal usage to write all possible transfer functions. If functions are missing, a failure to call them from the sketcher window is simply ignored.
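The calling convention described above can be pictured with the following sketch. It is not the sketcher's actual code; the helper names callIfPresent and broadcast are hypothetical and only illustrate how missing transfer functions can be skipped silently.

```javascript
// Calls opener[name] with the given arguments if such a function exists.
// Returns true if a call was made, false otherwise.
function callIfPresent(opener, name, args) {
  try {
    if (opener && typeof opener[name] === "function") {
      opener[name].apply(opener, args);
      return true;
    }
  } catch (e) {
    // A failing or blocked call is ignored, as is a missing function.
  }
  return false;
}

// Offers every installed encoding to the opener page.
// `encodings` maps transfer function names to the encoded structure data.
// Returns the names of the functions that were actually called.
function broadcast(opener, encodings) {
  var delivered = [];
  for (var name in encodings) {
    if (Object.prototype.hasOwnProperty.call(encodings, name) &&
        callIfPresent(opener, name, [encodings[name]])) {
      delivered.push(name);
    }
  }
  return delivered;
}
```

An opener page which defines only transferSmiles(s) would receive the SMILES data, and the attempts to call the other transfer functions would have no effect.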

A caller page must use the same document domain as the sketcher installation it is referring to. This is explained in more detail in the section on document domain issues below.

These are all possible transfer functions. A specific installation will usually only support a subset of these.

transferSmiles(s) Receive the SMILES or SMARTS encoding of the current editor content. The encoding style is controlled by the option menu for setting the style of the data display above the drawing area in the sketcher window.

transferBlob(s) Receive a CACTVS toolkit compressed base64-encoded serialized ensemble object. This is the only transfer mechanism guaranteed to be lossless, i.e. not to drop any query attributes or other structure attributes. On the other hand, these blobs are significantly larger than SMARTS strings and not conveniently displayed in an HTML form element. An alternative is to store the blob in a hidden form field, and to display the approximate SMARTS version in a visible form field. When the form is submitted, and the visible SMARTS text in the form has not been edited after it was received from the sketcher, the hidden blob can be transmitted as the actual query structure. One drawback is that this option roughly doubles the bandwidth requirements of the sketcher because of the frequent transfer of full blobs instead of short SMARTS strings. For this reason, some installations may not want to install this transfer option.

transferMolfile(s) Receive a string image of an MDL Molfile, possibly with ISIS query data. Since these Molfile images are comparatively large, this function is frequently disabled in Sketcher installations.

transferJme(s) Receive a JME editor string of the current editor content

transferInChI(s) Receive the InChI string of the current editor content

transferSLN(s) Receive the current editor content as Sybyl Line Notation string

transferMinimol(s) Receive a CACTVS V2 minimol in base64-encoding

transferFormula(s) Receive the molecular formula, including implicit H

transferKeys(k1,k2) Receive NCBI Netcache storage keys for the current and backup ensembles. This is only meaningful if the state storage mechanism of the sketcher installation uses the NCBI Netcache daemon. In that case, sessions can be resumed using these keys. This is described in the section on pre-loading the sketcher content.

transferSessionID(id) Receive the session ID.
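As an illustration of receiver functions on an opener page, the following sketch implements the hidden-blob approach described under transferBlob(s). The factory makeTransferHandlers, the method pickQuery and the field names smarts and blob are assumptions made for this example.

```javascript
// Creates transfer handlers bound to two form fields:
// fields.smarts (visible text field) and fields.blob (hidden field).
function makeTransferHandlers(fields) {
  var lastSmarts = null; // last SMARTS string received from the sketcher
  return {
    // Copies the SMILES/SMARTS string into the visible field.
    transferSmiles: function (s) {
      fields.smarts.value = s;
      lastSmarts = s;
    },
    // Stores the lossless blob in the hidden field.
    transferBlob: function (b) {
      fields.blob.value = b;
    },
    // At submit time: use the blob unless the user edited the SMARTS text.
    pickQuery: function () {
      var edited = fields.smarts.value !== lastSmarts;
      if (!edited && fields.blob.value) {
        return { format: "blob", data: fields.blob.value };
      }
      return { format: "smarts", data: fields.smarts.value };
    }
  };
}
```

On a real page the handlers would be published under the fixed names, for example by assigning handlers.transferSmiles to window.transferSmiles and handlers.transferBlob to window.transferBlob, with fields taken from document.forms.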

Document Domain Issues

The Internet domain any instance of the sketcher operates in is determined at installation time. In the simplest case, it is the same as that of the host it runs on, but it can be set to a more generic domain. For example, the sketcher may be installed on pubchem.ncbi.nlm.nih.gov, but it may be configured to run in domain nlm.nih.gov. A configured domain can only be a left-truncated part of the host domain, not an arbitrary domain, and must retain at least two levels. For example, a sketcher installed at pubchem.ncbi.nlm.nih.gov can be configured to run in domains pubchem.ncbi.nlm.nih.gov, ncbi.nlm.nih.gov, nlm.nih.gov or nih.gov, but nowhere else.
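The left-truncation rule can be expressed as a short function. The name allowedDomains is hypothetical; the function only restates the rule above and is not part of the sketcher installation.

```javascript
// Lists the domains a sketcher installed on `host` could be configured
// to run in: the host name itself and every left-truncated variant
// which still has at least two levels.
function allowedDomains(host) {
  var labels = host.split(".");
  var result = [];
  for (var i = 0; i + 2 <= labels.length; i++) {
    result.push(labels.slice(i).join("."));
  }
  return result;
}

var domains = allowedDomains("pubchem.ncbi.nlm.nih.gov");
```

For the host in the example, domains contains pubchem.ncbi.nlm.nih.gov, ncbi.nlm.nih.gov, nlm.nih.gov and nih.gov, in that order.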

It is important to know the domain a referenced sketcher is operating in, because the calling Web page must have its document domain set exactly to the same domain. This is because the sketcher needs to be able to call the data transfer functions in the opener window. If this window is in a different domain (even a more specialized domain of the sketcher domain), the function call will be blocked by the Web browser cross-site scripting security mechanisms, and no data from the sketcher ever arrives at the opener window.

The document domain of an HTML page is set by a simple JavaScript assignment to document.domain, which should be executed directly at load time in the <script> section, or in an onload()-handler function attached to the <body> tag.
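For a sketcher running in the nlm.nih.gov domain, as in the example of this chapter, the statement is the assignment document.domain = "nlm.nih.gov". The sketch below wraps it in a hypothetical helper, setSketcherDomain, which additionally reports whether the browser accepted the assignment.

```javascript
// Sets the document domain of `doc` to `domain`.
// Browsers raise an exception if the page is not allowed to adopt the
// requested domain; in that case false is returned.
function setSketcherDomain(doc, domain) {
  try {
    doc.domain = domain;
    return true;
  } catch (e) {
    return false;
  }
}
```

On a page, the helper could be invoked as setSketcherDomain(document, "nlm.nih.gov"), either directly in the script section or from the onload handler.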

The domain modification only succeeds if the document domain of the page is already the target domain (nlm.nih.gov in this example), or a more specialized variant, such as cis.ncbi.nlm.nih.gov. This also means that no Web service outside the nlm.nih.gov domain (assuming the sketcher instance is installed there) can make use of the data transfer functionality of the sketcher, because it will not be able to set its document domain to the required value.
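The success condition can be checked in advance with a simple string comparison. The function name canSetDomain is an assumption for this sketch, and the check only approximates the rule stated above; the browser remains the final authority.

```javascript
// Returns true if a page served from `hostname` may adopt `target` as its
// document domain: either the names are identical, or `hostname` is a more
// specialized variant ending in "." + target.
function canSetDomain(hostname, target) {
  return hostname === target ||
    hostname.slice(-(target.length + 1)) === "." + target;
}
```

For example, a page on cis.ncbi.nlm.nih.gov passes the check for nlm.nih.gov, while a page whose host name is seen as plain cis, or a page in an unrelated domain, does not.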

However, even from within a qualified domain, an opener page does not necessarily immediately possess a document domain which allows the domain adjustment to succeed. Depending on Web server configuration, the implicit domain may not have been resolved into a fully qualified host name. For example, if an HTML page which wants to link to a sketcher instance is called as http://cis/myapp/form.html, the page domain as seen from JavaScript on that page may simply be cis, even if the cis computer is in the nlm.nih.gov domain and its fully qualified name is cis.ncbi.nlm.nih.gov. In that case the domain setting statement will still fail, and the editor will not be able to transmit data to that page.

In order to solve this problem, a sketcher-dependent Web page should always check its own document domain and, if necessary, reload itself under a fully qualified domain name. A simple function using location.replace() is sufficient for this.

This function should only be called as an onload()-handler attached to the <body> tag. Direct execution in the script section of the page is not safe. Many older Web browsers tend to crash or lock up if a location.replace() is executed while the original page is still loading.
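A minimal sketch of such a check-and-reload function follows. It assumes that the page author knows the fully qualified host name of the server (cis.ncbi.nlm.nih.gov, taken from the example above); the names QUALIFIED_HOST, qualifiedUrl and checkDomain are hypothetical.

```javascript
// Assumption: the fully qualified host name is known to the page author.
var QUALIFIED_HOST = "cis.ncbi.nlm.nih.gov";

// Returns the URL the page should be reloaded under, or null if the page
// was already called with the fully qualified host name.
function qualifiedUrl(loc, qualifiedHost) {
  if (loc.hostname === qualifiedHost) {
    return null;
  }
  return loc.protocol + "//" + qualifiedHost +
    (loc.port ? ":" + loc.port : "") +
    loc.pathname + loc.search + loc.hash;
}

// Reloads the page under the qualified host name if necessary.
// Returns true if no reload was needed.
function checkDomain(loc) {
  var url = qualifiedUrl(loc, QUALIFIED_HOST);
  if (url !== null) {
    loc.replace(url);
  }
  return url === null;
}
```

The function would be attached to the page body as onload="checkDomain(window.location)". Because the reloaded page carries the qualified host name, the check passes on the second load and no reload loop occurs.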

The caller document domain should be kept constant during the time the editor window is open. The callback functions are called every time the editor content changes. If the document domain has been further modified since the window was opened, data transfer can fail mysteriously in the midst of an editing session because the license to call the transfer functions in the opener page has been lost.