Mumps/MDH Toolkit
MDH: The Multi-Dimensional and Hierarchical
Database Toolkit Programmer's Guide
Version 2.1

Kevin C. O'Kane, Ph.D.
Computer Science Department
University of Northern Iowa
Cedar Falls, IA 50614
okane@cs.uni.edu
http://www.cs.uni.edu/~okane
February 22, 2007

Except as otherwise noted, this document is Copyright (c) 2004, 2006 Kevin C. O'Kane, Ph.D.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being: Page 1, with the Front-Cover Texts being: Page 1, and with the Back-Cover Texts being: no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".


The software is distributed under one of the following licenses (please see each source code module for specific copyright and license details applicable to that module). In general, the compiler itself is distributed under the GNU GPL license and the run-time support routines are distributed under the GNU LGPL.

  1. GNU General Public License

    This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

  2. GNU Lesser General Public License

    This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Full texts of the licenses appear at the end of this document. Programs may call upon the Perl Compatible Regular Expression Library which, in some cases, is distributed with the Mumps Compiler. The separate license and copyright statement for PCRE appears in Appendix B. You should also read the license provided with the Berkeley Data Base (http://www.sleepycat.com).


Part I - Programmer's Guide

Software Distribution

Source code distributions are available at:

http://www.cs.uni.edu/~okane/source/

see also:

http://math-cs.cns.uni.edu/~okane/cgi-bin/newpres/m.compiler/compiler/index.cgi

Important Notes:

  1. Some older g++ compilers have errors in string processing libraries and may generate errors when compiling. This code was developed using g++ (GCC) 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk). Use of earlier compilers may cause problems.

  2. Likewise, recent versions of the g++ compiler have changed the manner in which string casting takes place. This affects the use of the assignment operator (=) to assign global array values to strings. Consequently, either (string &) or (char *) must be used when assigning from a global to a string. Example:

    global x("x");
    string z;
    z=(string &)x("1");
    z=(char *)x("1");

  3. Versions after April 17, 2004, changed some function call parameters, most notably xecute().

  4. Details on installation of the Toolkit for Linux, Windows XP and Cygwin are contained in the Mumps Compiler Manual (compiler.html), which is part of the distribution package, as well as in the operating-system-specific "INSTALL" files contained in the distribution.

  5. Stack size can be an issue for some functions, most notably the Smith-Waterman alignment procedure. The stack size is set for Windows XP programs in the file "mumpsc.bat" in the batch variable "STACK" which is set, by default, to 5,000,000. It may be raised (or lowered) as needed. The stack size is set for Linux with the "ulimit" command. It can be increased under Linux with the command:

    ulimit -s unlimited

    (Other options are ulimit -a and ulimit -aH to show limits).

Introduction

The MDH (Multi-Dimensional and Hierarchical) Database Toolkit is a Linux-based, open sourced, toolkit of portable software that supports very fast, flexible, multi-dimensional and hierarchical storage, retrieval and manipulation of information in data bases ranging in size up to 256 terabytes. The package is written in C and C++ and is available under the GNU GPL/LGPL licenses in source code form. The distribution kit contains demonstration implementations of network-capable, interactive text and sequence retrieval tools that function with very large genomic data bases and illustrate the toolkit's capability to manipulate massive data sets of genomic information.

The toolkit is distributed as part of the Mumps Compiler. Versions exist for Linux, Cygwin, the DJGPP port of the GCC compiler for Windows XP and the command line version of the Microsoft Visual C++ Compiler.

The toolkit is a solution to the problem of manipulating very large, character string indexed, multi-dimensional, sparse matrices. It is based on Mumps (also referred to as M), a general purpose programming language that originated in the mid 60's at the Massachusetts General Hospital. The toolkit supports access to the PostgreSQL relational data base server, the Perl Compatible Regular Expression Library, the Berkeley Data Base, and the Glade GUI builder as well as server-side development of interactive web pages.

The principal database feature in this project is the global array which permits direct, efficient manipulation of multi-dimensional arrays of effectively unlimited size. A global array is a persistent, sparse, undeclared, multi-dimensional, string-indexed, disk-based data structure. A global array may appear anywhere an ordinary array reference is permitted and data may be stored at leaf nodes as well as intermediate nodes in the data base array. The number of subscripts in an array reference is limited only by the total length of the array reference with all subscripts expanded to their string values. The toolkit includes several functions to traverse the data base and manipulate the arrays.

The toolkit makes the data base and function set available as C++ classes and also permits execution of legacy Mumps scripts. To use the toolkit, you install the MDH and Mumps distribution kit and related code.

You must also use a recent version of the g++ compiler. Many older versions do not include recent changes to the C preprocessor standard and will not work. The code presented here was compiled and tested using g++ version 3.2.2.

Creating Global Arrays

The class, function and macro libraries primarily operate on global arrays. Global arrays are undimensioned, string-indexed, disk-resident data structures whose size is limited only by available disk space. They can be viewed either as multi-dimensional sparse matrices or as tree structured hierarchies. Global arrays are a C++ class and must be declared or instantiated in your C++ program as instances of the global class. For example, to create the global named "gbl", do the following:

      #include <mumpsc/libmpscpp.h>
      global gbl("gbl");
The instantiation consists of two parts: the name of the global array object and the name of the global array on disk associated with this object. In the above example, these are both "gbl". Note that the disk name of the global is enclosed in a parenthesized character string expression following the object name. The name in the expression need not (but usually does) match the name of the object. The name given in the parenthesized character string is the disk name of the global array. The global array object is associated with the disk name when the object is created. When the object is destroyed, the disk based global array persists.

Global objects may be created through declarations as shown above or dynamically:

      global *gptr;
      gptr = new global ("gbl_name");
      (*gptr)("1","2","3") = "test";
which is equivalent to:
      global g("gbl_name");
      g("1","2","3") = "test";
The #include <mumpsc/libmpscpp.h> statement brings in the necessary header files for your C++ program. These include, in addition to the header files necessary to access the toolkit, the standard system libraries:

      #include <iostream>
      #include <string>
      #include <string.h>
      #include <math.h>
      #include <stdlib.h>

These are referenced at the beginning of libmpscpp.h and you may modify them if your system uses different naming conventions.

Each global declaration makes the declared name (gbl in the example above) an object, or instance, of the global class. Each global array you use must first be declared as an object of the global class.

You create a global by substituting the name of the global you want to create for "gbl" in the above. Global names can be any valid C/C++ variable name.

A global array will typically have one or more subscripts as discussed below. These will be of type mstring, string or "pointer to character" (examples: character arrays, character string constants, pointers to character strings). Subscripts of global arrays must evaluate to strings of printable characters in the range of decimal 32 (space) up to, but not including, tilde (~).

Note:

No data types other than mstring, string or pointer to character may be used as subscripts. Numeric data types (int, short, long, float, double, etc.) may not be used as global array subscripts.

For a given global array reference, all the subscripts must be of the same data type.

mstring is a data type (class) whose behavior is similar to the basic string data type in Mumps. Objects of type mstring are stored internally as strings but may contain text, integers and floating point values. Addition, multiplication, subtraction, division, modulo, and concatenation may be performed directly on mstring objects (see details below). Many of the following examples use mstring objects.

Structure of Global Arrays

Global arrays may be viewed either as multi-dimensional matrices or as tree structured hierarchies. As matrices, data may be stored not only at fully subscripted matrix elements but also at other levels. For example, given a three dimensional matrix mat1, you could initialize it as follows:

      #include <mumpsc/libmpscpp.h>

      global mat1("mat1");

      int main() {
      mstring i,j,k;
      for (i=0; i<100; i++)
            for (j=0; j<100; j++)
                  for (k=0; k<100; k++) {
                        mat1(i,j,k)=0;
                        }
      GlobalClose;
      return 0;
      }

Alternatively, the above can be performed with strings but the numeric indices must be integer and converted to mstring before use:

      #include <mumpsc/libmpscpp.h>

      global mat1("mat1");

      int main() {
      int i,j,k;
      for (i=0; i<100; i++)
            for (j=0; j<100; j++)
                  for (k=0; k<100; k++) {
                        mat1(mcvt(i),mcvt(j),mcvt(k))=0;
                        }
      GlobalClose;
      return 0;
      }

In this example, all the elements of a three dimensional matrix of 100 rows, 100 columns and 100 planes are initialized to zero. The function cvt() converts from int to mstring.

In the view expressed by the code above, the matrix is a traditional three dimensional structure with data stored at each fully indexed position or node.

Unlike other programming languages, however, there are additional nodes of the matrix which could have been initialized such as indicated by the following example:

      #include <mumpsc/libmpscpp.h>

      global mat1("mat1");

      int main() {
      mstring i,j,k;
      for (i=0; i<100; i++) {
            mat1(i)=i;
            for (j=0; j<100; j++) {
                  mat1(i,j)=j;
                  for (k=0; k<100; k++) {
                        mat1(i,j,k)=0;
                        }
                  }
            }
      return 0;
      }

In effect, this means that mat1 can also be a single dimensional vector, a two dimensional matrix and a three dimensional matrix simultaneously.

Furthermore, not all elements of a matrix need exist. That is, the matrix can be sparse. For example:

      #include <mumpsc/libmpscpp.h>

      global mat1("mat1");

      int main() {
      mstring i,j,k;
      for (i=0; i<100; i=i+10)
            for (j=0; j<100; j=j+10) {
                  for (k=0; k<100; k=k+10) {
                        mat1(i,j,k)=0;
                        }
                  }
      return 0;
      }

In the above, only index values 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90 are used to create each of the dimensions of the array and only those elements of the matrix are created. The omitted elements do not exist.
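
As a rough illustration of sparseness outside the toolkit, the following sketch stores the same subset of index values in an ordinary std::map keyed by three strings. The names Key3, SparseArray, build_sparse and exists are made up for this illustration and are not part of the MDH library; the sketch only shows that nodes which were never assigned simply do not exist.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical in-memory stand-in for a three-subscript global array.
// This is NOT the toolkit's global class, only an analogy.
using Key3 = std::tuple<std::string, std::string, std::string>;
using SparseArray = std::map<Key3, std::string>;

// Creates nodes only at index values 0, step, 2*step, ... below limit.
SparseArray build_sparse(int limit, int step) {
    SparseArray cells;
    for (int i = 0; i < limit; i += step)
        for (int j = 0; j < limit; j += step)
            for (int k = 0; k < limit; k += step)
                cells[Key3(std::to_string(i), std::to_string(j),
                           std::to_string(k))] = "0";
    return cells;
}

// True if a value has been stored at the given subscripts.
bool exists(const SparseArray& cells, const std::string& i,
            const std::string& j, const std::string& k) {
    return cells.count(Key3(i, j, k)) != 0;
}
```

With build_sparse(100, 10), only 1,000 of the 1,000,000 possible nodes are created: exists(cells, "10", "20", "30") is true while exists(cells, "1", "2", "3") is false.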

For example, if you are running a drug protocol on a number of patients and are dosing with medications M1, M2, M3, ... on patients P1, P2, P3, ... and collecting observations on days D1, D2, D3, ... you could create a three dimensional matrix named protocol in which each plane consisted of the observations for each patient on each medication for a given day:

         |       D1       |       D2       |       D3       |       D4       |
         | M1 M2 M3 M4 M5 | M1 M2 M3 M4 M5 | M1 M2 M3 M4 M5 | M1 M2 M3 M4 M5 |
      P1 |                |                |                |    X           |
      P2 |                |                |                |                |
      P3 |                |                |                |                |

You could refer to patient P1, medication M2 on day D4 with the reference:

protocol("P1","M2","D4")="X";

Alternatively, you can view the same data base as a tree structure with patient id at the root, followed by medication, followed by day of study.

Note that at each node in the tree, a data box may appear containing information about the node. Addressing a node is accomplished by giving its path description such as:

protocol("P2","M2","D2")

Compiling Programs

To compile programs written in C++ that use the MDH (Multi-Dimensional and Hierarchical) library, use the command:

      mumpsc myprog.cpp

This will invoke the g++ compiler and make available the necessary libraries. The result will be a program named myprog.cgi which is executable. The cgi extension is used as the default because very often these programs may be used in connection with web servers. You may rename the program as you see fit, however. The script mumpsc is part of the Mumps Compiler which must be installed prior to using the toolkit.

Accessing Global Arrays

Note: prior to exiting a program that accessed global arrays, you should execute the GlobalClose macro to shut down the global array facility. This flushes the system buffers to disk and ensures that the file system is properly closed. This appears in your program as:

GlobalClose;

There are several ways to insert and extract global array elements. They include:

  1. An overloaded form of the assignment operator;
  2. Functions applied to global class objects;
  3. An overloaded shift operator

You can create/modify elements of the global array using either the assignment or the shift operator. The indices of the global array may be specified as variables of type mstring, string, character string constants or pointers to character strings. The values stored at a global array node may be character string constants, pointers to strings, mstrings, strings, integers, other global arrays and floating point values. Examples (where "indx1" and "indx2" may be of either type mstring or string):

      global array1("array1");
      global global_array("global_array");

      mstring indx1="1";
      mstring indx2="2";
      mstring mstring_var="test";
      char * char_pointer="test";
      long long_variable=99;
      string string_variable="test";
      double double_variable=99.0;
      float float_variable=99.0;
      int int_variable=99;
      short short_variable=99;

      global_array("10")=99;

      array1("100") =        "character string";
      array1("101") =        mstring_var;
      array1(indx1) =        char_pointer;
      array1(indx1,"3") =    long_variable;
      array1(indx2,indx1) =  string_variable;
      array1("10","2","3") = double_variable;
      array1("10","2","4") = int_variable;
      array1("10","2","5") = global_array("10");

      array1("100")        << "character string";
      array1("101")        << mstring_var;
      array1(indx1)        << char_pointer;
      array1(indx1,"3")    << long_variable;
      array1(indx2,indx1)  << string_variable;
      array1("10","2","3") << double_variable;
      array1("10","2","4") << int_variable;
      array1("10","2","5") << global_array("1");

      mstring_var        = array1(indx1,"3");
      char_pointer       = array1(indx1,"3");
      string_variable    = (string &) array1(indx2,indx1);
      float_variable     = array1("10","2","3");
      int_variable       = array1("10","2","4");
      long_variable      = array1("10","2","5");
      short_variable     = array1("10");
      global_array("10") = array1("10");

      array1("100")        >> mstring_var;
      array1("100")        >> char_pointer;
      array1(indx1)        >> float_variable;
      array1(indx1,"3")    >> double_variable;
      array1(indx2,indx1)  >> string_variable;
      array1("10","2","3") >> int_variable;
      array1("10","2","4") >> long_variable;
      array1("10","2","5") >> global_array("1");
      array1("10","2","5") >> char_pointer;

Global arrays are sparse so not all elements need to exist. In the examples above, the lowest value of the first index of the global array is "10" but this does not imply that elements "1" through "9" exist.

The shift operator may only be used as shown above. It may not be used in multiple chained format as is the case with cin and cout. Internally, all data is stored at nodes in character string form. If you shift or assign a global array to a target whose data type is incompatible with the contents of the global array, for example, shifting text data into an integer variable, an error will result:

      int i;
      array1("100")="this is a test";
      i=array1("100");    // error - string cannot be converted to int
      array1("100") >> i; // error - string cannot be converted to int

Note: when assigning to a global from a pointer-to-character, the contents of the array pointed to by the pointer are copied to the global array whether you use the shift (<<) or assignment form (=). However, when you assign from a global to a pointer-to-character using the assignment operator, only the address of the character string is assigned to the pointer. The actual string is not copied and the pointer reference is valid only until the global array is referenced again. Instead, you should copy the contents of the character array to the target:

      char tmp[]="this is a test";
      array1("100")=tmp;        // works - char array is copied to global
      tmp=array1("100");        // error - attempt to alter value of pointer "tmp"
      strcpy(tmp,array1("100")); // works - global value copied to "tmp"

The above notes only apply to char arrays - not to string or mstring data:

      string tmp="this is a test";
      array1("100")=tmp;        // works - string is copied to global
      tmp=(string &)array1("100"); // works - global is copied to string
      strcpy(tmp,array1("100"));   // error - string variables may not be used with strcpy()

Alternatively, if you use the shift operator form of assignment, character strings are copied to the address specified by the contents of the target pointer:

      char tmp[]="this is a test";
      array1("100") << tmp;     // works - char array is copied to global
      array1("100") >> tmp;     // works - value of global copied to char array

The reason for the above is a restriction in the C++ language with regard to the overloaded assignment operator: it must be a member function of the class of its left hand operand. In order to bypass this for fundamental data types (int, float, etc.), we use an overloaded cast operator on the right hand side that converts the right hand side to a basic data type prior to non-overloaded assignment. Thus, in the case of character strings, only the pointer is assigned. If you use the assignment operator with a pointer to character, be aware that the pointer is only valid until the next access to the same global. After another access, the pointer is undefined. For other data types, the assignment is as expected.
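
The pointer lifetime issue can be pictured with a small stand-alone sketch. BufferedStore and fetch_copy below are hypothetical names invented for this illustration, and the class only imitates the described behavior (one internal buffer that is overwritten on each access); it is not how the toolkit is necessarily implemented.

```cpp
#include <string>

// Hypothetical class that hands out a pointer into a single internal
// buffer. The buffer is overwritten on every fetch, so an earlier pointer
// must not be used after a later fetch.
class BufferedStore {
    std::string buffer_;
public:
    // stored_value stands in for the value read from the data base.
    const char* fetch(const std::string& stored_value) {
        buffer_ = stored_value;     // previous contents are replaced
        return buffer_.c_str();     // only good until the next fetch
    }
};

// Safe pattern: copy the characters before touching the store again.
std::string fetch_copy(BufferedStore& store, const std::string& stored_value) {
    return std::string(store.fetch(stored_value));
}
```

Copying immediately, as fetch_copy does, plays the same role as the strcpy() call recommended above for char arrays.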

If a numeric value is stored in a global, it may be assigned to an appropriate numeric variable. The assignment or shift operator will convert the strings stored in the global to the appropriate numeric form. It is important, however, that the data stored in the array nodes conform to the numeric type requested. For example:

      global array1("array1");

      long x;
      double y;
      string z;

      array1("1","2","3") = "test string";
      array1("1","2","4") = "100";
      array1("1","2","5") = "100.123";

      x = array1("1","2","4");           // integer 100 assigned to x
      y = array1("1","2","4");           // 100 converted to double and assigned to y
      z = (string &)array1("1","2","4"); // character string "100" assigned to z
      x = array1("1","2","5");           // integer 100 assigned to x
      y = array1("1","2","5");           // 100.123 assigned to y
      x = array1("1","2","3");           // error - string cannot be converted to long

Alternatively, the following shift operator versions have the same effect:

      array1("1","2","3") >> z;  // character string copied to z
      array1("1","2","4") >> x;  // 100 stored in x
      array1("1","2","4") >> y;  // 100. stored in y
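
One plausible way to obtain the conversions listed in the comments above is to parse the leading numeric part of the stored string with the standard C library. The sketch below is an assumption about the behavior, not the toolkit's actual conversion code, and to_long and to_double are hypothetical helper names.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Parses the leading integer of text; throws if no digits can be read.
long to_long(const std::string& text) {
    char* end = nullptr;
    long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str())
        throw std::invalid_argument("not numeric: " + text);
    return value;
}

// Parses the leading floating point number of text; throws if none is found.
double to_double(const std::string& text) {
    char* end = nullptr;
    double value = std::strtod(text.c_str(), &end);
    if (end == text.c_str())
        throw std::invalid_argument("not numeric: " + text);
    return value;
}
```

Under these assumptions to_long("100.123") yields 100, because parsing stops at the decimal point, while to_long("test string") throws because no digits can be read.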

When global array references are passed to a function, no more than one instance of the same global object should be used in the argument list. Each global object maintains a private static string which contains the most recent value fetched from the data base. When a global object is passed to a function, this string value is effectively passed. This means that, in a function reference where two references to the same global object are passed, even though they have differing indices, the value passed will be the value for the second instance of the global. This restriction only applies where there are two or more instances of the same global.

If you use a reference to a global without a parenthesized list following the name of the global, the reference will be to the most recent referenced global. Effectively, this is similar to the "naked indicator" from Mumps. Example:

      global x("x");
      x("123")="test";
      cout << x("123") << endl;
      cout << x << endl;

prints "test" twice.

Global Array Indices

Internally, the indices of global arrays are always stored as character strings (null terminated arrays of char). If you initialize a global array with a loop, you must ensure that the indices are converted to an appropriate character string format before using them as global array indices. Indices to globals may be either char *, string or mstring but MUST all be of the same type (i.e. all string, all char * or all mstring). For example:

      mstring A,B,C;
      for (A=0; A<1000; A++)
            for (B=0; B<1000; B++)
                  for (C=0; C<1000; C++) {
                        array1(A,B,C) << "0";
                        }

The above initializes an array of 1 billion elements to zero.

Navigating Globals

There are several builtin functions used to navigate the globals. The two most important are the data functions and the order functions. The data functions tell you if a node exists and if it has descendants and the order functions give you the next higher (or lower) index at a given level in the global array tree.

The data functions return an integer which indicates whether the global array node is defined:

  1. 0 if the global array node is undefined;
  2. 1 if it is defined and has no descendants;
  3. 10 if no value is stored at the node but it has descendants;
  4. 11 if it is defined and has descendants.

A global array node is defined if data has been stored at it. A "10" is returned for a node at which nothing has been stored but which has descendants. For example, assuming the global array has only the contents created in the example below:

      global array1("array1");

      int result;

      array1("1","11") << "foo";
      array1("1","11","21") << "bar";

      result = array1("1").Data() ;            // yields 10
      result = array1("1","11").Data();        // yields 11
      result = array1("1","11","21").Data();   // yields 1 
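
The four return codes can be reproduced on an ordinary ordered container, which may help in reasoning about them. The following is a minimal sketch, assuming a std::map keyed by the full subscript list; Path, Tree and node_status are hypothetical names and this is not the toolkit's Data() implementation.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a global array: each key is the full
// list of subscripts and the mapped value is the data stored at that node.
using Path = std::vector<std::string>;
using Tree = std::map<Path, std::string>;

// Returns 0, 1, 10 or 11 following the convention described above.
int node_status(const Tree& tree, const Path& node) {
    int result = tree.count(node) ? 1 : 0;   // 1 if a value is stored here
    // Keys are ordered lexicographically, so if node has any descendant,
    // the first key after node is one of them.
    auto it = tree.upper_bound(node);
    if (it != tree.end() && it->first.size() > node.size() &&
        std::equal(node.begin(), node.end(), it->first.begin()))
        result += 10;                        // at least one descendant exists
    return result;
}
```

Loaded with the same two entries as the example above, node_status gives 10 for ("1"), 11 for ("1","11"), 1 for ("1","11","21") and 0 for a path that was never stored.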

The other major navigation functions are the Order() functions. These give you, for a given global array index, the next ascending or descending value for the last index. There are several forms of the function. For example:

      mstring x;
      global array1("array1");

      array1("100") = "a";            // initialize the array with three entries
      array1("200") = "b";
      array1("300") = "c";

      x = "";                          // initialize the index with empty string
      
      x = array1(x).Order(1);          // get the first value of the first index: 100
      cout << x << endl;               // writes 100

      x = array1(x).Order(1);          // get the second value of the first index: 200
      cout << x << endl;               // writes 200

      x = array1(x).Order(1);          // get the third value of the first index: 300
      cout << x << endl;               // writes 300

      x = array1(x).Order(1);          // get the next value of the first index: empty string
      if (x == "") 
            cout << "done" << endl;    // write "done"

Each call to Order() gives the next value of the last index. The numeric argument indicates the direction: 1 means ascending key order and -1 means descending key order. To get the first index, the empty string is supplied and the function returns the first index of the global array. For subsequent calls, it returns the next ascending index value until there are no more indices. Then it returns the empty string. Thus, if in the above each of the 1's in the Order() function were replaced by -1, the sequence of values printed would be 300, 200, 100, empty rather than 100, 200, 300, empty.
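
The traversal convention (empty string in to start, empty string out when done) can be sketched for a single level of indices with a std::map. Level and order below are hypothetical names used only for this illustration. Note that std::map compares keys as plain strings; the collating order used by the toolkit may differ.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for one level of a global array.
using Level = std::map<std::string, std::string>;

// direction 1: returns the smallest key greater than key.
// direction -1: returns the largest key smaller than key.
// An empty key means "start from the first (or last) entry";
// an empty result means there are no more entries in that direction.
std::string order(const Level& level, const std::string& key, int direction) {
    if (direction >= 0) {
        auto it = level.upper_bound(key);    // first key strictly after key
        return it == level.end() ? "" : it->first;
    }
    // Descending: step back from key, or from the end if key is empty.
    auto it = key.empty() ? level.end() : level.lower_bound(key);
    if (it == level.begin()) return "";
    --it;
    return it->first;
}
```

For a level holding "100", "200" and "300", repeated calls with direction 1 starting from the empty string visit 100, 200, 300 and then return the empty string; with direction -1 the order is reversed.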

In the following example, we build a global array vector from an input file consisting of keywords with one keyword per line, keep a count of the number of times each keyword is used and, at the end, print an alphabetized list of the keywords followed by the number of times each occurs:

    #include <mumpsc/libmpscpp.h>
    global key("key"); 

    int main() { 

    mstring word;
    long i; 

    while (1) {
        if ( ! word.ReadLine(cin)) break;
        if (key(word).Data())      // is word in vector?
            key(word)++;           // yes, increment count
        else key(word) = 1;       // not in vector - add
        } 

    word = "";                              // empty string begins
    while ((word = key(word).Order(1)) != "") // next word
      cout << word << " " << key(word) << endl; // print word and count
    GlobalClose;
    return 0;
    } 

In the above, each line is read into the variable word until the end of file is reached. Each word is tested with the Data() function of the global array to determine if word exists in the key vector. Data() returns zero if the element does not exist, non-zero if it does. In the case where the word is in the key global array vector, the count stored in the vector for the word is incremented in place. If the word does not exist in the vector, it is added and its initial count is set to one.

When all the words have been read and stored into the vector, the program sequences through the word entries and prints the words and the total number of times each one was present in the input file. Since global arrays are stored in ascending key order, the display of words will be alphabetic. The function that sequences through the vector is the Order() function. When the function is applied to a reference whose index contains a value, it returns the next ascending index from the vector, or the empty string if there are no indices in the vector greater than the one passed. If the index is the empty string, the function returns the first index in the vector.

Similarly, given a global array of patient lab data organized hierarchically first by patient id, then by lab test, then by date, we can print a table of patient id's, labs, dates and results with the following:

      #include <mumpsc/libmpscpp.h>
      global labs("labs");
      int main() {
      mstring ptid,lab_test,date,rslt;

      // create dummy example data base

      labs("1000","hct","July 12, 2003")="45";
      labs("1000","hct","July 13, 2003")="46";
      labs("1000","hct","July 14, 2003")="47";
      labs("1000","hct","July 15, 2003")="48";
      labs("1000","hgb","July 12, 2003")="15";
      labs("1000","hgb","July 15, 2003")="14";
      labs("1001","hct","July 12, 2003")="35";
      labs("1001","hct","July 13, 2003")="36";
      labs("1001","hct","July 14, 2003")="37";
      labs("1001","hct","July 15, 2003")="38";
      labs("1001","hgb","July 13, 2003")="15";
      labs("1001","hgb","July 14, 2003")="15";
      labs("1002","hct","Sept 12, 2003")="35";
      labs("1002","hct","Sept 13, 2003")="36";
      labs("1002","hct","Sept 14, 2003")="37";
      labs("1002","hct","Sept 15, 2003")="38";
      labs("1002","hgb","Sept 13, 2003")="15";
      labs("1002","hgb","Sept 14, 2003")="15";

      ptid = "";
      while (1) {
            ptid = labs(ptid).Order(1);
            if (ptid == "") break;
            lab_test = "";
            while (1) {
                  lab_test = labs(ptid,lab_test).Order(1);
                  if (lab_test == "") break;
                  date = "";
                  while (1) {
                        date = labs(ptid,lab_test,date).Order(1);
                        if (date == "") break;
                        cout << ptid << " " << lab_test << " " << date ;
                        cout << " " << labs(ptid,lab_test,date) << endl;
                        }
                  }
            }
      GlobalClose;
      return 0;
      }

The above begins with an empty string for patient id ptid. This is used at the outer loop level to cycle through all the patient ids. At the first nested loop, the program cycles through all the lab test names (lab_test), then at the innermost level, it cycles through all the dates (date). The resulting table is of the form:

      1000 hct July 12, 2003 45
      1000 hct July 13, 2003 46
      1000 hct July 14, 2003 47
      1000 hct July 15, 2003 48
      1000 hgb July 12, 2003 15
      1000 hgb July 15, 2003 14
      1001 hct July 12, 2003 35
      1001 hct July 13, 2003 36
      1001 hct July 14, 2003 37
      1001 hct July 15, 2003 38
      1001 hgb July 13, 2003 15
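      1001 hgb July 14, 2003 15
      1002 hct Sept 12, 2003 35
      1002 hct Sept 13, 2003 36
      1002 hct Sept 14, 2003 37
      1002 hct Sept 15, 2003 38
      1002 hgb Sept 13, 2003 15
      1002 hgb Sept 14, 2003 15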

Tabular Access to Globals

If the database from the previous example is modified slightly, it can be viewed purely as a table or a relation (for more detail on relational access, see below). To accomplish this, the data values are moved into the array reference as a final index and the empty string is stored for each node.

To perform tabular access to the database, we use the Select() primitive function which returns successive rows from a global array viewed as a tree. In the following example, we access and print the lab values for patient "1001":

#include <mumpsc/libmpscpp.h>
global labs("labs");

int main() {
mstring ptid,test,date,rslt;

// create dummy example data base

labs("1000","hct","July 12, 2003","45") = "";
labs("1000","hct","July 13, 2003","46") = "";
labs("1000","hct","July 14, 2003","47") = "";
labs("1000","hct","July 15, 2003","48") = "";
labs("1000","hgb","July 12, 2003","15") = "";
labs("1000","hgb","July 15, 2003","14") = "";
labs("1001","hct","July 12, 2003","35") = "";
labs("1001","hct","July 13, 2003","36") = "";
labs("1001","hct","July 14, 2003","37") = "";
labs("1001","hct","July 15, 2003","38") = "";
labs("1001","hgb","July 13, 2003","15") = "";
labs("1001","hgb","July 14, 2003","15") = "";
labs("1002","hct","Sept 12, 2003","35") = "";
labs("1002","hct","Sept 13, 2003","36") = "";
labs("1002","hct","Sept 14, 2003","37") = "";
labs("1002","hct","Sept 15, 2003","38") = "";
labs("1002","hgb","Sept 13, 2003","15") = "";
labs("1002","hgb","Sept 14, 2003","15") = "";

ptid="";
test="";
date="";
rslt="";

while ( labs(ptid,test,date,rslt).Select(ptid,test,date,rslt) != NULL ) {
      if (ptid != "1001") continue;
      cout << ptid << " " << test << " " << date << " " << rslt << endl;
      }
GlobalClose;
}

Rows of the database are presented in overall ascending key order. Those rows whose first column does not contain "1001" are rejected, while those containing the value are printed.

Using the database from above, a set of simple speedup techniques can be applied by starting the scan at the patient id and terminating the scan when the next patient id appears:

#include <mumpsc/libmpscpp.h>
global labs("labs");

int main() {
mstring ptid,test,date,rslt;

ptid="1001";
test="";
date="";
rslt="";

while ( labs(ptid,test,date,rslt).Select(ptid,test,date,rslt) != NULL ) {
      if (ptid != "1001") break;
      cout << ptid << " " << test << " " << date << " " << rslt << endl;
      }
GlobalClose;
}

In the above example, by setting the initial value of ptid to "1001", the scan will begin at that point in the table. Any or all of the leading column values may be specified in this manner to target a specific starting point. For example, to print only the "hct" values of patient "1001" using the database from above:

#include <mumpsc/libmpscpp.h>
global labs("labs");

int main() {
mstring ptid,test,date,rslt;

ptid="1001";
test="hct";
date="";
rslt="";

while ( labs(ptid,test,date,rslt).Select(ptid,test,date,rslt) != NULL ) {
      if (ptid != "1001" || test != "hct" ) break;
      cout << ptid << " " << test << " " << date << " " << rslt << endl;
      }
GlobalClose;
}

Note: if one or more column values are supplied, they must be the initial column values, and there may be no intervening values specified as the empty string.
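
The effect of supplying leading column values can be sketched in standard C++ with an ordered set of rows. This is an analogy rather than the toolkit's implementation: Row, Table and scanPrefix() are hypothetical names, and std::set::lower_bound plays the role of starting the scan at the supplied key.

```cpp
#include <array>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a four-column global array: each row is a
// key tuple, and std::set keeps the rows in ascending key order.
typedef std::array<std::string, 4> Row;
typedef std::set<Row> Table;

// Return the rows whose first two columns equal ptid and test. The scan
// starts at the first candidate row and stops at the first mismatch;
// 'visited' reports how many rows were actually examined.
std::vector<Row> scanPrefix(const Table& t, const std::string& ptid,
                            const std::string& test, int& visited) {
    std::vector<Row> out;
    visited = 0;
    Row start = {{ptid, test, "", ""}};  // empty strings sort first
    for (Table::const_iterator it = t.lower_bound(start);
         it != t.end(); ++it) {
        ++visited;
        if ((*it)[0] != ptid || (*it)[1] != test) break;  // left the range
        out.push_back(*it);
    }
    return out;
}
```

Because the rows are ordered by their leading columns, all matching rows are contiguous, which is why the scan may stop at the first mismatch. The same reasoning explains the note above: a value supplied for a later column without its predecessors does not identify a contiguous range.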

To copy the results to another global array using the database from above:

#include <mumpsc/libmpscpp.h>
global labs("labs");
global tmp("tmp");

int main() {
mstring ptid,test,date,rslt;

kill(tmp());  // delete any prior values

ptid="1001";
test="hct";
date="";
rslt="";

while ( labs(ptid,test,date,rslt).Select(ptid,test,date,rslt) != NULL ) {
      if ( test != "hct" || rslt <= "40" ) continue;  // keep only "hct" results above "40"
      cout << ptid << " " << test << " " << date << " " << rslt << endl;
      tmp(ptid,test,date,rslt) = "";  // build new array
      }
GlobalClose;
}

In the above example, the array tmp() is built consisting only of "hct" tests whose values were above "40". The array being built may be constructed from all or some of the column values extracted from the source array, arranged in any order, and may contain column values from other sources. For example, to identify all the patients with diagnosis code "Y06" whose "hct" values are less than "40", using a global array named diagnosis whose columns are patient id, diagnostic code and date, and the labs() array from above:

#include <mumpsc/libmpscpp.h>
global labs("labs");
global diagnosis("diagnosis");
global tmp("tmp");

int main() {
mstring ptid,test,date,rslt,dx,dxDate;

kill(tmp());  // delete any prior values

ptid="1001";
test="hct";
date="";
rslt="";

while ( labs(ptid,test,date,rslt).Select(ptid,test,date,rslt) != NULL ) {
      if ( test != "hct" || rslt >= "40" ) continue;  // keep only "hct" results below "40"
      if ( !diagnosis(ptid,"Y06").Data() ) continue; // row does not exist
      cout << ptid << " " << test << " " << date << " " << rslt << endl;
      tmp(ptid) = "";  // build new array
      }
GlobalClose;
}

Relational Operations on Globals

Global arrays, if properly constructed, can be the subject of basic relational operations. For example, consider the following:

Global array names and column meanings:

patient(P,NAME,ADDRESS,SEX)
lab(L,TEST,NORMALS)
med(M,MED,QTY)
ptlab(P,L,RSLT,DATE)
ptmed(P,M,DATE)

Where:      P is patient id number
            NAME is patient name
            ADDRESS is patient home city
            SEX is patient gender
            L is test id number
            TEST is lab test name
            NORMALS is lab test normal values
            M is medication id number
            MED is medication name
            QTY is quantity in inventory
            RSLT is lab test result
            DATE is date of test or administration

The global arrays are defined in code as:

global patient("patient");
global lab("lab");
global med("med");
global ptlab("ptlab");
global ptmed("ptmed");

A possible set of values in the global array data base might be:

patient("001","Jones","Boston","Male") = "";
patient("002","Smith","New York","Female") = "";
patient("003","Blake","Washington","Male") = "";
patient("004","Doe","Hartford","Female") = "";
patient("005","Morley","New York","Male") = "";

lab("100","Hct","38-54%") = "";
lab("101","Hgb","14-18 Gm.") = "";
lab("102","Platelets","200-500k") = "";
lab("103","Acetone","0.3-2 mg/100ml") = "";
lab("104","Cholesterol","150-250 mg/100ml") = "";
lab("105","Creatinine","70-140 mcg/100ml") = "";
lab("106","Iron","75-175 mcg/100ml") = "";
lab("107","Uric Acid","3-6 mg/100ml") = "";

med("200","Protamine Sulfate","125") = "";
med("201","Quinidine Sulfate","150") = "";
med("202","Probenecid","90") = "";
med("203","Allopurinol","200") = "";
med("204","Colchicine","50") = "";
med("205","Hydrochlorothiazide","100") = "";

ptlab("001","107","8.5","1-Jul-84") = "";
ptlab("001","100","42","1-Jul-84") = "";
ptlab("002","103","250k","1-Aug-84") = "";
ptlab("003","107","80","1-Sep-84") = "";
ptlab("004","104","1.1","1-Oct-84") = "";
ptlab("005","107","9.0","1-Nov-84") = "";

ptmed("001","204","1-Jul-84") = "";
ptmed("001","205","1-Jul-84") = "";
ptmed("005","203","1-Nov-84") = "";
ptmed("005","206","1-Nov-84") = "";

Example queries answered by relational manipulations:

  1. Query: "Find the names of those patients who have received colchicine or probenecid."

          // get medication codes for medication names
    
          global t1("t1");
          kill (t1());
          mstring mcode="",mname="",qty="";
          while ( med(mcode,mname,qty).Select(mcode,mname,qty) != NULL) {
                if (mname != "Colchicine" && mname != "Probenecid" ) continue;
                t1(mcode)=""; // create node by medication code
                }
    
          // get list of patients who are taking one or more of these meds 
    
          mstring ptid="",date="",code="";
          mcode="";
    
          // for each row of "ptmed"
          while ( ptmed(ptid,mcode,date).Select(ptid,mcode,date) != NULL) {
    
                code = "";
                // for each medication code sought
                while ( t1(code).Select(code) != NULL) {
    
                      if (code == mcode ) {
                            // get/print the name and address of ptid
                            mstring name="",addr="",s="";
                            patient(ptid,name,addr,s).Select(ptid,name,addr,s);
                            cout << "PTID=" << ptid << endl;
                            cout << "Name=" << name << endl;
                            cout << "Address=" << addr << endl;
                            }
                      }
                }
    

  2. Query: "Find the id numbers of those patients who are not receiving medication hydrochlorothiazide."

          /* get med code for med name */
    
          mstring mcode="",mname="",qty="";
          while ( med(mcode,mname,qty).Select(mcode,mname,qty) != NULL) {
                if (mname != "Hydrochlorothiazide") continue;
                break;
                }
    
          /* get list of patients who are taking this med */
    
          mstring ptid="",code="",date="";
          global t1("t1");
          kill (t1());
          while ( ptmed(ptid,code,date).Select(ptid,code,date) != NULL)
                if (code == mcode ) t1(ptid) = "";
    
          /* create list t2() of patients who are not in t1() */
    
          global t2("t2");
          kill (t2());
          mstring name="",addr="",s="";
          ptid="";  // restart at the first patient
          while ( patient(ptid,name,addr,s).Select(ptid,name,addr,s) != NULL ) {
                if (t1(ptid).Data()) continue;
                t2(ptid)="";
                }
    
          /* get the names and address of patients in t2() */
    
          ptid="";
          while ( t2(ptid).Select(ptid) != NULL ) {
                mstring name="",addr="",s="";
                patient(ptid,name,addr,s).Select(ptid,name,addr,s);
                cout << "PTID=" << ptid << endl;
                cout << "Name=" << name << endl;
                cout << "Address=" << addr << endl;
                }
    

  3. Query: "Find the names of those patients whose uric acid is greater than 7 who are not receiving probenecid."

          /* get lab code number */
    
          mstring lcode="",test="",norm="";
          while (lab(lcode,test,norm).Select(lcode,test,norm) != NULL) 
                if (test == "Uric Acid" ) break;
    
          /* find ptid's of those who have had a result > 7 for test lcode */
    
          global t1("t1");
          kill (t1());
          mstring ptid="",lc="",rslt="",date="";
          while (ptlab(ptid,lc,rslt,date).Select(ptid,lc,rslt,date) != NULL) 
                if (lc == lcode && rslt > 7) t1(ptid)="";
    
          /* get med code for "Probenecid" */
    
          mstring mcode="",mname="",qty="";
          while ( med(mcode,mname,qty).Select(mcode,mname,qty) != NULL) {
                if (mname != "Probenecid") continue;
                break;
                }
          ptid="";
          while (t1(ptid).Select(ptid) != NULL) {
                // keep only patients with no ptmed row for this medication
                if (!ptmed(ptid,mcode).Data()) {
                      mstring name="",addr="",s="";
                      patient(ptid,name,addr,s).Select(ptid,name,addr,s);
                      cout << "PTID=" << ptid << endl;
                      cout << "Name=" << name << endl;
                      cout << "Address=" << addr << endl;
                      }
                }
    

As can be seen, these manipulations have considerable similarity from one query to the next. The basic manipulations, from a relational algebra point of view are:

  1. Union: the set of all rows belonging to two global arrays provided that the number and meanings of the columns are the same. Example assuming two global arrays t1() and t2() each containing three columns and leaving the result in t3():

          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL )
                t3(a,b,c)="";
          a = b = c = "";  // restart the scan for the second array
          while (t2(a,b,c).Select(a,b,c) != NULL )
                t3(a,b,c)="";
    

  2. Intersection: the set of rows belonging to two global arrays:

          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL )
                if (t2(a,b,c).Data()) t3(a,b,c)="";
    

  3. Difference: the set of rows in one global array not in another:

          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL )
                if (!t2(a,b,c).Data()) t3(a,b,c)="";
    

  4. Cartesian Product: the set of rows consisting of all rows from one global array concatenated to all rows of a second global array:

          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL ) {
                mstring d="",e="",f="";  // restart the t2() scan for each t1() row
                while (t2(d,e,f).Select(d,e,f) != NULL )
                      t3(a,b,c,d,e,f)="";
                }
    

  5. Selection: the set of rows from a global array satisfying some predicate. The predicate can be any string expression involving the column values of the global array being inspected or other data known to the program:

          global t1("t1");
          global t2("t2");
          kill (t2());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL )
                if (a == "aaa" && b < "bbb" ) t2(a,b,c)="";
    

  6. Projection: selecting one or more columns from a global array. For example:

          global t1("t1");
          global t2("t2");
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL )
                t2(a,c)="";
    
    or, in combination with selection:
    
          global t1("t1");
          global t2("t2");
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL )
                if (a == "aaa" && b < "bbb" ) t2(a,c)="";
    

  7. Join: a Cartesian product in which rows are concatenated only when a predicate involving one or more columns from the participating globals is satisfied. Example: given two three-column relations named t1() and t2(), create a third relation t3() consisting of columns 1, 2, and 3 from t1() and columns 2 and 3 from t2(). A row of t1() is joined to a row of t2() when the value in the third column of t1() is equal to the value in the first column of t2(). The code segment has the form:

          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL ) {
                mstring d="",e="",f="";
                while (t2(d,e,f).Select(d,e,f) != NULL )
                      if ( c == d ) t3(a,b,c,e,f)="";
                }
    

    In the example code, the rows of both relations are scanned and the values of the third column from the first relation (variable "c") are compared with the values of the first column (variable "d") of the second relation. If each relation contains 100 rows, the above would test 10,000 row combinations. This could be sped up considerably by rewriting the code as follows:

          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL ) {
                mstring d=c,e="",f="";
                while (t2(d,e,f).Select(d,e,f) != NULL ) {
                      if ( c != d ) break;
                      t3(a,b,c,e,f)="";
                      }
                }
    

    Here, each scan of the second relation begins with the first row containing a value for the first column which is equal to the third column of the first relation. The scan of the second relation terminates when the value of the first column is no longer equal to the value of the third column from the first relation.

    For comparisons other than equality:

          // join if col 3 of t1() < col 1 of t2()
    
          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL ) {
                mstring d=c,e="",f="";
                // begin scan of t2() at value of d equal to c
                while (t2(d,e,f).Select(d,e,f) != NULL ) {
                      // skip initial cases where c is still equal to d
                      if ( c == d ) continue; 
                      t3(a,b,c,e,f)="";
                      }
                }
    

    In the above, the scan of the second relation begins at the first row where the first column is equal to the third column of the first relation. The continue causes those rows where "c" and "d" are equal to be skipped. Since the rows are presented in ascending key order, the rows that follow the skipped ones are only rows where "c" is less than "d".

    Similarly, for a greater-than relation:

          // join if col 3 of t1() > col 1 of t2()
    
          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL ) {
                mstring d="",e="",f="";
                while (t2(d,e,f).Select(d,e,f) != NULL ) {
                      // join rows so long as c is > d
                      if ( c <= d ) break; 
                      t3(a,b,c,e,f)="";
                      }
                }
    

    The above terminates the inner loop when "c" is less than or equal to "d" . Prior to that point, where "c" is greater than "d", rows are joined.

    For relations involving columns that are not the initial columns of the second relation, other speed-up techniques are possible.

          // join if col 3 of t1() > col 3 of t2()
    
          global t1("t1");
          global t2("t2");
          global t3("t3");
          kill (t3());
          mstring a="",b="",c="";
          while (t1(a,b,c).Select(a,b,c) != NULL ) {
                mstring d="",e="",f="";
                while (t2(d,e,f).Select(d,e,f) != NULL ) {
                      // rows are not ordered by f, so every row must be tested
                      if ( c <= f ) continue; 
                      t3(a,b,c,e,f)="";
                      }
                }
    

    The above will produce minimal savings as many combinations of "d" and "e" may need to be tried in locating rows with values of "f" that meet the search criteria. In such cases, it may be more efficient to build a temporary copy of the second relation with the columns reordered so that the scan can proceed more quickly:

          // join if col 3 of t1() > col 3 of t2()
    
          global t1("t1");
          global t2("t2");
          global t3("t3");
          global t4("t4");
          kill (t3());
          kill (t4());
    
          mstring a="",b="",c="";
          while (t2(a,b,c).Select(a,b,c) != NULL )
                t4(c,a,b)=""; // reordered relation
    
          a = b = c = "";
          while (t1(a,b,c).Select(a,b,c) != NULL ) {
                mstring d="",e="",f="";
                while (t4(f,d,e).Select(f,d,e) != NULL ) {
                      // join rows so long as c is > f
                      if ( c <= f ) break; 
                      t3(a,b,c,e,f)="";
                      }
                }
          kill (t4());
    

    In large joins which may result in many iterations of the inner loop, a single pass to build a temporary, reordered relation may be faster.
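
The saving from starting each inner scan at the join value can be illustrated with a small sketch in standard C++. It is not toolkit code: Row3, Row5 and joinEq() are hypothetical names, std::set stands in for a global array viewed as an ordered table, and lower_bound stands in for beginning the scan of the second relation at "d" equal to "c".

```cpp
#include <array>
#include <set>
#include <string>

typedef std::array<std::string, 3> Row3;
typedef std::array<std::string, 5> Row5;

// Join t1 and t2 where column 3 of t1 equals column 1 of t2, producing
// rows (a, b, c, e, f). 'tested' counts the t2 rows examined.
std::set<Row5> joinEq(const std::set<Row3>& t1, const std::set<Row3>& t2,
                      long& tested) {
    std::set<Row5> t3;
    tested = 0;
    for (std::set<Row3>::const_iterator r1 = t1.begin();
         r1 != t1.end(); ++r1) {
        const std::string& c = (*r1)[2];
        Row3 start = {{c, "", ""}};  // first possible row with d == c
        for (std::set<Row3>::const_iterator r2 = t2.lower_bound(start);
             r2 != t2.end(); ++r2) {
            ++tested;
            if ((*r2)[0] != c) break;  // past the matching range
            t3.insert(Row5{{(*r1)[0], (*r1)[1], c, (*r2)[1], (*r2)[2]}});
        }
    }
    return t3;
}
```

With three rows in t1 and four in t2, a plain nested loop would test twelve combinations; the sketch examines only the matching rows plus at most one extra row per outer iteration.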

Builtin Relational Algebra Functions

There are several builtin relational functions, written in Mumps, that can be called from the C++ environment. To use these, you must include the following at the beginning of your C++ program:

#include <mumpsc/libmpsrdbms.h>

The functions available (implemented as macros) are:

  1. SELECT(arr,out,exp) - copy rows from global "arr" to global "out" if "exp" is true.
  2. PRINT(arr,exp) - print those rows of global "arr" for which "exp" is true.
  3. UNION(arr1,arr2,out) - copy rows of globals "arr1" and "arr2" to global "out".
  4. PROJECT(arr,out,cols) - copy only "cols" columns of rows of global "arr" to global "out".
  5. SUBTRACT(arr,sub,out) - copy rows of global "arr" to global "out" which are not in "sub".
  6. INTERSECT(arr,sub,out) - copy rows of global "arr" to global "out" if there is an identical row in global "sub".
  7. JOIN(arr1,arr2,out,exp) - concatenate rows of global "arr1" and global "arr2" and copy to global "out" if "exp" is true.

For a full description, see the Mumps Compiler Programmers Guide section on Relational algebra for global arrays. The macros above correspond to the functions described in the manual except the macro names are all upper case. The actual functions, which have the same names except that only the first letter is in upper case and the remainder are lower case, have two additional initial parameters used internally by the Mumps service routines. The macros automatically substitute these added parameters.

The processing functions are written in Mumps and have been compiled to an object code library. When compiling a Mumps program for use with the class library, the first line of the Mumps program must be:

+#define CPP

This line causes the compiler to omit some lines of code that would conflict with the C++ runtime routines.

Locking the Data Base

There are several functions for locking portions of the data base. Following legacy convention, a lock does not prevent access to an element but merely flags the element as locked. Locking views a global array as a tree structure. If an element is locked, its descendants are locked. An attempt to lock a locked element, or an element that has a locked parent or a locked descendant, will fail. The primary locking functions are $lock(), Lock() and UnLock():

      if ($lock(gbl(a,b,c))) cout << "locked" << endl;
      if (gbl(a,b,c).Lock()) cout << "locked" << endl;
      gbl(a,b,c).UnLock();

The $lock() and Lock() functions test whether the node can be locked and lock it if possible. They return true (1) if successful and false (0) otherwise ($test is set accordingly). A node can be locked if it is not itself locked, if it has no descendants that are locked and if it is not the descendant of a locked node. The UnLock() function releases a lock on a node.
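
The locking rule just stated can be sketched as a small in-memory lock table in standard C++. This only illustrates the ancestor/descendant test; it is not how MDH stores locks, and Path, isPrefix() and LockTable are hypothetical names. A node is represented by its array name followed by its index values.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> Path;  // e.g. {"gbl", "a", "b"}

// True when 'p' is a prefix of 'q' (including p == q), i.e. when p is
// q itself or one of q's ancestors in the tree.
static bool isPrefix(const Path& p, const Path& q) {
    if (p.size() > q.size()) return false;
    for (std::size_t i = 0; i < p.size(); ++i)
        if (p[i] != q[i]) return false;
    return true;
}

// Hypothetical lock table for a single process.
class LockTable {
    std::set<Path> locked_;
public:
    // Lock 'node' unless it, an ancestor, or a descendant is locked.
    bool lock(const Path& node) {
        for (std::set<Path>::const_iterator it = locked_.begin();
             it != locked_.end(); ++it)
            if (isPrefix(*it, node) || isPrefix(node, *it)) return false;
        locked_.insert(node);
        return true;
    }
    // Release the lock on 'node' if one is held.
    void unlock(const Path& node) { locked_.erase(node); }
};
```

In this model, locking {"gbl","a","b","c"} makes later attempts on {"gbl","a","b"} or {"gbl","a","b","c","d"} fail, while the sibling {"gbl","a","x"} can still be locked.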

Additionally, there are functions to release all locks for the current process and all locks for all processes:

    CleanLocks();      // release all locks for this process only
    CleanAllLocks();  // release all locks for all processes

Invoking the Mumps Interpreter

The full facilities of the Mumps interpreter can be invoked from C++ programs. The interpreter reads, parses and executes commands presented to it at run time. It may also read and execute text files containing Mumps programs. The interpreter is invoked by means of the Xecute() macro and xecute() functions:

int Xecute("command")
int xecute(mstring command)
int xecute(string command)
int xecute(char * command)

These functions and the macro invoke the Mumps interpreter and execute the text supplied as "command". They return 1 if successful, 0 otherwise. With Xecute(), if the Mumps command contains quotes or other special symbols, they will be automatically prefixed with backslashes (e.g., a quote becomes \").

Xecute("set i="test"");
Xecute("for  s i=$order(^a(i)) quit:i=""  set sum=sum+^a(i)");

Details on the Mumps Language are contained in the file compiler.html in the mumpsc/doc subdirectory of the Mumps Compiler distribution. See also: mstring::Eval() for expression interpretation.

Programming Examples

Hashing Example

The following example stores lines of text into a global array based on a hash function calculation of each line. It reads lines of text from stdin and submits each line to a simple hash function that produces an unsigned long, which is converted to a character string (char *) and returned. The resulting character string is copied to the string variable x. The input line is stored at hash_table(x,ii) where ii is a string value between 0 and 999. The value of ii is determined by locating the first ascending integer not already in use. If a given hash result produces more than 1000 collisions, the process terminates with an error message.

#include <mumpsc/libmpscpp.h>
global hash_table("hash"); // global array 
int main() {
      char in[1024];
      string x;
      long i;
      
while (fgets(in,1024,stdin)!=NULL) {
      x = hash(in);  // hash input line
      for (i=0; i<1000; i++) {
            string ii=cvt(i);
            if (hash_table(x,ii).Data()==0) {  // find a slot
                  hash_table(x,ii)=in;  // add line to database
                  cout << x << "," << ii  << " " << in << endl;
                  break;
                  }
            }
      if (i>=1000) {  // no free slot was found among 0..999
            cout << "Too many collisions " << x << endl;      
            GlobalClose;
            return 1;
            }
      }
      GlobalClose;
      return 0;
      }

Linking to Compiled Mumps Functions

You may compile functions in Mumps and call them from C++ programs. If you do, you must begin each file of functions with:

#define CPP

which disables some code that would otherwise conflict with the class libraries. If you do not use the class libraries, you may omit this line.

See the Mumps Compiler Programmers Guide for details.

Writing Active Web Server Pages

C++ programs can be written with the toolkit to be web server active pages. For example:

Web page HTML code:

<html>
<head>
<title>Your title goes here</title>
</head>
<body bgcolor=silver>
<form method="get" action="quiz2.cgi">
<center>
Name: <input type="text" name="name" size=40 value=""> <br>
</center>
Class:
<input type="Radio" name="class" value="freshman" > Freshman
<input type="Radio" name="class" value="sophmore" > Sophmore
<input type="Radio" name="class" value="junior" > Junior
<input type="Radio" name="class" value="senior" checked> Senior
<input type="Radio" name="class" value="grad" > Grad Student <br>
Major: <select name="major" size=1>
<option value="computer science" >computer science
<option value="mathematics" >Mathematics
<option value="biology" selected>Biology
<option value="chemistry" >Chemistry
<option value="earth science" >Earth Science
<option value="industrial technology" >Industrial Technology
<option value="physics" >Physics
</select>
<table border>
<tr>
<td valign=top> Hobbies: </td>
<td>
<input type="Checkbox" name="hobby1" value="stamp collecting" > Stamp Collecting<br>
<input type="Checkbox" name="hobby2" value="art" > Art<br>
<input type="Checkbox" checked name="hobby3" value="bird watching" > Bird Watching<br>
<input type="Checkbox" name="hobby4" value="hang gliding" > Hang Gliding<br>
<input type="Checkbox" name="hobby5" value="reading" > Reading<br>
</td></tr>
</table>
<input type="submit" value="go for it">
</form>
</body>
</html>

A C++ program can accept data from the web page, store the data in global arrays and return a summary web page to the browser. When using "get" mode data transmission from HTML forms, the form names and data are concatenated into a string, delimited by ampersands, containing "name=value" tokens. These are passed in an environment variable named QUERY_STRING. The include file mumpsc/cgi.h contains code to extract data from QUERY_STRING and store the data in the runtime symbol table. The function SymGet() can be used to retrieve values from the runtime symbol table.

#include <mumpsc/libmpscpp.h>

global T("T");

int main() {

      mstring name;
      mstring cls;   // "class" is a C++ keyword and cannot name a variable
      mstring major;
      mstring hobby1;
      mstring hobby2;
      mstring hobby3;
      mstring hobby4;
      mstring hobby5;

#include <mumpsc/cgi.h>

      cout << "Content-type: text/html " << endl << endl;

      name = SymGet("name");
      cls = SymGet("class");
      major = SymGet("major");
      hobby1 = SymGet("hobby1");
      hobby2 = SymGet("hobby2");
      hobby3 = SymGet("hobby3");
      hobby4 = SymGet("hobby4");
      hobby5 = SymGet("hobby5");

      cout << "<html><body>";

      if (name == "") {
            cout << "Name not specified <br> ";
            cout << "</body></html>" << endl;
            return EXIT_FAILURE;
            }

      T(name,mcvt("class"))=cls;
      T(name,mcvt("major"))=major;

      // store each hobby that was checked on the form
      if (hobby1.Length() != 0) T(name,mcvt("hobbies"),hobby1)="";
      if (hobby2.Length() != 0) T(name,mcvt("hobbies"),hobby2)="";
      if (hobby3.Length() != 0) T(name,mcvt("hobbies"),hobby3)="";
      if (hobby4.Length() != 0) T(name,mcvt("hobbies"),hobby4)="";
      if (hobby5.Length() != 0) T(name,mcvt("hobbies"),hobby5)="";

      cout << "Thank you " << name << " for your input<br>";
      cout << "</body></html>" << endl;
      return EXIT_SUCCESS;
      }

Note: you can test code by simulating input from a web browser with the following code:
#!/bin/bash
QUERY_STRING="abc=xyz&cde=123"
export QUERY_STRING
your_program.cgi

The "name=value" sets (delimited by ampersands) will be passed to the program. Note: web server cgi protocol requires the value strings to be encoded (see EncodeHTML()).
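
To make the QUERY_STRING format concrete, the following sketch splits such a string into names and values in standard C++. It is not the code in mumpsc/cgi.h, whose contents are not shown here; urlDecode() and parseQuery() are hypothetical helpers, and the decoding rules assumed are the usual ones for HTML form submission ("+" for a space and "%XX" for a byte given in hexadecimal).

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

// Decode '+' (space) and %XX escapes in one URL-encoded token.
std::string urlDecode(const std::string& in) {
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size() &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(
                std::strtol(in.substr(i + 1, 2).c_str(), nullptr, 16));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}

// Split "n1=v1&n2=v2" into a name -> value map. A token without '='
// is stored with an empty value; empty tokens are ignored.
std::map<std::string, std::string> parseQuery(const std::string& qs) {
    std::map<std::string, std::string> vars;
    std::string::size_type pos = 0;
    while (pos <= qs.size()) {
        std::string::size_type amp = qs.find('&', pos);
        if (amp == std::string::npos) amp = qs.size();
        std::string token = qs.substr(pos, amp - pos);
        std::string::size_type eq = token.find('=');
        if (!token.empty()) {
            if (eq == std::string::npos)
                vars[urlDecode(token)] = "";
            else
                vars[urlDecode(token.substr(0, eq))] =
                    urlDecode(token.substr(eq + 1));
        }
        pos = amp + 1;
    }
    return vars;
}
```

In a CGI program the argument would typically come from std::getenv("QUERY_STRING"), after checking that the returned pointer is not null.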

Hash class

The hash class permits quick direct access by means of a hash table. Objects of the hash class are created by:

hash hashname(filename, filesize, filedisp);

where:

hashname is the name of the object;
filename is the name of the external file that holds the object (".key" is appended to it);
filesize is the size in bytes of the object (at least 1,000, default 100,000);
filedisp is the disposition: "new" or "old".

If the disposition is "new", a new hash will be created and any previous hash discarded. If the disposition is "old", a previously existing disk based hash object will be used. The default is "new". Example:

hash x("x",10000,"new");

will create an object named "x" which will reside in a file named "x.key" that will be 10,000 bytes long and will be newly created.

You may assign values to a hash object by providing the value and the hash key:

x("key one")="test";

where "key one" is a string key and "test" is the value to be stored.

Values may be retrieved into strings by:

string s;
s=x("key one");

You may replace the value stored at a hash key with one of equal or shorter length with no penalty. If the replacement value is longer, the original space will be marked as unavailable, new space will be allocated and the old space will not be reused. In the event of a collision (i.e., two keys produce the same hash code), the functions search forwards in the file for available space. The value for a key is stored immediately after the key in the file.

Class mstring

The mstring class provides Mumps-like strings that can be used to write programs in C++ that treat variables in a manner similar to that of Mumps. This means that mstring objects are essentially strings on which arithmetic operations may be performed. For example:

#include <mumpsc/libmpscpp.h>

global x("x");

int main() {

      mstring a,b,c;

      a="hello ";
      b="world";
      cout << (a || b) << endl;  // concatenation; prints "hello world"

      for (a=0; a<10; a++) cout << a << endl;  // prints 0 thru 9

      for (a=0; a<10; a++) x(a)=a;  // sets global array elements

      a="";
      while (1) {
            a=x(a).Order(1);
            if (a=="") break;
            cout << a << endl;  // prints 0 thru 9
            }

      cout << x(a).Data() << endl;  // prints 1

      c="123 elm street";
      c=c+1;
      cout << c << endl;  // prints 124

      return EXIT_SUCCESS;
      }

Note: the code "(a || b)" in the cout expression is parenthesized. If not parenthesized, the C++ compiler precedence will result in an error.
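
The last lines of the example, where "123 elm street" plus 1 yields 124, rely on the Mumps convention of interpreting a string numerically by its leading numeric portion. The sketch below illustrates that convention in standard C++. It is an assumption about the general rule, not the mstring implementation, and leadingNumber() is a hypothetical helper that handles only a sign, digits and an optional fraction.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Numeric value of the leading numeric prefix of 's' (optional sign,
// digits, optional fraction); 0 when 's' does not start with a number.
double leadingNumber(const std::string& s) {
    std::string::size_type i = 0;
    std::string::size_type digits = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        ++i;
        ++digits;
    }
    if (i < s.size() && s[i] == '.') {
        std::string::size_type j = i + 1;
        while (j < s.size() &&
               std::isdigit(static_cast<unsigned char>(s[j]))) {
            ++j;
            ++digits;
        }
        if (j > i + 1) i = j;  // accept the point only with digits after it
    }
    if (digits == 0) return 0.0;
    return std::strtod(s.substr(0, i).c_str(), nullptr);
}
```

Under this rule, leadingNumber("123 elm street") + 1 evaluates to 124, and a string with no leading number, such as "hello", contributes 0.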

Objects of class mstring may:

  1. Contain character strings, integers or floating point values;

  2. Be assigned to from char *, string, mstring, float, int, or double. Objects of mstring may be initialized with character string constants in declaration statements.

  3. Participate in add (+, +=), subtract (-, -=), multiply (*, *=), divide (/, /=), modulo (%, %=) (integer values only), pre/post increment/decrement (++/--), and concatenation (||) operations. The mode of the operation will depend on the mode of the other operand. Available modes are ASCII string, integer and floating point.

  4. Participate in relational expressions >, >=, <, <=. The mode of comparison will depend on the mode of the other operand. Available modes are ASCII string, integer and floating point.

  5. Participate in equality expressions == and !=. The mode of the comparison will depend on the mode of the other operand. Available modes are ASCII string, integer and floating point.

  6. Participate in input and output stream operations >> and <<.

  7. Participate in assignment to objects of mstring and string.

  8. Be declared as arrays or allocated/freed by the new/delete operators. Only numeric subscripts permitted at this time.

Access functions defined on mstring are:

  1. c_str() - returns the address of a character string containing the value in the mstring.

  2. Eval() - evaluates (interprets) the expression in the invoking mstring object and returns an mstring containing the result. Throws InterpreterException on error.

  3. piece(pattern, start) or piece(pattern, start, end) - returns an mstring delimited by the pattern. The parameter "pattern" may be of type mstring or char *. The parameters "start" and "end" may be of types mstring or int.

  4. s_str() - returns a string containing the value in the mstring.

Objects of mstring may be passed to all Mumps $ functions. However, if an object of type mstring is to be used in connection with the interpreter, it must be declared with a string giving its name in the runtime symbol table. For example:

      mstring x("x");


Btree Access

Programmers may access the btree directly through the builtin BTREE macro. A number of examples can be found in mumpsc/doc/examples/btree in the distribution.

To access the btree directly from a C++ program:

  1. You must first install the Mumps compiler and MDH.

  2. Include <mumpsc/libmpscpp.h> at the beginning of your program.

  3. You can now access the btree directly with the BTREE macro (see description below).

Note: any keys you store in the btree co-exist with Mumps/MDH keys. In rare cases, these can interfere with one another if a key you store lies in the range of a global array key set.

For example, the following program stores NBR_ITERATIONS keys and their data into the btree and then retrieves them. NBR_ITERATIONS is defined in btree.h, which is included by libmpscpp.h, usually with the value 100,000. This is "btest1.cpp" from mumpsc/doc/examples/btree. See the other examples and the documentation below for further details.

    /*#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     *#+ Mumps Compiler Run-Time Support Functions
     *#+ Copyright (c) A.D. 2001, 2002, 2003, 2004 by Kevin C. O'Kane
     *#+ okane@cs.uni.edu
     *#+
     *#+ This library is free software; you can redistribute it and/or
     *#+ modify it under the terms of the GNU Lesser General Public
     *#+ License as published by the Free Software Foundation; either
     *#+ version 2.1 of the License, or (at your option) any later version.
     *#+
     *#+ This library is distributed in the hope that it will be useful,
     *#+ but WITHOUT ANY WARRANTY; without even the implied warranty of
     *#+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
     *#+ Lesser General Public License for more details.
     *#+
     *#+ You should have received a copy of the GNU Lesser General Public
     *#+ License along with this library; if not, write to the Free Software
     *#+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
     *#+
     *#+ http://www.cs.uni.edu/~okane
     *#+
     *#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     *#+
     *#+ Some of this code was originally written in Fortran
     *#+ which will explain the odd array and label usage,
     *#+ especially arrays beginning at index 1.
     *#+
     *#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/

    #include <mumpsc/libmpscpp.h>

    int main() {

        long i,j;
        unsigned char key[1024],data[1024];

        // store phase: keys "key 0", "key 1", ... with the index as data
        printf("Store sequentially ascending keys");
        for (i=0; i<NBR_ITERATIONS; i++) {
            sprintf( (char *) key,"key %ld",i);
            sprintf( (char *) data,"%ld%c",i,0);
            if (!BTREE(STORE,key,data)) {
                printf("error\n");
                return 1;
            }
            if (i%60000L==0) { printf("\n %ld ",i); fflush(stdout); }
            if (i%1000==0) { putchar('.'); fflush(stdout); }
        }

        // retrieve phase: read each key back and check its data
        printf("\nretrieve");
        for (i=0; i<NBR_ITERATIONS; i++) {
            sprintf( (char *) key,"key %ld",i);
            if (!BTREE(RETRIEVE,key,data)) {
                printf("error 1\n");
                return 1;
            }
            sscanf( (char *) data,"%ld",&j);
            if (j!=i) {
                printf("error 2\n");
                printf("%ld != %ld\n",i,j);
                return 1;
            }
            if (i%60000L==0) { printf("\n %ld ",i); fflush(stdout); }
            if (i%1000==0) { putchar('.'); fflush(stdout); }
        }

        printf("\nlooks good!\n");
        strcpy( (char *) key,"");
        strcpy( (char *) data,"");
        BTREE(CLOSE,key,data);
        return 0;
    }

Function and Macro Library

The following gives details on all the MDH functions and macros. Many have the same or similar syntax to the underlying legacy functions. The discussion assumes that "gbl" has been declared as above. The example indices ("a,b,c") are for illustration purposes; your actual global array references will be different. Please note that not all functions accept all possible argument data types. Check the function definition below for details.

  1. int mstring::Ascii()
    int mstring::Ascii(int start)

    Returns the numeric value of an ASCII character. If no "start" is specified, the numeric value of the first character of the invoking mstring is returned. If "start" is specified, the numeric value of the "start"'th character of the invoking mstring is returned. If the invoking mstring is the empty string, -1 is returned. For example:

    mstring a;
    a="ABC";
    a.Ascii() yields 65
    a.Ascii(1) yields 65
    a.Ascii(2) yields 66
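
The position rule above (positions counted from 1, with -1 for the empty string) can be sketched in plain C++ as follows. The function ascii_at is a hypothetical stand-in written for illustration, not the MDH code; returning -1 for a position outside the string is an assumption that goes beyond what is stated above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the behavior described above: positions are
// 1-based and -1 signals "no such character" (including the empty string).
// Treating out-of-range positions as -1 is an assumption.
int ascii_at(const std::string &s, int start = 1) {
    if (start < 1 || static_cast<std::size_t>(start) > s.size()) return -1;
    return static_cast<unsigned char>(s[start - 1]);
}
```

Under these assumptions, ascii_at("ABC") gives 65, ascii_at("ABC", 2) gives 66, and ascii_at("") gives -1.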
    

  2. Arithmetic operations on global arrays

    The operations of add, subtract, multiply, divide, pre/post increment and pre/post decrement are defined (overloaded) for global variables. The operations are defined for short, unsigned short, int, unsigned int, long, unsigned long, float and double. Note: the contents of the global array node must be compatible with the dominant data type of the operation. If the contents of a global are not compatible with the operation (for example, incrementing a string of text), the value of the global will be interpreted as zero. Examples:

    global gbl("gbl");
    int i, j=10;
    string a = "10", b = "20", c = "30";

    gbl(a,b,c) = 10;

    i = gbl(a,b,c) + 20;
    cout << i << endl;                           // prints 30
    i = 20 + gbl(a,b,c);
    cout << i << endl;                           // prints 30
    i = gbl(a,b,c) / j;
    cout << i << endl;                           // prints 1
    i = gbl(a,b,c) * 2;
    cout << i << endl;                           // prints 20

    gbl(a,b,c) ++;
    cout << gbl(a,b,c) << endl;                  // prints 11
    gbl(a,b,c) --;
    cout << gbl(a,b,c) << endl;                  // prints 10

    i = ++ gbl(a,b,c);
    cout << i << " " << gbl(a,b,c) << endl;      // prints 11 11
    i = gbl(a,b,c) ++;
    cout << i << " " << gbl(a,b,c) << endl;      // prints 11 12

    gbl(a,b,c) += 10;
    cout << gbl(a,b,c) << endl;                  // prints 22
    gbl(a,b,c) -= 10;
    cout << gbl(a,b,c) << endl;                  // prints 12
    gbl(a,b,c) *= 2;
    cout << gbl(a,b,c) << endl;                  // prints 24
    gbl(a,b,c) /= 2;
    cout << gbl(a,b,c) << endl;                  // prints 12

  3. Assignment operations on global arrays

    Assignments to and from global arrays may be accomplished either with overloaded versions of the shift operators (<< and >>) or the assignment operator (=). Originally, only the shift forms were permitted since restrictions in the C++ language made it difficult to construct assignments from globals to ordinary data types. This was bypassed by using an overloaded implied cast operator which permits most forms of assignment.

    When you access a global array, the access may throw the exceptions GlobalNotFoundException and/or ConversionException. The first can occur in any context that attempts to retrieve data from a global array where none exists. The second occurs if you attempt to convert the contents of a global to a numeric type where the contents of the global are not valid data for the conversion.

    If uncaught, both exceptions will result in program termination. Both exceptions may be caught, however, with code such as the following:

    #include <mumpsc/libmpscpp.h>

    global a("a");

    int main() {

        long i;

        kill(a());
        a("1") = "now is the time";

        try {
            i = a("1");
        }
        catch ( ConversionException ce) {
            cout << ce.what() << endl;
        }

        try {
            i = a("22");
        }
        catch (GlobalNotFoundException nf) {
            cout << nf.what() << endl;
        }

        return 0;
    }

    The following discussion is divided into two parts: assignment to global arrays and assignment from global arrays:

    Assignment TO global arrays

    Assignments using the overloaded assignment operator are permitted using the following assignment operator overloads:

    global & global::operator = (char *)
    global & global::operator = (int)
    global & global::operator = (string)
    global & global::operator = (mstring)
    global & global::operator = (double)
    global & global::operator = (global)
    global & global::operator = (unsigned int)
    global & global::operator = (float)
    global & global::operator = (short)
    global & global::operator = (unsigned short)
    global & global::operator = (long)
    global & global::operator = (unsigned long)
    

    Assignment to a global array using the "=" operator is enabled for right hand side variables of types character array (char *), mstring, string, integer, double, etc (see above). For example:

    gbl(a,b,c) = "test string";
    gbl(a,b,c) = 123;
    gbl(a,b,c) = 123.45;

    Assignment to global arrays can also be accomplished by using the overloaded shift operator:

    global & global::operator << (char *) 
    global & global::operator << (int) 
    global & global::operator << (unsigned int)
    global & global::operator << (short)
    global & global::operator << (unsigned short)
    global & global::operator << (long)
    global & global::operator << (unsigned long)
    global & global::operator << (float)
    global & global::operator << (double)
    global & global::operator << (string)
    global & global::operator << (mstring)
    global & global::operator << (global)
    
    Examples:

    char source_char[32];
    strcpy(source_char,"test data");
    gbl(a,b,c) << source_char;
    gbl(a,b,c) = source_char;

    long source_long = 100;
    gbl(a,b,c) << source_long;
    gbl(a,b,c) = source_long;

    double source_double = 100.123;
    gbl(a,b,c) << source_double;
    gbl(a,b,c) = source_double;

    string source_string = "test data";
    gbl(a,b,c).Set(source_string);
    gbl(a,b,c) << source_string;

    Assignment from global arrays to other data types:

    Assignment from global arrays by overloaded shift operator:

    char *         global::operator >> (char *)
    int            global::operator >> (int &)
    unsigned int   global::operator >> (unsigned int &)
    long           global::operator >> (long &)
    unsigned long  global::operator >> (unsigned long &)
    short          global::operator >> (short &)
    unsigned short global::operator >> (unsigned short &)
    float          global::operator >> (float &)
    double         global::operator >> (double &)
    string         global::operator >> (string &)
    mstring        global::operator >> (mstring &)
    global &       global::operator >> (global)
    

    Alternatively, the overloaded cast operator can be utilized in combination with the assignment operator ("="):

    global::operator char*()
    global::operator string &()
    global::operator int()
    global::operator unsigned int()
    global::operator short()
    global::operator unsigned short()
    global::operator long()
    global::operator unsigned long()
    global::operator float()
    global::operator double()

    Each of the above converts the value stored at a global variable to a builtin data type, mstring or string.

    Note: the C++ language specification does not permit a fundamental data type (e.g. int, double, char) to be placed on the left hand side of an overloaded assignment ("=") operator. In order to get around this, we use two techniques: (1) the overloaded right-shift operator; and (2) the overloaded cast operator.

    Assignment by overloaded right-shift operator copies and, if necessary, converts the global array string value to the target. It works in all cases.

    The overloaded cast operator permits fundamental data types to be placed on the left hand side of the assignment operator ("=") and a global array reference to be placed on the right hand side. When the C++ compiler detects a fundamental data type on the left hand side of the assignment operator and a global on the right hand side, it invokes a default cast to convert the right hand side.

    In all cases except the one in which the left hand side is char *, the overloaded cast will make the necessary conversion and the assignment will take place as expected. In the case of a char * left hand side, only a pointer to a char * is copied from the global to the left hand side char *. The pointer copied is the address of a public char * string in the global class that contains the value of the global. This pointer is only valid until the next reference to the same global object.

    Assignments from global to string or mstring using the assignment operator ("=") copy the value from the global to the object of type string or mstring. Assignments from global to string require a cast of the global consisting of (string &).

    The case of assignment from global to mstring is handled by an overload in the class mstring. The case of assignment from global to string is handled by class string which copies the contents of the character string pointed to by the global.
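
The two techniques described above can be illustrated with a toy class in plain C++. This is a minimal sketch under stated assumptions, not the MDH global class: the name NodeValueSketch is hypothetical, the value is held as text in memory, and std::invalid_argument stands in for the conversion failure that MDH reports as ConversionException.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Toy stand-in for a node that stores its value as text (hypothetical).
class NodeValueSketch {
public:
    explicit NodeValueSketch(const std::string &v) : value(v) {}

    // Technique 1: overloaded right-shift converts and writes into the target.
    long operator>>(long &target) const {
        target = to_long();
        return target;
    }

    // Technique 2: implicit conversion operator, so "long x = node;" compiles
    // even though long is a fundamental type on the left hand side.
    operator long() const { return to_long(); }

private:
    // Convert the stored text; reject text that is not entirely a number.
    long to_long() const {
        char *end = nullptr;
        long result = std::strtol(value.c_str(), &end, 10);
        if (end == value.c_str() || *end != '\0')
            throw std::invalid_argument("not a number: " + value);
        return result;
    }

    std::string value;
};
```

Given NodeValueSketch n("12345"), both "n >> x;" and "x = n;" leave 12345 in a long variable x, while a node holding "now is the time" throws when converted.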

    Examples using an arbitrary global array reference with three string indices named gbl(a,b,c) at which is stored the string "12345":

    1. Right-shift based assignments (overloaded ">>" operator):

      char target[32];
      long s_long;
      unsigned long u_long;
      int s_int;
      unsigned int u_int;
      short s_short;
      unsigned short u_short;
      float float_var;
      double double_var;
      string str_var;
      mstring mstr_var;

      gbl(a,b,c) >> target;
      gbl(a,b,c) >> s_long;
      gbl(a,b,c) >> u_long;
      gbl(a,b,c) >> s_int;
      gbl(a,b,c) >> u_int;
      gbl(a,b,c) >> s_short;
      gbl(a,b,c) >> u_short;
      gbl(a,b,c) >> float_var;
      gbl(a,b,c) >> double_var;
      gbl(a,b,c) >> str_var;
      gbl(a,b,c) >> mstr_var;

      int i;
      test("1") << 10; // initialize
      while ( ( test("1") >> i ) > 0 ) {
          cout << i << endl;
          test("1") --;
      }

    2. Cast operator based assignments:

      char *p1;
      long s_long;
      unsigned long u_long;
      int s_int;
      unsigned int u_int;
      short s_short;
      unsigned short u_short;
      float float_var;
      double double_var;
      string str_var;

      p1 = gbl(a,b,c);
      s_long = gbl(a,b,c);
      u_long = gbl(a,b,c);
      s_int = gbl(a,b,c);
      u_int = gbl(a,b,c);
      s_short = gbl(a,b,c);
      u_short = gbl(a,b,c);
      float_var = gbl(a,b,c);
      double_var = gbl(a,b,c);
      str_var = (string &) gbl(a,b,c);

      The character string stored at gbl(a,b,c) is converted and copied to the target variable in all cases except those where the left hand side of the assignment operator is char *. This case does not copy character strings but, instead, copies the address of a string containing the result to the target (see first example).

      This also means that the left hand side of an assignment operator may not be the name of an array of type char since this implies altering the address of the array. The only permitted char left hand side would be a variable pointer to char.

      The value copied to the pointer will be a public address of an array of char in the class containing the value of the global reference. This reference is valid only until another reference to the same global object. This usage is not preferred. Instead, use the shift form or strcpy():

      char out[STR_MAX];
      strcpy(out, gbl(a,b,c));

      The overloaded cast assignment form may be used within larger programming structures such as:

      int i;
      test("1") = 10; // initialize
      while ( ( i = test("1") ) > 0 ) {
          cout << i << endl;
          test("1") --;
      }

      The above will print numbers 10 through 1.

  4. double global::Avg()

    Returns the average of the values of data bearing nodes beneath the given global array reference. Example:

    global A("A");
    mstring i,j;

    for (i=0; i<1000; i++)
        for (j=1; j<10; j++) {
            A(i,j) << j;
        }

    cout << A("100").Avg() << endl;   // average of nodes below A("100")
    cout << A().Avg() << endl;        // average of all nodes

    For A("100"), the above prints 5, the average of the numeric data bearing nodes beneath A("100") (the values 1 through 9). If there are non-numeric data elements, they are treated as zero values and contribute to the result.

    The global array object must be specified with indices (i.e., a parenthesized list must follow the name of the global array object). An empty list means the entire array.
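
The averaging rule can be sketched over an ordinary in-memory map. In this sketch, which is not the MDH implementation, a node is identified by its list of indices, average_beneath is a hypothetical name, and returning 0 when no nodes are found is an assumption. Text that does not begin with a number converts to zero through strtod, mirroring the treatment of non-numeric data described above.

```cpp
#include <algorithm>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

using Path = std::vector<std::string>;       // list of indices naming a node
using Tree = std::map<Path, std::string>;    // node -> stored text

// Average of all values stored strictly beneath `prefix`. An empty prefix
// selects every node. Non-numeric text counts as zero.
double average_beneath(const Tree &tree, const Path &prefix) {
    double sum = 0.0;
    long count = 0;
    for (const auto &entry : tree) {
        const Path &key = entry.first;
        if (key.size() <= prefix.size()) continue;   // not a descendant
        if (!std::equal(prefix.begin(), prefix.end(), key.begin())) continue;
        sum += std::strtod(entry.second.c_str(), nullptr);
        ++count;
    }
    return count == 0 ? 0.0 : sum / count;           // assumption: 0 if empty
}
```

For nodes holding 1 through 9 beneath the index "100", average_beneath(tree, Path{"100"}) returns 5.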

  5. int mstring::begins(mstring pattern);

    Returns an integer which is the starting point in the string of pattern or -1 if the pattern is not found. Throws: PatternException if the pattern is in error.

  6. Boyer-Moore-Gosper Functions

    extern "C" int bmg_fullsearch(char * search_string, char * buffer_base);
    int bmg_fullsearch(string search_string, string buffer_base);
    int bmg_fullsearch(mstring search_string, mstring buffer_base);
    int bmg_fullsearch(mstring search_string, global buffer_base);

    Returns the number of non-overlapping instances of "search_string" in "buffer_base". This function is covered by the LGPL license. The string, global and mstring versions are less efficient than the char * version since they copy the contents of the string, global or mstring variable to a character array.

    Note: if you use the char * version, the "buffer_base" may NOT be a character constant or pointer to character constant. The search routines modify this string and using a character constant will generate a segmentation fault on most machines.
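
What "non-overlapping instances" means can be shown with a plain std::string::find loop. This sketch does not use the Boyer-Moore-Gosper routines and is not a replacement for bmg_fullsearch(); the name count_non_overlapping is hypothetical and the loop only illustrates the counting rule.

```cpp
#include <cstddef>
#include <string>

// Count non-overlapping occurrences of `needle` in `haystack` using a
// simple find loop (not Boyer-Moore-Gosper).
int count_non_overlapping(const std::string &needle, const std::string &haystack) {
    if (needle.empty()) return 0;
    int count = 0;
    std::size_t pos = 0;
    while ((pos = haystack.find(needle, pos)) != std::string::npos) {
        ++count;
        pos += needle.size();   // resume after this match so matches cannot overlap
    }
    return count;
}
```

Searching "aaaa" for "aa" therefore counts 2 rather than 3, because the second search resumes after the first match ends.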

    Examples:

    #include <mumpsc/libmpscpp.h>

    int main() {

        string a="now is the time for all good men to come to the aid of the party";
        string b="to";
        cout << bmg_fullsearch(b,a) << endl;

        char xa[100];
        strcpy(xa,"now is the time for all good men to come to the aid of the party");
        cout << bmg_fullsearch("to",xa) << endl;

        return 0;
    }

    yields:

    2
    2

    All use the following functions which are not covered by the GPL/LGPL:

    extern "C" void bmg_setup(char * search_string, int case_fold_flag);
    extern "C" int bmg_search(char * buffer_base, int buffer_length, int (*action_func)());

    These functions are publicly available from:

    ftp://ftp.uu.net/usenet/comp.sources.unix/volume5/bmgsubs.Z

    They are believed to be contributed source that is unrestricted with respect to use and redistribution. Most, if not all, of the code is believed to have been written by employee(s) of the United States and thus to be in the public domain. The distribution contains, in part, the following notes:

    Here are routines to perform fast string searches using the
    Boyer-Moore-Gosper algorithm; they can be used in any Unix program (and
    should be portable to non-Unix systems).  You can search either a file
    or a buffer in memory.
    
    The code is mostly due to James A. Woods (jaw@ames-aurora.arpa)
    although I have modified it heavily, so all bugs are my fault.  The
    original code is from his sped-up version of egrep, recently posted on
    mod.sources and available via anonymous FTP from ames-aurora.arpa as
    pub/egrep.one and pub/egrep.two.  That code handles regular
    expressions; mine does not.
    
    These have only been tested on 4.2BSD Vax systems.
    
    -Jeff Mogul
    mogul@navajo.stanford.edu
    decwrl!glacier!navajo!mogul
    

    BMGSUBS(3L)							   BMGSUBS(3L)
    
    NAME
           (bmgsubs)   bmg_setup,  bmg_search,  bmg_fsearch	 -  Boyer-Moore-Gosper
           string search routines
    
    SYNOPSIS
           bmg_setup(search_string, case_fold_flag)
           char *search_string;
           int case_fold_flag;
    
           bmg_fsearch(file_des, action_func)
           int file_des;
           int (*action_func)();
    
           bmg_search(buffer_base, buffer_length, action_func)
           char *buffer_base;
           int buffer_length;
           int (*action_func)();
    
    DESCRIPTION
           These routines perform fast searches  for  strings,  using  the	Boyer-
           Moore-Gosper  algorithm.	  No meta-characters (such as `*' or `.')  are
           interpreted, and the search string cannot contain newlines.
    
           Bmg_setup must be called as the first step in performing a search.  The
           search_string   parameter   is	the   string   to   be	searched  for.
           Case_fold_flag should  be  false	 (zero)	 if  characters	 should	 match
           exactly,	 and  true  (non-zero) if case should be ignored when checking
           for matches.
    
           Once a search string has been specified using bmg_setup,	 one  or  more
           searches for that string may be performed.
    
           Bmg_fsearch  searches  a	 file,	open  for  reading  on file descriptor
           file_des (this is not a stdio file.)  For each line that	 contains  the
           search string, bmg_fsearch will call the action_func function specified
           by the caller as action_func(matching_line, byte_offset).   The	match-
           ing_line	 parameter  is	a  (char *) pointer to a temporary copy of the
           line; byte_offset is the offset from the beginning of the file  to  the
           first  occurence of the search string in that line.  Action_func should
           return true (non-zero) if the search should continue, or	 false	(zero)
           if the search should terminate at this point.
    
           Bmg_search  is  like  bmg_fsearch,  except  that instead of searching a
           file, it searches the buffer pointed to by  buffer_base;	 buffer_length
           specifies the number of bytes in the buffer.  The byte_offset parameter
           to action_func gives the offset from the beginning of the buffer.
    
           If the user merely wants the matching lines  printed  on	 the  standard
           output,	the  action_func parameter to bmg_fsearch or bmg_search can be
           NULL.
    
    AUTHOR
           Jeffrey Mogul (Stanford University), based on code written by James  A.
           Woods (NASA Ames)
    
    BUGS
           Might  be  nice	to have a version of this that handles regular expres-
           sions.
    
           There are large, but finite, limits  on	the  length  of	 both  pattern
           strings	and  text lines.  When these limits are exceeded, all bets are
           off.
    
           The string pointer passed to action_func points to a temporary copy  of
           the  matching  line,  and  must	be copied elsewhere before action_func
           returns.
    
           Bmg_search does not permanently modify the buffer in any way, but  dur-
           ing  its execution (and therefore when action_func is called), the last
           byte of the buffer may be temporarily changed.
    
           The Boyer-Moore algorithm cannot find lines that do not contain a given
           pattern	(like  "grep  -v") or count lines ("grep -n").	Although it is
           fast even for short search strings, it gets faster as the search string
           length increases.
    
    				  16 May 1986			   BMGSUBS(3L)
    

  7. int BTREE(int code, unsigned char * key, unsigned char * data)

    BTREE() is a macro permitting direct access to the underlying btree system. The first argument, "code", is an integer indicating the operation to be performed (see below). The second argument is the key to be stored, consisting of a null-terminated array of printable ASCII characters. The length of the key should be no greater than one quarter of the btree block size, whose default value is 8192 (i.e., the maximum key length is about 2048 bytes in the default case). The third argument is the data to be stored with the key. It is a null-terminated string of printable ASCII characters not greater than the system defined limit STR_MAX (defaults to 4096). An empty string is interpreted as no data to be stored. Note that the second and third arguments must be unsigned char *. The macro returns an integer indicating success. It may also alter "key" or "data" to return values or for other purposes. The contents of "key" and "data" are not preserved across an invocation of BTREE(). Examples of using BTREE() are given in mumpsc/doc/examples/btree.

    Permitted btree operations:

    1. STORE - store a key and data value in the btree; returns zero if successful, non-zero otherwise:
            unsigned char key[]="test key";
            unsigned char data[]="test data";
            if ( BTREE(STORE,key,data) == 0 ) cout << "stored" << endl;
            else cout << "not stored" << endl;
      
    2. RETRIEVE - retrieve data stored with a key; returns zero if successful, non-zero otherwise:
            unsigned char key[]="test key";
            unsigned char data[STR_MAX];
            if ( BTREE(RETRIEVE,key,data) == 0 ) cout << "retrieved: " << data << endl;
            else cout << "not retrieved." << endl;
      
    3. CLOSE - close the btree data base; returns zero:
            unsigned char key[]="";
            unsigned char data[]="";
            BTREE(CLOSE,key,data);
      
    4. XNEXT/PREVIOUS - retrieve next ascending/descending key; returns one. The values of the second and third arguments become the value of the next ascending/descending key. An initial value of the empty string for the second argument will retrieve the first/last key, and the value of the second argument becomes the empty string when there are no more ascending/descending values.
            int i;
            unsigned char key[STR_MAX]="";   // must be large enough to receive returned keys
            unsigned char data[STR_MAX];
            printf("\nbegin retrieve...\n");
            while(1) { // retrieve keys in ascending order
                  i=BTREE(XNEXT,key,data);
                  if (strlen( (char *) data)==0) break;
                  cout << key << endl;
                  }
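
The traversal convention used by XNEXT (start from the empty string, receive the empty string when the keys are exhausted) can be sketched with a std::map in place of the btree. The names Store and next_key are hypothetical; this illustrates only the iteration pattern, not the btree code, and assumes no stored key is itself the empty string.

```cpp
#include <map>
#include <string>

// In-memory stand-in for an ordered key store (hypothetical; not the btree).
using Store = std::map<std::string, std::string>;

// Return the first key strictly greater than `key`, or "" when there is none.
// Passing "" yields the first key, provided no stored key is itself empty.
std::string next_key(const Store &store, const std::string &key) {
    Store::const_iterator it = store.upper_bound(key);
    return it == store.end() ? std::string() : it->first;
}
```

A loop of the form "key = next_key(store, key); if (key.empty()) break;" then visits every key in ascending order, in the same way as the XNEXT loop above.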
      
      
  8. void global::Centroid(global B)

    A centroid vector B is calculated for the invoking two dimensional global array. The centroid vector is the average value for each column of the matrix. Any previous contents of the global array named to receive the centroid vector are lost. The invoking global array (A) must contain at least two dimensions. For example:

    #include <mumpsc/libmpscpp.h>

    global A("A");
    global B("B");

    int main() {

        mstring i,j;

        for (i=0; i<10; i++)
            for (j=1; j<10; j++) {
                A(i,j) << 5;
            }

        A().Centroid(B());

        mstring a="";
        while (1) {
            a=B(a).Order(1);
            if (a=="") break;
            cout << a << " --> " << B(a) << endl;
        }

        return 0;
    }

    Yields:

    1 --> 5
    2 --> 5
    3 --> 5
    4 --> 5
    5 --> 5
    6 --> 5
    7 --> 5
    8 --> 5
    9 --> 5

    The above yields a vector giving the average value of each named column of the matrix "A" (5 in this case since each column is initialized with 5).
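
The column averaging can be sketched on an in-memory sparse matrix as follows. This is an illustration, not the MDH code: Row, Matrix and centroid_sketch are hypothetical names, and averaging each column over only the rows that contain it is an assumption, since the treatment of missing cells is not described here.

```cpp
#include <map>
#include <string>

using Row = std::map<std::string, double>;      // column name -> value
using Matrix = std::map<std::string, Row>;      // row name -> row

// Column-wise average of a sparse two dimensional matrix. Each column is
// averaged over the rows that actually contain it (an assumption).
Row centroid_sketch(const Matrix &m) {
    Row sum;
    std::map<std::string, int> count;
    for (const auto &row : m)
        for (const auto &cell : row.second) {
            sum[cell.first] += cell.second;     // running total per column
            ++count[cell.first];                // rows contributing to it
        }
    for (auto &col : sum) col.second /= count[col.first];
    return sum;
}
```

For a matrix of ten rows and nine columns in which every cell is 5, the result has nine entries, each equal to 5.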

  9. void CleanLocks(void)
    void CleanAllLocks(void);

    "CleanLocks()" removes all locks for the current process. "CleanAllLocks()" removes all locks for all processes for which the current directory is the default directory. Locks are implemented by entries in a file named "Mumps.Locks" created and maintained in the current directory. This file must be read/write enabled for the current process. You may also delete all locks by removing this file. Locks are discussed elsewhere but, in brief, they are used to signal ownership of a portion of a global array. When a lock has been applied to a node, no other process may lock this node, any descendant node or any parent node. Locking does not actually prevent access, it merely marks a resource as locked.

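The locking rule for item 9 (a locked node blocks locks on itself, its descendants and its parents) amounts to a prefix test on the index lists of the two nodes. The following sketch shows that test only. It is not the MDH lock code and does not read "Mumps.Locks"; the name lock_conflicts is hypothetical, and reading "parent node" as "any ancestor" is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using IndexList = std::vector<std::string>;   // indices naming a node

// True when one index list is a prefix of the other (or they are equal),
// i.e. the nodes are identical or one is an ancestor of the other.
bool lock_conflicts(const IndexList &held, const IndexList &requested) {
    std::size_t n = held.size() < requested.size() ? held.size() : requested.size();
    for (std::size_t i = 0; i < n; ++i)
        if (held[i] != requested[i]) return false;   // paths diverge: no conflict
    return true;
}
```

With a lock held on the node indexed ("a","b"), requests for ("a") and ("a","b","c") conflict, while a request for the sibling ("a","c") does not.
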
  10. char * mstring::c_str()

    Returns a char * to a NULL terminated character string containing the same value as the mstring variable.

  11. command(string)

    "command()" is a macro that takes a quoted string constant argument. The macro surrounds the string with an extra set of quotes and processes any embedded quotes to backslash-quote. It then invokes a function (__command__()) which strips the extra surrounding quotes. The net effect of this is that you can pass a quoted string containing quotes without the need for "leaning toothpick" notation. Example:

    Normal usage: 
    
    $pattern(source_str, "3n1\"-\"2n1\"-\"4n") 
    strcpy(target, "for i=1:1:10 write \"test \",i,!"); 
    
    with command(): 
    
    $pattern(source_string, command("3n1"-"2n1"-"4n")) 
    xecute(command("for i=1:1:10 write "test ",i,!")); 
    strcpy(target, command("for i=1:1:10 write "test ",i,!")); 
    

    The argument must be a character string constant.

  12. Comparison operations involving globals.

    The comparison operators >, >=, <, <=, ==, and != are defined for global arrays. The determination of the mode of the comparison is based on the other operand. That is, if a global array is compared with an integer, integer comparison will be used; if it is compared with a character string, character string comparison will be used. The contents of the data stored at the global array node must be compatible with the comparison mode. When two global array elements are compared, the comparison will be a string comparison, regardless of the contents. For example:

    Given the following definitions:

    #include <mumpsc/libmpscpp.h>

    global A("A");

    int main() {
        string x="now is the time";
        char y[]="123";
        A("1") = "now is the time";
        A("2") = 100;
        A("3") = 123;
        A("4") = 200;

    The following apply:

    A("1") == "now is the time"   --> true
    A("1") == x                   --> true
    A("2") < 123                  --> true: integer comparison
    A("2") < 123.                 --> true: double comparison
    A("2") < "123"                --> true: string comparison
    A("2") < 2                    --> false: integer comparison
    A("2") < "2"                  --> true: string comparison

    Note that the mode of comparison is dependent upon the second operand. In the case of string comparisons, an ASCII comparison takes place thus "123" is less than "2".
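
Selecting the comparison mode from the type of the other operand is ordinary C++ overloading. The sketch below shows the idea with a toy type; TextNode is a hypothetical name and the code is an illustration, not the MDH global class.

```cpp
#include <cstdlib>
#include <string>

// Toy node holding its value as text (hypothetical).
struct TextNode {
    std::string value;
};

// Integer operand: convert the stored text and compare numerically.
inline bool operator<(const TextNode &n, long rhs) {
    return std::strtol(n.value.c_str(), nullptr, 10) < rhs;
}

// String operand: compare character by character in ASCII order.
inline bool operator<(const TextNode &n, const std::string &rhs) {
    return n.value < rhs;
}
```

For a node holding "100", the comparison with the integer 2 is false, while the comparison with the string "2" is true, because "1" sorts before "2" in ASCII order.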

  13. Conversion functions

    char *cvt(long i)
    char *cvt(double i)
    char *cvt(float i)
    char *cvt(int i)

    These functions return a null terminated, varying length character string containing a printable version of the argument. The functions contain short static character arrays and, consequently, are not threadsafe. Note that char * can be assigned to a variable of type string. These functions are mainly useful to get string values for global array indices. Because these functions use static character arrays, do not use them directly as indices to global arrays; assign the result to a string first. Example:

    #include <mumpsc/libmpscpp.h>

    global A("A");

    int main() {
        string a;
        a=cvt(1234);
        A(a)="test";
        return EXIT_SUCCESS;
    }

  14. GlobalClose;

    This macro closes the global array files. The global arrays must be closed on exit or they will be corrupted. The macro causes the file system to flush all its buffers and cache and close the file system. Normally, a "GlobalClose" is executed automatically when your program ends, except if your program is terminated by SIGKILL or SIGSTOP (which cannot be trapped). If your program is using a large memory based cache (caches can be 1 GB or more on some systems), there may be a noticeable delay in file system shutdown due to the time required to write the cache to disk.

  15. Correlation functions

    void global::TermCorrelate(global B)
    void global::DocCorrelate(global B, mstring fcnname, double threshold)
    void global::DocCorrelate(global B, char * fcnname, double threshold)

    These functions build document indexing correlation matrices. The invoking global is assumed to be a two dimensional document-term matrix whose rows are documents and whose columns represent the occurrence of terms in the documents (either weights or frequencies).

    TermCorrelate() builds a square term-term correlation matrix in B from the invoking document-term matrix.

    DocCorrelate() builds a square document-document correlation matrix from the invoking document-term matrix. The name of the function to be used in calculating the document-document similarity is given in fcnname and may be Cosine, Jaccard, Dice, or Sim1. The minimum correlation threshold is given in threshold, which defaults to 0.80 if omitted.
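
As an illustration of the kind of measure named above, the following sketch computes the standard cosine coefficient between two documents held as sparse term-weight maps. The formulas MDH actually uses are not given here, so this is the textbook definition offered as an assumption; TermVector and cosine_sketch are hypothetical names.

```cpp
#include <cmath>
#include <map>
#include <string>

using TermVector = std::map<std::string, double>;   // term -> weight

// Standard cosine coefficient: dot(a, b) / (|a| * |b|).
// Returns 0 if either vector has zero length.
double cosine_sketch(const TermVector &a, const TermVector &b) {
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (const auto &t : a) {
        norm_a += t.second * t.second;
        TermVector::const_iterator it = b.find(t.first);
        if (it != b.end()) dot += t.second * it->second;   // shared term
    }
    for (const auto &t : b) norm_b += t.second * t.second;
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

A pair of documents would then be kept in the result when this value is at least the threshold. Identical documents score 1 and documents with no terms in common score 0.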

    TermCorrelate() Example:
    #include <mumpsc/libmpscpp.h>

    global A("A");
    global B("B");

    int main() {

        long i,j;

        A("1","computer")=5;
        A("1","data")=2;
        A("1","program")=6;
        A("1","disk")=3;
        A("1","laptop")=7;
        A("1","monitor")=1;

        A("2","computer")=5;
        A("2","printer")=2;
        A("2","program")=6;
        A("2","memory")=3;
        A("2","laptop")=7;
        A("2","language")=1;

        A("3","computer")=5;
        A("3","printer")=2;
        A("3","disk")=6;
        A("3","memory")=3;
        A("3","laptop")=7;
        A("3","USB")=1;

        A().TermCorrelate(B());

        string a="";
        string b;
        while (1) {
            a=B(a).Order(1);
            if (a=="") break;
            cout << a << endl;
            b="";
            while (1) {
                b=B(a,b).Order(1);
                if (b=="") break;
                cout <<" " << b << "(" << B(a,b) << ")" << endl;
            }
        }

        return 0;
    }

    Yields:

    USB
     computer(1)
     disk(1)
     laptop(1)
     memory(1)
     printer(1)
    computer
     USB(1)
     data(1)
     disk(2)
     language(1)
     laptop(3)
     memory(2)
     monitor(1)
     printer(2)
     program(2)
    data
     computer(1)
     disk(1)
     laptop(1)
     monitor(1)
     program(1)
    disk
     USB(1)
     computer(2)
     data(1)
     laptop(2)
     memory(1)
     monitor(1)
     printer(1)
     program(1)
    language
     computer(1)
     laptop(1)
     memory(1)
     printer(1)
     program(1)
    laptop
     USB(1)
     computer(3)
     data(1)
     disk(2)
     language(1)
     memory(2)
     monitor(1)
     printer(2)
     program(2)
    memory
     USB(1)
     computer(2)
     disk(1)
     language(1)
     laptop(2)
     printer(2)
     program(1)
    monitor
     computer(1)
     data(1)
     disk(1)
     laptop(1)
     program(1)
    printer
     USB(1)
     computer(2)
     disk(1)
     language(1)
     laptop(2)
     memory(2)
     program(1)
    program
     computer(2)
     data(1)
     disk(1)
     language(1)
     laptop(2)
     memory(1)
     monitor(1)
     printer(1)

    The above gives the number of co-occurrences of each word with each other word. For example, the words "computer" and "memory" co-occur in two vectors (2 and 3) while the words "laptop" and "computer" co-occur in all three vectors. If each vector is thought of as a document, the strength of the co-occurrences between words is a measure of similarity for indexing purposes.
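
The co-occurrence counting in the TermCorrelate() example can be sketched over in-memory maps. This is an illustration of the counting rule only, not the MDH implementation; DocTerm, TermTerm and cooccurrence_sketch are hypothetical names.

```cpp
#include <map>
#include <string>

using Doc = std::map<std::string, double>;       // term -> weight
using DocTerm = std::map<std::string, Doc>;      // document -> its terms
using TermTerm = std::map<std::string, std::map<std::string, int>>;

// For every pair of distinct terms, count the documents containing both.
TermTerm cooccurrence_sketch(const DocTerm &docs) {
    TermTerm result;
    for (const auto &doc : docs)
        for (const auto &t1 : doc.second)
            for (const auto &t2 : doc.second)
                if (t1.first != t2.first) ++result[t1.first][t2.first];
    return result;
}
```

Applied to the three documents of the example, the sketch gives 2 for "computer" with "memory" and 3 for "laptop" with "computer", matching the counts listed in the output.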

    DocCorrelate() Example:
    #include <mumpsc/libmpscpp.h> global A("A"); global B("B"); int main() { long i,j; A("1","computer")=5; A("1","data")=2; A("1","program")=6; A("1","disk")=3; A("1","laptop")=7; A("1","monitor")=1; A("2","computer")=5; A("2","printer")=2; A("2","program")=6; A("2","memory")=3; A("2","laptop")=7; A("2","language")=1; A("3","computer")=5; A("3","printer")=2; A("3","disk")=6; A("3","memory")=3; A("3","laptop")=7; A("3","USB")=1; A().DocCorrelate(B(),"Cosine",.5); string a=""; string b; while (1) { a=B(a).Order(1); if (a=="") break; cout