Corpus Generator – capture a Click environment

Overview

Xoom Corpus Generator is the tool we use to create a static snapshot of a ClickSoftware Service Optimization environment. We call such a snapshot a corpus. A corpus captures configuration data in a way that lets Configo tools run, for the purpose of configuration retrieval, as though they were running on that environment, but without any direct access whatsoever to it.

We mostly use corpora to deliver services such as Configo Insight and Configo Core Basic for on-premise Click installations. Because configuration retrieval is interactive and these services run in the cloud, a corpus removes the need for direct access to what are often closed corporate networks.

For best results, run Xoom Corpus Generator on a Service Optimization environment that reflects the full diversity of configuration used on the project. This doesn’t have to be the production environment, and it doesn’t have to contain every instance of every object. However, it should contain every kind of setting and object in use, and ideally a complete configuration.

Usage

The tool can be obtained on request, or downloaded from the Configo app website when creating snapshots. It is distributed as a zip archive.

To run the tool, first unzip the downloaded archive anywhere on the chosen Service Optimization application server. The resulting folder includes xcg.cmd, which launches the tool, and xcg.json, which holds its configuration.

There is no need to install anything. The tool is run straight from this folder, and doesn’t make any changes to the host system. Once the corpus has been generated and sent to Configo, the whole folder can simply be deleted.

Run xcg.cmd, either from the command line or by double-clicking it. This starts the tool using the configuration stored in the supplied xcg.json. If the tool was downloaded from the Configo app, it comes preconfigured for the particular environment, so all you need to supply is the password for the preconfigured user. The generic version of the tool may require additional configuration (see below).

When run, the tool asks for all relevant parameters and then performs the corpus capture.

During the capture, a small number of errors will be reported because Click prevents some collections from being queried. This is nothing to worry about: the tool simply notes the problem and continues. We will let you know if any of the reported errors turns out to be an actual problem.

Once the tool has finished running, there will be two additional files in the tool’s folder: xcg.log, which contains the log entries from the run and is only needed if there are problems, and the corpus file itself. The corpus file is a zip archive whose name starts with corpus_ followed, in order, by the user name used for the capture, the host address, the corpus factory (for internal use only) and a timestamp. This allows the same copy of the tool to be run repeatedly to produce corpora on multiple occasions without overwriting earlier ones.
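Because each run produces a new, uniquely named corpus file, the folder can accumulate several of them. The following is a minimal Python sketch for picking the most recent one before uploading or sending it. It is not part of the tool: latest_corpus is a hypothetical helper, and it relies only on names starting with corpus_. It compares file modification times rather than parsing the timestamp out of the name, since the exact name format and separators are not documented here.

```python
import sys
from pathlib import Path
from typing import Optional


def latest_corpus(tool_folder: str) -> Optional[Path]:
    """Return the most recently modified corpus file in tool_folder, or None.

    Hypothetical helper: it only assumes corpus file names start with "corpus_".
    """
    candidates = [p for p in Path(tool_folder).glob("corpus_*") if p.is_file()]
    if not candidates:
        return None
    # Compare modification times instead of parsing the timestamp embedded in
    # the file name, whose exact format is not assumed here.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python latest_corpus.py <path-to-tool-folder>
    print(latest_corpus(sys.argv[1]))
```

Other files in the folder, such as xcg.log, are ignored because they do not match the corpus_ prefix.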

Once the corpus has been produced, you can either upload it to the Configo app (if you are performing Configo Insight or Core Basic services manually) or send it to us for further use and analysis.

Corpus Contents

Each corpus contains two files, meta.json and data.

Both are plain-text files and can be inspected.

meta.json contains metadata about the corpus generator run: when it was run, which plug-ins were used, which parameter values were set for the capture, and which tasks were performed with what results.
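To look at meta.json without unpacking the archive by hand, something like the Python sketch below can be used. It assumes, without confirmation from this document, that meta.json sits at the root of the corpus zip archive and is valid JSON; read_corpus_metadata is a hypothetical name. Because the exact structure of meta.json is not documented here, the sketch simply pretty-prints whatever it finds.

```python
import json
import sys
import zipfile
from typing import Any


def read_corpus_metadata(corpus_path: str) -> Any:
    """Load meta.json from a corpus archive (assumed to be at the archive root)."""
    with zipfile.ZipFile(corpus_path) as archive:
        with archive.open("meta.json") as f:
            # json.load accepts a binary file object and detects UTF-8/16/32.
            return json.load(f)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_meta.py <corpus-file>
    print(json.dumps(read_corpus_metadata(sys.argv[1]), indent=2))
```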

data contains the raw responses, in XML format, that Service Optimization returned to various queries. This is typically a large file, but it can be inspected and searched if required. The most important reason to inspect it is to make sure no unwanted data ends up in the corpus. For example, if for data protection purposes the customer wants to exclude anything that may reference engineer names, this file can be searched to find out where those references come from, so that the corpus generator can be configured to exclude them.
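As an illustration of that kind of search, here is a minimal Python sketch that streams the data file out of the corpus archive and reports every case-insensitive match of a search term, with its line number and some surrounding text. It assumes that data sits at the root of the archive and is UTF-8 encoded; find_in_corpus_data is a hypothetical name. Reading line by line avoids loading the whole file at once, although very long lines are still read in full.

```python
import io
import re
import sys
import zipfile
from typing import Iterator, Tuple


def find_in_corpus_data(
    corpus_path: str, term: str, context: int = 60
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, snippet) for each case-insensitive match of term.

    Assumes the 'data' entry is at the archive root and is UTF-8 encoded.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    with zipfile.ZipFile(corpus_path) as archive:
        # Stream the entry as text instead of loading the whole file at once.
        with io.TextIOWrapper(
            archive.open("data"), encoding="utf-8", errors="replace"
        ) as lines:
            for line_number, line in enumerate(lines, start=1):
                for match in pattern.finditer(line):
                    start = max(0, match.start() - context)
                    yield line_number, line[start : match.end() + context].strip()


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python find_in_data.py <corpus-file> "<search term>"
    for line_number, snippet in find_in_corpus_data(sys.argv[1], sys.argv[2]):
        print(f"{line_number}: {snippet}")
```

The snippets show which XML elements carry the unwanted values, which in turn points to the collections that should be excluded.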

Corpus Capture Configuration

Xoom Corpus Generator supports rich configuration that controls what information is captured. The configuration file xcg.json included with the tool downloaded from the Configo app will look something like this:

{
  "global": {},
  "instances": {
    "*": {
      "{zanyants.com/xoom/w6/corpus-factories}include-data": true,
      "{zanyants.com/xoom/w6/corpus-factories}unconditionally-included-collections": [],
      "{zanyants.com/xoom/w6/corpus-factories}small-collection-size": 10000,
      "{zanyants.com/xoom/w6/corpus-factories}exclude-transactional-collections": true,
      "{zanyants.com/xoom/w6}scenario": "xcg",
      "{zanyants.com/xoom/w6/connection}user-name": "click-user",
      "{zanyants.com/xoom/w6/connection}domain": "",
      "{zanyants.com/xoom/w6/connection}server-uri": "https://click-host-uri/SO",
      "{zanyants.com/xoom/w6/connection/integration-services}credential-type": "Windows",
      "{zanyants.com/xoom/w6/connection/integration-services}virtual-directory": "IntegrationServices",
      "{zanyants.com/xoom/w6/connection}server-version": "8.3.0.0"
    }
  }
}

Each option has a namespace (the part inside the curly braces), which indicates its context, a name and a value. For example, the line

"{zanyants.com/xoom/w6/corpus-factories}include-data": true

sets option include-data in namespace zanyants.com/xoom/w6/corpus-factories to true.
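Since every key follows the same {namespace}name shape, splitting it back into its parts is straightforward. The Python sketch below shows one way to do so; split_option_key is a hypothetical helper, not part of the tool, and it assumes namespaces never contain a closing curly brace, which holds for all the keys in the sample above.

```python
import re
from typing import Tuple

# A key is "{namespace}name"; the namespace may not contain "}".
OPTION_KEY = re.compile(r"^\{(?P<namespace>[^}]*)\}(?P<name>.+)$")


def split_option_key(key: str) -> Tuple[str, str]:
    """Split an xcg.json option key into (namespace, name)."""
    match = OPTION_KEY.match(key)
    if match is None:
        raise ValueError(f"not a namespaced option key: {key!r}")
    return match.group("namespace"), match.group("name")


namespace, name = split_option_key(
    "{zanyants.com/xoom/w6/corpus-factories}include-data"
)
print(namespace)  # zanyants.com/xoom/w6/corpus-factories
print(name)       # include-data
```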

The options in the namespace zanyants.com/xoom/w6/connection and its descendant namespaces define the Click environment to which the tool will connect and the related parameters (the credentials, which virtual directory to use, etc.). Most of these options are self-explanatory, so we won’t explain them further here.
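Because descendant namespaces such as zanyants.com/xoom/w6/connection/integration-services also belong to the connection settings, one quick way to review which environment the tool will connect to is to filter xcg.json by namespace prefix. The Python sketch below does this for the "*" instance used in the sample above. connection_options is a hypothetical helper, not part of the tool; it treats a namespace as a descendant only when it continues after a "/" separator.

```python
import json
import sys
from typing import Any, Dict

CONNECTION_NS = "zanyants.com/xoom/w6/connection"


def connection_options(config: Dict[str, Any], instance: str = "*") -> Dict[str, Any]:
    """Return options in the connection namespace or any of its descendants."""
    options = config.get("instances", {}).get(instance, {})
    result = {}
    for key, value in options.items():
        if not key.startswith("{") or "}" not in key:
            continue  # not a "{namespace}name" key
        namespace, _name = key[1:].split("}", 1)
        if namespace == CONNECTION_NS or namespace.startswith(CONNECTION_NS + "/"):
            result[key] = value
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python show_connection.py <path-to-xcg.json>
    with open(sys.argv[1], encoding="utf-8") as f:
        config = json.load(f)
    for key, value in connection_options(config).items():
        print(f"{key} = {value!r}")
```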

The namespace zanyants.com/xoom/w6/corpus-factories is more interesting, as it controls which data ends up in the corpus. It contains the following options:

include-data: Specifies whether data (as opposed to just the schema, identity information and collection sizes) should be included in the corpus. This parameter should be set to false only in very rare circumstances: where only schema analysis is required, or where only the data for an explicit list of collections should be included (see option unconditionally-included-collections).
exclude-transactional-collections: Specifies whether the data from transactional collections (such as engineers, tasks, assignments, plans, roster allocations, forecasts and a number of other collections) should be excluded from the corpus. This value rarely needs to be modified as individual transactional collections can be included using option unconditionally-included-collections when required.
small-collection-size: Specifies the threshold, in number of objects, that determines whether a collection is considered small. Unless explicitly included, only the data of small collections is included in the corpus; the data of larger collections is left out.
unconditionally-included-collections: Specifies a list of collections that will be included regardless of the value of any other option.
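The way these four options interact can be summarised as a single decision per collection. The Python sketch below is our reading of the descriptions above, not the tool’s actual implementation. CaptureOptions and include_collection_data are hypothetical names, and whether a collection of exactly small-collection-size objects counts as small is an assumption (here it does).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaptureOptions:
    # Defaults mirror the sample xcg.json shown earlier.
    include_data: bool = True
    unconditionally_included_collections: List[str] = field(default_factory=list)
    small_collection_size: int = 10000
    exclude_transactional_collections: bool = True


def include_collection_data(
    name: str, size: int, transactional: bool, opts: CaptureOptions
) -> bool:
    """Decide whether a collection's data goes into the corpus (illustrative only)."""
    # Explicitly listed collections win regardless of any other option.
    if name in opts.unconditionally_included_collections:
        return True
    # With include-data off, only explicitly listed collections carry data.
    if not opts.include_data:
        return False
    # Transactional collections are skipped while the exclusion is on.
    if transactional and opts.exclude_transactional_collections:
        return False
    # ASSUMPTION: a collection exactly at the threshold still counts as small.
    return size <= opts.small_collection_size
```

Checking the explicit list first reflects the statement that unconditionally-included-collections applies regardless of the value of any other option.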