SAND architecture

-introduction-
Structs
And
Nodes
Development
-appendices-

This document seeks to provide the most condensed possible introduction to SAND. The vast majority of detail has been relegated to a reference implementation, with only a few of the more basic points referenced here in the appendices. This introduction represents an extensive reduction to the bare essentials, and should by no means be understood to represent a complete description or specification.

SAND (Structs and Nodes Development) is a system architecture and development platform for building highly flexible and scalable applications. It is based on structs (which declare data), and nodes (which receive and produce data), being highly leveraged from within a fully supportive development environment.

In effect, SAND is a return to the simplicity of a struct declaration for data, and a code module to work on it, but at a level that

reflects the multiple data formats and distributed nature of enterprise computing,

supports and leverages the OOP aspects of Java

leaves room for emerging technologies both at the component level and across the application

SAND is developed from the application towards the supporting technologies, rather than from the technology APIs towards the application. It provides a framework for leveraging J2EE and/or other proprietary technologies without lock-in.

The intention of SAND is to allow for immediate and continuous application development, focusing almost exclusively on business logic rather than underlying technologies. Wherever possible, code is automatically generated from struct and/or node declarations, and the business logic is insulated from specific implementations of messaging, configuration, control, persistence or other technologies. Structs and nodes provide the conceptual basis, and the development environment makes the concepts executable.

Structs:

The term "struct" is borrowed from the C programming language. As in C, a struct declares a conglomeration of basic data elements. In SAND, structs are standard .java class files that form the meta-source for all data declarations. A struct will

by convention have a name ending in "Struct" (e.g. BaseUserStruct.java)

provide a complete specification of its data members through javadoc @tag directives.
consist exclusively of protected data members which are
- int, long, double, String, Date, or
- a reference to a persistent struct (e.g. protected long userID; tagged with @ref BaseUserStruct)
- an array of references to persistent structs (e.g. protected long[] administrators; tagged with @ref BaseUserStruct)
- a reference to a non-persistent struct (e.g. protected BlackboardStruct workstate;)
- an array of references to non-persistent structs (e.g. protected NodeDeclarationStruct[] nodes; )
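
The rules above can be illustrated with a minimal sketch. BaseUserStruct is named in this document, but its fields here, the tag layout, and the StructRules checker are all assumptions made for illustration. The checker uses reflection only for the structural rules (naming, protected members, permitted types); javadoc tags are not visible to reflection and would be read by the code generators instead.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Date;

/**
 * Hypothetical persistent struct; fields and tag usage are illustrative only.
 * @persist
 */
class BaseUserStruct {
    /** Login name. */
    protected String name;
    /** Creation time. */
    protected Date created;
    /**
     * Reference to a persistent struct, held as its ID.
     * @ref BaseUserStruct
     */
    protected long managerID;
    /**
     * Array of references to persistent structs.
     * @ref BaseUserStruct
     */
    protected long[] delegates;
}

/** Breaks the rules twice: a public member and an unsupported type. */
class BrokenStruct {
    public int count;
    protected float ratio;
}

/** Hypothetical checker for the structural rules listed above. */
class StructRules {
    static boolean isBasic(Class<?> t) {
        return t == int.class || t == long.class || t == double.class
                || t == String.class || t == Date.class;
    }

    static boolean isStructRef(Class<?> t) {
        return t.getSimpleName().endsWith("Struct");
    }

    static boolean isAllowedType(Class<?> t) {
        if (t.isArray()) {
            // long[] holds persistent references, XxxStruct[] non-persistent ones
            Class<?> element = t.getComponentType();
            return element == long.class || isStructRef(element);
        }
        return isBasic(t) || isStructRef(t);
    }

    static boolean isValidStruct(Class<?> type) {
        if (!type.getSimpleName().endsWith("Struct")) {
            return false;
        }
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue; // ignore compiler or tool generated fields
            }
            if (f.getModifiers() != Modifier.PROTECTED || !isAllowedType(f.getType())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidStruct(BaseUserStruct.class)); // prints true
        System.out.println(isValidStruct(BrokenStruct.class));   // prints false
    }
}
```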

Structs form the basis for messages, which define the input and output for nodes. Each struct definition generates a corresponding message definition, so for example BaseUserStruct.java generates the BaseUser.java message class, plus optional verb messages used for instance management if declared. A message extends the struct, and adds both autogenerated and manually created utility methods to manipulate the data within the java runtime.

Structs may extend other structs. So for example UserStruct.java might extend BaseUserStruct.java and add other data elements useful to a particular application. Inheritance allows for common code, while also allowing for customized applications.
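
As a rough sketch of how struct inheritance and message classes might fit together: the User class below stands in for a generated message, and every field, accessor, and the displayLabel utility are assumptions rather than actual generator output.

```java
/** Common struct shared across applications (fields are hypothetical). */
class BaseUserStruct {
    protected String name;
    protected int status;
}

/** Application-specific struct adding one data element. */
class UserStruct extends BaseUserStruct {
    protected String department;
}

/** Stand-in for the message class generated from UserStruct. */
class User extends UserStruct {
    String getName() { return name; }

    // Generated mutators could carry validation derived from the declaration.
    void setName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        this.name = name;
    }

    String getDepartment() { return department; }

    void setDepartment(String department) { this.department = department; }

    // Example of a manually added utility method.
    String displayLabel() { return name + " (" + department + ")"; }
}

class MessageSketch {
    static String demo() {
        User user = new User();
        user.setName("ada");
        user.setDepartment("ops");
        return user.displayLabel();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints ada (ops)
    }
}
```

The structs stay free of methods, while the message carries all behavior, which matches the split described above.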

Struct declarations are leveraged to generate code for configuration, control, messaging, caching, persistence, serialization etc. The use of structs allows a data element to exist in a variety of forms (XML, relational database row, HTML form data, message object etc) without having to write interfacing and validation code for each format.
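
To illustrate how one declaration can drive several representations, here is a minimal sketch that derives both an XML form and a SQL table definition from the same struct via reflection. The real code generators are doclet based and far more complete; the Representations class, the element naming, and the SQL type mapping are assumptions. PingStruct is listed in the basics project, but its fields here are invented.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

/** Hypothetical struct contents, with defaults for the demonstration. */
class PingStruct {
    protected long sequence = 7;
    protected String origin = "nodeA";
}

class Representations {
    /** Declared fields sorted by name, since reflection guarantees no order. */
    static List<Field> fieldsOf(Class<?> type) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!f.isSynthetic()) {
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toXml(Object struct) {
        String root = struct.getClass().getSimpleName();
        StringBuilder xml = new StringBuilder("<" + root + ">");
        for (Field f : fieldsOf(struct.getClass())) {
            try {
                f.setAccessible(true);
                xml.append('<').append(f.getName()).append('>')
                   .append(escape(String.valueOf(f.get(struct))))
                   .append("</").append(f.getName()).append('>');
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return xml.append("</").append(root).append('>').toString();
    }

    /** Assumed mapping from the basic struct types to SQL column types. */
    static String sqlType(Class<?> t) {
        if (t == int.class) return "INTEGER";
        if (t == long.class) return "BIGINT";
        if (t == double.class) return "DOUBLE PRECISION";
        if (t == String.class) return "VARCHAR(255)";
        if (t == Date.class) return "TIMESTAMP";
        throw new IllegalArgumentException("not covered by this sketch: " + t.getName());
    }

    static String toCreateTable(Class<?> type) {
        StringBuilder sql = new StringBuilder("CREATE TABLE " + type.getSimpleName() + " (");
        String separator = "";
        for (Field f : fieldsOf(type)) {
            sql.append(separator).append(f.getName()).append(' ').append(sqlType(f.getType()));
            separator = ", ";
        }
        return sql.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(new PingStruct()));
        System.out.println(toCreateTable(PingStruct.class));
    }
}
```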

Nodes:

The term "node" is borrowed from graph theory, and reflects the graph formed by the flow of messages in a running SAND system. In SAND, a node is a java package containing code which is responsible for fulfilling a single business logic function.

A node is specified by its declaration class. For example an Authorizer node (in package xxx.xxx.Authorizer) would contain an AuthorizerNodeDecl.java class which is the node declaration. From this, the SAND development environment will automatically create an abstract AuthorizerNodeBase.java class containing the base implementation code. The developer must then override AuthorizerNodeBase in AuthorizerNode.java to form the main processing entry point. All other classes and files in the package are unrestricted.

A node declaration class must:

contain only configuration parameter data, represented as protected data members limited to type
- int, long, double, String, Date, or
- int[], long[], double[], String[], Date[]
declare all input and output messages via @receive, @send, @query, or @subscribe declarations.

A node base class provides:

appropriate messaging methods (typically overloaded onReceive stubs and overloaded send methods)
accessor methods, initialization and maintenance code for configuration parameters
optionally overridable control hook methods called on startup, shutdown, suspend, resume, init, messaging error etc.
other utility methods for input flow control and synchronization, signaling global errors, etc.
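
The relationship between the declaration, the generated base class, and the developer's node class might look roughly like the sketch below. The class names follow the Authorizer example above; the LoginRequest and LoginResult messages, the tag parameters, the maxAttempts parameter, the choice to have the base class extend the declaration, and the in-memory send are all assumptions, since the generated code is not shown in this document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical input message, simplified to a plain class. */
class LoginRequest {
    final String user;
    final int attempts;

    LoginRequest(String user, int attempts) {
        this.user = user;
        this.attempts = attempts;
    }
}

/** Hypothetical output message. */
class LoginResult {
    final String user;
    final boolean granted;

    LoginResult(String user, boolean granted) {
        this.user = user;
        this.granted = granted;
    }
}

/**
 * Node declaration: configuration parameters and message tags only.
 * @receive LoginRequest
 * @send LoginResult
 */
class AuthorizerNodeDecl {
    protected int maxAttempts = 3;
}

/** Stand-in for the generated base class. */
abstract class AuthorizerNodeBase extends AuthorizerNodeDecl {
    // Captures outgoing messages; a real base class would hand them to the transport.
    final List<LoginResult> sent = Collections.synchronizedList(new ArrayList<LoginResult>());

    int getMaxAttempts() { return maxAttempts; }

    abstract void onReceive(LoginRequest request);

    void send(LoginResult result) { sent.add(result); }

    void onStartup() { } // optional control hook
}

/** Developer-written node: business logic only. */
class AuthorizerNode extends AuthorizerNodeBase {
    @Override
    void onReceive(LoginRequest request) {
        send(new LoginResult(request.user, request.attempts <= getMaxAttempts()));
    }
}

class NodeSketch {
    public static void main(String[] args) {
        AuthorizerNode node = new AuthorizerNode();
        node.onReceive(new LoginRequest("ada", 1));
        System.out.println(node.sent.get(0).granted); // prints true
    }
}
```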

Code in a node package:

must implement the specified business logic by overriding the generated incoming message methods.
must access other node instances only through the messaging methods provided.
is explicitly re-entrant. A node will be asked to process as many incoming messages simultaneously as it can handle. It is up to the node to synchronize access to its internal resources if necessary.
is not dependent on sequence. Messages may arrive in any order, and a node may not assume any message grouping or ordering. Each message is self-contained.
is explicitly transactional. If a call to an onReceive method completes normally, then message delivery was complete. Otherwise the message will be redelivered to the node.
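
The transactional rule can be sketched as a delivery loop: a normal return from onReceive counts as completed delivery, and an exception causes redelivery. The Receiver interface, the attempt limit, and FlakyNode are assumptions for illustration; this document does not say how often or when redelivery happens. The node keeps its counters in AtomicInteger fields because it may be called from several threads at once.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single-message receiver. */
interface Receiver {
    void onReceive(String message) throws Exception;
}

class Redelivery {
    /** Returns the attempt on which delivery completed, or -1 if every attempt failed. */
    static int deliver(Receiver node, String message, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                node.onReceive(message);
                return attempt; // normal completion: delivery is complete
            } catch (Exception failed) {
                // abnormal completion: the message stays pending and is redelivered
            }
        }
        return -1;
    }
}

/** Fails on its first call, then succeeds. */
class FlakyNode implements Receiver {
    private final AtomicInteger calls = new AtomicInteger();
    final AtomicInteger processed = new AtomicInteger();

    @Override
    public void onReceive(String message) throws Exception {
        if (calls.incrementAndGet() == 1) {
            throw new Exception("simulated transient failure");
        }
        processed.incrementAndGet();
    }
}

class RedeliverySketch {
    public static void main(String[] args) {
        FlakyNode node = new FlakyNode();
        System.out.println(Redelivery.deliver(node, "m1", 3)); // prints 2
        System.out.println(node.processed.get());              // prints 1
    }
}
```

Because a message can reach the node more than once, handlers in a scheme like this are easiest to reason about when processing the same message twice is harmless.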

Development:

The SAND development environment (sandbox) takes care of the majority of "plumbing and processes" necessary in a development project. Creating a sandbox is a significant effort, but is both highly flexible and highly re-usable.

A sandbox includes (at a minimum) the basics project, code generators, and the SAND configuration management tool (sandman). It will typically require additional supporting technologies (version management, build tools, J2EE container, database, XML serialization, web server or other UI rendering tools etc) to be fully functional. The total installation of a sandbox is kept as simple as possible.

An installed sandbox includes all directory structures, automated build processes, documentation, and tools. While a complete list of sandbox services and usage is beyond the scope of this document, some of the more noteworthy points that occur during the product lifecycle include:

Creating a new project: a sandbox supports multiple simultaneous project definitions, which may optionally depend on each other. Declaring a new project is kept as simple as possible, usually copying a basic template and adapting it for use. The build processing generates a comprehensive website of the entire sandbox project structure, which is easily navigable via either physical project hierarchy, or project dependency information. The top level is SAND/docs/index.html, which provides links to other documents in the docs directory tree in addition to project navigation.
Creating struct declarations: Run the build to generate the data documentation and code. The javadoc comments for the structs are linked into the generated project documentation, and the generated messages are linked back to the structs and supporting SAND documentation. The struct definitions trigger all the supporting code for the application, including persistency (for example SQL schema declaration, DataManager message validation code, entity beans for messages, session beans for queries and aggregate updates, initial data verification code, database maintenance commands etc), serialization (for example code to convert messages to XML, code to instantiate messages from XML, DTD and/or schema definitions etc), messaging object classes (accessors, mutators, validators, utilities to access JMS or proprietary messaging implementations etc), and other supporting declarations for sandman, install etc.
Creating node declarations: The build also generates a package.html source file containing the node declaration comments, a summary of input and output, node properties (configuration information), overview of module tests etc. which links the node into the project documentation via javadoc. In addition to the base node class, supporting declarations and code for sandman, install etc. are also generated.
Creating a deployment configuration: Run sandman to create a new application deployment for your project(s). Choose from the available node declarations to create the node instances which comprise the configuration, and configure the messaging data flows through the node properties (sandman provides message type checking, and other helpful configuration overview/navigation utilities).
Creating initial data: Using sandman, create the initial data instances for your deployment. For example root users, root nodes for tree representations, referential integrity check instances etc. as needed by your application. Sandman provides an object instance editor generated from the struct definitions, which ensures that the created instances are valid. Only the fields and values explicitly specified for each instance are actually stored, ensuring that subsequent changes to default values in the struct declarations will be reflected in the deployment.
Creating system tests: Using sandman, create one or more test configurations, which include the deployment configuration, and add MessageDriver nodes at appropriate points in the system to generate and monitor message traffic. Use the testscript editor to create the test scripts for each MessageDriver node instance. Add the test configurations to the automated build processing.
Application development: At this point the automated build process is rebuilding the project and running the tests of the deployment. Test failures, test completeness, system features, node behavior, data structures etc all need to be reviewed against release requirements so that successful completion of all the system tests yields a release candidate. On acceptance of a candidate, the release code is typically tagged through a build command, the installation is burned, and formal delivery is made. Some of the more noteworthy aspects in this process include:
- Creating a UI for the system leveraging the I/O code generated for sandman. If other UI I/O technologies are needed, writing the doclets to generate the required interfaces from the struct definitions.
- Examining the throughput requirements of the system and determining how multiple node instances and/or multiple machines can be used to scale up the system. Modifying the configuration and/or creating alternate configurations with additional node instances to optimize appropriately. Node statistics are critical to this process, and this frequently involves modifying the nodes to produce more statistical information, and creating statistical accumulators for analysis. The statistical output from different automated test runs can then be compared.
- Analyzing system capacity. Use sandman to create a configuration involving one or more LoopDriver nodes, and use the loopscript editor to configure the output message load generated by each node. Node statistics are again critical.
- Examining the failover requirements of the system and determining what changes are necessary for a redundant configuration to run in parallel for hot swap failover.
- Defining module tests to verify that the behavior of each node remains consistent across multiple deployment configurations.
- Adapting to new or alternate technologies through modifications to the sandbox and sandman code.
- Porting the system from one sandbox to another by shutting down the deployment, dumping the data to canonical form (using sandman to command the DataManager), and loading it into the new sandbox.
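
The "Creating initial data" step above notes that only explicitly specified fields are stored, so later changes to struct defaults carry through to the deployment. A minimal sketch of that idea follows; the map-based storage, the SparseInstance class, and the struct fields are assumptions, not the sandman implementation.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical struct; the initializers play the role of declared defaults. */
class BaseUserStruct {
    protected String name = "anonymous";
    protected int status = 1;
}

class SparseInstance {
    /** Builds an instance from current defaults, then applies only the stored values. */
    static BaseUserStruct materialize(Map<String, Object> stored) {
        BaseUserStruct instance = new BaseUserStruct();
        for (Map.Entry<String, Object> entry : stored.entrySet()) {
            try {
                Field field = BaseUserStruct.class.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(instance, entry.getValue());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("unknown field: " + entry.getKey(), e);
            }
        }
        return instance;
    }

    static String describe(BaseUserStruct user) {
        return user.name + "/" + user.status;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", "root"); // only the explicitly specified field is stored
        System.out.println(describe(materialize(stored))); // prints root/1
    }
}
```

If the default for status later changes in the struct, the stored instance picks up the new value on the next load, because status was never written to storage.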

The simplicity of the struct declarations, and the verifiable behavior of the system as a whole, provides enormous leverage at almost all points in the product lifecycle. SAND provides huge advantages in initial development, but its primary focus is on the application over time.

Appendices:

struct @tag declarations:

The SAND development environment supports at least the following javadoc tags and parameters for structs:

Class level tags:
- @persist:
- @update:
- @collection:
- @query:
- @history:
- @abstract:
Data member tags:

node @tag declarations:

The SAND development environment supports at least the following javadoc tags and parameters in node declarations:

@receive:
@send:
@query:
@subscribe:

Limited types for configuration parameters:

Configuration parameters are limited to the basic types, or arrays of basic types, in order to provide a consistent configuration interface for all nodes. Cramming more advanced information into these basic types through serialized object forms, associated arrays etc. is strongly discouraged. Where more advanced information is needed by the node, a struct declaration should be created, with the instances loaded via query at node initialization time.

All configuration parameters are declared locally to a specific node. There is no inheritance between groups of nodes, nor is there any concept of a global configuration attribute. Global values can be specified through system variables.
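
Restricting parameters to basic types is what lets a single generic loader configure every node. The following sketch applies string values to a declaration's fields by type. ConfigLoader, the parameter names, and the comma-separated array syntax are assumptions, and only a subset of the permitted types is covered.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical node declaration with basic-type parameters. */
class AuthorizerNodeDecl {
    protected int maxAttempts = 3;
    protected double timeoutSeconds = 1.5;
    protected String[] trustedDomains = {};
}

class ConfigLoader {
    /** Converts a raw string to one of the permitted basic types (subset only). */
    static Object parse(Class<?> type, String raw) {
        if (type == int.class) return Integer.parseInt(raw);
        if (type == long.class) return Long.parseLong(raw);
        if (type == double.class) return Double.parseDouble(raw);
        if (type == String.class) return raw;
        if (type == String[].class) return raw.isEmpty() ? new String[0] : raw.split(",");
        throw new IllegalArgumentException("not covered by this sketch: " + type.getName());
    }

    /** Sets each named parameter on the declaration; unnamed ones keep their defaults. */
    static void apply(Object decl, Map<String, String> values) {
        for (Map.Entry<String, String> entry : values.entrySet()) {
            try {
                Field field = decl.getClass().getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(decl, parse(field.getType(), entry.getValue()));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("unknown parameter: " + entry.getKey(), e);
            }
        }
    }

    static String demo() {
        Map<String, String> values = new HashMap<>();
        values.put("maxAttempts", "5");
        values.put("trustedDomains", "a.example,b.example");
        AuthorizerNodeDecl decl = new AuthorizerNodeDecl();
        apply(decl, values);
        return decl.maxAttempts + " " + decl.timeoutSeconds + " " + Arrays.toString(decl.trustedDomains);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 5 1.5 [a.example, b.example]
    }
}
```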

Only data members in structs:

If a struct were allowed to declare a public final static member, or other data members that are not recognized as part of the canonical form, then the message class would have access to data not available in other data representations.

If a struct were allowed to declare methods of any kind, then the message class would have access to data manipulation logic that is not specified for other data representations.

While it may seem odd to only allow protected data elements in a java class, this is necessary to produce a canonical data declaration which can manifest itself across more representations than just java classes. Note that additional data members, constants, methods etc can be added to the resulting message through both automated and manual code generation, so in practice this limitation is not a significant development issue.

Verb messages:

A SAND development environment (sandbox) provides verb messages (generated from struct tag declarations) that have specific processing requirements. At a minimum, this includes:

An Update message format, generated from an @update struct tag. A node receiving an update must return an update containing the complete and latest message information. An update consists of the messageID, the messageVersion, the action, and a SandAttrVal[] describing all the attribute/values to update. For example:
```
    public class BaseUserUpdate ...
        protected long messageID;
        protected long messageVersion;
        protected int action;                   // add|update|delete
        protected SandAttrVal[] messageData;
        ...
```
Notes:
- It is highly recommended that data never be deleted except in highly specialized circumstances such as data initialization or data archiving procedures. Applications are encouraged to instead mark message instances as deleted by using a status flag.
- "Transactionally safe" means no data anomalies, including, but not limited to, problems resulting from simultaneous change requests, updates specifying the wrong version number etc.
- Merging of data instances is not required. An update specifying the wrong version can simply be rejected as invalid.
- Information such as who performed the update, the time the update occurred, text indicating the reason for the change etc. must be stored with the underlying message instance. It is not part of the update message.
- The primary key ID field (and other fields necessary for persistence) are created from the @persist struct declaration. The behavior of an update message for a non-persistent message is undefined.
- IDs (primary keys) for message instances can be generated by the update message originator, or by an intermediate processing node assigned this responsibility. Either or both are supported. Modifying an ID is prohibited. Any other field may be modified.
- To perform several updates as a single transaction, send an AggregateUpdate message to the DataManager.
A Collection message format, generated from an @collection struct tag. A collection is typically received in response to a query or history message. It consists of a set of message instances, for example:
```
    public class BaseUserCollection
        protected BaseUser[] baseUsers;
        ...
```
Notes:
- Collections are always sorted by ID first, and then by creation time. The caller is responsible for any other sorting or manipulation.
- Collections are subject to a maximum size. It is the caller's responsibility to re-query for any remainder.
A Query message format, generated from a @query struct tag. A node receiving a query must return a (possibly empty) collection message. A query consists of an array of attribute/values where the values are match expressions. For example:
```
    public class BaseUserQuery ...
        protected SandAttrVal[] matchInfo;
        ...
```
Notes:
- Consider an "age" field. In addition to specifying a single value such as "25", a range can be specified through a match expression which corresponds roughly to a SQL "where" clause statement without the attribute references. So to search for 18 to 25 year olds, you would specify age as ">= 18 AND <= 25" (without the quotes).
- String values in a match expression must be enclosed in double quotes if they contain spaces or other characters which excessively complicate the tokenization of the match expression.
A History message format, generated from a @history struct tag. A node receiving a history must return a collection message containing all matching versions of messages. For example:
```
    public class BaseUserHistory ...
        protected SandAttrVal[] matchInfo;
        ...
```
Notes:
- The history message is essentially a query that can return multiple versions of the same message in the resulting collection.
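
The match expressions used by query and history messages can be illustrated with a small evaluator for numeric values. This is only a sketch: the MatchExpression class and the operator set (comparison operators joined by AND) are assumptions, since the full expression grammar is not specified here, and quoted string values are not handled.

```java
/** Hypothetical evaluator for numeric match expressions such as ">= 18 AND <= 25". */
class MatchExpression {
    // Two-character operators come first so that ">=" is not mistaken for ">".
    private static final String[] OPERATORS = {">=", "<=", "<>", ">", "<", "="};

    /** True when the value satisfies every AND-joined clause of the expression. */
    static boolean matches(String expression, double value) {
        for (String clause : expression.trim().split("\\s+AND\\s+")) {
            if (!matchesClause(clause.trim(), value)) {
                return false;
            }
        }
        return true;
    }

    private static boolean matchesClause(String clause, double value) {
        for (String op : OPERATORS) {
            if (clause.startsWith(op)) {
                double bound = Double.parseDouble(clause.substring(op.length()).trim());
                switch (op) {
                    case ">=": return value >= bound;
                    case "<=": return value <= bound;
                    case "<>": return value != bound;
                    case ">":  return value > bound;
                    case "<":  return value < bound;
                    default:   return value == bound; // "="
                }
            }
        }
        // A bare value such as "25" means equality.
        return value == Double.parseDouble(clause);
    }

    public static void main(String[] args) {
        System.out.println(matches(">= 18 AND <= 25", 21)); // prints true
        System.out.println(matches(">= 18 AND <= 25", 26)); // prints false
        System.out.println(matches("25", 25));              // prints true
    }
}
```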

SAND development environment effort:

There is a tendency to begrudge the time spent up front in setting up a development environment since it is not tied to a specific deliverable feature. However this one aspect of software development frequently results in an order of magnitude more work than initially estimated, and ends up either getting done or manifesting itself as wasted development time. In short you pay for the time either way, but if you do it up front it takes less total time and development is more productive (and fun).

A quality development environment is part of "doing it right". The test of whether something was "done right" is increasing returns for decreasing development effort over time.

SAND basics:

The "basics" project ships as part of the sandbox, and provides the fundamental declarations and code needed for SAND development. The basics project is set up like any other SAND project so it serves as an example in addition to providing the needed structs and nodes.

A partial sampling of the basics project includes:

Structs:

BaseUserStruct
SandAttrValStruct
StatsStruct
PingStruct
ConfigurationStruct
ServerDeclarationStruct
NodeDeclarationStruct
TestScriptStruct
CallResponseStruct
LoopScriptStruct

Nodes:
messaging interfaces:
- SandMessage (implemented by all messages)
- SandStructMessage (implemented by messages that inherit from structs)
- SandVerbMessage (implemented by verb messages)
- SandUpdateMessage (implemented by update verb messages)
- ConfigParam (implemented by all configuration parameters)
classes:
- RefManager
- AggregateUpdate

SAND configuration management tool (sandman):

A sandman implementation is a significant application, which among other things obviates the need for a separate "admin" application under most circumstances. It vastly simplifies many development, deployment, and testing tasks, and under most circumstances can be viewed as a "black box". However there are several aspects which are worth explicitly noting for comparison and implementation purposes:

Control aspects of the system, such as starting and stopping nodes, can be implemented in a variety of ways. This can be as simple as a program running each node in a separate thread (provided the entire configuration can be run inside of a single JVM) or as advanced as a JMX wrapper running inside of a clusterable J2EE container. Communication is two-way here, since nodes will need to signal things like catastrophic system failure through a utility method.
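
The simplest control option mentioned above, one thread per node inside a single JVM, might be sketched as follows. NodeThread, FailureListener, the stop marker, and the "boom" message that simulates a failure are all assumptions for illustration. The listener is the return channel through which a node signals a fatal error to the controller.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical callback a node uses to signal the controller. */
interface FailureListener {
    void onFatal(String nodeName, RuntimeException cause);
}

class NodeThread implements Runnable {
    private static final String STOP = new String("STOP"); // marker compared by identity
    private final String name;
    private final FailureListener listener;
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final Thread thread;
    final AtomicInteger handled = new AtomicInteger();

    NodeThread(String name, FailureListener listener) {
        this.name = name;
        this.listener = listener;
        this.thread = new Thread(this, name);
    }

    void start() { thread.start(); }

    void post(String message) { inbox.add(message); }

    /** Queues the stop marker behind pending messages and waits for the thread to end. */
    void stop() {
        inbox.add(STOP);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void run() {
        try {
            while (true) {
                String message = inbox.take();
                if (message == STOP) {
                    return;
                }
                if (message.equals("boom")) {
                    throw new IllegalStateException("simulated failure");
                }
                handled.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (RuntimeException e) {
            listener.onFatal(name, e); // node-to-controller signal
        }
    }
}

class ControlSketch {
    static String demo() {
        AtomicReference<String> failed = new AtomicReference<>("none");
        FailureListener listener = (node, cause) -> failed.set(node);
        NodeThread authorizer = new NodeThread("authorizer", listener);
        NodeThread logger = new NodeThread("logger", listener);
        authorizer.start();
        logger.start();
        authorizer.post("m1");
        authorizer.post("m2");
        logger.post("boom");
        authorizer.stop();
        logger.stop();
        return authorizer.handled.get() + " handled, failed=" + failed.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2 handled, failed=logger
    }
}
```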

Configuration information must be stored independently of the DataManager, since it is part of bootstrapping the system into existence. By dumping out the message structure via an XML serializer, this information can be captured into a file (e.g. configuration.xml). For a simple deployment running on a single computer, this is enough. However any non-trivial system will likely require either transactionally safe replication of the file on modification, or use of a common reference copy (typically via JNDI). Other technologies are certainly possible.

Messaging interactions are also important to consider. Messaging (especially for synchronous calls) must in many cases be optimized for acceptable system performance. The messaging code may need a way to determine cases where a direct call can be made (i.e. the two node instances are in the same process), unless the messaging and control technologies are already optimized.

A SAND application relies on sandman and the generated code, both of which in turn rely on the supporting technology. This means that the same application can be configured to run on a single small computer, or multiple computers in multiple data centers, depending on the underlying technologies involved.
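
The direct-call optimization for node instances in the same process could take the shape of a small router that checks for a local target before falling back to the messaging transport. The Router and Transport types and the string-based messages below are assumptions made to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for JMS or another messaging implementation. */
interface Transport {
    String request(String targetNode, String message);
}

class Router {
    private final Map<String, Function<String, String>> localNodes = new HashMap<>();
    private final Transport remote;

    Router(Transport remote) {
        this.remote = remote;
    }

    /** Registers a node instance that lives in this process. */
    void registerLocal(String nodeName, Function<String, String> handler) {
        localNodes.put(nodeName, handler);
    }

    /** Calls a local node directly; anything else goes through the transport. */
    String request(String targetNode, String message) {
        Function<String, String> local = localNodes.get(targetNode);
        return local != null ? local.apply(message) : remote.request(targetNode, message);
    }
}

class RouterSketch {
    static String demo() {
        Router router = new Router((node, message) -> "remote:" + node + ":" + message);
        router.registerLocal("authorizer", message -> "local:" + message);
        return router.request("authorizer", "ping") + " " + router.request("billing", "ping");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints local:ping remote:billing:ping
    }
}
```

The calling node uses the same request method in both cases, so moving a node to another machine changes the configuration rather than the business logic.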

SAND directory structure:

A typical sandbox will assign specific semantics to reserved directory names. For example:

the docs directory holds documentation files (typically HTML documents) and the generated javadoc directory if applicable
the src directory holds source code
the build directory holds build automation files
the env directory holds .jar files resulting from the build, and other files needed by the application for configuration or other purposes
the test directory holds test configurations, test scripts and load test scripts generated from sandman (typically as XML files)

So a subsection of an actual SAND installation might look like:
SAND
apps
MyApp
docs
javadoc
env
src
com
epinova
MyApp
MyNode
test

test
deploy
MyDeployment
docs
env
src
test

docs

Pushing the source three levels down is unfortunate, but fairly typical. It is possible to factor the source prefix directories, but the build processing is usually slowed down far beyond any gain from a more concise directory structure. The structure may improve as java compilers continue to evolve.

Copyright © 2002 epinova corp.
All rights reserved.