The TransformationFramework provides a way to perform sets of generic transformation operations on any type of object. For example, a set of transforms can be configured that transform a filename for an EDI document into a set of ParsedLine objects, then into XML segments, then into a structured XML document. By decoupling the pieces, it's possible to transform into several output formats at once, or use a smaller set of transforms when XML segments are needed in one place but structured XML is required in another. Splitting a task into a set of components not only increases reuse but also makes it easier to test each piece in isolation.

What do I get?

Getting Started

How do I get the Objects out of a Transform?


One possible application of this technique is to dynamically figure out which transforms are required. The following two scenarios could make use of common transforms:

  1. A process receives EDI documents and updates a database. A Parser transform is defined to transform a filename into ParsedLine objects. Another transform transforms the ParsedLine objects into XML segments. A final transform transforms the XML segments into an Object representation.
  2. An XPath query engine examines transformed EDI documents to find usage patterns. The previous transforms are used to transform a filename into XML segments. A new transform is defined to transform the XML segments into a structured XML document. An XPath query can then be run on this final form. XPathFind is a utility that can execute XPath queries.
Wait a second, you said convert a filename, not an actual file?

That's right. By starting with a filename, it's possible to separate the finding of the file into its own transform. For example, with this additional layer of indirection, it would be possible to handle a URL with a custom URI scheme (an example of a URI scheme is "http") or the ability to create a parsed stream of data on the fly from a database or other source. There's no requirement that a transform must have a filename as its input type. However, there are convenience methods available for transforming filenames into other types, since that is expected to be the largest use.

In practice, the transform that takes a "Filename" as an input is responsible for handling encoding issues. The sample ParserTransform provided (which takes type "Filename" as input) delegates to a helper class to determine what encoding the file should be loaded with as well as what delimiter is expected. The helper class checks the filename against a properties file. In a more realistic example, the helper class might consult a database to retrieve that information. In order to pass in a database connection or an object that will provide a connection without JNDI, the GlobalProperties and ThreadProperties classes can be used. See the second bullet point in the next section for more details.

What do I get?

The TransformationFramework provides many reusable components.

  • Transforms are loaded dynamically via a configuration file, and are created with reflection. The arguments required for each constructor are recursively created, so arbitrary objects with diverse constructor arguments can be dynamically created.
  • A GlobalProperties and a ThreadProperties object are provided. These can be used to pass extra data into transforms that is only available at runtime. Use of these classes is not required, but can serve as an example of how extra runtime data might be passed into a transform.
  • The type safety of transforms is guaranteed at compile time. It is not necessary for the transformation graph to be loaded before the types can be resolved. The transformation graph loader uses reflection to look for a transform(...) method on each input transform which can take any single argument. Proper use of interfaces to connect the transforms at compile time is possible (and encouraged).
  • A genetic algorithm library written in Java and provided under the GPL, written by Josť Galaviz, is included. See the site or download the code.
  • An optimized implementation to find the optimal Minimum Steiner Tree on directed graphs with a small vertex subset is included. This implementation requires the choice of the root of the tree. Computing the Minimum Steiner Tree allows for the minimum amount of transformation work to be done to produce the desired outputs (if there is more than one output). The Minimum Steiner Tree is the minimum cost tree that connects the input type with all of the output types.
                In general, this problem is NP-Complete, however, a small vertex subset
                makes the constants small enough to tackle directly.
                On the author's AMD Athalon 1800, finding the Minimum Steiner Tree
                on a 60 node complete graph with a vertex subset of 5 nodes (which of
                the 5 nodes is the root is chosen by the user) completes in under
                200 ms.  For k < 5, the entire algorithm runs in 
                O(2k-2∑(k-1)! + (k-1)2∑|E|+|V| log |V|) time,
                where k is the input type (1) plus the number of output types.
                (For 4 output types, k is 5.)  In general, the entire
                algorithm will run in
                O(f(k)∑(k-1)! + (k-1)2∑|E|+|V| log |V|) time,
                where f(k) is the number of mutually non-isomorphic trees of
                size k.
    		    The input graph is pruned before the minimum cost tree is found
    		    with a depth-first search.
    		    Any nodes or edges that are not on one or more paths from the input
    		    node to any output node are (temporarily) removed from the graph.
                If the resulting pruned graph is a tree, it is simply returned
                as the minimum cost tree (since there is only one).
                If the resulting pruned graph is not a tree, then the optimized
                Minimum Steiner Tree algorithm described above will be run on the
                pruned graph.
                If the vertex subset is larger, a near-optimal solution will be
                sought using a genetic algorithm on the pruned graph.  The
                threshold for using the genetic algorithm is configurable.
                For a single input and a single output the algorithm collapses to
                Dijkstra's algorithm.
There is also code for the following:
  • Find and return all mututally non-isomorphic trees of any given size.
  • Find and return all orders of a String array as a List of Lists.
  • Find and return all sets of a given size of positive integers that add up to a given total.
  • A Fibonacci Heap (from the JGraphT library)
  • Dijkstra's algorithm using a Fibonacci heap, which runs in O(|E|+|V| log |V|) time.

Getting Started

The following example code is provided as a starting point to see how the TransformationFramework works, and what code that uses it looks like. First, you'll need to start with a transformation configuration file.
                TransformConfig config = TransformConfig.load("resources/transform-config.xml");

                                "JDOM Documents");
If the input is a filename, performing a transform to a given output type only requires the code above. For transforming something that is not a filename, use the following code:
                TransformConfig config = TransformConfig.load("resources/transform-config.xml");

                TransformGraphCollection transformGraphs = config.getTransformGraphs();
                TransformGraph transformGraph = transformGraphs.getTransformGraph("ediTransforms");

                                "Filename", "JDOM Documents", null);
It is also possible to ask the TransformGraph to return an Object that implements ITransformer, then reuse it for multiple transforms. The methods for doing so are here and here. If you've noticed that the method on ITransformer returns void, and that seems odd, see the next major section.

      The TransformationFramework handles
        finding the shortest path from the input type to the output type given
        all the transforms.  For complicated graphs, an optional "cost" can
        be added to each transform.  This cost represents the amount of "work"
        it takes to get from one type to another.  Any positive integer value
        can be
        specified, and it can represent processing power, memory used, or any
        value desired.  For the best results, each cost should be correct
        relative to the other costs.

        Later, if a transform is
        written that takes some type to another type faster without going
        through intermediate steps, this path will be taken if its cost is
        lower than going through the extra steps.  If a cost is not provided,
        it is assumed to be one.  Leaving the original transform(s) in the
        file means that they can still be used, if they are still desired input
        or output types.


The JavaDoc is available here.

How do I get the Objects out of a Transform?

When a transform is run, there is the possibility that many Objects will be produced. If a large EDI document is being converted to XML, for example, it can be split into multiple units of work. Each unit of work can then be worked on independently, reducing the memory required.

        Let's say that there is a transform chain defined in the transform
        config that takes an EDI filename as input and eventually produces
        "Java objects" as output (through some number of steps, possibly just
        1).  The desired goal is that the information is stored in a database.
        There are two ways of achieving this goal.

        The first way of achieving this goal is to set up a database
        transform that takes "Java objects" as input,
        updates the database, and provides "Database updates" as output.
        This transform doesn't even need to produce something that is
        consumable by another following transform.  When the filename is
        transformed with "Database outputs" as the output type, the database
        updates will happen as part of the transformation process.

        The second way of achieving the goal of database updates is to provide
        a transform reciever.  A transform receiver acts like a regular
        transform, except that it is programmatically defined, and does not
        need to be in the transformation config file.  In this case, a
        transform between the EDI filename and "Java objects" is requested,
        but a transform reciever is passed in.  See the
        getTransformer and transform methods on
        for more details.
		The output transform that produces "Java objects" is then connected to
		the transform receiver, and each Java object produced will cause the
		TransformGraph to call the transform method on the
		transform receiver.  In our example, this transform receiver would then
		execute the database updates based on the Java objects passed into
		its transform method.