Some thoughts on eXtensible Markup Language (XML) and how it functions in different problem domains.  In addition, we consider the current proliferation of data schemas, and how they are being defined.  XML may well represent a far greater change from current technology than is generally realized.  Failure to understand and plan for this may result in improper usage of the technology, or its non-adoption.

There is a rush to apply XML to all sorts of problems, motivated primarily by the desire to make all types of computers and software communicate better.  It is important to consider how the actual problem area reflects on the strengths, weaknesses, and peculiarities of XML itself.

We consider:

  • XML as Data Structure
  • XML for Data Transfer
  • XML for Data Conversion
  • XML as Code

In addition we consider the question of how the all-important DTDs and schemas are defined.  The current explosion of activity in this regard may be the "explosion" of a new era in programming or the ultimate proprietary battleground.

Finally we draw what conclusions we can about the nature of XML from these musings.  We are confident that XML will change the industry, but it is possible that the industry currently underestimates just how much change is implicit in the widespread use of XML.

XML as Data Structure

At the heart of any possible use of XML must be its ability to represent data.  XML is rich enough to represent the normal variety of conventional data structures used in modern programming.  The basic data element is more complex than the basic data element of most existing computer applications, such as arrays or record structures.  When XML data elements are compared to these basic structures the XML representation appears to have more sub-elements even in the simplest cases.

We demonstrate by comparing XML data structures to two examples of conventional data structures, those of Perl and those of SQL.

Perl vs. XML

We use Perl data structure syntax as a convenient example of the basic data structures used in conventional programming languages.  The same comparisons can be made with other languages, to the same basic conclusions.  The basic data entities include:

  • Simple
    • numeric (integer and floating)
    • string
    • logical (special binary case of...)
    • enumerated
  • Multiple
    • indexed (arrays)
    • ordered (lists)
    • counted (sets)
  • Named
    • fixed (record)
    • flexible (dictionary, hash table, or symbol table)
    • active (object)

It is possible to represent each of these using XML.  Begin by defining a simple structure with several named fields in Perl:

{
    alpha => 1,
    bravo => 'two'
}

This could represent a record structure, a data dictionary, or the data component of an object.  A naive representation of this structure might be made in XML as:

<struct alpha="1" bravo="two" />

In this case the entire record is represented using a single XML tag.  This is impossible for the second example of Perl data structure:

{
    complex =>
    {
        sub1 => 1,
        sub2 => 'two'
    }
}

In this case, the named field is in fact a complex entity all its own.  In XML, however, it is not possible to have a complex entity as the value of an attribute.  Instead we must use a more intricate representation, such as:

<struct>
    <field id="sub1">1</field>
    <field id="sub2">two</field>
</struct>

or:

<struct>
    <field>
        <key>sub1</key>
        <val>1</val>
    </field>
    <field>
        <key>sub2</key>
        <val>two</val>
    </field>
</struct>

In the latter representation we don't use the attributes of the tags at all.  We might conceive of these attributes as out-of-band with respect to the basic task of representing a multi-level record structure or dictionary in XML.  They are either irrelevant or available for use as meta-information.
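The attribute-free representation can be generated mechanically from a nested record.  The following is a minimal sketch in Python, using only the standard library; the function name to_struct and the choice to nest a <struct> inside <val> for complex values are our own assumptions, not part of any XML standard.

```python
import xml.etree.ElementTree as ET

def to_struct(record):
    """Convert a (possibly nested) dict into a <struct> of <field> children."""
    struct = ET.Element("struct")
    for key, value in record.items():
        field = ET.SubElement(struct, "field")
        ET.SubElement(field, "key").text = str(key)
        val = ET.SubElement(field, "val")
        if isinstance(value, dict):
            # Assumption: a nested record becomes a nested <struct> inside <val>.
            val.append(to_struct(value))
        else:
            # Simple values are stored as text, so their original type is lost.
            val.text = str(value)
    return struct

record = {"complex": {"sub1": 1, "sub2": "two"}}
xml_text = ET.tostring(to_struct(record), encoding="unicode")
print(xml_text)
```

Note that the integer 1 and the string 'two' both end up as text.  Recovering the original types would require a schema or some added convention.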

Array structures are represented in Perl using a slightly different syntax:

[
    1,
    'one'
]

With XML there is a similar implicit ordering of entities:

<array>
    <cell>1</cell>
    <cell>one</cell>
</array>

or the indexing could be made explicit:

<array>
    <cell index="0">1</cell>
    <cell index="1">one</cell>
</array>

which would support sparsely populated arrays.
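As a rough illustration of the sparse case, the sketch below reads explicitly indexed cells into a Python dictionary and then expands them into a dense list.  The helper names read_sparse_array and to_dense, and the sample indices, are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = """<array>
    <cell index="3">three</cell>
    <cell index="9">nine</cell>
</array>"""

def read_sparse_array(xml_text):
    """Map each explicit index to the text of its cell."""
    root = ET.fromstring(xml_text)
    return {int(cell.get("index")): cell.text for cell in root.findall("cell")}

def to_dense(cells):
    """Expand the sparse mapping into a list, filling gaps with None."""
    size = max(cells) + 1 if cells else 0
    return [cells.get(i) for i in range(size)]

cells = read_sparse_array(SOURCE)
dense = to_dense(cells)
print(cells)
print(dense)
```

Only the populated cells appear in the XML, while the dense form has to carry every empty slot.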

One aspect of XML that is not duplicated in many languages is the inline inclusion of text.  This is the heart of content-specification in HTML, and will likely remain so for many applications of XML.  In XML, we may specify a song:

<song>
    Row, row, row your boat
    Gently down the stream.
    Merrily, merrily, merrily, merrily,
    Life is but a dream.
</song>

Most conventional programming languages don't provide a similar mechanism, though Perl is an exception, as shown in this example:

$song = q{
    Row, row, row your boat
    Gently down the stream.
    Merrily, merrily, merrily, merrily,
    Life is but a dream.
}

The entry of text in XML focuses on its real nature.  Essentially, everything between a matched set of start and end XML tags (<tag>...</tag>) is a child, indexed by order within the tags.  The children can be any legal XML objects or text, mixed up in any way.  This means that it is possible to define extremely complex objects, for example:

<dual>
    <field>
        <key>sub1</key>
        <val>1</val>
    </field>
    Some text.
    <cell>
        <index>3</index>
        <value>three</value>
    </cell>
    <field>
        <key>sub2</key>
        <val>two</val>
    </field>
    <cell>
        <index>9</index>
        <value>nine</value>
    </cell>
    Some more text.
</dual>

In this case we have an object with fields (as in a dictionary), cells (as in an array), and some text.  The object is both an array and a structure, not counting the text.
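To make the ordering of mixed children concrete, here is a small sketch using Python's standard ElementTree parser on a shortened version of the object above.  ElementTree stores the text that follows a child element on that child as its "tail", so the function children_in_order (a name of our own) has to stitch the sequence back together.

```python
import xml.etree.ElementTree as ET

SOURCE = """<dual>
    <field><key>sub1</key><val>1</val></field>
    Some text.
    <cell><index>3</index><value>three</value></cell>
    Some more text.
</dual>"""

def children_in_order(element):
    """List text runs and child elements in document order."""
    items = []
    if element.text and element.text.strip():
        items.append(("text", element.text.strip()))
    for child in element:
        items.append(("element", child.tag))
        # Text following a child is stored on that child as its tail.
        if child.tail and child.tail.strip():
            items.append(("text", child.tail.strip()))
    return items

items = children_in_order(ET.fromstring(SOURCE))
for kind, content in items:
    print(kind, content)
```

The result is a single ordered sequence that interleaves fields, cells, and text, which is exactly what a plain record or array cannot express on its own.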

It isn't possible to create an equivalent Perl object without changing the nesting structure.  The following example:

{
    struct =>
        {
        sub1 => 1,
        sub2 => 'two',
        },
    array =>
        [
        undef, undef, undef,
        'three',
        undef, undef, undef, undef, undef,
        'nine'
        ],
    text => q{
        Some text.
        Some more text.
    }
}

somewhat mirrors the XML object, but has an extra layer of nesting.  All this without even using the XML tag attributes, which would yield other data items that would have no equivalents!

Manipulation of XML objects within programs will require either conversion of the XML objects into equivalent internal structures or manipulation of the XML objects in situ as parsed from data stream or file.  The conversion into language data structures may be an issue in some cases, as we have just seen that basic data structures lack the natural complexity of XML structures.

SQL vs. XML

Relational database technology and its primary representation format and manipulation language SQL are well-understood and pervasive elements of modern computer systems.  In this case the basic data structure, a relation or table, is essentially a set (an unordered group) of records.  Records are found by matching the values of fields of these records.  Additional dictionaries may be defined as indices into the basic table.

Evolution

Since a relation can be viewed as a data structure built from more basic types, all of the preceding section comparing Perl to XML can be said to apply to databases as well.  In addition, we must consider issues specific to the use of databases.

Current databases are for the most part based on the relational model.  This is a well-known model with logical, if not mathematical, properties.  The relationships between tuples, tables, and views are well understood and Structured Query Language (SQL) provides the dominant model for creating, querying, and updating databases.

Looking at current work posted by the World Wide Web Consortium (W3C), however, we find a set of brand new conventions intended to make the basic XML data structure the primary format for data storage, query, and update.  These new mechanisms are quite different from the current state of the art and, if adopted widely, will result in widespread changes in the way programmers store and manipulate data.

The core of the new technology is intended to be XML-oriented databases (or XMLbases).  Instead of storing records in tables in a database, the programmer would be able to store XML data structures of arbitrary complexity.  Thus the basic element of data would change from a tuple (or record) to something quite a bit more complex.  Objects would be found based on matching patterns against objects stored within the object, in a manner similar to but more complex than the current mechanism of searching for tuples based on matching fields.

For XMLbases, XQUERY is the equivalent of SQL as a tool for querying an XMLbase for results.  Whether there is an equivalent to a multi-table view remains an open question.  Updates to an XMLbase are handled by a separate protocol, XUPDATE.  Both depend heavily on XPATH to describe how to match objects.

The XPATH protocol under development will support specification of a node within a complex XML data structure.  For example, XPATH could be used to point to the home telephone number attached to a personnel record.  XPATH is complex and necessary because of the nested structure of almost all XML objects of interest.

The reason this is important is the potential complexity of an XML object, as discussed previously.  An XML object, or a collection of XML objects, can be viewed as a tree of individual data elements.  Finding a particular entity within such a structure involves traversing the tree, so that individual items have XPATH patterns that resemble file pathnames or internet URLs.
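The home telephone example can be sketched with the limited XPATH subset supported by Python's standard ElementTree module.  The personnel document, its element names, and the sample numbers below are invented for illustration.

```python
import xml.etree.ElementTree as ET

PERSONNEL = """<personnel>
    <person id="17">
        <name>Pat Example</name>
        <phone type="work">555-0100</phone>
        <phone type="home">555-0199</phone>
    </person>
    <person id="18">
        <name>Sam Sample</name>
        <phone type="home">555-0142</phone>
    </person>
</personnel>"""

root = ET.fromstring(PERSONNEL)

# A path that reads much like a file pathname: person 17, then the home phone.
home_phone = root.findtext("./person[@id='17']/phone[@type='home']")

# The same predicate applied at any depth collects every home phone.
all_home = [phone.text for phone in root.findall(".//phone[@type='home']")]

print(home_phone)
print(all_home)
```

The path expression names each level of the tree that must be traversed, which is where the resemblance to file pathnames comes from.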

During the early evolution of databases there was a hierarchical model which functioned in somewhat the same fashion.  The database consisted of nodes which could each have multiple children, so that finding nodes in the database was a traversal process.  For some types of information this was a good match, but its popularity was also due to the physical constraints of early computer hardware, particularly for databases stored on tape.

Implications

Replacing the common relational database model, and with it SQL, is a major evolutionary step.  The mapping from relational tables to XML is not immediately straightforward, though research may eventually lead to provable models for specific cases.  Still, it seems a little like throwing the baby out with the bathwater.

In the interim period, which will likely last for decades, existing databases will need to be converted into XML, and vice versa.  The mechanisms for this will likely be manifold.  Some intrepid database companies will develop XMLbase front-ends for existing relational databases.  Likewise, there will probably be products to make the same transitions from a standpoint outside of the database.  Then there will inevitably be many project-specific solutions created to serve particular short-term (inevitably long-term) needs.

Conspicuously absent in all of this is an XML representation of SQL (we would like to think of this as SQML), such as:

<select>
    <column>name</column>
    <column>grade</column>
    <column>salary</column>
    <from>
        <table>personnel</table>
    </from>
    <where>
        <eq col="grade" val="g13"/>
    </where>
</select>

This is particularly puzzling as any generic solution for connecting XML to relational databases will require specifications for both the target XML structure and the source relational structure.  The choices are either XML representation of SQL or just including SQL queries as text.  The former seems more useful in the long run.
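To suggest how little machinery the XML form would need, the following sketch translates the hypothetical SQML query above into a parameterized SQL string.  SQML is our own invention, so the element names and the function sqml_to_sql are assumptions, and only equality conditions joined by AND are handled.

```python
import xml.etree.ElementTree as ET

SQML = """<select>
    <column>name</column>
    <column>grade</column>
    <column>salary</column>
    <from>
        <table>personnel</table>
    </from>
    <where>
        <eq col="grade" val="g13"/>
    </where>
</select>"""

def sqml_to_sql(xml_text):
    """Return (sql, params) for a <select> element; identifiers are not validated."""
    root = ET.fromstring(xml_text)
    columns = [column.text for column in root.findall("column")]
    tables = [table.text for table in root.findall("from/table")]
    sql = "SELECT {} FROM {}".format(", ".join(columns), ", ".join(tables))
    conditions, params = [], []
    for eq in root.findall("where/eq"):
        # Values travel as bound parameters rather than being pasted into the SQL.
        conditions.append("{} = ?".format(eq.get("col")))
        params.append(eq.get("val"))
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

sql, params = sqml_to_sql(SQML)
print(sql, params)
```

A real converter would need to validate column and table names against the database schema before building the statement.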

Perhaps this lack is really due to a general understanding of how difficult it will be to write systems to convert between the two.  As an example, consider the Enterprise Objects Framework (EOF) software layer developed originally for NeXTSTEP.  The sole purpose of EOF was to mediate between complex data structures used within an application and the underlying relational model used for large-scale data storage.

EOF is a large, complex code layer, typically misunderstood by novices, with many hidden features.  In order to accomplish its task the software must keep snapshots of what each query returned originally and compare them to the current state of the database when updates are to be saved to the database.

Multiple layers of pending changes provide power, but require multiple save actions, any of which may result in errors.  Subsidiary information is queried as required instead of being queried all at once, which may be very efficient or very inefficient depending on the application, requiring careful tuning and deep understanding of the software layer.  Any solution to fitting XMLbase front ends to relational databases will likely be just as complex.

XML for Data Transfer

The two main issues for XML as a data transfer mechanism are representative capability and size.  The former issue, as it is directly related to XML as data structure, has already been addressed and is likely to be more than sufficient for any need.  The latter issue relates to bandwidth as a scarce resource and thus to the size of XML representations relative to other options.

There is a growing attitude among web developers that all their users will be connected via fast links, for example DSL or cable modems.  In fact, nothing could be further from the truth.  The majority of homes are still not connected to anything faster than a telephone line, and it will be years before more than a small number of the more expensive hotels will provide better than a voice line for the use of business travellers.  Thus the size of the data transferred is and will remain important for the foreseeable future.

Three features of XML contribute to it being somewhat expensive in terms of bandwidth required per data object:

  • ASCII (or UNICODE) representation
    • many simple data items expand when converted to ASCII
  • the tags themselves
    • the actual <tag>...</tag> sequences add a lot of bytes to the stream
  • added complexity
    • when using XML as data structure there can be some complexity added due to the difference between XML and conventional data structures which entails more tags and thus more size

While this may seem like a bad thing initially, it can be dealt with to some degree by compression technology.  The end result of using a data representation that packs well and using a wasteful one with a good compression algorithm is going to be similar.  This is without investigating compression algorithms tuned for XML.
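A quick experiment along these lines can be sketched with Python's standard zlib and struct modules.  The sample readings, the element names, and the packed layout (a 4-byte integer plus an 8-byte float per record) are all assumptions chosen for illustration, and the sizes printed will vary with the data.

```python
import struct
import zlib

# Invented sample data: 1000 (id, value) pairs.
readings = [(i, i * 0.5) for i in range(1000)]

# Verbose XML representation of the same data.
xml_bytes = (
    "<readings>"
    + "".join(
        "<reading><id>{}</id><value>{}</value></reading>".format(i, v)
        for i, v in readings
    )
    + "</readings>"
).encode("ascii")

# Compact binary representation: little-endian int32 followed by float64.
packed_bytes = b"".join(struct.pack("<id", i, v) for i, v in readings)

xml_z = zlib.compress(xml_bytes)
packed_z = zlib.compress(packed_bytes)

print("raw:       ", len(xml_bytes), "vs", len(packed_bytes))
print("compressed:", len(xml_z), "vs", len(packed_z))
```

The repeated tags are highly redundant, which is what a general-purpose compressor exploits.  How close the two compressed sizes end up depends heavily on the data being sent.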

Moreover, we must remember that an XML object may represent a large amount of display code.  This is one of the real promises of XML for data transfer.  Client-side expansion of XML objects, for example by XSLT, would enable pages to be updated by smaller XML objects summarizing what is different about each page instead of duplicating what is the same each time.

As an example, currently sending a web page to a browser means sending all of the page each time.  For an address book application, each address would be displayed on a separate page, necessitating a total page redraw for each new entry.  If a template page could be downloaded to the browser the first time and cached then each subsequent page would only require the significantly smaller XML <address> object.
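The address book case might look something like the sketch below, in which a cached page template is filled in from a small <address> object.  We use Python's string.Template as a stand-in for whatever expansion mechanism the browser would actually provide; the element names and the render function are hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from string import Template

# Downloaded once and cached on the client (assumed layout).
PAGE_TEMPLATE = Template(
    "<html><body><h1>$name</h1><p>$street</p><p>$city</p></body></html>"
)

def render(address_xml):
    """Expand a small <address> object into a full page using the cached template."""
    address = ET.fromstring(address_xml)
    # Escape each value so that data cannot inject markup into the page.
    fields = {child.tag: html.escape(child.text or "") for child in address}
    return PAGE_TEMPLATE.substitute(fields)

address_xml = (
    "<address><name>Pat Example</name>"
    "<street>1 Main St</street><city>Springfield</city></address>"
)
page = render(address_xml)
print(page)
```

Only the <address> object needs to cross the network for each new entry; the template is transferred once.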

Transferring just the XML object and expanding it on the client side requires protocols for XML for Data Conversion to be implemented in the browser.

XML for Data Conversion

Another aspect of XML as it is intended to be used is the conversion of one XML object into another.  This may occur for various reasons:

  • dissimilar XML Document Type Definitions (DTDs)
    • data must be converted from one format to another related one
  • expansion of an XML object during formatting
    • allowing smaller XML objects to be transferred
  • addition of style information during formatting
    • providing functionality similar to Cascading Style Sheets (CSS)

The workhorse for these data conversions is XSLT (XSL Transformations), a protocol for converting XML from one form to another.  With XSLT it is possible to define templates for converting specified XML structures into other XML structures, which can be thought of as a form of macro expansion.  XSLT is a part of the more general XSL mechanism for adding style information but it is far more general in capability.

XSLT may turn out to be the Swiss Army knife of XML.  With an appropriate XSLT engine (a number already exist) XML source can be transformed into a different XML structure or arbitrary textual output.  The possibilities may well be endless.

Returning to the reasons cited above, the first was conversion due to dissimilar DTDs.  This might be required in order to link B2B or scheduling systems built by different vendors.  XSLT would be used to specify the mapping from one format to the other, and applied to all messages moving between the differing systems.
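The flavor of such a mapping can be sketched without an XSLT engine.  The Python code below is a hand-written stand-in, not XSLT: it simply renames tags according to a table while copying the tree.  Both vocabularies (appointment/when/who and meeting/start/attendee) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one vendor's tag names to another's.
TAG_MAP = {"appointment": "meeting", "when": "start", "who": "attendee"}

def convert(element):
    """Copy a tree, renaming any tag found in TAG_MAP and keeping the rest."""
    new = ET.Element(TAG_MAP.get(element.tag, element.tag), dict(element.attrib))
    new.text = element.text
    new.tail = element.tail
    for child in element:
        new.append(convert(child))
    return new

source = ET.fromstring(
    "<appointment><when>2001-05-01T09:00</when><who>Pat</who></appointment>"
)
result = ET.tostring(convert(source), encoding="unicode")
print(result)
```

Real format differences usually involve restructuring as well as renaming, which is where template-based tools such as XSLT earn their keep.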

An example of the second type of conversion would be the display of relational database records as individual pages on a web site.  The conversion would be from simple tuples taken from the database into XHTML, an XML specification of HTML.
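As a rough sketch of this second case, the code below pulls tuples from an in-memory SQLite database and wraps each one in a minimal XHTML page.  The personnel table, its columns, and the function row_to_xhtml are assumptions made for the example, and the conversion is written by hand rather than in XSLT.

```python
import sqlite3
import xml.etree.ElementTree as ET

def row_to_xhtml(columns, row):
    """Render one database tuple as a small XHTML page with a two-column table."""
    page = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    body = ET.SubElement(page, "body")
    table = ET.SubElement(body, "table")
    for column, value in zip(columns, row):
        tr = ET.SubElement(table, "tr")
        ET.SubElement(tr, "th").text = column
        ET.SubElement(tr, "td").text = str(value)
    return ET.tostring(page, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personnel (name TEXT, grade TEXT, salary INTEGER)")
conn.execute("INSERT INTO personnel VALUES ('Pat Example', 'g13', 50000)")

cursor = conn.execute("SELECT name, grade, salary FROM personnel")
columns = [description[0] for description in cursor.description]
pages = [row_to_xhtml(columns, row) for row in cursor]
conn.close()

print(pages[0])
```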

The third reason is the justification for CSS with HTML.  In this case XSL, a combination of XSLT and XPATH, would be used to add styles to content.  As with CSS, different media would be handled by different XSL templates.

XML as code

There seems to be a lag in using XML as a programming language.  We saw this when comparing XML to SQL but that is only one example.

Consider the following:

<for var="i" init="0" delta="1" term="100">
    <define var="x" init="0"/>
    <print>
        <set var="x">
            <add>
                <value var="x"/>
                <value var="i"/>
            </add>
        </set>
        Loop:  <value var="i"/>, sum=<value var="x"/>
    </print>
</for>

It is easy to understand the sense of the algorithm and it would be simple to convert it to most modern programming languages.  In short, it represents a fragment of a computer program.  It's obviously way too much trouble to write code like this, but there may be utility in defining the underlying source code representation in this manner.
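To show that such a fragment is directly executable once parsed, here is a toy interpreter in Python.  The semantics are our own guesses and are marked as assumptions in the comments: the loop stops before the variable reaches term, <define> initializes a variable only once, and <print> collapses whitespace.  The loop bound is reduced to 4 to keep the output short.

```python
import xml.etree.ElementTree as ET

PROGRAM = """<for var="i" init="0" delta="1" term="4">
    <define var="x" init="0"/>
    <print>
        <set var="x">
            <add>
                <value var="x"/>
                <value var="i"/>
            </add>
        </set>
        Loop:  <value var="i"/>, sum=<value var="x"/>
    </print>
</for>"""

def evaluate(node, env):
    """Return the numeric value of an expression node."""
    if node.tag == "value":
        return env[node.get("var")]
    if node.tag == "add":
        return sum(evaluate(child, env) for child in node)
    raise ValueError("unknown expression: " + node.tag)

def execute(node, env, output):
    """Run one statement node, appending printed lines to output."""
    if node.tag == "for":
        var = node.get("var")
        env[var] = int(node.get("init"))
        # Assumption: the loop ends before the variable reaches "term".
        while env[var] < int(node.get("term")):
            for child in node:
                execute(child, env, output)
            env[var] += int(node.get("delta"))
    elif node.tag == "define":
        # Assumption: a variable is initialized once, not on every pass.
        env.setdefault(node.get("var"), int(node.get("init")))
    elif node.tag == "set":
        env[node.get("var")] = evaluate(node[0], env)
    elif node.tag == "print":
        parts = [node.text or ""]
        for child in node:
            if child.tag == "set":
                execute(child, env, output)  # side effect only, prints nothing
            else:
                parts.append(str(evaluate(child, env)))
            parts.append(child.tail or "")
        # Assumption: runs of whitespace in printed text collapse to one space.
        output.append(" ".join("".join(parts).split()))
    else:
        raise ValueError("unknown statement: " + node.tag)

output = []
execute(ET.fromstring(PROGRAM), {}, output)
print("\n".join(output))
```

The XML parser has already done the work of a front-end parser, so the interpreter is little more than a walk over the tree.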

Given a sample of XML code, it would be possible to generate multiple representations.  Conversion of the above example to various languages would be fairly simple.  Formatting rules could be defined and applied evenly to all code, while still allowing individual programmers to work with idiosyncratic representations.  Graphic representations would be possible as well.

Once programs are represented as data structures it is possible to construct applications that reason about them.  These applications could check for dangerous constructs, replacing them automatically in some cases.  Programs could legitimately reason about themselves, a key feature of many artificial intelligence domains.  Programming language conversions would be possible in many cases, and may in fact be as simple as implementing an appropriate data conversion template.
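A trivial example of such reasoning is sketched below: a checker that scans XML code for loops whose step is zero, which under the loop semantics we assume here would never terminate.  The tag and attribute names follow the hypothetical fragment shown earlier, and find_zero_step_loops is a name of our own.

```python
import xml.etree.ElementTree as ET

CODE = """<for var="i" init="0" delta="1" term="10">
    <for var="j" init="0" delta="0" term="10">
        <print>stuck</print>
    </for>
</for>"""

def find_zero_step_loops(root):
    """Return the loop variables of every <for> whose delta is zero."""
    return [
        node.get("var")
        for node in root.iter("for")
        if int(node.get("delta", "1")) == 0
    ]

suspect = find_zero_step_loops(ET.fromstring(CODE))
print(suspect)
```

Because the program is ordinary data, the check is a short tree query rather than a custom parser for a programming language.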

On the flip side, it would be possible to generate programs more efficiently in this manner.  Graphic or other special-purpose tools could be used to edit programs, with the result generated as XML structure.  Since the structure would remain normalized and parseable (unlike code generated in conventional programming languages) changes could continue to be made using the initial tool kit, even if other changes were being made to the same structure using other tools (such as one presenting the code as a particular language).

Finally, this functionality would allow the complete representation of objects as XML.  In order to fully represent objects it is necessary to encode the behavior associated with the data structure of the object.  XML as code would be that missing link.

All of this presupposes a compiler capable of converting the XML code to executable code.  We assume that this is a reasonable task, and it may in fact be one that is particularly simple for two reasons.  First, there is always a structure similar to that specified here, created by the programming language compiler for its internal use.  It's just the front end parser that is different.

Second, the data conversion capabilities of XML would simplify certain aspects of the parsing process.  Much of the work of a macro assembler is mechanical parsing (already accomplished by the XML parser) and expansion of opcodes and macros, easily accomplished by template expansions. A conversion from Java to Java byte code may well be accomplished almost entirely with a series of macro expansions specified via XSLT.

The Big Bang

A large part of the magic of XML is the way it can describe itself.  Document Type Definitions (DTDs) provide a way to specify the syntax of XML objects, that is, which tags can appear where in the object.  In addition, schemas provide a way to apply limits to the data items, specifying what values a particular item can assume.  Together with namespaces, these two aspects of XML implement the eXtensible part of the protocol, allowing programmers to define any number of domain-specific markup languages.

When we first heard of XML several years ago, the advantages were apparent, but we were somewhat sceptical about the generation of the DTDs and schemas.  The situation seemed to be one in which a large number of divergent standards might develop and compete, and not in a good way.

In the several years since XML was defined, a large number of such languages have been proposed and some have been almost completely defined.  The vast majority of the work seems to have been done in open forum.  While a lot of good work is obviously being done, we are mindful of the many cases in which committees have, through no fault of their own, come up short.  In most cases this is simply because the committee takes too long, and is overtaken by events.  In other cases the committee fails to adequately examine the goals, and misses the target.  As an example, consider the Ada parable.

The moral to the Ada parable is that overly large committees can result in such a proliferation of requirements and features that the result is considered unusable.  We don't wish to point at this time to any particular XML-related effort, but rather to the number of them and their interrelationships.

For a novice, trying to figure out how XPATH, XQUERY, and XUPDATE work together to replace SQL is somewhat unnerving.  Likewise, the generation of graphics is composed of multiple new markup languages including SVG and SMIL.  Then there is XSL, which depends on XSLT and XPATH.  XML itself is simple to understand and use, but XML plus the myriad protocols currently under development encompasses so many aspects of programming that the total picture becomes quite complex.

Proprietary Markup Languages

Our original fear that XML extensions would compete in a bad way seems to have been groundless.  At this time most of the work is being done in open forum.  Still, it is early in the process; working XML systems are only now beginning to proliferate.  Only time will tell.

Worse would be a competition for ownership of domain-specific markup languages.  We can't help but wonder if proprietary markup languages might not be owned in some fashion, especially in light of the current trend of patenting any programming technique that isn't actively running away.  This extends particularly to namespaces, which remind us somewhat of domain names.  But perhaps we are simply being paranoid; after all, how many of these formats could there be anyway?

We're also concerned that some markup languages will be defined by large business concerns (need we name them here?) and promulgated by fiat despite existing public offerings.  Even when large companies accept existing protocols and computer languages they often make changes to increase their "ownership," fragmenting the playing field.  Unfortunately, this seems as inevitable as smear advertisements during a political campaign.

As a minor point, DTDs are loaded from URLs.  This is good in that they are always the same for everyone.  It is bad in the sense that this requires network connectivity in order to access the DTD for a document.  We do all our documentation in HTML and use relative URLs so that it can be read from the local file system.  We will need to figure out what to do about globally defined DTDs.

The Ada Parable

Perhaps the best example of the failure of design by committee is the Ada programming language.  Generated by committee, with the weight of the United States Department of Defense behind it, Ada represented the pinnacle of language design for its time.  It featured all sorts of innovative ideas (some of the data definition concepts foreshadow aspects of XML schema definition), all crammed together in a specification of Byzantine complexity.

Despite the best efforts of all concerned and the weight of the military machine, Ada has not taken the world (or even its country of origin) by storm.  One obvious reason is that it was overtaken by events.  The object-oriented programming movement transformed the way programming was done, and even the addition of objects to Ada '95 wasn't enough to make it attractive to non-DOD projects.  Another reason was the bloated nature of the beast, stuffed as it was with all the things anyone could think of at the time.

After the language was completely defined efforts were begun to create development tools and a methodology to support the language.  One criticism published in this time frame was that the entire process was done backwards.  The methodology should have been developed first, then the development environment, then the language itself.

The COM Parable

Microsoft defined a protocol for communication between program components known as the Component Object Model (COM) as the successor to OLE.  When combined with distributed functionality the protocol became DCOM and had similar functionality to CORBA.  Having the weight of Microsoft behind it, COM has become quite common, not only on Windows platforms but to a lesser extent on other operating systems where Microsoft has a market.

The main problem with COM is that it is extremely complex to learn and use.  Few programmers can construct a COM object from scratch.  As a result of this, Microsoft supplies a number of tools to do all the work of building COM objects.  The easiest to use is Visual Basic, which does everything.  In other languages large blocks of code are created by specifying or editing COM interfaces.  In general these blocks of code are macro statements which expand to generate code that is difficult for novices to fathom or safely change.

A COM object is specified not by a single interface but by a combination of several, an added level of detail not normally found in object-oriented systems.  The COM interface is at a finer level of granularity than an object.  Thus a simple COM object will of necessity be more complex than a simple class in any common object-oriented language.  As with XML, this extra level of complexity seems like it should lend itself to extra capability, but this potential 'extra' value seems to go largely (if not completely) unused.

There is a proliferation of proprietary COM object definitions, mostly by Microsoft and secondarily by other vendors, each identified by a unique non-mnemonic tag generated by a COM tool.  This at least prevents any possible namespace problems.  Competing vendors, however, may construct similar but different COM objects so that it is not possible to simply add all word processing programs to a system.  Each such program would work differently and require separate code.

What's it all Mean?

Having made all of these comparisons, what conclusions can we draw?  How does all this help us to understand and use XML?

With respect to XML as a data structure, the obvious conclusion is that XML provides a much richer environment, with all sorts of aspects that don't exist in basic data structures commonly used in programming today.  This doesn't mean we must use XML in a complex way, but there does seem to be a definite increase in the complexity of the data when represented in XML.

This increase in complexity would generally be a bad thing in a programming context, where simplicity often runs faster and is almost always easier to code and maintain.  It is possible, however, that some of this complexity may be of value.  For example, the attributes of XML tags might be used to provide out-of-band information, or data about the data object, such as comments or timestamps.

In the database context, XML provides multi-level structured data similar to the linked records often used within applications.  This is probably a big win, especially in situations where a large number of relational tables would need to be linked together via 1:n  relations in order to create a single object.

On the other hand, the simplicity of SQL in the RDBMS environment allows us to construct new and unanticipated queries quickly with some assurance of getting an appropriate response.  The mathematics of combining XML objects of different types via relational queries as we currently do with relational databases may take years to fully define.  Is XML so much more expressive that relational queries across them won't be required at all?  This doesn't seem likely.

More importantly, consider the history of object-oriented programming systems (OOPS) and object-oriented databases.  OOPS are obviously an integral part of the modern programming landscape.  More recent developments such as aspect-oriented programming don't remove objects, they add new functionality.  Yet object-oriented databases have never really made much headway; the vast majority of existing databases continue to be relational.  Is it possible that this will be the fate of XML-oriented databases as well?

The size issue with XML data transfer is one that will be relatively minor soon due to the steady increase in available computing power and bandwidth and the inevitable use of compression algorithms.  Of course, having said that, we must remain mindful of the creeping bloat that has infected software over the last twenty years due to the steady increase in computing power and data storage capacity.  As programmers we do tend to use all available resources.

In any event, the movement of data conversion technology and tools to client hardware will enable XML to be used to its fullest, sending only the data with a common separate template, which will negate the size issue for multiple pages or calls.  The caveat here is the delay until browsers become fully XML-enabled, which likely can't happen for a while since the relevant protocols are still under development.

We think that XML as Code is inevitable, but only as an internal representation.  It seems obvious that no one would want to write code using XML.  The utility of such a step seems obvious as well, at the very least for XML scripting and fully active XML objects.  We can only hope that this will result eventually in a common set of programming language constructs that easily map to existing languages.

The explosion of extensions to XML, new markup languages, is reassuring.  So far there seems little negative competition in this space, though that may change as XML really takes hold.  We can only hope that XML doesn't prove a Tower of Babel, an attempt to tie all programming together that is struck down by its own hubris, resulting in even more diversity and attendant problems than before.

The programming community sees XML as an evolutionary step in the representation of data, with a lot of advantages.  We agree that XML is in fact a step forward, and the advantages are real.  We believe, however, that XML may in fact be a completely transformative technology, one that has such a widespread influence that the entire programming landscape will change over a relatively brief period of time.

As with any such paradigm shift, there is also the chance that the change will be too drastic and the programming community at large will come to resist or reject the transition in favor of alternate solutions that already exist and conform more closely to the large body of existing technology.