The Emerging Distributed Web, Part 3 of 4.
The XML Solution: In this article, Jeremy Allaire describes the purist and pragmatic approaches to XML implementation. The pragmatic approach has been at the center of the Web platform's success and evolution, and the specific opportunities created by XML are at the center of Allaire's XML strategy.
By Jeremy Allaire

Editor's Note: This is the third in a four-part article series to be published weekly. Past articles in the series are available in our Columns & Articles archive section.

The XML Solution
Over the past year, the Web industry has begun to usher in a new model for exchanging data over the Web. The Extensible Markup Language, or XML, has emerged as a foundation for expressing and exchanging data of almost any form over the native, HTTP-based Web environment.

XML promises to supply the missing link in the Web platform's future by providing a common glue for data exchange between disparate clients and servers on the Internet. XML embraces the uniqueness of the Web platform: it is based on plain text, can be easily transported over HTTP, is network, language and platform independent, and is flexible and extensible enough to scale to nearly any data exchange problem.

Two critical problems that XML potentially solves are:

  1. Providing a framework to separate data from presentation or layout, and
  2. Providing a grammar that will allow Web application servers to share and exchange data with other servers across the network.

More generally, XML promises to define common information formats across nearly every major computing endeavor.
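
To make the first of these two points concrete, here is a minimal Python sketch, not taken from any product discussed in this series, of one data document rendered two different ways. The `products` vocabulary, the `DATA` string and both function names are assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data document: it describes products, not how to display them.
DATA = (
    "<products>"
    "<product name='Widget' price='9.50'/>"
    "<product name='Gadget' price='12.00'/>"
    "</products>"
)

def render_html(xml_text):
    """Presentation 1: the same data laid out as an HTML table."""
    rows = "".join(
        f"<tr><td>{p.get('name')}</td><td>{p.get('price')}</td></tr>"
        for p in ET.fromstring(xml_text)
    )
    return f"<table>{rows}</table>"

def render_text(xml_text):
    """Presentation 2: plain text, one product per line."""
    return "\n".join(
        f"{p.get('name')}: {p.get('price')}" for p in ET.fromstring(xml_text)
    )

print(render_html(DATA))
print(render_text(DATA))
```

Neither renderer changes the data document, and a third presentation could be added without touching it.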

As a language environment, XML embraces the notion that computer languages must evolve to be both human readable and writable, as well as machine readable and writable. XML enables declarative and human-understandable data and languages, opening doors to richer forms of abstraction in computer programming and components.

XML Adoption: Two Views
With the surging interest in XML, there is abundant speculation as to what its role will be in the emerging Web and computing landscape. Over the past year, we've begun to see a number of views emerge on the critical role that XML will play in transforming the Internet and data exchange standards.

Purist XML Data
One major view is that XML, and the varied XML data formats and vocabularies that emerge from it, will become the way almost all data is represented and programmed. In this view, XML becomes a logical and physical storage format for data, replacing even SQL servers as the primary way to represent and access data. Developers define data structures (relational, hierarchical, networked, object-based, etc.) that are then represented in XML formats. Accessing and manipulating that data then requires programming against what is referred to as a Document Object Model, or DOM. The DOM is the generic API for accessing and manipulating XML data, much as SQL and ODBC are the generic means for accessing relational data today.
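
As a rough illustration of what DOM programming involves, the sketch below uses Python's standard `xml.dom.minidom` module to answer a simple question about an XML document. The `customers` vocabulary, the sample data and the function name are hypothetical and chosen only for this example.

```python
from xml.dom.minidom import parseString

# Hypothetical XML vocabulary for customer records (an assumption, not a standard).
CUSTOMER_XML = """<customers>
  <customer id="1"><name>Ada</name><city>Boston</city></customer>
  <customer id="2"><name>Grace</name><city>Cambridge</city></customer>
</customers>"""

def names_in_city(xml_text, city):
    """Walk the DOM tree and collect the names of customers in `city`."""
    doc = parseString(xml_text)
    names = []
    for customer in doc.getElementsByTagName("customer"):
        city_node = customer.getElementsByTagName("city")[0]
        name_node = customer.getElementsByTagName("name")[0]
        # The text of an element lives in a child text node.
        if city_node.firstChild.data == city:
            names.append(name_node.firstChild.data)
    return names

print(names_in_city(CUSTOMER_XML, "Boston"))
```

Against a relational table the same question would be a single declarative query, roughly `SELECT name FROM customers WHERE city = 'Boston'`. With the DOM, the developer navigates the tree by hand.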

In this new XML world, developers no longer think in terms of tables, joins and SQL; they think in terms of XML vocabularies and DOM programming. We can see early examples of this in Microsoft's IE4 XML data-binding features. Here, a developer would use a server page, created with ASP or ColdFusion, for example, to query a standard database and dynamically build an XML format for that data, returning it to the client, where the XML structure would be 'bound' to an HTML table or other page elements. In this world, the developer must create an XML format for their existing relational data. Instead of just passing the client a recordset, the server page would create tags and tag attributes containing data that maps to the relational structure.
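
The server-side step described above, turning query results into tags and attributes, might look something like the following Python sketch. It is an assumption-laden illustration rather than how ASP or ColdFusion actually do it: the `row` element, the attribute-per-column layout and the function name are all invented here, and the rows are hard-coded in place of a real database query.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(table_name, rows):
    """Build one <row> element per record, storing each column as an attribute."""
    root = ET.Element(table_name)
    for row in rows:
        record = ET.SubElement(root, "row")
        for column, value in row.items():
            record.set(column, str(value))  # XML attribute values are always text
    return ET.tostring(root, encoding="unicode")

# Stand-in for the result of a database query.
rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
xml_text = rows_to_xml("customers", rows)
print(xml_text)
```

Even in this small case the developer has had to make format decisions (elements versus attributes, how to name rows) that a plain recordset never required.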

Corresponding to this view is the idea that vendor consortia would work together to define common XML formats for cross-organizational problems. For instance, health care and insurance companies would work together to define a Patient Markup Language that defines the data and structures associated with a patient, allowing patient data to move easily between relevant business systems on the network. This view might even assume that two companies working together to exchange invoice data would establish an XML format to exchange structured invoice data, shielding each other's systems from proprietary formats or platforms.

XML Data Middleware
Another major view -- and not necessarily a competing one -- is that XML provides an underlying glue for application-to-application data exchange, in essence becoming a Web-native middleware environment for distributed applications. However, in this view XML would be more of a transport protocol for object data than an environment of specific formats for given problem domains.

For example, using XML as middleware would allow a client program -- say a JavaScript-enabled browser -- to invoke a server application -- say a ColdFusion page -- and request a set of data or recordsets. ColdFusion would generate the relevant recordsets, transform them into XML, and pass them back to the client, where the client would transform that data back into a native JavaScript format. This would allow cross-language, network-based data exchange. The utility of XML as a data exchange protocol extends to the server as well: a ColdFusion server could invoke a service on another remote server, say one running Java servlets or Perl, and request a set of data. The other server environment would generate the data, transform it into this XML format and pass it to ColdFusion, which would then translate it into native ColdFusion data objects.

In the 'XML as middleware' model, the use and visibility of XML becomes transparent to the developer. Instead of defining custom XML formats and programming an XML DOM, the developer thinks and programs in the same high-level constructs of her native development language -- objects, arrays, recordsets, string variables, etc.
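
A minimal sketch of this idea in Python follows. The developer calls `serialize` and `deserialize` on ordinary native values and never touches the XML in between. The generic tag names used here (`string`, `number`, `list`, `map`, `entry`) are invented for this example and do not represent any published format; a recordset is modeled simply as a list of dictionaries.

```python
import xml.etree.ElementTree as ET

def to_element(value):
    """Map a native value onto a generic (hypothetical) XML element."""
    if isinstance(value, bool):
        raise TypeError("booleans are left out of this sketch")
    if isinstance(value, str):
        node = ET.Element("string")
        node.text = value
    elif isinstance(value, (int, float)):
        node = ET.Element("number")
        node.text = repr(value)
    elif isinstance(value, list):
        node = ET.Element("list")
        for item in value:
            node.append(to_element(item))
    elif isinstance(value, dict):
        node = ET.Element("map")
        for key, item in value.items():
            entry = ET.SubElement(node, "entry", name=str(key))
            entry.append(to_element(item))
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return node

def from_element(node):
    """Rebuild the native value that an element describes."""
    if node.tag == "string":
        return node.text or ""
    if node.tag == "number":
        try:
            return int(node.text)
        except ValueError:
            return float(node.text)
    if node.tag == "list":
        return [from_element(child) for child in node]
    if node.tag == "map":
        return {entry.get("name"): from_element(entry[0]) for entry in node}
    raise ValueError(f"unknown element: {node.tag}")

def serialize(value):
    return ET.tostring(to_element(value), encoding="unicode")

def deserialize(xml_text):
    return from_element(ET.fromstring(xml_text))

# The "server" side sends a recordset; the "client" side gets native data back.
recordset = [{"id": 1, "name": "Ada", "balance": 10.5},
             {"id": 2, "name": "Grace", "balance": 0}]
packet = serialize(recordset)
assert deserialize(packet) == recordset
```

Because the packet describes generic structures rather than a problem domain, the same two functions serve any data the application happens to produce. A receiver written in another language would only need its own equivalent of `deserialize`.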

Problems With Purist XML Data Approaches
As one might have noted, the differences between these two models are potentially substantial. The crucial difference is in what each model imposes on the developer and corporation in terms of new data formats, architecture and programming approaches. The purist approach to XML data assumes that corporations and developers will rapidly move to store and expose all of their content and data in custom XML-based formats. Secondarily, it assumes that the primary data access and manipulation language will shift from object- and SQL-based programming to DOM-based programming.

This shift is both major and potentially unnecessary for the substantial majority of applications. SQL-based vocabularies for data access and management have been adopted widely because of SQL's simplicity and declarative nature. Likewise, relational database design fundamentals are broadly understood and easily created using products like Access and database CASE tools such as Power Designer.

Corporations have made billions of dollars of investments in the relational database model, and have built programming arsenals to support handling these systems. It is extremely unlikely that there will be a total shift away from them overnight, given both the current skills and infrastructure investment and the lack of replacement skills and infrastructure.

Furthermore, the purist approach assumes that developers will want to shift from a data structure model based on simple constructs such as variables, arrays and recordsets to one based on the DOM, and that they are willing and able to impose a custom translation layer -- one that requires a new model of thinking about and representing data -- between their existing native database systems (most likely SQL-based) and other client and server systems.

Finally, the Purist XML Data approach is overly optimistic about the ability of vendor-neutral standards bodies to define shared XML formats for data exchange. While such formats are clearly a requirement in the long run, corporations wanting to do business-to-business integration over the Internet will need a more accessible and rapid approach to structured data exchange. For example, an HMO seeking to share data with regional health care providers over the Web can't wait to define a shared XML vocabulary, let alone wait for health care standards bodies to define XML formats for patient and billing information.

Clearly, the XML Data Purist and XML Middleware approaches will both play important roles, though it seems that existing models and approaches to data storage, access and programming, combined with the unique platform requirements of the Web, will drive companies towards a more pragmatist approach that shields developers from having to know much about the underlying XML data structures.

A Pragmatist's Approach to XML
The real world of Web development and Web applications demands a pragmatist's approach to XML: an approach that can open up the range of opportunities that XML affords without imposing the intellectual and technical shift that a purist approach would require.

Returning to our earlier discussion of the evolving Web platform landscape, it is clear that an XML-based architecture is required to take browser-based applications to another level, and more importantly to begin globally exposing Web application servers as general distributed services and interfaces to the rest of the Web.

XML carries the same syntax and language advantages that HTML and CFML bring to the Web platform: it is a high-level, declarative, human-readable and human-writable language. As a system for encapsulating data and logic, and for bringing abstraction into computer programming, XML promises to be a breakthrough. With this in mind, it is clear why Web developers are seeking to use XML for client and server-side component encapsulation and browser extensibility.

With this approach, XML-based tags and components encapsulate ranges of client and server-side scripts, layout code, and data into intelligent Web components. This would allow a development team to create XML components that combine CSS, HTML and JavaScript, which are in turn used by page designers who are thinking and working at a higher level of abstraction. XML becomes the language and syntax glue that makes this possible.

Allaire has been pioneering this approach for the past two years through the use of ColdFusion Extensions, or CFXs. These XML-based components can encapsulate data, logic, and layout, whether on the client or server-side of the Web platform environment. Over 500 third-party tags are available which embrace this architecture. More recently, Microsoft has introduced XML Behaviors in IE5, and Netscape plans to introduce XML-based ActionSheets. Both of these provide a client-side mechanism for using XML-based components to encapsulate browser behavior. In addition, simply binding XML tags to CSS layout definitions will provide publishing-oriented developers with a cleaner, more reusable architecture.

Clearly, XML as a Web-centric language and component architecture will continue to evolve and grow, as pragmatist approaches continue to win the Web platform war.

A Pragmatist Approach to XML Data Exchange
Even as XML blazes a path as a language and component architecture, this still does not explicitly address the broader and more important need of finding a model and use for XML as a distributed data infrastructure. What would a pragmatist approach to XML data exchange look like?

First, a pragmatist approach to XML data exchange would need to acknowledge and embrace the real-world environments driving the Web platform. In particular, the approach would need to easily accommodate and interoperate within key programming environments in use on Web client and server platforms. These include ECMAScript (a.k.a. JavaScript), ColdFusion, Perl, ASP, and Java, and at times even Windows clients implemented using ActiveX.

Second, it would need to maintain the core data storage and manipulation environments driving corporate applications today. What this means in particular is that it would need to operate in the world of relational databases -- SQL, recordsets -- and the core data structures used in the above languages -- associative arrays, lists, recordsets, strings, etc.

Third, it would need to be transparent to its users. Developers wanting to exchange structured data across the Web, whether from servers to browsers and back, or from servers to other servers, should not have to think about XML parsing, data access and programming. Instead, the developers should work with data in their language and platform environments, assuming transparent exchange across the network.

Finally, it would need to embrace a simple design that works across a lowest common denominator of data structures used in the Web platform. Relating to the first goal of interoperating between key Web languages, a lowest common denominator approach would recognize that the majority of Web programming and applications takes place in higher-level scripting environments such as Perl, ColdFusion and JavaScript, not lower-level object-oriented languages such as Java and C++. It is critical that data exchange on the Web be modeled on the same principles that have made these languages successful -- loose typing, interpreted, relatively easy to learn, ASCII and HTTP friendly, etc.

Outstanding Issues and Scenarios
Clearly, there are models and applications where a less pragmatist approach is required. In particular, it will become important to develop and support XML vocabularies for common data formats and industries. These custom XML vocabularies should be used on an as-needed basis, as opposed to being the core of how people take advantage of HTTP-based structured data exchange.

For instance, XML vocabularies to represent EDI related data could become important for certain classes of business-to-business data exchange, and having common formats for these matters would be beneficial. Or, for specific classes of information storage and retrieval where marked-up data becomes important for shared search capabilities, one can imagine successful common XML formats.

However, for distributed application-to-application data exchange and communication -- XML middleware -- it's clear that the use and visibility of XML should be both transparent to and integrated with basic programming data structures used in key web application environments.

A second outstanding issue is the desire and need to use XML middleware to supplement, if not supplant, traditional distributed object protocols such as DCOM and CORBA. Clearly, for binary-oriented object protocols and languages (such as Java and C++) there will be benefits to using XML middleware for distributed object programming. Efforts such as Microsoft's SOAP or DataChannel's WebBroker will be important in this regard.

However, one should note that these solutions fail to directly address the more pragmatic needs of the actual environments driving the Web platform. Most distributed Web applications:

  • Are page-based and not binary or object based,
  • Are built on non-object-oriented languages such as Perl, JavaScript and ColdFusion, and
  • Require a data exchange model that is more stateless in nature -- e.g. passing data back and forth between clients and servers, as opposed to using an RPC-style distributed programming layer.
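
The stateless style named in the last point can be sketched as a plain function from one request document to one response document. This Python example is only an illustration under stated assumptions: the `lookup` and `items` elements, the `ORDERS` data and the `handle_request` name are invented here, and the HTTP transport that would carry the two documents is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory data standing in for a database query.
ORDERS = {"1001": ["widget", "gadget"], "1002": ["sprocket"]}

def handle_request(request_xml):
    """Answer one self-contained request document with one response document.

    Nothing is remembered between calls: everything the server needs
    (here, the order id) travels inside the request itself.
    """
    request = ET.fromstring(request_xml)
    order_id = request.get("order", "")
    response = ET.Element("items", order=order_id)
    for name in ORDERS.get(order_id, []):
        ET.SubElement(response, "item").text = name
    return ET.tostring(response, encoding="unicode")

reply = handle_request('<lookup order="1001"/>')
print(reply)
```

Each call stands alone, so a page-based client can post a document, read the reply and move on, with no remote object references or sessions to manage.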

This pragmatist approach, which has been at the center of the Web platform's success and evolution, and the specific opportunity created by XML are at the center of Allaire's XML strategy. Next week I'll lay out an open-standards-based architecture that Allaire has created to answer the call for distributed Web applications.

Continue to Part 4 of 4: Web Distributed Data Exchange (WDDX)

-Jeremy

Jeremy Allaire is co-founder and Vice President of Technology Strategy at Allaire Corp. Please direct comments on this column to talkback@allaire.com.