The Emerging Distributed Web, Part 3 of 4.
The XML Solution: In this article, Jeremy Allaire describes
the purist and pragmatic approaches to XML implementation.
The pragmatic approach has been at the center of the Web platform's
success and evolution, and the specific opportunities created
by XML are at the center of Allaire's XML strategy.
By Jeremy Allaire
Editor's Note: This is the third in a four-part article
series to be published weekly. Past articles in the series
are available in our Columns & Articles archive
section.
The XML Solution
Over the past year, the Web industry has begun to usher in
a new model for exchanging data over the Web. The Extensible
Markup Language, or XML, has emerged as a foundation for expressing
and exchanging data of almost any form over the native, HTTP-based
Web environment.
XML promises to supply the missing link in the Web platform's
future by providing a common glue for data exchange between
disparate clients and servers on the Internet. XML embraces
the uniqueness of the Web platform; it is based on ASCII,
can be easily transported over HTTP, is network, language
and platform independent, and is flexible and extensible
such that it can scale to solve nearly any data exchange
problem.
Two critical needs that XML potentially addresses are:
- Providing a framework to separate data from presentation
or layout, and
- Providing a grammar that will allow Web application
servers to share and exchange data with other servers
across the network.
More generally, XML promises to define common information
formats across nearly every major computing endeavor.
As a language environment, XML embraces the notion
that computer languages must evolve to be both human readable
and writable, as well as machine readable and writable.
XML enables declarative and human understandable data and
languages, opening doors to richer forms of abstraction
in computer programming and components.
XML Adoption: Two Views
With the surging interest in XML, there is abundant speculation
as to what its role will be in the emerging Web and computing
landscape. Over the past year, we've begun to see a number
of views emerge on the critical role that XML will play
in transforming the Internet and data exchange standards.
Purist XML Data
One major view is that XML and the varying forms of XML
data and vocabularies that emerge will act as the future
of how almost all data is represented and programmed. In
this view, XML data becomes a logical and physical storage
format for data, replacing even SQL servers as the primary
way to represent and access data. Developers define data
structures (relational, hierarchical, networked, object-based,
etc.) that are then represented in XML formats. Accessing
and manipulating that data then requires programming what
is referred to as a Document Object Model, or DOM. The DOM
is the generic API for accessing and manipulating XML data,
much like SQL and ODBC are the generic means for accessing
relational data today.
In this new XML world, developers no longer think in terms
of tables, joins and SQL; they think in terms of XML
vocabularies and DOM programming. We can see early examples
of this with Microsoft's IE4 XML-databinding features. With
this, a developer would use a server page, created with
ASP or ColdFusion, for example, to query a standard database
and dynamically build an XML format for that data, returning
it to the client where that XML structure would be 'bound'
to an HTML table or other page elements. In this world,
the developer must create an XML format for their existing
relational data. Instead of just passing the client a recordset,
the server page would create tags and tag attributes containing data
that maps to the relational structure.
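To make the purist model concrete, here is a minimal sketch in Python. The recordset, the <customers> vocabulary and the function names are all assumptions invented for illustration; none of them come from IE4, ASP or ColdFusion. The server side maps each row onto custom tags and attributes, and the consumer then has to reach the data by walking the DOM.

```python
from xml.dom import minidom

# Hypothetical recordset, as a server page might get it from a SQL query.
ROWS = [
    {"id": 1, "name": "Acme Corp", "city": "Boston"},
    {"id": 2, "name": "Widget Co", "city": "Cambridge"},
]

def rows_to_custom_xml(rows):
    """Map relational rows onto an invented, domain-specific vocabulary."""
    doc = minidom.Document()
    root = doc.createElement("customers")
    doc.appendChild(root)
    for row in rows:
        customer = doc.createElement("customer")
        customer.setAttribute("id", str(row["id"]))  # key column -> attribute
        for column in ("name", "city"):              # other columns -> child tags
            child = doc.createElement(column)
            child.appendChild(doc.createTextNode(row[column]))
            customer.appendChild(child)
        root.appendChild(customer)
    return doc.toxml()

def city_of(xml_text, customer_id):
    """Consumer side: find a value by walking the DOM, not by querying a table."""
    doc = minidom.parseString(xml_text)
    for customer in doc.getElementsByTagName("customer"):
        if customer.getAttribute("id") == str(customer_id):
            return customer.getElementsByTagName("city")[0].firstChild.data
    return None

xml_text = rows_to_custom_xml(ROWS)
```

Note that both sides must agree on the vocabulary in advance, and that the consumer's code is written against tag names rather than against a recordset.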
Corresponding to this view is the idea that vendor consortia
would work together to define common XML formats for cross-organizational
problems. For instance, health care and insurance companies
would work together to define a Patient Markup Language
that defines the data and structures associated with a patient,
allowing patient data to move easily between relevant business
systems on the network. It might even assume that two companies
working together to exchange invoice data would establish
an XML format to exchange structured invoice data, shielding
each other's systems from proprietary formats or platforms.
XML Data Middleware
Another major view -- and not necessarily a competing one
-- is that XML provides an underlying glue to application-to-application
data exchange, in essence becoming a Web-native middleware
environment for distributed applications. However, in this
view XML would be more of a transport protocol for object
data than an environment of specific formats for given problem
domains.
For example, using XML as middleware would allow a client
program -- say a JavaScript browser -- to invoke a server
application -- say a ColdFusion page -- and request a set
of data or recordsets. ColdFusion would generate relevant
recordsets, transform them into XML, and pass them back to the client,
where the client would transform that data back into a native
JavaScript format. This would allow cross-language, network-based
data exchange. The utility of XML as a data exchange protocol
extends to the server, where a ColdFusion server could invoke
a service on another remote server, say running Java servlets
or Perl, and request a set of data -- the other server environment
would generate data and transform it into this XML format
and pass it to ColdFusion, which would then translate it into
native ColdFusion data objects.
In the 'XML as middleware' model, the use and visibility
of XML becomes transparent to the developer. Instead of
defining custom XML formats and programming an XML DOM,
the developer thinks and programs in the same high-level
constructs of her native development language -- objects,
arrays, recordsets, string variables, etc.
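A rough sketch of what this transparency could look like, again in Python. The <packet>, <struct>, <var>, <array>, <number> and <string> tags and the serialize/deserialize names are assumptions made up for this example, not a published format. The point is only that one generic encoding can carry ordinary native structures, so the developer never designs a vocabulary or touches a DOM.

```python
import xml.etree.ElementTree as ET

def _encode(value):
    """Turn one native value into an element of the invented generic format."""
    if isinstance(value, dict):
        node = ET.Element("struct")
        for key, item in value.items():
            var = ET.SubElement(node, "var", name=str(key))
            var.append(_encode(item))
    elif isinstance(value, (list, tuple)):
        node = ET.Element("array")
        for item in value:
            node.append(_encode(item))
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        node = ET.Element("number")
        node.text = repr(value)
    else:
        # Anything else (including booleans) is sent as a plain string.
        node = ET.Element("string")
        node.text = str(value)
    return node

def _decode(node):
    """Rebuild the native value that an element stands for."""
    if node.tag == "struct":
        return {var.get("name"): _decode(var[0]) for var in node}
    if node.tag == "array":
        return [_decode(child) for child in node]
    if node.tag == "number":
        try:
            return int(node.text)
        except ValueError:
            return float(node.text)
    return node.text or ""

def serialize(value):
    """Native structure in, XML text out."""
    packet = ET.Element("packet")
    packet.append(_encode(value))
    return ET.tostring(packet, encoding="unicode")

def deserialize(xml_text):
    """XML text in, native structure out."""
    return _decode(ET.fromstring(xml_text)[0])
```

A sender would call serialize(order) on an ordinary dictionary and the receiver would call deserialize(text) to get an equivalent dictionary back; an implementation in another language would need only its own pair of functions for the same tags.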
Problems With Purist XML Data Approaches
As one might have noted, the differences between these two
models are potentially substantial. The crucial difference
is in what the model imposes on the developer and corporation
in terms of new data formats, architecture and programming
approaches. The purist approach to XML Data assumes that
corporations and developers will rapidly move to store and
expose all of their content and data in custom XML-based
formats. Secondarily, it assumes that the primary data access
and manipulation language will shift from object- and SQL-based
programming to DOM-based programming.
This shift is both major and potentially unnecessary for
the substantial majority of applications. SQL-based vocabularies
for data access and management have been adopted widely
because of SQL's simplicity and declarative nature. Likewise,
relational database design fundamentals are broadly understood
and easily created using products like Access and database
CASE tools such as Power Designer.
Corporations have made billions of dollars of investments
in the relational database model, and have built programming
arsenals to support handling these systems. It is extremely
unlikely that there will be a total shift away from them
overnight, given both the current skills and infrastructure
investment and the lack of replacement skills and infrastructure.
Furthermore, the purist approach assumes that developers will want to shift
from a data structure model based on simple constructs such
as variables, arrays and recordsets, to one based on the
DOM, and that they are willing and able to impose a custom
translation layer -- one that requires a new model of thinking
about and representing data -- between their existing native
database systems (most likely SQL-based) and other client
and server systems.
Finally, the Purist XML Data approach is overly optimistic
about the ability of vendor-neutral standards bodies
to define shared XML formats for data exchange. While such formats
are clearly a requirement in the long run, corporations wanting to do
business-to-business integration over the Internet will
need a more accessible and rapid approach to structured
data exchange. For example, an HMO seeking to share data
with regional health care providers over the Web can't wait
to define a shared XML vocabulary, let alone wait for health
care standards bodies to define XML formats for patient
and billing information.
Clearly, the XML Data Purist and XML Middleware approaches
will both play important roles, though it seems that existing
models and approaches to data storage, access and programming,
combined with the unique platform requirements of the Web,
will drive companies towards a more pragmatist approach
that shields developers from a large base of required knowledge
of underlying XML data structures.
A Pragmatist's Approach to XML
The real world of Web development and Web applications demands
a pragmatist's approach to XML: an approach that can open
up the range of opportunities that XML affords without imposing
the intellectual and technical shift that a purist approach
would require.
Returning to our earlier discussion of the evolving Web
platform landscape, it is clear that an XML-based architecture
is required to take browser-based applications to another
level, and more importantly to begin globally exposing Web
application servers as general distributed services and
interfaces to the rest of the Web.
XML carries the same syntax and language advantages that
HTML and CFML bring to the Web platform: high-level, declarative,
human readable and human writable languages. As a system
for encapsulating data and logic, and bringing abstraction
into human computer programming, XML promises to be a breakthrough.
With this in mind, it is clear that Web developers are seeking
the use of XML for client and server-side component encapsulation
and browser extensibility.
With this approach, XML-based tags and components encapsulate
ranges of client and server side scripts, layout code, and
data into intelligent Web components. This would allow a
development team to create XML components that combine CSS,
HTML and JavaScript, which are in turn used by page designers
who are thinking and working at a higher level of abstraction.
XML becomes the language and syntax glue that makes this
possible.
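As an illustration of the component idea, the Python sketch below expands a custom tag into ordinary HTML with a CSS hook. The <price-tag> element, its attributes and the render function are hypothetical names chosen for this example; they are not CFX, XML Behaviors or any shipping tag library. A real component could emit JavaScript in the same way.

```python
import xml.etree.ElementTree as ET
from html import escape

def price_tag(element):
    """Hypothetical component: expands <price-tag> into HTML with a CSS hook."""
    label = escape(element.get("label", ""))
    amount = escape(element.get("amount", ""))
    return f'<span class="price-tag" style="font-weight:bold">{label}: {amount}</span>'

# Registry of component tags; the page designer only needs to know the tag names.
COMPONENTS = {"price-tag": price_tag}

def render(element):
    """Expand registered component tags; pass ordinary tags through as HTML."""
    tail = escape(element.tail or "")
    if element.tag in COMPONENTS:
        return COMPONENTS[element.tag](element) + tail
    attrs = "".join(f' {name}="{escape(value)}"'
                    for name, value in element.attrib.items())
    inner = escape(element.text or "") + "".join(render(child) for child in element)
    return f"<{element.tag}{attrs}>{inner}</{element.tag}>{tail}"

# What the page designer writes: markup at a higher level of abstraction.
PAGE = '<p class="offer">Today: <price-tag label="Widget" amount="5.00"/> only.</p>'
html_out = render(ET.fromstring(PAGE))
```

The designer works with one tag, while the development team remains free to change the markup and styling that the tag expands into.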
Allaire has been pioneering this approach for the past
two years through the use of ColdFusion Extensions, or CFXs.
These XML-based components can encapsulate data, logic,
and layout, whether on the client or server-side of the
Web platform environment. Over 500 third-party tags are
available which embrace this architecture. More recently,
Microsoft has introduced XML Behaviors in IE5, and Netscape
plans to introduce XML-based ActionSheets. Both of these
provide a client-side mechanism for using XML-based components
to encapsulate browser behavior. In addition, simply binding
XML tags to CSS layout definitions will provide publishing-oriented
developers with a cleaner, more reusable architecture.
Clearly, XML as a Web-centric language and component architecture
will continue to evolve and grow, as pragmatist approaches
continue to win the Web platform war.
A Pragmatist Approach to XML Data Exchange
Even as XML blazes a path as a language and component architecture,
this still does not explicitly address the broader and more
important need of finding a model and use for XML as a distributed
data infrastructure. What would a pragmatist approach to
XML data exchange look like?
First, a pragmatist approach to XML data
exchange would need to acknowledge and embrace the real-world
environments driving the Web platform. In particular, the
approach would need to easily accommodate and interoperate
within key programming environments in use on Web client
and server platforms. These include ECMAScript (a.k.a. JavaScript),
ColdFusion, Perl, ASP, and Java, and at times even Windows
clients implemented using ActiveX.
Second, it would need to maintain the core
data storage and manipulation environments driving corporate
applications today. What this means in particular is that
it would need to operate in the world of relational databases
-- SQL, recordsets -- and the core data structures used
in the above languages -- associative arrays, lists, recordsets,
strings, etc.
Third, it would need to be transparent to
its users. Developers wanting to exchange structured data
across the Web, whether from servers to browsers and back,
or from servers to other servers, should not have to think
about XML parsing, data access and programming. Instead,
the developers should work with data in their language and
platform environments, assuming transparent exchange across
the network.
Finally, it would need to embrace a simple
design that worked across a lowest common denominator of
data structures used in the Web platform. Relating to the
first goal of interoperating between key Web languages,
a lowest common denominator approach would realize that
the majority of Web programming and applications takes place
in higher-level scripting environments such as Perl, ColdFusion
and JavaScript, not lower-level object-oriented languages
such as Java and C++. It is critical that data exchange
on the Web is modeled on the same principles that have made
these languages successful -- loose typing, interpreted,
relatively easy to learn, ASCII and HTTP friendly, etc.
Outstanding Issues and Scenarios
Clearly there are models and applications where a less pragmatist
approach is required. In particular, it will become important
to develop and support XML vocabularies for common data
formats and industries. These custom XML vocabularies should
be used on an as-needed basis, as opposed to being the core
of how people take advantage of HTTP-based structured data
exchange.
For instance, XML vocabularies to represent EDI-related
data could become important for certain classes of business-to-business
data exchange, and having common formats for these matters
would be beneficial. Or, for specific classes of information
storage and retrieval where marked-up data becomes important
for shared search capabilities, one can imagine successful
common XML formats.
However, for distributed application-to-application data
exchange and communication -- XML middleware -- it's clear
that the use and visibility of XML should be both transparent
to and integrated with basic programming data structures
used in key web application environments.
A second outstanding issue is the desire and need to use
XML middleware to supplement, if not supplant, traditional
distributed object protocols such as DCOM and CORBA. Clearly,
for binary-oriented object protocols and languages (such
as Java and C++) there will be benefits to using XML middleware
for distributed object programming. Efforts such as Microsoft's
SOAP, or DataChannel's WebBroker will be important in this
regard.
However, one should note that these solutions fail to
directly address the more pragmatic need facing the actual
environments driving the Web platform. Most distributed
Web applications:
- Are page-based and not binary or object based,
- Are built on non-object-oriented languages such as Perl,
JavaScript and ColdFusion, and
- Require a data exchange model that is more stateless
in nature -- e.g. passing data back and forth between
clients and servers, as opposed to using an RPC-style
distributed programming layer.
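The stateless, page-based style described in this list can be sketched as follows in Python. Everything named here is an assumption for illustration: the parts table, the min_qty parameter, the <recordset> tags and the function names. The "page" is modeled as a plain function so the example runs without a Web server: one request comes in, one XML document goes out, and nothing is remembered between calls.

```python
import sqlite3
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

def make_db():
    """Hypothetical in-memory table standing in for a corporate database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE parts (sku TEXT, qty INTEGER)")
    db.executemany("INSERT INTO parts VALUES (?, ?)",
                   [("A-1", 4), ("B-2", 0), ("C-3", 9)])
    return db

def parts_page(db, query_string):
    """A 'page': one request in, one XML document out, no session kept."""
    params = parse_qs(query_string)
    min_qty = int(params.get("min_qty", ["0"])[0])
    cursor = db.execute(
        "SELECT sku, qty FROM parts WHERE qty >= ? ORDER BY sku", (min_qty,))
    fields = [col[0] for col in cursor.description]
    recordset = ET.Element("recordset", fields=",".join(fields))
    for row in cursor:
        record = ET.SubElement(recordset, "row")
        for value in row:
            ET.SubElement(record, "field").text = str(value)
    return ET.tostring(recordset, encoding="unicode")

def read_recordset(xml_text):
    """Client side: rebuild a list of loosely typed (string-valued) records."""
    root = ET.fromstring(xml_text)
    fields = root.get("fields").split(",")
    return [dict(zip(fields, (f.text or "" for f in row))) for row in root]
```

The client holds no reference to a remote object; it simply asks for data, for example read_recordset(parts_page(db, "min_qty=1")), and works with the records it gets back.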
This pragmatist approach, which has been at the center of
the Web platform's success and evolution, and the specific
opportunity created by XML are at the center of Allaire's
XML strategy. Next week I'll lay out an open-standards
based architecture that Allaire has created to answer the
call for distributed Web applications.
Continue to Part 4 of 4: Web Distributed Data Exchange (WDDX)
-Jeremy
Jeremy Allaire is co-founder and Vice President of Technology
Strategy at Allaire Corp. Please direct comments on this
column to talkback@allaire.com.