You, COBOL, PLI And XML

By

Don Fowler, MCE, Inc., July 2004

 

What’s XML All About And Why Do I Care?

In an e-mail we recently received, one of our clients made the following observations about their total Enterprise Computing environment:

"In topologies that we are considering, XML plays a big role.  It wraps the data structure in its DDL and lets it be seamlessly referenced across the peers. Kinda like SQL for databases! So it is the data's common structure from the web front end to client/server applications, to mainframe applications and other web housed applications structured data language (DSL). Wow!"

The idea of universal data formats is not new. Programmers have been trying to find ways to exchange information between different computer programs since before the earth cooled.

As Don Estes, a Senior Consultant with Cutter Consortium, stated in his XML As Glue paper:

"What is needed is a universal integration solution that will equally satisfy requirements for web/client-server integration and web/mainframe integration, as well as satisfying any other requirement including mainframe/mainframe integration of 30-year-old COBOL applications.

When the problem is viewed as one of data exchange, there is no significant difference in the requirements of one platform from another. XML comes into the equation because it was designed from the ground up to satisfy all data exchange requirements.

XML can and should be the glue to use to bind multiple applications together. In this regard, we can confidently predict that XML will be as revolutionary as the introduction of SQL databases 20 years ago, and will have as profound an impact as the databases have had, as the technology is absorbed over time."

Standard Generalized Markup Language (SGML) was developed to achieve part of this for electronic documents, and the web world later built on it.

SGML, a meta-language, can be used to mark up data, that is, to add metadata in a way that allows data to be self-describing. The markup process involves using tags to identify pieces of information in a document. Tags are names (strings of characters) surrounded by angle brackets (< and >). Every piece of data that is encoded will have a start tag and an end tag, for example, <town> Seminole </town>. The start and end tags make it easy for software to process (parse) the encoded information, as they clearly delineate where each piece of information starts and where it ends.
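To make the start-tag and end-tag idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The <record> wrapper and the second town value are made up for illustration; only <town> Seminole </town> comes from the text above.

```python
import xml.etree.ElementTree as ET

# A small marked-up document. The <record> wrapper and "Largo" are hypothetical.
document = "<record><town> Seminole </town><town> Largo </town></record>"

# Parse the text into a tree of elements.
root = ET.fromstring(document)

# The tags tell the parser exactly where each piece of data starts and ends,
# so the program can pull the values out by name.
towns = [town.text.strip() for town in root.findall("town")]
print(towns)
```

The program never counts character positions; it asks for data by tag name, which is what "self-describing" means in practice.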

SGML does not prescribe any particular markup; instead, it defines how any markup language can be formally specified.

The most popular SGML application is HTML (Hypertext Markup Language), the markup language that dominates Web user presentation management. However, different browser vendors introduced a number of incompatible tags to HTML, which are outside the scope of the original HTML specifications. These tags create problems for developers when they author Web pages because they must consider which browser will display the pages. And, although HTML has been very successful for displaying information on browsers, it has not proved useful for describing the data that it presents, meaning it does not have the metadata capability that is essential for a self-describing data document.

But SGML is quite inefficient and cumbersome when it is used to encode complex data structures. Hence, there arose a need to develop a more lightweight markup language, so the W3C (World Wide Web Consortium) developed the specification for XML (eXtensible Markup Language). XML is similar to SGML in that it preserves the concept of generalized markup, but it has very few optional features.

 

 

XML Families

There are two general families of application of XML technology. The first family relates to document-centric applications, and the second family to data-centric applications.

The document-centric application family outputs are primarily meant for human consumption. Some examples of such documents are legal briefs, manuals, product catalogs, and the like. These documents are always considered semi-structured marked-up text.

In the data-centric application family, XML is used to mark up highly structured information such as data structures in a programming language, relational data from databases, financial transactions and the like.

Data-centric XML is typically generated for internal use (by machines and for machines). The ability of XML to nest and repeat markup makes it an excellent and viable solution for representing these types of data. With the introduction of XML Schema, a user can associate data types with the tags, which makes data-centric XML a very powerful mechanism to represent enterprise data, especially for data exchange and e-business.
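The nesting and repetition mentioned above can be sketched in a few lines of Python. The order record, its field names and its values are all hypothetical; the shape is meant to resemble a group item that contains a repeating table.

```python
import xml.etree.ElementTree as ET

# Hypothetical order record: a nested group plus a repeating group.
order = {
    "order-id": "A1001",
    "customer": {"name": "J. Smith", "town": "Seminole"},
    "items": [
        {"sku": "W-10", "quantity": "2"},
        {"sku": "W-22", "quantity": "1"},
    ],
}

def build(tag, value):
    """Turn a nested dict, list or string into an XML element."""
    element = ET.Element(tag)
    if isinstance(value, dict):        # nested structure -> child elements
        for key, child in value.items():
            element.append(build(key, child))
    elif isinstance(value, list):      # repeating group -> repeated <item> tags
        for entry in value:
            element.append(build("item", entry))
    else:                              # elementary field -> text content
        element.text = str(value)
    return element

root = build("order", order)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Nested groups become nested elements and each table entry becomes one more <item> element, so the XML can grow with the data without any change to the layout rules.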

XML is a system-independent standard for the representation of data. XML is not just some new version of HTML; it is different from HTML. Like HTML, XML has tags, and in these tags it encloses data. The difference is that HTML uses its tags to display the enclosed text, and these tags are standard and fixed.

In document-centric XML the user can create whatever tags are needed, with only a small number of restrictions. These tags will be identified by a program (parser), which will process the data enclosed between them.

Text is system-independent, and since XML is very flexible and is based only on text, it is used as the main way to transport data between different environments.

Often, XML documents are automatically generated by tools, and in many situations we need these XML documents to follow rules we create. We use other documents, containing XML data definitions in which we specify our restrictions, to accomplish this.

The most widely used rules language is the Document Type Definition (DTD). A DTD specifies the kinds of tags that can be included in your XML document, the valid arrangements of those tags, and the structure of the XML document. The DTD defines the type of elements, attributes, and entities allowed in the documents, and may also specify some limitations to their arrangement. You can use the DTD to make sure you don't create an invalid XML structure since the DTD defines how elements relate to one another within the document’s tree structure. You can also use it to define which attributes can be used to describe an element and which ones are not allowed.

In other words, a DTD defines our own language for a specific application. The DTD can be either stored in a separate file or embedded within the same XML file. If it is stored in a separate file it may be shared with other documents. XML documents referencing a DTD will contain a <!DOCTYPE> declaration, which either contains the entire DTD in the case of an internal DTD, or specifies the location of an external DTD.
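As an illustration of an internal DTD, the following Python sketch feeds a document to the standard library's expat parser and records the declarations it finds inside the <!DOCTYPE> section. The order and item names are hypothetical. Note that expat is a non-validating parser: it reads the declarations but does not enforce them, so this shows what a DTD contains rather than how validation is performed.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD inside its <!DOCTYPE> declaration.
document = """<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item sku CDATA #REQUIRED>
]>
<order>
  <item sku="W-10">Widget</item>
</order>"""

doctype_root = []          # root element name given in <!DOCTYPE ...>
declared_elements = []     # names from <!ELEMENT ...> declarations
declared_attributes = []   # (element, attribute) pairs from <!ATTLIST ...>

parser = xml.parsers.expat.ParserCreate()

# Called once when the <!DOCTYPE ...> declaration starts.
parser.StartDoctypeDeclHandler = (
    lambda name, system_id, public_id, has_internal_subset:
        doctype_root.append(name)
)
# Called once for each <!ELEMENT ...> declaration.
parser.ElementDeclHandler = (
    lambda name, model: declared_elements.append(name)
)
# Called once for each attribute declared in an <!ATTLIST ...>.
parser.AttlistDeclHandler = (
    lambda elname, attname, type_, default, required:
        declared_attributes.append((elname, attname))
)

parser.Parse(document, True)
print(doctype_root, declared_elements, declared_attributes)
```

For an external DTD the <!DOCTYPE> declaration would name a file or URL instead of carrying the declarations in square brackets.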

For the data-centric XML family, XML Schema is another rules language that aims to provide more complex semantic rules. It also introduces new semantic capabilities, such as support for namespaces and type-checking within an XML document.

The W3C XML Schema Definition Language is an XML language for describing and constraining the content of XML documents. A Schema is similar to a DTD in that it defines which elements an XML document can contain, how they are organized, and which attributes and attribute types elements can be assigned.

Therefore it is a method to check the validity of well-formed XML documents. The main advantages of Schemas over DTDs are:

_ Schemas use XML syntax.

_ It is possible to specify data types.

_ Schemas are extensible.

XML Schema became a W3C Recommendation in May 2001.
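Because a Schema is itself written in XML syntax, any XML parser can read it. The sketch below, in Python, parses a small hypothetical schema fragment and lists the data type declared for each element. The element names are made up; the xs: type names are standard XML Schema built-in types. It only reads the schema and does not validate a document against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"   # the XML Schema namespace URI

# Hypothetical schema fragment describing one <item> element.
schema_text = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="sku" type="xs:string"/>
        <xs:element name="quantity" type="xs:positiveInteger"/>
        <xs:element name="shipDate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)

# Collect every element declaration that carries a type attribute.
declared_types = {
    el.get("name"): el.get("type")
    for el in schema.iter(f"{{{XS}}}element")
    if el.get("type") is not None
}
print(declared_types)
```

A DTD could say only that <quantity> holds character data; the schema can say it must be a positive integer, which is the data type advantage listed above.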

 

 

Namespaces & XML Schema

Namespaces are used when there is a need to have different elements with different attributes but with the same name. Depending on the context, a tag refers to one element or to another.
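A short Python sketch shows two elements that share the name title but belong to different namespaces. The namespace URIs, prefixes and values are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Two different <title> elements; the namespace URIs are made up.
document = """<employee xmlns:hr="http://example.com/hr"
                        xmlns:lib="http://example.com/library">
  <hr:title>Senior Programmer</hr:title>
  <lib:title>XML Primer</lib:title>
</employee>"""

root = ET.fromstring(document)

# Map the prefixes used in our queries to the namespace URIs.
ns = {"hr": "http://example.com/hr", "lib": "http://example.com/library"}

job_title = root.findtext("hr:title", namespaces=ns)
book_title = root.findtext("lib:title", namespaces=ns)
print(job_title, "/", book_title)
```

The parser keeps the two elements apart by their namespace URI, not by the prefix, so the same local name can safely mean a job title in one context and a book title in another.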

W3C XML Schema allows us to define data types and use these types to define our attributes and elements. It also allows the definition of groups of elements and attributes. In addition, there are several ways to arrange relationships between elements.

 

Document validity and well-formedness

XML is reminiscent of HTML since they are both derived from SGML, which was defined in 1986. But unlike HTML, XML tags identify the data, rather than specifying how to display it. Where an HTML tag says something like "display this data in bold font" (<b>...</b>), an XML tag acts like a field name in your program. It puts a label on a piece of data that identifies it (for example: <message>...</message>). This is the first of a number of differences between the languages.

XML documents can be well-formed, or they can be well-formed and valid.

These are two very important rules that do not exist for HTML documents. These iron-clad rules contrast with the more free-style nature of a lot of the concepts in XML. The rules can be defined briefly as follows:

_ A well-formed document follows the basic syntax rules for XML documents.

_ A valid document respects the rules written in its DTD.

_ A document might be well-formed but still not be valid.

These examples illustrate the difference between well-formedness and validity:

_ Documents that adhere to rules described in the associated DTD or XML Schema are valid.

_ Documents that follow the syntactical rules for XML documents are well-formed. These rules have to do with attribute names, which must be unique within an element, and attribute values, which must not contain the character <, and so on.

All of the constraints are defined in the XML 1.0 recommendation. For more information refer to the following Web site:

http://www.w3.org/XML

Also see Professor Airi Salminen’s paper on the XML Family of Languages at URL

http://www.cs.jyu.fi/~airi/xmlfamily.html for additional information on XML families and standards.

Determining whether a particular document is in compliance with these rules is a two-step process. Well-formedness ensures that XML parsers will be able to read the document; validity determines whether an XML document adheres to a DTD or schema. An XML application will check for and reject documents that are not well-formed before checking whether they comply with validity constraints.
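The two-step process can be sketched in Python as follows. The parser in the standard library checks well-formedness only, so the validity step here is a hand-coded stand-in for one hypothetical rule ("an <order> must contain one or more <item> elements and nothing else") rather than real DTD or Schema validation.

```python
import xml.etree.ElementTree as ET

def check(document_text):
    """Return 'not well-formed', 'well-formed but not valid', or 'valid'."""
    # Step 1: well-formedness. The parser rejects broken syntax outright.
    try:
        root = ET.fromstring(document_text)
    except ET.ParseError:
        return "not well-formed"

    # Step 2: validity. Hand-coded stand-in for the hypothetical rule
    # "<order> must contain one or more <item> children and nothing else".
    children = list(root)
    if root.tag != "order" or not children:
        return "well-formed but not valid"
    if any(child.tag != "item" for child in children):
        return "well-formed but not valid"
    return "valid"

print(check("<order><item>Widget</item>"))            # end tag missing
print(check("<order><note>rush</note></order>"))      # wrong child element
print(check("<order><item>Widget</item></order>"))    # passes both steps
```

The second document parses cleanly yet breaks the rule, which is exactly the "well-formed but still not valid" case described above.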

 

The Toolkit You Need

Begin with your high-level languages.

IBM Enterprise COBOL for z/OS V3 and Enterprise PLI for z/OS V3 let you integrate your traditional programs into the e-business world by enabling you to invoke an XML parser from the program and to operate with Java components across distributed applications.

The ability to have an XML parser that is accessible to your programs, and that can be invoked with a single source statement, is the key to bringing existing business processes into this world. This XML parser lets you promote the exchange and use of data in standardized formats, including XML and Unicode, and enables you to reuse existing applications in WebSphere and traditional z/OS environments.

The IBM 'Enterprise' level compilers and run-times provide object-oriented syntax to facilitate the interoperation of these languages with other languages, such as Java and C++. For example, you can instantiate Java classes from COBOL or PLI programs, invoke methods on Java objects, and define Java classes that can be instantiated in Java or COBOL or PLI.

Summary

Embracing XML as part of the organization’s total computing plan is wise. Read the article XML: The Future of Data by Forecross Corporation’s Kim Jones for an excellent management-level discussion of the topic, issues and solutions. This article may be found at URL:

http://www.forecross.com/white-papers/xml_future_of_data.htm
