XML Overview

Objective: Learn what eXtensible Markup Language (XML) is in general, how it is used, and why it is important.

Learn:

Web Readings: the links come in various categories (the XML References will be useful as we move along).

Notes

You will not learn any coding or build a real application yet. These notes are meant to help you grasp some basic concepts of XML, such as why it is important, how it fits into technology, and how to display it.

The links at the top just give an idea of the many resources available for XML. You do not need to go through each in detail but should peruse and remember they are here because many will be useful as we move along.

Definition

XML officially is eXtensible Markup Language,
but perhaps a better term is "information exchange language," since it is the de facto standard for exchanging information and files.

XML is

XML and HTML both derive from SGML (Standard Generalized Markup Language), an open system for defining document formats with self-describing tags. HTML is a small, hardwired set of tags, which makes it easy to build documents. SGML makes it possible to define your own formats and build large information documents, but full SGML contains many optional features that Web applications do not need. XML is a subset of SGML specially designed for Web applications. Like HTML, it is a text file format that follows special markup rules. In contrast to HTML, it allows:

Goals: the 10 design goals for XML are listed at w3.org/TR/REC-xml. In summary:

  1. XML shall be usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs that process XML documents.
  5. Optional features shall be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

I would say the main goal is a common, text-based way to exchange data, especially across the web.

We will cover XML details later. For now, realize that XML files:

XML vs HTML vs DBMS

You do not need to be an expert in HTML or database management systems (DBMS) to understand XML or this course. However, our premise is that the concepts of XML, HTML, and DBMS are related and overlap, so one aspect of this course is understanding where each one fits in. They are all about information and delivering it to a user. More specifically, we are interested in "data exchange" across the web, which has key points of

Various means have evolved over the years including

The standard way to exchange data across the web today is called a "web service," and the backbone of web services is XML. Everyone knows what a web site is, but many people do not know what a web service is. A web service is like a web site, but instead of sending out a web page as HTML for display, it sends out a data file in XML that any other web site can pick up and reprocess for display. Stock quotes are an example: many web sites show them. Does every site gather its own stock quotes? Possibly, but more likely a central Wall Street database sends out XML that any site can pick up, select the quotes it wants, re-format them as it sees fit, and display them as HTML alongside the other HTML on its web page.
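To make the stock-quote idea concrete, here is a minimal Python sketch of the receiving site, using the standard library's xml.etree.ElementTree. The feed layout (a quotes root holding quote elements with a symbol attribute and a price child) is a hypothetical format invented for illustration, not any real quote service, and a real site would fetch the XML over HTTP rather than use a string constant.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical feed; the element and attribute names are made up for illustration.
QUOTES_XML = """<?xml version="1.0"?>
<quotes>
  <quote symbol="ABC"><price>12.50</price></quote>
  <quote symbol="XYZ"><price>101.25</price></quote>
  <quote symbol="QRS"><price>7.00</price></quote>
</quotes>"""

def quotes_to_html(xml_text, wanted):
    """Keep only the symbols this site wants and render them as an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for quote in root.findall("quote"):      # each <quote> in document order
        symbol = quote.get("symbol")         # attribute value
        if symbol in wanted:
            price = quote.findtext("price")  # text of the <price> child
            rows.append(f"<tr><td>{escape(symbol)}</td><td>{escape(price)}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(quotes_to_html(QUOTES_XML, {"ABC", "QRS"}))
```

Each site can pass its own set of symbols, so one XML feed serves many differently formatted pages.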

Another example of web data exchange: a book publisher has a Microsoft Access database of books with fields like author, book title, ISBN, etc. The publisher wants to exchange that data with users across the web. The target users are not necessarily the entire web community but may be a subset like libraries, sales reps, colleges, etc. Further, each user may be interested in certain data (some may want the ISBN, some may not) and in a different display (some may want to sort alphabetically, some may only want authors from Antarctica). In the past, the publisher probably would have written a custom web application (maybe in Java or ASP) that gave users options to retrieve the data in different ways. The end result was a web page in HTML with a nice display, but there was no true data exchange, and users were limited to whatever options the application offered. The other option, and the one we want, is for the publisher to send out a standard XML file (generated from its database) that any user can process with their own application. The users now have an actual data exchange (not just a web display), and it differs from a custom application because users can process the data generically with any XML-enabled software, or customize the processing further if need be. In many cases the user may be oblivious to the fact that XML was used, since at some point software converts it for display.
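A minimal sketch of the publisher's side, in Python: an in-memory SQLite table stands in for the Access database (an assumption for illustration, since the notes do not say how the export is done), and each row becomes a book element. The table layout, the tag names (catalog, book, author, title), and the rows are all hypothetical sample data.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_catalog_xml(conn):
    """Turn every row of the books table into a <book> element."""
    root = ET.Element("catalog")
    for author, title, isbn, country in conn.execute(
        "SELECT author, title, isbn, country FROM books ORDER BY isbn"
    ):
        book = ET.SubElement(root, "book", isbn=isbn)        # field as attribute
        ET.SubElement(book, "author", country=country).text = author
        ET.SubElement(book, "title").text = title            # field as child element
    return ET.tostring(root, encoding="unicode")             # serialize to a str

# In-memory SQLite stands in for the publisher's Access database;
# the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (author TEXT, title TEXT, isbn TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        ("Zed Penguin", "Life on the Ice", "0000000001", "Antarctica"),
        ("Ann Author", "Markup Basics", "0000000002", "Canada"),
        ("Bob Writer", "Data on the Web", "0000000003", "Antarctica"),
    ],
)
catalog_xml = build_catalog_xml(conn)
print(catalog_xml)
```

The resulting text file carries both the data and its field names, so a receiver does not need Access (or SQLite) to read it.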

In truth, everyone has probably used web sites or applications based on XML but often it is transparent and the end result may be a standard display in some other technology. For example, the book publisher may send xml through a web service but some users have applications that display it as HTML in a browser while other users may import the xml into Excel and view as a spreadsheet.
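The following sketch shows two hypothetical users processing the same XML catalog differently: one turns the titles, sorted alphabetically, into an HTML list for a browser; the other keeps only authors from Antarctica and produces CSV text, a plain format Excel can open. The catalog layout and function names are made up for illustration, and importing the XML directly into Excel is another route not shown here.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical catalog in the shape a publisher might send out.
CATALOG_XML = """<catalog>
  <book isbn="0000000001"><author country="Antarctica">Zed Penguin</author><title>Life on the Ice</title></book>
  <book isbn="0000000002"><author country="Canada">Ann Author</author><title>Markup Basics</title></book>
  <book isbn="0000000003"><author country="Antarctica">Bob Writer</author><title>Data on the Web</title></book>
</catalog>"""

def titles_as_html(xml_text):
    """User A: every title, sorted alphabetically, as an HTML list."""
    root = ET.fromstring(xml_text)
    titles = sorted(book.findtext("title") for book in root.iter("book"))
    items = "".join(f"<li>{escape(t)}</li>" for t in titles)
    return f"<ul>{items}</ul>"

def antarctic_authors_as_csv(xml_text):
    """User B: only authors from Antarctica, as CSV text a spreadsheet can open."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["author", "title"])
    for book in root.iter("book"):
        author = book.find("author")
        if author.get("country") == "Antarctica":
            writer.writerow([author.text, book.findtext("title")])
    return out.getvalue()

print(titles_as_html(CATALOG_XML))
print(antarctic_authors_as_csv(CATALOG_XML))
```

Neither user needed the publisher to anticipate these views; each one wrote its own processing against the shared file.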

Another point is that most technologies are confined to specific software for the end user. For example, HTML really is for browsers, XLS is usually only for Excel, DOC is only for Word, database files are only for that database, etc. XML is the one format that almost all software now supports and can use. Another way to look at it is to go into Microsoft Office and see which file types are supported (using the menu File | Save As). Each Office 2003 app (Word, Excel, Access, etc.) lists its own native format first, but XML is listed second, which indicates it is a high-priority format for every app.

It is helpful to know the history and concepts. HTML and DBMS evolved long before XML. At first glance HTML and DBMS do not seem related. However, both are about getting information to the user, but they do so in very different ways, such that each has serious flaws in today's world. HTML excels at display and portability but has poor capability for data exchange or management. DBMS excels at data management but is poor for portability and for web use, where data is exchanged (transferred) across the web among many users who may be on different platforms. XML bridges the gap: while HTML and DBMS are specialized for either display or data structure, XML can do both.

In more detail, the concepts of each are:

Although XML evolved for web data exchange, it is now used as the main file format for many technologies, not just the web. So you will see XML files used in operating systems and in custom apps in place of things like text files, INI files, or database files.


More about HTML and XML...

If HTML is like using Word, then XML is like using Access.

Does XML replace HTML? No. HTML is still the main way to format documents for display; it is easy and well suited for display instructions. In fact, many apps take raw XML files and convert them into HTML for actual display. XML extends the capability of web pages beyond just display to:

What does XML replace? Nothing really. Basically it fills a gap and overlaps the function of various technologies. Originally it was designed for use on the Internet, but it has expanded to much more, including serving as a data format for operating systems and databases. HTML is an easy way to format displays but is terrible for representing and exchanging data. Databases and spreadsheets are designed to represent and process data but use proprietary formats that are not conducive to exchanging data over the web. An Excel file is fine if you and all your users have Excel, but put that file on the web and there will certainly be users who cannot read it. So XML is a jack of all trades, in that it can:

Why Use XML

3 top reasons

  1. XML is a hot topic
  2. XML can get you a job
  3. XML is very useful in computers and web because of its open standard data capability

XML is hot: not just for the web but for computer software in general, including databases. To understand why, realize that traditional web pages were based on HTML, which works well to format a simple document and to integrate with scripts that make a page dynamic and animated. So HTML web pages today are quite dazzling on PCs. However, for many business purposes and for certain future technologies, HTML is not well designed for:

  1. Web-based devices beyond PCs and browsers
  2. Sharing data not just documents

Jobs: XML knowledge by itself may not get you a job, but it is quickly becoming one of those essential, must-have skills.

It's all about data... One of the major business needs is to pass data (basically database-type information) from one site to another. XML is hot because it can package data into web pages and can make the web work better on other devices, such as wireless ones. Sometimes the usefulness of XML is not readily apparent, partly because the web on wireless devices has not taken off yet and many people do not find data sharing interesting.

In some ways XML is similar to databases: it is not colorful and dazzling but is extremely important because it drives business applications. So it promises to be a very useful and well-paying skill. Technologies like HTML and Flash offer more instant gratification because you can quickly create something animated, colorful, and eye-catching. XML, on the other hand, is more tedious, more structured, and more geared toward sharing data than toward attracting someone to your web page.

XML is text designed to be understandable by humans or machines. XML may be harder for your eye to read (and edit) than HTML, but it is still text based. In many ways it is like a text version of a database. A main purpose is describing hierarchical data for things like databases, e-commerce, web development, searching, etc. Custom tags enable the definition, validation, and interpretation of data between applications and between organizations. HTML is not suitable for deploying commercial web-based data transactions, and the prevalence of sloppy markup makes it hard for user agents to make sense of the web. XML is a structured set of rules for defining data to be shared, rather than just a way of showing a document visually in a browser. XML is not a single, predefined markup language; instead you make customized tags for different classes of documents.
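To illustrate the "strict rules" point, this Python sketch parses a document built from hypothetical custom tags (order, customer, item), then shows that the same parser rejects sloppy markup in which a tag is never closed, the kind of error browsers often tolerate in HTML. It uses only xml.etree.ElementTree from the standard library; is_well_formed is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical custom tags describing hierarchical data.
good = "<order id='17'><customer>Ann</customer><item qty='2'>Widget</item></order>"
# Sloppy markup of the kind browsers tolerate in HTML: <item> is never closed.
sloppy = "<order id='17'><customer>Ann</customer><item qty='2'>Widget</order>"

print(is_well_formed(good))    # True
print(is_well_formed(sloppy))  # False

# Because the tags are named, a program can pull out exactly the data it needs.
order = ET.fromstring(good)
print(order.find("customer").text, order.find("item").get("qty"))  # Ann 2
```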

Why not just use a database? XML is better for sharing data in many cases, especially across the web since it

The use of XML has skyrocketed since it was introduced. It was originally designed for exchanging data, but many other uses have been identified. Microsoft made its Office Suite applications fully XML enabled. MS SQL Server (a database management system) is XML enabled. Future Windows versions will be more and more XML based.

There is no question XML is becoming the backbone of many web applications. What is not clear is how much detail one needs to know. XML is evolving the way HTML did: in the beginning everyone wanted to know HTML inside and out, but now most HTML is generated by editors like Dreamweaver rather than typed directly. Similarly, much of the XML in the future may be handled behind the scenes, generated by programs and editors rather than by someone who knows the minute XML details. For example, the Microsoft Visual Studio .NET programming package uses XML as its underlying technology, but the user mainly knows .NET programming and not XML. Regardless, it will always be useful to know the concepts, terms, and some detail, even if you edit and manipulate XML via higher-level software.

XML has many uses; some general ones are:

Summary of XML purpose:

Applications

Broad applications of XML are applications that:

  1. require the Web client to interface between 2 different databases.
  2. distribute significant processing load from the Web server to the Web client.
  3. require the Web client to present different views of the same data to different users.
  4. require portability to non-browser clients like handhelds, TVs, telephones, etc., which cannot support large software like browsers.
  5. Web Services: another hot new technology; web services are XML based, but again the user does not need to know much about XML.  There are many web services articles like TechEd 2003

XML is also extended to make custom languages, for example:

XML is a text document with user-defined tags that follows strict rules. Unlike HTML, an XML document by itself is not that useful. Instead, an XML application combines a document structure with various elements. Using an XML application to develop a document enables the document to be shared by users who display and process it with software developed to recognize the application. The various parts of an application are (all but the document are optional):

So XML is more than a markup language; it's a framework for building applications. It is much more technical than HTML and is rapidly evolving, while HTML changes comparatively slowly.

Real World Applications: Many XML applications are industry specific. You can see a list of applications at oasis-open.org. Some general and specific applications are:

  1. Storing databases with field and record type identifiers and ways to query, sort, and display
  2. Web Services: communicating among applications over the web, independent of operating systems or languages, using SOAP (Simple Object Access Protocol)
  3. Integrate the Web across portable, wireless, and non-PC devices: Wireless Markup Language
  4. Replace configuration files (like .ini files)
  5. Structure documents in treelike fashion, for example marking an online book with chapters, headings, pages, etc.
  6. Vector Graphics: SVG (Scalable Vector Graphics) is a web standard whose graphics are described in XML.
  7. Multimedia presentations: which are quite limited in html
  8. Create custom languages like
  9. The list goes on and on
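As a sketch of item 4, here is a hypothetical XML settings file that could stand in for an .ini file, read with Python's xml.etree.ElementTree. The tag and attribute names, the load_settings helper, and the default values are assumptions for illustration; the point is that XML can nest and repeat elements (the list of recent files), which a flat INI section does not express directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings file that might replace an app.ini; the tag
# names are made up. Unlike flat INI sections, XML can nest and repeat elements.
CONFIG_XML = """<settings>
  <window width="800" height="600"/>
  <recentFiles>
    <file>report.xml</file>
    <file>catalog.xml</file>
  </recentFiles>
</settings>"""

def load_settings(xml_text):
    root = ET.fromstring(xml_text)
    window = root.find("window")
    return {
        # Attribute values are strings, so convert numbers explicitly;
        # the second argument to get() is a fallback default.
        "width": int(window.get("width", "640")),
        "height": int(window.get("height", "480")),
        # findall returns the repeated <file> elements in document order.
        "recent": [f.text for f in root.findall("recentFiles/file")],
    }

print(load_settings(CONFIG_XML))
# {'width': 800, 'height': 600, 'recent': ['report.xml', 'catalog.xml']}
```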

Basically, an industry decides on standards for an XML application so users in the industry can exchange documents, share data, and display data. Of course, the application is given a nice acronym, usually with ML in it. So maybe someday there will be a HaccML to allow online students to exchange data with something other than WebCT.

For many applications, XML is not used to create a "static" markup page; instead it is used to dynamically generate a message to communicate data between a server and client. So the information is conveyed across the web via XML but is not a regular page that anybody can link to and display.

Future: XML is already used for custom markup languages and exchanging data via web services, and these will only increase. Other uses already taking place include (with mainly Microsoft examples):

Security: Although XML fills the gap in data exchange over the Internet, it also opens potential new security holes. High-end database applications have plenty of encryption and authorization tools, but because XML packages the data and data definitions in a text file, anybody who gets the file can read your data. XML in transit across the web can be secured with SSL via HTTPS, so the more likely potential hack is against data residing as XML on servers. Fortunately, that potential has not been widely exploited yet.