XML - Markup Basics

Objective: Learn basic markup structure and syntax of an XML document.

Learn:

Notes

Structure

XML was designed to exchange data, not necessarily to create nice displays, and its structure reflects this data-centric focus. The basic markup concepts, along with some jargon, are:

The text of a basic XML document (note7b_example_Basics.xml) looks like:

<?xml version="1.0"?>

<!-- File Name: note3_basic_example1.xml -->


<BOOK>

<!-- My 1st xml -->
<TITLE>XML for Smarties</TITLE>
<AUTHOR>Ed Van</AUTHOR>
<PRICE unit="$">5.99</PRICE>

<TITLE>Advanced XML</TITLE>
<AUTHOR>Mark Twain</AUTHOR>
<PRICE unit="$">6.49</PRICE>

<!-- Note html tags don't display -->
<img src="image_music.gif"></img>
<h1>Is this a heading?</h1>
</BOOK>

There are 2 parts:

  1. prolog (the first three lines of the example), which goes at the top
  2. document (or root) element between the tags <BOOK> </BOOK> (where BOOK can be any valid name you want; names are case sensitive, so <book> would not match)

The prolog consists of 3 lines, all of which are optional:

  1. the XML declaration specifies the XML version (1.0 is the version in almost universal use; a 1.1 version exists but is rarely needed); although optional, it is best to always include it
  2. line 2 is blank
  3. the 3rd line is a comment (remember these are optional, so you don't need them)

Although the example above has none, a prolog can also contain other items (that we will cover later) like:

A root element is required. The root is akin to a database name. In general the root contains all the content, but note that there are two types of content:

  1. text or data
  2. element (or other markup) content

For example, in

<AUTHOR>Ed Van</AUTHOR>

AUTHOR is the element name and "Ed Van" is the text data.

The root usually does not contain text content (although it can) and instead has child elements that hold the data. The root may contain various optional objects such as:

The above example simply contains some child elements and text, but none of the other options listed (since we cover these later).
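
The split between the root, its child elements, and their text data can be seen by loading a document into a parser. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module and a shortened copy of the BOOK example above; it is only one of many ways to inspect a document.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the BOOK example from these notes.
BOOK_XML = """<?xml version="1.0"?>
<!-- File Name: note3_basic_example1.xml -->
<BOOK>
<TITLE>XML for Smarties</TITLE>
<AUTHOR>Ed Van</AUTHOR>
<PRICE unit="$">5.99</PRICE>
</BOOK>"""

root = ET.fromstring(BOOK_XML)

# The root element's name; every other element lives inside it.
root_name = root.tag

# Each child is element content of the root; its .text is the text data.
children = [(child.tag, child.text) for child in root]

print(root_name)
for name, text in children:
    print(f"  {name}: {text}")
```

Note that the comment in the prolog does not appear among the children: it is markup, not content of the root.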

Elements

Elements are the most important part of a document. Some concepts are:

Elements in the above example are: BOOK (the root), TITLE, AUTHOR, PRICE, img, and h1.

Syntax

Well-formed XML documents have correct syntax, which is different from a document that is valid (one that also conforms to a DTD or schema) or that makes sense. The syntax rules are:

The example below shows nesting:
<root>
  <child> can add content here
    <subchild>your content goes here</subchild>
  </child>
</root>
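
One way to check whether a piece of markup is well formed is to hand it to a parser and see whether it is accepted. The sketch below assumes Python's standard-library xml.etree.ElementTree; the helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested: subchild closes before child does.
nested = "<root><child>text<subchild>more text</subchild></child></root>"

# Overlapping tags: child closes while subchild is still open.
overlapping = "<root><child>text<subchild>more text</child></subchild></root>"

# Names are case sensitive, so <book> is not closed by </BOOK>.
wrong_case = "<book>text</BOOK>"

print(is_well_formed(nested))
print(is_well_formed(overlapping))
print(is_well_formed(wrong_case))
```

This only tests syntax. A document can pass this check and still be invalid or make no sense.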

Your data can actually be either:

  1. character data between element start & end tags
  2. attribute values

The choice is yours; however, later we will learn some technical details that suggest avoiding attributes and instead making all of your data element character data. Virtually every XML document has character data, but a document may or may not have attributes. Either is okay, but realize that attributes cause problems with some associated technologies; for example, writing a JavaScript program to process or display XML is more complicated if there are attributes.
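
To make the two options concrete, the sketch below reads the same price data stored both ways. It assumes Python's standard-library xml.etree.ElementTree; the element names UNIT and AMOUNT are invented for this illustration and do not appear in the example document.

```python
import xml.etree.ElementTree as ET

# Option 1: the unit is an attribute, the amount is character data.
price = ET.fromstring('<PRICE unit="$">5.99</PRICE>')
amount = price.text        # character data between the tags
unit = price.get("unit")   # attribute value, read with a separate call

# Option 2: all data is element character data (hypothetical names).
price_alt = ET.fromstring("<PRICE><UNIT>$</UNIT><AMOUNT>5.99</AMOUNT></PRICE>")
unit_alt = price_alt.findtext("UNIT")
amount_alt = price_alt.findtext("AMOUNT")

print(unit, amount)
print(unit_alt, amount_alt)
```

With the second layout every piece of data is reached the same way, which is one reason for preferring elements over attributes.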

Display

XML can be viewed in several ways using

Of course, XML is text, so any text-enabled software, including Notepad, can display the document in raw form. However, if the intent is to display XML, you usually use a

XML may appear 3 different ways in a browser as described in w3schools.com/xml/xml_view.asp

  1. plain xml code
  2. nicely formatted using some display technique or application program
  3. an error if not well formed

XML without a style sheet will display in newer versions of browsers like Internet Explorer (IE) because they have a default XML style sheet and the Microsoft parser (msxml) is built into IE. XML may not display in older browsers. For now we'll use IE, and later we will investigate other parsers and how to load parsers using JavaScript. Displaying XML in IE:

The display of above example in IE looks like (right side shows collapsed element)

So the default display is not pretty at all. To make XML pretty, you must supply your own style sheet or some other extra feature. Remember, you make up the elements, so the browser has no idea how to format the content. To format a display there are various techniques, which we cover later, like

For now you can see a crude display of XML just by opening the file in a browser (remember to use View | Source to see the actual XML). To test that your browser works okay with XML, click on
noteX_XmInBrowser.htm

Look at links below to see display of XML.

Creating XML Documents

You can create a document with a text editor or with special software for editing XML. It is easier to use software than to code manually, and, just as with web editors, there are various software packages for XML.

Typical steps for a new document are:

  1. Copy & paste the prolog from an existing document (like the top 3 lines in my example at top)
  2. Type the start and end tags for the root element
  3. Type the child elements and content inside the root. For any element it is usually best to type the start tag, then copy-paste it and add / to make the end tag; this avoids typing errors and forgotten end tags.
  4. Decide how you will nest elements and whether you will use attributes at all
  5. Save the text file (with the .XML extension), then open the file in a browser to see if there are any syntax errors. Also check that the nesting is what you intended (try collapsing and expanding the elements). If there are no errors you are done; otherwise correct the errors and repeat this step
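
As an alternative to typing tags by hand, a document can be generated by a program, which removes the risk of a forgotten or mistyped end tag. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree and the element names from the example at the top; it is not the method used in these notes.

```python
import xml.etree.ElementTree as ET

# Step 2: the root element.
book = ET.Element("BOOK")

# Step 3: child elements and their text content.
ET.SubElement(book, "TITLE").text = "XML for Smarties"
ET.SubElement(book, "AUTHOR").text = "Ed Van"

# Step 4: one attribute on PRICE, as in the example.
ET.SubElement(book, "PRICE", unit="$").text = "5.99"

# Step 1: the prolog (XML declaration) is added as plain text;
# the library writes every end tag itself.
document = '<?xml version="1.0"?>\n' + ET.tostring(book, encoding="unicode")

print(document)
```

The resulting text can be saved with an .xml extension and opened in a browser as described in step 5.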

Errors

There are many types of syntax errors you may encounter, since XML has many rigid rules. If a document has errors, the software (like IE) will indicate the error. Look closely at the line number that the browser displays for the error, but realize that some reported line numbers are actually due to errors in the line(s) before them. For example, if you miss a closing ">", IE may indicate that the error is on a line below the one missing the ">". Some common errors are:
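
The way a reported line number can trail the real mistake shows up in other parsers too, not only in IE. The sketch below assumes Python's standard-library xml.etree.ElementTree, whose ParseError carries a position attribute holding the line and column; the helper name first_error is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The end tag </AUTHOR> is missing on line 2, but the parser only
# notices a problem when it reaches </BOOK> on line 3.
broken = "<BOOK>\n<AUTHOR>Ed Van\n</BOOK>"

def first_error(xml_text):
    """Return (line, column, message) for the first syntax error, or None."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)

error = first_error(broken)
print(error)
```

Here the parser points at line 3 even though the fix belongs on line 2, so it pays to read upward from the reported line.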

CDATA Sections

CDATA is an alternative to using entities like &gt; to show special characters (< > & ' ") in content. A CDATA section is not parsed and so can contain special characters. CDATA is an easier way to show text that has many special characters. It's especially useful in teaching documents that contain examples of script code or xml code since these by nature have lots of < > & characters. CDATA can

An example is:
<![CDATA[

  function button_onclick() {
    if (myval == 0 && total < 0) alert("error");
  }
]]>
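
To a program reading the document, a CDATA section and the equivalent entity-escaped text deliver the same characters. The sketch below illustrates this, assuming Python's standard-library xml.etree.ElementTree; the element name CODE and the variable count are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Special characters written literally inside a CDATA section.
with_cdata = ET.fromstring(
    '<CODE><![CDATA[if (total < 0 && count > 5) alert("error");]]></CODE>'
)

# The same text written with entities instead.
with_entities = ET.fromstring(
    '<CODE>if (total &lt; 0 &amp;&amp; count &gt; 5) alert("error");</CODE>'
)

print(with_cdata.text)
print(with_cdata.text == with_entities.text)
```

The CDATA version is simply easier to write and read when the text has many special characters.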

Processing Instructions

Processing instructions carry information in an XML document that applications (i.e., software separate from the XML) use to process it. Processing itself is an advanced topic that we cover later; however, you should have some notion of what instructions are, since you may see them in examples. The general form is below, where the target identifies what the instruction is for and the value is a file or instruction that is up to the application to process; otherwise it is ignored.

<?target value?>

Examples you may see with IE are to include a style sheet or to provide information to your own script, like:
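
As an illustration, the widely used xml-stylesheet instruction attaches a style sheet to a document. The sketch below shows how an application can read the target and value of such an instruction. It assumes Python's standard-library xml.dom.minidom, and the file name book.css is hypothetical.

```python
from xml.dom import minidom

# A document whose prolog holds one processing instruction.
# The style sheet file name book.css is made up for this example.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="book.css"?>
<BOOK><TITLE>XML for Smarties</TITLE></BOOK>"""

dom = minidom.parseString(DOC)

# Collect (target, value) for every processing instruction that
# sits at the top level of the document, next to the root element.
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
```

The XML declaration on the first line looks similar but is not reported as a processing instruction; only the xml-stylesheet line is.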

Summary

XML document:

First impressions: