XHTML and validation

Objective: HTML versus XHTML, and validation

Learn:

  1. XHTML
  2. DTD and DOCTYPE
  3. Validation and display

Book: not covered in the book in one place

Web Readings:

Movies

To simulate in-class lectures

Name MB Description
Validate.wmv 6 Validation

Notes

You code to the rules of a particular standard, such as HTML 4.01. There are several standards because HTML has evolved and its successor, XHTML, continues to evolve. As a result, some rules have changed and some elements have been deprecated, meaning that at some future point they may no longer be supported. In a sense there are two standards:

  1. official markup standards (defined by W3C)
  2. unofficially what browsers support (defined by whoever creates the browser)

As of today, browsers continue to support legacy code even if it does not conform to the latest official standard. Indeed, browsers are unlikely to discontinue support for HTML any time soon. But at some point a browser may no longer support or display old code correctly. In one way this is similar to the switch to HDTV, where at some point everyone needs to convert; the difference is that with HTML no deadline has been set, so nobody knows how long legacy code will keep working. XHTML offers several advantages over HTML:

  1. Employers ask for it, so knowing XHTML will help you get a job
  2. XHTML is more flexible for the future since it can be processed like XML with tools such as XSL (HTML cannot be used like XML)
  3. XHTML is more future compatible for the day when browsers do require it, although that will not happen anytime soon, unlike the 2 reasons above which apply now

There is no right or wrong but the best advice is probably:

  1. if you want a quick web page for personal use that may not be around long, then any code will likely display okay...after all, many pages on the web today are plain old HTML and work fine
  2. otherwise write code for the "latest" standard, which makes it more likely to be supported long into the future and looks better on your resume/portfolio

The path is not so clear, however, since the question is which "latest" standard you should write for, as explained below.

XML is another markup language with a strict structure, meaning that if a document has even one syntax error then the whole document is deemed bad (not well formed) and nothing displays. However the biggest differences between HTML and XML are:

Although the name XHTML implies it is HTML + XML, that is misleading. XHTML is really like HTML since it uses pre-defined elements; however, it follows the syntax rules of XML. We cover XML later, so our choice for now is between HTML and XHTML. But XML has two concepts that carry over to XHTML:

  1. well formed: means the document code has no syntax errors; for example if an element is missing the closing tag the entire document is not well formed
  2. validation: means the document (markup and content) conforms to a schema or DTD that indicates what can be in the markup and content
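To make the "well formed" idea concrete, here is a minimal sketch in Python that asks an XML parser whether a piece of markup parses at all. The function name is_well_formed and the two sample strings are made up for illustration. Note this only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, i.e. it is well formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # Any syntax error (missing close tag, bad nesting) lands here
        return False
    return True


# Hypothetical samples: the second one never closes its <em> element
good = "<p>All tags are <em>closed and nested</em> properly.</p>"
bad = "<p>The em element is <em>never closed.</p>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A single missing closing tag is enough to make the parser reject the whole string, which is exactly the strictness described above.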

So using XHTML allows us to ensure documents are well formed and are valid against some standard. The different versions to date are:

When creating new pages, your best choice is to code for XHTML 1.1 since it is more forward compatible than other versions. Even 1.1 may not be compatible with the future; however, it will be much easier to convert strict XHTML than HTML into whatever the future is. Further, companies insist on competent coders even though non-standards-compliant code (sloppy and deprecated code) works just fine with today's browsers. Using strict XHTML does NOT ensure that more of today's browsers will render the code better, since strict requires CSS, which very old browsers cannot handle. But most people use fairly modern browsers, so there is probably not much difference display-wise whether you use XHTML or not.

Coding for XHTML 1.1 (or strict) means:

  1. you must follow the syntax rules; this is no big deal since the rules are not that hard, are the basic XML rules (like using proper nesting), and quite simply are good coding practices in any event
  2. you use styles (CSS), which is a good idea anyway since styles are the preferred way to control presentation
  3. the hard part is that you must be careful not to use deprecated elements; this would be easy if there were no examples of old code around, but many web pages and examples (like my notes and the book) still show deprecated code, and sometimes you are not aware of what is allowed
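Since spotting deprecated elements is the hard part, a small script can help. The sketch below is only an illustration: the function name find_deprecated and the sample page are made up, and the set covers the elements HTML 4.01 marks as deprecated (it does not check deprecated attributes such as bgcolor). It assumes the page is already well formed.

```python
import xml.etree.ElementTree as ET

# Elements marked deprecated in HTML 4.01 (not allowed by the strict DTDs)
DEPRECATED = {"applet", "basefont", "center", "dir", "font",
              "isindex", "menu", "s", "strike", "u"}


def find_deprecated(xhtml: str) -> list[str]:
    """Return the sorted names of deprecated elements in a well-formed page."""
    root = ET.fromstring(xhtml)
    found = set()
    for element in root.iter():
        # Strip a namespace prefix such as {http://www.w3.org/1999/xhtml}
        name = element.tag.rsplit("}", 1)[-1]
        if name in DEPRECATED:
            found.add(name)
    return sorted(found)


# Hypothetical page mixing old presentational tags with valid markup
page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>whatever</title></head>
<body><center><font color="red">old style</font></center>
<p>your stuff goes here</p></body>
</html>"""

print(find_deprecated(page))  # ['center', 'font']
```

A real validator does far more than this, but the script shows why legacy examples are risky to copy: the old tags parse fine and display fine, yet they are not part of strict XHTML.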

For this course, you can use any code you want unless the assignment instructs otherwise. You still should understand HTML code because HTML and XHTML are quite similar in many respects, and when you go to update old pages you need to know what the old HTML is. The book does not cover XHTML very well, so you must rely on web readings.

The most confusing thing is that some elements have been deprecated and should not be used in XHTML. The hard part is realizing which tags are obsolete and which are not. Realize that CSS (style sheets, which we cover later) is the preferred way to control layout and formats in XHTML. Although some HTML tags have been deprecated in XHTML, they still work in browsers.
For example, there are various ways to do things like show a background image:
  1. use background attribute in body tag (will automatically tile, i.e., repeat)
  2. use bgimage tag (obsolete so avoid it)
  3. use a style
The preferred way is #3 because, as stated at
w3schools.com/html/html_backgrounds.asp
#1-2 are deprecated, meaning they are discouraged and are not allowed by the strict standards. However, #1-2 may still work because browsers may still support them (then again they may not). So you will see lots of info about #1-2 on the web, but the modern way is #3. We will cover some of the deprecated tags because:

  1. the book does
  2. if you ever look at legacy code, you should know what the tags are doing

XHTML rules

XHTML is not very different from HTML in that most of the same markup exists, but there are some new rules. In technical terms, a valid XHTML document must be well formed (no syntax errors) and conform to a certain DTD (we will cover this more in XML) as specified in the DOCTYPE. There are many syntax rules, but the important points are that in XHTML (realize there may be other differences, but they are minor compared to those below):

Indeed most of these are no big deal since you probably coded that way in HTML also. XHTML 1.1 has even more rules like

DTD/Doctype/Validation

Validation involves determining whether code has any syntax errors. It has little to do with whether a document is artistically well designed and looks nice in a browser. It simply determines whether code follows the syntax standards.

An XHTML document is validated against a Document Type Definition (DTD). Unlike XML, where you can make your own DTD, with XHTML the DTDs are built into the standard. So for validation you really need 2 things:

  1. a DTD specified in the doctype as the first line of code in the file
  2. the <html> element should include the xmlns attribute, which is a namespace...you really don't need to know the details of DOCTYPE and html attributes but can learn more about them at reference sites online

For example:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

<head>
<title>whatever</title>
</head>
<body>
<p>your stuff goes here</p>
</body>
</html>

The main DTDs and doctypes are below; you can just copy & paste the doctype you want:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

NOTE: even though XHTML tags must be lowercase, the DOCTYPE keyword must be uppercase for the page to validate.
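If you are curious which DTD a page claims to follow, the doctype can be pulled apart with a regular expression. This is a rough sketch, not how a validator really works: the pattern and the function name read_doctype are assumptions for illustration, and the pattern only handles the PUBLIC form shown above. It is case sensitive on purpose, in line with the NOTE about uppercase DOCTYPE.

```python
import re

# Matches declarations of the form <!DOCTYPE html PUBLIC "public-id" "system-id">
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"\s+"([^"]+)"\s*>'
)


def read_doctype(source: str):
    """Return (public_id, system_id) from the first DOCTYPE, or None."""
    match = DOCTYPE_RE.search(source)
    if match is None:
        return None
    return match.group(1), match.group(2)


# The strict doctype from these notes, split across lines as shown earlier
page = '''<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
</html>'''

public_id, system_id = read_doctype(page)
print(public_id)   # the formal name of the DTD
print(system_id)   # the URL where the DTD lives
```

The first string is the public identifier (the name of the DTD) and the second is the system identifier (the URL of the DTD file). A validator uses these to decide which rules to check your code against.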

Display and Validation

Every web page should be tested. There are 2 ways to do so:

  1. view in a browser: this is a must and is the real world test...the catch is what browser(s) to use since there are different ones, both old and new, and users have different screen resolutions
  2. use a validator: this is optional and checks that the code conforms to the DTD in the doctype. Validators give errors for code that is not valid, although such code may in fact display okay in the browser

So to ensure code is well formed and valid for a certain standard, test it in a validator. But always test whether your page works in browsers. The issue becomes what to do when your page generates lots of crazy validator errors and yet still looks okay in a browser, which is often the case.

Ideally, test a page in every browser your users might use. However, some browsers are awfully old and for casual use you likely do not have more than the latest IE or Mozilla.

As for validators, there are many software packages and web sites available to do this, for example

Validators run your code against a known standard (like XHTML 1.1) and return any errors. Keep in mind errors

So in order of priority:

  1. test in a browser
  2. test in several modern browsers
  3. test in more browsers
  4. try a validator but consider it optional for HTML

If you want to create valid XHTML, there are some differences in which elements can be used versus HTML. Over the years some elements evolved that did the same thing, and some worked only in certain browsers, for example embed versus object (Netscape versus IE browsers). So creating valid XHTML sometimes means the code will not actually work in some browsers. On the other hand, if validity is that important to you, you are probably already reconciled to leaving older browsers behind.

The code for this page should be valid so try it and see at
http://validator.w3.org/#validate_by_input+with_options

Also see examples at:

MIME type

Browser display is more affected by MIME type than the DOCTYPE. Validation uses DOCTYPE to see what standard to compare code against. However, the browser displays a file based on its MIME type and not on what type of code it really has.

What does this mean? On one hand a lot, and on the other very little. If the MIME type is not correct then unexpected results may occur. But for everything except XHTML, the MIME type is pretty well defined (it is like a file type association with filename extensions in Windows), meaning there is only 1 type you would ever use. However, for XHTML you can actually use a MIME type of either HTML or XML.

Read more at xml.com or at w3.org. The summary is

For XHTML as xml, use something like this (where xml-stylesheet is optional):

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="/style.css" media="screen,projection"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>

For the latest browsers, using the <?xml ?> prolog or not makes no difference, but for older browsers it can be a problem, which is why many XHTML pages are served as type HTML. XHTML served as HTML is not much different from HTML served as HTML. If XHTML is served as XML like the standards suggest, then you will find XHTML has less browser support than HTML.

In technical detail...the MIME type (sent in the Content-Type HTTP header) tells the browser that the document is an application of XML if it is application/xhtml+xml; you can also use application/xml or even text/xml, although text/xml is not recommended. Most XHTML documents are served with a MIME type of text/html, which means that they are to be treated as HTML documents. With such a MIME type, you are not using XHTML as far as browsers are concerned but are using HTML.
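The rule in the paragraph above can be written as a few lines of Python. This is just a sketch of the decision and not real browser code; the function name handling_for and the "other" category are made up for illustration.

```python
# The three XML MIME types named in the notes
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def handling_for(content_type: str) -> str:
    """Classify a Content-Type header value as 'xml', 'html', or 'other'."""
    # Drop parameters such as "; charset=utf-8" and normalise the case
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in XML_MIME_TYPES:
        return "xml"
    if mime == "text/html":
        return "html"
    return "other"


print(handling_for("application/xhtml+xml; charset=utf-8"))  # xml
print(handling_for("text/html"))                             # html
```

Notice the function never looks at the DOCTYPE or the markup itself. That is the point: the same XHTML file is handled as XML or as HTML depending only on the header the server sends.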

The XML namespace declaration (xmlns) in the <html> tag tells user agents that it is XHTML (rather than any other application of XML). You must use one of the three aforementioned XML MIME types, or user agents will ignore the XML namespace. If, for instance, you serve your document as text/html, the namespace is ignored, since HTML does not support XML namespaces.

Meta tags can also be used for character encoding, stylesheet, or Content-Type HTTP equivalent, but there is really no need to use meta because:

However, some browsers have trouble with a prolog, so you can specify the character encoding by inserting a Content-Type meta element into the <head> of your document to avoid the troublesome prolog.

The fact that XHTML may be served as HTML or XML makes a difference to the way encoding information needs to be declared. Current browsers may display an HTML file in either standards mode or quirks mode. This means that different rules are applied to the display of the file, one conforming to the W3C standards interpretation of expected behavior, the other to expectations based on the non-standard behavior of older browsers.

In recent browsers such as Internet Explorer 7, Firefox, Opera, and others, a page served with a DOCTYPE declaration will be rendered in standards mode with or without the XML declaration.

With Internet Explorer 6, however, if anything (like the XML prolog) appears before the DOCTYPE declaration, the page is rendered in quirks mode. Because Internet Explorer 6 users still account for many users, this is a significant issue. If you want to ensure that your pages are rendered in the same way on all standards-compliant browsers, you need to think carefully about how you deal with this. It is a good idea to use a DOCTYPE declaration at the top of an HTML or XHTML file so that the document is rendered in standards mode by more recent user agents. The presence of an XML declaration in an XHTML file served as HTML will cause your file to be rendered in quirks mode on Internet Explorer 6 (and therefore for a potentially large proportion of your audience).
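A quick way to check your own files for this problem is to see whether the DOCTYPE is the very first thing in the file. The sketch below is a rough check only: the function name doctype_comes_first is made up, and treating leading whitespace as harmless is an assumption, not something these notes confirm for Internet Explorer 6.

```python
def doctype_comes_first(source: str) -> bool:
    """True if nothing but whitespace precedes the DOCTYPE declaration."""
    # Assumption: leading blank space is ignored; anything else counts
    return source.lstrip().startswith("<!DOCTYPE")


DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

# Same page twice: once with the XML prolog in front, once without
with_prolog = '<?xml version="1.0" encoding="utf-8"?>\n' + DOCTYPE
without_prolog = DOCTYPE

print(doctype_comes_first(with_prolog))     # False
print(doctype_comes_first(without_prolog))  # True
```

A result of False means something (here the XML prolog) sits in front of the DOCTYPE, which is the situation described above as triggering quirks mode in Internet Explorer 6.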