Validation and Parsers

Objective: Validation purpose and approaches using DTD and Schemas

Learn:

Web Readings:

Many different software tools can validate XML. They should all give the same answer, but alas some seem better than others. In this course I usually run assignments and exams against one of the following online validators:

That does not mean the above are the best; it simply means I use them to grade, so you probably want to use them too as a double check, even if you use some other tool.

Notes

XML validation is different from HTML validation. In HTML you validate code against a published standard that the whole world uses and that browsers may enforce when displaying the page. In XML, simply being well-formed means the document conforms to the world-wide standard and can display in a browser. XML validation instead defines which elements and attributes are acceptable in a particular xml document or application. So XML validation matters mainly when more than one person uses the same xml application, and it is a way for all the users to know what the elements are. In that sense an XML schema is very similar to a database schema or table definition.

If you simply create your own xml file and display it on the web, there is no reason to use validation. If, however, you are part of a company or a user group where multiple people might create similar xml files, then you need a schema and validation so everyone follows the same rules. Again, an xml validation schema simply defines the rules an xml document must follow for element names, data types, nesting, etc.
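To make "rules an xml document must follow" concrete, here is a minimal Python sketch. It is not a real schema validator: it hand-codes a couple of rules for a made-up `note` application (the element names and the `ALLOWED_CHILDREN` table are assumptions for illustration only) and checks a document against them. A real DTD or XSD states rules like these declaratively, and a validating tool enforces them.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a made-up "note" markup language:
# which child elements each element may contain.
ALLOWED_CHILDREN = {
    "note": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": set(),
}

def check_rules(xml_text):
    """Return a list of rule violations (an empty list means the toy rules pass)."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    problems = []
    if root.tag != "note":
        problems.append(f"root element must be <note>, found <{root.tag}>")
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # unknown element: no rules to apply to its children
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
    return problems

# Both documents are well formed; only the first follows the toy rules.
GOOD = "<note><to>Ann</to><from>Bob</from><body>Hi</body></note>"
BAD = "<note><to>Ann</to><subject>Hello</subject></note>"

print(check_rules(GOOD))
print(check_rules(BAD))
```

The point of the sketch is the distinction: `BAD` parses without complaint because it is well formed, yet it breaks the agreed rules, which is exactly the kind of problem validation exists to catch.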

When all done reading make sure you understand:

To validate xml against a schema, you must run software that can validate. There are many such programs, and which one you use depends on what you like and what type of schema you have (DTD, XSD, or XDR). There are several general choices, and how you validate depends on the tool:

  1. Easy way, but you have to buy it: use an xml editor tool like Stylus Studio.
  2. Even easier and free: use an online validator.
  3. Hard way: use tools that need to be installed or integrated into a program, like Xerces, MSXML, XSV, etc.

Custom Markup

The basic dilemma of xml is this: if you make up element names, how will anyone know what to do with your xml file? Essentially each xml document is part of a custom markup language, meaning someone made it up. For custom markup to be useful, everyone should use the same elements and attributes, and that structure should be published. In technical terms, it means a schema is defined that describes the structure.

Validation

Data modeling involves defining a schema, which is a set of specifications that describe:

Usually the specifications (schema) are made up by someone so that any xml document that conforms to them is part of that custom markup language. So:

Specifications for a valid xml document can be in 2 types of files that do the same thing; you usually do not use both at the same time:

XSD Schema is the suggested newer approach. It is more flexible because it uses xml markup (an XSD file is a well-formed xml document) and handles more data types.

DTDs are the original method and are nice to know mainly because many existing documents use them. But DTDs are rather obsolete since they use a non-xml language and are not as useful as a schema. Use an XSD schema if given a choice.
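The difference in syntax is easy to see side by side. The Python sketch below writes the same made-up rule (a `note` element containing one `to` element, both names being assumptions for illustration) once as a DTD and once as an XSD, then feeds each to an ordinary XML parser. It only shows that XSD text is itself xml while DTD text is not; it does not validate anything against either one.

```python
import xml.etree.ElementTree as ET

# The same made-up rule ("note contains one 'to' element") written both ways.
DTD_TEXT = """<!ELEMENT note (to)>
<!ELEMENT to (#PCDATA)>"""

XSD_TEXT = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def is_well_formed(text):
    """True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print("XSD is well-formed XML:", is_well_formed(XSD_TEXT))
print("DTD is well-formed XML:", is_well_formed(DTD_TEXT))
```

Because an XSD is ordinary xml, any xml editor or parser can open it; a DTD needs a tool that understands the separate DTD syntax.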

DTD & Schema Purpose & Design

How to Create DTD & XSD Schema

Details of DTD and XSD schemas come later; for now, realize that you can create these

Schemas are complicated, so using software with schema capability saves lots of time (over Notepad). The important thing is understanding the jargon and structure, not the code details, which the software generates.

Validation and Parsers

Parsers are software that checks that an XML document is well formed and may also check that it is valid. Parser details:

Validating parsers can work with DTDs and/or schemas. You can run your xml document against a DTD or XSD schema to see if your xml is valid. If it is not, most validators will give line numbers and messages describing the error.
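As a small illustration of the line-number style of error reporting, here is a Python sketch using the parser in the standard library. Note the limits: this parser is non-validating, so it only checks that a document is well formed; checking against a DTD or XSD needs a validating tool. The function name `well_formed_report` and the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET

def well_formed_report(xml_text):
    """Return 'well formed' or an error message with line and column."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser detected the problem
        return f"not well formed: line {line}, column {column}: {err}"
    return "well formed"

# The <body> element is never closed, so the parser stops at </note>.
BROKEN = """<note>
  <to>Ann</to>
  <body>Hi</note>
"""

print(well_formed_report("<note><to>Ann</to></note>"))
print(well_formed_report(BROKEN))
```

Validating parsers report validity errors in much the same way, with a location and a message, so getting used to reading this kind of output pays off.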

However, validators differ in reliability and in how well they support the W3C standards. Differences are usually greater for schemas than for DTDs. Sometimes one schema validator will not give an error while another one will, or vice versa. That is because there are many schema options (like <unique>) and not every tool has every option implemented. To be safe, it is probably best to try 2 different ones. Many parsers are built on Apache's Xerces-J; over time they all should be suitable. For more info:

There are many validation tools. Recall that validation is done with software. So how do you get the software? Basically, program code is available in various languages (Java, VB, Javascript) as validation libraries, and you can either

In any event, the validation code and programs may be specific to a type of validation (for example, some are only for DTD), so pick what you use carefully.

A  list of schema tools is at

Online schema validation

To check an xml document against an XSD schema there are many online validators. Realize that schemas may be local (on your computer) or published on some web site, so validators typically let you browse for local files or use a url. Some validators include:

With the first two above you can:

  1. validate XML document against Schema file
  2. check xsd alone (for valid syntax)
  3. check xml alone (well-formed)
  4. browse for local files. You can validate a local .xml file against a local .xsd file, and there are various ways to do so

Online validation for xml syntax and/or DTD:

To check whether an xml document is well formed, or to check it against a DTD, there are various choices, including online validators and programs you can run on your own machine. Realize that most of these require that the DTD either be internal or be published on some web site, so you cannot use a local xml file with a local external DTD.
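If you are not sure whether your document uses an internal DTD or points at an external one, the DOCTYPE line tells you. The Python sketch below uses the standard library's expat parser to report it. The function name `dtd_location`, the sample documents, and the file name `note.dtd` are assumptions for illustration; expat here is non-validating, so it reads the DOCTYPE but does not check the document against the DTD or fetch the external file.

```python
import xml.parsers.expat

def dtd_location(xml_text):
    """Report the DOCTYPE name, any external DTD reference, and whether an internal subset exists."""
    info = {"doctype": None, "system_id": None, "internal_subset": False}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once when the parser reaches the <!DOCTYPE ...> declaration.
        info["doctype"] = name
        info["system_id"] = system_id
        info["internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return info

# DTD rules written inside the document itself (internal DTD).
INTERNAL = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>Hi</note>"""

# DTD rules kept in a separate file (external DTD).
EXTERNAL = """<!DOCTYPE note SYSTEM "note.dtd">
<note>Hi</note>"""

print(dtd_location(INTERNAL))
print(dtd_location(EXTERNAL))
```

A document like `INTERNAL` can be pasted into most online DTD validators as is, while one like `EXTERNAL` only works if the validator can reach `note.dtd` at the address given.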

The list below is not for XSD. Some DTD validators include: