HTML Overview

Objective: Learn the concepts of HTML, including how it is used, why it is important, and some of the key terms and concepts.

Learn:

Book: Project 1

Web Readings:

Movies

To simulate in-class lectures

Name       MB   Description
VuSource   2    View HTML in browser
Xml        2    XML example

Notes

This week is different from the rest of the course in that there is no coding or hands-on work. Instead, it is mostly reading about concepts and terms. Many of these concepts and terms are covered in detail during the rest of the semester, so you may not fully understand them just by reading; they will make more sense once we cover the hands-on part.

Overview

A web site contains multiple, related web pages and supporting files like images. Most pages are HyperText Markup Language (HTML) files, which are normal text files containing many markup <tags>. The HTML (or source code) is interpreted by a browser. In the Internet Explorer (IE) browser, the source code can be viewed using the menu View | Source. A simple outline of how the web works and where HTML fits in is:

Web sites are really just collections of files that can be accessed over the Internet. The files can be of any type, but what happens on the user's end depends on the file type. File types can be categorized as:

  1. Client files: files the browser can decode and display, so the server simply transfers them to the user's client with no processing. The types may be:
  2. Server programs or scripts: the browser cannot decode these files, so instead the server processes them (runs the programs), which generates a result (such as HTML) that the browser can decode. Server programs include ASP, PHP, and JSP; some of these, such as ASP, are proprietary and only work on certain platforms. Even with server programs, you must know HTML, because you have to program the server page to generate HTML; again, the browser only understands the HTML, not the server program.
  3. Any other type of file: the user needs a special program to decode and display these. Popular types are Flash (SWF) and Acrobat (PDF).
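To make item 2 more concrete, here is a minimal sketch of what a server program does, written in Python purely for illustration (ASP, PHP, or JSP would fill this role on a real server). It assumes a hypothetical list of backordered items standing in for a database query, and the function name render_backorder_page is made up. The point is that the output is ordinary HTML, which is all the browser ever sees.

```python
from html import escape

def render_backorder_page(items):
    """Build an HTML page from a list of item names, the way a server program would."""
    # Escape each item so characters like < or & cannot break the markup.
    rows = "\n".join(f"    <li>{escape(item)}</li>" for item in items)
    return (
        "<html>\n"
        "<head><title>Backordered Items</title></head>\n"
        "<body>\n"
        "  <ul>\n"
        f"{rows}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>"
    )

# The browser only receives the generated HTML text, never this Python code.
print(render_backorder_page(["Widget", "Gears & Sprockets"]))
```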

HTML (client) pages are the most popular because:

However, HTML has limitations, which is why other programs are used in some cases for things like:

Other file types, like Word, Excel, etc., are rarely used on the web since they require programs that are not readily available for free.

In summary, markup languages (HTML, XHTML, XML) are the most popular files on the web, since they are the documents the browser understands natively, and they are easy, free, and do not require any special software or server platform. HTML is especially prevalent for informational, static web pages that do not need to get information from a database. Server programs are widely used when the page is data driven (i.e., its content comes from a database, such as which items are on backorder) and are often used by larger companies or anyone selling something. In either case (HTML client page or server page), you must be able to generate HTML code, since that is what the browser understands. Other files, mainly Flash and PDF, are used for special cases and usually not for complete web sites. The point is that almost every web site uses HTML, even if parts of the site use other files like server pages, Flash, etc.

HTML structure

The book and Prof. Yoxheimer's notes at the end of this page give a brief overview of HTML structure, but I prefer to go into more detail next week. For now, I will just mention a few things:

You can view the HTML code for a web page in a browser by using the menu View | Source (in IE). Sometimes the code will be very hard to follow, especially if JavaScript is mixed in.

Many people use web editors to generate HTML. So why learn HTML? The reasons vary:

HTML and XHTML have basically the same structure and purpose in that:

XML, on the other hand, is quite different. XML has the same structure as HTML in that it uses elements and content, but its purpose is quite different. XML elements are not predefined in the browser, so by default the browser has no idea how to display them. Instead, you make up your own elements, and XML is often used for data exchange rather than for formatting web pages.
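As a rough illustration, the sketch below uses Python's standard xml.etree.ElementTree module to read some XML whose element names (catalog, item, name, backorder) are invented for this example. Nothing about these elements says how to display them; the program reading the data decides what they mean, which is why XML suits data exchange.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: every element name here is made up, none is predefined.
xml_text = """
<catalog>
  <item sku="A100"><name>Widget</name><backorder>yes</backorder></item>
  <item sku="B200"><name>Gear</name><backorder>no</backorder></item>
</catalog>
"""

root = ET.fromstring(xml_text)
# The program, not the browser, decides what each element means.
for item in root.findall("item"):
    print(item.get("sku"), item.findtext("name"), item.findtext("backorder"))
```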

Below are notes from Prof. Yoxheimer.

Important Definitions and Terms:

The Internet is a collection of computer networks. A computer network is a collection of computers of various types and other devices that can share data across a communications channel. In one respect, a computer network is a communications channel where devices can share data and programs (an .exe file is really data, in binary, that a computer processor uses as instructions on what to do). A network in general is a collection of people and resources devoted to accomplishing a simple or complex task or set of tasks. A computer network simply carries this concept into the electronic realm.

The Internet is vast. At present there are roughly 1,500,000,000 computers and other devices connected to the Internet at any one time, with devices, including computers, going on and off line all the time. Content on the Internet is constantly changing, and so are the functions performed by the various devices on it. One truly major trend is the continuing push toward wireless technology; now you can be connected to the Internet virtually anywhere at any time. The sheer amount of data that can be shared keeps increasing.

When I was a kid, the holy grail of technology was the video phone. When we had the video phone, we would really be an advanced society. Never did we realize you would be able to carry a phone with you wherever you wanted. Now we can share not only voice and pictures but data in just about any other format, instantly or nearly so. And never did we imagine being able to do this with any country in the world as easily as we do now. Also, computers were not common, and a good computer had roughly 4,000 bytes of RAM, a tape unit, and maybe a punched card reader. Now computers are very common (in fact, we don't know how to get rid of them when they're no longer of use), and a good PC today has 4,000,000,000 bytes of RAM. RAM, by the way, is working memory, or a scratchpad for a CPU. While I'm on my horse, another extremely important aspect of computers is the rate at which data is communicated. Data communication speeds today are improving as rapidly as data storage and RAM capacity, while costs keep falling.

Another factor in the strength of the Internet, as with any technology, is cost. For a technology to be truly powerful, it must be cheap enough for everyone to afford. In 1976, a meg of RAM for an IBM mainframe cost about $1,000,000.00. Now a meg of RAM is about 75 cents retail. At this point, no one but the manufacturers can complain about the cost of technology.

It is also important to remember, not just for this class but for all time, that the Internet has so impacted the way we socialize and do business that the effects are incalculable. For example, you can go to college and never see a classroom. The amount of information overload is great. We are never out of touch with each other. True manual labor is almost a thing of the past, at least in this country. It's getting to the point that we interact more online than in person.

The WWW is just one service provided over the Internet. HTML is the language of the WWW, not of the Internet. I point this out because the Internet is a platform for many types of services, not just the World Wide Web. On the flip side, we see the WWW more than anything else on the Internet.

HTML stands, of course, for HyperText Markup Language, which is really just a markup language. So what's a markup language? Well, think of MS Word: you type in text, insert images, and so forth. These are the data elements in a Word document. You can apply display characteristics, such as bolding, to the various data in a Word document. In a Word document, you can select any sentence and click the Bold button on the formatting toolbar, and your sentence appears stronger than the other sentences so it stands out. In the actual .doc file you are creating, MS Word puts markers around the sentence, in binary form, that tell it (the Word program) to bold or emphasize that sentence. In other words, it puts formatting marks inside the .doc file. HTML is a set of plain-text markers (also called tags and elements) that tell a web browser how to display or render the various parts of the data in an HTML document. In other words, you mark up your data with special tags or elements to tell the web browser how to display that data, or to tell the web browser to include an image.
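To picture the Word analogy in plain text, here is a small Python sketch that "marks up" a sentence by wrapping it in the HTML bold tag <b>...</b>. The helper name mark_up is hypothetical; html.escape is used so that characters like < or & in the data are not mistaken for markup.

```python
from html import escape

def mark_up(text, tag):
    """Wrap plain-text data in an opening and a closing tag, e.g. tag="b" for bold."""
    return f"<{tag}>{escape(text)}</{tag}>"

sentence = "This sentence stands out."
print("<p>Normal text. " + mark_up(sentence, "b") + "</p>")
# <p>Normal text. <b>This sentence stands out.</b></p>
```

Unlike Word's binary markers, the result is ordinary text you can read and edit in Notepad.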

Just a note on HTML tags and/or elements. HTML tags, or HTML commands as they can also be thought of, can also include attributes, which you will see as you learn HTML. HTML attributes are usually pairs in the format
attributename = some_value
that supply additional information the HTML tag/command needs to do its job. An HTML tag can have any number of attributes.
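One way to see attributes as name/value pairs is to let a parser pull them out. This sketch uses Python's standard html.parser module; the TagLister class and the sample <img> tag are made up for illustration. The parser hands over each start tag with its attributes as a list of (name, value) tuples.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Record each start tag together with its attribute name/value pairs."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes attrs as a list of (name, value) tuples.
        self.tags.append((tag, dict(attrs)))

parser = TagLister()
parser.feed('<img src="logo.gif" alt="Company logo" width="120">')
parser.close()
print(parser.tags)
# [('img', {'src': 'logo.gif', 'alt': 'Company logo', 'width': '120'})]
```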

I should also mention that not all browsers work the same. For example, IE and Firefox may render the same web page differently. They may also have tags/elements unique to that particular browser. The rule is: just because a page renders the way you want it to in one browser does not mean it will render that way in a different browser.

I will also mention that HTML documents contain not only markup (in the form of HTML tags) and data, but can also contain program code (usually JavaScript) to initiate behavior and respond to the viewer's actions on the document. For example, a user may click on an image to start a news feed or get further information.

The development of the World Wide Web:

A worldwide network of computer networks has been around, in one form or another, since 1961.

By the mid-1970s, many government agencies, research facilities, and universities were on this network of networks (called the ARPANET), but each ran its own internal network, developed by different vendors and using entirely different protocols. For example, the Army's system was built by DEC, the Air Force's by IBM, and the Navy's by Unisys. All were capable networks, but they all spoke different languages. What was clearly needed was a set of networking protocols that would tie these disparate networks together and enable them to communicate with each other.

The Department of Defense decided the TCP/IP suite of networking protocols would be the standard for all military computer networking. TCP/IP has been ported to most computer systems, including personal computers, and has become the new standard in internetworking. It is the TCP/IP protocol set that provides the infrastructure for the Internet today.

TCP/IP comprises over 100 different protocols. It includes services for remote logon, file transfers, and data indexing and retrieval, among others. The most commonly used protocol in the TCP/IP suite is HTTP (Hypertext Transfer Protocol), the protocol used to transfer HTML pages, or simply web pages.
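HTTP messages are themselves plain text. As a sketch (not a full client), the Python function below builds the text of a minimal HTTP/1.1 GET request, the kind of message a browser sends to ask a server for a web page. The function name, host, and path are illustrative only, and nothing is actually sent over the network.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request for one resource."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, protocol version
        f"Host: {host}",          # HTTP/1.1 requires a Host header
        "Connection: close",      # ask the server to close the connection afterward
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)     # HTTP separates lines with CR LF

print(build_get_request("www.example.com", "/index.html"))
```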

Web Servers and Web Browsers (also called Web Clients):

To understand what a server and a client are: a server is a computer or other device on a network that provides services to consumers, or clients. Such services include:

When you enter a URL into the address bar of your web browser, the host name in that URL must be sent to a DNS server to be translated into an IP address. This IP address is used to address a service request to the computer or device on the Internet that will provide the service (or web page). Once the browser has the IP address, it creates a request for the required resource. A resource may be a web page, a file, a piece of music, an email, whatever. The server machine receives the request and either provides the service or sends back an error message.

I recommend that students understand this process.
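The steps above can be sketched in Python: split the URL, ask DNS for the host's IP address, and describe the request to send. This is a simplified sketch under several assumptions: plan_request is a hypothetical name, real browsers do much more, and the resolver is a parameter (defaulting to the standard socket.gethostbyname) so the example can run offline with a stand-in lookup table. Port 80 for http and 443 for https are the usual defaults.

```python
import socket
from urllib.parse import urlsplit

def plan_request(url, resolve=socket.gethostbyname):
    """Split a URL, look up its host's IP address, and describe the request to send."""
    parts = urlsplit(url)
    host = parts.hostname                      # the name DNS must translate
    ip_address = resolve(host)                 # DNS lookup: name -> IP address
    port = parts.port or (443 if parts.scheme == "https" else 80)
    resource = parts.path or "/"               # which resource to ask for
    return ip_address, port, f"GET {resource} HTTP/1.1"

# Offline usage with a stand-in resolver (192.0.2.x is a documentation-only range).
fake_dns = {"www.example.com": "192.0.2.10"}.get
print(plan_request("http://www.example.com/index.html", resolve=fake_dns))
# ('192.0.2.10', 80, 'GET /index.html HTTP/1.1')
```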

Another term I'll introduce at this time is the port. A port is simply a number ranging from 0 to 65535. A computer connected to the Internet is identified by its IP address; more specifically, the network card connecting the computer to the Internet is identified by an IP address. A computer can, of course, have many applications installed, and each application that communicates over the network can have an address, or ID number, of its own. This way, when a message arrives at the destination computer (identified by IP address), the data can be handed to the application that can make sense of it, addressed by port number. So the port number identifies a particular application on that computer.
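Here is a simplified model of how a computer hands arriving data to the right application by port number. The listeners table and the deliver function are hypothetical; ports 80 (HTTP), 443 (HTTPS), and 25 (SMTP) are the conventional well-known assignments. A real operating system does this inside its network software, not with a Python dictionary.

```python
# Hypothetical table of which application on this computer listens on which port.
# 80 (HTTP), 443 (HTTPS), and 25 (SMTP) are conventional well-known ports.
listeners = {80: "web server", 443: "secure web server", 25: "mail server"}

def deliver(port, data):
    """Hand data that arrived at this computer's IP address to the app on `port`."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port {port} is outside the range 0-65535")
    app = listeners.get(port)
    if app is None:
        return f"no application listening on port {port}"
    return f"{app} receives {data!r}"

print(deliver(80, "GET /index.html HTTP/1.1"))   # web server receives 'GET /index.html HTTP/1.1'
print(deliver(8080, "hello"))                     # no application listening on port 8080
```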

HTML The Language of the Web:

HTML is simply a markup language, or formatting language, that directs the web browser on how to format or render the data in an HTML document (or web page, as these are called). Markup is simply commands included in the file or document along with the data. The application reading and processing the data uses these commands as directions on how to process it. An example of this is an MS Word document: it contains the data you type in or insert, and along with that data are bits that tell Word how to display it, namely the formatting commands you specify from the MS Word toolbars.

HTML is simply a human-readable formatting language that directs a browser on how to render or display the data/content of a web page in the browser window. These commands are called HTML tags or HTML elements.

There are also HTML tags or elements that direct the web browser to include certain data such as images or sounds.

Tools for Creating HTML documents:

There are any number of tools for creating HTML documents. Some are very sophisticated; others are not. There is a rule that says: the best tool is the one you know and are comfortable with. I strongly believe in this rule.

For this course I want you to use MS Notepad to start out with. I believe this is the best tool to allow you to understand HTML, because it does not provide handy services for you. If you are using Notepad you must understand the HTML to get the document formatted and working/looking correct. HTML is very easy to learn, but you can't learn it if you don't write it yourself.

It is also important that you learn to write it yourself, because no matter how good your HTML editor/converter is, you will always need to tweak the result yourself with a simple text editor.

As the course progresses, I will allow you to use more sophisticated tools.

I do, however, recommend at this time that if you haven't looked at other HTML tools, you start looking at tools like:

I also recommend you look at other browsers, such as Firefox and Amaya from the W3C itself.

To see some examples of HTML documents, open any web page and, from the IE menu, select View | Source to see the raw HTML.

Marking Elements with Tags:

Notice that the tags or elements begin with a less-than symbol and end with a greater than symbol. Also note that an opening tag must be closed with an ending tag, which is the same as the opening tag with a forward slash after the less-than symbol. See the examples in the book.
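To check the opening/closing rule mechanically, here is a sketch built on Python's html.parser module. TagBalanceChecker and check are hypothetical names; the checker keeps a stack of open tags and reports any closing tag that does not match the most recently opened one, plus any tag left open at the end.

```python
from html.parser import HTMLParser

class TagBalanceChecker(HTMLParser):
    """Track open tags and note closing tags that don't match the latest open one."""
    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")

def check(html_text):
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    # Anything still on the stack was opened but never closed.
    return checker.problems + [f"<{t}> never closed" for t in checker.open_tags]

print(check("<p>Some <b>bold</b> text</p>"))   # []
print(check("<p>Some <b>bold text</p>"))       # ['unexpected </p>', '<p> never closed', '<b> never closed']
```

This strict version treats every tag as needing a closing tag, as the rule above states. A shorthand empty element such as <br /> passes, because html.parser reports it as a start tag immediately followed by an end tag, but a bare <br> would be reported as never closed.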

HTML is not case sensitive; however, I recommend that when you create an opening tag, the closing tag be in exactly the same case.

XHTML is case sensitive: element and attribute names must be written in lowercase. See the XHTML standard at
http://www.w3.org/TR/xhtml1
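To see the difference between the two rules, this sketch compares Python's html.parser, which follows HTML by reporting tag names in lowercase, with the XML parser in xml.etree.ElementTree, which enforces case the way XHTML requires. The helper names are hypothetical, and these standard-library parsers are only stand-ins for what a browser does.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class TagNames(HTMLParser):
    """Collect tag names as html.parser reports them."""
    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.append(tag)          # html.parser lowercases tag names

    def handle_endtag(self, tag):
        self.names.append("/" + tag)

def html_tag_names(markup):
    parser = TagNames()
    parser.feed(markup)
    parser.close()
    return parser.names

def is_well_formed_xml(markup):
    """XML (and so XHTML) treats <P> and <p> as different element names."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(html_tag_names("<P>Hello</p>"))       # ['p', '/p']
print(is_well_formed_xml("<P>Hello</p>"))   # False
print(is_well_formed_xml("<p>Hello</p>"))   # True
```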

Empty elements are HTML tags that are not meant to contain data. So instead of having an opening and a closing tag, which is the usual rule, there is a shorthand notation: begin the element with a less-than symbol as usual, and end it with a forward slash followed by a greater-than symbol, placed after the element name or after the attribute list if there is one (for example, <br />).
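A quick check of the shorthand, using Python's xml.etree.ElementTree (an XML parser, so it follows the XHTML rules). The sample paragraph is made up; the point is that <br /> produces a complete element with no text and no children.

```python
import xml.etree.ElementTree as ET

# XHTML-style empty element: <br /> opens and closes itself in a single tag.
para = ET.fromstring("<p>First line<br />Second line</p>")
br = para.find("br")
print(br.tag, br.text, len(br))            # br None 0  (no content, no children)
print(ET.tostring(para, encoding="unicode"))
```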

White Space and HTML:

White space consists of characters in a document that don't display. Such characters include the space and the tab. The newline (Enter key) is also considered white space.

When a web browser processes an HTML document, extra white space is ignored: any run of spaces, tabs, and newlines is displayed as a single space. So use white space as you see fit to make your HTML documents more readable. For example, I put the opening and closing Head and Body tags on their own lines to make them easy to spot.
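In practice, a browser displays each run of white space in ordinary text as a single space. That behavior can be approximated in one line of Python. This is a simplified model, assuming ordinary text outside elements like <pre> (which preserve white space); collapse_whitespace is a hypothetical helper, not how any particular browser is implemented.

```python
def collapse_whitespace(text):
    """Approximate browser display of ordinary text: each run of spaces,
    tabs, and newlines becomes one space (leading/trailing runs are dropped)."""
    return " ".join(text.split())

source = "Hello,\n\t   world!   This is   HTML."
print(collapse_whitespace(source))   # Hello, world! This is HTML.
```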

Element Attributes:

Many HTML tags have attributes that modify the behavior of an HTML element. For example, the body tag has an attribute that lets you set the background color of your HTML document. Attributes take the form attribute-name = value.

An HTML tag can contain many attributes.
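Going the other direction, this sketch builds an opening tag from attribute-name/value pairs. The start_tag helper is hypothetical; bgcolor is the older body attribute for background color (CSS is now the preferred way), and html.escape keeps a value containing quotes or & from breaking the tag.

```python
from html import escape

def start_tag(name, attributes):
    """Build an opening tag with attribute="value" pairs, quoting each value."""
    pairs = [f'{attr}="{escape(str(value))}"' for attr, value in attributes.items()]
    return "<" + " ".join([name] + pairs) + ">"

print(start_tag("body", {"bgcolor": "yellow"}))            # <body bgcolor="yellow">
print(start_tag("img", {"src": "cat.gif", "width": 100}))  # <img src="cat.gif" width="100">
```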

The Structure of an HTML document:

A computer is a device for manipulating bits and bytes. All files in a computer are composed of a sequence of bits. A bit is a 1 or a 0; more specifically, a bit indicates whether a circuit is on or off. If a circuit is on, the bit it represents is a 1. If a circuit is off, the bit it represents is said to be 0. Files are nothing more than a stream of bits. A byte is simply a group of eight bits.
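A few lines of Python make the bit/byte relationship concrete: format(..., "08b") shows a number as eight bits, int(..., 2) reads bits back as a number, and 2 ** 8 counts the distinct patterns one byte can hold. The value 77 is an arbitrary example.

```python
value = 77                     # any whole number from 0 to 255 fits in one byte
bits = format(value, "08b")    # the same number written as eight bits
print(bits)                    # 01001101
print(int(bits, 2))            # 77  (reading the bits back gives the same number)
print(2 ** 8)                  # 256 (the number of different patterns one byte can hold)
```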

In order for a computer application to work, the application must be able to translate the bits in a file into data and information at a higher level. The data or information can be things like numbers, letters, graphical data, music, what have you. The important point here is that bits are used to encode many different types of information, such as music, Word documents, photographs, or video. These bits, or binary data, can then be manipulated by a computer.
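To show that the meaning of bits depends on the application reading them, this sketch reads the same two bytes two ways: as ASCII text and as a single 16-bit number. The byte values are arbitrary examples.

```python
data = bytes([72, 105])                   # two bytes: 01001000 01101001

as_text = data.decode("ascii")            # interpreted as two ASCII characters
as_number = int.from_bytes(data, "big")   # interpreted as one 16-bit number

print(as_text)     # Hi
print(as_number)   # 18537
```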

An HTML document is a plain text document. This means that the bits that make up an HTML file are decoded using a character table such as ASCII. Remember that a byte is eight bits, and a bit is either 0 or 1, so there are 256 bit patterns in the range 00000000 to 11111111. Standard ASCII assigns characters to the first 128 of these patterns (00000000 to 01111111); extended 8-bit character sets fill in the remaining 128. Each pattern in the ASCII table represents a digit, letter (upper and lower case), punctuation mark, or special symbol used in the English language. So to decode an ASCII, or plain text, file, you simply take eight bits, find that pattern in the ASCII table, and use that pattern's corresponding letter or symbol.
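The decoding procedure in the last sentence can be sketched in Python. decode_bits and encode_bits are hypothetical helper names; chr(int(byte, 2)) looks up the character for each eight-bit pattern, which matches ASCII for codes 0 to 127.

```python
def encode_bits(text):
    """Turn ASCII text into a string of 1s and 0s, eight bits per character."""
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))

def decode_bits(bit_string):
    """Decode a bit string eight bits at a time, looking up each pattern's character."""
    if len(bit_string) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    chars = []
    for i in range(0, len(bit_string), 8):
        byte = bit_string[i:i + 8]          # take the next eight bits
        chars.append(chr(int(byte, 2)))     # find that pattern's character
    return "".join(chars)

print(decode_bits("01001000010101000100110101001100"))   # HTML
```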