The complete history of the HTML markup language

A Brief History of HTML

1993 - Present

Tim Berners-Lee first described HTML in 1991, and the first formal draft specification followed in 1993. Since then, there have been many different versions of HTML. The most widely used version throughout the 2000s was HTML 4.01, which became an official standard in December 1999.

Another version, XHTML, was a rewrite of HTML as an XML language. XML is a standard markup language that is used to create other markup languages. Hundreds of XML languages are in use today, including GML (Geography Markup Language), MathML, MusicML, and RSS (Really Simple Syndication). Since each of these languages was written in a common language (XML), their content can easily be shared across applications. This makes XML potentially very powerful, and it's no surprise that the W3C would create an XML version of HTML (again, called XHTML). XHTML became an official standard in 2000 and was updated in 2002. XHTML is very similar to HTML but has stricter rules. Strict rules are necessary for all XML languages because, without them, interoperability between applications would be impossible. You'll learn more about the differences between HTML and XHTML in this article.
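To make the "stricter rules" concrete, here is a minimal sketch in Python. It assumes the standard library's lenient html.parser stands in for a forgiving HTML browser and that xml.etree.ElementTree stands in for a strict XML processor; the class and function names (TagCollector, is_well_formed_xml) are made up for illustration and are not part of any HTML or XHTML specification.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Lenient HTML parser that records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def is_well_formed_xml(markup: str) -> bool:
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: the <br> element is never closed.
html_style = "<p>First line<br>Second line</p>"
# XHTML-style markup: every element is closed, so <br /> is self-closing.
xhtml_style = "<p>First line<br />Second line</p>"

collector = TagCollector()
collector.feed(html_style)            # the lenient parser does not complain
print(collector.tags)

print(is_well_formed_xml(html_style))   # the strict parser rejects the unclosed <br>
print(is_well_formed_xml(xhtml_style))  # the strict parser accepts the closed <br />
```

The same content passes or fails depending only on whether every element is properly closed, which is the kind of rule that lets different XML applications exchange documents without guessing.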

Most pages on the Web today were built using either HTML 4.01 or XHTML 1.0. However, in recent years, the W3C (in collaboration with another organization, the WHATWG) has been working on a brand new version of HTML, HTML5. Currently (2011), HTML5 is still a draft specification and is not yet an official standard. However, it is already widely supported by browsers and other web-enabled devices and is the way of the future. As a result, HTML5 is fast becoming the primary language of web development.

HTML is an evolving language. It doesn’t stay the same for long before a revised set of standards and specifications is brought in to make it easier to create prettier and more efficient sites. Let’s start at the beginning...

HTML 1.0

HTML 1.0 was the first release of HTML to the world. Not many people were involved in website creation at the time, and the language was very limiting. There really wasn’t much you could do with it bar getting some simple text onto the web. But then, just that got the beardos a-foamin’ back in the day.

Things have changed quite a lot since the creation of the web, but the remnants of its history are still online.

Today, everyone wants and loves HTML5, but there was a time when things were quite different. We are talking about the old internet, as early as the first half of the 1990s, before the dot-com bubble, when web pages were a niche for researchers and enthusiasts with long beards managing servers with Unix-like operating systems in dark rooms. HTML stands for Hypertext Markup Language.

The Origins

When HTTP was first seeing the light of day, there was a need to establish common ground for creating hypertext documents that could link from one document to another, in what would become the dawn of a new era in access to information and communications. Nowadays, the institutions that manage the HTML standards are the World Wide Web Consortium (W3C) and the Web Hypertext Application Technology Working Group (WHATWG). The first is a formal non-profit organization, while the second is more of a community of experts interested in the development of the technology. The objective of establishing these open standards is for all computers to understand the common components of the web, with the goal of interoperability regardless of machine, operating system, or implementation. The first version to be published as a formal standard was HTML 2.0. Before that, HTML was more of a one-team effort, led by Tim Berners-Lee at the European Organization for Nuclear Research (CERN), to share structured information through communication networks.

There was some previous work, called Standard Generalized Markup Language (SGML), standardized by the International Organization for Standardization (ISO). The designers decided to use the SGML framework to design HTML, so the early versions of the language could be called SGML-based languages. As of today, HTML5 is not an SGML-based language anymore, but some ideas are still very present, for example, the concept of tags. On the other hand, SGML is so general that the use of the "less than" and "greater than" symbols for tag delimitation can be configured in an SGML-compliant document, whereas it was fixed for HTML documents.

Elements

Some of the tags that we are more than used to were present in the very limited first version of HTML, for example <TITLE>, the headings <H1> ... <H6>, and the paragraph tag <P>. Others have since been deprecated, like <LISTING>, which has been obsolete since HTML 3.2. Back then, all the tags were written in capital letters, at least in the documentation and the examples. The first page that was created for the World Wide Web is still accessible, with the original markup, and it still looks as it is supposed to look in modern browsers.

Browser

At the same time that HTML was being developed, the need for a browser arose, and the same team addressed it. They created what was called "WorldWideWeb", which happened to be the first browser in history, and the first web page editor as well, for the NeXT computer. The initial release happened on December 25, 1990. As can be easily imagined, it was the only browser in existence. It was later renamed "Nexus", to avoid confusion with the whole concept of the WWW, and the source code was released into the public domain on April 30, 1993. After the release, some web browsers appeared for different families of computers and operating systems, which helped to amplify the impact of the WWW. Due to technical difficulties, many implementations of new browsers dropped the editor feature, shaping what we understand now as a standard browser.

The Aftermath

We all know the aftermath. As of June 2018, it is estimated that there are more than 4.5 billion websites and between 40 and 50 billion web pages indexed by Google, making the WWW one of the technologies with the biggest impact in human history. It should be noted that while the Internet was developed in the USA, the Web was developed in Europe, which also makes it one of the biggest European contributions to mankind in modern history. Tim Berners-Lee, a British national, has been granted some of the most prestigious awards: the Turing Award, which is given by the Association for Computing Machinery (ACM) and could be considered the equivalent of the Nobel Prize for computer science; the Millennium Technology Prize, which is awarded by Technology Academy Finland for innovations that lead to a better life; and induction into the Internet Hall of Fame, which is managed by the Internet Society.

It has been a long road towards browsing web pages on our smartphones, but it has been traveled very quickly. The future looks bright.

Image: HTML 1.0

HTML 2.0

HTML 2.0 included everything from the original 1.0 specifications but added a few new features to the mix. HTML 2.0 was the standard for website design until January 1997 and defined many core HTML features for the first time.

March 1993: Lou Montulli releases the Lynx browser version 2.0a

Lou Montulli was one of the first people to write a text-based browser, Lynx. The Lynx browser was a text-based browser for terminals and for computers that used DOS without Windows. Lou Montulli was later recruited to work with Netscape Communications Corp. but nonetheless remained partially loyal to the idea of developing HTML as an open standard, proving a real asset to the HTML working group and the HTML Editorial Board in years to come. Lou's enthusiasm for good, expensive wine and his knowledge of excellent restaurants in the Silicon Valley area were to make the standardization of HTML a much more pleasurable process.

Early 1993: Dave Raggett begins to write his own browser

While Eric Bina and the NCSA Mosaic gang were hard at it hacking through the night, Dave Raggett of Hewlett-Packard Labs in Bristol was working part-time on his Arena browser, on which he hoped to demonstrate all sorts of newly invented features for HTML.

April 1993: The Mosaic browser is released

In April 1993, version 1 of the Mosaic browser was released for Sun Microsystems Inc.'s workstation, a computer used in software development running the UNIX operating system. Mosaic extended the features specified by Tim Berners-Lee; for example, it added images, nested lists, and fill-out forms. Academics and software engineers later would argue that many of these extensions were very ad hoc and not properly designed.

Late 1993: Large companies underestimate the importance of the Web

Dave Raggett's work on the Arena browser was slow because he had to develop much of it single-handedly: no money was available to pay for a team of developers. This was because Hewlett-Packard, in common with many other large computer companies, was quite unconvinced that the Internet would be a success; indeed, the need for a global hypertext system simply passed them by. For many large corporations, the question of whether or not any money could be made from the Web was unclear from the outset.

There was also a misconception that the Internet was mostly for academics. In some companies, senior management was assured that the telephone companies would provide the technology for global communications of this sort, anyway. The result was that individuals working in research labs in the commercial sector were unable to devote much time to Web development. This was a bitter disappointment to some researchers, who gratefully would have committed nearly every waking moment toward shaping what they envisioned would be the communications system of the future.

Dave Raggett, realizing that there were not enough working hours left for him to succeed at what he felt was an immensely important task, continued writing his browser at home. There he would sit at a large computer that occupied a fair portion of the dining room table, sharing its slightly sticky surface with paper, crayons, Lego bricks, and bits of half-eaten cookies left by the children. Dave also used the browser to show text flow around images, forms, and other aspects of HTML at the First WWW Conference in Geneva in 1994. The Arena browser was later used for development work at CERN.

May 1994: NCSA assigns commercial rights for Mosaic browser to Spyglass, Inc.

In May 1994, Spyglass, Inc. signed a multi-million dollar licensing agreement with NCSA to distribute a commercially enhanced version of Mosaic. In August of that same year, the University of Illinois at Urbana-Champaign, the home of NCSA, assigned all future commercial rights for NCSA Mosaic to Spyglass.

May 1994: The first World Wide Web conference is held in Geneva, with HTML+ on show

Although Marc Andreessen and Jim Clark had commercial interests in mind, the rest of the World Wide Web community had quite a different attitude: they saw themselves as joint creators of a wonderful new technology, which certainly would benefit the world. They were jiggling with excitement. Even quiet and retiring academics became animated in the discussion, and many seemed evangelical about their newfound god of the Web.

At the first World Wide Web conference organized by CERN in May 1994, all was merry with 380 attendees - who mostly were from Europe but also included many from the United States. You might have thought that Marc Andreessen, Jim Clark, and Eric Bina surely would be there, but they were not. For the most part, participants were from the academic community, from institutions such as the World Meteorological Organization, the International Center for Theoretical Physics, the University of Iceland, and so on. Later conferences had much more of a commercial feel, but this one was for technology enthusiasts who instinctively knew that this was the start of something big.

At the World Wide Web conference in Geneva. Left to right: Joseph Hardin from NCSA, Robert Cailliau from CERN, Tim Berners-Lee from CERN, and Dan Connolly (of HTML 2 fame) then working for Hal software.

During the course of that week, awards were presented for notable achievements on the Web; these awards were given to Marc Andreessen, Lou Montulli, Eric Bina, Rob Hartill, and Kevin Hughes. Dan Connolly, who proceeded to define HTML 2, gave a slide presentation entitled Interoperability: Why Everyone Wins, which explained why it was important that the Web operated with a proper HTML specification. Strange to think that at least three of the people who received awards at the conference were later to fly in the face of Dan's idea that adopting a cross-company uniform standard for HTML was essential.

Dave Raggett had been working on some new HTML ideas, which he called HTML+. At the conference, it was agreed that the work on HTML+ should be carried forward to lead to the development of an HTML 3 standard. Dave Raggett, together with CERN, developed Arena further as a proof-of-concept browser for this work. Using Arena, Dave Raggett, Henrik Frystyk Nielsen, Håkon Lie, and others demonstrated text flow around a figure with captions, resizable tables, image backgrounds, math, and other features.

A panel discussion at the Geneva conference. Kevin Altis from Intel, Dave Raggett from HP Labs, Rick `Channing' Rodgers from the National Library of Medicine.

The conference ended with a glorious evening cruise on board a paddle steamer around Lake Geneva with Wolfgang and the Werewolves providing Jazz accompaniment.

September 1994: The Internet Engineering Task Force (IETF) sets up an HTML working group

In early 1994, an Internet Engineering Task Force working group was set up to deal with HTML.

The Internet Engineering Task Force is the international standards and development body of the Internet and is a large, open community of network designers, operators, vendors, and researchers concerned with the evolution and smooth operation of Internet architecture. The technical work of the IETF is done in working groups, which are organized by topic into several areas; for example, security, network routing, and applications. The IETF is, in general, part of a culture that sees the Internet as belonging to The People. This was even more so in the early days of the Web.

The feelings of the good ol' days of early Web development are captured in the song The Net Flag, which can be found `somewhere on the Internet'. The first verse runs as follows:

The people's web is deepest red,
And oft it's killed our routers dead.
But ere the bugs grew ten days old,
The patches fixed the broken code.
Chorus:
So raise the open standard high
Within its codes we'll live or die
Though cowards flinch and Bill Gates sneers
We'll keep the net flag flying here.

In keeping with normal IETF practices, the HTML working group was open to anyone in the engineering community: any interested computer scientist could potentially become a member and, once on its mailing list, could take part in an email debate. The HTML working group met approximately three times a year, during which time they would enjoy a good haggle about HTML features present and future, be pleasantly suffused with coffee and beer, and stride about plush hotel lobbies sporting ponytails, T-shirts, and jeans without the slightest care.

July 1994: HTML specification for HTML 2 is released

During 1993 and early 1994, lots of browsers had added their own bits to HTML; the language was becoming ill-defined. To make sense of the chaos, Dan Connolly and colleagues collected all the HTML tags that were widely used and collated them into a draft document that defined the breadth of what Tim Berners-Lee called HTML 2. The draft was then circulated through the Internet community for comment. With the patience of a saint, Dan took into account numerous suggestions from HTML enthusiasts far and wide, ensuring that all would be happy with the eventual HTML 2 definition. He also wrote a Document Type Definition for HTML 2, a kind of mathematically precise description of the language.

November 1994: Netscape is formed

During 1993, Marc Andreessen apparently felt increasingly irritated at simply being on the Mosaic project rather than in charge of it. Upon graduating, he decided to leave NCSA and head for California where he met Jim Clark, who was already well known in Silicon Valley and who had money to invest. Together they formed Mosaic Communications, which then became Netscape Communications Corp. in November 1994. What they planned to do was create and market their very own browser.

The browser they designed was immensely successful - so much so in fact, that for some time to come, many users would mistakenly think that Netscape invented the Web. Netscape did its best to make sure that even those who were relying on a low-bandwidth connection - that is, even those who only had a modem-link from a home personal computer - were able to access the Web effectively. This was greatly to the company's credit.

Following a predictable path, Netscape began inventing its own HTML tags as it pleased without first openly discussing them with the Web community. Netscape rarely made an appearance at the big International WWW conferences, but it seemed to be driving the HTML standard. It was a curious situation and one that the inner core of the HTML community felt they must redress.

Late 1994: The World Wide Web Consortium forms

The World Wide Web Consortium was formed in late 1994 to fulfill the potential of the Web through the development of open standards. They had a strong interest in HTML. Just as an orchestra insists on the best musicians, so the consortium recruited many of the best-known names in the Web community. Headed up by Tim Berners-Lee, here are just some of the players in the band today (1997):

Members of the World Wide Web Consortium at the MIT site. From left to right are Henrik Frystyk Nielsen, Anselm Baird-Smith, Jay Sekora, Rohit Khare, Dan Connolly, Jim Gettys, Tim Berners-Lee, Susan Hardy, Jim Miller, Dave Raggett, Tom Greene, Arthur Secret, Karen MacArthur.

Dave Raggett on HTML; from the United Kingdom.
Arnaud le Hors on HTML; from France.
Dan Connolly on HTML; from the United States.
Henrik Frystyk Nielsen on HTTP and on enabling the Web to go faster; from Denmark.
Håkon Lie on style sheets; from Norway. He is located in France, working at INRIA.
Bert Bos on style sheets and layout; from the Netherlands.
Jim Miller on investigating technologies that could be used in rating the content of Web pages; from the United States.
Chris Lilley on style sheets and font support; from the United Kingdom.

The W3 Consortium is based in part at the Laboratory for Computer Science at Massachusetts Institute of Technology in Cambridge, Massachusetts, in the United States; and in part at INRIA, the Institut National de Recherche en Informatique et en Automatique, a French governmental research institute. The W3 Consortium is also located in part at Keio University in Japan. You can look at the Consortium's Web pages on `www.w3.org'.

The consortium is sponsored by some companies that directly benefit from its work on standards and other technology for the Web. The member companies include Digital Equipment Corp.; Hewlett-Packard Co.; IBM Corp.; Microsoft Corp.; Netscape Communications Corp.; and Sun Microsystems Inc., among many others.

Through 1995: HTML is extended with many new tags

In 1995, all kinds of new HTML tags and attributes emerged. Some, like the BGCOLOR attribute of the BODY element and FONT FACE, which control stylistic aspects of a document, found themselves in the black books of the academic engineering community. `You're not supposed to be able to do things like that in HTML,' they would protest. It was their belief that such things as text color, background texture, font size, and font face were definitely outside the scope of a language whose only intent was to specify how a document would be organized.

HTML 3.0

March 1995: HTML 3 is published as an Internet-Draft

Dave Raggett had been working for some time on his new ideas for HTML, and at last, he formalized them in a document published as an Internet-Draft in March 1995. All manner of HTML features were covered. A new tag for inserting images called FIG was introduced, which Dave hoped would supersede IMG, as well as a whole gamut of features for marking up math and scientific documents. Dave dealt with HTML tables and tabs, footnotes, and forms. He also added support for style sheets by including a STYLE tag and a CLASS attribute. The latter was to be available on every element to encourage authors to give HTML elements styles, much as you do in desktop publishing.

Although the HTML 3 draft was very well received, it was somewhat difficult to get it ratified by the IETF. The belief was that the draft was too large and too full of new proposals. To get consensus on a draft 150 pages long and about which everyone wanted to voice an opinion was optimistic - to say the least. In the end, Dave and the inner circle of the HTML community decided to call it a day.

Of course, browser writers were very keen on supporting HTML 3 - in theory. Inevitably, each browser writer chose to implement a different subset of HTML 3's features as they were so inclined, and then proudly proclaimed to support the standard. The confusion was mind-boggling, especially as browsers even came out with extensions to HTML 3, implying to the ordinary gent that normal HTML 3 was, of course, already supported. Was there an official HTML 3 standard or not? The truth was that there was not, but reading the computer press you might never have known the difference.

March 1995: A furor over the HTML Tables specification

Dave Raggett's HTML 3 draft had tackled the tabular organization of information in HTML. Arguments over this aspect of the language had continued for some time, but now it was time to really get going. At the 32nd meeting of the IETF in Danvers, Massachusetts, Dave found a group from the SGML brethren who were up in arms over part of the tables specification because it contradicted the CALS table model. Groups such as the US Navy use the CALS table model in complex documentation. After a long negotiation, Dave managed to placate the CALS table delegates and altered the draft to suit their needs. HTML tables, which were not in HTML originally, finally surfaced from the HTML 3 draft to appear in HTML 3.2. They continue to be used extensively to provide a layout grid for organizing pictures and text on the screen.

August 1995: Microsoft's Internet Explorer browser comes out

Version 1.0 of Microsoft Corp.'s Internet Explorer browser was announced. This browser was eventually to compete with Netscape's browser, and to evolve its own HTML features. To a certain extent, Microsoft built its business on the Web by extending HTML features. The ActiveX feature made Microsoft's browser unique, and Netscape developed a plug-in called Ncompass to handle ActiveX. This whole pattern, whereby one browser experiments with an extension to HTML only to find others adding support to keep up, continues to the present.

In November 1995, Microsoft's Internet Explorer version 2.0 arrived for its Windows NT and Windows 95 operating systems.

September 1995: Netscape submits a proposal for frames

At this time, Netscape submitted a proposal for frames, which involved the screen being divided into independent, scrollable areas. The proposal was implemented on Netscape's Navigator browser before anyone really had time to comment on it, but nobody was surprised.

November 1995: The HTML working group runs into problems

The HTML working group was an excellent idea in theory, but in practice, things did not go quite as expected. With the immense popularity of the Web, the HTML working group grew larger and larger, and the volume of associated emails soared exponentially. Imagine one hundred people trying to design a house. `I want the windows to be double-glazed,' says one. `Yes, but shouldn't we make them smaller, while we're at it,' questions another. Still, others chime in: `What material do you propose for the frames - I'm not having them in plastic, that's for sure'; `I suggest that we don't have windows, as such, but include small, circular port-holes on the Southern elevation...' and so on.

You get the idea. The HTML working group emailed each other in a frenzy of electronic activity. In the end, its members became so snowed under with email that no time was left for programming. For software engineers, this was a sorry state of affairs, indeed: `I came back after just three days away to find over 2000 messages waiting,' was the unhappy lament of the HTML enthusiast.

Anyway, the HTML working group still was losing ground to the browser vendors. The group was notably slow in coming to a consensus on a given HTML feature, and commercial organizations were hardly going to sit around having tea, pleasantly conversing on the weather whilst waiting for the results of debates. And they did not.

November 1995: Vendors unite to form a new group dedicated to developing an HTML standard

In November 1995 Dave Raggett called together representatives of the browser companies and suggested they meet as a small group dedicated to standardizing HTML. Imagine his surprise when it worked! Lou Montulli from Netscape, Charlie Kindel from Microsoft, Eric Sink from Spyglass, Wayne Gramlich from Sun Microsystems, Dave Raggett, Tim Berners-Lee, and Dan Connolly from the W3 Consortium, and Jonathan Hirschman from Pathfinder convened near Chicago and made quick and effective decisions about HTML.

November 1995: Style sheets for HTML documents begin to take shape

Bert Bos, Håkon Lie, Dave Raggett, Chris Lilley, and others, from the World Wide Web Consortium and elsewhere, met in Versailles near Paris to discuss the deployment of Cascading Style Sheets. The name Cascading Style Sheets implies that more than one style sheet can interact to produce the final look of the document. Using a special language, the CSS group proposed that everyone would soon be able to write simple styles for HTML, as one would do in Microsoft Word and other desktop publishing software packages. The SGML contingent, who preferred a Lisp-like language called DSSSL - it rhymes with whistle - seemed out of the race when Microsoft promised to implement CSS on its Internet Explorer browser.
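The idea that several style sheets interact to produce the final look can be sketched in a few lines of Python. This is only a toy model, assuming that each sheet is a plain dictionary and that a later sheet simply overrides an earlier one; the real CSS cascade also weighs selector specificity, origin, and importance. The function name cascade and the three sample sheets are invented for this illustration.

```python
def cascade(*style_sheets):
    """Merge style sheets in order; declarations in later sheets win."""
    final = {}
    for sheet in style_sheets:
        for selector, declarations in sheet.items():
            # Keep what earlier sheets said, then apply this sheet on top.
            final.setdefault(selector, {}).update(declarations)
    return final


# Three hypothetical sheets, from lowest to highest priority in this sketch.
browser_defaults = {"h1": {"font-size": "2em", "color": "black"}}
reader_sheet = {"p": {"font-size": "larger"}}
author_sheet = {"h1": {"color": "navy"}, "p": {"font-family": "serif"}}

final = cascade(browser_defaults, reader_sheet, author_sheet)
print(final["h1"])  # size from the browser defaults, color from the author
print(final["p"])   # size from the reader, font family from the author
```

No single sheet fully describes the heading or the paragraph; the final look only emerges once the sheets have been combined.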

November 1995: Internationalization of HTML Internet Draft

Gavin Nicol, Gavin Adams, and others presented a long paper on the internationalization of the Web. Their idea was to extend the capabilities of HTML 2, primarily by removing the restriction on the character set used. This would mean that HTML could be used to mark up text in languages beyond those covered by the Latin-1 character set, taking in a wider variety of alphabets and scripts, such as those that read from right to left.
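The Latin-1 restriction is easy to see with a small experiment. The sketch below uses Python's built-in codecs and the unicodedata module as stand-ins; it is not taken from the internationalization draft itself, and the helper names (fits_in_latin1, is_right_to_left) and sample words are assumptions chosen for illustration.

```python
import unicodedata


def fits_in_latin1(text: str) -> bool:
    """Return True if every character can be stored in the Latin-1 character set."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def is_right_to_left(text: str) -> bool:
    """Return True if the text contains characters with a right-to-left direction."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


samples = {
    "French": "café",
    "Greek": "Ελληνικά",
    "Hebrew": "עברית",
    "Japanese": "日本語",
}

for language, word in samples.items():
    # Report whether Latin-1 can hold the word and whether it reads right to left.
    print(language, fits_in_latin1(word), is_right_to_left(word))
```

Only the French sample fits in Latin-1, which is why lifting the character set restriction mattered for the rest of the world's scripts.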

December 1995: The HTML working group is dismantled

Since the IETF HTML working group was having difficulties coming to a consensus swiftly enough to cope with such a fast-evolving standard, it was eventually dismantled.

February 1996: The HTML ERB is formed

Following the success of the November 1995 meeting, the World Wide Web Consortium formed the HTML Editorial Review Board to help with the standardization process. This board consisted of representatives from IBM, Microsoft, Netscape, Novell, Softquad, and the W3 Consortium, and did its business via telephone conference and email exchanges, meeting approximately once every three months. Its aim was to collaborate and agree upon a common standard for HTML, thus putting an end to the era when browsers each implemented a different subset of the language. The bad fairy of incompatibility was to be banished from the HTML kingdom forever, or one could hope so, perhaps.

Dan Connolly of the W3 Consortium, also the author of HTML 2, deftly accomplished the feat of chairing what could be quite a raucous meeting of the clans. Dan managed to make sure that all representatives had their say and listened to each other's point of view in an orderly manner. A strong chair was absolutely essential in these meetings.

In preparation for an ERB meeting, specifications describing new aspects of HTML were made electronically available for ERB members to read. Then, at the meeting itself, the proponent explained some of the rationale behind the specification, and then dearly hoped that all who were present also concurred that the encapsulated ideas were sound. Questions such as, `should a particular feature be included, or should we kick it out,' would be considered. Each representative would air his point of view. If all went well, the specification might eventually see daylight and become a standard. At the time of writing, the next HTML standard, code-named Cougar, has begun its long journey in this direction.

The BLINK tag was ousted in an HTML ERB meeting. Netscape would only abolish it if Microsoft agreed to get rid of MARQUEE; the deal was struck and both tags disappeared. Both of these extensions have always been considered slightly goofy by all parties. Many tough decisions were to be made about the OBJECT specification. Out of the chaos of several different tags - EMBED, APP, APPLET, DYNSRC, and so on - all associated with embedding different types of information in HTML documents, a single OBJECT tag was chosen in April 1996. This OBJECT tag becomes part of the HTML standard, but not until 1997.

April 1996: The W3 Consortium working draft on Scripting comes out

Based on an initial draft by Charlie Kindel, and, in turn, derived from Netscape's extensions for JavaScript, a W3C working draft on the subject of Scripting was written by Dave Raggett. In one form or another, this draft should eventually become part of standard HTML.

July 1996: Microsoft seems more interested than first imagined in open standards

In April 1996, Microsoft's Internet Explorer became available for Macintosh and Windows 3.1 systems.

Thomas Reardon had been excited by the Web even at the second WWW conference held in Darmstadt, Germany in 1995. One year later, he seemed very interested in the standardization process and apparently wanted Microsoft to do things the right way with the W3C and with the IETF. Traditionally, developers are somewhat disparaging about Microsoft, so this was an interesting turn of events. It should be said that Microsoft did, of course, invent tags of their own, just as did Netscape. These included the remarkable MARQUEE tag that caused great mirth among the more academic HTML community. The MARQUEE tag made text dance about all over the screen - not exactly a feature you would expect from a serious language concerned with structural mark-ups such as paragraphs, headings, and lists.

The worry that a massive introduction of proprietary products would kill the Web continued. Netscape acknowledged that vendors needed to push ahead of the standards process and innovate. They pointed out that, if users like a particular Netscape innovation, then the market would drive it to become a de-facto standard. This seemed quite true at the time and, indeed, Netscape has innovated on top of that standard again. It's precisely this sequence of events that Dave Raggett and the World Wide Web Consortium were trying to avoid.

December 1996: Work on `Cougar' is begun

The HTML ERB became the HTML Working Group and began to work on `Cougar', the next version of HTML, with completion expected in late spring 1997, eventually to become HTML 4. With all sorts of innovations for the disabled and support for international languages, as well as providing style sheet support, extensions to forms, scripting, and much more, HTML 4 breaks away from the simplicity and charm of HTML of earlier years!

Dave Raggett, co-editor of the HTML 4 specification, is at work composing at the keyboard at his home in Boston.

More and more people were getting into the HTML game around now, and while the previous standards offered some decent abilities to webmasters (as they became known), they thirsted for more abilities and tags. They wanted to enhance the look of their sites.

This is where the trouble started. A company called Netscape was the clear leader in the browser market at the time, with a browser called Netscape Navigator. To appease the cries of the HTML authors, they introduced new proprietary tags and attributes into their Netscape Navigator browser. These new abilities were called Netscape extension tags. This caused big problems as other browsers tried to replicate the effects of these tags so as not to be left behind but could not get their browsers to display things the same way. This meant that if you designed a page with Netscape ETs, the page would look bad in other browsers. This caused confusion and irritation for the markup pioneers.

At this time, an HTML working group led by a man named Dave Raggett introduced a new HTML draft, HTML 3.0. It included many new and improved abilities for HTML and promised far more powerful opportunities for webmasters to design their pages. Sadly, the browsers were awfully slow in implementing any of the new improvements, only adding in a few and leaving out the rest. Partly, this failure can be attributed to the size of the overhaul, and so the HTML 3.0 spec was abandoned.

Thankfully, the people in charge noted this and so future improvements were always designed to be modular. This meant they could be added in stages, which makes it easier on the browser companies.

January 1997: HTML 3.2 is ready

Success! In January 1997, the W3 Consortium formally endorsed HTML 3.2 as an HTML cross-industry specification. HTML 3.2 had been reviewed by all member organizations, including major browser vendors such as Netscape and Microsoft. This meant that the specification was now stable and approved by most Web players. By providing a neutral forum, the W3 Consortium had successfully obtained agreement upon a standard version of HTML. There was great rejoicing, indeed. HTML 3.2 took the existing IETF HTML 2 standard and incorporated features from HTML+ and HTML 3. HTML 3.2 included tables, applets, text flow around images, subscripts, and superscripts.

One might well ask why HTML 3.2 was called HTML 3.2 and not, let's say, HTML 3.1 or HTML 3.5. The version number is open to discussion just as much as any other aspect of HTML, and it is often one of the last details to be decided.

HTML 4.01

HTML 4 was a large shake-up of the HTML standards. HTML 4.0 became a W3C Recommendation in December 1997 and was revised in April 1998; HTML 4.01 followed in December 1999. The HTML language you have learned is constantly evolving to meet the needs of a growing Internet. Things get added, some things get taken away and still more elements are asked to fade out gracefully. These changes ensure that designers have the freedom and power available to create increasingly complex websites and can achieve this efficiently.

Revisions only happen every few years, and the changes are made by the World Wide Web Consortium (W3C), which is HTML’s governing body, as it were. They convene and design the specifications that we all work with when creating websites (CSS was designed by the W3C too). They look for weaknesses in HTML that are holding the web back and sort them out, which makes creating compelling websites easier for everybody.

The standard we were all working with before this was HTML 3.2. That was used for a while before the W3C decided to step it up another notch a few years ago. They released HTML 4. Sometime later, when some minor errors in the specification were uncovered, they fixed these and called the final specification HTML 4.01. As of now, HTML 4.01 is the accepted standard, and the majority of web users do have browsers that support it fully. Some of the more peripheral new elements have yet to gain full support in the latest round of browsers, but they’re on the way. Modern browsers will generally have no problem with anything in these specs.

Versions

If you have used any software you will have undoubtedly noticed how every few months it advances its number. I used to use Firefox 2 until they improved it and it became Firefox 2.1. Adding a decimal to the version number signifies a minor change to the original. When major changes are made to a software project, they will move up a whole number to version 3. This is the same way most dynamic things work. As you can see, the original HTML 3.0 spec was revised to version 3.2 before the big change to 4, and a minor change to 4.01.

There was some confusion when HTML 4 started being discussed, as at the time version 4 browsers like Internet Explorer 4 were making their appearance and people thought there was some connection. In reality, the two separate things had just reached those versions simultaneously, not because of each other. As you know, browser technology has advanced to version 7 stages and beyond by now, and HTML is still at level 4. So there’s no real connection; though, that said, it was in the version 4 browsers that HTML 4 started being incorporated properly. Glad that’s cleared up.

A few months after HTML 4.0 was released, its documentation was updated to correct some minor problems, and its version number was bumped up slightly. So the final final version of this standard is HTML 4.01.

DOCTYPEs Ahoy

Nowadays the Document Type Declaration (DOCTYPE) at the top of your document is very important if you want the browsers to render your page correctly. It names the Document Type Definition (DTD) that your page is written against. Without it, browsers might interpret your code more loosely, and you may have display errors. The HTML 4.01 DOCTYPEs are below. Take your pick.

Use the strict DTD if you’re using pure, structural code with no hacks:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

The transitional DOCTYPE, below, is the most commonly used, and still permits you to use certain old elements that we will eventually stop using altogether. It is probably the best choice until you’ve gotten to know HTML really well. Once you’re ready, you can start using the stricter DOCTYPE above.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

Finally, for frameset pages, use the frameset DTD:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

Simply add this line of code to the very top of your HTML pages (before the opening <html> tag), and you’re away. You will also need to specify the character encoding of your page. The best encoding to use is UTF-8, an encoding of Unicode, which allows you to type almost any character you want (like punctuation, letters with accents, etc.) directly into your content. Add this element in between your page’s <head> and </head>:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Once you have added your DOCTYPE and encoding, run your page through the HTML validator to see if you’re obeying the rules.
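
Putting the DOCTYPE and the encoding line together, a minimal HTML 4.01 Strict page might look something like the sketch below. The title and paragraph text are placeholders made up for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <!-- Declare the encoding early so the browser reads the rest of the page correctly -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>Example page</title>
</head>
<body>
  <!-- With UTF-8 the accented letter can be typed directly, no entity needed -->
  <p>Welcome to the café.</p>
</body>
</html>
```

The file itself must also be saved as UTF-8 in your editor, otherwise the declared encoding and the real one will disagree.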

The New Elements In HTML 4

There are 22 elements new to HTML in the 4 specifications, and they cover all the areas, from text formatting to tables to frames and the rest. Most of these elements have been covered in tutorials elsewhere here in HTML Source, and so where appropriate there are links to these in-depth pages. Most of the text formatting elements will make the text they mark up look a little different. You can see the effects of these elements in the text formatting list.

<abbr>

This is used to show an ABBReviated version of a word, and to offer the full version. When a reader leaves their mouse on the word, the full version pops up.
The code would be <abbr title="abbreviation">abbr.</abbr>

<acronym>

Similar to the one above, this is used for special abbreviations called acronyms (initialed abbreviations that can be spoken as a word themselves). It works the same way.
<acronym title="North Atlantic Treaty Organisation">NATO</acronym>

<bdo>

Most text is read from left to right, but some languages, such as Hebrew, are read the opposite way. This new tag allows the browser to display your page correctly if you use one of these languages, and also lets you pull off effects like reversed text. The text is typed in normally in the source code, but the tag changes its direction. RTL means ‘Right To Left’.
Code: <bdo dir="rtl">crazy text</bdo>

<button>

This is a simple way of adding form buttons to your pages. What’s more, you can now format the text and put images and other elements on the button.
<button><b>click</b> me</button>

<colgroup>

A table tag that allows you to affect the attributes of an entire column with one line of code.
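
As a rough sketch, with cell contents and the width value made up for illustration, a single colgroup line can set the width of several columns at once:

```html
<table border="1">
  <!-- One colgroup sets the width of both columns it spans -->
  <colgroup span="2" width="120"></colgroup>
  <tr><td>Name</td><td>Email</td></tr>
  <tr><td>Ann</td><td>ann@example.com</td></tr>
</table>
```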

<del>

Wrapping this around some text creates a strike-through effect to signify DELeted text, so you can show readers what once existed without cutting it out altogether.
<del>waffle</del>

<fieldset>

This allows you to group buttons and things together, giving you a framed container to hold them in. It works together with the <legend> tag, described further down.

<frame>

Strangely, this has only been made part of the official specifications now, despite it having been in use for years. It has been given new style attributes but overall it’s the same as before.

<frameset>

This tag is in the same boat as its friend above. Nothing’s really new.

<iframe>

Once a proprietary Internet Explorer tag, this was such a smart idea that it has been assimilated into HTML proper.

<ins>

This stands for INSerted text. It works in conjunction with the <del> tag, and the inserted text appears with an underline.
<del>waffle</del><ins>quality literature</ins>

<label>

This allows you to give form elements a label. Clicking the label functions like clicking the element (a radio button, for example) itself. This improves Forms Accessibility.
<label for="choice1">Choice 1</label> <input type="radio" id="choice1">

<legend>

When using a fieldset element, this element must come first before any other content inside the fieldset. It gives the title of the group.
<fieldset><legend>Contact Info</legend> Email:<input type="text"> Address:<input type="text"></fieldset>

<noframes>

Another part of the frames umbrella is being formally added to HTML 4.

<noscript>

In the same vein as the tag above, this provides alternative content for readers whose browsers can’t run JavaScript.
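
A small sketch of how the two tags pair up (the clock message is an invented example): browsers that run scripts show the time, and everyone else sees the fallback paragraph.

```html
<script type="text/javascript">
  // Runs only where scripting is available
  document.write("The time on your computer is " + new Date().toLocaleTimeString());
</script>
<noscript>
  <p>Your browser does not run JavaScript, so the clock is not shown.</p>
</noscript>
```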

<object>

This is set to become the do-it-all tag for inserting multimedia into your page and is supposed to take over from img, applet, and other embedding tags.
<object data="picture.gif" type="image/gif"></object>

<optgroup>

With this tag, you can group together many option elements which are part of a select field, and give the groups titles.
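
For instance, a menu grouped into two titled sections could be written like this. The field name and the options are hypothetical.

```html
<select name="dish">
  <!-- The label attribute gives each group its title -->
  <optgroup label="Starters">
    <option value="soup">Soup</option>
    <option value="salad">Salad</option>
  </optgroup>
  <optgroup label="Mains">
    <option value="pasta">Pasta</option>
    <option value="curry">Curry</option>
  </optgroup>
</select>
```

The group titles are shown in the list but cannot be selected themselves.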

<param>

This tag is used to set PARAMeters for ActiveX, Applets and objects. It existed before HTML 4.01 but now is official code.

<span>

This tag was brought in specifically to work with stylesheets in applying classes and ids. It does nothing on its own, but it is great for applying your styles to text.

<tbody>

A new table tag that allows you to give attributes to a block of cells with this one tag.

<tfoot>

Allows you to add a footer part to your tables.

<thead>

This allows you to add a header part to a table. It comes first in the code. Note that in HTML 4.01 <tfoot> must also be written before <tbody>, even though the footer is displayed at the bottom of the table.
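
Here is a sketch of the three table sections working together, with invented cell contents. It follows the HTML 4.01 ordering, where the footer is written before the body but displayed after it.

```html
<table border="1">
  <thead>
    <tr><th>Item</th><th>Price</th></tr>
  </thead>
  <!-- Written before tbody in HTML 4.01, but rendered at the bottom -->
  <tfoot>
    <tr><td>Total</td><td>5.00</td></tr>
  </tfoot>
  <tbody>
    <tr><td>Tea</td><td>2.00</td></tr>
    <tr><td>Cake</td><td>3.00</td></tr>
  </tbody>
</table>
```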

<q>

If you’ve ever used the blockquote tag, you’ll know it’s a big tag. How many letters are in that, ten?! This is much more like it and is suitable for shorter quotations. Plus, it adds in the quotation marks for you. It will not add in the line breaks you get with blockquote.

The new Attributes

These new attributes are here to allow stylesheet implementation, with two more reflecting the new international concerns that the W3C has taken on board in this new draft. They can be applied to almost any element.

class

This is how you give your page elements and text their classes from your stylesheet.

dir

This is the attribute that is used mostly with the new <bdo> tag above. Your possible values are rtl (right-to-left) or the default ltr (left-to-right).

id

ids are just like classes, except that each id may be used only once per page. They can also be used with JavaScript and DHTML.
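
A short sketch of class and id in use follows. The names warning and intro are made up for illustration, and in a full page the style block would sit inside the head.

```html
<style type="text/css">
  .warning { color: red; }         /* a class can be reused on many elements */
  #intro   { font-weight: bold; }  /* an id should appear only once per page */
</style>

<p id="intro">Welcome to the site.</p>
<p>Please <span class="warning">back up your files</span> first.</p>
<p class="warning">This cannot be undone.</p>
```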

lang

This attribute sets off a block of text as text typed in a foreign LANGuage, so that search engines and browsers know, and don’t just take it as badly spelled English. It will not translate anything for you, it’s just some behind-the-scenes help for things other than readers.

You can denote the text using the span tag, like

<span lang="fr">Bonjour!</span>.

If you’re going to use it, have a look at the common language codes.

title

This is one of my favorite things that came with HTML 4.01. It allows you to add in tooltip text, like the alt attribute; but now you can add it to absolutely anything. You can give table cells titles, add extra information to your links, and even hide jokes in your code that will only appear when a reader hovers over a specific word or sentence in your text.

Deprecated Elements

A deprecated element is one that is on the way out but has been given a grace period before it is dropped from the standard. There are much better elements than these available now, so your usage of them should be downscaled as much as possible.

<applet>: Used to add Java applets. Use the new <object> element instead.
<basefont>: Used to affect text on the whole page. Use stylesheets instead.
<center>: Used to center elements. Use <div align="center"> or stylesheets.
<dir>: Used to make directory lists. Use <ul> instead.
<font>: Ah, the classic font element. Still good for small things, but stylesheets have taken over. This is one element you should really try to avoid using.
<isindex>: Just use the input tag.
<menu>: Another type of list that is redundant thanks to the ul element.
<s>: Creates strike-through effects. Use the stylesheets again, or the new del element.
<strike>: Same as above, use style.
<u>: The underlining element, use stylesheets or ins instead.
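
To make the swap concrete, here is one possible before-and-after. The notice class is a name invented for this sketch, and the CSS only roughly matches the look of the old markup.

```html
<!-- Deprecated presentational markup -->
<center><font color="red" size="4">Sale <u>today</u> only</font></center>

<!-- Structural markup, with the presentation moved to a stylesheet -->
<style type="text/css">
  .notice    { text-align: center; color: red; font-size: large; }
  .notice em { font-style: normal; text-decoration: underline; }
</style>
<p class="notice">Sale <em>today</em> only</p>
```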

Dead Elements

These are the elements that were so useless that they’re out on their asses for good. Never use these, you can’t guarantee the browsers will continue to support them. All three of these elements have been replaced by one new element — so you can see how they were useless.

<listing>, <plaintext>, and <xmp>.
Use pre instead. This creates PREformatted text (text which follows its layout in your code).

XHTML Explained

HTML began as a simple way to transfer data between any computer across the Internet; designed for scientists and researchers with no publishing experience. Over time the web became mainstream entertainment and new tags were brought in by the browser companies that didn't go along with this original aim — presentation became hugely important and structure and compatibility started to take a back seat. This meant that some pages were not accessible for people with the 'wrong' browser or computer setup.

Thankfully, the use of extraneous presentational tags has receded in recent times, mainly due to the innovation of CSS. Ideal HTML would be purely structural, with every aspect of how a page is displayed being controlled by a stylesheet. The W3C (HTML's overseers, whom you should know something about by now) have spearheaded this desire with XHTML.

Further to all that, in recent times the Internet has begun to be accessed through new devices other than the classic computer and web browser arrangement. Things like PDAs, phones, and, er, fridges with Internet access are going to become common soon. There's an estimate going around that sometime soon, 75% of Internet viewing will be carried out on one of these many new platforms. The custom-made browsers used in these systems need to be small for cost-effectiveness. For every markup error that a browser has to deal with, more code has to be added to the program. XHTML is a very, very strict way of coding, which means the system makers don't have to accommodate for bad markup.

What is XHTML?

Before I describe XHTML, it is probably best to understand where it has come from. All web Markup languages are based on SGML, a horrendously complicated language that is not designed for humans to write. SGML is what is called a metalanguage; that is, a language that is used to define other languages. To make its power available to web developers, SGML was used to create XML (eXtensible Markup Language), a simplified version, and also a metalanguage.

XML is a powerful format — you create your own tags and attributes to suit the type of document you're writing. By using a set group of tags and attributes and following the rules of XML, you've created a new Markup language.

This is what has been done to create XHTML (eXtensible HyperText Markup Language), which is why you'll see XHTML being called an application of XML. The pre-existing HTML 4.01 tags and attributes were used as the vocabulary of this new Markup language, with XML providing the rules of how they are put together.

So, using XHTML, you are really writing XML code, but restricting yourself to a predetermined set of elements. This gives you all the benefits of XML (see below) while avoiding the complications of true XML; bridging the gap for developers who might not fancy taking on something as tricky as full-on XML. As you're coming under the guise of XHTML, all of the tags available to you should be familiar. Writing XHTML requires that you follow the rules of conformant XML, such as correct syntax and structure. As XHTML looks so much like classic HTML, it faces no compatibility problems as long as some simple coding guidelines are followed.

If all of this sounds a bit heavy, don't worry. Transitioning to XHTML is quite a simple process, with only a few rules to remember.

Benefits of XHTML

The benefits of adopting XHTML now or migrating your existing site to the new standards are many. First of all, they ensure excellent forward compatibility for your creations. XHTML is the new set of standards that the web will be built on in the years to come, so future-proofing your work early will save you much trouble later on. Future browser versions might stop supporting deprecated elements from old HTML drafts, and so many old basic HTML sites may start displaying incorrectly and unpredictably.

Once you have used XHTML for a short time, it is no more difficult to use than HTML ever was, and in ways is easier since it is built on a more simplified set of standards. Writing code is a more streamlined experience, as gone are the days of browser hacks and display tricks. Editing your existing code is also a nicer experience as it is infinitely cleaner and more self-explanatory. Browsers can also interpret and display a clean XHTML page quicker than one with errors that the browser may have to handle.

A well-written XHTML page is more accessible than an old-style HTML page and is guaranteed to work in any standards-compliant browser (which the latest round have finally become) due to the insistence on rules and sticking to accepted W3C specifications. As mentioned above, XHTML allows greater access to configurations other than a computer and browser. This interoperability is another aspect of XHTML's greater accessibility.

XHTML Coding

The first thing you need to know about changing over to XHTML as the new standard is that there really isn't much new to learn. No new tags or attributes have been added to your repertoire, as happened with HTML 4 (although a few have been deprecated); this is just a move towards good, valid, and efficient coding. XHTML documents stress logical structure and simplicity and use CSS for nearly all presentational concerns. It just means you have to change the way you write code. Even if you always wrote great code before, there are a few new practices you need to add.

What's even better, though, is that a page written entirely in XHTML will still work fine in the current generation of browsers, so you shouldn't have any problems migrating your site across.

XML Declaration

An XML declaration at the very top of your document defines both the version of XML you're using and the character encoding. It is recommended but not required, and a few old browsers will choke on a page that begins this way. For this reason, I advise against using the otherwise correct line:

<?xml version="1.0" encoding="UTF-8"?>

and recommend instead using a meta tag in your document. If you're using Unicode, use

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

And if you're using the more common ISO-8859-1 encoding, use

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

XHTML DTDs

Whether you use the XML declaration or not, every XHTML document must be defined as such by a line of code at the start of the page, and by some attributes in the main <html> tag, which tell the browser what language the text is in. The opening line is the DOCTYPE (Document Type Declaration). This tells your browser and validators the nature of your page.

The DOCTYPE points to a DTD (Document Type Definition), a file that defines the names and attributes of all of the possible tags that you can use in your markup. Browsers generally have their knowledge of the specs built in rather than reading this file, while validators check your page against it. The official XHTML Strict DTD is available for you to attempt to read. Declare it by putting this at the very top of your code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

That DTD is the one you use if you're committed to writing entirely correct XHTML code. Strict XHTML dispenses with a whole lot of presentational tags and attributes and is indeed very strict.

If you choose to use it, you're going to have to become close friends with the W3C validator. You won't be permitted to use the font tag at all, nor will attributes like width and height be allowed in your tables. You won't be able to use the border attribute on images and will have to use the alt attribute on all images if you want to validate. You get the idea — almost all presentational attributes are restricted in favor of wider CSS utilization, so unless you know your stuff in this regard, it'd be best to use XHTML Transitional below.

If you're going to hover between HTML and XHTML use the next DTD, which is a bit looser, and if you're putting together a frameset page, use the last one.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

Most people will opt for XHTML Transitional, as changing straight to Strict can be a daunting prospect. If you feel you're able to work within Strict's constraints, by all means, go for it.

A correct DOCTYPE allows the browser to go into standards mode, which will render your page correctly, and similarly across browsers. Without a full DOCTYPE, your browser enters ‘compatibility’, or ‘quirks’ mode, behaving like a version 4 browser, with all the associated quirks and inconsistencies. Also, these declarations are all case-sensitive, so don't change them in any way.

Finally, you need to define the XML Namespace your document uses. Don't stress about this: it's simply a definition of which set of tags you're going to be using, and concerns the modular properties of XHTML. It's set by adding an attribute to the <html> tag. While we're at it, we specify the language of our pages too. Modify your <html> tags to this:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> </html>
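
Assembled into one file, the DOCTYPE, namespace, language, and encoding pieces from this section give a skeleton along these lines. The title and paragraph are placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Encoding set with a meta tag rather than an XML declaration -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>Example XHTML page</title>
</head>
<body>
  <p>Hello, world.</p>
</body>
</html>
```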

XHTML Coding Practices

And now the moment you've been waiting for — the different styles of coding used by an XHTML author compared to the old HTML methods. You shouldn't have any problems adopting these new techniques, so long as you work carefully. It should be noted, even if it is an obvious point, that you really must hand code to be able to write valid XHTML. No current visual editor comes close to the compliance required.

source tip: Even though your code is changing, your filenames won't have to — you end your files with .html as always.

1. Tags and attributes have to be lowercase
Whereas before it used to come down to preference whether you used <B> or <b>, now all of your tags and attributes have to be in lowercase. This is because XML is case-sensitive — i.e. a tag in capitals is a different tag to one in small letters.

2. All tags must be closed
Now all of those once-optional </p> and </li> tags are essential for your XHTML documents to validate. Even empty elements like img, hr, and br must now be closed. You can use a standard forward-slashed end tag, or just add in a forward slash to the end of the tag.

<br /> or <br></br>

It's recommended that you use the former method here, and leave a space before the slash so older browsers aren't confused. They'll just ignore the trailing slash as an unrecognized attribute.

3. Documents must be well-formed
'Well-formedness' is a dream that you were meant to try and make real from the start, but many coders write badly-formed code. You have to open and close tags correctly in XHTML and nest them properly.

Bad: <p>My coding is <b>bad</p></b>
Good: <p>But my coding is <b>good</b></p>

Remember the simple rule you should have been taught at the very start: The first tag you open is the last tag you close.

4. Attribute values must be quoted
Back in HTML, you could leave out the quotes on a number value, like HEIGHT=3, but now all values have to have quotation marks around them, so that would become height="3".

5. Attribute Minimisation
Some HTML tags had one-word attributes, like HR's NOSHADE. You can't use these anymore, and must add the attribute in as its own value, like so:

<hr noshade="noshade" />

Any browser compatible with HTML 4.01 shouldn't have a problem with markup like this.

6. Internal Links
Internal links in HTML were made using a combination of the <a> tag and the name attribute. In XHTML, to go along with XML, you use the id attribute to make these links instead of the name attribute. For a while, you should probably include both on your anchors so that your links still work on older browsers, but id will be the method used in the future. The name attribute has been deprecated for this purpose.

<a href="#section">link</a> <a id="section" name="section"></a>

Since almost all tags can take the id attribute, you can now make links to any element on your page. This is most helpful if you add the id to a heading or specific paragraph.

7. Alternative text in images
While it has always been good practice to add the alt attribute to your images, now you must add some alternate text to every image on your page. If your image is purely decorative you can give it an empty alt attribute:

<img src="header.gif" alt="" />

You could also try adding the title attribute to as many elements as possible. It's a good accessibility aid, especially on links.

8. Ampersands in URLs
Ampersand characters are frequently used in page addresses to carry variables, like in PHP. When coding these addresses into your XHTML, you must escape them using the entity &amp;. They'll be displayed as ampersand characters (&) on screen, of course.

<a href="reviews.php?page=27&style=blue">link</a>
becomes
<a href="reviews.php?page=27&amp;style=blue">link</a>

9. Content must be wrapped in a block-level element
In XHTML Strict, when you add text to your page, you can’t add it directly into the body element. All text needs to be within a suitable containing block-level element, such as a p, a ul or a div.
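
The rules above can be seen side by side in a small conversion sketch. The file names and form field are invented for illustration; the first fragment is the kind of loose markup older HTML tolerated, and the second is one way to write the same content under XHTML 1.0 Strict.

```html
<!-- Old-style HTML: uppercase tags, unquoted values, unclosed elements, loose text -->
<BODY>
Welcome to my page
<P>Pick a colour:<BR>
<INPUT TYPE=radio NAME=colour VALUE=red CHECKED> Red
<IMG SRC=divider.gif>
<P><A HREF="list.php?page=2&sort=name">Next page</A>
</BODY>

<!-- The same content following the XHTML rules -->
<body>
  <p>Welcome to my page</p>
  <p>Pick a colour:<br />
    <input type="radio" name="colour" value="red" checked="checked" /> Red
    <img src="divider.gif" alt="" />
  </p>
  <p><a href="list.php?page=2&amp;sort=name">Next page</a></p>
</body>
```

Running the converted page through the W3C validator is the easiest way to catch any rule you have missed.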

HTML5

After HTML 4.01 and XHTML 1.0, the guys who were in control of HTML’s direction got sidetracked working on a new proposal for XHTML 2. At the same time, clever web developers were innovating constantly, hacking new functionality into websites and browsers. The path that XHTML 2 was taking started to look both boring and unrealistic, and it became pretty clear that a new approach was needed.

It was around this time that a bunch of pragmatic web technology fans, browser programmers, and specification writers started building something of their own, outside of the usual W3C procedures. They called themselves the Web Hypertext Application Technology Working Group (WHATWG) and developed a new spec. After some soul-searching, the W3C decided that HTML was still the future of the web. XHTML 2 was discontinued and HTML5 became the new specification that everyone’s effort should be poured into.

HTML5 is designed for the web, both now and in the future. This is the specification that we will be working with for the next decade at least, so the process of its development is relatively slow and considered. Many parts will be familiar, but there are also plenty of new elements, attributes, and abilities to get excited about. You can check the latest version of the spec if you want all the detail. A full tutorial on HTML Source about the changes in HTML5 is forthcoming.