Javascript

Searchable Dynamic Content With AJAX Crawling





 



 


Google Search likes simple, easy-to-crawl websites. You like dynamic websites that show off your work and that really pop. But search engines can’t run your JavaScript. That cool AJAX routine that loads your content is hurting your SEO.

Google’s robots parse HTML with ease; they can pull apart Word documents, PDFs and even images from the far corners of your website. But as far as they’re concerned, AJAX content is invisible.

The Problem With AJAX

AJAX has revolutionized the Web, but it has also hidden its content. If you have a Twitter account, try viewing the source of your profile page. There are no tweets there — just code! Almost everything on a Twitter page is built dynamically through JavaScript, and the crawlers can’t see any of it. That’s why Google developed AJAX crawling.

Because Google can’t get dynamic content from HTML, you will need to provide it another way. But there are two big problems: Google won’t run your JavaScript, and it doesn’t trust you.

Google indexes the entire Web, but it doesn’t run JavaScript. Modern websites are little applications that run in the browser, but running those applications as they index is just too slow for Google and everyone else.

The trust problem is trickier. Every website wants to come out first in search results; your website competes with everyone else’s for the top position. Google can’t just give you an API to return your content because some websites use dirty tricks like cloaking to try to rank higher. Search engines can’t trust that you’ll do the right thing.

Google needs a way to let you serve AJAX content to browsers while serving simple HTML to crawlers. In other words, you need the same content in multiple formats.

Two URLs For The Same Content

Let’s start with a simple example. I’m part of an open-source project called Spiffy UI. It’s a Google Web Toolkit (GWT) framework for REST and rapid development. We wanted to show off our framework, so we made SpiffyUI.org using GWT.

GWT is a dynamic framework that puts all of our content in JavaScript. Our index.html file looks like this:

<body>
   <script type="text/javascript" language="javascript"
   src="org.spiffyui.spsample.index.nocache.js"></script>
</body>

Everything is added to the page with JavaScript, and we control our content with hash tags (I’ll explain why a little later). Every time you move to another page in our application, you get a new hash tag. Click on the “CSS� link and you’ll end up here:

http://www.spiffyui.org#css

The URL in the address bar will look like this in most browsers:

http://www.spiffyui.org/?css

We’ve fixed it up with HTML5. I’ll show you how later in this article.

This simple hash works well for our application and makes it bookmarkable, but it isn’t crawlable. Google doesn’t know what a hash tag means or how to get the content from it, but it does provide an alternate method for a website to return content. So, we let Google know that our hash is really JavaScript code instead of just an anchor on the page by adding an exclamation point (a “bang�), like this:

http://www.spiffyui.org#!css

This hash bang is the secret sauce in the whole AJAX crawling scheme. When Google sees these two characters together, it knows that more content is hidden by JavaScript. It gives us a chance to return the full content by making a second request to a special URL:

http://www.spiffyui.org?_escaped_fragment_=css

The new URL has replaced the #! with ?_escaped_fragment_=. Using a URL parameter instead of a hash tag is important, because parameters are sent to the server, whereas hash tags are available only to the browser.

That new URL lets us return the same content in HTML format when Google’s crawler requests it. Confused? Let’s look at how it works, step by step.

Snippets Of HTML

The whole page is rendered in JavaScript. We needed to get that content into HTML so that it is accessible to Google. The first step was to separate SpiffyUI.org into snippets of HTML.

Google still thinks of a website as a set of pages, so we needed to serve our content that way. This was pretty easy with our application, because we have a set of pages, and each one is a separate logical section. The first step was to make the pages bookmarkable.

Bookmarking

Most of the time, JavaScript just changes something within the page: when you click that button or pop up that panel, the URL of the page does not change. That’s fine for simple pages, but when you’re serving content through JavaScript, you want give users unique URLs so that they can bookmark certain areas of your application.

JavaScript applications can change the URL of the current page, so they usually support bookmarking via the addition of hash tags. Hash tags work better than any other URL mechanism because they’re not sent to the server; they’re the only part of the URL that can be changed without having to refresh the page.

The hash tag is essentially a value that makes sense in the context of your application. Choose a tag that is logical for the area of your application that it represents, and add it to the hash like this:

http://www.spiffyui.org#css

When a user accesses this URL again, we use JavaScript to read the hash tag and send the user to the page that contains the CSS.

You can choose anything you want for your hash tag, but try to keep it readable, because users will be looking at it. We give our hashes tags like css, rest and security.

Because you can name the hash tag anything you want, adding the extra bang for Google is easy. Just slide it between the hash and the tag, like this:

http://www.spiffyui.org#!css

You can manage all of your hash tags manually, but most JavaScript history frameworks will do it for you. All of the plug-ins that support HTML4 use hash tags, and many of them have options for making URLs bookmarkable. We use History.js by Ben Lupton. It’s easy to use, it’s open source, and it has excellent support for HTML5 history integration. We’ll talk more about that shortly.

Serving Up Snippets

The hash tag makes an application bookmarkable, and the bang makes it crawlable. Now Google can ask for special escaped-fragment URLs like so:

screenshot

When the crawler accesses our ugly URL, we need to return simple HTML. We can’t handle that in JavaScript because the crawler doesn’t run JavaScript in the crawler. So, it all has to come from the server.

You can implement your server in PHP, Ruby or any other language, as long as it delivers HTML. SpiffyUI.org is a Java application, so we deliver our content with a Java servlet.

The escaped fragment tells us what to serve, and the servlet gives us a place to serve it from. Now we need the actual content.

Getting the content to serve is tricky. Most applications mix the content in with the code; but we don’t want to parse the readable text out of the JavaScript. Luckily, Spiffy UI has an HTML-templating mechanism. The templates are embedded in the JavaScript but also included on the server. When the escaped fragment looks for the ID css, we just have to serve CSSPanel.html.

The template without any styling looks very plain, but Google just needs the content. Users see our page with all of the styles and dynamic features:

screenshot

Google gets only the unstyled version:

screenshot

You can see all of the source code for our SiteMapServlet.java servlet. This servlet is mostly just a look-up table that takes an ID and serves the associated content from somewhere on our server. It’s called SiteMapServlet.java because this class also handles the generation of our site map.

Tying It All Together With A Site Map

Our site map tells the crawler what’s available in our application. Every website should have a site map; AJAX crawling doesn’t work without one.

Site maps are simple XML documents that list the URLs in an application. They can also include data about the priority and update frequency of the app’s pages. Normal entries for site maps look like this:

<url>
   <loc>http://www.spiffyui.org/</loc>
   <lastmod>2011-07-26</lastmod>
   <changefreq>daily</changefreq>
   <priority>1.0</priority>
</url>

Our AJAX-crawlable entries look like this:

<url>
   <loc>http://www.spiffyui.org/#!css</loc>
   <lastmod>2011-07-26</lastmod>
   <changefreq>daily</changefreq>
   <priority>0.8</priority>
</url>

The hash bang tells Google that this is an escaped fragment, and the rest works like any other page. You can mix and match AJAX URLs and regular URLs, and you can use only one site map for everything.

You could write your site map by hand, but there are tools that will save you a lot of time. The key is to format the site map well and submit it to Google Webmaster Tools.

Google Webmaster Tools

Google Webmaster Tools gives you the chance to tell Google about your website. Log in with your Google ID, or create a new account, and then verify your website.

screenshot

Once you’ve verified, you can submit your site map and then Google will start indexing your URLs.

And then you wait. This part is maddening. It took about two weeks for SpiffyUI.org to show up properly in Google Search. I posted to the help forums half a dozen times, thinking it was broken.

There’s no easy way to make sure everything is working, but there are a few tools to help you see what’s going on. The best one is Fetch as Googlebot, which shows you exactly what Google sees when it crawls your website. You can access it in your dashboard in Google Webmaster Tools under “Diagnostics.�

screenshot

Enter a hash bang URL from your website, and click “Fetch.� Google will tell you whether the fetch has succeeded and, if it has, will show you the content it sees.

screenshot

If Fetch as Googlebot works as expected, then you’re returning the escaped URLs correctly. But you should check a few more things:

  • Validate your site map.
  • Manually try the URLs in your site map. Make sure to try the hash-bang and escaped versions.
  • Check the Google result for your website by searching for site:www.yoursite.com.

Making Pretty URLs With HTML5

Twitter leaves the hash bang visible in its URLs, like this:

http://twitter.com/#!/ZackGrossbart

This works well for AJAX crawling, but again, it’s slightly ugly. You can make your URLs prettier by integrating HTML5 history.

Spiffy UI uses HTML5 history integration to turn a hash-bang URL like this…

http://www.spiffyui.org#!css

… into a pretty URL like this:

http://www.spiffyui.org?css

HTML5 history makes it possible to change this URL parameter, because the hash tag is the only part of the URL that you can change in HTML4. If you change anything else, the entire page reloads. HTML5 history changes the entire URL without refreshing the page, and we can make the URL look any way we want.

This nicer URL works in our application, but we still list the hash-bang version on our site map. And when browsers access the hash-bang URL, we change it to the nicer one with a little JavaScript.

Cloaking

Earlier, I mentioned cloaking. It is the practice of trying to boost a website’s ranking in search results by showing one set of pages to Google and another to regular browsers. Google doesn’t like cloaking and may remove offending websites from its search index.

AJAX-crawling applications always show different results to Google than to regular browsers, but it isn’t cloaking if the HTML snippets contain the same content that the user would see in the browser. The real mystery is how Google can tell whether a website is cloaking or not; crawlers can’t compare content programmatically because they don’t run JavaScript. It’s all part of Google’s Googley power.

Regardless of how it’s detected, cloaking is a bad idea. You might not get caught, but if you do, you’ll be removed from the search index.

Hash Bang Is A Little Ugly, But It Works

I’m an engineer, and my first response to this scheme is “Yuck!� It just feels wrong; we’re warping the purpose of URLs and relying on magic strings. But I understand where Google is coming from; the problem is extremely difficult. Search engines need to get useful information from inherently untrustworthy sources: us.

Hash bangs shouldn’t replace every URL on the Web. Some websites have had serious problems with hash-bang URLs because they rely on JavaScript to serve content. Simple pages don’t need hash bangs, but AJAX pages do. The URLs do look a bit ugly, but you can fix that with HTML5.

Further Reading

We’ve covered a lot in this article. Supporting AJAX crawling means that you need to change your client’s code and your server’s code. Here are some links to find out more:

Thanks to Kristen Riley for help with some of the images in this article.

(al)


© Zack Grossbart for Smashing Magazine, 2011.


Useful Node.js Tools, Tutorials And Resources


  

Created by Ryan Dahl in 2009, Node.js is a relatively new technology which has gained a lot of popularity among Web developers recently. However, not everyone knows what it really is. Node.js is essentially a server-side JavaScript environment that uses an asynchronous event-driven model. What this means is simple: it’s an environment which is intended for writing scalable, high performance network applications. It’s like Ruby’s Event Machine or Python’s Twisted, but it takes the event model a bit further—it presents the event loop as a language construct instead of as a library.

And that’s not all: what’s really great about Node.js is the thousands of modules available for any purpose, as well as the vibrant community behind this young project. In this round-up, you will find the most useful resources for Node.js, from handy tools to detailed tutorials, not to mention in-depth articles and resources on this promising technology. Do you use Node.js already? Let us know in the comments to this post!

Useful Node.js Tools

Node Express Boilerplate
Node Express Boilerplate gives the developer a clean slate, while bundling enough useful features to remove all of those redundant tasks that can derail a project before it even gets started.

Node Express Boilerplate

Socket.IO
Socket.IO is a cross-browser Web socket that aims to make real-time apps possible in every browser and mobile device, blurring the distinctions between the various transport mechanisms. It’s care-free real time, in JavaScript.

Socket.IO: Cross-browser WebSocket for realtime apps.

Mastering Node
With Mastering Node, you can write high-concurrency Web servers, using the CommonJS module system, Node.js’s core libraries, third-party modules, high-level Web development and more.

Mastering Node

Log.io
Your infrastructure may have hundreds of log files spread across dozens of machines. To help you monitor deployments and troubleshoot, Log.io lets you instantly see composite streams of log messages in a single user interface.

Log.io

Formaline
Formaline is a low-level, full-featured (Node.js) module for handling form requests (HTTP POSTs and PUTs) and for parsing uploaded files quickly. It is also ready to use with, for example, middleware such as Connect.

Formaline

LDAPjs
LDAPjs is a pure-JavaScript, from-scratch framework for implementing LDAP clients and servers in Node.js. It is intended for developers who are used to interacting with HTTP services in Node.js and Express.

ldapjs

Node Supervisor
This is a little supervisor script for Node.js. It runs your program and watches for code changes, so you can have hot-code reloading-ish behavior without worrying about memory leaks or having to clean up all of the inter-module references, and without a whole new require system.

Node Supervisor

Jade – Template Engine
Jade is a template engine for Node.js applications. It combines great power and flexibility with a nice and clean syntax.

Jade - Template Engine

Express
This is a Sinatra-inspired Web development framework for Node.js: fast, flexible and sexy.

Express - Node web framework

Hook.io
hook.io creates a distributed node.js EventEmitter that works cross-process / cross-platform / cross-browser. Think of it like a real-time event bus that works anywhere JavaScript is supported.

Hook.io

Node Package Manager
NPM is a package manager for node. You can use it to install and publish your node programs. It manages dependencies and does other cool stuff.

Node Package Manager

Node-QRcode
Despite being quite young, Node.js already has a huge number of libraries for every possible application. This one is a QR code generator.

Node QRCode Generator

NWM
NWM is a dynamic window manager for X that was written at NodeKO 2011. It uses libev to interface with X11, and it allows you to lay out windows in Node.js.

NWM - Node Window Manager

Bricks.js
Bricks.js is an advanced modular Web framework built on Node.js. It is highly flexible. Bricks.js can be used as a standalone static Web server, a basic routing framework or a multi-level Apache-like routing system; and it is modular enough to have the capability to completely switch out its routing engine.

Bricks.js

Node.js Modules
A list of almost all the Node.js most famous modules organized by category. This list definitively is worth a look.

Node.js Modules

90 open-source Node.js modules
Recently, Browserling released over 90 Node.js modules to the open-source community. Some of them are small and strange modules, others might be pretty useful for your next Node.js project.

90 Opensource Node.js modules

Calipso
Calipso is a content management system (CMS) based on the NodeJS server.

Calipso - A NodeJS CMS

PDFKit
PDFKit is a PDF document-generation library for Node.js that makes it easy to create complex, multi-page, printable documents. It is written in pure CoffeeScript, but you can use the API in plain ’ol JavaScript if you like. The API embraces chain-ability, and it includes both low-level functions as well as abstractions for higher-level functionality.

PDFKit - A PDF Generation Library for Node

Forever
A simple CLI tool to ensure that a given script runs continuously (i.e. forever).

Forever - Make Scripts run Forever

Introducing Node.js

Node.js Step by Step
Node.js is an amazing new technology, but unless you’re a JavaScript developer, the process of becoming acquainted with it can quickly become a bit overwhelming. If you want to learn how to use Node.js, this set of articles and screencasts might do the trick.

Node.js Step by Step

What Is Node.js?
Another interesting discussion on StackOverflow about what Node.js is and is not. Recommended for those who are approaching Node.js for the first time.

What is node.js? - Stack Overflow

Learning Server-Side JavaScript
Node.js is all the buzz at the moment, and it makes creating high-performance, real-time Web applications easy. It allows JavaScript to be used end to end, on both the server and client. This tutorial walks you through from installing Node.js and writing your first “Hello World� program to building a scalable streaming Twitter server.

Learning Server-Side JavaScript

Node.js Is Important: An Introduction
“Once in a while, you come across a technology and are blown away by it. You feel that something like this should have been around much earlier and that it will be a significant milestone, not just in your own life as a developer but in general.

Node.js is Important. An Introduction

The Secrets of Node’s Success
In the short time since its initial release in late 2009, Node.js has captured the interest of thousands of experienced developers, grown a package manager and a corpus of interesting modules and applications, and even spawned a number of start-ups. What is it about this technology that makes it interesting to developers? And why has it succeeded while other server-side JavaScript implementations linger in obscurity or fail altogether?

The secrets of Node’s success

Asynchronous Code Design with Node.js
The asynchronous event-driven I/O of Node.js is currently evaluated by many enterprises as a high-performance alternative to the traditional synchronous I/O of multi-threaded enterprise application server. The asynchronous nature means that enterprise developers have to learn new programming patterns, and unlearn old ones

Asynchronous Code Design with Node.js

A Giant Step Backwards?
In this article, Fenn Bailey expresses his opinion of Node.js and why he sometimes thinks Node.js is a step backward compared to other solutions.

A giant step backwards?

Node.js Is Backwards
A hot topic in computing is parallel programming in languages such as Erlang. Will JavaScript join the party?

Node.js is backwards

Videos And Screencasts On Node.js

Node.js Meetup: Distributed Web Architectures
A series of videos from the Node.js Meetup at Joyent headquarters, discussing how to build distributed Web architectures with Node.js.

Node.js Meetup: Distributed Web Architectures

Introduction to Node.js with Ryan Dahl
In this presentation Ryan Dahl, the man behind Node.js will introduce you to this event-driven I/O framework with a few examples showing Node.js in action.

Introduction to Node.js with Ryan Dahl

SenchaCon 2010: Server-side JavaScript with Node, Connect & Express on Vimeo
Node.js has unleashed a new wave of interest in server side Javascript. In this session, you’ll learn how to get productive with node.js by leveraging Connect and Express node middleware.

SenchaCon 2010: Server-side JavaScript with Node, Connect & Express on Vimeo

Technical Articles And Tutorials On Node.js

Proxying HTTP and Web Sockets in Node
This guide is geared to beginners and people who are unfamiliar with reverse HTTP proxying, Web socket proxying, load balancing, virtual host configuration, request forwarding and other Web proxying concepts.

Proxying HTTP and Websockets in Node

Bulletproof Node.js Coding
“Right around the time that I started the third refactoring/rewrite of the code, I felt like I had gotten a feel for how to write bulletproof code, and I thought it would be worth sharing some of the style and conventions I came to adopt.�

Bulletproof Node.js Coding

How to Write a Native Node.js Extension
In this tutorial, you will learn how to write a native Node.js extension the right way, from the very basics to packaging the extension for NPM.

How to write a native Node.js extension

Let’s Make a Web App: Nodepad
This series will walk you through building a Web app with Node.js, covering all of the major areas you’ll face when building your own applications.

Let’s Make a Web App: Nodepad

HTML5 Canvas Drawing with Web Sockets, Node.JS and Socket.io
Web sockets and canvas are two really cool features that are currently being implemented in browsers. This tutorial gives you a quick rundown of how they both work, and you’ll create a real-time drawing canvas that is powered by Node.js and Web sockets.

HTML5 Canvas Drawing with WebSockets, Node.JS & Socket.io

Developing Multiplayer HTML5 Games with Node.js
Inspired by the famous iOS game Osmos, developer Boris Smus has created an alternative version of the game using HTML5 canvas and Node.js. This article explains the main phases of the project.

Developing Multiplayer HTML5 Games with Node.js

Deploying Node.js on Amazon EC2
Amazon’s EC2 is a popular choice for cloud applications. This tutorial shows how Node.js can be deployed on an EC2 instance.

Deploying node.js on Amazon EC2

A Simple Node.js + CouchDB Calendar
In this tutorial by Chris Storm, you will learn how to build a Web calendar with Node.js and CouchDB.

A Simple Node.js + CouchDB Calendar

IIS7
The IISnode project provides a native IIS 7.x module that enables hosting of Node.js applications on IIS. The project uses the Windows build of node.exe, which has recently seen major improvements.

Hosting node.js applications on IIS

Node.js + Phone to Control a Browser Game
Someone wondered how easily a smart phone – specifically using its gyroscopes and accelerometers – could be used as a controller for a multi-player game on a larger screen. With a bit of Node.js and HTML5 magic, it turned out to be pretty simple.

Node.js + Phone to Control a Browser Game

Is There a Template Engine for Node.js?
An engaging discussion appeared on StackOverflow about the template engines that are available for Node.js. Really useful arguments came out of this discussion.

Blogs, Podcasts, Resources On Node.js

How to Node
How to Node is a community-supported blog created by Tim Caswell. Its purpose is to teach how to do various tasks in Node.js and the fundamental concepts needed to write effective code.

How To Node

Nodejitsu
A really interesting blog about scaling Node.js apps in the cloud and about the Node.js events in general.

Nodejitsu Blog

Node Up
A podcast that reviews Node.js, explains its philosophy and goes over many of its popular libraries.

Node Up: Node.js Podcast

Node Tuts
Free screencast tutorials.

Node Tuts - Node.js Free screencast tutorials

Minute With Node.js
Node.js is constantly changing and growing with each new version. New libraries and frameworks are coming out daily that allow you to write JavaScript for new and exciting projects that were previously impossible. This is a one-stop shop for news updates on the entire Node.js eco-system, with a heavy slant on hardcore nerdery.

Minute With Node.js

Felix’s Node.js Guide
Over the past few months, Felix have given a lot of talks and done a lot of consulting on Node.js. He found himself repeating a lot of things over and over, so he used some of his recent vacation to start this opinionated and unofficial guide to help people getting started in Node.js.

Felix’s Node.js Guide

Node.js Knockout
Node.js Knockout is a 48-hour hackathon for Node.js. It’s an online virtual competition, with contestants worldwide.

Node.js Knockout

References And Books

Node.JS Help Sheet
“Node.JS is an evented I/O framework for the V8 JavaScript engine. It’s ideal for writing scalable network programs, such as Web servers. We’ve been working on some exciting things with Node.js, and we felt it was only fair to share our knowledge in the form of an easy-to-read Help Sheet.�

Node.JS Help Sheet

The Node Beginner Book
The aim of this document is to get you started with developing applications for Node.js. It teaches you everything you need to know about advanced JavaScript along the way. It goes way beyond your typical “Hello World� tutorial.

The Node Beginner Book

Up and Running With Node.js
“Many people use the JavaScript programming languages extensively for programming the interfaces of websites. Node.js allows this popular programming language to be applied in many more contexts, in particular on Web servers. There are several notable features about Node.js that make it worthy of interest.�

Up and Running with Node.js

Poll: Do You Use Node.js In Your Projects?

How often have you used Node.js in your projects? Have you found some particular tools or articles useful? Share your experience in the comments to this post. Thank you.


Related Posts

You might be interested in the following related posts:

(al)


© Luca Degasperi for Smashing Magazine, 2011.


Behind The Scenes Of Nike Better World

Advertisement in Behind The Scenes Of Nike Better World
 in Behind The Scenes Of Nike Better World  in Behind The Scenes Of Nike Better World  in Behind The Scenes Of Nike Better World

Perhaps one of the most talked about websites in the last 12 months has been Nike Better World. It’s been featured in countless Web design galleries, and it still stands as an example of what a great idea and some clever design and development techniques can produce.

In this article, we’ll talk to the team behind Nike Better World to find out how the website was made. We’ll look at exactly how it was put together, and then use similar techniques to create our own parallax scrolling website. Finally, we’ll look at other websites that employ this technique to hopefully inspire you to build on these ideas and create your own variation.

Nike Better World

Nike-Better-World in Behind The Scenes Of Nike Better World

Nike Better World is a glimpse into how Nike’s brand and products are helping to promote sports and its benefits around the world. It is a website that has to be viewed in a browser (preferably a latest-generation browser, although it degrades well) rather than as a static image, because it uses JavaScript extensively to create a parallax scrolling effect.

A good deal of HTML5 is used to power this immersive brand experience and, whatever your views on Nike and its products, this website has clearly been a labor of love for the agency behind it. Although parallax scrolling effects are nothing new, few websites have been able to sew together so many different design elements so seamlessly. There is much to learn here.

An “Interactive Storytelling Experience�

In our opinion, technologies are independent of concept. Our primary focus was on creating a great interactive storytelling experience.

– Widen+Kennedy

Nike turned to long-time collaborator Widen+Kennedy (W+K), one of the largest independent advertising agencies in the world, to put together a team of four people who would create Nike Better World: Seth Weisfeld was the interactive creative director, Ryan Bolls produced, while Ian Coyle and Duane King worked on the design and interaction.

Wiedenkennedy in Behind The Scenes Of Nike Better World

I started by asking the team whether the initial concept for the website pointed to the technologies they would use. As the quote above reveals, they in fact always start by focusing on the concept. This is a great point. Too many of us read about a wonderful new technique and then craft an idea around it. W+K walk in the opposite direction: they create the idea first, and sculpt the available technologies around it.

So, with the concept decided on, did they consciously do the first build as an “HTML5 website,� or did this decision come later?

There were some considerations that led us to HTML5. We knew we wanted to have a mobile- and tablet-friendly version. And we liked the idea of being able to design and build the creative only once to reach all the screens we needed to be on. HTML5 offered a great balance of creativity and technology for us to communicate the Nike Better World brand message in a fresh and compelling way.

– W+K

HTML5 is still not fully supported in all browsers (read “in IE�) without JavaScript polyfills, so just how cross-browser compatible did the website have to be?

The primary technical objectives were for the site to be lightweight, optimized for both Web and devices, as well as to be scalable for future ideas and platforms.

– W+K

To achieve these goals, the website leans on JavaScript for much of the interactivity and scrolling effects. Later, we’ll look at how to create our own parallax scrolling effect with CSS and jQuery. But first, we should start with the template and HTML.

The Starting Boilerplate

It’s worth pointing out the obvious first: Nike Better World is original work and should not be copied. However, we can look at how the website was put together and learn from those techniques. We can also look at other websites that employ parallax scrolling and then create our own page, with our own code and method, and build on these effects.

I asked W+K if it starts with a template.

We started without a framework, with only reset styles. In certain cases, particularly with experimental interfaces, it ensures that complete control of implementation lies in your hands.

– W+K

If you look through some of the code on Nike Better World, you’ll see some fairly advanced JavaScript in a class-like structure. However, for our purposes, let’s keep things fairly simple and rely on HTML5 Boilerplate as our starting point.

Download HTML5 Boilerplate. The “strippedâ€� version will do. You may want to delete some files if you know you won’t be using them (crossdomain.xml, the test folder, etc.).

As you’ll see from the source code (see the final code below), our page is made up of four sections, all of which follow a similar pattern. Let’s look at one individual section:

<section class="story" id="first" data-speed="8" data-type="background">
 <div data-type="sprite" data-offsetY="950" data-Xposition="25%" data-speed="2"></div>
 <article>
  <h2>Background Only</h2>
  <div>
   <p>Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. Aenean ultricies mi vitae est. Mauris placerat eleifend leo.</p>
  </div>
 </article>
</section>

I’m not sure this is the best, most semantic use of those HTML5 tags, but it’s what we need to make this effect work. Normally, a section has a heading, so, arguably, the section should be a div and the article should be a section. However, as W+K points out, the HTML5 spec is still young and open to interpretation:

HTML5 is still an emerging standard, particularly in implementation. A lot of thought was given to semantics. Some decisions follow the HTML5 spec literally, while others deviate. As with any new technology, the first to use it in real-world projects are the ones who really shape it for the future.

– W+K

This is a refreshing interpretation. Projects like Nike Better World are an opportunity to “reality check� an emerging standard and, for the conscientious among us, to provide feedback on the spec.

In short, is the theory of the spec practical? W-K elaborates:

We use the article tag for pieces of content that can (and should) be individually (or as a group) syndicated. Each “story� is an article. We chose divs to wrap main content. We took the most liberty with the section tag, as we feel its best definition in the spec is as chapters of content, be it globally.

– W+K

As an aside (no pun intended!), HTML5 Doctor has begun a series of mark-up debates called Simplequizes, which are always interesting and illustrate that there is rarely one mark-up solution for any problem. Make sure to check them out.

In style.css, we can add a background to our section with the following code:

section { background: url(../images/slide1a.jpg) 50% 0 no-repeat fixed; }

We should also give our sections a height and width, so that the background images are visible:

.story { height: 1000px; padding: 0; margin: 0; width: 100%; max-width: 1920px; position: relative; margin: 0 auto; }

I’ve set the height of all our sections to 1000 pixels, but you can change this to suit your content. You can also change it on a per-section basis.

We have also made sure that the maximum width of the section is the maximum width of the background (1920 pixels), and we have specified a relative position so that we can absolutely position its children.

Here’s the page before adding JavaScript. It’s worth digging into the source code to see how we’ve duplicated the sections in the HTML and CSS.

Backgroundonly in Behind The Scenes Of Nike Better World

Even with this code alone, we already have a pleasing effect as we scroll down the page. We’re on our way.

The HTML5 Data Attribute

Before looking at parallax scrolling, we need to understand the new data attribute, which is used extensively throughout the HTML above.

Back in the good old days, we would shove any data that we wanted to be associated with an element into the rel attribute. If, for example, we needed to make the language of a story’s content accessible to JavaScript, you might have seen mark-up like this:

<article class='story' id="introduction" rel="en-us"></article>

Sometimes complex DOM manipulation requires more information than a rel can contain, and in the past I’ve stuffed information into the class attribute so that I could access it. Not any more!

The team at W+K had the same issue, and it used the data attribute throughout Nike Better World:

The data attribute is one of the most important attributes of HTML5. It allowed us to separate mark-up, CSS and JavaScript into a much cleaner workflow. Websites such as this, with high levels of interaction, are almost application-like behind the scenes, and the data attribute allows for a cleaner application framework.

– W+K

Sportisanaddress in Behind The Scenes Of Nike Better World

So, what is the data attribute? You can read about it in the official spec, which defines it as follows:

Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements.

– W+K

In other words, any attribute prefixed with data- will be treated as storage for private data; it does not affect the mark-up, and the user cannot see it. Our previous example can now be rewritten as:

<article class='story' id="introduction" data-language="en-us"></article>

The other good news is that you can use more than one data attribute per element, which is exactly what we’re doing in our parallax example. You may have spotted the following:

<div data-type="sprite" data-offsetY="100" data-Xposition="50%" data-speed="2"></div>

Here, we are storing four pieces of information: the x and y data offsets and the data speed, and we are also marking this element as a data type. By testing for the existence of data-type in the JavaScript, we can now manipulate these elements as we wish.

Parallax Scrolling

On our page, three things create the parallax scrolling illusion:

  • The background scrolls at the slowest rate,
  • Any sprites scroll slightly faster than the background,
  • Any section content scrolls at the same speed as the window.

With three objects all scrolling at different speeds, we have created a beautiful parallax effect.

AnExplanation in Behind The Scenes Of Nike Better World

It goes without saying that we don’t need to worry about the third because the browser will take care of that for us! So, let’s start with the background scroll and some initial jQuery.

$(document).ready(function(){
 // Cache the Window object
 $window = $(window);
// Cache the Y offset and the speed
$('[data-type]').each(function() {
  $(this).data('offsetY', parseInt($(this).attr('data-offsetY')));
  $(this).data('speed', $(this).attr('data-speed'));
});
// For each element that has a data-type attribute
 $('section[data-type="background"]').each(function(){
  // Store some variables based on where we are
  $(this).data('speed', parseInt($(this).attr('data-speed')));
   var $self = $(this),
   offsetCoords = $self.offset(),
   topOffset = offsetCoords.top;
   $(window).scroll(function(){
    // The magic will happen in here!
   }); // window scroll
 });	// each data-type
}); // document ready

First, we have our trusty jQuery document ready function, to ensure that the DOM is ready for processing. Next, we cache the browser window object, which we will refer to quite often, and call it $window. (I like to prefix jQuery objects with $ so that I can easily see what is an object and what is a variable.)

We also use the jQuery .data method to attach the Y offset (explained later) and the scrolling speed of the background to each element. Again, this is a form of caching that will speed things up and make the code more readable.

We then iterate through each section that has a data attribute of data-type="background" with the following:

$('section[data-type="background"]').each(function(){}

Already we can see how useful data attributes are for storing multiple pieces of data about an object that we wish to use in JavaScript.

Inside the .each function, we can start to build up a picture of what needs to be done. For each element, we need to grab some variables:

// Store some variables based on where we are
var $self = $(this),
    offsetCoords = $self.offset(),
    topOffset = offsetCoords.top;

We cache the element as $self (again, using the $ notation because it’s a jQuery object). Next, we store the offset() of the element in offsetCoords and then grab the top offset using the .top property of offset().

Finally, we set up the window scroll event, which fires whenever the user moves the scroll bar or hits an arrow key (or moves the trackpad or swipes their finger, etc.).

We need to do two more things: check that the element is in view and, if it is, scroll it. We test whether it’s in view using the following code:

// If this section is in view
if ( ($.Window.scrollTop() + $.Window.height()) > ($offsetCoords.top) &&
( ($offsetCoords.top + $self.height()) > $.Window.scrollTop() ) ) {
}

The first condition checks whether the very top of the element has scrolled into view at the very bottom of the browser window.

The second condition checks whether the very bottom of the element has scrolled past the very top of the browser window.

You could use this method to check whether any element is in view. It’s sometimes useful (and quicker) to process elements only when the user can see them, rather than when they’re off screen.

So, we now know that some part of the section element with a data-type attribute is in view. We can now scroll the background. The trick here is to scroll the background slower or faster than the browser window is scrolling. This is what creates the parallax effect.

Here’s the code:

// Scroll the background at var speed
// the yPos is a negative value because we're scrolling it UP!
var yPos = -($window.scrollTop() / $self.data('speed'));

// If this element has a Y offset then add it on
if ($self.data('offsetY')) {
  yPos += $self.data('offsetY');
}

// Put together our final background position
var coords = '50% '+ yPos + 'px';

// Move the background
$self.css({ backgroundPosition: coords });

The y position is calculated by dividing the distance that the user has scrolled from the top of the window by the speed. The higher the speed, the slower the scroll.

Next, we check whether there is a y offset to apply to the background. Because the amount that the background scrolls is a function of how far the window has scrolled, the further down the page we are, the more the background has moved. This can lead to a situation in which the background starts to disappear up the page, leaving a white (or whatever color your background is) gap at the bottom of each section.

The way to combat this is to give those backgrounds an offset that pushes them down the page an extra few hundred pixels. The only way to find out this magic offset number is by experimenting in the browser. I wish it was more scientific than this, but this offset really does depend on the height of the browser window, the distance scrolled, the height of your sections and the height of your background images. You could perhaps write some JavaScript to calculate this, but to me this seems like overkill. Two minutes experimenting in Firebug yields the same result.

The next line defines a variable coords to store the coordinates of the background. The x position is always the same: 50%. This was the value we set in the CSS, and we won’t change it because we don’t want the element to scroll sideways. Of course, you’re welcome to change it if you want the background to scroll sideways as the user scrolls up, perhaps to reveal something.

(Making the speed a negative number for slower scrolling might make more sense, but then you’d have to divide by -$speed. Two negatives seems a little too abstract for this simple demonstration.)

Finally, we use the .css method to apply this new background position. Et voila: parallax scrolling!

Here’s the code in full:

// Cache the Window object
$window = $(window);

// Cache the Y offset and the speed of each sprite
$('[data-type]').each(function() {
  $(this).data('offsetY', parseInt($(this).attr('data-offsetY')));
  $(this).data('speed', $(this).attr('data-speed'));
});

// For each element that has a data-type attribute
$('section[data-type="background"]').each(function(){

// Store some variables based on where we are
var $self = $(this),
    offsetCoords = $self.offset(),
    topOffset = offsetCoords.top;

$(window).scroll(function(){

// If this section is in view
if ( ($window.scrollTop() + $window.height()) > (topOffset) &&
( (topOffset + $self.height()) > $window.scrollTop() ) ) {

  // Scroll the background at var speed
  // the yPos is a negative value because we're scrolling it UP!
  var yPos = -($window.scrollTop() / $self.data('speed'));

  // If this element has a Y offset then add it on
  if ($self.data('offsetY')) {
    yPos += $self.data('offsetY');
  }

  // Put together our final background position
  var coords = '50% '+ yPos + 'px';

  // Move the background
  $self.css({ backgroundPosition: coords });

  }; // in view

}); // window scroll

});	// each data-type

Of course, what we’ve done so far is quite a bit simpler than what’s on Nike Better World. W+K admits that the parallax scrolling threw it some challenges:

The parallax scrolling presented a few small challenges in cross-browser compatibility. It took a little experimenting to ensure the best scrolling experience. In the end, it was less about the actual parallax effect and more about synchronized masking of layers during transitions.

– W+K

W+K also reveals how it maintained a fast loading and paint speed by choosing its tools wisely:

The key to maintaining faster speeds is to use native CSS where possible, because DOM manipulation slows down rendering, particularly on older browsers and processors.

– W+K

For example, the “Moreâ€� button below spins on hover, an effect achieved with CSS3. In browsers that don’t support CSS3 transforms, the purpose of the graphic is still obvious.

CSSuse in Behind The Scenes Of Nike Better World

Adding More Elements

Of course, one of the other common features of parallax scrolling is that multiple items on the page scroll. So far, we have two elements that move independently of each other: the first is the page itself, which scrolls when the user moves the scroll bar, and the second is the background, which now scrolls at at slower rate thanks to the jQuery above and the background-position CSS attribute.

For many pages, this would be enough. It would be a lovely effect for the background of, say, a blog. However, Nike and others push it further by adding elements that move at a different speed than that of the page and background. To make things easy — well, easier — I’m going to call these new elements sprites.

Here’s the HTML:

<div id="smashinglogo" data-type="sprite" data-offsetY="1200" data-Xposition="25%" data-speed="2"></div>

Put this just before the closing </article> tag, so that it appears behind the contents of <article>. First, we give the div an id so that we can refer to it specifically in the CSS. Then we use our HTML5 data attribute to store a few values:

  • The status of a sprite,
  • A y (vertical) offset of 1200 pixels,
  • An x (horizontal) position as a percentage,
  • A scrolling speed.

We give the x position of the sprite a percentage value because it is relative to the size of the viewport. If we gave it an absolute value, which you’re welcome to try, there’s a chance it could slide out of view on either the left or right side.

Now about that y offset…

Inception

This is the bit that’s going to mess with your noodle and is perhaps the hardest part of the process to grasp.

Thanks to the logic in the JavaScript, the sprite won’t move until the parent section is in view. When it does move, it will move at (in this case) half the speed. You need the vertical position, then, to account for this slower movement; elements need to be placed higher up if they will be scrolling more slowly and, therefore, moving less on the y axis.

We don’t know how far the user has to scroll before the section appears at the bottom of the page. We could use JavaScript to read the viewport size and then do some calculations based on how far down the page the section is positioned. But that is already sounding too complicated. There is an easier way.

What we do know is how far the user has scrolled before the current section is flush with the top of the viewport: they have scrolled the y offset of that particular section. (Put another way, they have scrolled the height of all of the elements above the current one.)

So, if there are four sections, each 1000 pixels high, and the third section is at the top of the viewport, then the user must have scrolled 2000 pixels, because this is the total height of the preceding sections.

If we want our sprite to appear 200 pixels from the top of its parent section, you’d figure that the total vertical offset we give it should be 2200 pixels. But it’s not, because this sprite has speed, and this speed (in our example) is a function of how far the page has been scrolled.

Let’s assume that the speed is set as 2, which is half the speed at which the page is scrolling. When the section is fully in view, then the window has scrolled 2000 pixels. But we divide this by the speed (2) to get 1000 pixels. If we want the sprite to appear 200 pixels from the top, then we need to add 200 to 1000, giving us 200 pixels. Therefore, the offset is 1200. In the JavaScript, this number is inverted to -1200 because we are pushing the background-position down off the bottom of the page.

Here’s a sketch to show this in action.

Parallax-sketch in Behind The Scenes Of Nike Better World

This is one of those concepts that is easier to understand when you view the page and source code and scroll around with the console open in Firebug or Developer Tools.

The JavaScript looks like this:

// Check for other sprites in this section
$('[data-type="sprite"]', $self).each(function() {

  // Cache the sprite
  $sprite = $(this);

  // Use the same calculation to work out how far to scroll the sprite
  var yPos = -($.Window.scrollTop() / $sprite.data('speed'));
  var coords = $sprite.data('Xposition') + ' ' + (yPos + $sprite.data('offsetY')) + 'px';
  $sprite.css({ backgroundPosition: coords });
}); // sprites

HTML5 Video

One criticism levelled at Nike Better World is that it didn’t use HTML5 video. HTML5 is still not fully supported across browsers (I’m looking at you, Internet Explorer), but for the purposes of this article, we’ll embrace HTML5 video, thanks to the lovely folks at Vimeo and Yum Yum London.

But we can’t set a video as a background element, so we have a new challenge: how to position and scroll this new sprite?

Well, there are three ways:

  1. We could change its margin-top property within its parent section;
  2. We could make it a position: absolute element and change its top property when its parent section comes into view;
  3. We could define it as position: fixed and set its top property relative to the viewport.

Because we already have code for the third, let’s grab the low-hanging fruit and adapt it to our purposes.

Here’s the HTML we’ll use, which I’ve placed after the closing </article> tag:

<video controls width="480" width="320" data-type="video" data-offsetY="2500" data-speed="1.5">
  <source src="video/parallelparking.theora.ogv" type="video/ogg" />
  <source src="video/parallelparking.mp4" type="video/mp4" />
  <source src="video/parallelparking.webm" type="video/webm" />
</video>

First, we’ve opened our HTML5 video element and defined its width and height. We then set a new data-type state, video, and defined our y offset and the speed at which the element scrolls. It’s worth nothing that some experimentation is needed here to make sure the video is positioned correctly. Because it’s a position: fixed element, it will scroll on top of all other elements on the page. You can’t cater to every viewport at every screen resolution, but you can play around to get the best compromise for all browser sizes (See “Bespoke to Brokeâ€� below).

The CSS for the video element looks like this:

video { position: fixed; left: 50%; z-index: 1;}

I’ve positioned the video 50% from the left so that it moves when the browser’s size is changed. I’ve also given it a z-index: 1. This z-index prevents the video element from causing rendering problems with neighbouring sections.

And now for the JavaScript! This code should be familiar to you:

// Check for any Videos that need scrolling
$('[data-type="video"]', $self).each(function() {

  // Cache the sprite
  $video = $(this);

  // Use the same calculation to work out how far to scroll the sprite
  var yPos = -($window.scrollTop() / $video.data('speed'));
  var coords = (yPos + $video.data('offsetY')) + 'px';

  $video.css({ top: coords });

}); // video

And there you have it! A parallax scrolling HTML5 video.

Bespoke or Broke

Of course, every design is different, which means that your code for your design will be unique. The JavaScript above will plug and play, but you will need to experiment with values for the y offset to get the effect you want. Different viewport sizes means that users will scroll different amounts to get to each section, and this in turn affects how far your elements scroll. I can’t control it any more than you can, so you have to pick a happy medium. Even Nike Better World suffers when the viewport’s vertical axis stretches beyond the height of the background images.

I asked W+K how it decides which effects to power with JavaScript and which are better suited to modern CSS techniques:

Key points that required complex interaction relied on JavaScript, while visual-only interactivity relied on CSS3. Additionally, fewer browsers support native CSS3 techniques, so items that were more important to cross-browser compatibility were controlled via JavaScript as well.

– W+K

This is a wonderful example of “real-world design.� So often we are bamboozled with amazing new CSS effects, and we make websites that sneer at older browsers. The truth is, for most commercial websites and indeed for websites like Nike Better World that target the biggest audience possible, stepping back and considering how best to serve your visitors is important.

W+K explains further:

We started by creating the best possible version, but always kept the needs of all browsers in mind. Interactive storytelling must balance design and technology to be successful. A great website usable in one or two browsers ultimately fails if you want to engage a wide audience.

– W+K

And Internet Explorer?!

IE was launched in tandem with the primary site. Only IE6 experienced challenges, and as a deprecated browser, it gracefully degrades.

– W+K

The Final Code

The code snippets in this piece hopefully go some way to explaining the techniques required for a parallax scrolling effect. You can extend them further to scroll multiple elements in a section at different speeds, or even scroll elements sideways!

Feel free to grab the full source code from GitHub, and adapt it as you see fit. Don’t forget to let us know what you’ve done, so that others can learn from your work.

Of course, remember that manipulating huge images and multiple sprites with JavaScript can have huge performance drawbacks. As Keith Clark recently tweeted:

KeithClark1 in Behind The Scenes Of Nike Better World

Test, test and test again. Optimize your images, and be aware that you may have to compromise to support all browsers and operating systems.

Tell A Story

Above and beyond the technical wizardry of parallax websites — some of the best of which are listed below — the common thread that each seems to embody is story. That’s what makes them great.

I asked W+K what it learned from the project:

That a strong voice, simplicity and beauty are integral to a great interactive storytelling experience. We often hear things like “content is king, and technology is just a tool to deliver it,â€� but when you’re able to successfully combine a powerful message with a compelling execution, it creates an experience that people really respond to and want to spend time with.

– W+K

We really have just scratched the surface of the work that goes into a website like Nike Better World. The devil is in the details, and it doesn’t take long to see how much detail goes into both the design and development.

However, if you have a compelling story to tell and you’re not afraid of a little JavaScript and some mind-bending offset calculations, then a parallax website might just be the way to communicate your message to the world.

More Examples

Nike wasn’t the first and won’t be the last. Here are some other great examples of parallax scrolling:

Manufacture d’essai

Manufacture-dEssai in Behind The Scenes Of Nike Better World

Yebo Creative

Yebocreative in Behind The Scenes Of Nike Better World

TEDx Portland

TEDxPortland1 in Behind The Scenes Of Nike Better World

Ben the Bodyguard

BenTheBodyguard in Behind The Scenes Of Nike Better World

Campaign Monitor Is Hiring

CampaignMonitor1 in Behind The Scenes Of Nike Better World

Nizo App

Nizo1 in Behind The Scenes Of Nike Better World

7 Best Things of 2010

7BestThings in Behind The Scenes Of Nike Better World

If you’ve seen or built a great parallax website, please let us know about it in the comments.

… If you need any more encouragement to create a website as compelling as these, here’s what the team at W+K used to put together Nike Better World: MacBook Air 13″, Canon 5D Mark II, Coda, Adobe Photoshop and Adobe Illustrator.

Thanks

Putting together this article took the cooperation of a number of people. I’d like to thank Seth, Ryan, Ian and Duane for answering my questions; Katie Abrahamson at W+K for her patience and for helping coordinate the interview; and Nike for allowing us to dissect its website so that others could learn.

(al)


© Richard Shepherd for Smashing Magazine, 2011.


My Favorite Programming Mistakes

Advertisement in My Favorite Programming Mistakes
 in My Favorite Programming Mistakes  in My Favorite Programming Mistakes  in My Favorite Programming Mistakes

Over my programming career, I have made a lot of mistakes in several different languages. In fact, if I write 10 or more lines of code and it works the first time, I’ll get a bit suspicious and test it more rigorously than usual. I would expect to find a syntax error or a bad array reference or a misspelled variable or something.

Mwnt-beach in My Favorite Programming Mistakes

Coastline near Mwnt on the west coast of Wales. Read on to find out why this is halfway to being a very special place.

I like to classify these mistakes into three broad groups: cock-ups (or screw-ups in American English), errors and oversights.

A cock-up is when you stare blankly at the screen and whisper “Oops�: things like deleting a database or website, or overwriting three-days worth of work, or accidentally emailing 20,000 people.

Errors cover everything, from simple syntax errors like forgetting a } to fatal errors and computational errors.

When an error is so subtle and hard to find that it is almost beautiful, I would call it an oversight. This happens when a block of code is forced to handle a completely unforeseen and very unlikely set of circumstances. It makes you sit back and think “Wowâ€�: like seeing a bright rainbow or shooting star, except a bit less romantic and not quite as impressive when described to one’s partner over a candlelit dinner.

This article discusses some of the spectacular and beautiful mistakes I have made, and the lessons learned from them. The last three are my favorites.

Leaving Debug Mode On

The first two mistakes in this article were full-fledged cock-ups.

When I first started freelancing, I wrote a set of PHP libraries for handling database queries, forms and page templating. I built a debugging mode into the libraries at a fairly deep level, which depended on a global variable called $DEBUG.

I also kept a local copy of every major website I worked on, for developing, debugging and testing. So, whenever a problem occurred, I could set $DEBUG=1; at the top of the page, and it would tell me various things, such as all the database statements it was running. I rarely used this debug method on live websites; it was for local usage only.

Except for one day when I was working late at night, debugging a minor problem on a popular e-commerce website. I put $DEBUG=1; at the top of several pages and was switching between them. It was all a tired midnight blur, but in the end I somehow added the debugging variable to the most important page on the website, the one after the user clicks “Pay now,� and I uploaded it to the live website.

The next morning, I went out early for the whole day. I got home at 9:00 pm to find 12 increasingly frustrated messages on my answering machine and a lot more emails. For about 20 hours, whenever a customer clicked pay, they saw something like this:

Debug-mode-db-statements in My Favorite Programming Mistakes

What customers saw when they clicked “Pay.�

It took me about 10 seconds to fix, but a lot longer to apologize to my client for a day’s worth of lost orders.

Lessons Learned

I held an internal inquiry into this issue and established the following:

  1. Avoid working late at night;
  2. Make a full test order whenever I make a change to the order processing, however minor;
  3. Make sure debug statements never see the light of day on a live website;
  4. Provide some emergency contact details for me and/or a back-up programmer.

Thoughtful Debugging

For the third requirement, I implemented a couple of functions like this, to make sure that debugging messages are outputted only when I am looking at the website:

function CanDebug() {
 global $DEBUG;
 $allowed = array ('127.0.0.1', '81.1.1.1');
 if (in_array ($_SERVER['REMOTE_ADDR'], $allowed)) return $DEBUG;
 else return 0;
}
function Debug ($message) {
  if (!CanDebug()) return;
  echo '<div style="background:yellow; color:black; border: 1px solid black;';
  echo 'padding: 5px; margin: 5px; white-space: pre;">';
  if (is_string ($message)) echo $message;
  else var_dump ($message);
  echo '</div>';
}

Then, whenever I want to output something for debugging, I call the Debug function. This calls CanDebug to check the requesting IP address and the $DEBUG variable. The $allowed array contains my IP address for local testing (127.0.0.1) and my broadband IP address, which I can get from WhatIsMyIPAddress.com.

Then I can output things like this:

$DEBUG = 1;
Debug ("The total is now $total"); //about a debugging message
Debug ($somevariable); //output a variable
Debug ("About to run: $query"); //before running any database query
mysql_query ($query);

And I can be confident that no one but me (or anyone sharing my IP address, such as my boss) will ever see any debugging messages. Assuming that the variables above were set, the above code would look like this:

Debug-yellow in My Favorite Programming Mistakes

Outputting debugging statements.

For extra safety, I could have also put the error messages inside HTML comments, but then I would have had to sift through the HTML source to find the bit I was looking for.

I have another related useful bit of code that I can put at the top of a page or configuration file to ensure that all PHP notices, warnings and errors will be shown to me and only me. If the person is not me, then errors and warnings will be outputted to the error log but not shown on screen:

if (CanDebug()) {ini_set ('display_errors', 1); error_reporting (E_ALL);}
else {ini_set ('display_errors', 0); error_reporting (E_ALL & ~E_NOTICE);}

Debuggers

The method above is useful for quickly finding errors in very specific bits of code. There are also various debugging tools, such as FirePHP and Xdebug, that can provide a huge amount of information about a PHP script. They can also run invisibly, outputting a list of every function call to a log file with no output to the user.

Xdebug can be used like this:

ini_set ('xdebug.collect_params', 1);
xdebug_start_trace ('/tmp/mytrace');
echo substr ("This will be traced", 0, 10);
xdebug_stop_trace();

This bit of code logs all function calls and arguments to the file /tmp/mytrace.xt, which will look like this:

Xdebug-example in My Favorite Programming Mistakes

Contents of an Xdebug stack trace showing every function call.

Xdebug also displays much more information whenever there is a PHP notice, warning or error. However, it needs to be installed on the server, so it is probably not possible in most live hosting environments.

FirePHP, on the other hand, works as a PHP library that interacts with an add-on to Firebug, a plug-in for Firefox. You can output stack traces and debugging information directly from PHP to the Firebug console — again, invisible to the user.

For both of these methods, a function like CanDebug above is still useful for making sure that not everyone with Firebug can view the stack traces or generate big log files on the server.

Turning Debug Mode Off

Debugging emailing scripts is more involved. Definitively testing whether a script is sending an email properly is hard without actually sending the email. Which I once did by mistake.

A few years ago, I was asked to create a bulk emailing script to send daily emails to over 20,000 subscribed users. During development, I used something similar to the CanDebug function above, so that I could test the emailing script without actually sending an email. The function to send emails looked something like this:

function SendEmail ($to, $from, $subject, $message) {
  if (CanDebug() >= 10) Debug ("Would have emailed $to:\n$message");
  else {
    if (CanDebug()) {$subject = "Test to $to: $subject"; $to = "test@test.com";}
    mail ($to, $subject, $message, "From: $from");
  }
}

If I set $DEBUG=1, it would send the emails (all 20,000 of them) to a test address that I could check. If I set $DEBUG=10, it would tell me that it was trying to send an email but not actually send anything.

Soon after launch, a problem arose with the script. I think it ran out of memory from doing some inefficient processing 20,000 times. At some point, I went into fix something, forgot to set my $DEBUG variable (or else my broadband IP address had inconveniently changed) and mistakenly emailed 20,000 people.

I apologized to the agency I was working for, but thankfully nothing much came of it. I guess that spam filters blocked many of the messages. Or perhaps the recipients were merely pleased that the email did not contain anything for them to do or read.

Lessons Learned

I was very glad that I just put “test� in the subject and message of the test email, and not some statement reflecting how frustrated I was getting at that particular bug. I learned a few lessons:

  1. Be extra careful when testing bulk emailing scripts — check that the debug mode is working.
  2. Send test emails to as few people as possible.
  3. Always send polite test messages, like “Please ignore, just testing.â€� Don’t say something like “My client is a ninny,â€� in case it gets sent to 20,000 unsuspecting investors.

PHP Blank Page

Now we’re in the realm of hard-to-spot errors, rather than cock-ups. If you’d like to see a hard-to-debug error in PHP, bury the following somewhere deep in your code:

function TestMe() {TestMe();}
TestMe();

Depending on the browser and the server’s Apache and PHP versions, you might get a blank page, a “This Web page is not available,â€� a fatal error due to running out of memory, or the option to “Saveâ€� or “Openâ€� the page, like this:

Test-save-as in My Favorite Programming Mistakes

Infinite recursion, as dealt with by Firefox 3.6.

It basically causes infinite recursion, which can cause a Web server thread to run out of memory and/or crash. If it crashes, a small trace may or may not be left in the error log:

[Mon Jun 06 18:24:10 2011] [notice] child pid 7192
  exit signal Segmentation fault (11)

But this gives little indication of where or why the error occurred. And all of the quick debugging techniques of adding lines of output here or there may not help much, because as long as the offending code gets executed, the page will seem to fail in its entirety. This is mainly because PHP only periodically sends the HTML it generates to the browser. So, adding a lot of flush(); statements will at least show you what your script was doing immediately before the recursive error.

Of course, the code that leads to this error might be much more convoluted than the above. It could involve classes calling methods in other classes that refer back to the original classes. And it might only happen in certain hard-to-duplicate circumstances and only because you’ve changed something else somewhere else.

Lessons Learned

  1. Know the locations of error log files, in case something gets recorded there.
  2. This is where stack-tracing debuggers such as Xdebug can be really handy.
  3. Otherwise, set aside plenty of time to go through the code line by line, commenting out bits until it works.

Wrong Variable Type

This error often happens with databases. Given the following SQL statements…

CREATE TABLE products (
  id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(60),
  category VARCHAR(10),
  price DECIMAL(6,2)
);
INSERT INTO products VALUES (1, 'Great Expectations', 'book', 12.99);
INSERT INTO products VALUES (2, 'Meagre Expectations', 'cd', 2.50);
INSERT INTO products VALUES (3, 'Flared corduroys', 'retro clothing', 25);

… can you guess what is returned when you run the following?

SELECT * FROM products WHERE category='retro clothing';

The answer is nothing, because the category column is only 10 characters long, and so the category of the last product is cut off at retro clot. Recently edited products or new menu items suddenly disappearing can create a lot of confusion. But fixing this is generally very easy:

ALTER TABLE products MODIFY category VARCHAR(30);
UPDATE products SET category='retro clothing' WHERE category='retro clot';

Database-col-error in My Favorite Programming Mistakes

The category has been cut off after 10 characters, as shown in phpMyAdmin.

I made a more serious error with the first major e-commerce website that I worked on. At the end of the ordering process, the website would ask the customer for their credit card details and then call a Java program, which would send a request to Barclays ePDQ system to take the payment. The amount was sent as the number of pence. Not being very familiar with Java, I based the code on an example I found, which represented the total as a short integer:

short total;

The Java program was called on the command line. If it returned nothing, then the transaction was considered successful, emails were sent, and the order was fulfilled. If there was an error in processing the card, the program returned something like “Card not authorized� or “Card failed fraud checks.�

Short integers can store a value between -32768 and +32767. This seemed plenty to me. But I neglected that this was in pence, not pounds, so the highest possible total was actually £327.67. And the really bad news was that if the amount was higher than that, then the Java program simply crashed and returned nothing, which looked exactly like a successful order and was processed as normal.

It took a few months and several large unpaid transactions before the error was spotted, either by the accounting department or a vigilant and honest customer. I believe they recovered all of the payments in the end.

Lessons Learned

  1. When assigning a type to a database column or variable, be generous and flexible, and try to plan ahead.
  2. Make sure that a program succeeding responds differently to a program crashing.

1p Errors

Among my favorite mistakes are those that cause a discrepancy of just 1 pence (or cent, öre or other denomination). I like them because they are usually very subtle and hard to trace and often boil down to a rounding error. I have to become a mathematical detective, a job that I would readily do if enough work was available.

For a website a few years ago, I needed to create a quick JavaScript function to output a monetary amount. I used this:

<script type="text/javascript">
function GetMoney (amount) {return Math.round (amount * 100) / 100;}
</script>

However, it was quickly discovered that amounts like 1.20 were displayed as 1.2, which looks unprofessional. So, I changed it to this:

<script type="text/javascript">
function GetMoney (amount) {
  var pounds = Math.floor (amount);
  var pence = Math.round (amount * 100) % 100;
  return pounds + '.' + (pence < 10 ? '0' : '') + pence;
}
</script>

The main difference is the extra 0 in the last line. But now that the pence is computed separately, the modulus % operator is needed to get the remainder when the amount is divided by 100. Try to spot the unlikely circumstances under which this code would cause an error.

It happened on a website that sold beads. I have since learned that beads can be sold in a huge range of amounts and configurations, including customized mixes containing fractional quantities. Once, a customer bought 1.01 of an item costing £4.95, and ended up paying just £4.00. This is because the amount was passed as 4.9995. The rounded pence was 100, and % 100 left 0 pence, and so the pounds were floored to 4.

Beads-getmoney in My Favorite Programming Mistakes

A subtle rounding error on Beads Unlimited‘s website, where 101 beads sold at £4.95 per 100 were billed as £4 instead of £5.

This is still just a rounding error, a superset of 1p errors. I made a quick change to fix it:

<script type="text/javascript">
function GetMoney (amount) {
  var pounds = Math.floor (amount);
  var pence = Math.floor (amount * 100) % 100;
  return pounds + '.' + (pence < 10 ? '0' : '') + pence;
}
</script>

This wasn’t a great fix, though, because it rounded £4.9995 down to £4.99, which put it out of sync with any corresponding server-side calculations. But even more dramatically, when someone ordered 0.7 of something costing £1.00, it ended up displaying 69p instead of 70p! This is because floating-point numbers like 0.7 are represented in binary as a number more like 0.6999999999999999 (as described in a recent Smashing Magazine article), which would then be floored to 69 instead of rounded up to 70.

This is a true 1p error. To fix this, I added another rounding at the beginning:

<script type="text/javascript">
function GetMoney (amount) {
  var pence = Math.round (100 * amount);
  var pounds = Math.floor (pence / 100);
  pence %= 100;
  return pound + '.' + (pence < 10 ? '0' : '') + pence;
}
</script>

Now, I had four fairly complicated lines of code to do one very simple thing. Today, while writing this article, I discovered a built-in Javascript function to handle all of this for me:

<script type="text/javascript">
function GetMoney (amount) {return amount.toFixed (2);}
alert (GetMoney (4.9995) + ' ' + GetMoney (0.1 * 0.7));
</script>

Discounting With PayPal

PayPal is a 1p error waiting to happen. Many websites offer voucher codes that give a percentage off each order, computed at the end of the order. If you ordered two items costing 95p, the subtotal would be £1.90, and you would receive a 19p discount, for a total of £1.71.

However, PayPal does not support this type of discounting. If you want PayPal to display the items in your shopping basket, you have to pass each one separately with a price and quantity:

<input name="item_name_1" type="hidden" value="My Difficult Product" />
<input name="amount_1" type="hidden" value="0.99" />
<input name="quantity_1" type="hidden" value="1" />

Thus, you have to discount each item separately. 10% off of 95p leaves 85.5p. PayPal doesn’t accept fractional amounts, so you have to round up to 86p, for a grand total of £1.72 in PayPal, or round down to 85p, for a total of £1.70.

To solve this, I had to also make the website discount each item individually. Instead of just doing 10% × £1.90, it accumulates the discount item by item, using a whole amount of pence each time. Assuming $items is a PHP array of order item objects:

$discount = 0; $discountpercent = 10;
foreach ($items as $item) {
 $mydiscount = floor ($item->price * $discountpercent) / 100;
 $item->priceforpaypal = $item->price - $mydiscount;
 $discount += $mydiscount * $item->quantity;
}

Lessons Learned

  1. Don’t reinvent the wheel, even very small wheels that look easy from the outside.
  2. If you get a 1p discrepancy, check where and how numbers are rounded.
  3. Avoid representing prices using floats when possible. Instead, store the pence or cents as integers; and in databases, use a fixed-point type like DECIMAL.

Daylights Savings

I would not call the last two mistakes in this list “errors.â€� They require a very specific set of fairly rare circumstances, so they are more “oversightsâ€� on the programmer’s part. Oversights are like the acts of terrorism that are excluded by home insurance policies. They go beyond what a programmer could reasonably be expected to think of in advance.

Can you guess what is wrong with the following seemingly innocuous line of code, which selects orders that were completed more than one week ago?

mysql_query ("SELECT * FROM orders WHERE completeddate < '" .
  date ('Y-m-d H:i:s', (time() - 7 * 86400 + 600)) . "'")

I used a similar line in a system for a weekly repeating order. It looked up orders that were completed last week, duplicated them, and processed them for the current week. 86,400 is the number of seconds in a day, so time() - 7 * 86400 was exactly one week ago, and +600 gives it a leeway of 10 minutes.

This was a low-budget method of implementing repeating orders. Given more time, I would have created a separate table and/or shopping basket to differentiate between repeating and non-repeating items. As it happened, this code worked well for several months and then mysteriously failed in late March.

It took ages to recover from the oversight and to process those orders manually. And even longer to find the reason, especially because I had to fool the whole website into thinking that it was a different date.

I’ve pretty much given the trick away in the title of the section: I forgot to account for daylight savings, when one week is less than 7*86400 seconds.

Compare the following three ways of getting the date exactly one week ago. The last is the most elegant. I only recently discovered it:

$time = strtotime ('28 March 2011 00:01');
echo date ('Y-m-d H:i:s', ($time - 7 * 86400)) . '<br/>';
echo date ('Y-m-d H:i:s', mktime (date ('H', $time), date ('i', $time), 0,
  date ('n', $time), date ('j', $time) - 7, date ('Y', $time)));
echo date ('Y-m-d H:i:s', (strtotime ('-1 week', $time))) . '<br/>';

Lessons Learned

Drawing general lessons from a mistake like this is difficult, but there is a specific lesson here:

  1. On websites that repeat things, remember to consider time zones and daylight savings.
  2. Consider storing all times and dates in UTC (Coordinated Universal Time).
  3. Don’t reinvent the time wheel either: strtotime is a powerful function.

The next time I do a website for repeating orders, I won’t make that mistake.

Spam Error

My favorite mistake of all time is an even subtler oversight. Can you spot what is unusual about these made-up email addresses:

  • beckyrsmythe@somewhere.com
  • glynnfrenk@someplace.co.uk

A few years ago, spammers began targeting contact forms on websites, injecting headers and forcing the forms to send millions of messages to harvested addresses and later just to the form’s usual recipient.

This necessitated anti-spam filtering directly on the Web page that processed the form. When I was first asked to do this, I combined a few anti-spam scripts that I found on the Internet. Spammers now often put blocks of random letters in their messages to try to fool spam filters. So, one anti-spam technique is to check for these random letters by looking for certain consonants in a row.

I read somewhere that words with more than six consonants in a row are extremely rare in Latin-alphabet languages. The most consonants in a row in English is six: in “latchstring.� Other languages like Polish have many more diphthongs than English (dz, sz, cz), so I used seven to be on the safe side. The PHP code uses a regular expression and looks something like this:

foreach ($_POST as $key=>$val) {
        if (preg_match ('/[bcdfghjklmnpqrstvwxyz]{7,}/i', $val))
                die ("<h1>Spam Detected</h1><p>Too many consonants in $val</p>");
}

I had to revisit the script when it blocked someone with an email address like the ones above:

Spam-error in My Favorite Programming Mistakes

A customer whose email address had seven or more consonants in a row would have received this upon submitting a form.

Based on a small sample of 10,000, I found that approximately 0.2% of all email addresses would be filtered as spam, according to the rule above. One valid email address had nine consonants in a row. Increasing the number of allowed consonants from seven to ten decreases the script’s usefulness significantly, so instead I considered the letter “yâ€� a vowel.

This worked well, until a customer from Cwmtwrch near Swansea attempted to place an order. According to my sample, only 1 in 5000 customers have a name, email or address like this. Small but important, especially if you are one of them. So, I allowed “w� as a vowel, too. You can check for this in your own customer database with a MySQL query like the following:

SELECT CONCAT_WS(' ',firstname,lastname,email,city,address1,address2) AS thefields
FROM visitors HAVING LENGTH(thefields)>20 AND thefields RLIKE '[bcdfghjklmnpqrstvwxz]{7,}'

Lessons Learned

I learned that my anti-spam script was blocking potential customers only once my client forwarded me their complaints. When I received the first one (an email address containing a couple of “yâ€�s for vowels), I was amazed. It seemed so unlikely. A couple of weeks later, when shoppers in a small Welsh village were still mysteriously unable to place an order, I almost didn’t believe it. It seems that if a piece of code has a hole, someone somewhere will fall into it. So, I’ve learned to do the following:

  1. Take all error reports and complaints seriously. They may uncover something amazing like this.
  2. Jot down the really unlikely mistakes. You will impress other programmers… or me, at least

More specifically, logging everything that is processed by a spam filter is useful, because you can then try to spot any false positives or false negatives and use them to improve the filter.

Conclusion

Programming mistakes come in many shapes and sizes. This article has ranged from the very obvious cock-ups to the extremely subtle oversights. And it looks like they all support Murphy’s Law: if something can go wrong, it will.

However, for every mistake found, reported and fixed, probably a few more aren’t. Either they aren’t found (because they are so incredibly subtle that the set of circumstances that would cause them has never happened) or they aren’t reported (because most users don’t bother reporting errors — which is why any error reports that do come in should be taken seriously) or they aren’t fixed (because doing so would be too time-consuming or expensive).

Mistakes are also more likely to be found on popular websites, mainly because so many more people are putting those websites to work, but partly because fixing one mistake could cause another somewhere else.

The best lessons, therefore, are to plan ahead and to debug thoughtfully.

(kw)


© Paul Tero for Smashing Magazine, 2011.


Adaptive Web Design (Book review)

You have likely seen the term �progressive enhancement� quite a lot, especially if you�re a regular reader of this website. But do you understand exactly what it means, and do you try to apply it in every detail of your daily work?

If the answer is no, you�re far from alone. The last couple of years, with HTML5 (or more correctly parts of CSS3 and various JavaScript techniques) becoming the new Ajax, it seems that people are so eager to apply the shiny new front-end toys that they forget that the Web is supposed to be universal. And in doing so, many developers unfortunately forget about or ignore progressive enhancement.

One reason may be that there aren�t enough resources that explain progressive enhancement in a practical and easy-to-digest way. Luckily, Aaron Gustafson has written a book called Adaptive Web Design that does just that.

Read full post

Posted in , , , , .

Copyright © Roger Johansson



  •   
  • Copyright © 1996-2010 BlogmyQuery - BMQ. All rights reserved.
    iDream theme by Templates Next | Powered by WordPress