YQL: Using Web Content For Non-Programmers

Dec/10

YQL: Using Web Content For Non-Programmers

by Christian Heilmann under Feeds, Web Design

in YQL: Using Web Content For Non-Programmers

Building a beautiful design is a great experience. Seeing the design break apart when people start putting in real content, though, is painful. That’s why testing it as soon as possible with real information to see how it fares is so important. To this end, Web services provide us with a lot of information with which to fill our products. In recent years, this has been a specialist’s job, but the sheer amount of information available and the number of systems to consume it makes it easier and easier to use Web services, even for people with not much development experience.

On Programmable Web, you can find (to date) 2580 different application programming interfaces (or APIs). An API allows you to get access to an information provider’s data in a raw format and reformat it to suit your needs.

Programmable in YQL: Using Web Content For Non-Programmers

The Trouble With APIs

The problem with APIs is that access to them varies in simplicity, from just having to load data from a URL all the way up to having to authenticate with the server and give all kinds of information about the application you want to build before getting your first chunk of information.

Each API is based on a different idea of what information you need to provide, what format it should be in, what data it will give back and in what format. All this makes using third-party APIs in your products very time-consuming, and the pain multiplies with each one you use. If you want to get photos from Flickr and updates from Twitter and then show the geographical information in Twitter on a map, then you have quite a trek ahead.

Simplifying API Access

Yahoo uses APIs for nearly all of its products. Instead of accessing a database and displaying the information live on the screen, the front end calls an API, which in turn gets the information from the back end, which talks to databases. This gives Yahoo the benefit of being able to scale to millions of users and being able to change either the front or back end without disrupting the other.

Because the APIs have been built over 10 years, they all vary in format and the way in which you access them. This cost Yahoo too much time, which is why it built Yahoo Pipes — to ease the process.

Large view

Pipes is amazing. It is a visual way to mix and match information from the Web. However, as people used Pipes more, they ran into limitations. Versioning pipes was hard; to change the functionality of the pipe just slightly, you had to go back to the system, and it tended to slow down with very complex and large conversions. This is why Yahoo offers a new system for people’s needs that change a lot or get very complex.

YQL is both a service and a language (Yahoo Query Language). It makes consuming Web services and APIs dead simple, both in terms of access and format.

Retrieving Data With YQL

The easiest way to access YQL is to use the YQL console. This tool allows you to preview your YQL work and play with the system without having to know any programming at all. The interface is made up of several components:

Large view

The YQL statement section is where you write your YQL query.
YQL has a very simple syntax, and we’ll get into its details a bit later on. Now is the time to try it out. Enter your query, define the output format (XML or JSON), check whether to have diagnostics reporting, and then hit the “Test” button to see the information. There is also a permalink; click it to make sure you don’t lose your work in case you accidentally hit the “Back” button.
The results section shows you the information returned from the Web service.
You can either read it in XML or JSON format or click the “Tree view” to navigate the data in an Explorer-like interface.
The REST query section gives you the URL of your YQL query.
You can copy and paste this URL at any time to use it in a browser or program. Getting information from different sources with YQL is actually this easy.
The queries section gives you access to queries that you previously entered.
You can define query aliases for yourself (much as you would bookmark websites), get a history of the latest queries (very useful in case you mess up) and get some sample queries to get started.
The data tables section lists all the Web services you can access using YQL.
Clicking the name of a table will in most cases open a demo query in the console. If you hover over the link, you’ll get two more links — desc and src — which give you information about the parameters that the Web service allows and which show the source of the data table itself. In most cases, all you need to do is click the name. You can also filter the data table list by typing what you’re looking for.

Using YQL Data

By far the easiest way to use YQL data is to select JSON as the output format and define a callback function. If you do that, you can then copy and paste the URL from the console and write a very simple JavaScript to display the information in HTML. Let’s give that a go.

As a very simple example, let’s get some photos from Flickr for the search term “cat”:

select * from flickr.photos.search where text="cat"

Type that into the YQL console, and hit the “Test” button. You will get the results in XML — a lot of information about the photos:

Large view

Instead of XML, choose JSON as the output format, and enter myflickr as the callback function name. You will get the same information as a JSON object inside a call to the function myflickr.

Large view

You can then copy the URL created in the “REST query” field:

Large view

Write a JavaScript function called myflickr with a parameter data, and copy and paste the URL as the src of another script block:

<script>
  function myflickr(data){
    alert(data);
  }
</script>
<script src="http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20flickr.photos.search%20where%20tex
t%3D%22cat%22&format=json&env=store%3A%2F%2Fdatatables.org
%2Falltableswithkeys&callback=myflickr"></script>

If you run this inside a browser, the URL you copied will retrieve the data from the YQL server and send it to the myflickr function as the data parameter. The data parameter is an object that contains all the returned information from YQL. To make sure you have received the right information, test whether the data.query.results property exists; then you can loop over the result set:

<script>function myflickr(data){
  if(data.query.results){
    var photos = data.query.results.photo;
    for(var i=0,j=photos.length;i<j;i++){
      alert(photos[i].title);
    }
  }
}</script>
<script src="http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20flickr.photos.search%20where%20text%3D%22cat%22
&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&
callback=myflickr"></script>

You can easily get the structure of the information and know what is loop-able by checking the tree view of the results field in the console:

Datatree in YQL: Using Web Content For Non-Programmers

Right now, all this does is display the titles of the retrieved photos as alerts, which is nothing but annoying. To display the photos in the right format, we need a bit more — but no magic either:

<div id="flickr"></div>
<script>function myflickr(data){
  if(data.query.results){
    var out = '<ul>';
    var photos = data.query.results.photo;
    for(var i=0,j=photos.length;i<j;i++){
      out += '<li><img src="http://farm' + photos[i].farm +
             '.static.flickr.com/' + photos[i].server + '/' + photos[i].id +
             '_' + photos[i].secret + '_s.jpg" alt="' + photos[i].title +
             '"></li>';
    }
    out += '</ul>';
  }
  document.getElementById('flickr').innerHTML = out;
}</script>
<script src="http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20flickr.photos.search%20where%20text%3D%22cat%22&
format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&
callback=myflickr"></script>

Put this into action and you’ll get photos of cats, live from Flickr and without having to go through any painful authentication process.

The complexity of the resulting HTML for display differs from data set to data set, but in essence the main trick remains the same: define a callback function, write it, copy and paste the URL you created in the console, test that data has been returned, and then go nuts.

Using YQL To Reuse HTML Content

One other very powerful use of YQL is to access HTML content on the Web and filter it for reuse. This is usually called “scraping” and is a pretty painful process. YQL makes it easier because of two things: it cleans up the HTML retrieved from a website by running it through HTML Tidy, and it allows you to filter the result with XPATH. As an example, let’s retrieve the list of my upcoming conferences and display it.

Go to http://icant.co.uk/ to see my upcoming speaking engagements:

Upcoming in YQL: Using Web Content For Non-Programmers

You can then use Firebug in Firefox to inspect this section of the page. Simply open Firebug, click the box with the arrow icon next to the bug, and move the cursor around the page until the blue border is around the element you want to inspect:

Large view

Right-click the selection, and select “Copy XPath” from the menu:

Large view

Go to the YQL console, and type in the following:

select * from html where url="http://icant.co.uk" and xpath=''

Copy the XPath from Firebug into the query, and hit the “Test” button.

select * from html where url="http://icant.co.uk" and
xpath='//*[@id="travels"]'

Large view

As you can see, this gets the HTML of the section that we want inside some XML. The easiest way to reuse this in HTML is by requesting a format that YQL calls JSON-P-X. This will return a simple JSON object with the HTML as a string. To use this, do the following:

Copy the URL from the REST field in the console.
Add &format=xml&callback=travels to the end of the URL.
Add this as the src to a script block, and write this terribly simple JavaScript function:

<div id="travels"></div>
<script>function travels(data){
  if(data.results){
    var travels = document.getElementById('travels');
    travels.innerHTML = data.results[0];
  }
}</script>
<script src="http://query.yahooapis.com/v1/public/yql?
q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Ficant.co.uk%22%20
and%20xpath%3D'%2F%2F*%5B%40id%3D%22travels%22%5D'&
diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&
format=xml&callback=travels"></script>

The result is an unordered list of my events on your website:

Yql-demo in YQL: Using Web Content For Non-Programmers

Debugging YQL Queries

Things will go wrong, and having no idea why is terribly frustrating. The good news with YQL is that you will get error messages that are actually human-readable. If something fails in the console, you will see a big box under the query telling you what the problem was:

Large view

Furthermore, you will see a diagnostics block in the data returned from YQL that tells you in detail what happened “under the hood.” If there are any problems accessing a certain service, it will show up there.

Large view

YQL Syntax

The basic syntax of YQL is very easy:

select {what} from {source} where {conditions}

You can filter your results, cut the information down only to the bits you want, paginate the results and nest queries in others. For all the details of the syntax and its nuances, check the extensive YQL documentation.

YQL Examples

You can do quite amazing things with YQL. By nesting statements in parentheses and filtering the results, you can reach far and wide across the Web of data. Simply click the following examples to see the results as XML documents. Copy and paste them into the console to play with them.

Search Google for the word “smashing”:
select * from google.search where q=”smashing”
Search Google for the word “smashing” but only return the titles and URLs:
select url,title from google.search where q=”smashing”
Grab the titles of the latest articles from the Smashing Magazine feed and translate them to French:
select * from google.translate where q in (select title from feed where url=”http://rss1.smashingmagazine.com/feed/”) and target=”fr”
Find the most recent mentions of the word “ipad” on Delicious and YouTube (using the Yahoo Firehose):
select * from social.updates.search where query=”ipad” and source in (“delicious”,”youtube”)
Show sushi restaurants in San Francisco. Get 50 results, but limit the amount to 20 and skip the first 10:
select * from local.search(50) where query=”sushi” and location=”san francisco, ca” limit 20 offset 10
Show photos taken in Paris, France (by defining Paris as a Where on Earth (WoE) ID from the geoplanet data set and then searching Flickr for photos with that WoE ID):
select * from flickr.photos.search where woe_id in (select woeid from geo.places where text=”Paris,France”)
Get the Twitter names of all speakers at SWDC-central.com, and then search Twitter for tweets mentioning them:
select text,from_user from twitter.search where q in (select content from html where url=”http://swdc-central.com/” and xpath=”//a[contains(.,'@')]“)

This is just a taste of the power of YQL. Check out some of my presentations on the subject.

YQL’s Limits

YQL has a few (sensible) limits:

You can access the URL 10,000 times an hour; after that you will be blocked. It doesn’t matter in our case because the blocking occurs per user, and since we are using JavaScript, this affects our end users individually and not our website. If you use YQL on the back end, you should cache the results and also authenticate to the service via oAuth to be allowed more requests.
The language allows you to retrieve information; insert, update and delete from data sets; and limit the amount of data you get back. You can get paginated data (0 to 20, 20 to 40 and so on), and you can sort and find unique entries. What you can’t do in the YQL syntax is more complex queries, like “Get me all data sets in which the third character in the title attribute is x,” or something like that. You could, however, write a JavaScript that does this kind of transformation before YQL returns the data..
You can access all open data on the Web, but if a website chooses to block YQL using the robots.txt directive, you won’t be allowed to access it. The same applies to data sources that require authentication or are hosted behind a firewall.

There Is More To YQL

This article covers how to use YQL to access information. If you have an interesting data set and want it to become part of the YQL infrastructure, you can easily do that, too. We’ll cover that in the next article.

Documentation and Related Links

(al)(vf)

kranthi

RSS feed for this post (comments) · TrackBack URI

BlogmyQuery – BMQ