How to transform an RSS feed into HTML in 3 simple steps

You are welcome to quote excerpts of this article, but do not copy it in full on your blog or website.
You may only use the content of this post in a printed article or a newsletter if you clearly display the source URL of this article.


I was recently asked whether I knew a simple way to transform an RSS feed into HTML with PHP.
The procedure is indeed fairly simple, and can be done in a series of small steps that I'm going to present here.

Namely, here is what you will need:

The PHP code I present here is geared towards PHP5. It will probably NOT work on PHP4.
Get used to it: PHP5 will soon reach the end of its life, and PHP6 is on the way.
If you are still stuck on PHP4, change your host.

The most elegant part of this method is that you can change the way your HTML is presented simply by updating your XSL file.
Thus, even a designer could alter it without changing one iota of the PHP code.

I've seen many people think "this is overkill" or "I'm not advanced enough", but trust me, that's not true.
It's actually quite simple.

Let me show you the way...

1. Structure of an RSS XML feed

The RSS specification defines the mandatory structure of an RSS feed.
To keep things simple, I will stick to version 2.0 of the RSS specification.
You can find a description of the mandatory and optional elements on the Harvard University definition page [ http://cyber.law.harvard.edu/rss/rss.html ]

We will focus on the mandatory elements and leave the optional ones aside for now; by the end of this tutorial, you will be able to extend the examples given here to include them as well.

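So, the mandatory elements are: an <rss version="2.0"> root containing a <channel> element, which must itself contain <title>, <link> and <description>. Each <item> inside the channel may use any of its elements, but at least one of <title> or <description> must be present.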

This means that an RSS feed can be as minimal as this: [ The RSS XML data ]
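To make this concrete, here is a minimal RSS 2.0 feed held in a PHP string, with a small check that the three mandatory channel elements (<title>, <link> and <description>) are present. The feed content and the example.com URLs are made up, and missingChannelElements() is a hypothetical helper; the type declarations assume PHP 7 or later, with the SimpleXML extension (enabled by default).

```php
<?php
// A minimal RSS 2.0 feed. Titles, descriptions and example.com URLs are
// made-up placeholders, not taken from a real feed.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>My site</title>
    <link>http://www.example.com/</link>
    <description>Latest stories from my site</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/first-story</link>
      <description>The text of the first story.</description>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns the names of the mandatory <channel>
// elements missing from the feed, or ['channel'] if there is no channel.
function missingChannelElements(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel)) {
        return ['channel'];
    }
    $missing = [];
    foreach (['title', 'link', 'description'] as $name) {
        if (!isset($feed->channel->$name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

echo count(missingChannelElements($rss)); // 0: nothing is missing
```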

And voilà, you have a basic XML representation of an RSS feed.
Next, we are going to write an XSL style sheet to convert every element into HTML that can be included in a web page.

2. The XSL style sheet

What are an XSL style sheet and an XSL engine?

If you are not comfortable with XSL, take a look at the references section, which lists a series of tutorials and references.

First things first: what are an XSL engine and an XSL style sheet?
An XSL engine is a program designed to transform XML data into something else. It works by parsing the XML data according to a style sheet, extracting values from the XML and inserting them into the output wherever the XSL file says they belong.

In simpler terms, the XSL style sheet acts as a template that tells the XSL engine which data to take and where to put it in the output file.

There are two different approaches to this parsing.
The first, and most common, is to load an in-memory tree representation of the data (the DOM), and use a query language named XPath to locate data in the tree.
This works well for small XML files.
The second is to read the XML data from top to bottom and emit an event each time a node is reached (the SAX model). This is more efficient for big XML files, because it avoids building a tree of the whole file in memory.
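The two approaches can be compared on the same task: extracting the <title> of every <item>. This is a sketch using PHP's DOM extension (DOMDocument and DOMXPath) for the tree approach and PHP's event-based XML parser extension (xml_parser_create()) for the second one; the two function names are hypothetical.

```php
<?php
// 1. Tree approach: load the whole document into memory as a DOM tree,
//    then locate the nodes with an XPath query.
function titlesWithXPath(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);
    $titles = [];
    foreach ($xpath->query('/rss/channel/item/title') as $node) {
        $titles[] = $node->textContent;
    }
    return $titles;
}

// 2. Event approach: the parser reads the document from top to bottom and
//    calls our handlers on each start tag, end tag and chunk of text,
//    without ever building a tree.
function titlesWithEvents(string $xml): array
{
    $titles = [];
    $path = [];    // names of the currently open elements
    $buffer = '';  // text collected since the last start tag

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$path, &$buffer) {
            $path[] = $name;
            $buffer = '';
        },
        function ($p, string $name) use (&$path, &$buffer, &$titles) {
            if (implode('/', $path) === 'rss/channel/item/title') {
                $titles[] = $buffer;
            }
            array_pop($path);
        }
    );
    // Text may arrive in several chunks, so it is concatenated.
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$buffer) {
            $buffer .= $data;
        }
    );
    xml_parse($parser, $xml, true);
    return $titles;
}
```

Both functions return the same list; the difference is that the second one never holds more than the current element path and one text buffer in memory.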

We will work with the first approach in this tutorial.

Writing our first XSL style sheet

Since the style sheet is just going to be a simple HTML template, let's build it now.
This is what we will use: [ The XSL style sheet to transform the XML into HTML ]

Let's take a closer look at this XSL style sheet, shall we?

An XSL style sheet has some mandatory elements, namely its header. Don't fiddle with them much, except for the output method or the doctypes.

Here is a recap of the main XSL elements (or at least those I use the most):

Those are the basic elements you will need to format your RSS feed.
If you want to go further, take a look at the tutorials and references listed below.

So, this style sheet simply processes the <channel> element and creates <h1> and <h2> elements with the site description and URL. It then iterates through every <item> element and creates a <div> for each story, with an <h3> for the story link and the text of the story below it.
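The linked style sheet is not reproduced here, so the following is only one possible version along the lines described above, embedded in a PHP nowdoc so it can be tried right away with the XSLTProcessor class from PHP's xsl extension. The markup details (the <h1> holding the channel title linked to the site URL, the class="story" attribute, the <p> around the story text) and the renderFeed() helper are assumptions.

```php
<?php
// One possible style sheet matching the description above; the exact
// markup is an assumption, not the original file.
const RSS_TO_HTML_XSL = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Header: emit HTML, so no XML declaration is written. -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Start from the <channel> element. -->
  <xsl:template match="/">
    <xsl:apply-templates select="rss/channel"/>
  </xsl:template>

  <xsl:template match="channel">
    <h1><a href="{link}"><xsl:value-of select="title"/></a></h1>
    <h2><xsl:value-of select="description"/></h2>
    <!-- One <div> per story. -->
    <xsl:for-each select="item">
      <div class="story">
        <h3><a href="{link}"><xsl:value-of select="title"/></a></h3>
        <p><xsl:value-of select="description"/></p>
      </div>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: applies the style sheet to an RSS document given
// as a string and returns the resulting HTML fragment.
function renderFeed(string $rssXml): string
{
    $xsl = new DOMDocument();
    $xsl->loadXML(RSS_TO_HTML_XSL);
    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);

    $rss = new DOMDocument();
    $rss->loadXML($rssXml);
    return (string) $processor->transformToXml($rss);
}
```

Note that <xsl:value-of> outputs text, so any HTML markup carried inside an item's <description> ends up escaped and is displayed as text rather than rendered.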

Debugging your style sheet

There is a really simple way of debugging your style sheet.

  1. Save your RSS feed in a file on your desktop as "rss.xml".
  2. Edit it, and add the line
    <?xml-stylesheet href="rss.xsl" type="text/xsl"?>
    as the second line, just below the <?xml ...?> declaration.
  3. Save your style sheet as "rss.xsl" in the same directory as your XML file.
  4. Open IE or Firefox, and drop your rss.xml file on it.

Your browser's XSL engine will do the transformation, which lets you check your style sheet before uploading it.
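If you need to prepare such debug copies often, step 2 can also be scripted. A minimal sketch with PHP's DOM extension, assuming the feed is well-formed XML and the href contains no double quotes; withStylesheetPi() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns a copy of an RSS document with an
// xml-stylesheet processing instruction inserted just before the root
// element, that is, right below the XML declaration.
function withStylesheetPi(string $rssXml, string $href = 'rss.xsl'): string
{
    $doc = new DOMDocument();
    $doc->loadXML($rssXml);
    $pi = $doc->createProcessingInstruction(
        'xml-stylesheet',
        'href="' . $href . '" type="text/xsl"'
    );
    $doc->insertBefore($pi, $doc->documentElement);
    return $doc->saveXML();
}

// Usage: write the debug copy next to rss.xsl, then open it in the browser.
// file_put_contents('rss.xml', withStylesheetPi(file_get_contents('feed.xml')));
```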

3. The transformation: using the PHP XSL engine to create the HTML on the server

Now that your XSL style sheet is done, we will use the PHP XSL engine to transform the feed into HTML ready to be included in your page.
This is really the simplest part.

You can see the PHP file here: [ The PHP transformation script ]
And a demo of the script in action: [ Demo of the rss converter ]
What this script does is simply apply the XSL style sheet to the XML feed and return the resulting HTML.
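Since the original script is only linked, here is a sketch of what such a transformation function can look like with PHP's DOM and xsl extensions. rssToHtml() and its parameters are hypothetical; loading a remote feed URL this way assumes your PHP configuration allows opening remote URLs (allow_url_fopen).

```php
<?php
// Hypothetical transformation function: $feedUrl is the RSS feed (a URL or
// a local path) and $xslPath is your rss.xsl file.
function rssToHtml(string $feedUrl, string $xslPath): string
{
    // Collect libxml errors instead of printing warnings into the page.
    $previous = libxml_use_internal_errors(true);
    try {
        $xsl = new DOMDocument();
        if (!$xsl->load($xslPath)) {
            throw new RuntimeException("Cannot load style sheet: $xslPath");
        }
        // Fetching the feed is usually the slow part when it is remote.
        $rss = new DOMDocument();
        if (!$rss->load($feedUrl)) {
            throw new RuntimeException("Cannot load feed: $feedUrl");
        }
        $processor = new XSLTProcessor();
        $processor->importStylesheet($xsl);
        $html = $processor->transformToXml($rss);
        if ($html === false || $html === null) {
            throw new RuntimeException('XSL transformation failed');
        }
        return $html;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

// Usage (hypothetical URL):
// echo rssToHtml('http://www.example.com/feed.xml', __DIR__ . '/rss.xsl');
```

Throwing an exception lets the calling page decide what to show when the feed is unreachable, instead of printing a half-rendered block.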

Keep in mind that this is done "live", and that the function will take some time to run:
a bit more than the time it takes to load the XML data.

If you have some traffic, it would be a good idea to cache the value returned by the function in a database and update it only from time to time. Otherwise, it will use up your bandwidth and slow down your site.
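As a simpler alternative to a database cache, the rendered HTML can be kept in a file and rebuilt only when it is older than a given age. A sketch, where cachedHtml() and its parameters are hypothetical:

```php
<?php
// File-based cache sketch (a swapped-in alternative to the database cache).
// $cacheFile: where the HTML is stored; $maxAge: lifetime in seconds;
// $render: callable that builds fresh HTML, e.g. by transforming the feed.
function cachedHtml(string $cacheFile, int $maxAge, callable $render): string
{
    // Serve the cached copy while it is younger than $maxAge seconds.
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        $html = file_get_contents($cacheFile);
        if ($html !== false) {
            return $html;
        }
    }
    // Otherwise rebuild it and store it, locking the file while writing.
    $html = $render();
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

// Usage, refreshing at most every 30 minutes (rssToHtml() standing for
// your own transformation function):
// echo cachedHtml('/tmp/feed.html', 1800, fn() => rssToHtml($url, 'rss.xsl'));
```

With this in place, the feed is fetched and transformed at most once per $maxAge seconds, however many visitors load the page.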

Conclusion

Now you have seen that transforming an RSS feed into HTML is easy with a bit of XSL. You can extend the style sheet to incorporate any optional elements you want.

I hope this tutorial wasn't too boring or too long; if you have more questions about it, don't hesitate to ask.

References:
