Excel Compatible HTML

For the last couple of weeks I’ve been talking about adding an Excel export feature to one of my C#-based products. If you missed the previous posts, you might want to read the introduction, and last week’s What I did; an account of the options I chose, combined with an honest assessment of the decision process.

The bulk of the export feature was eventually provided by a third party component, xPort Tools, but I also used Excel compatible HTML files for one of the exports. Today, I’m going to finish off the series by explaining how to use HTML to output formatted data to Microsoft Excel.

It’s quite easy to get started. After all, Excel will just read HTML files, so as long as you put a table in there, you’ll probably do fine just opening it in Excel. But that would leave me with a very short post...

If you want to use the Microsoft Office and Excel attributes and elements in your HTML file, you need to start off by importing a couple of namespaces. The style properties and attributes from those namespaces will let you take advantage of a lot of Excel features. With the namespaces, a style block and a table, your template page will look something like this:-

<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

<head>
<style>
...CSS Styles go here
</style>
</head>

<body>
<table>
...Your table goes here
</table>
</body>
<html>

Adding Styles

Building the table will be a lot easier if you define all the styles you’re going to need and add them to the style block. Note that if you modify your file in Excel and then save your changes, your style names won’t be preserved.

Through a couple of additional style properties, Excel has a hierarchical style structure, which usually begins with a style called style0. Style0 contains the default style for the cells in the table, and is identified by the property "mso-style-id:0;". Subsequent styles can then indicate their parent style with the property "mso-style-parent: style;", where style is the name of the parent style. Style0 seems to be duplicated as the default style for the td element, so that these files can be opened in a browser and viewed as regular HTML. This might leave you with a style block something like this:-

.style0
{
mso-style-name:Normal;
mso-style-id:0;
}
td
{
mso-style-parent:style0;
}
.myTopStyle
{
mso-style-parent:style0;
border-top:.5pt solid black;
}

I’ve left out most of the style content, but you can use standard CSS properties (font-size, font-family, border, color, text-decoration etc) to build up detailed styles for your cells. One of the more useful Excel style properties is mso-number-format. You can use this so that your files will open in Excel with the correct cell formatting set for numeric data. I just used the defaults, so I don’t know exactly how they break down, but here are a couple of examples:-

mso-number-format:"dd\/mm\/yyyy\;\@"; ‘Short Date Format
mso-number-format:"\0022£\0022\#\,\#\#0\.00"; ‘UK Currency Format

Building the table

As I said earlier, Excel will read HTML tables, so any table you build with the standard table, tr, and td elements should just open. A simple cell with one of your styles applied would just be:-

<td class=myTopStyle>&nbsp;</td>

The &nbsp; is vital if you want the style to show up in an otherwise empty cell, but if you just want a completely empty, style-free cell, you can leave it out.

Merging cells horizontally or vertically can be done with the colspan and rowspan attributes respectively, the text-align and vertical-align style properties can be used in your style block to control the alignment. Note that, if you put text in an Excel cell, it will usually be visible no matter how long it is... excel will automatically show it regardless of the cell boundaries. Your browser won’t, so when the text is going to be longer than the size of the cell, you need to use colspan if you want to view it in a browser.

<td class=bigTitleFont colspan=3>This is my long title</td>

There’s another useful mso- style property for cases like this where you might not want cells to be merged when you open the file in Excel: mso-ignore. If you add "mso-ignore: colspan" to the style attribute of the cell, it will be merged correctly in the browser, but behave as normal in Excel.

This property is also useful for reducing the size of your html files: if you have a series of cells with the same style, or a series of empty cells, you can use colspan without getting unnecessarily merged cells when you open the file in Excel.

Number and Formula Cells

If your cell contains numeric data, you can flag this to Excel with the x:num attribute. For example:-

<td class=myNumberStyle x:num>38500</td>

That’s fine if you want the number to show as "38500", but if you want to see meaningful formatting data in both Excel and a browser, you need to go a little bit further. You need to set the value of the x:num attribute, and then format the cell content for display in the browser. The mso-number-format style property will handle the formatting for you in Excel:-

<td class=myCurrencyStyle x:num="38500">$38,500.00</td>
<td class=myDateStyle x:num="38500">05/28/2005</td>

Adding a formula is no harder. The x:fmla attribute can be used to set the formula for the cell:-

<td x:fmla="=SUM(A1:A5)">1000</td>

You need to supply the current result of the formula as the content value for the cell (1000 in the example above), if you want to view the file in a browser. When you open the same file in Excel, the formula you specify will be used to calculate the value of the cell.

I hope there’s enough here to get you started if you ever need to produce an Excel compatible HTML file. If you have any questions, I’ll try to answer them. If you have any suggestions, I’ll try to learn from them....

nice!

this is interesting stuff. so armed with this information, you wouldn't need a 3rd party product to generate an html document with Excel styles.

That's right

That's right, excel versions 2000 and higher can read and write these html files.

There's an xml based specification out there too, which I think is supported in 2002 and higher, but I didn't look into that too much.

I ended up using the third party component so I could output Excel files that could be read by other software that reads .xls files, but if it's specifically for Excel, or if you need formatting and formulas, I think this is a great option. The fact that you can read it in a browser is a bonus too.