.NET and Excel Importing

A quick tech tip this week, it’s not ground breaking, it might not even be new to many of you, but it really helped me out. I was looking to buy a component to add a simple Excel import facility to a project, and I had one of those "d’oh" moments.

Amongst all the components for sale, there were search results about using Jet, ISAMs, and OLEDB. I’ve used Office and Jet enough in the past to know that it makes reading an Excel file a straight forward process, but somehow I hadn’t realized that I could do it just as easily using OLEDB from a .NET application.

All I needed was the right connection string and select command, and I was able to read all of the data I wanted from Excel via a simple OleDbDataReader.

The Connection String

The basic connection string for an Excel 97/2000/XP .xls file is:-

connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=FileName;Extended Properties=\"Excel 8.0;HDR=Yes;\"";

Where Filename is the full path and name of the Excel file (surrounded by double quotes if it contains spaces), and HDR=Yes or HDR=No is used to indicate whether the first row of the worksheets contain the column names.

The SQL Statement

When the data source is an Excel workbook, the SQL Statement is used to indicate the worksheet and/or range that you want to get the data from. So a simple SQL statement would be:-

commandText = "Select * From [WorkSheet]";

Using the Code

This example shows the creation of the data connection, command and reader objects using the connection string and SQL:-

OleDbConnection conn = new OleDbConnection(connectionString);
OleDbCommand select = new OleDbCommand(commandText, conn);
conn.Open();
OleDbDataReader reader = select.ExecuteReader();

The OleDbDataReader can then be used to read data directly from the Excel file. Of course, this would be pretty useless if you had to know the name and structure of the Excel sheet, so fortunately there are a couple of easy ways to find out from the code.

Finding the Sheet & Range names

The OleDbConnection has a helpful GetOleDbSchemaTable method, which we can use to quickly determine the structure of the file. It returns a DataTable containing a row for each worksheet and named range. From the resulting table, we can find the name each object from the "TABLE_NAME" column:-

DataTable xlsSchema = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
string firstSheetName = (string)xlsSchema[0]["TABLE_NAME"];

That gives us a valid sheet or range name to use in the SQL command. Once we open the OleDbDataReader we can use another method to find out the structure of the individual sheets.

Finding the Column Names

The OleDbDataReader has a GetSchemaTable method we can use to get the column names from the chosen sheet. It also returns a DataTable, this time containing a row for every column. The "ColumnName" and "DataType" columns can be used to find the name and type of each column:-

DataTable sheetSchema = reader.GetSchemaTable();
string firstColumnName = (string)sheetSchema[0]["ColumnName"];

Putting it all together

To put everything together in the correct order, here’s a function which returns the contents of the first cell in the first sheet of the specified Excel file, in the format "Sheet Name: Column Name = 'Cell Value' (Data Type)".

public string GetFirstSheetFirstCellData(string excelFile)
{
	//Open the connection to the Excel file
	connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + excelFile + "\";Extended Properties=\"Excel 8.0;HDR=Yes;\"";
	OleDbConnection conn = new OleDbConnection(connectionString);
	conn.Open();

	//Find out the Sheet & Range names
	DataTable xlsSchema = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
	string firstSheetName = (string)xlsSchema[0]["TABLE_NAME"];

	//Build the SQL command and open the data reader
	commandText = "Select * From [" + firstSheetName + "]";
	OleDbCommand select = new OleDbCommand(commandText, conn);
	OleDbDataReader reader = select.ExecuteReader();

	//Get the name and type of the first column
	DataTable sheetSchema = reader.GetSchemaTable();
	string firstColumnName = (string)sheetSchema[0]["ColumnName"];
	string firstColumnDataType = (string)sheetSchema[0]["DataType"];

	//If there are any rows returned, read the first cell value
	string firstCellValue = null;
	if (reader.HasRows)
	{
		reader.Read();
		firstCellValue = reader.GetValue(0).ToString();
	}
	else
		firstCellValue = "No Rows Returned";

	//Close and Dispose
	xlsSchema.Dispose();
	sheetSchema.Dispose();
	reader.Close();
	cmd.Dispose();
	conn.Close();
	conn.Dispose();

	return firstSheetName + ": " + firstColumnName + " = " + firstCellValue + " (" + firstColumnDataType + ")";
}

This should work as long as the Jet Engine and MDAC components are installed on the PC.

Conclusion

If you need a quick and easy way to read from Excel, try using Jet and OLEDB.

Cool ... Java programmers

Cool ... Java programmers who need to access Excel files should look at Apache's POI project.

POI

I've used POI pretty extensively due to DocSearcher. It is a Lucene-based indexing tool developed at the US Coast Guard and handles txt, doc, xls, html, pdf, and probably a few others.