Wednesday, October 11, 2006

Reading xml fast

Using the System.Diagnostics.Stopwatch class, I've run an interesting experiment on how to read a 1 MB xml file fast. For this experiment I made a small console application, and in each of the code examples I traverse the xml tree and write the values to the console. Finally, I tried a smaller 5 KB file to see whether the results hold for smaller files.
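The timing pattern can be sketched roughly as follows. This is only a minimal illustration: the `TimingSketch` class, the `Measure` helper and the placeholder workload are hypothetical names and not taken from the original test application.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs the given work once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action work)
    {
        Stopwatch watch = Stopwatch.StartNew();
        work();
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        // Placeholder workload; in the experiment this would be one of the read loops.
        TimeSpan elapsed = Measure(() =>
        {
            for (int i = 0; i < 1000; i++)
            {
                Console.WriteLine(i);
            }
        });
        Console.WriteLine("watch.Elapsed = {0}", elapsed);
    }
}
```

Wrapping each approach in the same helper keeps the measured region identical from one test to the next.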

The xml is quite simple and looks something like this:

  • xml
    • entry
      • name
      • adress
      • (...)

StreamReader

As a reference, I started off with a simple stream read:



// Baseline: read the file line by line as plain text, with no xml parsing.
using (StreamReader reader = File.OpenText("c:\\testfile2.xml"))
{
    string input;
    while ((input = reader.ReadLine()) != null)
    {
        Console.WriteLine(input);
    }
}


And the result:
watch.Elapsed = {00:00:02.3910558}

The result is pretty much as expected: about two seconds to read all of the lines and put them on the screen.


XmlReader

Next up is the XmlReader. It moves forward through the xml in a single pass and reads all of the nodes.



// Forward-only read: visit every node and print its value.
using (FileStream stream = new FileStream("c:\\testfile2.xml", FileMode.Open))
using (XmlReader reader = new XmlTextReader(stream))
{
    while (reader.Read())
    {
        Console.WriteLine(reader.Value);
    }
}


And the result:
watch.Elapsed = {00:00:12.7904473}

About 12 seconds is roughly what I expected. There is some overhead in locating the xml nodes, but nothing surprising. If we ran this experiment over the internet or a slower network file share, I believe the xml reader overhead would be much less visible.
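One detail of the loop above: `reader.Value` is empty for element and end-element nodes, so many iterations only write a blank line. A sketch that keeps only the text nodes could look like this. It uses a small in-memory sample shaped like the structure described earlier; the sample values and the `ReadTextValues` helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlReaderSketch
{
    // Hypothetical helper: returns the value of every text node, in document order.
    public static List<string> ReadTextValues(TextReader input)
    {
        List<string> values = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                // Element and EndElement nodes have an empty Value; keep text nodes only.
                if (reader.NodeType == XmlNodeType.Text)
                {
                    values.Add(reader.Value);
                }
            }
        }
        return values;
    }

    public static void Main()
    {
        // Made-up sample data following the xml/entry/name/adress layout.
        string sample = "<xml><entry><name>Ann</name><adress>Main Street 1</adress></entry></xml>";
        foreach (string value in ReadTextValues(new StringReader(sample)))
        {
            Console.WriteLine(value);
        }
    }
}
```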


XPath

Next, we try the XPath approach:



FileStream stream = new FileStream("c:\\testfile2.xml", FileMode.Open);
XPathDocument document = new XPathDocument(stream);
XPathNavigator navigator = document.CreateNavigator();
XPathNodeIterator node = navigator.Select("xml/entry");
for(...){...}


And the result:
watch.Elapsed = {00:03:29.5681325}


I knew it would take some time, but three and a half minutes is not acceptable. XPath is still something of a favourite, as it makes it possible to navigate the xml tree in an absolutely beautiful way.
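The loop body is elided in the snippet above. One possible way to walk an XPathNodeIterator is sketched below. This is an assumption about how such a loop might look, not the code that was actually timed; the `ReadEntryValues` helper and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class XPathSketch
{
    // Hypothetical helper: returns "element: value" for every child of every entry.
    public static List<string> ReadEntryValues(TextReader input)
    {
        List<string> values = new List<string>();
        XPathDocument document = new XPathDocument(input);
        XPathNavigator navigator = document.CreateNavigator();
        XPathNodeIterator entries = navigator.Select("xml/entry");
        while (entries.MoveNext())
        {
            // Visit each child element (name, adress, ...) of the current entry.
            XPathNodeIterator children = entries.Current.SelectChildren(XPathNodeType.Element);
            while (children.MoveNext())
            {
                values.Add(children.Current.Name + ": " + children.Current.Value);
            }
        }
        return values;
    }

    public static void Main()
    {
        // Made-up sample data following the xml/entry/name/adress layout.
        string sample = "<xml><entry><name>Ann</name><adress>Main Street 1</adress></entry></xml>";
        foreach (string line in ReadEntryValues(new StringReader(sample)))
        {
            Console.WriteLine(line);
        }
    }
}
```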


DataSet

Next is the dataset approach.


DataSet ds = new DataSet();
ds.ReadXml("C:\\testfile2.xml");
foreach (DataTable tbl in ds.Tables)
{
    foreach (DataRow dr in tbl.Rows)
    {
        for (...){...}
    }
}


And the result:
watch.Elapsed = {00:00:03.6352829}

I'm a bit surprised. Of the xml-aware approaches, the DataSet is the fastest way of traversing the file, only about a second behind the plain stream read. Note that DataSet navigation can be cumbersome when the data set contains a lot of tables, like this one.
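The innermost loop is elided in the snippet above. A sketch of one way to visit every value is shown below, iterating over the columns of each row. The column loop, the `ReadAllValues` helper and the sample data are assumptions for illustration, not the code that was measured.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

public static class DataSetSketch
{
    // Hypothetical helper: returns "table.column = value" for every cell in the data set.
    public static List<string> ReadAllValues(TextReader input)
    {
        List<string> values = new List<string>();
        DataSet ds = new DataSet();
        ds.ReadXml(input); // the table layout is inferred from the xml structure
        foreach (DataTable tbl in ds.Tables)
        {
            foreach (DataRow dr in tbl.Rows)
            {
                foreach (DataColumn col in tbl.Columns)
                {
                    values.Add(tbl.TableName + "." + col.ColumnName + " = " + dr[col]);
                }
            }
        }
        return values;
    }

    public static void Main()
    {
        // Made-up sample data following the xml/entry/name/adress layout.
        string sample = "<xml><entry><name>Ann</name><adress>Main Street 1</adress></entry></xml>";
        foreach (string line in ReadAllValues(new StringReader(sample)))
        {
            Console.WriteLine(line);
        }
    }
}
```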


A smaller xml file

I've also tried the same approaches on a 5 KB xml file. Here are the results; this time XPath is the fastest of the xml-aware methods, almost level with the plain stream read:


StreamReader:
watch.Elapsed = {00:00:00.0144736}

XmlReader:
watch.Elapsed = {00:00:00.0302896}

XPath:
watch.Elapsed = {00:00:00.0151563}

DataSet:
watch.Elapsed = {00:00:00.0225523}






17 comments:

Joel Lucsy said...

I'm curious why you didn't try something like:
XmlDocument doc = new XmlDocument();
doc.Load("c:\\testfile2.xml");
foreach (XmlNode nd in doc.SelectNodes("xml/entry"))
{
    ...
}

Anonymous said...

XmlDocument lets you easily navigate through your xml but it is very resource intensive. If your goal is performance, use something else.

http://msdn2.microsoft.com/en-us/library/ms998559.aspx

Anonymous said...

What about XmlTextReader?

Anonymous said...

Your problem is the Console.WriteLine: it is so slow that you cannot make good measurements.

In my projects, the XmlReader is up to 25 times faster than the DataSet with XML files bigger than 15 MB.
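The point about console output can be checked with a sketch like the one below, which times only the parsing by counting nodes instead of printing them. The `BuildSample` and `CountNodes` helpers, the entry count and the generated data are all hypothetical; no timings are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Xml;

public static class ParseOnlyTiming
{
    // Hypothetical helper: builds sample xml with the given number of entries.
    public static string BuildSample(int entries)
    {
        StringBuilder sb = new StringBuilder("<xml>");
        for (int i = 0; i < entries; i++)
        {
            sb.Append("<entry><name>Name ").Append(i).Append("</name>");
            sb.Append("<adress>Street ").Append(i).Append("</adress></entry>");
        }
        return sb.Append("</xml>").ToString();
    }

    // Reads every node without writing anything and returns how many nodes were seen.
    public static int CountNodes(string xml)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string xml = BuildSample(10000);
        Stopwatch watch = Stopwatch.StartNew();
        int nodes = CountNodes(xml);
        watch.Stop();
        // A single write after the stopwatch has stopped keeps console cost out of the measurement.
        Console.WriteLine("Read {0} nodes, watch.Elapsed = {1}", nodes, watch.Elapsed);
    }
}
```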

Neets said...

Hi was browsing the net for a solution to a problem i am facing....i would like to use Xpath and read an xml file and put the result in a dataset. I am trying to use a gridview to edit it. Is this possible? So far, i have been able to see post the same question at various forums but not with any result.
Let me know if you can help me. you may post the ans, if you have one, in my non-tech blog.

Anonymous said...

Try VTD-XML (http://vtd-xml.sf.net)
the next generation document centric, all purpose XML parser/indexer

immanouel said...

Hi anders!!
I need your opinion!
I'm going to create a web search crawler and I was wondering where to start. I have done some experiments with SQL and it is fast at reading data, but the database will get very heavy soon and then it will not be so fast! So my idea is to record the crawler output (text, images, links etc. from the websites) in XML and then work it out with SQL. So the real question is: which is faster to read, SQL (from a heavy database) or XML (from multiple files)?
thanks
Ps- if you would like to contact me please use this email: immanouel@sapo.pt

Anonymous said...

The fastest database in existence that will do exactly what you are asking with little to no coding (it was built for logging data from a web crawler search engine) is at http://www.la-la-land.net. You can bring data in as XML then query against it like SQL...you don't even need to "move it over". It can handle quadrillions of records in less than 1 millisecond reading and writing time. It's indexes have indexes on the indexes. That's why it's so fast...and it's open source with no license. You can use it all you want, resell it even. Good luck but fuk.

Anonymous said...

Thanks for this great post.

I have also tested which is the fastest way to read xml from a file. Only the DataSet is very slow in my tests.

You also forgot to try XmlDocument.Load("path and filename");

Up to 20 KB the StreamReader was the fastest. Above that, XmlDocument.Load() was the fastest.

I hope that this comment is useful to somebody (-;

Muhammad Azeem said...

This is a nice article..
Its very easy to understand ..
And this article is using to learn something about it..

c#, dot.net, php tutorial

Thanks a lot..!

deep_spins said...

Hi, I have a question. I have to parse and update (based on the system configuration) an XML file of about 500 KB, which takes more than 15 minutes using the XmlDocument approach.
Please tell me which is the best alternative so that the same can be achieved in 1 or 2 minutes.

Jasmin's Blog said...

Its so easy to understand and to decide also which should I use....very nice article...keep it up...:-)

Aamer Saeed said...

Hi,

Can anyone tell me how to read an xml file with XmlTextReader, append to it, and then write it back to the same file?


Thanks,

Sawari said...

Very Very bad article to understand for the beginners :-(

Sudhir Chekuri said...

Nice post. Got a lot of important stuff from this blog.
dba kings

leonerdscrew said...

Load XML file to TreeView control

lingmaaki said...

XmlDataDocument xmldoc = new XmlDataDocument();
XmlNodeList xmlnode ;
int i = 0;
string str = null;
FileStream fs = new FileStream("product.xml", FileMode.Open, FileAccess.Read);
xmldoc.Load(fs);
xmlnode = xmldoc.GetElementsByTagName("Product");


source... http://csharp.net-informations.com/xml/how-to-read-xml.htm

ling