SAX XML parser and Java
Written by Mottola Michele - Italy - Reggio Emilia   
Monday, 22 August 2016 07:37
Last Updated on Tuesday, 23 August 2016 10:16

Summary

In this article I want to show:
- an introduction to SAX
- a simple use case that demonstrates the basic functionality of SAX
- how to validate an XML document with SAX
- how to use filters

Introduction to SAX

SAX (Simple API for XML) is an event-based API for parsing XML. It started as a Java API and then became a de facto standard.

Since SAX is a standard, there are several implementations (in Java and other languages). Java also provides a unified interface to XML parsers, SAX included, through JAXP (Java API for XML Processing).

Basically, JAXP uses the factory pattern to obtain a parser such as SAX, which is then used to parse the XML document.
So to parse an XML document we need an XMLReader, and we can get one by retrieving the SAX factory and then a SAX parser object. Something like this:


   SAXParserFactory spf = SAXParserFactory.newInstance(); 
   SAXParser saxParser = spf.newSAXParser(); 
   XMLReader xmlReader = saxParser.getXMLReader(); 
   xmlReader.parse("/path/to/people.xml");
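
To try the same steps without a file on disk, the sketch below parses XML text held in memory. The class name SaxQuickStart, the helper isWellFormed and the sample strings are assumptions made for this illustration; only the JAXP and SAX calls come from the standard library.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class SaxQuickStart {

    // Hypothetical helper: returns true when the XML text parses without a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser saxParser = spf.newSAXParser();
            XMLReader xmlReader = saxParser.getXMLReader();
            // An InputSource wraps the in-memory text instead of a file path.
            xmlReader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports malformed input by throwing a SAXException.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<people><person age=\"25\"/></people>"));
        System.out.println(isWellFormed("<people><person></people>"));
    }
}
```

With no handler registered the parser only checks the document as it reads it. Depending on the implementation, it may also print a diagnostic for the malformed input to standard error.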

A simple use case

In this example I want to show how SAX works and how to use one of its main interfaces, ContentHandler.
Imagine we have an XML document like this:

people.xml

   
   <?xml version="1.0" encoding="UTF-8"?> 
   <people> 
           <person age="25"> 
                   <name>name1</name> 
                   <surname>surname1</surname> 
           </person> 
           <person age="30"> 
                   <name>name2</name> 
                   <surname>surname2</surname> 
           </person> 
           <person age="35"> 
                   <name>name3</name> 
                   <surname>surname3</surname> 
           </person> 
           <person age="25"> 
                   <name>name4</name> 
                   <surname>surname4</surname> 
           </person> 
   </people>

and we want to extract the names of the people who are 25 years old.

To do that we need to use a ContentHandler.
ContentHandler is a SAX interface through which you handle the events that SAX generates while it parses the XML document.
SAX reads the document sequentially from start to end, and for each piece of markup it meets (the start of an element, its text content, the end of an element) it calls a callback method of ContentHandler, which you can implement to perform the task you want.
So the next thing we need to do is register the ContentHandler on the XMLReader before we parse the document.
To do that we first create a ContentHandler and then set it on the XMLReader with something like this:

   
   XMLReader xmlReader = saxParser.getXMLReader(); 
   MyContentHandler handler = new MyContentHandler(); 
   xmlReader.setContentHandler(handler); 
   xmlReader.parse("/path/to/people.xml");
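
To make the order of the callbacks visible, here is a minimal sketch that records every event it receives. The class SaxEventTrace, the helper trace and the sample XML are assumptions for this illustration, not part of the program developed in this article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventTrace {

    // Hypothetical helper: parses the XML text and returns the callbacks in the order they fired.
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only text such as indentation between tags.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("characters " + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };
        try {
            XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            xmlReader.setContentHandler(handler);
            xmlReader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<person age=\"25\"><name>name1</name></person>")) {
            System.out.println(event);
        }
    }
}
```

The start and end events nest the same way the elements do: an inner element is always opened and closed between the start and end events of its parent.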

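Now, to create our MyContentHandler we can either implement ContentHandler or extend DefaultHandler. I'll use the second way, because DefaultHandler provides empty implementations of all the callbacks, so we override only the ones we need.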

Now we can write our real program:

TestSax.java

   
   import java.io.IOException; 
    
   import javax.xml.parsers.ParserConfigurationException; 
   import javax.xml.parsers.SAXParser; 
   import javax.xml.parsers.SAXParserFactory; 
    
   import org.xml.sax.SAXException; 
   import org.xml.sax.XMLReader; 
    
   public class TestSax { 
    
           public static void main(String[] args) { 
    
    
                   SAXParserFactory spf = SAXParserFactory.newInstance(); 
    
                   try { 
                           SAXParser saxParser = spf.newSAXParser(); 
    
                           XMLReader xmlReader = saxParser.getXMLReader(); 
    
    
                           MyContentHandler handler = new MyContentHandler(); 
                           xmlReader.setContentHandler(handler); 
    
                           xmlReader.parse("src/sax6/people.xml"); 
    
    
    
                   } catch (ParserConfigurationException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
                   } catch (SAXException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
                   } catch (IOException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
                   } 
    
    
           } 
    
   }

MyContentHandler.java

   import org.xml.sax.Attributes; 
   import org.xml.sax.SAXException; 
   import org.xml.sax.helpers.DefaultHandler; 
    
   public class MyContentHandler extends DefaultHandler { 
    
           private boolean startPerson = false; 
           private boolean startName = false; 
           private String person; 
    
           @Override 
           public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException { 
    
                   if(qName.equals("person")){ 
                           // look the attribute up by name; this comparison is also safe when it is missing 
                           if("25".equals(attributes.getValue("age"))){ 
                                   startPerson = true; 
                           } 
                   } 
    
                   if(qName.equals("name")){ 
                           this.startName=true; 
                   } 
           } 
    
           @Override 
           public void characters(char[] ch, int start, int length) throws SAXException { 
    
                   if(startPerson){ 
                           if(startName){ 
                                   person = new String(ch, start, length); 
                                   System.out.println(person); 
                           } 
                   } 
           } 
    
    
    
           @Override 
           public void endElement(String uri, String localName, String qName) throws SAXException { 
    
                   if(qName.equals("person")){ 
                           startPerson=false; 
                   } 
    
                   if(qName.equals("name")){ 
                           startName=false; 
                   } 
           } 
   }

How does ContentHandler work?
As I said, SAX reads the document sequentially, and as it goes it reports elements, their attributes and their content.
When it finds the start of an element, it invokes the callback method startElement(). By overriding this method you learn the name of the element that started and the names and values of its attributes.
After the start of an element comes its content. When SAX finds it, it invokes the characters() method, from which you can fetch the text.
Finally SAX finds the end of the element and invokes the endElement() method, from which you can fetch the name of the element that ended.

When I run this program, this is the output:

name1
name4
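
One caveat about characters(): SAX allows a parser to deliver the text of a single element in several chunks, so the callback may fire more than once for the same element. A more defensive variant buffers the text and reads it only in endElement(). The sketch below assumes a class named BufferedNameHandler and a helper namesAged25, both invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedNameHandler extends DefaultHandler {

    private final List<String> names = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inMatchingPerson = false;
    private boolean inName = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("person")) {
            inMatchingPerson = "25".equals(attributes.getValue("age"));
        } else if (qName.equals("name")) {
            inName = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inMatchingPerson && inName) {
            text.append(ch, start, length); // may run more than once per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            if (inMatchingPerson) {
                names.add(text.toString()); // the text is complete only here
            }
            inName = false;
        } else if (qName.equals("person")) {
            inMatchingPerson = false;
        }
    }

    // Hypothetical helper: returns the names of the people whose age attribute is 25.
    static List<String> namesAged25(String xml) {
        BufferedNameHandler handler = new BufferedNameHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        String xml = "<people>"
                + "<person age=\"25\"><name>name1</name></person>"
                + "<person age=\"30\"><name>name2</name></person>"
                + "</people>";
        System.out.println(namesAged25(xml));
    }
}
```

Collecting the names in a list instead of printing them also makes the handler easier to reuse and to test.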

Validation

Here I want to show how we can use SAX to verify that a document is both well formed and valid, and how the ErrorHandler interface works.

We start from this XML schema:

City.xsd

   
   <?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
   <xs:schema version="1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema"> 
    
     <xs:element name="city" type="city"/> 
    
     <xs:element name="person" type="person"/> 
    
     <xs:complexType name="city"> 
       <xs:sequence> 
         <xs:element ref="person" minOccurs="0" maxOccurs="unbounded"/> 
       </xs:sequence> 
     </xs:complexType> 
    
     <xs:complexType name="person"> 
       <xs:sequence> 
         <xs:element name="age" type="xs:int"/> 
         <xs:element name="name" type="xs:string" minOccurs="0"/> 
         <xs:element name="surname" type="xs:string" minOccurs="0"/> 
       </xs:sequence> 
     </xs:complexType> 
   </xs:schema>

and this instance

City.xml

   
   <?xml version="1.0" encoding="UTF-8"?> 
   <city xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="City.xsd"> 
     <person> 
       <age>25</age> 
       <name>Michele</name> 
       <surname>Pazzi</surname> 
     </person> 
     <person> 
           <age>prova</age> 
           <name>Elisa</name> 
           <surname>Manfredi</surname> 
    
   </city>

As you can see, this XML document has two errors: the second person tag is not closed, and I have used an invalid value for age (prova is not an integer). This means that the document is neither well formed nor valid.

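Now we need to tell our program that we want to enable validation and which kind of schema we want to use.
This is done by setting some options on SAXParserFactory. Note that setValidating(false) is intentional: that flag turns on DTD validation, while XML Schema validation is enabled by setSchema(). Calling newSchema() with no arguments creates a schema that relies on the location hints found in the document itself (here, the xsi:noNamespaceSchemaLocation attribute in City.xml). So we need something like this: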

   SAXParserFactory factory = SAXParserFactory.newInstance(); 
   factory.setValidating(false); 
   factory.setNamespaceAware(true); 
    
   SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema"); 
   Schema schema = schemaFactory.newSchema(); 
   factory.setSchema(schema);

In order to catch the errors we need to implement the ErrorHandler interface and register it on the XMLReader:

  
   XMLReader xmlReader = saxParser.getXMLReader(); 
   MyErrorHandler errorHandler = new MyErrorHandler(); 
   xmlReader.setErrorHandler(errorHandler);

This is our final program

TestSaxValidation1.java

   
   import java.io.IOException; 
    
   import javax.xml.parsers.ParserConfigurationException; 
   import javax.xml.parsers.SAXParser; 
   import javax.xml.parsers.SAXParserFactory; 
   import javax.xml.validation.Schema; 
   import javax.xml.validation.SchemaFactory; 
    
   import org.xml.sax.SAXException; 
   import org.xml.sax.XMLReader; 
    
   public class TestSaxValidation1 { 
    
           public static void main(String[] args) { 
    
    
    
                   try { 
                           SAXParserFactory factory = SAXParserFactory.newInstance(); 
                                   factory.setValidating(false); 
                                   factory.setNamespaceAware(true); 
    
                                           SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema"); 
                                           Schema schema = schemaFactory.newSchema(); 
                                   factory.setSchema(schema); 
    
                           SAXParser saxParser = factory.newSAXParser(); 
    
                           XMLReader xmlReader = saxParser.getXMLReader(); 
    
                                   MyErrorHandler errorHandler = new MyErrorHandler(); 
                                   xmlReader.setErrorHandler(errorHandler); 
    
    
                                   xmlReader.parse("./src/sax5/City.xml"); 
    
                   } catch (SAXException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
                   } catch (ParserConfigurationException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
                   } catch (IOException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
                   } 
           } 
    
   }

MyErrorHandler.java

   import org.xml.sax.ErrorHandler; 
   import org.xml.sax.SAXParseException; 
    
   // ErrorHandler is an interface, so it is implemented rather than extended 
   public class MyErrorHandler implements ErrorHandler { 
    
           @Override 
           public void warning(SAXParseException e) { 
                   System.out.println("warning: "+e.getMessage()); 
    
           } 
    
           @Override 
           public void error(SAXParseException e) { 
                   System.out.println("document not valid: "+e.getMessage()); 
    
    
           } 
    
           @Override 
           public void fatalError(SAXParseException e) { 
                   System.out.println("not well formed document: "+e.getMessage()); 
    
           } 
    
   }

The ErrorHandler interface has just three methods: warning(), error() and fatalError(). The main ones are error(), which is invoked when the document is not valid, and fatalError(), which is invoked when the document is not well formed.

When I run the program I get this output:

document not valid: cvc-datatype-valid.1.2.1: 'prova' is not a valid value for 'integer'.
document not valid: cvc-type.3.1.3: The value 'prova' of element 'age' is not valid.
not well formed document: The element type "person" must be terminated by the matching end-tag "</person>".
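
As a variation on the program above, the schema can also be handed to SchemaFactory explicitly instead of relying on the location hint inside the document. The sketch below does this with a tiny schema and document kept in memory and collects the validation errors in a list. The class InMemoryValidation, the helper validate and the reduced one-element schema are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class InMemoryValidation {

    // Reduced schema for this sketch: a person element with a single integer age.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"person\">"
            + "<xs:complexType><xs:sequence>"
            + "<xs:element name=\"age\" type=\"xs:int\"/>"
            + "</xs:sequence></xs:complexType>"
            + "</xs:element>"
            + "</xs:schema>";

    // Hypothetical helper: returns one message per warning or validation error.
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // The schema is passed in explicitly, so the document needs no location hint.
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setSchema(schema);

            XMLReader xmlReader = factory.newSAXParser().getXMLReader();
            xmlReader.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    problems.add("warning at line " + e.getLineNumber() + ": " + e.getMessage());
                }

                @Override
                public void error(SAXParseException e) {
                    problems.add("error at line " + e.getLineNumber() + ": " + e.getMessage());
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // a document that is not well formed cannot be processed further
                }
            });
            xmlReader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate("<person><age>25</age></person>"));
        System.out.println(validate("<person><age>prova</age></person>"));
    }
}
```

SAXParseException also carries the position of the problem, which is why the messages here include getLineNumber().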

Filters

Filters are a SAX feature that supports modularization. You can chain several filters onto the XMLReader, where each filter performs some operation on the stream of events produced from the XML document.

Suppose we have the people.xml document that we saw before and we want two filters: the first changes the value of an attribute, the second prints the names of the people who are 25 years old. This is the same example we have seen before, but now a filter changes the XML events before they are passed to the second filter.
The situation is something like this: people.xml -> XMLReader -> MyXMLFilter -> MyXMLFilter2

So our main program is this:

TestFilter.java
   
   package saxPubl; 
    
   import java.io.IOException; 
    
   import javax.xml.parsers.ParserConfigurationException; 
   import javax.xml.parsers.SAXParser; 
   import javax.xml.parsers.SAXParserFactory; 
   import org.xml.sax.SAXException; 
   import org.xml.sax.XMLReader; 
    
    
    
    
    
   public class TestFilter { 
    
           public static void main(String[] args) { 
    
                   SAXParserFactory spf = SAXParserFactory.newInstance(); 
    
                   try { 
                           SAXParser saxParser = spf.newSAXParser(); 
                           XMLReader xmlReader = saxParser.getXMLReader(); 
    
    
    
                           MyXMLFilter filter = new MyXMLFilter(); 
                           filter.setParent(xmlReader); 
    
                           MyXMLFilter2 filter2= new MyXMLFilter2(); 
                           filter2.setParent(filter); 
    
    
                           filter2.parse("src/saxPubl/people.xml"); 
    
    
                   } catch (ParserConfigurationException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
                   } catch (SAXException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
    
                   } catch (IOException e) { 
                           // TODO Auto-generated catch block 
                           e.printStackTrace(); 
                   } 
    
    
    
           } 
    
   }

Here I have only added the two filters, binding them together with setParent().

The code of filter2 is the same as the ContentHandler we saw earlier in 'A simple use case', so it doesn't need explanation.

MyXMLFilter2.java

   
   package saxPubl; 
    
   import org.xml.sax.Attributes; 
   import org.xml.sax.SAXException; 
   import org.xml.sax.helpers.XMLFilterImpl; 
    
   public class MyXMLFilter2 extends XMLFilterImpl { 
    
    
           private boolean startPerson = false; 
           private boolean startName = false; 
           private String person; 
    
           @Override 
           public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException { 
    
                   if(qName.equals("person")){ 
                           // look the attribute up by name; this comparison is also safe when it is missing 
                           if("25".equals(attributes.getValue("age"))){ 
                                   startPerson = true; 
                           } 
                   } 
    
                   if(qName.equals("name")){ 
                           this.startName=true; 
                   } 
           } 
    
           @Override 
           public void characters(char[] ch, int start, int length) throws SAXException { 
    
                   if(startPerson){ 
                           if(startName){ 
                                   person = new String(ch, start, length); 
                                   System.out.println(person); 
                           } 
                   } 
           } 
    
    
    
           @Override 
           public void endElement(String uri, String localName, String qName) throws SAXException { 
    
                   if(qName.equals("person")){ 
                           startPerson=false; 
                   } 
    
                   if(qName.equals("name")){ 
                           startName=false; 
                   } 
           } 
    
   }

and this is the code of the first filter

MyXMLFilter.java

   
   package saxPubl; 
    
   import org.xml.sax.Attributes; 
   import org.xml.sax.SAXException; 
   import org.xml.sax.helpers.AttributesImpl; 
   import org.xml.sax.helpers.XMLFilterImpl; 
    
   public class MyXMLFilter extends XMLFilterImpl { 
    
           @Override 
           public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException { 
    
    
                   AttributesImpl att = new AttributesImpl(atts); 
    
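                   if(qName.equals("person")){ 
                           // find the attribute by name instead of assuming it is the first one 
                           int ageIndex = att.getIndex("age"); 
                           if(ageIndex >= 0 && "35".equals(att.getValue(ageIndex))){ 
                                   att.setValue(ageIndex, "25"); 
                           } 
                   } 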
    
                   super.startElement(uri, localName, qName, att); 
           } 
    
    
    
    
   }

Here I change the value of an attribute. In people.xml there is only one person aged 35, so I change that age to 25. Now the XML seen downstream has three people aged 25.
To change this attribute I first create a new list of attributes, copied from the original one, and then set the value on the copy. This is necessary because the Attributes interface is read-only, while the AttributesImpl class allows setting the value of an attribute.
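
The copy-then-modify step can be tried in isolation. In this small sketch the class AttributesCopyDemo and the helper withValue are hypothetical names; AttributesImpl and its methods are the standard SAX helper class.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class AttributesCopyDemo {

    // Hypothetical helper: returns a copy of atts in which the named attribute,
    // if present, carries the new value. The original list is left untouched.
    static AttributesImpl withValue(Attributes atts, String qName, String newValue) {
        AttributesImpl copy = new AttributesImpl(atts);
        int index = copy.getIndex(qName); // -1 when the attribute does not exist
        if (index >= 0) {
            copy.setValue(index, newValue);
        }
        return copy;
    }

    public static void main(String[] args) {
        AttributesImpl original = new AttributesImpl();
        // Arguments: namespace URI, local name, qualified name, type, value.
        original.addAttribute("", "age", "age", "CDATA", "35");

        AttributesImpl changed = withValue(original, "age", "25");

        System.out.println(original.getValue("age")); // prints 35
        System.out.println(changed.getValue("age"));  // prints 25
    }
}
```

Looking the attribute up with getIndex() avoids depending on the position of the attribute in the start tag.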

Finally, notice the use of super.startElement(...). This passes the event to the superclass XMLFilterImpl, whose default implementation forwards it to the next handler in the chain (here MyXMLFilter2).

When I run this program I get this output:

name1
name3
name4

As you can see, the second filter also prints name3, because it receives the event as modified by the first filter.
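
The end of a chain does not have to be another filter: a plain ContentHandler can be registered on the last filter instead. The sketch below shows that variant with one rewriting filter and a handler that records the ages it receives. The names FilterChainDemo, AgeRewriteFilter and agesSeenDownstream are assumptions for this illustration, and the filter is attached through the XMLFilterImpl constructor that takes the parent reader rather than through setParent().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChainDemo {

    // Rewrites age="35" to age="25" on person elements and forwards every event.
    static class AgeRewriteFilter extends XMLFilterImpl {

        AgeRewriteFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            AttributesImpl copy = new AttributesImpl(atts);
            int ageIndex = copy.getIndex("age");
            if (qName.equals("person") && ageIndex >= 0 && "35".equals(copy.getValue(ageIndex))) {
                copy.setValue(ageIndex, "25");
            }
            super.startElement(uri, localName, qName, copy);
        }
    }

    // Hypothetical helper: returns the age of every person as seen after the filter.
    static List<String> agesSeenDownstream(String xml) {
        final List<String> ages = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLFilterImpl filter = new AgeRewriteFilter(parser);
            // The handler at the end of the chain sees only what the filter forwards.
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (qName.equals("person")) {
                        ages.add(atts.getValue("age"));
                    }
                }
            });
            // Parsing starts from the last element of the chain, not from the parser.
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return ages;
    }

    public static void main(String[] args) {
        String xml = "<people><person age=\"25\"/><person age=\"30\"/>"
                + "<person age=\"35\"/><person age=\"25\"/></people>";
        System.out.println(agesSeenDownstream(xml));
    }
}
```

Keeping the rewriting logic in a filter and the consuming logic in a handler lets each part be reused or replaced on its own.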




