Programming: JAXP Notes

Java API for XML Processing (JAXP)

1. The Java API for XML Processing (JAXP) is for processing XML data using applications written in the Java programming language.

2. JAXP uses the parser standards SAX (Simple API for XML) and DOM (Document Object Model), so you can choose to parse your data as a stream of events or to build an object representation of it.

3. JAXP also supports the XSLT (Extensible Stylesheet Language Transformations) standard, giving you control over the presentation of data and enabling you to convert the data to other XML documents or to other formats, such as HTML.

4. JAXP can be divided into two main parts: a parsing API and a transform API.
------------------------------------------------------------------------------------------------------------------------
Software Requirement

For compiling and executing applications, you need the Java Platform, Standard Edition (Java SE) installed on your machine. The JAXP API is bundled with the Java SE distribution.

-------------------------------------------------------------------------------------------------------------------------
API (Packages)

The main JAXP APIs are defined in the "javax.xml.parsers" package. That package contains two vendor-independent factory classes:

1. SAXParserFactory - gives you a SAXParser (used for SAX)
2. DocumentBuilderFactory - gives you a DocumentBuilder, which in turn creates a DOM-compliant Document object (used for DOM)

The factory APIs let you plug in an XML implementation offered by another vendor without changing your source code. The implementation you get depends on the setting of the "javax.xml.parsers.SAXParserFactory" and "javax.xml.parsers.DocumentBuilderFactory" system properties. The default values (unless overridden at runtime) point to the reference implementation.
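To see which implementation the factories actually resolve to on a given machine, a small check like the one below can help. This is only a sketch: the class name FactoryLookup and its two helper methods are made up for these notes, and the printed class names depend on the JDK in use.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper class, not part of JAXP itself.
class FactoryLookup {

    // Name of the concrete SAXParserFactory class that JAXP selected.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Name of the concrete DocumentBuilderFactory class that JAXP selected.
    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The system property is usually unset (null), in which case the default applies.
        System.out.println("Property: "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("SAX factory: " + saxFactoryClass());
        System.out.println("DOM factory: " + domFactoryClass());
    }
}
```

Starting the JVM with -Djavax.xml.parsers.SAXParserFactory=<factory class name> should change the first factory without touching the source, provided that class is on the classpath.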

------------------------------------------------------------------------------------------------------------------------------
SAX vs DOM

The "Simple API for XML" (SAX) is an event-driven mechanism that processes a document element by element. Server-side and high-performance applications typically use SAX. When it comes to fast, efficient reading of XML data, SAX is hard to beat. It requires little memory, because it does not construct an internal representation (tree structure) of the XML data. Instead, it simply sends data to the application as it is read; your application can then do whatever it wants with the data it sees.

The Document Object Model (DOM) API is generally an easier API to use. It provides a relatively familiar tree structure of objects. The DOM API is ideal for interactive applications because the entire object model is present in memory, where it can be accessed and manipulated by the user. On the negative side, constructing the DOM requires reading the entire XML structure and holding the object tree in memory, so it is much more CPU and memory intensive. For that reason, the SAX API tends to be preferred for server-side applications and data filters that do not require an in-memory representation of the data. However, when you need to modify an XML structure, especially interactively, an in-memory structure like the DOM may make more sense. While DOM provides many powerful capabilities for large-scale documents (like books), it also requires a lot of complex coding.

SAX

1. Serial, event-driven mechanism
2. Fast, efficient
3. Requires little memory
4. Handles data when encountered
5. Preferred for server-side read applications

DOM

1. Provides a tree structure of objects
2. Powerful capabilities/Complex Coding
3. Higher memory/CPU requirements
4. Entire XML structure read in first
5. Typically used for interactive modification of XML
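The difference is easier to see side by side. The sketch below counts "person" elements in a small in-memory document twice: once by reacting to SAX events, once by building a DOM tree and querying it. The class name CountBothWays, its helper methods and the sample string are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison class for these notes.
class CountBothWays {

    static final String XML =
            "<personnel><person><firstname>Aaa</firstname></person>"
            + "<person><firstname>Ccc</firstname></person></personnel>";

    static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // SAX: count matching elements as the events arrive; no tree is built.
    static int countWithSax(String xml, String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        return count[0];
    }

    // DOM: read the whole document into memory first, then query the tree.
    static int countWithDom(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(stream(xml));
        return doc.getElementsByTagName(name).getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX count: " + countWithSax(XML, "person"));
        System.out.println("DOM count: " + countWithDom(XML, "person"));
    }
}
```

Both methods return the same number; the SAX version never holds more than the current event, while the DOM version keeps the full tree available for later navigation or modification.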

------------------------------------------------------------------------------------------------------
Simple API for XML (SAX)

The Simple API for XML (SAX) is a standard interface for event-based XML parsing. The parser identifies elements as it reads them and informs the application of events, such as the start and end of elements. In effect, the SAX API acts like a serial I/O stream: you see the data as it streams in, but you can't go back to an earlier position or leap ahead to a different position. In general it works well when you simply want to read data and have the application act on it.

Entities

1. SAXParserFactory - creates an instance of the SAXParser

2. SAXParser - in general you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.

3. XMLReader - The SAXParser wraps an XMLReader (obtained with getXMLReader(); some tutorials call it the SAXReader). It performs the communication with the SAX event handlers.

4. DefaultHandler - This implements the ContentHandler, ErrorHandler, DTDHandler and EntityResolver interfaces with default (mostly empty) methods, so that you can override only the ones you are interested in.

5. ContentHandler - Methods like startDocument, endDocument, startElement and endElement (among others) are invoked when an XML tag is recognised.

6. ErrorHandler - This interface can be used to implement your own error handling.

7. DTDHandler - Defines DTD handling methods.

8. EntityResolver - Used when the parser must locate data identified by a URI.
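As an illustration of the ErrorHandler part of this list, the sketch below overrides the warning and error callbacks inherited from DefaultHandler and uses them to decide whether a document parses cleanly. The class name StrictHandler and the parses helper are invented for these notes; only the overridden method signatures come from the SAX API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that treats recoverable errors as failures.
class StrictHandler extends DefaultHandler {

    @Override
    public void warning(SAXParseException e) {
        System.err.println("Warning at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) throws SAXException {
        // DefaultHandler ignores recoverable errors; here we stop parsing instead.
        throw e;
    }

    // fatalError is not overridden: DefaultHandler already rethrows the exception.

    // Returns true when the document parses without errors.
    static boolean parses(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new StrictHandler());
            return true;
        } catch (SAXParseException e) {
            System.out.println("Rejected at line " + e.getLineNumber() + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parses("<a><b/></a>"));   // well-formed
        System.out.println(parses("<a><b></a>"));    // <b> is never closed
    }
}
```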

----------------------------------------------------------------------------------------------------------------------------------------
Ex: ContentHandler
personnel.xml

<?xml version='1.0'?>
<personnel>
<person>
<firstname>Aaa</firstname>
<lastname>Bbb</lastname>
</person>
<person>
<firstname>Ccc</firstname>
<lastname>Ddd</lastname>
</person>
<person>
<firstname>Eee</firstname>
<lastname>Fff</lastname>
</person>
</personnel>

Import Statements

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

Class Declaration

public class MyParser extends DefaultHandler{

Parser Code

DefaultHandler handler = new MyParser();
SAXParserFactory factory = SAXParserFactory.newInstance();
try {

SAXParser saxParser = factory.newSAXParser();
System.out.println("Parsing the file....\n");
saxParser.parse(new File("filename.xml"), handler);
}
catch (Throwable t) {
t.printStackTrace();
}

The ContentHandler interface requires a number of methods that the SAX parser invokes in response to different parsing events. The major handling methods are:

1. startDocument
2. endDocument
3. startElement
4. endElement
5. characters

Each of these methods is declared by the interface to throw a SAXException.

public void startDocument() throws SAXException{

System.out.println("<?xml version='1.0' encoding='UTF-8'?>\n");
}

public void endDocument() throws SAXException{

System.out.println("\n\n\n***** Document Parsing Complete *****\n\n\n");
}

public void startElement(String namespaceURI, String sName, String qName,Attributes attrs)throws SAXException{

}

public void endElement(String namespaceURI, String sName, String qName)throws SAXException {

}

public void characters(char buf[], int offset, int len) throws SAXException {

}

Usage : java MyParser personnel.xml
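The three empty callbacks above can be filled in many ways. One possible sketch, shown below, records an indented outline of the document. The class name OutlineHandler and the outline helper are assumptions for illustration, and the characters callback is simplified: a parser may deliver one run of text in several calls, which this version would print on separate lines.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that builds an indented outline of the elements it sees.
class OutlineHandler extends DefaultHandler {

    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    @Override
    public void startElement(String namespaceURI, String sName,
                             String qName, Attributes attrs) {
        indent();
        out.append('<').append(qName).append(">\n");
        depth++;
    }

    @Override
    public void endElement(String namespaceURI, String sName, String qName) {
        depth--;
        indent();
        out.append("</").append(qName).append(">\n");
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        // Skip whitespace-only text such as the newlines between elements.
        String text = new String(buf, offset, len).trim();
        if (!text.isEmpty()) {
            indent();
            out.append(text).append('\n');
        }
    }

    private void indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    // Parses the given XML text and returns the outline as a string.
    static String outline(String xml) throws Exception {
        OutlineHandler handler = new OutlineHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline(
                "<personnel><person><firstname>Aaa</firstname></person></personnel>"));
    }
}
```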

-------------------------------------------------------------------------
Person.java

public class Person {
String firstname = "";
String lastname = "";
/**
* Standard constructor for a Person where each parameter is
* passed in its correct form.
*/

public Person(String firstname, String lastname) {

this.firstname = firstname;
this.lastname = lastname;
}

public Person() {
}

public String getFirstName(){

return firstname;
}

public String getLastName(){
return lastname;
}

public void setFirstName(String s){
firstname = s;
}

public void setLastName(String s){
lastname = s;
}
}

ParseClass.java

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;

public class ParseClass extends DefaultHandler{

StringBuffer textBuffer;
Person person = new Person();
int count = 0;

public static void main(String[] args){

if (args.length < 1) {
System.out.print("Usage: java ParseClass <document.xml>");
System.exit(1);
}

DefaultHandler handler = new ParseClass();
SAXParserFactory factory = SAXParserFactory.newInstance();

try {
SAXParser saxParser = factory.newSAXParser();
System.out.println("Parsing the Personnel Records....\n");
saxParser.parse(new File(args[0]), handler);
}
catch (Throwable t) {
t.printStackTrace();
}
System.exit(0);
}

public void startDocument() throws SAXException{
}

public void endDocument() throws SAXException{
}

public void startElement(String namespaceURI, String sName, String qName, Attributes attrs)throws SAXException{}

public void endElement(String namespaceURI, String sName, String qName)throws SAXException{

String eName = sName;

if ("".equals(eName)) eName = qName;

if (eName.equals("firstname")){

//System.out.println("Firstname = " + textBuffer.toString().trim());
// textBuffer is still null if the element was empty, so guard against that
person.setFirstName(textBuffer == null ? "" : textBuffer.toString().trim());
textBuffer = null;
}

if (eName.equals("lastname")){

//System.out.println("Lastname = " + textBuffer.toString().trim());
person.setLastName(textBuffer == null ? "" : textBuffer.toString().trim());
textBuffer = null;
}
if (eName.equals("person")){

System.out.println("Person Class = " + person);
System.out.println("Person -> Firstname: " + person.getFirstName() + " Lastname: " + person.getLastName() + "\n\n");
count++;
}
if (eName.equals("personnel")){

System.out.println("End of File! " + count + " records found.");
}
}
public void characters(char buf[], int offset, int len)throws SAXException {

String s = new String(buf, offset, len);
if (textBuffer == null) {

textBuffer = new StringBuffer(s);
}
else textBuffer.append(s);
}
}

----------------------------------------------------------------------------------------------------------------------------------
Document Object Model (DOM)

The Document Object Model, unlike SAX, has its origins in the World Wide Web Consortium (W3C). Whereas SAX is public-domain software, developed through collaboration and mailing lists, DOM is a standard just like the actual XML specification. The DOM is not specifically designed for Java, but to represent the content and model of documents across all programming languages and tools. Bindings exist for JavaScript, Java, CORBA and other languages, allowing the DOM to be a cross-platform and cross-language specification.

The Document Object Model models an XML document. The DOM serves as a complete model representing each and every aspect of an XML object, allowing that object to be completely re-created from the model's data. With the DOM, programmers can build documents, navigate their structure, and add, modify or delete elements and content.
Ex:

DomExample.java

Import Statements

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.io.File;
import java.io.IOException;
import org.w3c.dom.Document;
import org.w3c.dom.DOMException;

Class Declaration

public class DomExample{

static Document document;

public static void main(String argv[]){
................
}
}

Parser Code

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

try {
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(argv[0]));
}
catch (SAXParseException spe) {

System.out.println("SAXParseException: Parsing Error!\n");
}
catch (SAXException sxe) {

System.out.println("SAXException: Parsing Error!\n");
}
catch (ParserConfigurationException pce) {

System.out.println("ParserConfigurationException: Parsing Error!\n");
}
catch (IOException ioe) {

System.out.println("IOException: Parsing Error!\n");
}
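Once parse() has returned a Document, the tree can be navigated and changed in place. The sketch below modifies, deletes and adds nodes on a small in-memory document. The class name DomEdit, the sample string and the "email" element are assumptions made for these notes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class showing in-memory edits on a DOM tree.
class DomEdit {

    static final String XML =
            "<personnel><person><firstname>Aaa</firstname>"
            + "<lastname>Bbb</lastname></person></personnel>";

    static Document edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Modify: replace the text of the first <firstname> element.
        Node first = root.getElementsByTagName("firstname").item(0);
        first.setTextContent("Zzz");

        // Delete: detach the first <lastname> element from its parent.
        Node last = root.getElementsByTagName("lastname").item(0);
        last.getParentNode().removeChild(last);

        // Add: append a new (made-up) <email> element to the same <person>.
        Element email = doc.createElement("email");
        email.setTextContent("zzz@example.com");
        first.getParentNode().appendChild(email);

        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = edit(XML);
        System.out.println("firstname = "
                + doc.getElementsByTagName("firstname").item(0).getTextContent());
        System.out.println("lastname elements left = "
                + doc.getElementsByTagName("lastname").getLength());
    }
}
```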

----------------------------------------------------------------------------------------------
Ex :2

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.io.File;
import java.io.IOException;
import org.w3c.dom.Document;
import org.w3c.dom.DOMException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;

public class DomExample2
{

public static void main(String argv[]){

Document document;

if (argv.length != 1) {

System.err.println("Usage: java DomExample2 filename");
System.exit(1);
}

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
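// Requests DTD validation; personnel.xml as shown declares no DTD, so drop this line unless you add one
factory.setValidating(true);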

try {

DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(argv[0]));
displayInformation(document);
}
catch (SAXParseException spe) {
System.out.println("SAXParseException: Parsing Error!\n");
}
catch (SAXException sxe) {
System.out.println("SAXException: Parsing Error!\n");
}
catch (ParserConfigurationException pce) {
System.out.println("ParserConfigurationException: Parsing Error!\n");
}
catch (IOException ioe) {
System.out.println("IOException: Parsing Error!\n");
}
}

private static void displayInformation(Document document){

int nonWhitespaceNodes = 0;
System.out.println("Displaying Document Details....");

Element root = document.getDocumentElement();
System.out.println("Root Tag Name = " + root.getTagName());
System.out.println("Element has Attributes = " + root.hasAttributes());

NodeList nodeList = root.getChildNodes();
System.out.println("There are " + nodeList.getLength()
+ " child nodes including whitespace!\n");

for (int i=0;i<nodeList.getLength();i++) {

Node node = nodeList.item(i); // Get the next node
// Print out the name of this node
System.out.println("\nNodeName = " + node.getNodeName());

// Now we check for whitespace-only text nodes (the line breaks between elements show up as these)
// We also check the node type, as an empty element would also have a text content length of 0

if ((node.getTextContent().trim().length()==0) && (node.getNodeType() == Node.TEXT_NODE )) {

System.out.println("This node is whitespace only!");
}
else nonWhitespaceNodes++;
}
System.out.println("There are " + nonWhitespaceNodes + " child nodes after we ignore whitespace!\n\n");

// Now as an example, display all of the first name entries
NodeList nodeList2 = root.getElementsByTagName("firstname");
System.out.println("Number of firstname entries in the full nodelist: " + nodeList2.getLength());

for (int i=0;i< nodeList2.getLength();i++) {

Node node = nodeList2.item(i);
System.out.println(node.getNodeName() + " " + i + " = ");
// Remember we are using a tree structure where the data is represented by nodes also!
// So now we get the child of this firstname node, which will be the value we want
NodeList nodeList3 = node.getChildNodes();
System.out.println(nodeList3.item(0).getNodeValue()); // We know only one node
}
}

}
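A common way to sidestep the whitespace nodes seen in this example is to keep only the children whose node type is ELEMENT_NODE. The helper below is a sketch of that idea; the class name ChildElements and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for walking only the element children of a node.
class ChildElements {

    // Returns the element children of parent, skipping text and comment nodes.
    static List<Element> of(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The line breaks between the elements become text nodes in the tree.
        Element root = parse("<personnel>\n<person/>\n<person/>\n</personnel>")
                .getDocumentElement();
        System.out.println("All child nodes: " + root.getChildNodes().getLength());
        System.out.println("Element children: " + of(root).size());
    }
}
```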
----------------------------------------------------------------------------------------------------------
DOM Construction/Modification

There are four packages within JAXP, which we use for transformations:

1. "javax.xml.transform" - Defines the TransformerFactory and Transformer classes, which you use to get an object capable of doing transformations. After creating a transformer object, you invoke its transform() method, providing it with an input(source) and output(result).

2. "javax.xml.transform.dom" - Classes to create input(source) and output(result) from a DOM

3. "javax.xml.transform.sax" - Classes to create input(source) from a SAX parser and output(result) objects from a SAX event handler.

4. "javax.xml.transform.stream" - Classes to create input(source) and output(result) objects from an I/O stream.
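To make the source/result idea concrete, here is a minimal sketch that uses the stream classes to apply an XSLT stylesheet and produce HTML. The class name XsltToHtml, the stylesheet and the sample data are all made up for these notes; only the javax.xml.transform calls are part of JAXP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: XML in, HTML out, both as strings.
class XsltToHtml {

    static final String XML =
            "<personnel><person><firstname>Aaa</firstname>"
            + "<lastname>Bbb</lastname></person></personnel>";

    // A small stylesheet that turns each person into an HTML list item.
    static final String XSL =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html' indent='no'/>"
            + "<xsl:template match='/personnel'>"
            + "<ul><xsl:for-each select='person'>"
            + "<li><xsl:value-of select='firstname'/><xsl:text> </xsl:text>"
            + "<xsl:value-of select='lastname'/></li>"
            + "</xsl:for-each></ul>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws TransformerException {
        // The stylesheet is itself a Source handed to the factory.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(XML, XSL));
    }
}
```

Swapping StreamSource for a DOMSource, or StreamResult for a DOMResult, would change where the data comes from or goes to without changing the transform() call itself.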

Ex::

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.io.File;
import java.io.IOException;
import org.w3c.dom.Document;
import org.w3c.dom.DOMException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
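import org.w3c.dom.Node;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;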

public class DomExample3{

public static void main(String argv[]) {

Document document;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

try {

DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.newDocument();
Element root = document.createElement("name");
document.appendChild(root);

Element firstname = document.createElement("firstname");
root.appendChild(firstname);
Element lastname = document.createElement("lastname");
root.appendChild(lastname);

firstname.appendChild(document.createTextNode("David"));
lastname.appendChild(document.createTextNode("Molloy"));

//displayInformation(document);
//writeXmlFile(document,"abc.xml");
}
catch (ParserConfigurationException pce) {
System.out.println("ParserConfigurationException: Parsing Error!\n");
}
}

public static void writeXmlFile(Document doc, String filename) {

try {

// Prepare the DOM document for writing
DOMSource source = new DOMSource(doc);

// Prepare the output file
File file = new File(filename);
StreamResult result = new StreamResult(file);

// Write the DOM document to the file
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(source, result);

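} catch (TransformerConfigurationException e) {
System.out.println("TransformerConfigurationException: could not create the transformer!\n");
} catch (TransformerException e) {
System.out.println("TransformerException: could not write the document!\n");
}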
}
}
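The same identity transformer can write to a string instead of a file, which is handy for checking what was built. The sketch below rebuilds the small "name" document and serializes it with indentation requested through OutputKeys. The class name DomToString and its helpers are assumptions, and the exact indentation produced varies between JDK versions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class: build a DOM, then serialize it to a String.
class DomToString {

    // Builds <name><firstname>..</firstname><lastname>..</lastname></name>.
    static Document buildName(String first, String last)
            throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("name");
        doc.appendChild(root);

        Element firstname = doc.createElement("firstname");
        firstname.appendChild(doc.createTextNode(first));
        root.appendChild(firstname);

        Element lastname = doc.createElement("lastname");
        lastname.appendChild(doc.createTextNode(last));
        root.appendChild(lastname);
        return doc;
    }

    static String serialize(Document doc) throws TransformerException {
        // A transformer created without a stylesheet copies source to result.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildName("David", "Molloy")));
    }
}
```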

Wednesday, July 25, 2012