Friday, May 28, 2010

Basic XML Parsing via PL/SQL

One question that comes up with some frequency on various OTN forums is "How do I parse this XML?" Depending upon where the XML is (PL/SQL or DB) and what version of Oracle, there are different options that can be used. In starting simple, I show some examples for parsing some basic XML via PL/SQL. This example shows some basic ways to parse XML.
DECLARE
l_doc VARCHAR2(2000);
l_domdoc dbms_xmldom.DOMDocument;
l_nodelist dbms_xmldom.DOMNodeList;
l_node dbms_xmldom.DOMNode;
l_value VARCHAR2(30);

l_xmltype XMLTYPE;
l_empx XMLTYPE;
l_index PLS_INTEGER;
l_col_ind PLS_INTEGER;
BEGIN
l_doc := '<employees>
<emp>
<name>Scott</name>
<favorites>
<color>red</color>
<color>orange</color>
</favorites>
</emp>
<emp>
<name>John</name>
<favorites>
<color>blue</color>
<color>green</color>
</favorites>
</emp>
</employees>';

l_domdoc := dbms_xmldom.newDomDocument(l_doc);

-- Method 1
dbms_output.put_line('Method 1');
l_nodelist := dbms_xslprocessor.selectNodes(dbms_xmldom.makeNode(l_domdoc),'/employees/emp/name');
FOR cur_emp IN 0 .. dbms_xmldom.getLength(l_nodelist) - 1 LOOP
l_node := dbms_xmldom.item(l_nodelist, cur_emp);
l_value := dbms_xmldom.getnodevalue(dbms_xmldom.getfirstchild(l_node));
dbms_output.put_line('Emp Name: '||l_value);
END LOOP;

-- Method 2
dbms_output.new_line;
dbms_output.put_line('Method 2');
l_nodelist := dbms_xmldom.getelementsbytagname(l_domdoc, 'name');
-- get first item from list, could loop as shown above
l_node := dbms_xmldom.item(l_nodelist, 0);
l_value := dbms_xmldom.getnodevalue(dbms_xmldom.getfirstchild(l_node));
dbms_output.put_line('Emp Name: '||l_value);

-- Done with DOMDocument examples, setup for XMLType based examples
dbms_xmldom.freeDocument(l_domdoc);
l_xmltype := XMLTYPE(l_doc);

-- Method 3
dbms_output.new_line;
dbms_output.put_line('Method 3');
l_index := 1;
WHILE l_xmltype.Existsnode('/employees/emp[' || To_Char(l_index) || ']') > 0
LOOP
l_value := l_xmltype.extract('/employees/emp[' || To_Char(l_index) || ']/name/text()').getStringVal();
dbms_output.put_line('Emp Name: '||l_value);
l_index := l_index + 1;
END LOOP;

-- Method 4
dbms_output.new_line;
dbms_output.put_line('Method 4');
l_value := l_xmltype.extract('/employees/emp[2]/name/text()').getStringVal();
dbms_output.put_line('Emp Name: '||l_value);

-- Method 5
dbms_output.new_line;
dbms_output.put_line('Method 5');
l_index := 1;
WHILE l_xmltype.Existsnode('/employees/emp[' || To_Char(l_index) || ']') > 0
LOOP
l_empx := l_xmltype.extract('/employees/emp[' || To_Char(l_index) || ']');
l_col_ind := 1;
WHILE l_empx.Existsnode('/emp/favorites/color[' || To_Char(l_col_ind) || ']') > 0
LOOP
l_value := l_empx.extract('/emp/favorites/color[' || To_Char(l_col_ind) || ']/text()').getStringVal();
dbms_output.put_line('Color: '||l_value);
l_col_ind := l_col_ind + 1;
END LOOP;
l_index := l_index + 1;
END LOOP;
END;

Running this script produces
Method 1
Emp Name: Scott
Emp Name: John

Method 2
Emp Name: Scott

Method 3
Emp Name: Scott
Emp Name: John

Method 4
Emp Name: John

Method 5
Color: red
Color: orange
Color: blue
Color: green

In PL/SQL, you basically have two methods. The first, and older method, is via dbms_xmldom/dbms_xslprocessor and the second is via XMLType. The first method uses DOMDocuments, DOMNodes, DOMNodelist, etc to slowly break down the XML and parse/traverse it. This is based on the W3C definition of a DOM. The second method uses XPath 1.0 syntax to quickly parse an XMLType and return either another XMLType fragment or a basic data type. My preference for parsing XML is via an XMLType as the code seems cleaner to write, understand, and maintain.

Methods 1 and 2 use the old style to parse the XML.
Method 1 used dbms_xmlprocessor to select all the nodes that match the XPath /employees/emp/name and then iterates over the list that is returned to print out all the names.
Method 2 uses dbms_xmldom to get all the nodes called "name" and simply prints the first one. It too could easily loop through all the name nodes since both operations return a DOMNodeList.

Methods 3 through 5 use XMLType to parse the XML.
Method 3 looks for and returns the same information as Method 1. This difference is that instead of returning a list of nodes that match the criteria, it goes out and gets each occurrence one at a time.
Method 4 is like Method 2 in that it returns the contents of a specific node directly, in this case, the name of the second employee "emp[2]/name"
Method 5 shows the name for each employee and their favorite colors. As the code shows, it extract each emp as an XMLType and then parses that XMLType to extract all the colors.

Gotchas:
  • /text() refers to the text contents of the given node. Leaving this off returns the node itself (an XMLType).
  • .getStringVal() and others, along with many examples can be found in the XMLType Operations section within the online Oracle Documentation.
  • "ORA-30625: method dispatch on NULL SELF argument is disallowed" means your XPath returned returned nothing aka NULL and further operations failed. This is common when doing a .get*Val() on a .extract() that returned NULL. One way is to do a .ExistsNode before the .extract().get*Val()

Sunday, May 23, 2010

XMLType - What is it?

While pondering my first meaningful post regarding what it should include and be limited to, I began pondering what an XMLType really is. A CLOB or VARCHAR2 is stored as characters. Numbers are stored as bits and bytes. Dates are an internal Oracle representation. An XMLType though is simply a system-defined opaque type per 11.2 documentation. Depending upon where the XMLType is used, it could be stored in various formats. In the database, it could be represented as a CLOB, binary XML, or object-relational storage. How PL/SQL represents an XMLType is unclear.

Oracle provides many functions to access the data in an XMLType, whether a member function such as .extract or via XMLTable which uses an XMLType. Since the introduction of XMLType in 9.2, the definition in the Oracle documentation has changed slightly over time. Whether this was just documentation clarification or a associated with an underlying structure change is a guess. I suspect the latter given Oracle continues to introduce new DB storage options in 11g.

So for now, I will accept an XMLType as any other basic data type within in Oracle, but I will still continue to wonder how it is represented in PL/SQL and how I can figure it out besides a memory dump. Future post will touch on how to access the data in a XMLType, whether in PL/SQL or the DB along with ways to build XML. For how XML can be stored in the DB, I defer to Marco's blog on that subject where possible.

Information on Data Cartridges (10.2 documentation), as mentioned by Marco in the comment.

Monday, May 17, 2010

Round 1

This is going to start out as a place to post common examples regarding SQL/XML, such as XMLTable and XQuery, along with any other XML DB information that I post about on the Oracle OTN forums. A scratchpad with common examples to build from. What this morphs into over time is a good question.

Why did I start this? I was rightfully prodded into starting this, but that was a good thing. Trying to find some of my previous posts to re-use as an example has become difficult.