Friday, May 28, 2010

Basic XML Parsing via PL/SQL

One question that comes up with some frequency on various OTN forums is "How do I parse this XML?" Depending upon where the XML is (PL/SQL or DB) and what version of Oracle, there are different options that can be used. In starting simple, I show some examples for parsing some basic XML via PL/SQL. This example shows some basic ways to parse XML.
DECLARE
l_doc VARCHAR2(2000);
l_domdoc dbms_xmldom.DOMDocument;
l_nodelist dbms_xmldom.DOMNodeList;
l_node dbms_xmldom.DOMNode;
l_value VARCHAR2(30);

l_xmltype XMLTYPE;
l_empx XMLTYPE;
l_index PLS_INTEGER;
l_col_ind PLS_INTEGER;
BEGIN
l_doc := '<employees>
<emp>
<name>Scott</name>
<favorites>
<color>red</color>
<color>orange</color>
</favorites>
</emp>
<emp>
<name>John</name>
<favorites>
<color>blue</color>
<color>green</color>
</favorites>
</emp>
</employees>';

l_domdoc := dbms_xmldom.newDomDocument(l_doc);

-- Method 1
dbms_output.put_line('Method 1');
l_nodelist := dbms_xslprocessor.selectNodes(dbms_xmldom.makeNode(l_domdoc),'/employees/emp/name');
FOR cur_emp IN 0 .. dbms_xmldom.getLength(l_nodelist) - 1 LOOP
l_node := dbms_xmldom.item(l_nodelist, cur_emp);
l_value := dbms_xmldom.getnodevalue(dbms_xmldom.getfirstchild(l_node));
dbms_output.put_line('Emp Name: '||l_value);
END LOOP;

-- Method 2
dbms_output.new_line;
dbms_output.put_line('Method 2');
l_nodelist := dbms_xmldom.getelementsbytagname(l_domdoc, 'name');
-- get first item from list, could loop as shown above
l_node := dbms_xmldom.item(l_nodelist, 0);
l_value := dbms_xmldom.getnodevalue(dbms_xmldom.getfirstchild(l_node));
dbms_output.put_line('Emp Name: '||l_value);

-- Done with DOMDocument examples, setup for XMLType based examples
dbms_xmldom.freeDocument(l_domdoc);
l_xmltype := XMLTYPE(l_doc);

-- Method 3
dbms_output.new_line;
dbms_output.put_line('Method 3');
l_index := 1;
WHILE l_xmltype.Existsnode('/employees/emp[' || To_Char(l_index) || ']') > 0
LOOP
l_value := l_xmltype.extract('/employees/emp[' || To_Char(l_index) || ']/name/text()').getStringVal();
dbms_output.put_line('Emp Name: '||l_value);
l_index := l_index + 1;
END LOOP;

-- Method 4
dbms_output.new_line;
dbms_output.put_line('Method 4');
l_value := l_xmltype.extract('/employees/emp[2]/name/text()').getStringVal();
dbms_output.put_line('Emp Name: '||l_value);

-- Method 5
dbms_output.new_line;
dbms_output.put_line('Method 5');
l_index := 1;
WHILE l_xmltype.Existsnode('/employees/emp[' || To_Char(l_index) || ']') > 0
LOOP
l_empx := l_xmltype.extract('/employees/emp[' || To_Char(l_index) || ']');
l_col_ind := 1;
WHILE l_empx.Existsnode('/emp/favorites/color[' || To_Char(l_col_ind) || ']') > 0
LOOP
l_value := l_empx.extract('/emp/favorites/color[' || To_Char(l_col_ind) || ']/text()').getStringVal();
dbms_output.put_line('Color: '||l_value);
l_col_ind := l_col_ind + 1;
END LOOP;
l_index := l_index + 1;
END LOOP;
END;

Running this script produces
Method 1
Emp Name: Scott
Emp Name: John

Method 2
Emp Name: Scott

Method 3
Emp Name: Scott
Emp Name: John

Method 4
Emp Name: John

Method 5
Color: red
Color: orange
Color: blue
Color: green

In PL/SQL, you basically have two methods. The first, and older method, is via dbms_xmldom/dbms_xslprocessor and the second is via XMLType. The first method uses DOMDocuments, DOMNodes, DOMNodelist, etc to slowly break down the XML and parse/traverse it. This is based on the W3C definition of a DOM. The second method uses XPath 1.0 syntax to quickly parse an XMLType and return either another XMLType fragment or a basic data type. My preference for parsing XML is via an XMLType as the code seems cleaner to write, understand, and maintain.

Methods 1 and 2 use the old style to parse the XML.
Method 1 used dbms_xmlprocessor to select all the nodes that match the XPath /employees/emp/name and then iterates over the list that is returned to print out all the names.
Method 2 uses dbms_xmldom to get all the nodes called "name" and simply prints the first one. It too could easily loop through all the name nodes since both operations return a DOMNodeList.

Methods 3 through 5 use XMLType to parse the XML.
Method 3 looks for and returns the same information as Method 1. This difference is that instead of returning a list of nodes that match the criteria, it goes out and gets each occurrence one at a time.
Method 4 is like Method 2 in that it returns the contents of a specific node directly, in this case, the name of the second employee "emp[2]/name"
Method 5 shows the name for each employee and their favorite colors. As the code shows, it extract each emp as an XMLType and then parses that XMLType to extract all the colors.

Gotchas:
  • /text() refers to the text contents of the given node. Leaving this off returns the node itself (an XMLType).
  • .getStringVal() and others, along with many examples can be found in the XMLType Operations section within the online Oracle Documentation.
  • "ORA-30625: method dispatch on NULL SELF argument is disallowed" means your XPath returned returned nothing aka NULL and further operations failed. This is common when doing a .get*Val() on a .extract() that returned NULL. One way is to do a .ExistsNode before the .extract().get*Val()

8 comments:

  1. Hi A_Non,

    The "Basic XML Parsing via PL/SQL" has been v.helpful. But there is one problem which i am facing. The XML which I am having has more than 2 namespaces in it and I am unable to Import the data from it. XML File :





    FamilyNameValue






    Please let me know how to tackle this situation. You could reply to me question in: http://forums.oracle.com/forums/thread.jspa?threadID=1110466&tstart=0

    Many thanks in advance.

    ReplyDelete
  2. Hi, Could you please email me your email address?

    ReplyDelete
  3. If you have a question specific to this article, you can post the question here. If you have a general question regarding parsing XML, you can also ask it at http://forums.oracle.com/forums/category.jspa?categoryID=51&start=0 or http://forums.oracle.com/forums/forum.jspa?forumID=34 for review by myself and others.

    ReplyDelete
  4. Hi,
    Here in the above example you have taken a xml value in l_doc varchar2 data type.
    But here my requirement is to take a xml file from a particular directory and save xml data in each db table columns.

    Please help me out here, its very very urgent.

    Thanks in advance.

    Aish

    ReplyDelete
    Replies
    1. Hi Arun,

      Had you got any solution. I am also doing same thing and facing problem.

      I am facing issue when xml file contains encode=UTF8.

      Please help me on this matter.

      Thanks in advance.

      Delete
    2. Late reply: Look at using BFileName for one alternative for reading XML in from disk. Plenty of examples in the XML DB forums at https://forums.oracle.com/forums/forum.jspa?forumID=34 Just search for that keyword.

      Delete
    3. Arun,
      You may need to use the second parm of XMLType to properly handle the XML encoding. Stolen from a search on the XML DB forums:
      xmltype(bfilename('SAMPLE_DATA','file.xml'),nls_charset_id('AL32UTF8'))

      Delete
  5. Excellent post! Thanks a bunch. I got caught by ORA-30625, but got around it with .ExistsNode.

    ReplyDelete