Wednesday, September 21, 2011

Options for Slow Performance Parsing Large XML - Part 3

The final piece to this chapter. In my previous post, Options for Slow Performance Parsing Large XML - Part 2, I showed how registering a schema and letting that process create a table could be used to increase the parsing speed of XML compared to parsing that XML via PL/SQL. Now I'm going to look at whether using a table without an associated schema makes any difference in parsing/inserting speed.

I again used the following block of code for running my tests:
declare 
   l_result_xml   XMLTYPE;
   l_ns_xml       XMLTYPE;
   l_start        NUMBER;
   l_end          NUMBER;

begin
   -- Build up the XML
   -- Has 2100+ row nodes
   ...snipped...

   l_start := dbms_utility.get_time;
   INSERT INTO LOAD_TEMP_RESULT VALUES (l_result_xml);
   l_end   := dbms_utility.get_time;
   dbms_output.put_line('INSERT INTO TEMP TIME in sec: ' || to_char((l_end - l_start) / 100));

   l_start := dbms_utility.get_time;
   INSERT INTO import_queue
   (...20 columns...) 
   SELECT CBIQ_ID_SEQ, CBRH_ID_SEQ,
          ...18 columns ...
     FROM LOAD_TEMP_RESULT ltr,
          XMLTABLE('/resultset/row'
                    PASSING ltr.object_value  -- OBJECT_VALUE is the system pseudocolumn for XMLType object tables
                   COLUMNS
                   CBIQ_ID_SEQ          NUMBER(11) PATH 'CBIQ_ID_SEQ', 
                   CBRH_ID_SEQ          NUMBER(11) PATH 'CBRH_ID_SEQ',
                   ... 18 more columns ...);
   l_end   := dbms_utility.get_time;
   dbms_output.put_line('INSERT INTO TIME in sec: ' || to_char((l_end - l_start) / 100));
   rollback;
end;
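As a side note, the construction of the test XML is snipped above, but for anyone wanting to reproduce a similar document, something along these lines works (a sketch, not my original code; only two of the row elements are shown, and inside PL/SQL this would be a SELECT ... INTO l_result_xml):

-- Builds a /resultset document with 2100 row nodes shaped like the
-- ones used in these tests.
SELECT XMLELEMENT("resultset",
         XMLAGG(
           XMLELEMENT("row",
             XMLELEMENT("CBIQ_ID_SEQ", level),
             XMLELEMENT("CBRH_ID_SEQ", level))))
  FROM dual
CONNECT BY level <= 2100;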

The only thing I changed between runs was the CREATE TABLE statement for LOAD_TEMP_RESULT. The following table summarizes the various CREATE TABLE statements I used and the resulting times, where TEMP TIME is the output for "INSERT INTO TEMP TIME in sec:" and TIME is the output for "INSERT INTO TIME in sec:".
DDL                                                                               TEMP TIME (sec)  TIME (sec)
CREATE TABLE load_temp_result OF XMLTYPE;                                                    0.05          16
CREATE TABLE load_temp_result OF XMLTYPE XMLTYPE STORE AS BASICFILE CLOB;                    0.09       15.82
CREATE TABLE load_temp_result OF XMLTYPE XMLTYPE STORE AS SECUREFILE CLOB;                   2.36       15.46
CREATE TABLE load_temp_result OF XMLTYPE XMLTYPE STORE AS BASICFILE BINARY XML;              0.27        0.63
CREATE TABLE load_temp_result OF XMLTYPE XMLTYPE STORE AS SECUREFILE BINARY XML;             0.24        0.63

As I am using 11.1.0.6, the default behavior for this version of Oracle is to store XMLTypes as a CLOB. This is confirmed by the fact that the first row (default storage) matches most closely with the second row (explicit BASICFILE CLOB storage). Based on the results, it is easy to see that creating an XMLType table (or an XMLType column) where the XMLType is stored as BINARY XML is well worth looking at in terms of performance when validation is not needed.
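If you want to double-check what storage a given table actually ended up with, pulling the generated DDL works (just a sanity check, not part of my timed runs):

SELECT dbms_metadata.get_ddl('TABLE', 'LOAD_TEMP_RESULT') FROM dual;

And the XMLType column equivalent of the BINARY XML table looks like this (table and column names are illustrative):

CREATE TABLE load_temp_result_col (
   payload  XMLTYPE
)
XMLTYPE COLUMN payload STORE AS BINARY XML;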

I'm not going to go into a discussion of the differences between BASICFILE and SECUREFILE, but the highlights of what SECUREFILE offers can be found at LOB Features Introduced in Oracle Database 11g Release 1. From what I have picked up on the XML DB forum on OTN, Oracle recommends that SECUREFILE be used.
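As an illustrative aside (syntax sketch only, not something I timed), the SECUREFILE-specific features such as compression are enabled through the LOB parameters; note that compression may require the Advanced Compression option:

CREATE TABLE load_temp_result OF XMLTYPE
   XMLTYPE STORE AS SECUREFILE BINARY XML (COMPRESS MEDIUM CACHE);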

To recap the results from the previous two posts on this topic:
  • Using XMLTable in a SQL statement to parse a PL/SQL XMLType variable took 1032 seconds (16+ minutes).
  • Using PL/SQL to loop through each node in the XML and parse the information out and then INSERT that into a table took 4 seconds.
  • Using a registered schema and associated table took 0.5 seconds.
For the given test case (approx. 2100 row nodes in the XML) on the system I was using (11.1.0.6 on a RedHat OS), a schema registration that creates an associated table (roughly 0.5 seconds) just edged out a BINARY XML based XMLType table (0.87 seconds total). This is likely due to the additional information Oracle knows about the XML from the registered schema. As 11.1 was also the first release in which BINARY XML existed, it could be partially due to that fact as well. Maybe if I get access to a newer version of Oracle, I can test that theory.

Friday, September 16, 2011

Options for Slow Performance Parsing Large XML - Part 2

As I covered previously in Options for Slow Performance Parsing Large XML, I was looking for a PL/SQL-only solution, given the business requirements, to parse an XML with just over 2100 repeating nodes and insert information from each node into a table in the DB for additional processing. The first attempt, using only XMLTable to parse the PL/SQL XMLType variable, took over 1000 seconds (16+ minutes). The second attempt parsed the XML by looping through it in PL/SQL and using a FORALL to insert the data into the table; this took just over 4 seconds.
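Part 1 has the details, but the general shape of that loop-plus-FORALL approach was along these lines (a sketch using DBMS_XMLDOM, not necessarily my exact Part 1 code; only one of the 20 columns is carried through):

declare 
   TYPE t_num_tab IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
   l_result_xml   XMLTYPE;
   l_doc          dbms_xmldom.DOMDocument;
   l_rows         dbms_xmldom.DOMNodeList;
   l_row          dbms_xmldom.DOMNode;
   l_val          VARCHAR2(4000);
   l_ids          t_num_tab;
begin
   -- Build up the XML as in the other blocks
   ...snipped...

   l_doc  := dbms_xmldom.newDOMDocument(l_result_xml);
   l_rows := dbms_xslprocessor.selectNodes(dbms_xmldom.makeNode(l_doc),
                                           '/resultset/row');
   FOR i IN 0 .. dbms_xmldom.getLength(l_rows) - 1 LOOP
      l_row := dbms_xmldom.item(l_rows, i);
      -- Parse each element value out of the current row node
      dbms_xslprocessor.valueOf(l_row, 'CBIQ_ID_SEQ/text()', l_val);
      l_ids(i + 1) := to_number(l_val);
      -- ... repeat for the remaining columns ...
   END LOOP;
   dbms_xmldom.freeDocument(l_doc);

   -- One bulk insert instead of 2100 single-row inserts
   FORALL i IN 1 .. l_ids.COUNT
      INSERT INTO import_queue (cbiq_id_seq)
      VALUES (l_ids(i));
end;
/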

That was fine for my situation, but what if you need something faster? As I previously said:
One option is to register a schema in the database, create a table based off that schema, store the XML in there and parse it that way (if still needed).

To verify that option, I took the long way to get there just to see what happens. I started by creating a simple schema for the XML, doing a basic schema registration, converting the XML to schema-based XML, and parsing that PL/SQL XMLType. The basics of that approach were:

DECLARE
   l_schema  CLOB;
BEGIN
   l_schema := '<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdb="http://xmlns.oracle.com/xdb">
 <xs:element name="resultset" type="resultSetType" xdb:defaultTable="LOAD_TEMP_RESULT"/>
 <xs:complexType name="resultSetType">
  <xs:sequence>
   <xs:element name="row" type="rowType" maxOccurs="unbounded"/>
  </xs:sequence>
 </xs:complexType>
 <xs:complexType name="rowType">
  <xs:sequence>
   ...16 elements of xs:int, xs:string, xs:short, xs:date ...
  </xs:sequence>
 </xs:complexType>
</xs:schema>';

   dbms_xmlSchema.registerSchema('blog_pref.xsd', l_schema, gentypes => false, gentables => false);
END;
/
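When re-running these tests with different registration options, the previous registration has to be dropped first. A sketch of that cleanup; DELETE_CASCADE_FORCE also removes anything that was generated from the schema:

BEGIN
   dbms_xmlschema.deleteSchema('blog_pref.xsd',
                               dbms_xmlschema.DELETE_CASCADE_FORCE);
END;
/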

declare 
   l_result_xml   XMLTYPE;
   l_ns_xml       XMLTYPE;
   l_start        NUMBER;
   l_end          NUMBER;

begin
   -- Build up the XML
   -- Has 2100+ row nodes
   ...snipped...

   l_start := dbms_utility.get_time;
   l_ns_xml := l_result_xml.createSchemaBasedXML('blog_pref.xsd');
   
   INSERT INTO import_queue
   (...20 columns...) 
   SELECT CBIQ_ID_SEQ, CBRH_ID_SEQ,
          ...18 columns ...
     FROM XMLTABLE('/resultset/row'
                   PASSING l_ns_xml
                   COLUMNS
                   CBIQ_ID_SEQ          NUMBER(11) PATH 'CBIQ_ID_SEQ', 
                   CBRH_ID_SEQ          NUMBER(11) PATH 'CBRH_ID_SEQ',
                   ... 18 more columns ...);
   l_end   := dbms_utility.get_time;
   dbms_output.put_line('INSERT INTO TIME in sec: ' || to_char((l_end - l_start) / 100));
   rollback;
end;
The time for a run was:
INSERT INTO TIME in sec: 1024.09

This was still 16+ minutes, so no improvement at all. I wasn't expecting any improvement in overall time because
A) The schema had been registered without creating any types or tables, and
B) The code was still parsing a PL/SQL XMLType variable.

The only difference at this point was that the XML was associated with a schema, but the object itself was still simply a PL/SQL XMLType variable. The next approach was to use the default schema registration. The only change to the above was to use
dbms_xmlSchema.registerSchema('blog_pref.xsd', l_schema);
so that any needed tables/types would be generated. Running the above code to put the data into the import_queue table resulted in a run of
INSERT INTO TIME in sec: 1057.13

This was still 16+ minutes, so no improvement at all. I wasn't expecting any improvement in overall time because I had only removed condition A) from above and condition B) was still in effect.
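Before rewriting the code, a quick sanity check (not part of the original timings) confirms that the default registration generated the table named by xdb:defaultTable:

SELECT table_name, storage_type
  FROM user_xml_tables
 WHERE xmlschema = 'blog_pref.xsd';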
To remove condition B), I rewrote the code to insert into the table created during the registerSchema process and then read from that table, using XMLTable to parse the XML. I had used xdb:defaultTable="LOAD_TEMP_RESULT" in the schema to name the table that Oracle created. The code ended up looking like:
declare 
   l_result_xml   XMLTYPE;
   l_ns_xml       XMLTYPE;
   l_start        NUMBER;
   l_end          NUMBER;

begin
   -- Build up the XML
   -- Has 2100+ row nodes
   ...snipped...

   l_start := dbms_utility.get_time;
   INSERT INTO LOAD_TEMP_RESULT VALUES (l_result_xml);
   l_end   := dbms_utility.get_time;
   dbms_output.put_line('INSERT INTO TEMP TIME in sec: ' || to_char((l_end - l_start) / 100));

   l_start := dbms_utility.get_time;
   INSERT INTO import_queue
   (...20 columns...) 
   SELECT CBIQ_ID_SEQ, CBRH_ID_SEQ,
          ...18 columns ...
     FROM LOAD_TEMP_RESULT ltr,
          XMLTABLE('/resultset/row'
                    PASSING ltr.object_value  -- OBJECT_VALUE is the system pseudocolumn for XMLType object tables
                   COLUMNS
                   CBIQ_ID_SEQ          NUMBER(11) PATH 'CBIQ_ID_SEQ', 
                   CBRH_ID_SEQ          NUMBER(11) PATH 'CBRH_ID_SEQ',
                   ... 18 more columns ...);
   l_end   := dbms_utility.get_time;
   dbms_output.put_line('INSERT INTO TIME in sec: ' || to_char((l_end - l_start) / 100));
   rollback;
end;
The time for a run broke down as:
INSERT INTO TEMP TIME in sec: .31
INSERT INTO TIME in sec: .17


So the time Oracle spent parsing the XML into the temporary table and then retrieving the data from it and inserting it into the import_queue table was less than 0.5 seconds for 2100 rows. That was a very good improvement over 4 seconds, though it does require the registration of a schema and a table associated with that schema.
Still to be covered was the other option I mentioned: store the XML as binary XML (available in 11g). Hopefully I get to that soon.