Friday, September 16, 2011

Options for Slow Performance Parsing Large XML - Part 2

As I covered previously in Options for Slow Performance Parsing Large XML, I was looking for a PL/SQL-only solution, given the business requirements, to parse an XML document with just over 2100 repeating nodes and insert information from each node into a table in the DB for additional processing. The first attempt, using only XMLTable to parse the PL/SQL XMLType variable, took over 1000 seconds (16+ minutes). The second attempt parsed the XML by looping through it in PL/SQL and using a FORALL to insert the data into the table; that took just over 4 seconds.

That was fine for my situation, but what if you need something faster? As I previously said:
One option is to register a schema in the database, create a table based off that schema, store the XML in there and parse it that way (if still needed).

To verify the answer, I took the long way to get to the above just to see what happens. I started by creating a simple schema for the XML, doing a basic schema registration, converting the XML to schema-based XML, and parsing that PL/SQL XMLType. The basics of that approach were:

DECLARE
   l_schema  CLOB;
BEGIN
   l_schema := '<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdb="http://xmlns.oracle.com/xdb">
 <xs:element name="resultset" type="resultSetType" xdb:defaultTable="LOAD_TEMP_RESULT"/>
 <xs:complexType name="resultSetType">
  <xs:sequence>
   <xs:element name="row" type="rowType" maxOccurs="unbounded"/>
  </xs:sequence>
 </xs:complexType>
 <xs:complexType name="rowType">
  <xs:sequence>
   ...16 elements of xs:int, xs:string, xs:short, xs:date ...
  </xs:sequence>
 </xs:complexType>
</xs:schema>';

   dbms_xmlSchema.registerSchema('blog_pref.xsd', l_schema, gentypes => false, gentables => false);
END;
/
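As a quick sanity check (not part of the original runs), the registration can be confirmed by querying the USER_XML_SCHEMAS data dictionary view:

```sql
-- Confirm the schema was registered under the expected URL
SELECT schema_url
  FROM user_xml_schemas
 WHERE schema_url = 'blog_pref.xsd';
```

If the row comes back, the schema is registered and createSchemaBasedXML can reference it by that URL.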

declare 
   l_result_xml   XMLTYPE;
   l_ns_xml       XMLTYPE;
   l_start        NUMBER;
   l_end          NUMBER;

begin
   -- Build up the XML
   -- Has 2100+ row nodes
   ...snipped...

   l_start := dbms_utility.get_time;
   l_ns_xml := l_result_xml.createSchemaBasedXML('blog_pref.xsd');
   
   INSERT INTO import_queue
   (...20 columns...) 
   SELECT CBIQ_ID_SEQ, CBRH_ID_SEQ,
          ...18 columns ...
     FROM XMLTABLE('/resultset/row'
                   PASSING l_ns_xml
                   COLUMNS
                   CBIQ_ID_SEQ          NUMBER(11) PATH 'CBIQ_ID_SEQ', 
                   CBRH_ID_SEQ          NUMBER(11) PATH 'CBRH_ID_SEQ',
                   ... 18 more columns ...);
   l_end   := dbms_utility.get_time;
   dbms_output.put_line('INSERT INTO TIME in sec: ' || to_char((l_end - l_start) / 100));
   rollback;
end;
/
The time for a run was
INSERT INTO TIME in sec: 1024.09

This was still 16+ minutes, so no improvement at all. I wasn't expecting any improvement in overall time because
A) The schema had been registered without creating any types or tables
B) The code is still parsing a PL/SQL XMLType variable.

The only difference now is that the XML is associated with a schema, but the object itself is still simply a PL/SQL XMLType variable. The next approach was to use the default schema registration. The only change to the above was to use
dbms_xmlSchema.registerSchema('blog_pref.xsd', l_schema);
so that any needed tables/types would be generated (the genTypes and genTables parameters default to TRUE). Running the above code to put the data into the import_queue table resulted in a run of
INSERT INTO TIME in sec: 1057.13

This was still 16+ minutes, so no improvement at all. I wasn't expecting any improvement in overall time because I had only removed condition A) from above and condition B) was still in effect.
To remove condition B), I rewrote the code to insert into the table created during the registerSchema process and then read from that table, using XMLTable to parse the XML. I used xdb:defaultTable="LOAD_TEMP_RESULT" in the schema to name the table that Oracle created. The code ended up looking like:
declare 
   l_result_xml   XMLTYPE;
   l_start        NUMBER;
   l_end          NUMBER;

begin
   -- Build up the XML
   -- Has 2100+ row nodes
   ...snipped...

   l_start := dbms_utility.get_time;
   INSERT INTO LOAD_TEMP_RESULT VALUES (l_result_xml);
   l_end   := dbms_utility.get_time;
   dbms_output.put_line('INSERT INTO TEMP TIME in sec: ' || to_char((l_end - l_start) / 100));

   l_start := dbms_utility.get_time;
   INSERT INTO import_queue
   (...20 columns...) 
   SELECT CBIQ_ID_SEQ, CBRH_ID_SEQ,
          ...18 columns ...
     FROM LOAD_TEMP_RESULT ltr,
          XMLTABLE('/resultset/row'
                   PASSING ltr.object_value  -- OBJECT_VALUE pseudocolumn of the XMLType object table
                   COLUMNS
                   CBIQ_ID_SEQ          NUMBER(11) PATH 'CBIQ_ID_SEQ', 
                   CBRH_ID_SEQ          NUMBER(11) PATH 'CBRH_ID_SEQ',
                   ... 18 more columns ...);
   l_end   := dbms_utility.get_time;
   dbms_output.put_line('INSERT INTO TIME in sec: ' || to_char((l_end - l_start) / 100));
   rollback;
end;
/
The time for a run broke down as
INSERT INTO TEMP TIME in sec: .31
INSERT INTO TIME in sec: .17


So the time Oracle spent parsing the XML into the temporary table, and then retrieving the data from that table and inserting it into the import_queue table, was less than 0.5 seconds for 2100 rows. That was a very good improvement over 4 seconds, though it does require the registration of a schema and a table associated with that schema.
Still to be covered is the other option I mentioned:
store the XML as binary XML (available in 11g). Hopefully I will get to that soon.
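For reference, a minimal sketch of what that option looks like, assuming an 11g database; I have not benchmarked this yet, and the table name here is purely illustrative:

```sql
-- Hypothetical binary XML object table (11g); names are illustrative only
CREATE TABLE load_temp_result_bin OF XMLType
  XMLTYPE STORE AS SECUREFILE BINARY XML;

-- The XML would then be inserted and queried the same way as the
-- schema-based table above, but without registering a schema
INSERT INTO load_temp_result_bin VALUES (XMLType('<resultset><row/></resultset>'));
```

Whether this matches the schema-based table's sub-second parse time is exactly the question for the follow-up post.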
