XAware Community


Tags: streaming, memory usage, memory optimization, memory, instreaming, input parameter
28 Nov 2007 5:32 PM
Handling Very Large Files by kvandersluis

One common question I hear from users is how XAware handles large XML files.  The XAware Engine uses a tree-based Document Object Model (DOM) approach to processing hierarchical data.  This means that an inbound XML message or file is read into memory before the Engine operates on it.  Naturally, this raises the question of how large files are handled.  A pure DOM approach would certainly cause problems with the multi-gigabyte files some of our production installations must process.  The answer is that XAware provides a "streaming" mode, in which extremely large files are handled in manageable chunks.

 

Streaming Mode

XAware's streaming mode is a memory optimization feature that treats a file as a set of repeating structures.  Common files that our customers have processed include lists of orders, customers, insurance policies, and invoices.  In streaming mode, the Engine retrieves the first item from the file, processes it, releases the memory used for the operation, then proceeds to the next item.  The process is repeated over the entire file.  The required memory footprint, then, depends on the size of the repeated structure rather than the size of the file as a whole.  In this way, XAware can process files of essentially any size, as long as a single repeated item fits in memory.
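
The same read, process, release cycle can be illustrated outside XAware.  The following is a minimal sketch in Python using the standard library's incremental parser; it is not how the XAware Engine is implemented, and the function name stream_items is made up for this illustration.  It assumes the repeated items are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET

def stream_items(source, tag):
    """Yield each <tag> element in turn, releasing it once the caller is done."""
    # iterparse reads the document incrementally instead of building the full tree first.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)      # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem           # the caller processes one complete item here
            root.clear()         # drop processed items so memory use stays flat

# Tiny in-memory sample; a real run would pass a file name or an open file.
sample = b"""<CustomerList>
  <Customer><CustID>11201</CustID></Customer>
  <Customer><CustID>11312</CustID></Customer>
</CustomerList>"""

ids = [c.findtext("CustID") for c in stream_items(io.BytesIO(sample), "Customer")]
print(ids)  # prints ['11201', '11312']
```

Because each item is discarded after it is processed, the caller must not hold on to an element after asking for the next one.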

 

Design Steps

When designing a BizDocument to handle a very large file, the typical design flow follows this pattern:

  1. Create the BizDocument as usual, from an XML Schema or a small XML sample of the file (right-click on the XML Schema file, select Generate | BizDocument).
  2. Design, test, and debug the BizDocument using a small XML sample file as input.  Your sample input should include just a couple of the repeating items.
  3. Apply the streaming parameter.
  4. Retest with large files.
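
For step 4, a large test file can be produced by repeating a sample item.  The following is a minimal sketch in Python, assuming a trimmed-down version of the Customer layout used in the example later in this article; the function name, the generated CustID values, and the file name are all illustrative.

```python
import os
import tempfile

# One repeating item, reduced to two fields to keep the sketch short.
ITEM = (
    "    <Customer>\n"
    "        <CustID>{cust_id}</CustID>\n"
    "        <Type>Consumer</Type>\n"
    "    </Customer>\n"
)

def write_large_sample(path, count):
    """Write a CustomerList with `count` Customer items, one item at a time."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("<CustomerList>\n")
        for i in range(count):
            out.write(ITEM.format(cust_id=10000 + i))
        out.write("</CustomerList>\n")

path = os.path.join(tempfile.mkdtemp(), "customers_large.xml")
write_large_sample(path, 100_000)
print(f"wrote {os.path.getsize(path) / 1e6:.1f} MB to {path}")
```

Raising the count scales the file up to whatever size you need for the retest.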

 

Example Implementation

As a simple example, I've written a small BizDocument that reads an XML file of customers that looks like this:

 

<CustomerList>
    <Customer>
        <CustID>11201</CustID>
        <Type>Consumer</Type>
        <Name>
            <Last>Jackson</Last>
            <First>Allison</First>
            <Mid>A</Mid>
        </Name>
        <Address>9983 Dole Circle</Address>
        <City>Colorado Springs</City>
        <State>CO</State>
        <ZipCode>80920</ZipCode>
    </Customer>
    <Customer>
        <CustID>11312</CustID>
        <Type>Consumer</Type>
        <Name>
            <Last>Phillips</Last>
            <First>Raymond</First>
            <Mid>J</Mid>
        </Name>
        <Address>108 Main St</Address>
        <City>Colorado Springs</City>
        <State>CO</State>
        <ZipCode>80906</ZipCode>
    </Customer>
...
</CustomerList>

 

The BizDocument stores each customer in a MySQL database.  Here is the BizDocument.  For those not familiar with XAware Designer, note that I am showing the raw XML here because it is easier to include in an article.  You will usually work in the tree view in Designer, which is much easier to read and work with, though harder to include in an article!

 

<save xmlns:xa="http://xaware.org/xas/ns1" xa:version="5.0"
      xa:on_error="xa-doc::/save/Error">
    <xa:description>process a file of customers, storing each in the database</xa:description>
    <save xa:bizcomp="customer/saveCustomer.xbc"
          xa:input="xa-input::/CustomerList/Customer" xa:remove="yes" />
    <Error xa:include="no">
        <detail>$xavar:error_stack$</detail>
    </Error>
</save>

 

In this BizDocument, the bulk of the work is done by the BizComponent "customer/saveCustomer.xbc", which is applied to the elements matching the path "/CustomerList/Customer".  It is this BizComponent that transforms the XML customer structure into an insert statement and executes it.  The result is one insert statement for every Customer element in the input file or message.
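
To make the transformation concrete, here is a minimal sketch in Python of mapping one Customer element to a parameterized insert statement.  The contents of saveCustomer.xbc are not shown in this article, so the table name, the column names, and the function customer_to_insert are all assumptions, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical column -> element path mapping; the real schema is not shown in the article.
COLUMNS = {
    "cust_id": "CustID",
    "cust_type": "Type",
    "last_name": "Name/Last",
    "first_name": "Name/First",
    "middle": "Name/Mid",
    "address": "Address",
    "city": "City",
    "state": "State",
    "zip_code": "ZipCode",
}

def customer_to_insert(customer):
    """Map one <Customer> element to a parameterized INSERT and its values."""
    names = ", ".join(COLUMNS)
    marks = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO customer ({names}) VALUES ({marks})"
    values = [customer.findtext(path) for path in COLUMNS.values()]
    return sql, values

# SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE customer ({', '.join(c + ' TEXT' for c in COLUMNS)})")

customer = ET.fromstring(
    "<Customer><CustID>11201</CustID><Type>Consumer</Type>"
    "<Name><Last>Jackson</Last><First>Allison</First><Mid>A</Mid></Name>"
    "<Address>9983 Dole Circle</Address><City>Colorado Springs</City>"
    "<State>CO</State><ZipCode>80920</ZipCode></Customer>"
)
sql, values = customer_to_insert(customer)
conn.execute(sql, values)
row = conn.execute("SELECT cust_id, last_name, city FROM customer").fetchone()
print(row)  # prints ('11201', 'Jackson', 'Colorado Springs')
```

Using placeholders rather than string concatenation keeps values such as names with apostrophes from breaking the statement.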

 

To enable this BizDocument to process a very large file of Customers, I need to apply the streaming setting mentioned earlier.  In addition, whereas the original BizDocument takes its input from an arbitrary in-memory structure, I now want to read the input from a file, a very common use case.  It makes sense, then, to add an input parameter to the BizDocument that passes in the name of the input file.

Enable Streaming

The stream parameter is set with the Designer menu option Edit | Insert/Edit | Stream Parameters.  You will be prompted for a sample input file name, from which the expected XML input format is derived.  You can then define one or more XPath expressions into the file that trigger BizDocument execution.  Use the path tool to select the Customer element, which produces the XPath expression "xa-input::/CustomerList/Customer".
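
The idea of a path that triggers processing can be sketched in Python as follows.  This is only an illustration of the concept, not XAware's matcher: it handles plain element paths rather than full XPath, it drops the "xa-input::" prefix, and the function name match_stream is made up for the example.  Memory release is left out to keep the sketch short.

```python
import io
import xml.etree.ElementTree as ET

def match_stream(source, match_paths):
    """Yield (path, element) each time an element closes at one of match_paths."""
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            path = "/" + "/".join(stack)  # absolute path of the closing element
            stack.pop()
            if path in match_paths:
                yield path, elem

doc = (
    b"<CustomerList>"
    b"<Customer><CustID>11201</CustID></Customer>"
    b"<Customer><CustID>11312</CustID></Customer>"
    b"</CustomerList>"
)
hits = [
    (path, elem.findtext("CustID"))
    for path, elem in match_stream(io.BytesIO(doc), {"/CustomerList/Customer"})
]
print(hits)
```

Each hit corresponds to one point at which processing would be triggered for a complete Customer item.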

 

Create Input Parameter

Next, create an input parameter for the BizDocument called "filename".  Use the menu option Edit | Insert/Edit | Interface to do this.  Finally, change the xa:source attribute to reference the input parameter "%filename%" rather than the sample file you are using.  The resulting input streaming BizDocument now has an xa:stream element and an xa:input element:

 

<save xmlns:xa="http://xaware.org/xas/ns1" xa:version="5.0" xa:on_error="xa-doc::/save/Error" xa:visible="yes">
    <xa:description>process a file of customers, storing each in the database</xa:description>
    <xa:stream>
        <xa:instream xa:streamtype="file" xa:source="%filename%"
                     xa:match_stream="xa-input::/CustomerList/Customer" />
    </xa:stream>
    <xa:input>
        <xa:param xa:name="filename" xa:datatype="string"
                  xa:default="customer/data/customer2.xml"
                  xa:description="file containing input customer list" />
    </xa:input>
    <save xa:bizcomp="customer/saveCustomer.xbc"
          xa:input="xa-input::/CustomerList/Customer" xa:remove="yes" />
    <Error xa:include="no">
        <detail>$xavar:error_stack$</detail>
    </Error>
</save>

 

The BizDocument can be run in Designer, deployed and run on the server, or run by the stand-alone batch program XABizDoc.  When run, the filename parameter specifies the name of the file containing the CustomerList XML document.  The BizDocument is essentially invoked once for each Customer in the list, resulting in a new record in the database for each customer.

Summary

In this article, I've discussed the need to handle very large XML files, and how XAware supports this need with streaming mode.  While I discussed input streaming to handle very large input files, a similar feature called output streaming is used when you need to generate very large XML files.  I'll write about output streaming at a later time.
