XAware Community Forums
TOPIC: Re: Processing large Fixed Length Column format files
#3905
prichards (Admin)
Posts: 319
Re: Processing large Fixed Length Column format files - 11 Years, 2 Months ago - Karma: 18
With a file that large, I am guessing that you are trying to perform several million inserts. XAware is better suited to a transactional environment (such as providing services in an SOA environment, mapping data exchanges, aggregating data for an RIA, etc.). We are not optimized for large ETL operations; however, the XAware mapping features and access to data sources provide ETL-type capabilities that have been used in smaller ETL scenarios.

From a high-level perspective, if you need to insert millions of records directly from a text file into a database, my first approach would be to use the "bulk" loading utility of your database, if it provides one. Oracle, MS SQL Server, and other databases provide this capability to load many records into the database very quickly, usually from either a fixed-width or delimited file. You generally create a simple control file that maps record fields to table(s) and then invoke the bulk loader with the data file, the control file, and other options (such as logging errors to a file, turning off DB constraints for faster loading, etc.). This approach is much faster than individual SQL inserts (I've seen improvements of up to three orders of magnitude), regardless of whether you use XAware, Java, or another approach to perform the individual inserts.
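For example, a very rough sketch of an Oracle SQL*Loader control file for a fixed-length input file might look something like the following (the table name, column names, and field positions are made up for illustration; check the SQL*Loader documentation for the exact syntax your version supports):

LOAD DATA
INFILE 'customers.dat'
APPEND
INTO TABLE customers
( cust_id    POSITION(1:10)   INTEGER EXTERNAL,
  cust_name  POSITION(11:40)  CHAR,
  balance    POSITION(41:52)  DECIMAL EXTERNAL )

You would then invoke it from the command line with something like:

sqlldr userid=scott/tiger control=customers.ctl log=customers.log errors=1000 direct=true

SQL Server has comparable facilities in bcp and the BULK INSERT statement.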

If this is just one part of a larger project, you can use XAware to invoke a command-line bulk load utility using one of the Execute() functoids. We have used this approach on a project that shredded large XML files into delimited text files and then invoked the loader to load those files. XAware can also be used to transform the file if you have to apply complex business rules (note that the bulk copy utilities generally provide some transformation capabilities of their own, such as formatting values or applying database functions): read the input text file, map it to XML to apply the transformations, and write it back out in the format the bulk loader expects. You would want to use output streaming (to null, so that XAware does not process or write the intermediate XML), invoking the File Write BizComp from within the File Read BizComp, to process each record one at a time.
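Outside of XAware, the record-at-a-time streaming pattern looks roughly like the plain-Java sketch below. This is only an illustration of the idea, not the File Read/File Write BizComps themselves, and the file names and field positions are invented:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class FixedToDelimited {
    public static void main(String[] args) throws IOException {
        // Stream one record at a time so the whole file is never held in memory.
        try (BufferedReader in = new BufferedReader(new FileReader("input_fixed.txt"));
             BufferedWriter out = new BufferedWriter(new FileWriter("loader_input.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.length() < 52) {
                    continue; // skip short or blank lines
                }
                // Hypothetical fixed-width layout: id = cols 1-10, name = 11-40, balance = 41-52.
                String id      = line.substring(0, 10).trim();
                String name    = line.substring(10, 40).trim();
                String balance = line.substring(40, 52).trim();
                // Apply any business rules here, then write in the bulk loader's expected format.
                out.write(id + "," + name + "," + balance);
                out.newLine();
            }
        }
    }
}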


If you do not have a bulk loader available for your database, or you want a different approach, there are a few other things you could try to speed up the XAware processing; these will probably not give you the orders-of-magnitude improvement that bulk loading provides, though. First, make sure that the XAware logging level is set to "INFO" or lower (preferably "SEVERE" or off) so that minimal logging occurs - "FINEST" or "DEBUG" level logging can increase the run time by as much as a factor of 10. Second, you can use xa:sql_batch to perform the inserts in large batches, which may provide some improvement depending on database performance, network latency, etc. You will have to experiment with batch sizes, measure the performance improvement, and manage error handling.
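I don't have an xa:sql_batch snippet handy, but conceptually it boils down to JDBC batching. A rough sketch of the equivalent in plain JDBC follows (the connection URL, table, columns, and batch size are assumptions; the readRecords() helper is a stand-in for the fixed-length file reader):

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class BatchInsertSketch {
    public static void main(String[] args) throws SQLException {
        int batchSize = 1000; // tune experimentally, as noted above
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@dbhost:1521:orcl", "user", "pass");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO customers (cust_id, cust_name, balance) VALUES (?, ?, ?)")) {
            con.setAutoCommit(false); // commit per batch rather than per row
            int count = 0;
            for (String[] rec : readRecords()) {
                ps.setLong(1, Long.parseLong(rec[0]));
                ps.setString(2, rec[1]);
                ps.setBigDecimal(3, new BigDecimal(rec[2]));
                ps.addBatch();
                if (++count % batchSize == 0) {
                    ps.executeBatch(); // one round trip for the whole batch
                    con.commit();
                }
            }
            ps.executeBatch(); // flush any remaining rows
            con.commit();
        }
    }

    // Stand-in for the fixed-length file reader; each String[] is one parsed record.
    private static List<String[]> readRecords() {
        return Collections.emptyList();
    }
}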

I think you have already identified the primary approach to increasing performance: parallelization. You want to keep the database as busy as possible so that you are pumping in the data as fast as you can. Multiple readers is one approach, for example splitting up the input file and processing each section in parallel. A 25% improvement sounds reasonable for splitting the job into 5 pieces, once you account for the overhead of splitting the files, managing errors, and so forth.
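As a rough plain-Java illustration of the multiple-readers idea (the chunk file names and worker count are assumptions; each worker would run the same batched-insert logic described above, on its own connection):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelLoadSketch {
    public static void main(String[] args) throws InterruptedException {
        // Assume the big input file has already been split into 5 chunks.
        List<String> chunks = Arrays.asList(
                "chunk1.txt", "chunk2.txt", "chunk3.txt", "chunk4.txt", "chunk5.txt");
        ExecutorService pool = Executors.newFixedThreadPool(chunks.size());
        for (final String chunk : chunks) {
            pool.submit(new Runnable() {
                public void run() {
                    // Each worker reads its own chunk and performs its own batched inserts
                    // over a separate database connection.
                    loadChunk(chunk);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(2, TimeUnit.HOURS);
    }

    private static void loadChunk(String fileName) {
        // Placeholder: open fileName, parse the fixed-length records, batch-insert them.
    }
}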

Another parallel approach that we have used in the past is to take advantage of the J2EE architecture and use multiple processing components that can be dynamically scaled, such as MDBs (Message Driven Beans). You create queues for work to be done (for example, a JMS queue where each message is a record to be inserted) and a configurable number of MDBs that pull a task off the queue and process it. While the file reader is loading the queue(s), the MDBs start inserting the records into the database. Again, you would have to experiment in your environment to determine the optimal number of MDBs (we ran 10-50 MDBs on one project with an 8-CPU server; beyond that, the overhead of additional MDBs exceeded any additional performance gain). The idea is to load the database as much as possible, so that the performance of the database server becomes the limiting factor. (I think there is an example of an MDB in the XAware examples or the default install.)
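I can't point to the exact example in the install, but the shape of such an MDB is roughly the following (the queue name and the exact activation-config property names vary by container, and the record handling here is just a placeholder):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "jms/recordInsertQueue")
})
public class RecordInsertMDB implements MessageListener {

    public void onMessage(Message message) {
        try {
            // Each message carries one record (or a small batch) read from the input file.
            String record = ((TextMessage) message).getText();
            insertRecord(record);
        } catch (Exception e) {
            // In a real system, route failures to an error queue for re-processing.
            throw new RuntimeException(e);
        }
    }

    private void insertRecord(String record) {
        // Placeholder: parse the fixed-length record and insert it (via JDBC or an XAware service).
    }
}

The container controls how many MDB instances run concurrently, which is what makes the number of workers configurable.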

You could also look at expanding your environment as needed, adding servers/CPUs for the XAware/app server, for the database server, etc. XAware can be scaled out across multiple servers; you would then need a mechanism for directing (load balancing) requests to those servers, such as an ESB, an app server load balancer, or another load balancing utility.

Both of these latter approaches require a larger effort to design, test, tune, and manage (errors and re-processing), but they can provide a significant performance improvement over a single, sequential processing task.

Anyone else have some thoughts or ideas on other approaches?
 
Topics / Author / Date
Processing large Fixed Length Column format files - dkokkeel - 2009/03/10 16:36
  Re: Processing large Fixed Length Column format files - prichards - 2009/03/11 15:40
    Re: Processing large Fixed Length Column format files - dkokkeel - 2009/03/18 16:13
      Re: Processing large Fixed Length Column format files - prichards - 2009/03/18 16:53