One of the world's largest operators of aircraft collects data from sensors on every plane and helicopter in its fleet. Fixed-wing aircraft report data 10 times a second; helicopters report 40 times a second. While an aircraft is operating, the data are stored in flash memory, which is then copied to a computer at the aircraft's base of operations. The operator would like to use the data to coach pilots and ground crews, to identify factors driving maintenance and repair, to provide early warning of potential problems, and to supply forensic information in case of accidents.
Some of the aircraft operate from remote bases with no database management expertise, little IT infrastructure, and sometimes no reliable broadband connection. The operator would like these bases to be able to feed their own data into tools like Microsoft Flight Simulator™ on-site for immediate coaching, and to have archival access to full flight databases for each type of aircraft they fly.
CHALLENGES
This fleet includes over 2,000 aircraft in many classes and configurations. Each aircraft carries thousands of sensors, each generating 36,000 or 144,000 values per hour. The resulting table of flight data often contains tens of billions of rows. Even within a single class, sensors are not always equivalent, so some tables include more than 10,000 columns. Traditional relational databases are overwhelmed by the volume, rate, and disparity of the data, forcing the data to be split into common and extended tables; the latter must be converted into entity-attribute-value (EAV) triplets, making this very large database nearly unmanageable.
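The EAV conversion mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not the operator's actual schema; the field names and flight identifier are hypothetical.

```python
# Hypothetical sketch of entity-attribute-value (EAV) decomposition:
# one wide "extended" sensor record becomes one triplet per populated
# sensor. Field names and the flight ID are illustrative only.

def to_eav(flight_id, record):
    """Turn one wide sensor record (dict of column -> value) into
    (entity, attribute, value) triplets, skipping absent sensors."""
    return [(flight_id, column, value)
            for column, value in record.items()
            if value is not None]

record = {"engine_1_egt": 712.5, "rotor_rpm": None, "oat_c": -12.0}
triplets = to_eav("FLT-0001", record)
# -> [("FLT-0001", "engine_1_egt", 712.5), ("FLT-0001", "oat_c", -12.0)]
```

The cost is clear from the shape of the output: a single 10,000-column record can explode into up to 10,000 triplet rows, which is why the extended tables become so hard to manage.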
In addition to these typical problems of managing very large archives (billions of records) with traditional tools at reasonable cost, the operator had significant difficulty transforming and loading raw flight data. Standard Extract, Transform, Load (ETL) tools are not well suited to processing this volume of data on commodity hardware; in particular, raw data arrive in column order and are not always sorted, so they must be transposed into row order, sorted, converted, and manipulated into other custom formats. The previous solution processed data at a rate of only a few flights per hour, far short of mission goals: aircraft generated more data than the system could ingest, so analysts were forced to choose which data sets to process.
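The transpose-and-sort step described above can be sketched in a few lines. This is an illustrative simplification, assuming each raw column is a list of (timestamp, value) pairs; the actual raw format and sensor names are not specified in the source.

```python
# Minimal sketch of the ETL transpose step: raw data arrive as per-sensor
# columns, possibly out of time order, and must be pivoted into
# time-ordered rows. Structure and names here are assumptions.

def columns_to_rows(columns):
    """columns: dict mapping sensor name -> list of (timestamp, value).
    Returns rows sorted by timestamp: [(timestamp, {sensor: value})]."""
    rows = {}
    for sensor, series in columns.items():
        for ts, value in series:
            # Group every sensor reading that shares a timestamp into one row.
            rows.setdefault(ts, {})[sensor] = value
    return sorted(rows.items())

raw = {
    "altitude_ft": [(2, 1500), (1, 1480)],   # arrives out of order
    "airspeed_kt": [(1, 120), (2, 124)],
}
rows = columns_to_rows(raw)
# rows[0] -> (1, {"altitude_ft": 1480, "airspeed_kt": 120})
```

At fleet scale this naive in-memory pivot is exactly what breaks down, which is why the operator needed efficient custom code for the performance-intensive core of the pipeline.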
SOLUTIONS
Xpace first helped redesign the existing ETL process, providing highly efficient custom code for core, performance-intensive functions and reducing the time required to convert a flight data set from minutes to seconds. Our team worked closely with the operator's analysis and design team to define and implement new capabilities beyond the reach of standard tools. Using these tools, the operator's IT contractor is building a custom ETL system that produces CSV files of common and extended data for ingestion into legacy Oracle databases, .NET DataTables for analysis tools, and Xpace archives.
Besides much faster build and retrieval, Xpace dramatically reduces storage: an Xpace archive is only a few percent the size of the equivalent Oracle database, and in fact is less than half the size of the zipped raw data. This allows a year's data to be managed on a high-end notebook or stored on an external hard drive, instead of a server farm, giving analysts fast, convenient access to flight data.