A company compiles market research data in an industry with roughly one million outlets, each one of which may dispense any of thousands of products. The data include information on the outlets themselves (which may have several physical locations), how much product is sold by each outlet every month (products with similar uses are grouped together), and marketing activity by the various producers in the industry.
It is necessary to keep available at least 24 months of data, and users would like to be able to search aggregated ( e.g., quarterly, YTD) and other calculated values (e.g. a product's share of a certain group). Altogether, the raw data are tens of gigabytes.
They would also like to be able to make fairly large subsets of the data (generally all of that for one producer) available to sales forces offline, probably on DVD. The data are mostly numeric; on the wish list is the ability to search based on relationships among fields, as well as such staples as range searching.
Another wish is to keep data on how products are paid for (cash, credit cards, etc.), but that hasn't seriously been explored yet.
Once searches have been done, the resulting sets must be sortable (for visualization tools and report generation) based on any number of criteria – including criteria derived at runtime, for example, relations among several fields.</p> <p>The customer has been using several Oracle tables, joined on a unique identifier assigned to each outlet.
CHALLENGES
1. As stored in Oracle, the database is hundreds of gigabytes. Even producer subsets are tens of gigabytes – unwieldy for distribution to individual salespeople.
2. Performance for complex searches – anything beyond demographic information on the outlets, and simple sales searches – is extremely slow. Oracle does not allow the tens, or hundreds, of thousands of indexes required to specify precisely what is being searched, so search results must be filtered. Complex searches often cannot be done via indexes at all, and have to be batched and run weekly, in a complete database scan, which takes two or three days.
3. Once searches are done, reassembling the tables into complete records can be very slow.
4. Sorting can take hours, and sometimes has to be batched.
SOLUTIONS
As stored by Xpace (with custom compression implemented through our flexible framework), the database (including indexes) is tens, not hundreds, of gigabytes, and significant subsets can be distributed by DVD to the sales force.
1. With the database fully indexed, most searches take less than a second, and even very complex searches take just a few seconds, instead of a few days, allowing sales managers to explore different scenarios very quickly and easily
2. Since Xpace stores (and compresses) data together, retrieving records is very fast.
3. Index-based sorting is also very fast; with everything indexed, all sorting is index-based.
Since the customer adopted the new database, they have added payer data. They are also using the new database's ability to invert indexes (generally in less than a second) to obtain longitudinal slices of data, which allows reports to be generated very rapidly. The easy accessibility of indexes (through XML or an Xpace’s API) allows complete customizability of multi-index searching.