
This demo comprises 987 million records from the New York Stock Exchange's Trades and Quotes (TAQ) database. Each record is a stock market quote with ten fields: ticker symbol, date, exchange, time, bid price, bid quantity, offer price, offer quantity, mode, and MMID. Most of the data are numeric, but ticker symbol, exchange, and MMID are text.
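To make the record layout concrete, here is a minimal Python sketch of one quote. It is only an illustration: the field names, the Python types, and the sample values are assumptions made for this example, not the layout used by the NYSE or by our software.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass(frozen=True)
class Quote:
    """One quote record with the ten fields listed above.

    Names and types are hypothetical; only the split between
    text fields and numeric fields comes from the description.
    """
    ticker: str          # text
    quote_date: date     # numeric in the source data
    exchange: str        # text
    quote_time: time     # numeric in the source data
    bid_price: float
    bid_quantity: int
    offer_price: float
    offer_quantity: int
    mode: int
    mmid: str            # text


# Invented sample values, for illustration only (not taken from the data).
example = Quote(
    ticker="XYZ",
    quote_date=date(2000, 2, 1),
    exchange="N",
    quote_time=time(9, 30, 0),
    bid_price=100.25,
    bid_quantity=5,
    offer_price=100.50,
    offer_quantity=3,
    mode=12,
    mmid="",
)

print(example)
```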
We received the data on more than 100 CDs, with no indexes; the Quotes portion of the database was compressed by the NYSE to approximately 36 gigabytes. After we copied the CDs to a hard drive, it took about a week of computing time to extract the Quotes data.
We have compressed the data to just over 4.5 gigabytes, small enough to fit on a single DVD. We've indexed all ten fields and added three more indexes: month, day of month, and hour. The indexes total less than 4.7 gigabytes, so they also fit on a single DVD. You can expect to find around ten million records a second on a typical notebook computer if you copy the indexes to the hard drive; this will improve as we continue to enhance the software.
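For a sense of scale, the figures above can be turned into a quick back-of-the-envelope calculation. The sketch below assumes 1 gigabyte = 10^9 bytes and treats the rounded sizes quoted here ("just over 4.5", "less than 4.7", "approximately 36") as exact, so the results are rough estimates only. The last figure further assumes that the quoted rate applied to a pass over every record, which is our simplification for illustration.

```python
# Figures quoted in the text; sizes assume 1 GB = 10**9 bytes (an assumption).
RECORDS = 987_000_000          # quote records in the demo
DATA_BYTES = 4.5e9             # our compressed data ("just over 4.5 GB")
INDEX_BYTES = 4.7e9            # all thirteen indexes ("less than 4.7 GB")
NYSE_BYTES = 36e9              # Quotes data as compressed by the NYSE
RECORDS_PER_SECOND = 10_000_000  # approximate search rate on a notebook

# Average storage per record for the compressed data and for the indexes.
data_bytes_per_record = DATA_BYTES / RECORDS
index_bytes_per_record = INDEX_BYTES / RECORDS

# Size reduction relative to the NYSE's own compressed form.
reduction_factor = NYSE_BYTES / DATA_BYTES

# Time to cover every record at the quoted rate (hypothetical full pass).
full_pass_seconds = RECORDS / RECORDS_PER_SECOND

print(f"data:  {data_bytes_per_record:.2f} bytes per record")
print(f"index: {index_bytes_per_record:.2f} bytes per record (upper bound)")
print(f"reduction vs. NYSE compressed size: {reduction_factor:.1f}x")
print(f"full pass at quoted rate: {full_pass_seconds:.1f} seconds")
```

Under these assumptions, the compressed data averages a little under 5 bytes per ten-field record, about one eighth of the size of the NYSE's compressed form.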
There are a few gaps in the data (notably, the NYSE didn't give us January data), and this is not an official NYSE release, but a demonstration of our compression and search technology. Check back for revised software (bug fixes and performance improvements), as well as additional demos.
Please email us at feedback@xpace.net with bug reports, comments, and suggestions, or to order the full demo on two DVDs.
Instructions