In computer science, a lookup table is a data structure, usually an array or associative array, often used to replace a runtime computation with a simpler array indexing operation. The speed gain can be significant, since retrieving a value from memory is often faster than performing an expensive computation. Lookup tables are also used extensively to validate input values by matching them against a list of valid (or invalid) items in an array and, in some languages, may include function pointers (or offsets to labels) to process the matching input.
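As a rough illustration of validation combined with function pointers, the following C sketch indexes a table by the input character. The handler names, the `op_table` array, and `apply_operator` are hypothetical and chosen only for this example; a character whose entry is `NULL` is treated as invalid input.

```c
#include <stddef.h>

/* Hypothetical handlers for a tiny calculator; names are illustrative. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

typedef int (*binary_op)(int, int);

/* Table indexed by the input character. Entries that are not listed
   are NULL, which marks the character as invalid input. */
static const binary_op op_table[256] = {
    ['+'] = op_add,
    ['-'] = op_sub,
    ['*'] = op_mul,
};

/* Returns 1 and stores the result if `symbol` is a valid operator;
   otherwise returns 0 and leaves *result untouched. */
int apply_operator(char symbol, int a, int b, int *result)
{
    binary_op op = op_table[(unsigned char)symbol];
    if (op == NULL)
        return 0;          /* validation failed: not in the table */
    *result = op(a, b);    /* dispatch through the function pointer */
    return 1;
}
```

A single indexing operation both validates the character and selects the code that handles it, with no chain of comparisons.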
In the case of reducing run-time computations, a classic example is a trigonometric calculation such as computing the sine of a value. Performing this calculation repeatedly can substantially slow some applications. To avoid this, the application can take a few seconds when it first starts to precalculate the sine of a number of values, for example for each whole number of degrees. Later, when the program wants the sine of a value, it uses the lookup table to retrieve the sine of a nearby value from a memory address instead of calculating it using a mathematical formula. Lookup tables are also used by mathematics co-processors; an error in a lookup table was responsible for Intel's infamous floating-point divide bug.
Before the advent of computers, similar tables were used by people to speed up hand calculations. Particularly prevalent were tables of values for trigonometry, logarithms, and statistical density functions.
Functions of a single variable (such as sine and cosine) may be implemented by a simple array; functions involving two or more variables require multidimensional array indexing techniques. Hence, one might replace a function to calculate x^y for a limited range of x and y values with a two-dimensional array power[x][y]. Functions that have more than one result may be implemented with lookup tables that are arrays of structures.
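A minimal C sketch of the two-dimensional case might look as follows. The bounds `MAX_BASE` and `MAX_EXP` and the function names are assumptions made for this example; the ranges are chosen so that every entry fits in an `unsigned long`.

```c
#define MAX_BASE 10   /* assumed range: 0 <= x <= 10 */
#define MAX_EXP   9   /* assumed range: 0 <= y <= 9  */

static unsigned long power[MAX_BASE + 1][MAX_EXP + 1];

/* Fill the table once; each entry reuses the entry to its left. */
void init_power_table(void)
{
    for (int x = 0; x <= MAX_BASE; x++) {
        power[x][0] = 1;   /* x^0 = 1 (0^0 is taken as 1 here) */
        for (int y = 1; y <= MAX_EXP; y++)
            power[x][y] = power[x][y - 1] * (unsigned long)x;
    }
}

/* After initialization, exponentiation is a single indexing operation.
   The caller must keep x and y inside the assumed ranges. */
unsigned long lookup_power(int x, int y)
{
    return power[x][y];
}
```

After `init_power_table()` has run, `lookup_power(2, 8)` returns 256 without performing any multiplication.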
There are intermediate solutions that use tables in combination with a small amount of computation, often using interpolation. This allows better accuracy for values falling between two precomputed values. This requires slightly more time but can greatly enhance accuracy in applications that require it. Depending on the values being precomputed, this technique can also be used to shrink the lookup table size while retaining about the same accuracy.
In image processing, lookup tables are often called LUTs and give an output value for each of a range of index values. One common LUT, called the colormap or palette, is used to determine the colors and intensity values with which a particular image will be displayed. Windowing in computed tomography refers to a related concept.
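As a hedged illustration, the C sketch below builds a 256-entry LUT that maps each 8-bit intensity to its negative and applies it to a pixel buffer. The function names and the choice of a negative-image mapping are assumptions for this example; a colormap or a CT window would differ only in how the table is filled.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a 256-entry LUT that maps each 8-bit intensity to its negative. */
void build_negative_lut(uint8_t lut[256])
{
    for (int i = 0; i < 256; i++)
        lut[i] = (uint8_t)(255 - i);
}

/* Replace every pixel by its table entry: one lookup per pixel,
   regardless of how costly the mapping was to compute. */
void apply_lut(uint8_t *pixels, size_t count, const uint8_t lut[256])
{
    for (size_t i = 0; i < count; i++)
        pixels[i] = lut[pixels[i]];
}
```

Because the per-pixel work is only an array access, the same `apply_lut` routine can serve any intensity mapping once its table has been built.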
While often effective, lookup tables can result in a severe penalty if the computation they replace is relatively simple, not only because retrieving the result from memory may require more time, but also because the table may increase memory requirements and pollute the cache. If the table is large, each table access will almost certainly cause a cache miss. This is increasingly becoming an issue as processors outrace memory. A similar issue appears in rematerialization, a compiler optimization. In some environments, such as the Java programming language, table lookups can be even more expensive due to mandatory bounds-checking, which involves an additional comparison and branch for each lookup.
There are two fundamental limitations on when it is possible to construct a lookup table for a problem. One is the amount of memory that is available: one cannot construct a lookup table larger than the space available for the table, although it is possible to construct disk-based lookup tables at the expense of lookup time. The other restriction is the time required to compute the table values in the first instance. Although this usually needs to be done only once, if it takes a prohibitively long time, a lookup table may be an inappropriate solution. Tables can, however, be statically defined in many cases, avoiding any additional processing once compiled.
Examples
Computing sines
Most computers, which only perform basic arithmetic operations, cannot directly calculate the sine of a given value. Instead, they use the CORDIC algorithm or a complex formula such as the following Taylor series to compute the value of sine to a high degree of precision:
$$\sin(x) \approx x - \frac{x^3}{6} + \frac{x^5}{120} - \frac{x^7}{5040}$$ (for x close to 0)
However, this can be expensive to compute, especially on slow processors, and there are many applications, particularly in traditional computer graphics, that need to compute many thousands of sine values every second. A common solution is to compute the sine of many evenly distributed values in advance; then, to find the sine of x, we choose the sine of the value closest to x. This will be close to the correct value because sine is a continuous function with a bounded rate of change. For example:
    real array sine_table[-1000..1000]
    for x from -1000 to 1000
        sine_table[x] := sine(pi * x / 1000)

    function lookup_sine(x)
        return sine_table[round(1000 * x / pi)]
Figure: Linear interpolation on a portion of the sine function.
Unfortunately, the table requires quite a bit of space: if IEEE double-precision floating-point numbers are used, over 16,000 bytes would be required. We can use fewer samples, but then our precision will significantly worsen. One good solution is linear interpolation, which draws a line between the two points in the table on either side of the value and locates the answer on that line. This is still quick to compute, and much more accurate for smooth functions such as the sine function. Here is our example using linear interpolation:
    function lookup_sine(x)
        x1 := floor(x*1000/pi)
        y1 := sine_table[x1]
        y2 := sine_table[x1+1]
        return y1 + (y2-y1)*(x*1000/pi-x1)
When using interpolation, it is often beneficial to use non-uniform sampling, which means that where the function is close to straight, we use few sample points, while where it changes value quickly we use more sample points to keep the approximation close to the real curve. For more information, see interpolation.
Sine table example
// C 8-bit Sine Table
const unsigned char sinetable[256] = {
128,131,134,137,140,143,146,149,152,156,159,162,165,168,171,174,
176,179,182,185,188,191,193,196,199,201,204,206,209,211,213,216,
218,220,222,224,226,228,230,232,234,236,237,239,240,242,243,245,
246,247,248,249,250,251,252,252,253,254,254,255,255,255,255,255,
255,255,255,255,255,255,254,254,253,252,252,251,250,249,248,247,
246,245,243,242,240,239,237,236,234,232,230,228,226,224,222,220,
218,216,213,211,209,206,204,201,199,196,193,191,188,185,182,179,
176,174,171,168,165,162,159,156,152,149,146,143,140,137,134,131,
128,124,121,118,115,112,109,106,103,99, 96, 93, 90, 87, 84, 81,
79, 76, 73, 70, 67, 64, 62, 59, 56, 54, 51, 49, 46, 44, 42, 39,
37, 35, 33, 31, 29, 27, 25, 23, 21, 19, 18, 16, 15, 13, 12, 10,
9, 8, 7, 6, 5, 4, 3, 3, 2, 1, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 3, 4, 5, 6, 7, 8,
9, 10, 12, 13, 15, 16, 18, 19, 21, 23, 25, 27, 29, 31, 33, 35,
37, 39, 42, 44, 46, 49, 51, 54, 56, 59, 62, 64, 67, 70, 73, 76,
79, 81, 84, 87, 90, 93, 96, 99, 103,106,109,112,115,118,121,124
};
Counting bits
Another, more discrete problem that is expensive to solve on many computers is that of counting the number of bits which are set to 1 in a number, sometimes called the population function. For example, the number 37 is 100101 in binary, so it contains three set bits. A simple piece of C code designed to count the 1 bits in an int might look like this:
int count_ones(unsigned int x) {
    int result = 0;
    while (x != 0)
        result++, x = x & (x-1);
    return result;
}
Unfortunately, this simple algorithm can take potentially hundreds of cycles on a modern architecture, because it makes many branches (loops) and branching is slow. This can be ameliorated using loop unrolling and some other more clever tricks, but there is a simple and fast solution using table lookup: simply construct a table bits_set with 256 entries giving the number of one bits set in each possible byte value. We then use this table to find the number of ones in each byte of the integer and add up the results. With no branches, four memory accesses, and almost no arithmetic, this can be dramatically faster than the algorithm above (this code assumes that int is 32 bits wide):
int count_ones(unsigned int x) {
    return bits_set[x & 255] + bits_set[(x >> 8) & 255]
         + bits_set[(x >> 16) & 255] + bits_set[(x >> 24) & 255];
}
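The text does not show how the bits_set table is filled. One possible way, sketched here in C, derives each entry from the entry for half its index; the names `init_bits_set` and `count_ones_table` are assumptions for this example, and `unsigned int` is again assumed to be 32 bits wide.

```c
static unsigned char bits_set[256];

/* Fill bits_set[i] with the number of 1 bits in i. Each entry reuses the
   entry for i >> 1 and adds the lowest bit of i, so no inner loop is needed. */
void init_bits_set(void)
{
    bits_set[0] = 0;
    for (int i = 1; i < 256; i++)
        bits_set[i] = (unsigned char)(bits_set[i >> 1] + (i & 1));
}

/* Table-driven count, assuming unsigned int is 32 bits wide. */
int count_ones_table(unsigned int x)
{
    return bits_set[x & 255] + bits_set[(x >> 8) & 255]
         + bits_set[(x >> 16) & 255] + bits_set[(x >> 24) & 255];
}
```

The table could equally be written out as a static initializer, which would avoid the startup call to `init_bits_set()`.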
Note that even this table-based algorithm can be too slow on modern processors: code runs fastest when it stays within the cache, but a lookup table competes for cache space and can cause slower accesses to memory (in addition, the four lookups require computing addresses within the table). On a 64-bit platform, the lookup table cannot simply be made larger, as it would exhaust the processor caches; and if the table still counts bits in groups of 8, eight successive lookups are needed, which further reduces performance.
Caches
Storage caches (including disk caches for files, and processor caches for code or data) also work like a lookup table: the table is built with very fast memory instead of being stored on slower external memory, and it maintains two pieces of data for each value of a subrange of the bits composing an external memory (or disk) address (notably the lowest bits of any possible external address):
- one piece (the tag) contains the value of the remaining bits of the address; if these bits match with those from the memory address to read or write, then the other piece contains the cached value for this address.
- the other piece maintains the data associated to that address.
A single (fast) lookup is performed to read the tag in the lookup table at the index specified by the lowest bits of the desired external storage address, and to determine if the memory address is hit by the cache. When a hit is found, no access to external memory is needed (except for write operations, where the cached value may need to be updated asynchronously to the slower memory after some time, or if the position in the cache must be replaced to cache another address).
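The read path described above can be sketched in C as a small direct-mapped cache. Everything named here is hypothetical: `INDEX_BITS` fixes an assumed size of 16 lines, and `slow_read` merely stands in for the slower external memory. Write handling is omitted.

```c
#include <stdint.h>

#define INDEX_BITS 4                       /* assumed: 16 cache lines */
#define NUM_LINES  (1u << INDEX_BITS)

struct cache_line {
    int      valid;   /* 0 until the line has been filled */
    uint32_t tag;     /* remaining (upper) address bits */
    uint32_t data;    /* cached value for that address */
};

static struct cache_line cache[NUM_LINES];

/* Stand-in for the slow backing store (hypothetical): value = address * 3. */
static uint32_t slow_read(uint32_t address) { return address * 3u; }

/* Reads through the cache; sets *hit to 1 on a cache hit, 0 on a miss. */
uint32_t cached_read(uint32_t address, int *hit)
{
    uint32_t index = address & (NUM_LINES - 1);   /* lowest bits select the line */
    uint32_t tag   = address >> INDEX_BITS;       /* remaining bits form the tag */
    struct cache_line *line = &cache[index];

    if (line->valid && line->tag == tag) {
        *hit = 1;                                 /* tag matches: no slow access */
        return line->data;
    }
    *hit = 0;                                     /* miss: fetch and replace the line */
    line->valid = 1;
    line->tag   = tag;
    line->data  = slow_read(address);
    return line->data;
}
```

Two addresses that share the same low bits, such as 0x25 and 0x35 here, compete for the same line and evict each other.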
Hardware LUTs
In digital logic, an n-bit lookup table can be implemented with a multiplexer whose select lines are the inputs of the LUT and whose inputs are constants. An n-bit LUT can encode any n-input Boolean function by modeling such functions as truth tables. This is an efficient way of encoding Boolean logic functions, and LUTs with 4-6 bits of input are in fact the key component of modern FPGAs.
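A software model of a 4-input LUT, given here as an illustrative C sketch, stores the truth table in a 16-bit word and uses the inputs as the multiplexer select lines. The `lut4` type, `lut4_eval`, and the two example configurations are names invented for this example and do not correspond to any particular FPGA toolchain.

```c
#include <stdint.h>

/* A 4-input LUT: bit i of truth_table is the output for input pattern i. */
typedef struct { uint16_t truth_table; } lut4;

/* Evaluate by using the four inputs as the select lines of a 16:1 multiplexer.
   Only the lowest bit of each input is used. */
int lut4_eval(lut4 lut, unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned select = (a & 1u) | ((b & 1u) << 1) | ((c & 1u) << 2) | ((d & 1u) << 3);
    return (lut.truth_table >> select) & 1;
}

/* Example configurations: */
static const lut4 LUT_AND4 = { 0x8000 };  /* only pattern 1111 outputs 1 */
static const lut4 LUT_XOR4 = { 0x6996 };  /* parity of the four inputs */
```

Changing the 16-bit constant reprograms the function without changing the evaluation logic, which mirrors how the same hardware cell can implement any 4-input Boolean function.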