MICROSOFT SQL SERVER lets you quickly build powerful and reliable
database applications, but making those apps perform their best is tricky.
Luckily, database administrators and developers can use several simple
techniques to coax more speed out of a SQL Server database. Here are 10 such
performance-boosting tactics you can quickly and easily apply to your SQL Server
database applications.
Tip 1: Assign Data Type Numeric Columns
Database administrators and developers often
assign the CHAR (character) data type to columns containing only numeric
information. Character data columns can hold anything and are a good catch-all.
However, specifying CHAR for numeric columns can decrease performance with
filters and join conditions and can increase storage requirements. For example,
suppose you're developing an application that tracks equipment stored at several
sites. Tables 1 and 2 show the application's inventory_header and inventory_detail.
As you design your application's tables, you notice that the part_number
column always contains a number between 1 and 12,000. However, when you
create the tables, you define all these columns with a char(20) type and place
an index on these columns in each table.
A commonly used report in your application prints a listing of the
inventory-item counts at all locations. The following statements make this
report printout possible:
select b.location, a.part_number,
a.part_desc, b.part_count
from inventory_header a, inventory_detail b
where a.part_number = b.part_number
order by b.location, a.part_number
If you define the part_number columns in Tables 1 and 2 as
char(20), the join operation requires the engine to compare the values in each
table byte by byte, up to 20 times per row. Because you know the part_number
column is always numeric and between 1 and 12,000, you can redefine the part_number
column's data type in both tables as smallint. This data type can hold
values between -32767 and +32767.
This data type helps your application perform better in several ways.
First, because SQL Server stores smallint values in only 2 bytes of data, you
save a significant amount of CPU work during join operations or filter
conditions, especially if you are processing several rows. Instead of comparing
up to 20 bytes to see whether two rows are logical join candidates, the engine
can finish its work after comparing only two bytes. Thus, using smallint instead
of char(20) saves you 18 bytes per row per table for all indexes that use this
column.
Tip 2: Prevent Substring Searches
When search criteria in a SQL statement start at the leftmost position of a
column and move right, searching through the indexed character data is
inherently easier than when your search criteria begin in the middle of the
data--that is, in a substring. For example, suppose you have to locate two types
of information in a phone book: all last names starting with a capital L
and all entries with a lower-case l as the third letter in the last
name.
The first request is easy because the phone book is already indexed (last
name, first name). You simply turn to the L section of the book and read
through the names until you find the first entry with a last name starting with
M.
This type of processing is analogous to running the SQL statement
select * from names
where last_name >= 'L%''
and last_name < 'M%''
If an index is on the last_name column in the names table, the
optimizer uses the index to help process this query.
For both machines and humans, the second request is harder to process than
the first. With the phone book, you have to read the first three letters of
every entry from cover to cover and mark all entries that have l as the
third letter.
This type of processing is analogous to running a SQL statement that
searches for a substring within the last_name column.
select * from names
where substring(last_name,
3,1) = 'L''
Although an index is on last_name, the optimizer can't use it
because you're not beginning your search from the leftmost byte of the last_name
column. To satisfy your request, the optimizer has to operate sequentially
on what could be an enormous table, which definitely hurts performance.