Nobody likes hanging around for results while their SQL query sluggishly retrieves data - especially if you’ve already waited to convert Excel to MySQL or some other data conversion.
The two biggest factors affecting SQL query run-time are database design and SQL query optimization. While database design might be out of your hands, there are plenty of options for optimizing your SQL queries to decrease data retrieval time.
1. Avoid multiple joins in a single query
This means no outer joins, no cross or outer applies, and no other complex subqueries. Limiting the number of joins gives the database optimizer less choice for optimal join and order type, resulting in a faster query time.
2. Only retrieve the data you really need
Why ask your database to spend time grabbing data you don’t actually need? Small data, not big data is the key here: always ask your database for the smallest data possible and you’ll spend far less time waiting.
Generally, this means specifying specific columns (although some database software won’t allow you to do this) and using the
LIMIT clause to do the same for rows.
3. Avoid functions on the left-hand side of the operator
Functions are great for providing complex tasks. They can be used both in
SELECT and in
WHERE clauses. However,
WHERE is a top culprit in performance issues.
Look at the following example:
SELECT nickname FROM users WHERE DATEDIFF(MONTH, appointment_date, '2017-05-28') < 0;
Even if there is an index on the ‘appointment_date’ column in the ‘users’ table, the query still needs to perform a full table scan. This is because we use the DATEDIFF function on the column appointment_date. The output of the function is evaluated at run time, so the server has to visit all the rows in the table to retrieve the necessary data. To enhance performance, the following change can be made:
SELECT nickname FROM users WHERE appointment_date > '2017-04-30';
This time, we aren’t using any functions in the
WHERE clause, so the system can utilize an index to seek the data more efficiently.
4. Symbol operators - use them
Symbol operators such as
!= are very helpful for quicker queries. If a column is indexed, we can optimize a query with symbol operators.
Here’s an unoptimized query:
SELECT * FROM TABLE WHERE COLUMN > 16
For this query the DBMS will need to look for the value 16 THEN scan forward to value 16 and below. An optimized value would be:
SELECT * FROM TABLE WHERE COLUMN >= 15
This query tells the DBMS to jump straight to value 15 instead.
5. Wildcard operators - use them properly
Ah, SQL wildcard operators. If you want optimal query times, definitely don’t use these
% incorrectly. This is especially true if your tables are massive.
To efficiently use wildcard operators, you’ll need to do a postfix wildcard instead of pre or full wildcard. Something like this:
SELECT * FROM TABLE WHERE COLUMN LIKE '%hello%';
SELECT * FROM TABLE WHERE COLUMN LIKE 'hello%';
SELECT * FROM TABLE WHERE COLUMN LIKE '%hello';
Remember that the column needs to be indexed in order to carry out a postfix wildcard query.
Oh, and by the way… a full wildcard on a few million records table can kill a database.
It’s best to avoid the
NOT operator if you want fast data retrieval. The DBMS searches for an exact match (aka a positive operator) much faster.
= operators return results more efficiently than negative operators such as
NOT EXIST, or
A negative operator causes the DBMS to find every row to identify that they all do not belong or exist within the table. Whereas a positive operator stops once the result has been found, which is always going to be much faster.
7. Don’t blindly reuse code
Forever on the lookout for time-saving hacks and efficiency enhancements, it’s not uncommon for us all to reuse bits of code. Yet this isn’t always the most optimal way to query a database.
Just because you know a line of SQL returns some of the data you want it doesn’t mean it’s the best way to get the data you want. Often, reusing SQL results in bits of data you don’t need.
The best bet if you really don’t want to write your own SQL is to trim existing queries down so you only retrieve the data you really need.
This isn’t an extensive list of optimizations but it should help reduce data retrieval times.