SQL - Ordering by a Number in a String

Problem

You want order your result set based on a number within a string. Consider the following view:

 create view V as
select e.ename ||' '||
cast(e.empno as char(4))||' '||
d.dname as data
from emp e, dept d
where e.deptno=d.deptno

This view returns the following data:

 DATA
----------------------------
CLARK 7782 ACCOUNTING
KING 7839 ACCOUNTING
MILLER 7934 ACCOUNTING
SMITH 7369 RESEARCH
JONES 7566 RESEARCH
SCOTT 7788 RESEARCH
ADAMS 7876 RESEARCH
FORD 7902 RESEARCH
ALLEN 7499 SALES
WARD 7521 SALES
MARTIN 7654 SALES
BLAKE 7698 SALES
TURNER 7844 SALES
JAMES 7900 SALES

You want to order the results based on the employee number, which falls between the employee name and respective department:

 DATA
---------------------------
SMITH 7369 RESEARCH
ALLEN 7499 SALES
WARD 7521 SALES
JONES 7566 RESEARCH
MARTIN 7654 SALES
BLAKE 7698 SALES
CLARK 7782 ACCOUNTING
SCOTT 7788 RESEARCH
KING 7839 ACCOUNTING
TURNER 7844 SALES
ADAMS 7876 RESEARCH
JAMES 7900 SALES
FORD 7902 RESEARCH
MILLER 7934 ACCOUNTING

Solution

Each solution uses functions and syntax specific to its DBMS, but the method (making use of the built-in functions REPLACE and TRANSLATE) is the same for each. The idea is to use REPLACE and TRANSLATE to remove non-digits from the strings, leaving only the numeric values upon which to sort.

DB2

Use the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string:

 1 select data
2 from V
3 order by
4 cast(
5 replace(
6 translate(data,repeat('#',length(data)),
7 replace(
8 translate(data,'##########','0123456789'),
9 '#','')),'#','') as integer)

Oracle

Use the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string:

 1 select data
2 from V
3 order by
4 to_number(
5 replace(
6 translate(data,
7 replace(
8 translate(data,'0123456789','##########'),
9 '#'),rpad('#',20,'#')),'#'))

PostgreSQL

Use the built-in functions REPLACE and TRANSLATE to order by numeric characters in a string:

 1 select data
2 from V
3 order by
4 cast(
5 replace(
6 translate(data,
7 replace(
8 translate(data,'0123456789','##########'),
9 '#',''),rpad('#',20,'#')),'#','') as integer)

MySQL and SQL Server

As of the time of this writing, neither vendor supplies the TRANSLATE function.

Discussion

The purpose of view V is only to supply rows on which to demonstrate this recipe's solution. The view simply concatenates several columns from the EMP table. The solution shows how to take such concatenated text as input and sort it by the employee number embedded within.

The ORDER BY clause in each solution may look a bit intimidating but performs quite well and is pretty straightforward once you examine it piece by piece. To order by the numbers in the string, it's easiest to remove any characters that are not numbers. Once the non-numeric characters are removed all that is left to do is cast the string of numerals into a number, then sort as you see fit. Before examining each function call it is important to understand the order in which each function is called. Starting with the innermost call, TRANSLATE (line 8 from each of the original solutions), you see that:

  1. TRANSLATE (line 8) is called and the results are returned to

  2. REPLACE (line 7) and those results are returned to

  3. TRANSLATE (line 6) and those results are returned to

  4. REPLACE (line 5) and those results are returned and finally

  5. cast into a number

The first step is to convert the numbers into characters that do not exist in the rest of the string. For this example, I chose "#" and used TRANSLATE to convert all non-numeric characters into occurrences of "#". For example, the following query shows the original data on the left and the results from the first translation:

 
select data,
translate(data,'0123456789','##########') as tmp
from V


DATA TMP
------------------------------ -----------------------
CLARK 7782 ACCOUNTING CLARK #### ACCOUNTING
KING 7839 ACCOUNTING KING #### ACCOUNTING
MILLER 7934 ACCOUNTING MILLER #### ACCOUNTING
SMITH 7369 RESEARCH SMITH #### RESEARCH
JONES 7566 RESEARCH JONES #### RESEARCH
SCOTT 7788 RESEARCH SCOTT #### RESEARCH
ADAMS 7876 RESEARCH ADAMS #### RESEARCH
FORD 7902 RESEARCH FORD #### RESEARCH
ALLEN 7499 SALES ALLEN #### SALES
WARD 7521 SALES WARD #### SALES
MARTIN 7654 SALES MARTIN #### SALES
BLAKE 7698 SALES BLAKE #### SALES
TURNER 7844 SALES TURNER #### SALES
JAMES 7900 SALES JAMES #### SALES

TRANSLATE finds the numerals in each string and converts each one to to the "#" character. The modified strings are then returned to REPLACE (line 11), which removes all occurrences of "#":

 
select data,
replace(
translate(data,'0123456789','##########'),'#') as tmp
from V


DATA TMP
------------------------------ -------------------
CLARK 7782 ACCOUNTING CLARK ACCOUNTING
KING 7839 ACCOUNTING KING ACCOUNTING
MILLER 7934 ACCOUNTING MILLER ACCOUNTING
SMITH 7369 RESEARCH SMITH RESEARCH
JONES 7566 RESEARCH JONES RESEARCH
SCOTT 7788 RESEARCH SCOTT RESEARCH
ADAMS 7876 RESEARCH ADAMS RESEARCH
FORD 7902 RESEARCH FORD RESEARCH
ALLEN 7499 SALES ALLEN SALES
WARD 7521 SALES WARD SALES
MARTIN 7654 SALES MARTIN SALES
BLAKE 7698 SALES BLAKE SALES
TURNER 7844 SALES TURNER SALES
JAMES 7900 SALES JAMES SALES

The strings are then returned to TRANSLATE once again, but this time it's the second (outermost) TRANSLATE in the solution. TRANSLATE searches the original string for any characters that match the characters in TMP. If any are found, they too are converted to "#"s. This conversion allows all non-numeric characters to be treated as a single character (because they are all transformed to the same character):

 
select data, translate(data,
replace(
translate(data,'0123456789','##########'),
'#'),
rpad('#',length(data),'#')) as tmp
from V


DATA TMP
------------------------------ ---------------------------
CLARK 7782 ACCOUNTING ########7782###########
KING 7839 ACCOUNTING ########7839###########
MILLER 7934 ACCOUNTING ########7934###########
SMITH 7369 RESEARCH ########7369#########
JONES 7566 RESEARCH ########7566#########
SCOTT 7788 RESEARCH ########7788#########
ADAMS 7876 RESEARCH ########7876#########
FORD 7902 RESEARCH ########7902#########
ALLEN 7499 SALES ########7499######
WARD 7521 SALES ########7521######
MARTIN 7654 SALES ########7654######
BLAKE 7698 SALES ########7698######
TURNER 7844 SALES ########7844######
JAMES 7900 SALES ########7900######

The next step is to remove all "#" characters through a call to REPLACE (line 8), leaving you with only numbers:

 
select data, replace(
translate(data,
replace(
translate(data,'0123456789','##########'),
'#'),
rpad('#',length(data),'#')),'#') as tmp
from V


DATA TMP
------------------------------ -----------
CLARK 7782 ACCOUNTING 7782
KING 7839 ACCOUNTING 7839
MILLER 7934 ACCOUNTING 7934
SMITH 7369 RESEARCH 7369
JONES 7566 RESEARCH 7566
SCOTT 7788 RESEARCH 7788
ADAMS 7876 RESEARCH 7876
FORD 7902 RESEARCH 7902
ALLEN 7499 SALES 7499
WARD 7521 SALES 7521
MARTIN 7654 SALES 7654
BLAKE 7698 SALES 7698
TURNER 7844 SALES 7844
JAMES 7900 SALES 7900

Finally, cast TMP to a number (line 4) using the appropriate DBMS function (often CAST) to accomplish this:

 
select data, to_number(
replace(
translate(data,
replace(
translate(data,'0123456789','##########'),
'#'),
rpad('#',length(data),'#')),'#')) as tmp
from V


DATA TMP
------------------------------ ----------
CLARK 7782 ACCOUNTING 7782
KING 7839 ACCOUNTING 7839
MILLER 7934 ACCOUNTING 7934
SMITH 7369 RESEARCH 7369
JONES 7566 RESEARCH 7566
SCOTT 7788 RESEARCH 7788
ADAMS 7876 RESEARCH 7876
FORD 7902 RESEARCH 7902
ALLEN 7499 SALES 7499
WARD 7521 SALES 7521
MARTIN 7654 SALES 7654
BLAKE 7698 SALES 7698
TURNER 7844 SALES 7844
JAMES 7900 SALES 7900

When developing queries like this, it's helpful to work with your expressions in the SELECT list. That way, you can easily view the intermediate results as you work toward a final solution. However, because the point of this recipe is to order the results, ultimately you should place all the function calls into the ORDER BY clause:

 
select data
from V
order by
to_number(
replace(
translate( data,
replace(
translate( data,'0123456789','##########'),
'#'),rpad('#',length(data),'#')),'#'))


DATA
---------------------------
SMITH 7369 RESEARCH
ALLEN 7499 SALES
WARD 7521 SALES
JONES 7566 RESEARCH
MARTIN 7654 SALES
BLAKE 7698 SALES
CLARK 7782 ACCOUNTING
SCOTT 7788 RESEARCH
KING 7839 ACCOUNTING
TURNER 7844 SALES
ADAMS 7876 RESEARCH
JAMES 7900 SALES
FORD 7902 RESEARCH
MILLER 7934 ACCOUNTING

As a final note, the data in the view is comprised of three fields, only one being numeric. Keep in mind that if there had been multiple numeric fields, they would have all been concatenated into one number before the rows were sorted.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

0 Comments: