Identifying Strings That Can Be Treated as Numbers : SQL

Problem

You have a column that is defined to hold character data. Unfortunately, the rows contain mixed numeric and character data. Consider view V:

 create view V as
select replace(mixed,' ','') as mixed
from (
select substr(ename,1,2)||
cast(deptno as char(4))||
substr(ename,3,2) as mixed
from emp
where deptno = 10
union all
select cast(empno as char(4)) as mixed
from emp
where deptno = 20
union all
select ename as mixed
from emp
where deptno = 30
) x
select * from v

MIXED
--------------
CL10AR
KI10NG
MI10LL
7369
7566
7788
7876
7902
ALLEN
WARD
MARTIN
BLAKE
TURNER
JAMES

You want to return rows that are numbers only, or that contain at least one number. If the numbers are mixed with character data, you want to remove the characters and return only the numbers. For the sample data above you want the following result set:

    MIXED
--------
10
10
10
7369
7566
7788
7876
7902

Solution

The functions REPLACE and TRANSLATE are extremely useful for manipulating strings and individual characters. The key is to convert all numbers to a single character, which then makes it easy to isolate and identify any number by referring to a single character.

DB2

Use functions TRANSLATE, REPLACE, and POSSTR to isolate the numeric characters in each row. The calls to CAST are necessary in view V; otherwise, the view will fail to be created due to type conversion errors. You'll need the function REPLACE to remove extraneous white space due to casting to the fixed length CHAR:

  1 select mixed old,
2 cast(
3 case
4 when
5 replace(
6 translate(mixed,'9999999999','0123456789'),'9','') = ''
7 then
8 mixed
9 else replace(
10 translate(mixed,
11 repeat('#',length(mixed)),
12 replace(
13 translate(mixed,'9999999999','0123456789'),'9','')),
14 '#','')
15 end as integer ) mixed
16 from V
17 where posstr(translate(mixed,'9999999999','0123456789'),'9') > 0

MySQL

The syntax for MySQL is slightly different and will define view V as:

 create view V as
select concat(
substr(ename,1,2),
replace(cast(deptno as char(4)),' ',''),
substr(ename,3,2)
) as mixed
from emp
where deptno = 10
union all
select replace(cast(empno as char(4)), ' ', '')
from emp where deptno = 20
union all
select ename from emp where deptno = 30

Because MySQL does not support the TRANSLATE function, you must walk each row and evaluate it on a character-by-character basis.

  1 select cast(group_concat(c order by pos separator '') as unsigned)
2 as MIXED1
3 from (
4 select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c
5 from V,
6 ( select id pos from t10 ) iter
7 where iter.pos <= length(v.mixed)
8 and ascii(substr(v.mixed,iter.pos,1)) between 48 and 57
9 ) y
10 group by mixed
11 order by 1
Oracle

Use functions TRANSLATE, REPLACE, and INSTR to isolate the numeric characters in each row. The calls to CAST are not necessary in view V. Use the function REPLACE to remove extraneous white space due to casting to the fixed length CHAR. If you decide you would like to keep the explicit type conversion calls in the view definition, it is suggested you cast to VARCHAR2:

  1 select to_number (
2 case
3 when
4 replace(translate(mixed,'0123456789','9999999999'),'9')
5 is not null
6 then
7 replace(
8 translate(mixed,
9 replace(
10 translate(mixed,'0123456789','9999999999'),'9'),
11 rpad('#',length(mixed),'#')),'#')
12 else
13 mixed
14 end
15 ) mixed
16 from V
17 where instr(translate(mixed,'0123456789','9999999999'),'9') > 0

PostgreSQL

Use functions TRANSLATE, REPLACE, and STRPOS to isolate the numeric characters in each row. The calls to CAST are not necessary in view V. Use the function REPLACE ito remove extraneous white space due to casting to the fixed length CHAR. If you decide you would like to keep the explicit type conversion calls in the view definition, it is suggested you cast to VARCHAR:

  1 select cast(
2 case
3 when
4 replace(translate(mixed,'0123456789','9999999999'),'9','')
5 is not null
6 then
7 replace(
8 translate(mixed,
9 replace(
10 translate(mixed,'0123456789','9999999999'),'9',''),
11 rpad('#',length(mixed),'#')),'#','')
12 else
13 mixed
14 end as integer ) as mixed
15 from V
16 where strpos(translate(mixed,'0123456789','9999999999'),'9') > 0

SQL Server

The built-in function ISNUMERIC along with a wildcard search allows you to easily identify strings that contains numbers, but getting numeric characters out of a string is not particularly efficient because the TRANSLATE function is not supported.

Discussion

The TRANSLATE function is very useful here as it allows you to easily isolate and identify numbers and characters. The trick is to convert all numbers to a single character; this way, rather than searching for different numbers you only search for one character.

DB2, Oracle, and PostgreSQL

The syntax differs slightly among these DBMSs, but the technique is the same. I'll use the solution for PostgreSQL for the discussion.

The real work is done by functions TRANSLATE and REPLACE. To get the final result set requires several function calls, each listed below in one query:

 
select mixed as orig,
translate(mixed,'0123456789','9999999999') as mixed1,
replace(translate(mixed,'0123456789','9999999999'),'9','') as mixed2,
translate(mixed,
replace(
translate(mixed,'0123456789','9999999999'),'9',''),
rpad('#',length(mixed),'#')) as mixed3,
replace(
translate(mixed,
replace(
translate(mixed,'0123456789','9999999999'),'9',''),
rpad('#',length(mixed),'#')),'#','') as mixed4
from V
where strpos(translate(mixed,'0123456789','9999999999'),'9') > 0


ORIG | MIXED1 | MIXED2 | MIXED3 | MIXED4 | MIXED5
--------+--------+--------+--------+--------+--------
CL10AR | CL99AR | CLAR | ##10## | 10 | 10
KI10NG | KI99NG | KING | ##10## | 10 | 10
MI10LL | MI99LL | MILL | ##10## | 10 | 10
7369 | 9999 | | 7369 | 7369 | 7369
7566 | 9999 | | 7566 | 7566 | 7566
7788 | 9999 | | 7788 | 7788 | 7788
7876 | 9999 | | 7876 | 7876 | 7876
7902 | 9999 | | 7902 | 7902 | 7902

First, notice that any rows without at least one number are removed. How this is accomplished will become clear as you examine each of the columns in the above result set. The rows that are kept are the values in the ORIG column and are the rows that will eventually make up the result set. The first step to extracting the numbers is to use the function TRANSLATE to convert any number to a 9 (you can use any digit; 9 is arbitrary), this is represented by the values in MIXED1. Now that all numbers are 9's, they can be treating as a single unit. The next step is to remove all of the numbers by using the function REPLACE. Because all digits are now 9, REPLACE simply looks for any 9's and removes them. This is represented by the values in MIXED2. The next step, MIXED3, uses values that are returned by MIXED2. These values are then compared to the values in ORIG. If any characters from MIXED2 are found in ORIG, they are converted to the # character by TRANSLATE. The result set from MIXED3 shows that the letters, not the numbers, have now been singled out and converted to a single character. Now that all non-numeric characters are represented by #'s, they can be treated as a single unit. The next step, MIXED4, uses REPLACE to find and remove any # characters in each row; what's left are numbers only. The final step is to cast the numeric characters as numbers. Now that you've gone through the steps, you can see how the WHERE clause works. The results from MIXED1 are passed to STRPOS, and if a 9 is found (the position in the string where the first 9 is located) the result must be greater than 0. For rows that return a value greater than zero, it means there's at least one number in that row and it should be kept.

MySQL

The first step is to walk each string and evaluate each character and determine whether or not it's a number:

 
select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c
from V,
( select id pos from t10 ) iter
where iter.pos <= length(v.mixed)
order by 1,2


+--------+------+------+
| mixed | pos | c |
+--------+------+------+
| 7369 | 1 | 7 |
| 7369 | 2 | 3 |
| 7369 | 3 | 6 |
| 7369 | 4 | 9 |

| ALLEN | 1 | A |
| ALLEN | 2 | L |
| ALLEN | 3 | L |
| ALLEN | 4 | E |
| ALLEN | 5 | N |

| CL10AR | 1 | C |
| CL10AR | 2 | L |
| CL10AR | 3 | 1 |
| CL10AR | 4 | 0 |
| CL10AR | 5 | A |
| CL10AR | 6 | R |
+--------+------+------+

Now that each character in each string can be evaluated individually, the next step is to keep only the rows that have a number in the C column:

 
select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c
from V,
( select id pos from t10 ) iter
where iter.pos <= length(v.mixed)
and ascii(substr(v.mixed,iter.pos,1)) between 48 and 57
order by 1,2


+--------+------+------+
| mixed | pos | c |
+--------+------+------+
| 7369 | 1 | 7 |
| 7369 | 2 | 3 |
| 7369 | 3 | 6 |
| 7369 | 4 | 9 |

| CL10AR | 3 | 1 |
| CL10AR | 4 | 0 |

+--------+------+------+

At this point, all the rows in column C are numbers. The next step is to use GROUP_CONCAT to concatenate the numbers to form their respective whole number in MIXED. The final result is then cast as a number:

 
select cast(group_concat(c order by pos separator '') as unsigned)
as MIXED1
from (
select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c
from V,
( select id pos from t10 ) iter
where iter.pos <= length(v.mixed)
and ascii(substr(x.mixed,iter.pos,1)) between 48 and 57
) y
group by mixed
order by 1


+--------+
| MIXED1 |
+--------+
| 10 |
| 10 |
| 10 |
| 7369 |
| 7566 |
| 7788 |
| 7876 |
| 7902 |
+--------+

As a final note, keep in mind that any digits in each string will be concatenated to form one numeric value. For example, an input value of, say, '99Gennick87' will result in the value 9987 being returned. This is something to keep in mind, particularly when working with serialized data.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

0 Comments: