Determining Whether a String Is Alphanumeric : SQL

Problem

You want to return rows from a table only when a column of interest contains no characters other than numbers and letters. Consider the following view V (SQL Server users will use the operator "+" for concatenation instead of "||"):

 create view V as
select ename as data
from emp
where deptno=10
union all
select ename||', $'|| cast(sal as char(4)) ||'.00' as data
from emp
where deptno=20
union all
select ename|| cast(deptno as char(4)) as data
from emp
where deptno=30

The view V represents your table, and it returns the following:

 DATA
--------------------
CLARK
KING
MILLER
SMITH, $800.00
JONES, $2975.00
SCOTT, $3000.00
ADAMS, $1100.00
FORD, $3000.00
ALLEN30
WARD30
MARTIN30
BLAKE30
TURNER30
JAMES30

However, from the view's data you want to return only the following records:

 DATA
-------------
CLARK
KING
MILLER
ALLEN30
WARD30
MARTIN30
BLAKE30
TURNER30
JAMES30

In short, you wish to omit those rows containing data other than letters and digits.

Solution

It may seem intuitive at first to solve the problem by searching for all the possible non-alphanumeric characters that can be found in a string, but, on the contrary, you will find it easier to do the exact opposite: find all the alphanumeric characters. By doing so, you can treat all the alphanumeric characters as one by converting them to one single character. The reason you want to do this is so the alphanumeric characters can be manipulated together, as a whole. Once you've generated a copy of the string in which all alphanumeric characters are represented by a single character of your choosing, it is easy to isolate the alphanumeric characters from any other characters.

DB2

Use the function TRANSLATE to convert all alphanumeric characters to a single character, then identify any rows that have characters other than the converted alphanumeric character. For DB2 users, the CAST function calls in view V are necessary; otherwise, the view cannot be created due to type conversion errors. Take extra care when working with casts to CHAR as they are fixed length (padded):

 1 select data
2 from V
3 where translate(lower(data),
4 repeat('a',36),
5 '0123456789abcdefghijklmnopqrstuvwxyz') =
6 repeat('a',length(data))

MySQL

The syntax for view V is slightly different in MySQL:

 create view V as
select ename as data
from emp
where deptno=10
union all
select concat(ename,', $',sal,'.00') as data
from emp
where deptno=20
union all
select concat(ename,deptno) as data
from emp
where deptno=30

Use a regular expression to easily find rows that contain non-alphanumeric data:

 1 select data
2 from V
3 where data regexp '[^0-9a-zA-Z]' = 0

Oracle and PostgreSQL

Use the function TRANSLATE to convert all alphanumeric characters to a single character, then identify any rows that have characters other than the converted alphanumeric character. The CAST function calls in view V are not needed for Oracle and PostgreSQL. Take extra care when working with casts to CHAR as they are fixed length (padded). If you decide to cast, cast to VARCHAR or VARCHAR2:

 1 select data
2 from V
3 where translate(lower(data),
4 '0123456789abcdefghijklmnopqrstuvwxyz',
5 rpad('a',36,'a')) = rpad('a',length(data),'a')

SQL Server

Because SQL Server does not support a TRANSLATE function, you must walk each row and find any that contains a character that contains a non-alphanumeric value. That can be done many ways, but the following solution uses an ASCII-value evaluation:

  1 select data
2 from (
3 select v.data, iter.pos,
4 substring(v.data,iter.pos,1) c,
5 ascii(substring(v.data,iter.pos,1)) val
6 from v,
7 ( select id as pos from t100 ) iter
8 where iter.pos <= len(v.data)
9 ) x
10 group by data
11 having min(val) between 48 and 122

Discussion

The key to these solutions is being able to reference multiple characters concurrently. By using the function TRANSLATE you can easily manipulate all numbers or all characters without having to "iterate" and inspect each character one by one.

DB2, Oracle, and PostgreSQL

Only 9 of the 14 rows from view V are alphanumeric. To find the rows that are alphanumeric only, simply use the function TRANSLATE. In this example, TRANSLATE converts characters 09 and az to "a". Once the conversion is done, the converted row is then compared with a string of all "a" with the same length (as the row). If the length is the same, then you know all the characters are alphanumeric and nothing else.

By using the TRANSLATE function (using the Oracle syntax):

 where translate(lower(data),
'0123456789abcdefghijklmnopqrstuvwxyz',
rpad('a',36,'a'))

you convert all numbers and letters into a distinct character (I chose "a"). Once the data is converted, all strings that are indeed alphanumeric can be identified as a string comprising only a single character (in this case, "a"). This can be seen by running TRANSLATE by itself:

 
select data, translate(lower(data),
'0123456789abcdefghijklmnopqrstuvwxyz',
rpad('a',36,'a'))
from V


DATA TRANSLATE(LOWER(DATA)
-------------------- ---------------------
CLARK aaaaa

SMITH, $800.00 aaaaa, $aaa.aa

ALLEN30 aaaaaaa


The alphanumeric values are converted, but the string lengths have not been modified. Because the lengths are the same, the rows to keep are the ones for which the call to TRANSLATE returns all a's. You keep those rows, rejecting the others, by comparing each original string's length with the length of its corresponding string of a's:

 
select data, translate(lower(data),
'0123456789abcdefghijklmnopqrstuvwxyz',
rpad('a',36,'a')) translated,
rpad('a',length(data),'a') fixed
from V


DATA TRANSLATED FIXED
-------------------- -------------------- ----------------
CLARK aaaaa aaaaa

SMITH, $800.00 aaaaa, $aaa.aa aaaaaaaaaaaaaa

ALLEN30 aaaaaaa aaaaaaa


The last step is to keep only the strings where TRANSLATED equals FIXED.

MySQL

The expression in the WHERE clause:

 where data regexp '[^0-9a-zA-Z]' = 0

causes rows that have only numbers or characters to be returned. The value ranges in the brackets, "0-9a-zA-Z", represent all possible numbers and letters. The character "^" is for negation, so the expression can be stated as "not numbers or letters." A return value of 1 is true and 0 is false, so the whole expression can be stated as "return rows where anything other than numbers and letters is false."

SQL Server

The first step is to walk each row returned by view V. Each character in the value returned for DATA will itself be returned as a row. The values returned by C represent each individual character for the values returned by DATA:

 +-----------------+------+------+------+
| data | pos | c | val |
+-----------------+------+------+------+
| ADAMS, $1100.00 | 1 | A | 65 |
| ADAMS, $1100.00 | 2 | D | 68 |
| ADAMS, $1100.00 | 3 | A | 65 |
| ADAMS, $1100.00 | 4 | M | 77 |
| ADAMS, $1100.00 | 5 | S | 83 |
| ADAMS, $1100.00 | 6 | , | 44 |
| ADAMS, $1100.00 | 7 | | 32 |
| ADAMS, $1100.00 | 8 | $ | 36 |
| ADAMS, $1100.00 | 9 | 1 | 49 |
| ADAMS, $1100.00 | 10 | 1 | 49 |
| ADAMS, $1100.00 | 11 | 0 | 48 |
| ADAMS, $1100.00 | 12 | 0 | 48 |
| ADAMS, $1100.00 | 13 | . | 46 |
| ADAMS, $1100.00 | 14 | 0 | 48 |
| ADAMS, $1100.00 | 15 | 0 | 48 |

Inline view Z not only returns each character in the column DATA row by row, it also provides the ASCII value for each character. For this particular implementation of SQL Server, the range 48122 represents alphanumeric characters. With that knowledge, you can group each row in DATA and filter out any such that the minimum ASCII value is not in the 48122 range.

** If you want the Full Table detail. Refer the SQL Table in Label List. Or Click here to View the Table

0 Comments: