Thursday, March 29, 2012

getting the middle initial from a name column

-- I have a first name field that sometimes contains only the first name,
-- sometimes contains the first name and the middle initial, and
-- sometimes contains the first name, a space, followed by NMI (for no middle initial)
-- how do I correctly grab the first letter of the middle initial in all cases?
-- I have been playing with patindex but its harder than I thought. guess I need a case
-- statement in addition to this. Any idea how I can do this?
-- thanks!

create table UHS_t1 (c1 varchar(20))
insert UHS_t1 select 'john a'
insert UHS_t1 select 'jeff b'
insert UHS_t1 select 'sue z'
insert UHS_t1 select 'joe nmi'
insert UHS_t1 select 'jamie'

select *, substring(c1, patindex('%[ ]%', c1)+1, 1) as middle_name
from UHS_t1
go
drop table UHS_t1When you insert the value to Firstname field, you need to add a specific character between First Name and Middle Name. That character is your pattern to use PATINDEX.|||Thanks for your reply...but
I am working with a database with millions of rows in the name column that have been entered via a front end app.
(I do not have control over the data)
During the data entry, the users sometimes added the middle initial, sometimes did not and sometimes typed in MNI or nmi...
The data above was just sample data to explain my predicament.
I am trying to extract the middle initial from this column, when it exists.

I tried

,CASE
WHEN substring(c1, patindex('%[ ]%', c1)+1, 3) = 'nmi' THEN ''
WHEN substring(c1, patindex('%[ ]%', c1)+1, 3) = 'NMI' THEN ''
WHEN substring(c1, patindex('%[ ]%', c1)+1, 1) IS NOT NULL THEN substring(c1, patindex('%[ ]%', c1)+1, 1)

ELSE ''
END

but when I have a column with value = john, I get j when I expect ''.

Any idea what SQL will give me what i need?|||Is there a space between the First Name and Middle Name if Middle Name is entered?|||You were close. This works...

SELECT CASE
WHEN PATINDEX('% %',c1) BETWEEN 1 AND DATALENGTH(c1)
THEN CASE
WHEN patindex('% nmi%', c1) > 1 THEN ''
WHEN patindex('% NMI%', c1) > 1 THEN ''
ELSE substring(c1, patindex('% %', c1)+1, 1)
END
ELSE ''
END
FROM #UHS_t1

you just needed to nest a pair of cases. The outer one checks to see if a middle initial or NMI appears, while the inner one extracts the MI when it does. The leading space in the inner check for the characters nmi is important, otherwise it might find nmi buried in someone's first name.|||Awesome!
I am almost there...

Of course, murphy's law struck, I found a few (<100) data values that had the column similar to "S. Joe"

in which case our SQL returns J. i.e. it is doing what we ask it. Only problem is S is the middle initial and Joe is the first name. Yuckkk
I guess this is just a case of a field that was used badly, the front end app should never have had a free form text entry to allow middle initial inserted into a first name field a haphazard way! Tooo bad the guy who designed the app is long gone, otherwise I could have yelled at someone instead of having to learn t-SQL magic to clean this up! haha.

Anyway, thanks for all your help. I really appreciate it!|||This procedure is kinda ugly, but it works well for the project it was created for. It takes a name as a single parameter and splits it out into prefix, first, middle, last, and suffix. Enjoy or ignore as appropriate...

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[usp_name_split]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)
drop procedure [dbo].[usp_name_split]
GO

SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS ON
GO

/*
************************************************** *****
* Procedure usp_name_split *
* Created 10/10/2002 by Ken C Stuber *
* *
* Takes a single string of a persons name and *
* Splits it into it's component parts *
* (prefix, first name, middle name, last name, suffix*
************************************************** *****

This procedure assumes the name is presented in one of the following formats:
(the middle initial can be the middle name spelled out)

'Mr. Ken C. Stuber Jr.'
'Mr. Ken Stuber Jr.'
'Ken Stuber'
'Ken C. Stuber'
'Mr. Ken Stuber'
'Mr. Ken C. Stuber'
'Ken Stuber Jr.'
'Ken C. Stuber Jr.'
'Stuber, Ken'
'Stuber, Ken C.'
'Stuber Jr., Ken'
'Stuber Jr., Ken C.'
'Stuber Jr., Mr. Ken'
'Stuber Jr., Mr. Ken C.'
'Stuber, Mr. Ken'
'Stuber, Mr. Ken C.'
'Stuber'

This procedure WILL NOT work properly if the name is presented in one of the following formats
or if any name parts are not in the correct order.:
'Stuber Jr.'
'C. Stuber Jr.'
'C. Stuber'
'Mr. Ken'
'Mr. Ken C.'
'Ken'
'Ken C.'
'Stuber, C.'

Accepted Prefixes are as follows (with or without periods):
Dr.
Mr.
Mrs.
Ms.
Miss.

and suffixes are as follows (with or without periods):
II
III
IV
Jr.
M.D.
Sr.

In cases where a three part name is given and last name is not presented first and none of the parts
is as presented above for prefix and suffix, the second part is assumed to be a middle name.

In cases where a four part name is given and last name is not presented first and none of the parts
is as presented above for prefix or suffix, the first part is assumed to be a prefix and the third
part is assumed to be a middle name.
*/

CREATE PROCEDURE usp_name_split
(
@.name_string VARCHAR(255),
@.prefix VARCHAR(100) OUTPUT,
@.fname VARCHAR(100) OUTPUT,
@.mname VARCHAR(100) OUTPUT,
@.lname VARCHAR(100) OUTPUT,
@.suffix VARCHAR(100) OUTPUT
)
AS

BEGIN

DECLARE @.index SMALLINT
DECLARE @.first_part VARCHAR(100)
DECLARE @.second_part VARCHAR(100)
DECLARE @.third_part VARCHAR(100)
DECLARE @.fourth_part VARCHAR(100)
DECLARE @.fifth_part VARCHAR(100)

SET @.first_part = ''
SET @.second_part = ''
SET @.third_part = ''
SET @.fourth_part = ''
SEt @.fifth_part = ''
SET @.name_string = LTRIM(RTRIM(@.name_string))

--Split name into it's 1 to 5 parts
SET @.index = CHARINDEX(' ',@.name_string)
IF @.index = 0
BEGIN
SET @.first_part = @.name_string
END
ELSE
BEGIN
SET @.first_part = LEFT(@.name_string,@.index-1)
SET @.name_string = RIGHT (@.name_string,LEN(@.name_string)-@.index)
SET @.index = CHARINDEX(' ',@.name_string)
IF @.index <> 0
BEGIN
SET @.second_part = LEFT(@.name_string,@.index-1)
SET @.name_string = RIGHT (@.name_string,LEN(@.name_string)-@.index)
SET @.index = CHARINDEX(' ',@.name_string)
IF @.index <> 0
BEGIN
SET @.third_part = LEFT(@.name_string,@.index-1)
SET @.name_string = RIGHT (@.name_string,LEN(@.name_string)-@.index)
SET @.index = CHARINDEX(' ',@.name_string)
IF @.index <> 0
BEGIN
SET @.fourth_part = LEFT(@.name_string,@.index-1)
SET @.fifth_part = RIGHT(@.name_string,LEN(@.name_string)-@.index)
END
ELSE
BEGIN
SET @.fourth_part = @.name_string
END
END
ELSE
BEGIN
SET @.third_part = @.name_string
END
END
ELSE
BEGIN
SET @.second_part = @.name_string
END
END

--Determine if only one name is provided
IF @.second_part = ''
BEGIN
SET @.lname = @.first_part
END
--Determine if last name is presented first, with no suffix and assign parts
ELSE IF CHARINDEX(',',@.first_part) <> 0
BEGIN
SET @.lname = LEFT(@.first_part,LEN(@.first_part)-1)
IF @.third_part = ''
BEGIN
SET @.fname = @.second_part
END
ELSE
BEGIN
IF @.fourth_part = ''
BEGIN
IF LOWER(@.second_part) IN ('dr','mr','mrs','ms','miss','dr.','mr.','mrs.','m s.','miss.')
BEGIN
SET @.fname = @.third_part
SET @.index = CHARINDEX('.',@.second_part)
IF @.index = 0
BEGIN
SET @.second_part = @.second_part + '.'
END
SET @.prefix = @.second_part
END
ELSE
BEGIN
SET @.fname = @.second_part
SET @.mname = @.third_part
END
END
ELSE
BEGIN
SET @.index = CHARINDEX('.',@.second_part)
IF @.index = 0
BEGIN
SET @.second_part = @.second_part + '.'
END
SET @.prefix = @.second_part
SET @.fname = @.third_part
SET @.mname = @.fourth_part
END
END
END
--Determine if last name is presented first, with suffix and assign parts
ELSE IF CHARINDEX(',',@.second_part) <> 0
BEGIN
SET @.lname = @.first_part
SET @.suffix = LEFT(@.second_part,LEN(@.second_part)-1)
IF @.fourth_part = ''
BEGIN
SET @.fname = @.third_part
END
ELSE
BEGIN
IF @.fifth_part = ''
BEGIN
IF LOWER(@.third_part) IN ('dr','mr','mrs','ms','miss','dr.','mr.','mrs.','m s.','miss.')
BEGIN
SET @.fname = @.fourth_part
SET @.index = CHARINDEX('.',@.third_part)
IF @.index = 0
BEGIN
SET @.third_part = @.third_part + '.'
END
SET @.prefix = @.third_part
END
ELSE
BEGIN
SET @.fname = @.third_part
SET @.mname = @.fourth_part
END
END
ELSE
BEGIN
SET @.index = CHARINDEX('.',@.third_part)
IF @.index = 0
BEGIN
SET @.third_part = @.third_part + '.'
END
SET @.prefix = @.third_part
SET @.fname = @.fourth_part
SET @.mname = @.fifth_part
END
END
END
--last name is not presented first, all five parts present
ELSE IF LEN(@.fifth_part) > 0
BEGIN
SET @.index = CHARINDEX('.',@.first_part)
IF @.index = 0
BEGIN
SET @.first_part = @.first_part + '.'
END
SET @.prefix = @.first_part
SET @.fname = @.second_part
SET @.mname = @.third_part
SET @.lname = @.fourth_part
SET @.index = CHARINDEX('.',@.fifth_part)
IF @.index = 0
BEGIN
SET @.fifth_part = @.fifth_part + '.'
END
SET @.suffix = @.fifth_part
END
--last name is not presented first, only two parts present
ELSE IF LEN(@.third_part) = 0
BEGIN
SET @.fname = @.first_part
SET @.lname = @.second_part
END
--last name is not presented first, three parts present
ELSE IF LEN(@.fourth_part) = 0
BEGIN
IF LOWER(@.first_part) IN ('dr','mr','mrs','ms','miss','dr.','mr.','mrs.','m s.','miss.')
BEGIN
SET @.index = CHARINDEX('.',@.first_part)
IF @.index = 0
BEGIN
SET @.first_part = @.first_part + '.'
END
SET @.prefix = @.first_part
SET @.fname = @.second_part
SET @.lname = @.third_part
END
ELSE IF LOWER(@.third_part) IN ('II','III','IV','jr.','m.d.','sr.','jr','md','sr' ,'md.')
BEGIN
SET @.fname = @.first_part
SET @.lname = @.second_part
SET @.index = CHARINDEX('.',@.third_part)
IF @.index = 0
BEGIN
SET @.third_part = @.third_part + '.'
END
SET @.suffix = @.third_part
END
ELSE
BEGIN
SET @.fname = @.first_part
SET @.mname = @.second_part
SET @.lname = @.third_part
END
END
--last name not presented first, four parts present
ELSE
BEGIN
IF LOWER(@.first_part) IN ('dr','mr','mrs','ms','miss','dr.','mr.','mrs.','m s.','miss.')
BEGIN
SET @.index = CHARINDEX('.',@.first_part)
IF @.index = 0
BEGIN
SET @.first_part = @.first_part + '.'
END
SET @.prefix = @.first_part
SET @.fname = @.second_part
IF LOWER(@.fourth_part) IN ('II','III','IV','jr.','m.d.','sr.','jr','md','sr' ,'md.')
BEGIN
SET @.lname = @.third_part
SET @.index = CHARINDEX('.',@.fourth_part)
IF @.index = 0
BEGIN
SET @.fourth_part = @.fourth_part + '.'
END
SET @.suffix = @.fourth_part
END
ELSE
BEGIN
SET @.mname = @.third_part
SEt @.lname = @.fourth_part
END
END
ELSE IF LOWER(@.fourth_part) IN ('II','III','IV','jr.','m.d.','sr.','jr','md','sr' ,'md.')
BEGIN
SET @.index = CHARINDEX('.',@.fourth_part)
IF @.index = 0
BEGIN
SET @.fourth_part = @.fourth_part + '.'
END
SET @.suffix = @.fourth_part
IF LOWER(@.first_part) IN ('dr','mr','mrs','ms','miss','dr.','mr.','mrs.','m s.','miss.')
BEGIN
SET @.index = CHARINDEX('.',@.first_part)
IF @.index = 0
BEGIN
SET @.first_part = @.first_part + '.'
END
SET @.prefix = @.first_part
SET @.fname = @.second_part
SET @.lname = @.third_part

END
ELSE
BEGIN
SET @.fname = @.first_part
SET @.mname = @.second_part
SET @.lname = @.third_part
END
END
ELSE
BEGIN
SET @.index = CHARINDEX('.',@.first_part)
IF @.index = 0
BEGIN
SET @.first_part = @.first_part + '.'
END
SET @.prefix = @.first_part
SET @.fname = @.second_part
SET @.mname = @.third_part
SET @.lname = @.fourth_part
END
END
END

GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO|||How about:

select substring (c1, patindex (c1, '% [a-zA-Z] %'), 1)

This should pick up any alpha character isolated by a space on either side. I have not tested it, so a +1 or -1 may have to be thrown in for good measure.|||Thought it looked promissing...have to look at it some more..

How's bean town...

it 23 heading in to tyhe low teens tonight...

USE Pubs
GO
SELECT fname+ ' ' + minit + ' '+ lname AS Names INTO myTable99 FROM employee
GO
SELECT Names, SUBSTRING(Names, PATINDEX(Names,'% [a-zA-Z] %'),1) FROM myTable99
GO
DROP TABLE myTable99
GO|||Hmm. It has a few limitations... Here. Try these:

create table test11
(c1 varchar(20))

insert into test11
values ('brett a kaiser')
insert into test11
values ('m crowley')
insert into test11
values ('g w bush')
insert into test11
values ('g h w bush')
insert into test11
values ('brett b. kaiser')

select case when patindex ('% [a-zA-Z] %', c1) > 0 then substring (c1, patindex ('% [a-zA-Z] %', c1) + 1, 1)else null end, c1
from test11

Had to add the case statement, or it would add the first letter of the first name on everyone with no middle initial.|||ooooooooooooooooooo.....

I like it!

Would definetley make a good function...just need to get rid of the punctuation...

CREATE FUNCTION udf_MI
(@.x varchar(256))
RETURNS char(1)
AS
BEGIN
DECLARE @.y char(1)
SELECT @.x = REPLACE(@.x,'.','')
SELECT @.y = CASE WHEN PATINDEX('% [a-zA-Z] %', @.x) > 0
THEN SUBSTRING(@.x,PATINDEX('% [a-zA-Z] %',@.x)+ 1,1)
ELSE NULL
END
RETURN @.y
END
GO

SELECT c1, dbo.udf_MI(c1) FROM Test11
GO

DROP FUNCTION udf_MI
GO

No comments:

Post a Comment