|
Home > Archive > MS SQL Server > June 2005 > Parsing Awful Data
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Parsing Awful Data
|
|
|
| Due to extremely awful design, I've ended up with a field called OrderedBy
that contains data like this:
John Smith 212-984-3272
John Smith
John Smith 2129843272
John Smith 212-9843272
etc. etc.
I intend to fix this and move the phone number out to a separate field like
anyone with half a brain would, but I need to do a report quickly that
determines all of these John Smith's are the same person (there's a field
called ClientCode which helps me determine that). I built a view for this
report and everything was fine until I realized that this data was as bad as
it is. Is there any simply way to parse this data out, as part of my view?
Or am I stuck with re-designing it without group by's, etc. so I can parse
it with VB?
Thanks,
James
| |
| Adam Machanic 2005-06-14, 1:23 pm |
| How will you know if the John Smith without a phone number is the same as
the John Smiths with phone numbers?
If there are no other characters besides hyphens in the phone numbers, you
could just do something like:
REPLACE(Col, '-', '')
Then the three with phone numbers can be grouped... But that still leaves
the fourth row.
--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--
"James" <letyoudown@verizon.net> wrote in message
news:eu$qR7QcFHA.580@TK2MSFTNGP15.phx.gbl...
> Due to extremely awful design, I've ended up with a field called OrderedBy
> that contains data like this:
>
> John Smith 212-984-3272
> John Smith
> John Smith 2129843272
> John Smith 212-9843272
>
> etc. etc.
>
> I intend to fix this and move the phone number out to a separate field
like
> anyone with half a brain would, but I need to do a report quickly that
> determines all of these John Smith's are the same person (there's a field
> called ClientCode which helps me determine that). I built a view for this
> report and everything was fine until I realized that this data was as bad
as
> it is. Is there any simply way to parse this data out, as part of my
view?
> Or am I stuck with re-designing it without group by's, etc. so I can parse
> it with VB?
>
> Thanks,
> James
>
>
| |
|
| I don't know that they are, other than they work for the same company. I've
been told to make that assumption.
The problem is, I need to get a result set without the phone numbers in the
field, effectively returning:
JOHN SMITH
JOHN SMITH
JOHN SMITH
JOHN SMITH
based on that data. At this point, they don't "want" to parse it out into
another field.
"Adam Machanic" <amachanic@hotmail._removetoemail_.com> wrote in message
news:eEcxP%23QcFHA.3464@tk2msftngp13.phx.gbl...
> How will you know if the John Smith without a phone number is the same as
> the John Smiths with phone numbers?
>
> If there are no other characters besides hyphens in the phone numbers, you
> could just do something like:
>
> REPLACE(Col, '-', '')
>
> Then the three with phone numbers can be grouped... But that still leaves
> the fourth row.
>
>
> --
> Adam Machanic
> SQL Server MVP
> http://www.datamanipulation.net
> --
>
>
> "James" <letyoudown@verizon.net> wrote in message
> news:eu$qR7QcFHA.580@TK2MSFTNGP15.phx.gbl...
OrderedBy[color=dark
red]
> like
field[color=darkred]
this[color=darkred]
bad[color=darkred]
> as
> view?
parse[color=darkred]
>
>
| |
| Alejandro Mesa 2005-06-14, 8:23 pm |
| Try,
select
rtrim(coalesce(left(
c1, nullif(patindex('%[0-9]%', c1), 0) - 1), c1)) as
fullname,
replace(right(c1, len(c1) - nullif(patindex('%[0-9]%', c1), 0)), '-', '')as
phone_number
from
(
select 'John Smith 212-984-3272'
union all
select 'John Smith'
union all
select 'John Smith 2129843272'
union all
select 'John Smith 212-9843272'
) as t1(c1)
AMB
"James" wrote:
> Due to extremely awful design, I've ended up with a field called OrderedBy
> that contains data like this:
>
> John Smith 212-984-3272
> John Smith
> John Smith 2129843272
> John Smith 212-9843272
>
> etc. etc.
>
> I intend to fix this and move the phone number out to a separate field like
> anyone with half a brain would, but I need to do a report quickly that
> determines all of these John Smith's are the same person (there's a field
> called ClientCode which helps me determine that). I built a view for this
> report and everything was fine until I realized that this data was as bad as
> it is. Is there any simply way to parse this data out, as part of my view?
> Or am I stuck with re-designing it without group by's, etc. so I can parse
> it with VB?
>
> Thanks,
> James
>
>
>
| |
| Adam Machanic 2005-06-14, 8:23 pm |
| It's your job to make them -think- they want to :)
Anyway, you could try:
RTRIM(
LTRIM(
LEFT(Col,
CASE PATINDEX('% [0-9]%', Col)
WHEN 0 THEN LEN(Col)
ELSE PATINDEX('% [0-9]%', Col)
END)
)
)
--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--
"James" <letyoudown@verizon.net> wrote in message
news:eTHSlFRcFHA.3504@TK2MSFTNGP12.phx.gbl...
> I don't know that they are, other than they work for the same company.
I've
> been told to make that assumption.
>
> The problem is, I need to get a result set without the phone numbers in
the
> field, effectively returning:
>
> JOHN SMITH
> JOHN SMITH
> JOHN SMITH
> JOHN SMITH
>
> based on that data. At this point, they don't "want" to parse it out into
> another field.
>
> "Adam Machanic" <amachanic@hotmail._removetoemail_.com> wrote in message
> news:eEcxP%23QcFHA.3464@tk2msftngp13.phx.gbl...
as[color=darkred]
you[color=darkred]
leaves[color=darkred
]
> OrderedBy
> field
> this
> bad
> parse
>
>
|
|
|
|
|