Home > Archive > Microsoft SQL Server forum > February 2006 > Tricky SQL question









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Tricky SQL question
Erich93063

2006-02-25, 9:45 am

Ok so I'm gonna attempt to explain my problem as easy as I can. I have
a database of contacts (they are actually vendors). I am writing a
procedure that will query the vendors and generate a tab delimited file
that will be used an accounting program (MYOB). In the vendors database
I am exporting from, I have a field for "First Name", a field for "Last
Name" and a field (looked up by ID) for "Company Name". Now here is the
problem. In the usual case there will be a company name, a first name
and a last name for example:

Company Name: ACME Corporation
First Name: John
Last Name: Smith

HOWEVER, sometimes the vendor is not part of a company and the users of
the database have just been entering in the vendors name as the name of
the company. So it would end up looking like this:

Company Name: John Smith
First Name: John
Last Name: Smith

So the problem is that the accounting program that I am importing these
contacts into uses a weird schema. Instead of separate fields for first
name, last name and company, they have a combined field for "Company
Name/Last Name" and then a separate field for first name. So if the
first name field is blank then we know this contact is a company and
not an individual. As soon as we provide a first name, the record
becomes an individual as opposed to a company.

SO the MYOB import procedure is expecting two fields. One called
"[Co./Last Name]" and one called "[First Name]". So when I query my
current vendor database, I need to figure out if the company name
contains both the first and last name of the vendor and if so then
return only the vendors "last name" for the "[Co./Last Name]" field.
However if the company name is completely different then the first and
last name of the vendor, then I need to return the company name for
the "[Co./Last Name]" field and make the "[First Name]" field blank.

Here is what I thought would work but it doesn't:

SELECT
CASE
WHEN cu.vchrFirstName LIKE '%' + co.vchrCompanyName
+ '%'
AND cu.vchrLastName LIKE '%' + co.vchrCompanyName
+ '%'
THEN cu.vchrLastName
ELSE co.vchrCompanyName
END AS [Co./Last Name],
CASE
WHEN cu.vchrFirstName LIKE '%' + co.vchrCompanyName
+ '%'
AND cu.vchrLastName LIKE '%' + co.vchrCompanyName
+ '%'
THEN cu.vchrFirstName
ELSE ''
END AS [First Name],


I don't get an error when I use this code, but it just always fails and
ends up using the company name no matter what even though I know for
sure that the first and last names ARE in the company name.

Sorry for my long winded explanation.

THANKS

Erland Sommarskog

2006-02-25, 9:45 am

Erich93063 (erich93063@gmail.com) writes:
> I have a field for "First Name", a field for "Last Name" and a field
> (looked up by ID) for "Company Name". Now here is the problem. In the
> usual case there will be a company name, a first name and a last name
> for example:
>
> Company Name: ACME Corporation
> First Name: John
> Last Name: Smith
>
> HOWEVER, sometimes the vendor is not part of a company and the users of
> the database have just been entering in the vendors name as the name of
> the company. So it would end up looking like this:
>
> Company Name: John Smith
> First Name: John
> Last Name: Smith
>
> So the problem is that the accounting program that I am importing these
> contacts into uses a weird schema. Instead of separate fields for first
> name, last name and company, they have a combined field for "Company
> Name/Last Name" and then a separate field for first name. So if the
> first name field is blank then we know this contact is a company and
> not an individual. As soon as we provide a first name, the record
> becomes an individual as opposed to a company.
>...
> SELECT CASE WHEN cu.vchrFirstName LIKE '%' + co.vchrCompanyName + '%' AND
> cu.vchrLastName LIKE '%' + co.vchrCompanyName + '%'
> THEN cu.vchrLastName
> ELSE co.vchrCompanyName
> END AS [Co./Last Name],
> CASE WHEN cu.vchrFirstName LIKE '%' + co.vchrCompanyName + '%' AND
> cu.vchrLastName LIKE '%' + co.vchrCompanyName + '%' THEN
> cu.vchrFirstName
> ELSE ''
> END AS [First Name],


You have it backwards: You are trying to see if "John" is like
%John Smith% which it isn't. But "John Smith" is like "%John%".

Really how you should do this, I don't know, because I suspect that
the quality of the data is not perfect. It it was, you should
just say:

CASE WHEN FirstName + ' ' + LastName = CompanyName

But users may have entered extra spaces between the first and last
name - or no space at all.

You could try

CASE WHEN CompanyName LIKE '%' + FirstName '%' AND
CompanyName LIKE '%' + LastName '%'

But there is entry where the users entered X first and last name, and
the company has an X in the name...

It would have been a lot simpler, if you had permitted CompanyName
to be NULL for private vendros.

By the way, in the output, shouldn't {First Name] be NULL rather than
'' for companies?

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx
Erich93063

2006-02-25, 9:45 am

OMG Im stupid. Ok yeah so now it works perfectly! THANKS

As far as the first name being NULL or not, I am generated a CSV file
so I just need an empty string. Unless you're saying it is more optimal
to use NULL instead of an empty string. As far as what would get
outputed to the CSV file it would be the same.

THANK YOU SO MUCH!!!!!!!!!!!!

Andrew

2006-02-25, 9:45 am


"Erich93063" <erich93063@gmail.com> wrote in message
news:1140647157.967448.102260@g47g2000cwa.googlegroups.com...
> Ok so I'm gonna attempt to explain my problem as easy as I can. I have
> a database of contacts (they are actually vendors). I am writing a
> procedure that will query the vendors and generate a tab delimited file
> that will be used an accounting program (MYOB). In the vendors database
> I am exporting from, I have a field for "First Name", a field for "Last
> Name" and a field (looked up by ID) for "Company Name". Now here is the
> problem. In the usual case there will be a company name, a first name
> and a last name for example:
>
> Company Name: ACME Corporation
> First Name: John
> Last Name: Smith
>
> HOWEVER, sometimes the vendor is not part of a company and the users of
> the database have just been entering in the vendors name as the name of
> the company. So it would end up looking like this:
>
> Company Name: John Smith
> First Name: John
> Last Name: Smith
>


Unless I've misundestood what you're describing - the thing you absolutely
have to address is the data issues. It's a hoary old truism - but, GIGO.
In situations like this - I have always found that you will get a
significant number of entries which are duplicates. Eg - If you have John
Smith, Dave Jones and Min Patel from the same company - you may well have at
least 3 entries. If you have John Smith from several - perhaps competing
companies, or as individuals, you have a serious mess.
If you are trying to report on this data - you're feeding your decision
makers crap.
In the first instance - you have to try and get high level buy in to put
significant pressure on those entering the data to clean it and keep it
clean. This may involve a change to the interface at the front end - for
example, to force the user to enter a postcode/zip code for any new entry,
then next option is to choose from an entry already in the database - and
then - if it's not there - to allow them to select an option from a
commercial address lookup such as the PAF software available in the UK -
where you have to do a search linked to zip/post code.

All of this - obviously - has recource issues. You have to make it clear
that no matter how clever you try to be - you can't address shite data with
increasingly complex workrounds. A chaotic contacts database loses the
company money in a load of ways. If you claim that you can guesstimate what
people "should" have put in - the first time you fail on a juicy contract -
you're in the crap.

Your only professional and sensible option is to go to the people who are
living by this information to feed their families - get buy in and get the
people putting crap in to put it right, get agreement to implement a
strategy for ensuring that the data is useful, getting agreement that this
is stuck to.

Contacts are silver - leads are gold - you have to make sure they're usable
and tracable. Getting in touch with the wrong Mr Smith or Mr Singh is not
acceptable. Therefore, trying to address the issue by any other means than
a data cleaning exercise involving those putting in crap is not a good
option. Thwe ymay be putting in rubbish - but I bet they know who 90% + of
the people are. End users think computers can make heuristic decisions
based on their phone contacts with the customer - ot that yo can program
them. If you persist this myth - it'll get worse.

I'll be honest - an attempt at a workaround is easier to sell - but in this
case you really have to take the hard road and sell other people who know
who is who putting it right and knowing they have to keep it right. It's
not going to make you popular and by the time your company is making money
on it - you'll be forgotten. You'll be entitled to sit back and feel smug
though (;->

Summary;
There is *no* work around for unusable data. The better the data quality -
the more money your company makes. Mistakes on your on your contacts
database can be "very" embarassing and lose money - if you sell a
workaround, *you* are in the firing line. Get the data right. This helps
people using the data make money. The more they make - the more chance of
keeping your job - or getting the pay rise you deserve. This is a
management issue - *not* a technical one.

IMHO

All the best
Andrew


Erland Sommarskog

2006-02-25, 9:45 am

Erich93063 (erich93063@gmail.com) writes:
> As far as the first name being NULL or not, I am generated a CSV file
> so I just need an empty string. Unless you're saying it is more optimal
> to use NULL instead of an empty string. As far as what would get
> outputed to the CSV file it would be the same.


OK, for a CSV that would be the right thing. I got the impression that
you were inserting data into a table.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx
Erland Sommarskog

2006-02-25, 9:45 am

Andrew (andrewgothard@Nospa
mthanks.com) writes:
> Contacts are silver - leads are gold


Lead into gold? Alchemy?


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx
Sponsored Links





Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive | Programming forum archive

Copyright 2008 droptable.com