Ignored words in the full text query
| Griff 2006-03-13, 11:23 am |
| Got a bizarre problem
In our search box, someone types in the search term:
A$
This results in the PARTIAL SQL:
......AND (CONTAINS(TableToQuery, ' ""A$"" '))
And this results in the error:
Server: Msg 7619, Level 16, State 1, Line 1
Execution of a full-text operation failed. A clause of the query
contained only ignored words.
The set of "noise" words:
C:\Program Files\Microsoft SQL Server\MSSQL\FTDATA\SQLServer\Config\noise.eng
has two lines that appear relevant:
$
a b c d e f g h i j k l m n o p q r s t u v w x y z
But the "word" being searched is "A$" and not "A" with "$".
Can someone explain why I'm getting this particular error?
Thanks in advance
Griff
PS - Using SQL Server 2000 on Windows 2003, queried from an ASP script
running in IIS under Windows 2000. I know that the "word delimiters" list
is different on Windows 2000 and Windows 2003, but I don't know if there are
any other server-related issues.
| |
| Hilary Cotter 2006-03-14, 7:23 am |
| You need to remove the a from the noise word list, so the lines would look like:
$
bcdef.....
The $ is thrown away so a search on a, a$, or $a would give hits to a, a$,
$a, !a, a!, etc.
To change the noise word list (for US English this is noise.enu, found in
C:\Program Files\Microsoft SQL Server\MSSQL$SQL2000\FTDATA\SQLServer\Config),
stop mssearch, save your changes, restart mssearch, and do a full population.
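
As a rough illustration of the edit itself, here is a minimal Python sketch that removes one word from noise-list text. It assumes the file is plain text with whitespace-separated words, as in the two lines quoted in the question; the function name remove_noise_word and the sample text are made up for illustration and are not part of SQL Server. Stopping and restarting mssearch and running the full population still have to be done separately.

```python
def remove_noise_word(noise_text, word):
    """Return the noise-list text with `word` removed (case-insensitive).

    Assumption: words are separated by whitespace, one or more per line.
    """
    target = word.lower()
    kept_lines = []
    for line in noise_text.splitlines():
        tokens = line.split()
        kept = [t for t in tokens if t.lower() != target]
        # Drop a line only when removing the word emptied it.
        if tokens and not kept:
            continue
        kept_lines.append(" ".join(kept) if tokens else line)
    return "\n".join(kept_lines) + "\n"


# Sample built from the two lines quoted in the question.
sample = "$\na b c d e f g h i j k l m n o p q r s t u v w x y z\n"
edited = remove_noise_word(sample, "a")
print(edited)
```

The sketch works on a string rather than on the file, so it can be tried safely before touching the real noise.enu or noise.eng.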
--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.
This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
| |
|
| Hi Hilary
Many thanks for your response. I have a few further questions just to help
clarify my understanding.
You say that the $ symbol is "thrown away". What are the exact criteria for
throwing these characters away? I.e., are they ALWAYS ditched, or only
ditched when they are on the end of a word, etc.?
Presumably the $ character is ditched when the data in the table field is
actually indexed. As an example, say that I had the following text in the
table "abcd$efgh". How would this be saved in the full text index: as a
single word "abcdefgh" or as two words "abcd" and "efgh"? If I wanted to
enter a search term that would return this row, I'd presumably have to enter
the indexed word (so either "abcdefgh" if one word or either "abcd" or
"efgh" if two words); entering "abcd$efgh" presumably would return no hits.
If $ is a "throw away" character, what other characters are also thrown
away? The reason I'm interested in this is that I want to pre-parse the
search terms that are entered to see if they are suitable for use within a
full-text query. So, I'd need to throw away all $ characters (and all the
other characters that are also known to be thrown away when the full text
index is searched) and see what was left. Is this simply the list in the
noise.eng file (UK English)? Are the criteria for throwing these characters
away the same for all characters? I assume that they must be.
Many thanks
Griff
| |
| Hilary Cotter 2006-03-16, 7:24 am |
| All non-alphanumeric characters are thrown away and replaced by white space.
So a search on A will match with a!, !a, a@, @a, a#, a$, a%, "a.", "a,",
etc. A search on a! will match with A!, a., a, a$, (A, etc. Note that it's
case-insensitive, and that this behavior also applies to tokens of more than
one letter, e.g. !aa. Note further that a must be removed from the noise word
list for this to work.
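
For pre-parsing search terms, the basic rule above (before the exceptions) might be sketched in Python like this. It is only an approximation of the word breaker: it handles ASCII letters and digits only, the noise set holds just the entries quoted earlier in the thread, and the names tokenize and only_ignored_words are made up for illustration.

```python
import re

# Hypothetical excerpt of the noise list: "$" plus the single letters
# quoted from noise.eng earlier in the thread. The real file is longer.
NOISE_WORDS = {"$"} | set("abcdefghijklmnopqrstuvwxyz")


def tokenize(term):
    """Replace each non-alphanumeric character with a space, lowercase, split."""
    return re.sub(r"[^0-9A-Za-z]", " ", term).lower().split()


def only_ignored_words(term, noise_words=NOISE_WORDS):
    """True if no searchable token is left, the situation error 7619 describes."""
    return all(tok in noise_words for tok in tokenize(term))


print(tokenize("A$"))                                  # ['a']
print(only_ignored_words("A$"))                        # True
print(only_ignored_words("A$", NOISE_WORDS - {"a"}))   # False
print(tokenize("abcd$efgh"))                           # ['abcd', 'efgh']
```

Under this reading, A$ reduces to the single token a, which is a noise word until it is removed from the list, and abcd$efgh is treated as the two words abcd and efgh.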
There are some exceptions.
` and _ following any letter, whether the word/letter is in the noise word
list or not, will result in a new token.
So a search on `B or `b will match, in a case-insensitive manner, with
b` or `B, and the same goes for _. If your column has b, B, `b, and B`, a
search on b will return b and B (if they are not in the noise word list), and
a search on `b will match only with `b or `B, whether b is in the noise word
list or not.
The other exceptions are ++ following lower-cased letters and the sharp (#)
following upper-cased letters. A search on c++ will match with c++ and not
with c. c does not have to be in the noise word list for this to work. This
applies to any letter of the alphabet, but not to numbers.
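
The ++ and # exceptions could be folded into a pre-parser along these lines. This is one reading of the description above, not the actual word-breaker logic: it assumes the exception applies to a single letter (as in c++ and C#), it ignores the backtick and underscore cases, and the name tokenize_with_exceptions is made up for illustration.

```python
import re

# One reading of the ++ / # exceptions (an assumption, not the real word breaker):
#   a single lower-case letter followed by "++" stays one token (c++),
#   a single upper-case letter followed by "#" stays one token (C#),
#   anything else is a plain run of ASCII letters and digits.
# The backtick and underscore cases are not modelled here.
TOKEN = re.compile(r"(?<![0-9A-Za-z])(?:[a-z]\+\+|[A-Z]#)|[0-9A-Za-z]+")


def tokenize_with_exceptions(text):
    """Split text into tokens, keeping c++ / C# style tokens whole."""
    return TOKEN.findall(text)


print(tokenize_with_exceptions("c++"))   # ['c++']
print(tokenize_with_exceptions("c"))     # ['c']
print(tokenize_with_exceptions("1++"))   # ['1']
print(tokenize_with_exceptions("I write c++ and C# code"))
```

Tokens keep their original case here because the exception itself depends on case; lowercase them afterwards if the comparison against the noise list should be case-insensitive.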
HTH - I have submitted a paper for publication on this matter to
simple-talk. It was supposed to have been published in Feb.