|
Fulltext
search
Have you ever wanted to search text stored in your database,
but couldn't figure out how to do it efficiently? Are you lazy like me
and don't enjoy maintaining reverse indexes, dictionaries, and word scores?
You're in luck. The release of SQL Server 2005 has made searching text stored in
databases available to the masses.
SQL Server 2005 has had FULLTEXT searching in one form or another since version 8.
FULLTEXT indices in SQL allow database administrators and programmers to designate
any character-based field (CHAR, VARCHAR, or TEXT) as a FULLTEXT index,
which allows for complex text searching against data stored in those fields.
This feature is not to be confused with the LIKE function in SQL.
LIKE works more along the lines of a regular expression.
On the other hand, FULLTEXT indices are fully indexed fields which support stopwords,
boolean searches, and relevancy ratings.
SELECT * FROM MyTable WHERE Notes = 'Deliver Tuesday'
SELECT * FROM MyTable WHERE Notes LIKE '%caution%'
With full-text searching, you can perform many other types of search:
- Two words near each other
- Any word derived from a particular root (for example run, ran, or running)
- Multiple words with distinct weightings
- A word or phrase close to the search word or phrase
How to create a fulltext catalogue in your database (Enterprise Manager)
The easiest way to do this is to open SQL Server Enterprise Manager and expand the node for your database to find the Full-Text Catalogs node (if that node isn't present, check to make sure that the Microsoft Search Service is installed on the server). Right-click on the node and select New Full-Text Catalog. SQL Server will prompt you for a name and location for the catalog (and it will supply a default location). Name the catalog anything you like and click OK to create it.
Next you need to tell SQL Server what data to include in the catalog. Again, you can do this in Enterprise Manager. Right-click on a table and select Full-Text Index Table, Define Full-Text Indexing on a Table. This will launch the SQL Server Full-Text Indexing Wizard. You need to make these choices to complete the wizard:
Select a unique index on the table
Select the columns to index. You can optionally specify a language to use for word breaking.
Select the catalog to contain the index, or create a new catalog.
Create a schedule to repopulate the index on a regular basis (this is also optional).
When you finish the wizard, it will create the index for the table. But the index won't have any entries in it yet. Right-click on the table again anfd select Full-Text Index Table, Start Full Population to build the actual index
How to create a fulltext catalogue in your database (Query Analizer)
create fulltext catalog my_catalog_name_here
in path ‘c:\mysqldata\somesubdirectory’
as default
The ‘in path’ is optional, if you omit it your catalog is created in the same place as the data. For small databases this is fine, for large ones you might actually want to store the catalog on a separate hard disk in order to get a performance boost.
The ‘as default’ clause says this catalog will be the default one used for new full text search indexes, or for searching existing ones. Most times you’ll probably only need one catalog for a database, so you can add this and forget it.
Once you have a catalog created, you may need to tweak it. There’s not a lot of tweaking you can do, just three ways you can alter it, and all are implemented via the alter command.
alter fulltext catalog my_catalog_name_here rebuild
alter fulltext catalog my_catalog_name_here reorganize
alter fulltext catalog my_catalog_name_here as default
The first command, rebuild does just what it says. Your old catalog goes to the great bit bucket in the sky (i.e. it’s deleted) and SQL Server will recreate all of your full text search indexes. And it should be obvious, but remember during this time your full text search will not be available.
Reorganize is something like doing a disk defrag, it cleans up and reorganizes your full text search indexes. While it may not be as efficient as doing a complete rebuild, it does have the advantage of not taking the catalog offline while it does it’s work.
Finally ‘as default’ simply makes the catalog the default, in case you either forgot or were distracted by Mike Rowe doing something nauseating on “Dirty Jobs” .
Fulltext indexes
create fulltext index on my_table_name_here
(column1, column2,…)
key index my_tables_unique_index_name
on my_catalog_name_here
with change_tracking {manual | auto | off}, no population
The first thing is also the most obvious, you need to supply the name of the table in the first line. Note we’re not supplying a name for the full text search index. Since there’s only one per table, SQL Server takes care of creating the full text search index name for us.
Next we need to supply the name of the column or columns we want indexed. These can be any sort of text field. Just list them one after another, separated by commas.
The next item is also required, and sort of tricky. Each row in the table you are doing full text searching on must have a unique index. It makes sense when you think about it, for the text search to be efficient it must be able to quickly move to the row with the word you’re hunting for, and the way to do that is via the unique index.
Next you will need to tell SQL Server how often to update the index. You do this through the with
change_tracking parameter. OFF turns it off entirely, no updates will be done until you
issue a rebuild via the alter syntax You might want to use OFF when you have a table that gets updated very rarely.
AUTO, on the other hand is for when you have a table that gets updated frequently. It will update the full text search index when the associated table is updated. The final option, MANUAL will flag changes to the underlying table, but it won’t update the full text search index until you tell it to.
The final parameter, no population , only applies when you use OFF. It tells SQL Server not to populate the index when it’s created. If you omit it, or use AUTO or MANUAL, SQL Server will populate the full text search index when the index is created.
You may decide one day you no longer need the full text search index. Since the readers of this blog are the smartest, most intelligent readers on the planet you’ve already figured out we’ll need to use a variant of the drop command:
drop fulltext index on my_table_name_here
If you need to alter the index to either rebuild it or change the columns, you can use the following options:
alter fulltext index on my_table_name_here
parameters here
-
set change_tracking {off | auto | manual} – This works the same as with the create command, it lets you change the tracking mode.
-
disable – Disables the full text search index, it’s not used for searching nor is it updated. However the data is left intact, should you want to turn it back on.
-
enable – Enables the full text search index after a disable.
-
add ( column ) – Adds the passed in column to the full text search index.
-
drop ( column ) – Removes the passed in column from the full text search index.
-
start full population –This rebuilds the index from the ground up.
-
start incremental population –This will update the index since the last time it was updated. Note you must have a timestamp column on your table for this to work.
-
start update population –Remember a moment ago when I talked about the change_tracking manual option? Well this command is how you update an index with manual change tracking.
How to search using the fulltext predicates
Four T-SQL predicates are involved in full-text searching:
FREETEXT
FREETEXTTABLE
CONTAINS
CONTAINSTABLE
FREETEXT
is the easiest of these to work with; it lets you specify a search term but then tries to look at the meaning rather than the exact term when finding matches. For instance, here's a query using FREETEXT together with its results:
SELECT ProductName
FROM Products
WHERE FREETEXT (ProductName, 'spread' )
ProductName
----------------------------------------
Grandma's Boysenberry Spread
Vegie-spread
As you can see, FREETEXT finds the word or words you give it anywhere in the search column.
FREETEXTTABLE works like FREETEXT
except that it returns its results in a Table object.
CONTAINS
(and CONTAINSTABLE, which works the same but delivers results in a table) offers a much more complex syntax for using a full-text indexed column:
For instance, you can search for one word "near" another this way:
SELECT ProductName
FROM Products
WHERE CONTAINS(ProductName, '"laugh*" NEAR lager')
ProductName
----------------------------------------
Laughing Lumberjack Lager
Note the use of "laugh*" to match any word starting with "laugh." You can also supply a weighted list of terms to CONTAINS, and it will prefer matches with a higher weight:
SELECT ProductName
FROM Products
WHERE CONTAINS(ProductName, 'ISABOUT (stout weight (.8),
ale weight (.4), lager weight (.2) )' )
ProductName
----------------------------------------
Laughing Lumberjack Lager
Steeleye Stout
Sasquatch Ale
Outback Lager
Enable/Disable Fulltext search on a database
sp_fulltext_database 'enable'
Enables full-text indexing within the current database.
Important Use carefully. If full-text catalogs already exist, this procedure drops all full-text catalogs,
re-creates any full-text indexing indicated in the system tables, and marks the database as full-text enabled.
This action does not cause index population to begin; an explicit start_full or start_incremental on each catalog must be issued using sp_fulltext_catalog to populate or repopulate the full-text index.
The reverse action is 'disable' which will remove all
full-text catalogs in the file system for the current database and mark the database as
being disabled for full-text indexing.
Return information about existing catalog/catalogues
sp_help_fulltext_catalogs 'myCat'
This will display the ID, Name, Path, included tables and the population status which can be one of the following values:
0 = Idle
1 = Full population in progress
2 = Paused
3 = Throttled
4 = Recovering
5 = Shutdown
6 = Incremental population in progress
7 = Building index
8 = Disk is full. Paused
9 = Change tracking
NULL = User does not have VIEW permission on the full-text catalog, or database is not full-text enabled, or full-text component not installed.
sp_help_fulltext_catalog_components
Returns a list of all components (filters, word-breakers, and protocol handlers), used for all full-text catalogs in the current database.
sp_help_fulltext_columns 'SOLO.tblProducts'
Returns the columns designated for full-text indexing in a table in a database.
sp_help_fulltext_tables 'MyCat'
Returns a list of tables that are registered for full-text indexing.
Noise Words And Stop Words
Noise Words
Words like a, an, the, or, and so on. SQL Server 2005 refers to these words as “noise words”.
The list of noise words that ships with SQL Server 2005 is fine for most users. However, there are times when it can be an advantage to add words to the list. For example, let’s say your company has a rule that it’s name must appear in the header of all internal Word documents. Further, let’s say that all of those Word documents are stored in a varbinary(max) field and full text indexed. Your company name would effectively become a noise word, searching for it would return every record in your table and thus be useless in terms of results. You would then want to add your company name to the list of noise words for SQL Server 2005 to ignore.
Or, let’s take the opposite example. There was a word in the default SQL Server 2005 noise word list that, when in all caps, was an acronym for a piece of equipment in our plants. Thus it was getting omitted from searches. We had to remove that word from the noise word list.
So, how do we change the list of noise words in SQL Server 2005? First we have to track it down. Open up regedit and navigate to this spot in the tree.
HKEY_LOCAL_MACHINE -> SOFTWARE -> Microsoft -> Microsoft SQL Server -> [insert your instance name here] -> MSSearch -> Language -> [insert your language abbreviation here]
My instance name is MSSQL.1. Pick your language enu - short for English US or eng - english UK. Now look for the name NoiseFile. The value for mine is:
C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\noiseenu.txt.
Use any text editor to open and look at the file, it’s a simple text file. Mine only had 129 words in it. Now simply add or remove the words you want (or don’t want) in the noise word list. When done, save the file.
Now for the bad news. In order for the noise word file to take effect you must repopulate the full text indexes. You can do the entire catalog at once, or one table at a time. To do the entire catalog, use the alter full text catalog command.
Bear in mind one important fact, changes to the noise word file apply to the entire SQL Server, and all the databases running on it. Thus if you include or exclude noise words to support one app, you are affecting every app that touches every full text indexed database on your server.
Stop Words
While SQL Server 2005 had Noise Words for its full text search, SQL Server 2008 has moved to StopWords. The good news is the change is much deeper than just a rebranding. I say good news, because with the change comes a lot more flexibility and functionality.
There are actually two new tools introduced with SQL Server 2008, StopWords and StopList. A StopList acts as a named container for a group of StopWords. You can then associate a StopList with one or more tables. This is a great enhancement over Noise Words, which applied to the entire server. Now you can associate a group of StopWords, in a StopList, with specific tables without affecting the rest of the tables on the database or server.
More to come on this...
|
|
|
|
1 2
|
|
| .Net Reporter |
 |
Management Reporting made easy with the .Net reporter package regardless of which accounting system you use. |
|
more info>>
|
|
| Boon Cart |
 |
Shopping Cart - available now in 4 layout templates at your discretion |
|
more info>>
|
|
|
|
|