Let's say you have a database with content that you, or your users, want to search over. The database looks something like:
|1||Luke||But I was going into Tosche Station to pick up some power converters!|
|2||Obi-Wan Kenobi||It's over anakin, I have the high ground!|
Searching the table
The traditional way to search a table is with textual search operators. The way to do it is with the
LIKE keyword, for example:
SELECT * FROM content WHERE title LIKE 'Lu%'. This shows ID 1 as a result. Searching with textual search operators is very basic and has limitations.
- There is no ranking. Every result has the same weight value to it.
- It is not possible to search with a sentence. When you want to search something like 'That character that hates sand', there is no easy way to do that with
This is where Postgresql comes in.
Full Text Searching
Postgresql has a very nifty feature called 'Full Text Searching'. It provides the capability to identify words and get the stem. The way it works is that postgres builds a document (list of vectors) of the words with the location(s) of the given text. You then save it in another column so that postgres does not need to rebuild the vector every time you want to search the table. When the table has been build, you can search it with 2 different functions. These functions will be explained further down below
Creating a search configuration
For this example we are going to make our own configuration for a customized search index. Run the following sql to create a configuration with the name 'star_wars' that copies the (default) English configuration.
CREATE TEXT SEARCH CONFIGURATION star_wars ( COPY = english );
Converting the text to vectors (document)
Now that we have a search configuration we can execute a function that converts the given text to a document.
SELECT to_tsvector('star_wars', quote) AS document FROM star_wars_quotes;
The result of the query above is:
As you can see, the function removes all stop words and transforms the important words to lexemes/tokens. There is also a number appended to the tokens. This indicates the location of the word in the text. For example, convert is the 13th word in the text.
Searching the text with a query
There are 2 ways of searching the documents. With
plainto_tsquery. You can do more with
to_tsquery, but it limits the input you can give. Both functions transform the given text to tokens so that it can be used to match with the document.
To build a more complex query, use
to_tsquery. It only accepts single words and must be seperated with boolean operators
| (OR) and
! (NOT). For example:
SELECT to_tsquery('star_wars', 'I & need & converters') AS query. The result of this query is:
'need' & 'convert'. Again, it removes all stop words.
When you are building an application in which users can lookup records, you do not want them to only input a single word at the time. Luckily, there is another function
plainto_tsquery. With this function, you can query a whole string. The disadvantage of this is that you cannot use boolean operators. For example,
SELECT plainto_tsquery('star_wars', 'Do I need power converters?') AS query gives the result:
'need' & 'power' & 'convert'.
To actually match records with a query, the
@@ operator needs to be used. For example, I want a quote from Kenobi that has something to do with ground:
SELECT * FROM star_wars_quotes WHERE to_tsvector('star_wars', quote) @@ plainto_tsquery('star_wars', 'Anakin and ground?')
This query gives the record with id 2 as result, exactly what we want!
Full Text Search is perfect for when you want to make a table searchable and also want some customization. When there is grammar sensitive text, it might not be the ideal solution.
Thank you for reading!