Turbocharge Your WordPress Search Using Solr

Traducciones al Español
Estamos traduciendo nuestros guías y tutoriales al Español. Es posible que usted esté viendo una traducción generada automáticamente. Estamos trabajando con traductores profesionales para verificar las traducciones de nuestro sitio web. Este proyecto es un trabajo en curso.
Create a Linode account to try this guide with a $ credit.
This credit will be applied to any valid services used during your first  days.

The standard search that is built into WordPress does not provide the best search experience you can offer your visitors, given its inability to suggest search phrases, catch typos, understand word variations, organize and filter results, and index documents for search results. Full text search engines often offer these features and Apache Solr is a free, open-source option that does.

In this guide, you will learn how to install Java, install and configure Solr on Ubuntu 14.x or Debian 7.x, and integrate it into your WordPress blog using the WPSolr plugin.

Note
This guide is written for a non-root user. Commands that require elevated privileges are prefixed with sudo. If you’re not familiar with the sudo command, you can check our Users and Groups guide.

Prerequisites

  • WordPress must be already installed and configured. If you have not yet installed WordPress, follow the Manage Web Content with WordPress guide.

  • Much of this guide assumes that Solr is being installed on the same server as WordPress; however, Solr can be installed on a second server for security or scalability reasons. Alternate steps are provided should Solr be installed on a second server.

Install Java

Since Solr is a Java web application, it requires a Java Runtime Environment (JRE).

  1. Check if Java is already installed on your server using the following commands:

    whereis java
    java -version
    

    If Java is already installed, it will output the path of the executable Java file and the Java version that is being run. Skip to the next step.

  2. Install the openjdk-7-jre-headless package:

    sudo apt-get install openjdk-7-jre-headless
    
  3. After the JRE is installed, test it by checking the version:

    java -version
    

    If it’s working correctly, it should produce similar output:

    java version "1.7.0_75"
    OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~deb7u1)
    OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
    

Install unzip, curl and php5-curl

  1. Install the unzip, curl, and php5-curl packages:

    sudo apt-get install unzip curl php5-curl
    
  2. Restart the HTTP server where WordPress is hosted:

    sudo service apache2 restart
    

Install and Configure Solr

Download and Install Solr

  1. Open the Solr download site in your browser.

  2. Apache will provide a download link based on location:

    Selecting an Apache download mirror site

  3. Click the link to open a page of Solr releases:

  4. Click on the highest available 4.x version to see the files in that release:

    Note
    Since Solr 5.x is still in beta, its configuration procedures are different from 4.x, and WPSolr is not yet compatible with the 5.x release.

    Solr release directory listing

  5. Copy the link address for the non-source .tgz file.

  6. On your Linode, download that file into your home directory using the wget command:

    cd ~
    wget http://apache.bytenet.in/lucene/solr/4.10.4/solr-4.10.4.tgz
    
  7. Install Solr under the /opt directory:

    sudo tar -C /opt -xzvf solr-4.10.4.tgz
    

Install WPSolr Configuration Files

For Solr to index blog posts, it needs to know the structure of the blog data. This structure is described in Solr configuration files.

  1. WPSolr provides ready-made configuration files on their website. Visit the WPSolr website and get the link address of the latest WPSolr release for your Solr version:

    The copied address will look similar to http://wpsolr.com/?wpdmdl=2064.

  2. On the server where Solr is installed, use the wget command to download the file from copied address and save it as wpsolr_config.zip.

    wget -O wpsolr_config.zip http://wpsolr.com/?wpdmdl=2064
    
  3. Extract wpsolr_config.zip:

    unzip wpsolr_config.zip
    
  4. Copy schema.xml and solrconfig.xml into /opt/solr-4.10.4/example/solr/collection1/conf. Back up the original files before copying:

    sudo cp /opt/solr-4.10.4/example/solr/collection1/conf/schema.xml /opt/solr-4.10.4/example/solr/collection1/conf/schema.xml.original
    sudo cp /opt/solr-4.10.4/example/solr/collection1/conf/solrconfig.xml  /opt/solr-4.10.4/example/solr/collection1/conf/solrconfig.xml.original
    sudo cp schema.xml  /opt/solr-4.10.4/example/solr/collection1/conf/
    sudo cp solrconfig.xml  /opt/solr-4.10.4/example/solr/collection1/conf/
    

Change the IP Address and Port of Solr (Optional)

By default, Solr listens for search requests on all IP addresses at port 8983. For security reasons, you may wish to change the IP address and/or port it listens on. It is also recommended that only WordPress be able to query Solr.

  1. First, make a back up of /opt/solr-4.10.4/example/etc/jetty.xml. Then, open the file in a text editor:

    sudo cp /opt/solr-4.10.4/example/etc/jetty.xml /opt/solr-4.10.4/example/etc/jetty.xml.backup
    
  2. Locate the section where listening host and port are set:

    File: /opt/solr-4.10.4/example/etc/jetty.xml
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    
    <!--
        <Call name="addConnector">
          <Arg>
              <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
                <Set name="host"><SystemProperty name="jetty.host" /></Set>
                <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
                <Set name="maxIdleTime">50000</Set>
                <Set name="Acceptors">2</Set>
                <Set name="statsOn">false</Set>
                <Set name="confidentialPort">8443</Set>
                <Set name="lowResourcesConnections">5000</Set>
                <Set name="lowResourcesMaxIdleTime">5000</Set>
              </New>
          </Arg>
        </Call>
    -->
  3. Set the appropriate listening IP address:

    • If Solr is on the same server as WordPress, replace <Set name="host"><SystemProperty name="jetty.host" /></Set> with:

      <Set name="host">localhost</Set>
      
    • If Solr is on a different server from WordPress, replace <Set name="host"><SystemProperty name="jetty.host" /></Set> with:

      <Set name="host">123.45.67.89</Set>
      

      Replace 123.45.67.89 with your own private IP or FQDN.

Create a User Account and User Group for Solr

For security purposes, Solr should run with its own user account and group.

Note
The following commands should be run on the server where Solr is installed.
  1. Create a group named solr:

    sudo addgroup --system solr
    
  2. Create a user named solr:

    sudo adduser --system --ingroup solr --home /opt/solr-4.10.4 --shell /bin/sh --disabled-password --disabled-login solr
    
  3. Transfer ownership of the Solr directory to the user solr:

    sudo chown -R solr:solr /opt/solr-4.10.4
    

Configure Solr as a Startup Service

Note
Run the following commands on the server where Solr is installed.
  1. Use a text editor to create a new script /etc/init.d/solr. Alternatively, you can download it from this link:

    sudo nano /etc/init.d/solr
    
  2. Copy the following text into the editor, save and close it:

    Note
    If using a different version of Solr, change the JETTY_HOME=/opt/solr-4.10.4/example line to match the installed version.
    File: /etc/init.d/solr
      1
      2
      3
      4
      5
      6
      7
      8
      9
     10
     11
     12
     13
     14
     15
     16
     17
     18
     19
     20
     21
     22
     23
     24
     25
     26
     27
     28
     29
     30
     31
     32
     33
     34
     35
     36
     37
     38
     39
     40
     41
     42
     43
     44
     45
     46
     47
     48
     49
     50
     51
     52
     53
     54
     55
     56
     57
     58
     59
     60
     61
     62
     63
     64
     65
     66
     67
     68
     69
     70
     71
     72
     73
     74
     75
     76
     77
     78
     79
     80
     81
     82
     83
     84
     85
     86
     87
     88
     89
     90
     91
     92
     93
     94
     95
     96
     97
     98
     99
    100
    101
    102
    103
    104
    105
    106
    107
    108
    109
    110
    111
    112
    113
    114
    115
    116
    117
    118
    119
    120
    121
    122
    123
    124
    125
    126
    127
    128
    129
    130
    131
    132
    133
    134
    135
    136
    137
    138
    139
    140
    141
    142
    143
    144
    145
    146
    147
    148
    149
    150
    151
    152
    153
    154
    155
    156
    157
    158
    159
    160
    161
    162
    163
    164
    165
    166
    167
    168
    169
    170
    171
    172
    173
    174
    175
    176
    
    #!/bin/sh -e
    #
    # /etc/init.d/solr -- startup script for Apache Solr
    #
    #
    ### BEGIN INIT INFO
    # Provides:          solr
    # Required-Start:    $local_fs $remote_fs $network
    # Required-Stop:     $local_fs $remote_fs $network
    # Should-Start:      $named
    # Should-Stop:       $named
    # Default-Start:     2 3 4 5
    # Default-Stop:      0 1 6
    # Short-Description: Start Solr
    # Description:       Start Apache Solr jetty server
    ### END INIT INFO
    
    PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
    NAME=solr
    DESC="Solr search engine"
    JETTY_HOME=/opt/solr-4.10.4/example
    START_JAR="$JETTY_HOME/start.jar"
    
    if [ `id -u` -ne 0 ]; then
            echo "You need root privileges to run this script"
            exit 1
    fi
    
    # Make sure jetty is started with system locale
    if [ -r /etc/default/locale ]; then
            . /etc/default/locale
            export LANG
    fi
    
    . /lib/lsb/init-functions
    
    if [ -r /etc/default/rcS ]; then
            . /etc/default/rcS
    fi
    
    # Run Jetty as this user ID (default: jetty)
    # Set this to an empty string to prevent Jetty from starting automatically
    SOLR_USER=solr
    
    SOLR_GROUP=solr
    
    export JAVA="/usr/bin/java"
    
    # Extra options to pass to the JVM
    # Set java.awt.headless=true if JAVA_OPTIONS is not set so the
    # Xalan XSL transformer can work without X11 display on JDK 1.4+
    # It also sets the maximum heap size to 256M to deal with most cases.
    JAVA_OPTIONS="-Djava.awt.headless=true"
    
    # Timeout in seconds for the shutdown of all webapps
    JETTY_SHUTDOWN=30
    
    JETTY_STOP_PORT=17935
    JETTY_STOP_KEY=stopsolr
    JETTY_ARGS="-Djetty.home=$JETTY_HOME -DSTOP.PORT=$JETTY_STOP_PORT -DSTOP.KEY=$JETTY_STOP_KEY"
    
    # Define other required variables
    PIDFILE="/var/run/$NAME.pid"
    WEBAPPDIR="$JETTY_HOME/webapps"
    
    ##################################################
    # Do the action
    ##################################################
    case "$1" in
        start)
            log_daemon_msg "Starting $DESC." "$NAME"
            if start-stop-daemon --quiet --test --start --pidfile "$PIDFILE" \
                            --user "$SOLR_USER" --group "$SOLR_GROUP" --startas "$JAVA" > /dev/null; then
    
                    if [ -f $PIDFILE ] ; then
                          log_warning_msg "$PIDFILE exists, but solr was not running. Ignoring $PIDFILE"
                    fi
    
                    start-stop-daemon --start --pidfile "$PIDFILE" --chuid "$SOLR_USER:$SOLR_GROUP" \
                        --chdir "$JETTY_HOME" --background --make-pidfile --startas $JAVA -- \
                        $JAVA_OPTIONS $JETTY_ARGS -jar $START_JAR --daemon
    
                    log_daemon_msg "$DESC started" "$NAME"
    
                    sleep 5
                    if start-stop-daemon --test --start --pidfile "$PIDFILE" \
                            --user $SOLR_USER --group $SOLR_GROUP  --startas "$JAVA" > /dev/null; then
                            log_end_msg 1
                    else
                            log_end_msg 0
                    fi
    
            else
                    log_warning_msg "(already running)."
                    log_end_msg 0
                    exit 1
            fi
            ;;
    
        stop)
            log_daemon_msg "Stopping $DESC." "$NAME"
    
            if start-stop-daemon --quiet --test --start --pidfile "$PIDFILE" \
                    --user "$SOLR_USER" --group "$SOLR_GROUP" --startas "$JAVA" > /dev/null; then
                    if [ -x "$PIDFILE" ]; then
                          log_warning_msg "(not running but $PIDFILE exists)."
                    else
                            log_warning_msg "(not running)."
                    fi
            else
                    start-stop-daemon --quiet --stop \
                            --pidfile "$PIDFILE" --user "$SOLR_USER" --group "$SOLR_GROUP" \
                            --startas $JAVA -- $JAVA_OPTIONS $JETTY_ARGS -jar $START_JAR --stop > /dev/null
    
                    while ! start-stop-daemon --quiet --test --start \
                                --pidfile "$PIDFILE" --user "$SOLR_USER" --group "$SOLR_GROUP" \
                                --startas "$JAVA" > /dev/null; do
                            sleep 1
                            log_progress_msg "."
                            JETTY_SHUTDOWN=`expr $JETTY_SHUTDOWN - 1` || true
                            if [ $JETTY_SHUTDOWN -ge 0 ]; then
                                    start-stop-daemon --oknodo --quiet --stop \
                                            --pidfile "$PIDFILE" --user "$SOLR_USER" --group "$SOLR_GROUP" \
                                            --startas $JAVA -- $JAVA_OPTIONS $JETTY_ARGS -jar $START_JAR --stop
                            else
                                    log_progress_msg " (killing) "
                                    start-stop-daemon --stop --signal 9 --oknodo \
                                            --quiet --pidfile "$PIDFILE" \
                                            --user "$SOLR_USER" --group "$SOLR_GROUP"
                            fi
                    done
                    rm -f "$PIDFILE"
                    log_daemon_msg "$DESC stopped." "$NAME"
                    log_end_msg 0
            fi
            ;;
    
        status)
            if start-stop-daemon --quiet --test --start --pidfile "$PIDFILE" \
                    --user "$SOLR_USER" --group "$SOLR_GROUP" --startas "$JAVA" > /dev/null; then
    
                    if [ -f "$PIDFILE" ]; then
                        log_success_msg "$DESC is not running, but pid file exists."
                            exit 1
                    else
                        log_success_msg "$DESC is not running."
                            exit 3
                    fi
            else
                    log_success_msg "$DESC is running with pid `cat $PIDFILE`"
            fi
            ;;
    
        restart|force-reload)
            if ! start-stop-daemon --quiet --test --start --pidfile "$PIDFILE" \
                    --user "$SOLR_USER" --group "$SOLR_GROUP" --startas "$JAVA" > /dev/null; then
                    $0 stop $*
                    sleep 1
            fi
            $0 start $*
            ;;
    
        try-restart)
            if start-stop-daemon --quiet --test --start --pidfile "$PIDFILE" \
                    --user "$SOLR_USER" --group "$SOLR_GROUP" --startas "$JAVA" > /dev/null; then
                    $0 start $*
            fi
            ;;
    
        *)
            log_success_msg "Usage: $0 {start|stop|restart|force-reload|try-restart|status}"
            exit 1
            ;;
    esac
    
    exit 0
  3. After saving the script, run the following commands:

    sudo chmod ugo+x /etc/init.d/solr
    sudo update-rc.d  solr defaults
    sudo update-rc.d  solr enable
    sudo service solr start
    

Test Solr

Run the following command on the server where Solr is installed:

curl http://localhost:8983/solr/collection1/select

If it shows similar output, Solr is installed and configured correctly:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst name="params"/></lst><result name="response" numFound="0" start="0"></result>
</response>

If Solr is installed on a different server from WordPress, repeat the test from the WordPress server by sending a request to the Solr server:

curl http://HOSTNAME-OR-IP-OF-SOLR-SERVER:8983/solr/collection1/select

Install and Configure WPSolr

Install the WPSolr Plugin

  1. Install the WPSolr WordPress plugin, either through your WordPress admin console or by downloading the files into your plugins/ directory.

  2. On the Plugins page, activate the plugin named Enterprise Search in seconds:

  3. WordPress then displays a plugin activated message and adds a WPSOLR menu item to the sidebar:

Configure WPSolr Plugin

  1. Open WPSolr page:

  2. Click on the button I uploaded my 2 compatible configuration files to my Solr core:

    WPSolr first configuration

  3. On the next page, select Self Hosted option:

  4. When selecting Self Hosted, the plugin prompts you to enter details about the Solr server:

    • Solr Host: This should be the same value as the host typed in /opt/solr-4.10.4/example/etc/jetty.xml. If Solr is installed on same server as WordPress, enter localhost. If Solr is installed on a different server, enter the same IP address or hostname.

    • Solr Port: This should be the same value as the port typed in /opt/solr-4.10.4/example/etc/jetty.xml.

    • Solr Path: Set this value to /solr/collection1, the default Solr core. The Solr server can run multiple Solr cores, each core serving a different set of search data. For more information on Solr cores, go through the Solr Core wiki.

  5. Press the Check Solr Status, then Save button. If everything is set correctly, it will show a green tick mark.

  6. Click on the Solr Options tab:

    • Post types to be indexed: Selecting all of them is recommended. Post indexes all blog posts, page all pages (such as about pages), and attachment all documents (such as PDFs and DOC files).

    • Custom taxonomies to be indexed: Generally there is no need to enter anything here; however, if WordPress has been customized to organize blog posts in ways other than categories and tags, enter the name of the taxonomy here.

    • Custom fields to be indexed: Generally there is no need to select anything here.

    • Index Comments: Select this if you want search results to include comments. This is suitable only for blogs where comments add some value to the post and are rigorously moderated.

    • Exclude Items: If you want some posts or pages to be excluded from search results, enter their IDs here.

  7. Press the Save Options button.

  8. Open the Solr Options > Result Options page:

    • Display Suggestions (Did you mean?): Recommended. If selected, Solr will suggest alternate search phrases if it doesn’t find any matches for the entered search phrase:

      WPSolr Suggestions

    • Display number of results and current page: Recommended. This is useful for paginating search results.

    • Replace default WordPress search: Recommended. This replaces the default WordPress search box with one that uses Solr to show autocompletion suggestions.

    • No. of results per page: Configures how many search results should be shown per page.

    • No. of values to be displayed by facets: Facets refer to the filters that Solr shows to enable visitors to shortlist from search results. This value is the maximum number of values shown for each facet. For example, in the image below, this value has been set to 5 so that the Tags facet shows a maximum of 5 tag filters.

      Facet count

  9. Press the Save Options button.

  10. Next, open the Solr Options > Facets Options page:

    Press the green “+” buttons to add a facet. The ones added here are shown as filters in the search results page. Generally, categories and tags are enough, but if the blog has multiple contributors or custom taxonomies, you may also want to add these values as additional facets.

  11. Go to the Solr Operations tab and click the Synchronize Wordpress with my Solr index button.

    Note
    Whenever you publish a new post or page or attachment, this button must be selected for the new pages to be indexed.

    After the operation has completed, the same page displays how many documents were indexed.

The following steps will be completed while on your blog.

  1. Test autocompletion by beginning to type a word you know is in one of your blog posts. As you are typing, the search box should display some suggestions in a dropdown:

    Autocompletion

  2. Test search results by entering a search phrase. Matching results should be displayed:

  3. Test autocorrection suggestions by entering a word with some spelling mistakes or a word that does not occur in any of your blog posts. It should show Did you mean suggestions:

    Did you mean suggestions

  4. Test the document search by creating and publishing some test posts with added file attachments (such as PDFs). Update the search data, then search for a phrase that you know occurs in your attachment. It should display matches inside those attachments:

    Document search results

Location of Search Data

Search engine data is stored in the /opt/solr-4.10.4/example/solr/collection1/data directory.

Back Up or Restore Search Data

If you have a data backup procedure for your server, you can back up search data by including the /opt/solr-4.10.4/example/solr/collection1/data directory in the backup.

The backing up of search data is not critical, since it can always be recreated from the WordPress database. However, for very large blogs with thousands of posts and attachments, backing up and restoring search data will be much faster than recreating it. Overall, when migrating or merging a blog from another WordPress server, the recommended approach is to recreate the search data.

After a migration or merger, go to the Solr Operations option located at the WPSOLR plugin section of your administration panel and press the Synchronize Wordpress with my Solr index to recreate the search data.

This page was originally published on


Your Feedback Is Important

Let us know if this guide was helpful to you.


Join the conversation.
Read other comments or post your own below. Comments must be respectful, constructive, and relevant to the topic of the guide. Do not post external links or advertisements. Before posting, consider if your comment would be better addressed by contacting our Support team or asking on our Community Site.