Change port for Apache Solr from the default port 8983 on Ubuntu 16.04

This post explains how to change the port that Apache Solr runs on in Ubuntu 16.04. (See my post if you have not yet installed Solr on Ubuntu.)

The default port for Solr is 8983, but there are circumstances where you may want to change it: for example, if you wish to experiment with a new release, or if you want your various Sitecore development instances to hit separate instances of Solr. The steps below show how to change the port number on Ubuntu.

Step 1: Use sudo service solr status to check Solr's status and the port it is running on.

yourusername@yourservername:~$ sudo service solr status
[sudo] password for yourusername: 
● solr.service - LSB: Controls Apache Solr as a Service
 Loaded: loaded (/etc/init.d/solr; bad; vendor preset: enabled)
 Active: active (exited) since Sun 2017-04-30 11:08:43 EDT; 1 weeks 0 days ago
 Docs: man:systemd-sysv-generator(8)

Apr 30 11:08:34 yourservername systemd[1]: Starting LSB: Controls Apache Solr as a Service...
Apr 30 11:08:34 yourservername su[2655]: Successful su for solr by root
Apr 30 11:08:34 yourservername su[2655]: + ??? root:solr
Apr 30 11:08:34 yourservername su[2655]: pam_unix(su:session): session opened for user solr by (uid=0)
Apr 30 11:08:42 yourservername solr[2652]: [194B blob data]
Apr 30 11:08:42 yourservername solr[2652]: Started Solr server on port 8983 (pid=2861). Happy searching!
Apr 30 11:08:43 yourservername solr[2652]: [14B blob data]
Apr 30 11:08:43 yourservername systemd[1]: Started LSB: Controls Apache Solr as a Service.
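If you check several machines, or want to use the port in a script, you can pull it out of this output. Below is a minimal Python 3 sketch. It assumes the "Started Solr server on port ..." wording shown above, which comes from Solr's start script and may differ in other versions. The name last_started_port is a hypothetical helper. Keep in mind that service status only shows the most recent log lines, so the line may be missing if Solr has logged more since it started.

```python
import re

# Assumption: the start message matches the wording in the status output
# above, e.g. "Started Solr server on port 8983 (pid=2861). Happy searching!"
_STARTED_RE = re.compile(r"Started Solr server on port (\d+)")


def last_started_port(status_text):
    """Return the port from the most recent 'Started Solr server' line,
    or None if no such line is present (hypothetical helper)."""
    ports = _STARTED_RE.findall(status_text)
    return int(ports[-1]) if ports else None
```

To use it on live output, you could pass it the stdout of subprocess.run(["sudo", "service", "solr", "status"], stdout=subprocess.PIPE, universal_newlines=True). Taking the last match matters because the status output can contain both an older start line and a newer one after a restart.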

Step 2: Use sudo service solr stop to stop Solr before changing its port.

yourusername@yourservername:/opt/solr-6.5.1/bin$ sudo service solr stop
yourusername@yourservername:/opt/solr-6.5.1/bin$ sudo service solr status
● solr.service - LSB: Controls Apache Solr as a Service
   Loaded: loaded (/etc/init.d/solr; bad; vendor preset: enabled)
   Active: inactive (dead) since Sun 2017-05-07 15:40:57 EDT; 17s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 15132 ExecStop=/etc/init.d/solr stop (code=exited, status=0/SUCCESS)

Apr 30 11:08:42 yourservername solr[2652]: Started Solr server on port 8983 (pid=2861). Happy searching!
Apr 30 11:08:43 yourservername solr[2652]: [14B blob data]
Apr 30 11:08:43 yourservername systemd[1]: Started LSB: Controls Apache Solr as a Service.
May 07 15:40:55 yourservername systemd[1]: Stopping LSB: Controls Apache Solr as a Service...
May 07 15:40:55 yourservername su[15135]: Successful su for solr by root
May 07 15:40:55 yourservername su[15135]: + ??? root:solr
May 07 15:40:55 yourservername su[15135]: pam_unix(su:session): session opened for user solr by (uid=0)
May 07 15:40:55 yourservername solr[15132]: Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow
May 07 15:40:57 yourservername solr[15132]: [56B blob data]
May 07 15:40:57 yourservername systemd[1]: Stopped LSB: Controls Apache Solr as a Service.

Step 3: Change the configuration files

Change the port in each of the following files:

  • cd to /opt/solr-6.5.1/server/solr/
# the file path: /opt/solr-6.5.1/server/solr/solr.xml
yourusername@yourservername:/opt/solr-6.5.1/server/solr$ sudo nano solr.xml
# change the port here:  ${jetty.port:8983}
  • cd to /var/
# the file path: /var/solr/data/solr.xml
yourusername@yourservername:/var$ sudo nano solr/data/solr.xml
# change the port here:  ${jetty.port:8983}
  • cd to /etc/default/
# the file path: /etc/default/solr.in.sh
yourusername@yourservername:/etc/default$ sudo nano solr.in.sh
# change the port here:  SOLR_PORT=8983

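Once you save and close solr.in.sh, you can optionally load its settings into your current shell with the command below. This is not required for the service itself, because the Solr init script reads /etc/default/solr.in.sh every time it starts.

yourusername@yourservername:/etc/default$ source solr.in.sh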
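If you run several Solr instances and want to script the edits from Step 3, here is a minimal Python 3 sketch. It assumes the file contents shown above: an active SOLR_PORT=... line in solr.in.sh, and a ${jetty.port:NNNN} default in each solr.xml. The function names are hypothetical and not part of Solr. Only uncommented SOLR_PORT lines are replaced, because solr.in.sh is sourced as a shell script and the last active assignment wins.

```python
import re

# Assumptions (hypothetical helpers, not part of Solr): solr.in.sh contains
# lines like "SOLR_PORT=8983", and solr.xml contains "${jetty.port:8983}".

# Match only uncommented assignments; a "#SOLR_PORT=..." line has no effect.
ACTIVE_PORT_RE = re.compile(r"^SOLR_PORT=.*$", re.MULTILINE)
JETTY_PORT_RE = re.compile(r"\$\{jetty\.port:\d+\}")


def set_solr_in_sh_port(text, port):
    """Return solr.in.sh contents with every active SOLR_PORT line set to port."""
    line = "SOLR_PORT={}".format(port)
    new_text, count = ACTIVE_PORT_RE.subn(line, text)
    if count == 0:
        # No active assignment yet: append one at the end of the file.
        new_text = text.rstrip("\n") + "\n" + line + "\n"
    return new_text


def set_solr_xml_port(text, port):
    """Return solr.xml contents with the ${jetty.port:NNNN} default changed."""
    return JETTY_PORT_RE.sub("${{jetty.port:{}}}".format(port), text)


def update_file(path, updater, port):
    """Rewrite a config file in place (the files above need root to write)."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(updater(text, port))
```

For example, running update_file("/etc/default/solr.in.sh", set_solr_in_sh_port, 8985) and update_file("/var/solr/data/solr.xml", set_solr_xml_port, 8985) as root should make the same changes as the manual edits. Back up the files first.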

Step 4: Start the Solr service again with sudo service solr start. The status output should show Solr running on the new port you set in Step 3.

yourusername@yourservername:/etc/default$ sudo service solr start
yourusername@yourservername:/etc/default$ sudo service solr status
● solr.service - LSB: Controls Apache Solr as a Service
 Loaded: loaded (/etc/init.d/solr; bad; vendor preset: enabled)
 Active: active (exited) since Sun 2017-05-07 16:11:32 EDT; 3s ago
 Docs: man:systemd-sysv-generator(8)
 Process: 16988 ExecStop=/etc/init.d/solr stop (code=exited, status=1/FAILURE)
 Process: 17121 ExecStart=/etc/init.d/solr start (code=exited, status=0/SUCCESS)

May 07 16:11:29 yourservername systemd[1]: Starting LSB: Controls Apache Solr as a Service...
May 07 16:11:29 yourservername su[17125]: Successful su for solr by root
May 07 16:11:29 yourservername su[17125]: + ??? root:solr
May 07 16:11:29 yourservername su[17125]: pam_unix(su:session): session opened for user solr by (uid=0)
May 07 16:11:32 yourservername solr[17121]: [98B blob data]
May 07 16:11:32 yourservername solr[17121]: Started Solr server on port 8985 (pid=17327). Happy searching!
May 07 16:11:32 yourservername solr[17121]: [14B blob data]
May 07 16:11:32 yourservername systemd[1]: Started LSB: Controls Apache Solr as a Service.

You can now follow Step 5: Creating a Solr search collection in my other post to create a Solr search collection on this port.


Install Apache Solr 6 on Ubuntu 16.04

This post is a tutorial for setting up Apache Solr 6 on Ubuntu 16.04, installing Solr as a service that starts automatically when Ubuntu (re)boots.

What is Apache Solr? Apache Solr is an open source, enterprise-class search platform written in Java that lets you create custom search engines that index databases, files, and websites. It is built on Apache Lucene. It can, for example, be used to search across multiple websites and to show recommendations for the searched content. Solr uses an XML (Extensible Markup Language) based query and result format and also supports JSON (JavaScript Object Notation). APIs (application programming interfaces) are available for Python, Ruby, and other languages.

Some other features that Solr provides are:

  • Full-Text Search.
  • Snippet generation and highlighting.
  • Custom Document ordering/ranking.
  • Spell Suggestions.

This tutorial will show you how to install the latest Solr version on Ubuntu 16.04 LTS. The steps will most likely work with later Ubuntu versions as well.

Before Solr 5, Solr did not run on its own; it needed a Java servlet container such as Tomcat or Jetty. Since Solr 5, it runs as a standalone server and no longer needs Tomcat.

Running Solr on Tomcat (No Longer Supported)

Beginning with Solr 5.0, deploying Solr as a WAR in servlet containers like Tomcat is no longer supported.

For information on how to install Solr as a standalone server, please see Installing Solr.

To give an example:

Steps needed to install older Solr versions (deployed as a WAR):

Download and install Tomcat (or some other servlet container)
Set up Tomcat as a service
Download and unpack Solr
Create a SOLR_HOME folder with the correct content
Copy solr.war into tomcat/webapps
Set CATALINA_OPTS="-Dsolr.solr.home=/path/to/home -Dsolr.x.y=z…. GC-flags etc"
service tomcat start

With Solr 6.x, we just need to do:

Download Solr and unpack the install script
solr/bin/install_solr_service.sh solr-6.2.0.tgz  # Install
Tune /etc/default/solr.in.sh to your liking (mem, port, solr-home, Zk etc)
service solr start (or bin/solr start [options])

Your client typically talks to Solr at http://host.name:8983/solr/ as a standalone server, rather than as one of many webapps on port 8080.

Apache Solr 6 requires Java 8 or greater to run.

Solr 6 also brought many scaling improvements.

Now let’s get started with the installation.

 

Step 1: Update your System

Log in to your Ubuntu server as a non-root user with sudo privileges. Use this user to perform all the steps below and to work with Solr later.

Run the following commands to update your system with the latest patches and updates:

$ sudo apt-get update 
$ sudo apt-get upgrade -y   # note: this upgrades the packages on your Ubuntu system; skip it if you do not want to update your system

Step 2: Install Java 

(Apache Solr 6 requires Java 8 or greater to run. If Java 8 or greater is already installed on your machine, skip this step.)

Solr is a Java application, so Java needs to be installed first in order to set up Solr. See my post for detailed Java 8 installation on Ubuntu 16.04.

Check the installed Java version by running the following command:

$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
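If you want a setup script to check the "Java 8 or greater" requirement itself, the version string has to be interpreted with care. Java 8 reports itself as "1.8.0_131", while Java 9 and later report "9", "11.0.2", and so on. Below is a minimal Python 3 sketch. The function names are hypothetical, and it assumes the output looks like the sample above.

```python
import re


def java_major_version(version_output):
    """Return the Java major version from `java -version` output, or None.

    Handles both the legacy "1.8.0_131" scheme (major version 8) and the
    "9" / "11.0.2" scheme used from Java 9 onwards.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # "1.8.0_131" -> 8
    return first


def meets_solr6_requirement(version_output):
    """Solr 6 requires Java 8 or greater."""
    major = java_major_version(version_output)
    return major is not None and major >= 8
```

Note that java -version writes to standard error rather than standard output. If you capture it with subprocess, read the stderr of subprocess.run(["java", "-version"], stderr=subprocess.PIPE, universal_newlines=True).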

Step 3: (Manually) install Solr 

Solr can be installed on Ubuntu in several ways; in this tutorial, we will install the latest binary package. (If you would like to install from source instead, check out How to install and configure Solr 6 on Ubuntu 16.04.)

Now let’s download the required Solr version from the official site or one of its mirrors.

First, go to the Solr download page and click the link to the latest version.

Choose the download mirror you prefer (in my case, http://apache.cs.utah.edu/lucene/solr/6.5.1), then copy the link to the .tgz file. (Older releases such as 6.5.1 are no longer on the mirrors; they are kept in the Apache archive at https://archive.apache.org/dist/lucene/solr/.)

# If you do not have sudo privileges,
# cd to a folder under your own account instead of /opt,
# and omit "sudo" from the following commands.

cd /opt
sudo wget http://apache.cs.utah.edu/lucene/solr/6.5.1/solr-6.5.1.tgz

Next, extract the Solr service installer script from the downloaded archive:

sudo tar xzf solr-6.5.1.tgz solr-6.5.1/bin/install_solr_service.sh --strip-components=2

Then install Solr as a service using the script:

sudo ./install_solr_service.sh solr-6.5.1.tgz

The output will be similar to the following. (This installation sets Solr up as a service that starts automatically when Ubuntu (re)boots.)

myusername@myserver:/opt$ sudo ./install_solr_service.sh solr-6.5.1.tgz
id: ‘solr’: no such user
Creating new user: solr
Adding system user `solr' (UID 117) ...
Adding new group `solr' (GID 126) ...
Adding new user `solr' (UID 117) with group `solr' ...
Creating home directory `/var/solr' ...

Extracting solr-6.5.1.tgz to /opt

Installing symlink /opt/solr -> /opt/solr-6.5.1 ...

Installing /etc/init.d/solr script ...

Installing /etc/default/solr.in.sh ...

Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
● solr.service - LSB: Controls Apache Solr as a Service
   Loaded: loaded (/etc/init.d/solr; bad; vendor preset: enabled)
   Active: active (exited) since Sun 2017-04-30 11:08:43 EDT; 5s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 2652 ExecStart=/etc/init.d/solr start (code=exited, status=0/SUCCESS)

Apr 30 11:08:34 myserver systemd[1]: Starting LSB: Controls Apache Solr as a Service...
Apr 30 11:08:34 myserver su[2655]: Successful su for solr by root
Apr 30 11:08:34 myserver su[2655]: + ??? root:solr
Apr 30 11:08:34 myserver su[2655]: pam_unix(su:session): session opened for user solr by (uid=0)
Apr 30 11:08:42 myserver solr[2652]: [194B blob data]
Apr 30 11:08:42 myserver solr[2652]: Started Solr server on port 8983 (pid=2861). Happy searching!
Apr 30 11:08:43 myserver solr[2652]: [14B blob data]
Apr 30 11:08:43 myserver systemd[1]: Started LSB: Controls Apache Solr as a Service.
myusername@myserver:/opt$


Step 4: Start / Stop the Solr Service

Use the following command to check the status of the service:

$ sudo service solr status

See below for a sample output:

myusername@myserver:/opt$ sudo service solr status
● solr.service - LSB: Controls Apache Solr as a Service
   Loaded: loaded (/etc/init.d/solr; bad; vendor preset: enabled)
   Active: active (exited) since Sun 2017-04-30 11:08:43 EDT; 13min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 2652 ExecStart=/etc/init.d/solr start (code=exited, status=0/SUCCESS)

Apr 30 11:08:34 myserver systemd[1]: Starting LSB: Controls Apache Solr as a Service...
Apr 30 11:08:34 myserver su[2655]: Successful su for solr by root
Apr 30 11:08:34 myserver su[2655]: + ??? root:solr
Apr 30 11:08:34 myserver su[2655]: pam_unix(su:session): session opened for user solr by (uid=0)
Apr 30 11:08:42 myserver solr[2652]: [194B blob data]
Apr 30 11:08:42 myserver solr[2652]: Started Solr server on port 8983 (pid=2861). Happy searching!
Apr 30 11:08:43 myserver solr[2652]: [14B blob data]
Apr 30 11:08:43 myserver systemd[1]: Started LSB: Controls Apache Solr as a Service.

 

Use the following commands to stop, start, and check the status of the Solr service:

$ sudo service solr stop
$ sudo service solr start
$ sudo service solr status

 

Step 5: Creating a Solr search collection

(Before we create a Solr search collection, check out this post first if you want to change the default port 8983 to another port.)

Solr can host multiple collections. Run the following command, giving the name of your collection (here mysolrcollection) and its configuration:

$ sudo su - solr -c "/opt/solr/bin/solr create -c mysolrcollection -n data_driven_schema_configs"

Sample output:

myusername@myserver:/opt$ sudo su - solr -c "/opt/solr/bin/solr create -c mysolrcollection -n data_driven_schema_configs"
 [sudo] password for myusername:

Copying configuration to new core instance directory:
 /var/solr/data/mysolrcollection

Creating new core 'mysolrcollection' using command:
 http://localhost:8983/solr/admin/cores?action=CREATE&name=mysolrcollection&instanceDir=mysolrcollection

{
 "responseHeader":{
 "status":0,
 "QTime":1422},
 "core":"mysolrcollection"}

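As the "Creating new core ... using command" line shows, bin/solr create ends by calling Solr's CoreAdmin API over HTTP. In this standalone setup it creates a core. If you changed Solr's port, the same URL simply uses the new port. Below is a small Python 3 sketch that rebuilds that URL for any host and port; core_create_url is a hypothetical helper name. Calling the URL on its own is not enough, though. The instanceDir must already contain the core's configuration, which is why bin/solr create copies the configset before calling it.

```python
from urllib.parse import urlencode


def core_create_url(core_name, host="localhost", port=8983):
    """Build a CoreAdmin CREATE URL like the one printed by bin/solr create.

    Assumption: instanceDir has the same name as the core, as in the output
    above, and already contains a conf/ directory.
    """
    params = urlencode([
        ("action", "CREATE"),
        ("name", core_name),
        ("instanceDir", core_name),
    ])
    return "http://{}:{}/solr/admin/cores?{}".format(host, port, params)
```

For instance, core_create_url("mysolrcollection", port=8985) gives the URL for a Solr instance moved to port 8985.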

The new core directory for our first collection has been created. To view the default schema file, go to:

cd /opt/solr/server/solr/configsets/data_driven_schema_configs/conf

There you will find files such as managed-schema (the schema) and solrconfig.xml.

To view the other available configsets, go to:

cd /opt/solr/server/solr/configsets/

 

Step 6: Use the Solr Web Interface (i.e., Access Solr Admin Panel)

By default, Solr runs on port 8983, and you can open the Solr dashboard (the Admin UI) in your web browser at http://your_server_ip:8983/solr. Your firewall must allow connections to this port.

(If you do not know your IP, check my post to find it out.)

For example:

http://192.168.1.100:8983/solr/

Or use your machine’s host name if you have one.

http://example.org:8983/solr/
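If the Admin UI does not load from another machine, first test whether a TCP connection to the port succeeds at all. This separates firewall or port problems from browser problems. Below is a minimal Python 3 sketch; port_is_open is a hypothetical helper name, and the host and port you pass in are your own server's values.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Connection refused, timeouts, and unknown hosts all raise OSError
    subclasses, which are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("192.168.1.100", 8983) returning False suggests that Solr is not listening on that port or that a firewall is blocking it. If you moved Solr to another port, pass that port instead.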

 

Here you can view statistics for the collection created in the previous step. To see its details, click “Core Selector” in the left sidebar and select the “mysolrcollection” collection.

After selecting the “mysolrcollection” collection, click Documents in the left menu. There you can enter data in JSON format that Solr will index and make searchable. For example, copy and paste the following JSON into the Document(s) field:

{
"id": 1,
"name":"John",
"age":30,
"cars":[ "Ford", "BMW", "Fiat" ]
}

Note: You can also add data in other formats, such as CSV.

Click the Submit Document button after adding the data.

Status: success
Response:
{
 "responseHeader": {
 "status": 0,
 "QTime": 758
 }
}
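The Documents screen sends your data to Solr's update handler over HTTP. To load documents from a script instead, here is a minimal standard-library sketch. It assumes Python 3, the mysolrcollection core, and Solr on port 8983 (change the port if you moved Solr). The helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumptions: Python 3, Solr reachable at base_url (e.g.
# http://localhost:8983/solr) and a core such as mysolrcollection.


def build_update_request(base_url, core, docs, commit=True):
    """Build a POST request that sends `docs` (a list of dicts) to the
    core's JSON update handler at <base_url>/<core>/update."""
    url = "{}/{}/update".format(base_url.rstrip("/"), core)
    if commit:
        # Commit immediately so the documents show up in queries.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})


def send(request, timeout=10):
    """Send the request and return Solr's JSON response as a dict."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_update_request("http://localhost:8983/solr", "mysolrcollection", [{"id": "1", "name": "John", "age": 30, "cars": ["Ford", "BMW", "Fiat"]}])) should return a dict with a responseHeader whose status is 0, like the response above.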

Now click Query in the left menu, then click Execute Query.

The results should include the document you just added.
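The Query screen sends a request to the core's /select handler. Below is a minimal standard-library sketch of the same query from a script. It assumes Python 3, the mysolrcollection core, and the default port 8983. The helper names are hypothetical, and the result layout (response, then docs) is that of Solr's JSON response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: Python 3, Solr at base_url (e.g. http://localhost:8983/solr),
# and a core such as mysolrcollection.


def select_url(base_url, core, query="*:*", rows=10):
    """Build a /select URL that asks Solr for JSON results."""
    params = urlencode([("q", query), ("rows", rows), ("wt", "json")])
    return "{}/{}/select?{}".format(base_url.rstrip("/"), core, params)


def run_query(url, timeout=10):
    """Fetch a /select URL and return the list of matching documents."""
    with urlopen(url, timeout=timeout) as response:
        result = json.loads(response.read().decode("utf-8"))
    return result["response"]["docs"]
```

For example, run_query(select_url("http://localhost:8983/solr", "mysolrcollection", "name:John")) should return the document added above.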

Conclusion

After successfully installing Solr on Ubuntu, you can now insert and query data with the Solr API and the web interface.

You can write code to add a large set of documents into Solr. See my post for using Solr with Python. See this post for some useful Solr resources I collected.

 

References:

Install Tomcat & Solr (You can’t avoid this one): this is for Solr before version 5; since Solr 5, Tomcat is not required to install Solr.

Apache Solr Reference Guide: Installing Solr, Running Solr, and Solr Quick Start (pdf; a very good, concise intro, including some basic usage and indexing of XML, JSON, and CSV files)

Configuring a schema.xml for Solr

First, rename /opt/solr/solr/collection1 to an understandable name like apples (use whatever name you’d like). (You can skip this if you installed Solr using apt-get; in that case, run cd /usr/share/solr instead.)

cd /opt/solr/solr
mv collection1 apples
cd apples

Also, if you installed Solr manually, open the file core.properties (nano core.properties) and change the name property to the same new name.

Then, remove the data directory and change the schema.xml:

rm -R data
nano conf/schema.xml

Paste your own schema.xml in here.

 

 

 

Write and run a bash file

This post explains how to write and run a bash file from the terminal on Ubuntu. (If you prefer video tutorials, check here for a video-based post.)

Bash (Bourne Again SHell) is the most common shell installed with Linux distributions and Mac OS.

  • Write a bash file

The most common approach is to write a file whose first line is:

#!/bin/bash

Save the file, then mark it executable with chmod +x file.

When you then click it (or run it from the terminal), the commands will be executed. By convention these files usually have no extension, but you can also make them end in .sh or anything else.

For example,

#!/bin/bash          
echo Hello World

A Simple Bash Example

#!/bin/bash  
echo "This is a shell script"  
ls -lah  
echo "I am done running ls"  
SOMEVAR='text stuff'  
echo "$SOMEVAR"  

  • Run a bash file

Go to the folder where your bash file is located and type:

./yourbashfile.sh

./ simply tells the shell to run the script located in the current directory. (Alternatively, type the full path of yourbashfile.sh.) If it doesn’t work, check whether yourbashfile.sh has execute permission.

You can add execute permission by the following command:

$ chmod +x yourbashfile.sh
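The same cycle (write the file, add execute permission, run it by path) can also be driven from another program. Below is a minimal Python 3 sketch. It assumes /bin/bash exists and that the temporary folder allows executing files. The file and function names are illustrative only, and the script body is a cut-down version of the simple example above.

```python
import os
import stat
import subprocess
import tempfile

# Assumption: /bin/bash is installed (it is on Ubuntu by default).
SCRIPT = """#!/bin/bash
echo "This is a shell script"
SOMEVAR='text stuff'
echo "$SOMEVAR"
"""


def write_executable(path, contents):
    """Write a script and add execute permission, similar to `chmod +x path`."""
    with open(path, "w") as f:
        f.write(contents)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def run_script(path):
    """Run the script by its path (like ./yourbashfile.sh); return its stdout."""
    result = subprocess.run([path], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout


def demo():
    """Write the example script to a temporary folder, then run it."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "yourbashfile.sh")
        write_executable(path, SCRIPT)
        return run_script(path)
```

Running the file by its path works only because of the #!/bin/bash line and the execute bit. Without the execute bit, subprocess raises a PermissionError, which corresponds to the "check execute permission" advice above.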

References:

Why Bother?

Why do you need to learn the command line anyway? Well, let me tell you a story. A few years ago we had a problem where I used to work. There was a shared drive on one of our file servers that kept getting full. I won’t mention that this legacy operating system did not support user quotas; that’s another story. But the server kept getting full and it stopped people from working. One of our software engineers spent the better part of a day writing a C++ program that would look through all the users’ directories and add up the space they were using and make a listing of the results. Since I was forced to use the legacy OS while I was on the job, I installed a Linux-like command line environment for it. When I heard about the problem, I realized I could do all the work this engineer had done with this single line:

du -s * | sort -nr > $HOME/user_space_report.txt

Graphical user interfaces (GUIs) are helpful for many tasks, but they are not good for all tasks. I have long felt that most computers today are not powered by electricity. They instead seem to be powered by the “pumping” motion of the mouse! Computers were supposed to free us from manual labor, but how many times have you performed some task you felt sure the computer should be able to do but you ended up doing the work yourself by tediously working the mouse? Pointing and clicking, pointing and clicking.

I once heard an author say that when you are a child you use a computer by looking at the pictures. When you grow up, you learn to read and write. Welcome to Computer Literacy 101. Now let’s get to work.

Contents

  1. What Is “The Shell”?
  2. Navigation
  3. Looking Around
  4. A Guided Tour
  5. Manipulating Files
  6. Working With Commands
  7. I/O Redirection
  8. Expansion
  9. Permissions
  10. Job Control

Here Is Where The Fun Begins

With the thousands of commands available for the command line user, how can you remember them all? The answer is, you don’t. The real power of the computer is its ability to do the work for you. To get it to do that, we use the power of the shell to automate things. We write shell scripts.

What Are Shell Scripts?

In the simplest terms, a shell script is a file containing a series of commands. The shell reads this file and carries out the commands as though they have been entered directly on the command line.

The shell is somewhat unique, in that it is both a powerful command line interface to the system and a scripting language interpreter. As we will see, most of the things that can be done on the command line can be done in scripts, and most of the things that can be done in scripts can be done on the command line.

We have covered many shell features, but we have focused on those features most often used directly on the command line. The shell also provides a set of features usually (but not always) used when writing programs.

Scripts unlock the power of your Linux machine. So let’s have some fun!

Contents

  1. Writing Your First Script And Getting It To Work
  2. Editing The Scripts You Already Have
  3. Here Scripts
  4. Variables
  5. Command Substitution And Constants
  6. Shell Functions
  7. Some Real Work
  8. Flow Control – Part 1
  9. Stay Out Of Trouble
  10. Keyboard Input And Arithmetic
  11. Flow Control – Part 2
  12. Positional Parameters
  13. Flow Control – Part 3
  14. Errors And Signals And Traps (Oh My!) – Part 1
  15. Errors And Signals And Traps (Oh My!) – Part 2

https://askubuntu.com/questions/223691/how-do-i-create-a-script-file-for-terminal-commands/223698

http://stackoverflow.com/questions/17015449/how-do-i-run-sh-or-bat-files-from-terminal

Ubuntu – Shell script to execute/run (pdf)

wikiHow to Write a Shell Script Using Bash Shell in Ubuntu

How to create & execute a script file

Advanced Bash-Scripting Guide (An in-depth exploration of the art of shell scripting) by Mendel Cooper

 

Install Oracle Java 8 with PPA on Ubuntu 16.04

This post provides instructions for installing Oracle JDK 8 on Ubuntu 16.04. (Note: Do not install JDK 9 yet; JDK 8 is the latest stable version.)

(If you are not sure which JDK to install, OpenJDK or Oracle JDK, check this post for the main differences between them.)

The Oracle Java PPA for Ubuntu is maintained by the WebUpd8 team. Java 8 was released with many new features and security updates; read more about what’s new in Oracle Java 8.

  • Add the Oracle Java PPA, then update your package repository.

Add the webupd8team Java PPA repository to your system, then install Oracle Java 8 with the following commands:

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Note: if running
sudo add-apt-repository ppa:webupd8team/java
fails with the error
sudo: add-apt-repository: command not found
install the package that provides add-apt-repository:
sudo apt-get install software-properties-common
and then rerun the add-apt-repository command.

Note that you can have multiple Java installations on one machine and set one of them as the default. Check out How To Install Java with Apt-Get on Ubuntu 16.04 (April 23, 2016) (pdf), in particular the “Managing Java” section.

  • Verify Installed Java Version

After installing Oracle Java, use the following command to verify which version was installed:

$ java -version 

java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

  • Configure the Java Environment and Set the JAVA_HOME Environment Variable

We also need the Java configuration package, which sets Oracle Java 8 as the default. It may already have been installed along with the Java packages, but it does no harm to run the following command to make sure:

$ sudo apt-get install oracle-java8-set-default

Many programs use the JAVA_HOME environment variable to determine the Java installation location.

Copy the path of your preferred installation, then open the /etc/environment configuration file in nano or your favorite text editor to set the JAVA_HOME environment variable:

sudo nano /etc/environment

At the end of this file, add the following line, replacing the path with the one you copied:

JAVA_HOME=/usr/lib/jvm/java-8-oracle

Save the file and exit nano (press Ctrl+O and then Enter to save, then Ctrl+X to exit).

Use the following command to reload the file:

source /etc/environment

You can now test whether the environment variable has been set by issuing the following command:

echo $JAVA_HOME

This will return the path you just set.
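Programs read JAVA_HOME from their environment, and source /etc/environment only sets the variable in the current shell without exporting it. So programs started from that shell (including the check below) will typically see JAVA_HOME only after you log in again. Below is a small Python 3 sketch that checks not only that JAVA_HOME is set, but that it points at a usable java binary. java_home_status is a hypothetical helper name.

```python
import os


def java_home_status(environ=None):
    """Check that JAVA_HOME is set and contains an executable bin/java.

    Returns a (ok, message) tuple; pass a dict as `environ` to check
    something other than the current process environment.
    """
    env = os.environ if environ is None else environ
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return False, "JAVA_HOME is not set"
    java_bin = os.path.join(java_home, "bin", "java")
    if not (os.path.isfile(java_bin) and os.access(java_bin, os.X_OK)):
        return False, "no executable java at " + java_bin
    return True, java_home
```

With the setting above and a fresh login, java_home_status() should return (True, "/usr/lib/jvm/java-8-oracle").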

  • Conclusion

We have now installed Java 8 on our system and set it as default. We can now install software which runs on Java, such as Tomcat and Solr.

 

References:

How To Install Java with Apt-Get on Ubuntu 16.04 (April 23, 2016)  (pdf)

This is a very good post; it covers the installation of both OpenJDK and Oracle JDK 6/7/8/9.

How to Install Oracle JAVA 8 (JDK/JRE 8u121) on Ubuntu & LinuxMint with PPA (Mar 29, 2017 by Rahul K.)  – pdf

 

Using Apache Solr with Python

This post provides instructions for using Apache Solr with Python in different ways.

======using Pysolr

Below are two small Python snippets that the author of the post used to test writing to and reading from a new Solr server.

The script below adds a document to the Solr server.

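# Works with Python 2.7 and 3.x
from __future__ import print_function
import pysolr

# Set up a basic Solr instance. The URL must point to a specific core
# (here mysolrcollection; use your own host, port, and core name).
# The timeout is optional.
solr = pysolr.Solr('http://localhost:8983/solr/mysolrcollection/', timeout=10)

# How you would index data. commit=True makes the document searchable
# right away; newer pysolr versions do not commit by default.
solr.add([
    {
        "id": "doc_1",
        "title": "A very small test document about elmo",
    }
], commit=True)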

The snippet below will attempt to search for the document that was just added from the snippet above.

# Works with Python 2.7 and 3.x
from __future__ import print_function
import pysolr

# Set up a basic Solr instance pointing at the same core. The timeout is optional.
solr = pysolr.Solr('http://localhost:8983/solr/mysolrcollection/', timeout=10)

results = solr.search('elmo')

print("Saw {0} result(s).".format(len(results)))

 

======GitHub repos

pysolr is a lightweight Python wrapper for Apache Solr. It provides an interface that queries the server and returns results based on the query.

Install pysolr using pip:

pip install pysolr

Multicore Index

Simply point the URL to the index core:

# Setup a Solr instance. The timeout is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)

SolrClient is a simple Python library for Solr, built in Python 3 with support for the latest Solr features.

Components of SolrClient

 


Apache Solr resources

Elasticsearch and Apache Solr are open source search engines, and they are the most widely used search servers. This post provides resources about Apache Solr.

Apache Solr is a fast open-source Java search server.

Solr enables you to easily create search engines that search websites, databases, and files.

Solr (pronounced “solar”) is an open source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is the second-most popular enterprise search engine after Elasticsearch.

Solr runs as a standalone full-text search server. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages. Solr’s external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

An Elasticsearch / Apache Solr index is the equivalent of a SQL table.

An Elasticsearch or Solr server (aka Solr instance, aka Solr engine) can maintain several indexes.

(Elasticsearch index configuration is done with HTTP / JSON commands. No files required. You define types, mappings, analysis with simple commands.)

In Apache Solr, each index is defined by a schema.xml file (it’s not mandatory in Solr 5/6, but recommended in production), and a solrconfig.xml file. The index schema is equivalent to a SQL table schema definition.  (See this post for Solr Schema related resources.)

An index contains several documents, equivalent to SQL table rows. Each document contains fields, equivalent to SQL table columns.

When an index document is inserted/updated/deleted, we say it is “indexed”.

To retrieve documents from an index, Elasticsearch (json) / Apache Solr (xml, json) provide an http API, with a proprietary syntax.

Elasticsearch and Apache Solr are web applications. A client will use their http API to query or store data.

A full-text search engine is built from the ground up to tackle problems that SQL search finds difficult or impossible. The list of those features is huge: multi-language support, dedicated plugins to extend the engine, synonyms, stop words, facets, boosts, …

The core search engine of Elasticsearch and Apache Solr is Apache Lucene. The relationship between Elasticsearch / Apache Solr and Lucene is like the relationship between a car and its engine.

You can access the Solr Admin UI from your browser at http://localhost:8983/solr/ (use the port number you configured during installation).

See below for some useful Solr related resources:

Check out the Unofficial Solr Guide (e.g., Solr 6.5 Features).

Configuring

Integrating Solr

Parallel Programming using MPI in Python

This post introduces Parallel Programming using MPI in Python.

The library is mpi4py (MPI for Python); see here for its code repository on Bitbucket.

Laurent Duchesne provides an excellent step-by-step guide for parallelizing your Python code using multiple processors and MPI. Craig Finch has a more practical example for high throughput MPI on GitHub. See here for more mpi4py examples from Craig Finch.

An example of TensorFlow using MPI can be found here.


OpenJDK or Oracle JDK? What is the main difference?

This post explains what OpenJDK and Oracle JDK are, how they differ, and which one to use on Ubuntu.

Both OpenJDK and Oracle JDK are currently led by Oracle, although OpenJDK is also open to contributions from the community.

OpenJDK is the default version of Java on Ubuntu and is the easiest to install, while Oracle Java 7/8 is Oracle’s own version of Java 7/8.

Which one to use depends entirely on the target platform on which you want to run the JDK. The technical differences are a consequence of each one's goal: OpenJDK is meant to be the reference implementation, open to the community, while Oracle JDK is meant to be a commercial one.

They both have “almost” the same code of the classes in the Java API; but the code for the virtual machine itself is actually different, and when it comes to libraries, OpenJDK tends to use open libraries while Oracle tends to use closed ones.

OpenJDK was reported to work better with a large number of users making few requests, but worse with a small number of users making prolonged requests. This behaviour is undocumented and has only been observed on some J2EE containers.

My conclusion:
I chose to install Oracle JDK, since there were complaints that OpenJDK sometimes runs into bugs. (See this post if you decide to install Oracle Java 8 with PPA on Ubuntu.)

References:

Which Java package should I use: OpenJDK or Oracle JDK?

Performance OracleJDK or OpenJDK (pdf)

OpenJDK – Oracle is better? (pdf)

Is there any advantage of installing OpenJDK instead of Oracle Java Platform, Standard Edition on Ubuntu? (pdf)

Run R scripts from the command line on Ubuntu

Running R scripts from the command line can be a powerful way to:

  • Automate your R scripts
  • Integrate R into production
  • Call R through other tools or systems
Two commands are commonly used:
  1. Rscript (preferred)
  2. R CMD BATCH (the older command)

Rscript is the better way to run R scripts in batch mode, and it comes with R.
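Besides running a script file, Rscript can evaluate a single expression with -e and pass extra command-line arguments to a script, which is handy for automation. The sketch below shows both; greet.r is a hypothetical example script, and commandArgs(trailingOnly = TRUE) is the base R function that returns only the arguments given after the script name.

```shell
# Rscript can evaluate an R expression directly, without a script file:
#   Rscript -e 'print("hello world")'

# It can also pass extra command-line arguments to a script.
# greet.r is a hypothetical example script; commandArgs(trailingOnly = TRUE)
# returns only the arguments given after the script name.
cat > greet.r <<'EOF'
args <- commandArgs(trailingOnly = TRUE)
name <- if (length(args) >= 1) args[1] else "world"
cat(sprintf("hello %s\n", name))
EOF

# Usage:
#   Rscript greet.r          # prints: hello world
#   Rscript greet.r Ubuntu   # prints: hello Ubuntu
```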

The example below shows the difference between Rscript and R CMD BATCH.

Save 

print("hello world")

as an R script file named helloworld.r, and then run it from your terminal.

(Be sure to first cd to the directory where you saved helloworld.r.)

Then type the commands (the lines starting with $ below) into your terminal:

$ Rscript helloworld.r
[1] "hello world"
$ R CMD BATCH helloworld.r
$

We can see that Rscript prints its output directly to the terminal, while R CMD BATCH appears to do nothing. In fact, R CMD BATCH has written its output to a file called helloworld.r.Rout (in the same directory as helloworld.r), and that output includes both the commands and their results, just like in interactive mode, along with some runtime statistics:

> print("hello world")
[1] "hello world"
> 
> 
> proc.time()
   user  system elapsed 
  0.080   0.004   0.113

You can call these directly from the command line or integrate them into a bash script. You can also call these from any job scheduler.
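As a minimal sketch of calling Rscript from a bash script or a job scheduler, the hypothetical run_r_job function below runs one R script, sends everything it prints to a log file, and uses Rscript's exit status (non-zero when the script stops with an error) to report failures. The crontab entry and the nightly_r_job.sh path in the comments are placeholders.

```shell
# run_r_job: hypothetical helper that runs an R script non-interactively and
# reports success or failure based on Rscript's exit status.
run_r_job() {
  local script="$1"
  local log="${script%.r}.log"   # helloworld.r -> helloworld.log
  if Rscript "$script" >"$log" 2>&1; then
    echo "OK: $script (output in $log)"
  else
    echo "FAILED: $script (see $log)" >&2
    return 1
  fi
}

# Example: put a call such as `run_r_job /home/yourusername/helloworld.r` in a
# script (the placeholder nightly_r_job.sh) and schedule it with `crontab -e`;
# this crontab line would run it every day at 02:00:
#   0 2 * * * /home/yourusername/nightly_r_job.sh
```

Relying on the exit status rather than on the log contents keeps the wrapper independent of what the R script happens to print.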

Note that these are R tools. The RStudio IDE does not currently come with tools that enhance or manage Rscript and R CMD BATCH. However, there is a shell built into the IDE, and you could conceivably call these commands from there.

The alternative to using the Linux command line is the source() function inside R. source() also runs a script, but you have to be inside an R session to use it.

References:

How to run R scripts from the command line (Nathan Stephens on January 02, 2017)

Running R batch mode on Linux (pdf)

Rscript man page

Setup R environment on Ubuntu 16.04 (R-Base and RStudio)

This post provides instructions for installing R-Base and RStudio on Ubuntu 16.04.

  • Install R-Base

You can find R-Base in the Software Center, which is the easy way to install it. However, the Software Center versions are often out of date, which can be a pain moving forward when your packages are based on the most current version of R-Base. The easy fix is to download and install R-Base directly from the CRAN servers.

1. Add R repository

First, add a line to the /etc/apt/sources.list file with the command below. Note the “xenial” in the line, which indicates Ubuntu 16.04. If you have a different version, replace it with your release’s codename.

echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
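Running the command above twice appends the same line twice, which can make apt warn about duplicate entries. The hypothetical helper below appends the line only if that exact line is not already present, and its usage comment relies on lsb_release -cs, which prints the release codename (xenial on 16.04), instead of a hard-coded codename. This is a sketch; the optional second parameter exists only so the function can be tried on a scratch file.

```shell
# add_apt_source_line: hypothetical helper that appends a line to an APT sources
# file only if that exact line is not already there.
add_apt_source_line() {
  local line="$1"
  local file="${2:-/etc/apt/sources.list}"
  # -x: match the whole line, -F: treat the line as a fixed string, not a regex
  if grep -qxF "$line" "$file" 2>/dev/null; then
    echo "already present in $file"
  else
    # sudo applies to tee, which performs the privileged write
    echo "$line" | sudo tee -a "$file" >/dev/null
    echo "added to $file"
  fi
}

# Usage (lsb_release -cs prints the codename, e.g. "xenial" on Ubuntu 16.04):
#   add_apt_source_line "deb http://cran.rstudio.com/bin/linux/ubuntu $(lsb_release -cs)/"
```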

2. Add R to Ubuntu Keyring

First:

 gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9

Then:

 gpg -a --export E084DAB9 | sudo apt-key add -

3. Install R-Base

sudo apt-get update
sudo apt-get install r-base r-base-dev

If you would like to use R in an IDE such as RStudio, see the instructions below.

  • Installing RStudio

Press Ctrl + Alt + T to open a terminal, then run the commands below. To install a newer version, change the link after the wget command. (You can get the latest RStudio download link here. Be sure to replace the version number in the commands, 1.0.136 below, with the version you would like to install; it appears in the wget, gdebi, and rm lines.)

# Download and Install RStudio
sudo apt-get install gdebi-core
wget https://download1.rstudio.org/rstudio-1.0.136-amd64.deb
sudo gdebi rstudio-1.0.136-amd64.deb
rm rstudio-1.0.136-amd64.deb
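To avoid editing the version number in three places, you can keep it in one variable. The helpers below are a hypothetical sketch that assumes every release uses the same file-name pattern as the 1.0.136 link above; other RStudio releases may use a different file name or path, so compare with the actual link on the download page.

```shell
# Hypothetical helpers: the RStudio version lives in one variable, so only one
# value needs to change. The file-name pattern is assumed from the 1.0.136 link
# above and may not match other releases.
RSTUDIO_VERSION="1.0.136"

rstudio_deb_name() {
  echo "rstudio-${1}-amd64.deb"
}

install_rstudio() {
  local deb
  deb="$(rstudio_deb_name "$1")"
  # Each step runs only if the previous one succeeded.
  sudo apt-get install gdebi-core &&
    wget "https://download1.rstudio.org/${deb}" &&
    sudo gdebi "$deb" &&
    rm "$deb"
}

# Usage:
#   install_rstudio "$RSTUDIO_VERSION"
```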

References:

How to Install R on Linux Ubuntu 16.04 Xenial Xerus (April 26, 2016 By Kris Eberwein)

Install R and RStudio on Ubuntu 12.04/14.04/16.04 (Michael Galarnyk on Dec 17, 2016)