Install and run IPython and Jupyter Notebook in virtualenv on Ubuntu 16.04 (Desktop and Remote Server)

This post introduces how to install IPython and Jupyter Notebook in virtualenv on Ubuntu 16.04 (both local Desktop and remote server.)

Step 0: install virtualenv and setup virtualenv environment

If you have not installed virtualenv yet, you need to do so before proceed.

Check my post for more details about how to setup python virtual environment and why it is better to install python libraries in Python virtual environment.

  • Install pip and Virtualenv for python 2.x and python 3.x:
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev
  • Create a Virtualenv environment in the directory for python2.x and python 3.x:
#for python 2.x
virtualenv --system-site-packages -p python ~/ipy-jupyter-venv

# for python 3.x 
virtualenv --system-site-packages -p python3 ~/ipy-jupyter-venv3

(Note: To delete a virtual environment, just delete the corresponding folder.  For example, In our cases, it would be rm -rf ipy-jupyter-venv or rm -rf ipy-jupyter-venv3.)

Step 1: Install IPython

Before installing IPython and Jupyter, be sure to activate your python virtual environment first.

# for python 2.x
$ source ~/ipy-jupyter-venv/bin/activate  # If using bash
(ipy-jupyter-venv)$  # Your prompt should change

# for python 3.x
$ source ~/ipy-jupyter-venv3/bin/activate  # If using bash
(ipy-jupyter-venv3)$  # Your prompt should change

Use the following command to install IPython

#for python 2.x

(ipy-jupyter-venv) liping:~$ pip install ipython

#for python 3.x

(ipy-jupyter-venv3) liping:~$ pip3 install ipython

Step 2: Install Jupyter

Use the following command to install Jupyter Notebook

#for python 2.x 

(ipy-jupyter-venv) liping:~$ pip install jupyter

#for python 3.x 

(ipy-jupyter-venv3) liping:~$ pip3 install jupyter

Step 3: Test

#for python 2.x


(ipy-jupyter-venv) liping:~$ which python
/home/liping/ipy-jupyter-venv/bin/python


#for python 3.x
(ipy-jupyter-venv3) liping:~$ which python3

/home/liping/ipy-jupyter-venv3/bin/python3


#for python 2.x
(ipy-jupyter-venv) liping:~$ which ipython

/home/liping/ipy-jupyter-venv/bin/ipython

#for python 3.x
(ipy-jupyter-venv3) liping:~$ which ipython3

/home/liping/ipy-jupyter-venv3/bin/ipython3

#for python 2.x
(ipy-jupyter-venv) liping:~$ which jupyter-notebook

/home/liping/ipy-jupyter-venv/bin/jupyter-notebook

#for python 3.x
(ipy-jupyter-venv3) liping:~$ which jupyter-notebook

/home/liping/ipy-jupyter-venv3/bin/jupyter-notebook

Step 4: Add Kernel

The Jupyter Notebook and other frontends automatically ensure that the IPython kernel is available. However, if you want to use a kernel with a different version of Python, or in a virtualenv or conda environment, you’ll need to install that manually. 

We are using virutalenv, so we need to install IPython kernel in the virtualenv we created in Step 0 above.

(ipy-jupyter-venv) liping:~$  python -m ipykernel install --user --name myipy_jupter_env --display-name "ipy-jupyter-venv"

Installed kernelspec myipy_jupter_env in /home/liping/.local/share/jupyter/kernels/myipy_jupter_env

(ipy-jupyter-venv3) liping:~$  python3 -m ipykernel install --user --name myipy_jupter_env3 --display-name "ipy-jupyter-venv3"

Installed kernelspec myipy_jupter_env3 in /home/liping/.local/share/jupyter/kernels/myipy_jupter_env3

Step 5: Run Jupyter Notebook

#for python2.x

 (ipy-jupyter-venv) liping:~$ jupyter-notebook

#for python 3.x

 (ipy-jupyter-venv3) liping:~$ jupyter-notebook

If you are running Jupyter Notebook on a local Linux computer (not on a remote server), you can simply navigate to localhost:8888 to connect to Jupyter Notebook. If you are running Jupyter Notebook on a remote server, you will need to connect to the server using SSH tunneling as outlined in the Step 5-2 below.

At this point, you can keep the SSH connection open and keep Jupyter Notebook running or can exit the app and re-run it once you set up SSH tunneling. Let’s keep it simple and stop the Jupyter Notebook process. We will run it again once we have SSH tunneling working. To stop the Jupyter Notebook process, press CTRL+C, type Y, and hit ENTER to confirm. The following will be displayed:

[C 08:08:04.232 NotebookApp] Shutdown confirmed

[I 08:08:04.232 NotebookApp] Shutting down 0 kernels

Step 5-2: Connecting to a remote Server Using SSH Tunneling

This step is only for those who are connecting to a Jupyter Notebook installed on a remote server.

Now we will learn how to connect to the Jupyter Notebook web interface using SSH tunneling. Since Jupyter Notebook is running on a specific port on the remote server (such as :8888:8889 etc.), SSH tunneling enables you to connect to the remote server’s port securely.

Below I will describe how to create an SSH tunnel from  a Mac or Linux (Windows users can check step 4 introduced at here). Note the following instructions in this step refer to your local computer (not the remote server).

SSH Tunneling with a Mac or Linux

Open a new terminal window on your Mac or Linux.

Issue the following ssh command to start SSH tunneling:

$ ssh -L 8000:localhost:8888 your_server_username@your_server_ip

Note: The ssh command opens an SSH connection, but -L specifies that the given port on the local (client) host is to be forwarded to the given host and port on the remote server. This means that whatever is running on the second port number (i.e. 8888) on the remote server will appear on the first port number (i.e. 8000) on your local computer. You should change 8888 to the port which Jupyter Notebook is running on. Optionally change port 8000 to one of your choosing (for example, if 8000 is used by another process). Use a port greater or equal to 8000 (ie 80018002, etc.) to avoid using a port already in use by another process. 

If no error shows up after running the ssh -L command, now be sure to activate your python virtual environment first.

# for python 2.x 

$ source ~/ipy-jupyter-venv/bin/activate  # If using bash (ipy-jupyter-venv)$  # Your prompt should change 

# for python 3.x 

$ source ~/ipy-jupyter-venv3/bin/activate  # If using bash (ipy-jupyter-venv3)$  # Your prompt should change

you can run Jupyter Notebook by issuing the following command:

#for python2.x  
(ipy-jupyter-venv) liping:~$ jupyter-notebook 

#for python 3.x  
(ipy-jupyter-venv3) liping:~$ jupyter-notebook

Now, from a web browser on your local machine, open the Jupyter Notebook web interface with http://localhost:8000 (or whatever port number you chose above when you ssh -L into your remote server).

Note: see the following instructions for how to change the startup folder of your Jupyter notebook.

  • In your terminal window
  • Enter the startup folder by typing cd /some_folder_name.
  • Type jupyter notebook to launch the Jupyter Notebook App (it will appear in a new browser window or tab).

Step 6 — Using Jupyter Notebook

By this point you should have Jupyter Notebook running, and you should be connected to it using a web browser. Jupyter Notebook is very powerful and has many features. Below I will outline a few of the basic features to get you started using the notebook. Automatically, Jupyter Notebook will show all of the files and folders in the directory it is run from.

To create a new notebook file, select New > Python 3 or New > ipy-jupyter-venv3 from the top right pull-down menu (Note: this is the so called kernel we installed in Step 4 above):

This will open a notebook. We can now run Python code in the cell or change the cell to markdown. For example, change the first cell to accept Markdown by clicking Cell > Cell Type > Markdown from the top navigation bar. We can now write notes using Markdown and even include equations written in LaTeX by putting them between the $$ symbols. For example, type the following into the cell after changing it to markdown:

# Simple Equation

Let us now implement the following equation:
$$ y = x^2$$

where $x = 2$

To turn the markdown into rich text, press CTRL+ENTER, and the following should be the results:

Note: You can use the markdown cells to make notes and document your code.

Now Let’s implement that simple equation and print the result. Select Insert > Insert Cell Below to insert and cell and enter the following code:

#for python 2.x
x = 2
y = x*x
print y

#for pyton 3.x
x = 2
y = x*x
print (y)

To run the code, press CTRL+ENTER. The following should be the results:

You now can include python libraries and use the notebook as you would with any other Python development environment! 

You should be now able to write reproducible Python code and notes using markdown using Jupyter notebook running on a remote server. To get a quick tour of Jupyter notebook, select Help > User Interface Tour from the top navigation menu.

Happy learning and coding!

Step 7: Deactivate your virtualenv

Each time you would like to use iPython and Jupyter, you need to activate the virtual environment into which it installed, and when you are done using iPython and Jupyter, deactivate the environment.

# for python 2
(ipy-jupyter-venv)$ deactivate
$  # Your prompt should change back

#for python 3
(ipy-jupyter-venv3)$ deactivate
$  # Your prompt should change back

Note: To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf ipy-jupyter-venv or rm -rf ipy-jupyter-venv3.)

References:

Intalling Jupyter in a virtualenv (pdf)

Running iPython cleanly inside a virtualenv (pdf)

Using a virtualenv in an IPython notebook

Installing the IPython kernel

How To Set Up a Jupyter Notebook to Run IPython on Ubuntu 16.04 (pdf)

 Running the Jupyter Notebook (Change Jupyter Notebook startup folder (OS X))

 

 

 

Save figure as a pdf file in Python

This post introduces how to save a figure as a pdf file in Python using Matplotlib.

When using savefig, the file format can be specified by the extension:

savefig('foo.png')
savefig('foo.pdf')

The first line of code above will give a rasterized  output and the second will give a vectorized output.

In addition, you’ll find that pylab leaves a generous, often undesirable, whitespace around the image. You can remove it with:

savefig('foo.png', bbox_inches='tight')

You can also use  figure to set dpi of the figure:

import numpy as np
import matplotlib.pyplot as plt

fig  = plt.figure(figsize=(1,4),facecolor = 'red', dpi=100)
plt.savefig('test.png', dpi=100)

plt.show(fig)

 

 

Using Apache Solr with Python

This post provides the instructions to use Apache Solr with Python in different ways.

======using Pysolr

Below are two small python snippets that the author of the post used for testing writing to and reading from a new SOLR server.

The script below will attempt to add a document to the SOLR server.

# Using Python 2.X
from __future__ import print_function  
import pysolr

# Setup a basic Solr instance. The timeout is optional.
solr = pysolr.Solr('http://some-solr-server.com:8080/solr/', timeout=10)

# How you would index data.
solr.add([  
    {
        "id": "doc_1",
        "title": "A very small test document about elmo",
    }
])

The snippet below will attempt to search for the document that was just added from the snippet above.

# Using Python 2.X
from __future__ import print_function  
import pysolr

# Setup a basic Solr instance. The timeout is optional.
solr = pysolr.Solr('http://some-solr-server.com:8080/solr/', timeout=10)

results = solr.search('elmo')

print("Saw {0} result(s).".format(len(results)))  

 

======GitHub repos

pysolr is a lightweight Python wrapper for Apache Solr. It provides an interface that queries the server and returns results based on the query.

install Pysolr using pip

pip install pysolr

Multicore Index

Simply point the URL to the index core:

# Setup a Solr instance. The timeout is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)

SolrClient is a simple python library for Solr; built in python3 with support for latest features of Solr.

Components of SolrClient

 

References:

 

Make a request to REST API using Python

This post introduces how to make a request to REST API using Python.

requests package is the commonly used one (its GitHub repo).

You can try out requests online here at codecademy, and here at runnable.com

Look at this post for a great tutorial using Requests with Python to make a request to REST API: Python API tutorial – An Introduction to using APIs (pdf) – a very good, comprehensive, and detailed tutorial.

To install Requests, simply:

$ pip install requests

See below for a simple example to make a request to REST API.

#Python 2.7
#RestfulClient.py

import requests
from requests.auth import HTTPDigestAuth
import json

# Replace with the correct URL
url = "http://api_url"

# It is a good practice not to hardcode the credentials. So ask the user to enter credentials at runtime
myResponse = requests.get(url,auth=HTTPDigestAuth(raw_input("username: "), raw_input("Password: ")), verify=True)
#print (myResponse.status_code)

# For successful API call, response code will be 200 (OK)
if(myResponse.ok):

    # Loading the response data into a dictionary variable
    # json.loads takes in only binary or string variables so using content to fetch binary content
    # Loads (Load String) takes a Json file and converts into python data structure (dictionary or list, depending on JSON)
    jData = json.loads(myResponse.content)
    #jData = json.loads(myResponse2.content, 'utf-8') #use this line if your data contains special characters

    print("The response contains {0} properties".format(len(jData)))
    print("\n")
    for key in jData:
        print key + " : " + jData[key]
else:
  # If response code is not ok (200), print the resulting http error code with description
    myResponse.raise_for_status()

 

======working with JSON data

For example, if data.json file looks like this:

{
 "maps":[
         {"id":"blabla","iscategorical":"0"},
         {"id":"blabla","iscategorical":"0"}
        ],
"masks":
         {"id":"mask-value"},
"om_points":"value",
"parameters":
         {"id":"blabla3"}
}

The python code should be something looks like this:

import json

with open('data.json') as data_file:    
    data = json.load(data_file)
print(data)

We can now  access single values in the json file — see below for some examples to get a sense of it:

data["maps"][0]["id"]  # will return 'blabla'
data["masks"]["id"]    # will return 'mask-value'
data["om_points"]      # will return 'value'

References:

Parallel Programming using MPI in Python

This post introduces Parallel Programming using MPI in Python.

The library is mpi4py (MPI and python extensions of MPI), see here for its code repo on bitbucket.

Laurent Duchesne provides an excellent step-by-step guide for parallelizing your Python code using multiple processors and MPI. Craig Finch has a more practical example for high throughput MPI on GitHub. See here for more mpi4py examples from Craig Finch.

An example of TensorFlow using MPI can be found here.

References:

 

Image format conversion and change of image quality and size using Python

The code to read tif files in a folder and convert them into jpeg automatically (with the same or reduced quality, and with reduced size).

See here for a pretty good handbook of Python Imaging Library (PIL).

import os
from PIL import Image

current_path = os.getcwd()
for root, dirs, files in os.walk(current_path, topdown=False):
    for name in files:
        print(os.path.join(root, name))
        #if os.path.splitext(os.path.join(root, name))[1].lower() == ".tiff":
        if os.path.splitext(os.path.join(root, name))[1].lower() == ".tif":
            if os.path.isfile(os.path.splitext(os.path.join(root, name))[0] + ".jpg"):
                print "A jpeg file already exists for %s" % name
            # If a jpeg with the name does *NOT* exist, covert one from the tif.
            else:
                outputfile = os.path.splitext(os.path.join(root, name))[0] + ".jpg"
                try:
                    im = Image.open(os.path.join(root, name))
                    print "Converting jpeg for %s" % name
                    im.thumbnail(im.size)
                    im.save(outputfile, "JPEG", quality=100)
                except Exception, e:
                    print e

The above code will covert tif files to a jpg file with the same or reduced (change quality number less then 100 to reduce the quality) quality (file size), but the resulted image will keep the same size (the height and width of the image).

To covert tif file to a reduce size, use attribute size and method resize  provided by PIL.

Note that in resize() method, be sure to use ANTIALIAS filter (ANTIALIAS  is a high-quality downsampling filter) unless speed is much more important than quality. The bilinear and bicubic filters in the current version of PIL are not well-suited for large downsampling ratios (e.g. when creating thumbnails).

im = Image.open("my_image.jpg")
size =im.size   # get the size of the input image
ratio = 0.9  # reduced the size to 90% of the input image
reduced_size = int(size[0] * ratio), int(size[1] * ratio)     

im_resized = im.resize(reduced_size, Image.ANTIALIAS)
im_resized.save("my_image_resized.jpg", "JPEG")
#im_resized.save(outputfile, "JPEG", quality=100) # uncomment this line if you want to reduce image size without quality loss.
#im_resized.save(outputfile, "JPEG", quality=100, optimize=True) # uncomment this line if you want to optimize the result image.

Code snippet to save as different dpi

from PIL import Image 
im = Image.open("test.jpg") 
im.save("test_600.jpg", dpi=(600,600) )

Referenced materials:

The Image Module (resize, size, save)

os.path document

Using Python to Reduce JPEG and PNG Image File Sizes Without Loss of Quality (Jan 3, 2015)

http://stackoverflow.com/questions/28870504/converting-tiff-to-jpeg-in-python

Install gensim and nltk into Virtualenv

This post introduces how to install gensim and nltk into a virtualenv. It is always a good strategy to install some package(s)/library(s) you often use (together) into a separate virtualenv, so it will not be interrupted by other libraries (because different libraries may depend on different versions of another libraries).

$ sudo apt-get install python-pip python-dev python-virtualenv
  • Create a Virtualenv environment in the directory for python  ~/gensim-venv:

Note that you can setup your virtualenv in any directory you want — just replace the ~/gensim-venv with your preferred directory path. But be sure to change it accordingly when you follow the rest of the tutorial.

$ virtualenv --system-site-packages -p python ~/gensim-venv # for python 

$ virtualenv --system-site-packages -p python3 ~/gensim-venv3 # for python 3, it is better to add "venv3" when naming your virtualenv, so you know it is for python3

The --system-site-packages Option

If you build with virtualenv --system-site-packages ENV, your virtual environment will inherit packages from /usr/lib/python2.7/site-packages(or wherever your global site-packages directory is).

This can be used if you have control over the global site-packages directory, and you want to depend on the packages there. If you want isolation from the global system, do not use this flag (and note that if you do not use the “system-site-packages” flag, you NEED to install the version of the python you need in the virtualenv that you just created)

  • Activate the virtual environment:
$ source ~/gensim-venv/bin/activate  # If using bash
(gensim-venv)$  # Your prompt should change
  • Install gensim in the virtualenv for python:
pip install --upgrade gensim
  • After the install you will activate the Virtualenv environment each time you want to use gensim.
  • With the Virtualenv environment activated, you can now test your gensim installation.
  •  When you are done using gensim, deactivate the environment.
(gensim-venv)$ deactivate

$  # Your prompt should change back

To use gensim later you will have to activate the Virtualenv environment again:

$ source ~/gensim-venv/bin/activate  # If using bash.

(gensim-venv)$  # Your prompt should change.
# Run Python programs that use gensim.
...
# When you are done using gensim, deactivate the environment.
(gensim-venv)$ deactivate
  • To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf gensim-venv.)
  • To install nltk into the virtualenv gensim-venv, just issue this command after you activated the gensim-venv
(gensim-venv)$ pip install -U nltk

Note: you may notice that those commands issued inside a virtualenv (i.e., after you create a virtualenv and activated it), you do not need to add sudo before the commands. Because all those commands will all be confined within that virtualenv folder you created – that is why  it will not affect other packages installed in other virtualenv or system-wide, and it will not be affected by other packages installed in another virtualenv(s). You can create as many as virtualenv you want, so please naming your virtualenv with meaning name, otherwise, over time, we will forget what was actually being installed in some virtualenv:)

 

Books and Courses and Tutorials: Python

This page provides some useful resources (books and courses) with Python. (See this post for text processing related resources, and this post for a collection of python packages and related resources)

(I have created a page for this topic, so stop editing this post, BUT KEEP this post, because not every book on this page has been moved to the page.)

A collection of useful Python packages and related resources

This page provides some useful Python packages and related resources. (See this post for how to install them on Ubuntu, and this post for books and courses resources about using Python. This post for text processing related resources. See this post for python libraries for NLP.)

Integrate the contents in this post to here, and then delete that post.

(Stay tuned, the list is growing over time)

Check this link (pdf) and organize the libraries to this post and to the post this post for NLP.

NumPy is a Python package which provides an array data structure that is widely used in the Python community to represent a two-dimensional table of data. Such a structure is useful in representing a text corpus as a table of document-term frequencies.

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures.

 

References:

Read file from line 2 or skip header row in Python

This post introduces how to read file from line 2 in Python.

  • method 1:
with open(fname) as f:
  next(f)
  for line in f:
    #do something

Note: If you need the header later, instead of next(f) use f.readline() and store it as a variable.
Or use header_line = next(f).

  • method 2
f = open(fname,'r')
lines = f.readlines()[1:]
f.close()

This will skip 1 line. for example, [‘a’, ‘b’, ‘c’][1:] => [‘b’, ‘c’]