Matplotlib default figure size

This post shows how to check the default figure size in Matplotlib and how to change it.

The default value is [6.4, 4.8] inches (width, height) in Matplotlib 2.0 and later; it was [8.0, 6.0] in earlier versions. It can be changed, of course.
To see all the default values, inspect rcParams:

import matplotlib.pyplot as plt
print(plt.rcParams)  # prints all default settings in Matplotlib, including 'figure.figsize'

To change the figure size, pass figsize (width, height, in inches) when creating the figure:


fig, ax = plt.subplots(figsize=(20, 10))

Using Apache Solr with Python

This post shows different ways to use Apache Solr with Python.

======using Pysolr

Below are two small Python snippets that the author of the post used to test writing to and reading from a new Solr server.

The script below attempts to add a document to the Solr server.

# Using Python 2.X
from __future__ import print_function  
import pysolr

# Setup a basic Solr instance. The timeout is optional.
solr = pysolr.Solr('', timeout=10)  # the Solr URL is blank here; fill in the URL of your own Solr core

# How you would index data.
solr.add([
    {
        "id": "doc_1",
        "title": "A very small test document about elmo",
    },
])

The snippet below searches for the document that was added by the snippet above.

# Using Python 2.X
from __future__ import print_function  
import pysolr

# Setup a basic Solr instance. The timeout is optional.
solr = pysolr.Solr('', timeout=10)

results = solr.search('elmo')

print("Saw {0} result(s).".format(len(results)))  


======GitHub repos

pysolr is a lightweight Python wrapper for Apache Solr. It provides an interface that queries the server and returns results based on the query.

Install pysolr using pip:

pip install pysolr

Multicore Index

Simply point the URL to the index core:

# Setup a Solr instance. The timeout is optional.
solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)

SolrClient is a simple Python library for Solr, built in Python 3 with support for the latest features of Solr.

Components of SolrClient




Make a request to REST API using Python

This post shows how to make a request to a REST API using Python.

The requests package is the one most commonly used (see its GitHub repo).

You can try out requests online at Codecademy.

Look at this post for a great tutorial on using Requests with Python to make a request to a REST API: Python API tutorial – An Introduction to using APIs (pdf). It is a very good, comprehensive, and detailed tutorial.

To install Requests, simply:

$ pip install requests

See below for a simple example of making a request to a REST API.

#Python 2.7

import requests
from requests.auth import HTTPDigestAuth
import json

# Replace with the correct URL
url = "http://api_url"

# It is good practice not to hardcode credentials, so ask the user to enter them at runtime
myResponse = requests.get(url, auth=HTTPDigestAuth(raw_input("username: "), raw_input("Password: ")), verify=True)
#print(myResponse.status_code)

# For a successful API call, the response code will be 200 (OK)
if myResponse.ok:

    # Load the response data into a dictionary variable.
    # json.loads accepts only bytes or string input, so use content to fetch the raw body.
    # loads ("load string") converts JSON text into a Python data structure (dictionary or list, depending on the JSON)
    jData = json.loads(myResponse.content)
    #jData = json.loads(myResponse.content, 'utf-8') #use this line if your data contains special characters

    print("The response contains {0} properties".format(len(jData)))
    for key in jData:
        print(key + " : " + str(jData[key]))
else:
    # If the response code is not OK (200), raise the resulting HTTP error with its description
    myResponse.raise_for_status()


======working with JSON data

For example, if data.json file looks like this:


The Python code would look something like this:

import json

with open('data.json') as data_file:    
    data = json.load(data_file)

We can now access single values in the JSON file. See below for some examples:

data["maps"][0]["id"]  # will return 'blabla'
data["masks"]["id"]    # will return 'mask-value'
data["om_points"]      # will return 'value'
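The contents of data.json are not shown above, so the following is only a minimal sketch. It assumes a hypothetical JSON structure chosen to be consistent with the three lookups; the extra entry in "maps" is purely illustrative.

```python
import json

# Hypothetical contents of data.json, chosen only to match the lookups above.
raw = """
{
    "maps": [{"id": "blabla"}, {"id": "another-map"}],
    "masks": {"id": "mask-value"},
    "om_points": "value"
}
"""

# json.loads parses a string; json.load (used above) reads from a file object.
data = json.loads(raw)

first_map_id = data["maps"][0]["id"]   # list index, then dictionary key
mask_id = data["masks"]["id"]          # nested dictionary key
om_points = data["om_points"]          # top-level key

print(first_map_id, mask_id, om_points)
```

JSON arrays become Python lists and JSON objects become dictionaries, which is why the first lookup needs an index and the second does not.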


Lambda, map, filter, and reduce functions in python 3

After the migration from Python 2 to Python 3, the lambda operator and the map() and filter() functions are still part of core Python; only the reduce() function had to go, and it was moved into the functools module.

This post shows how to use the lambda, map, filter, and reduce functions in Python 3 (for the Python 2.7 version, check the references below).

  • Lambda operator

Some people like it, others hate it and many are afraid of the lambda operator.

The lambda operator or lambda function is a way to create small anonymous functions (i.e., functions without a name). These functions are throw-away functions (i.e., they are just needed where they have been created).

Lambda functions are mainly used in combination with the functions filter(), map() and reduce(). The lambda feature was added to Python due to the demand from Lisp programmers. 

The general syntax of a lambda function is quite simple: 

lambda argument_list: expression 

The argument list consists of a comma-separated list of arguments, and the expression is any single expression using these arguments (it does not have to be arithmetic). You can assign the function to a variable and then call it like any other function.
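As a small sketch of what "anonymous" and "throw-away" mean in practice, the snippet below compares a lambda with an equivalent def and then uses a lambda as a one-off sort key. The names add_lambda, add_def, and pairs are made up for this illustration.

```python
# A lambda and a def with the same body behave identically when called.
add_lambda = lambda x, y: x + y

def add_def(x, y):
    return x + y

same_result = add_lambda(3, 4) == add_def(3, 4)

# The lambda has no name of its own; Python labels it "<lambda>".
lambda_name = add_lambda.__name__
def_name = add_def.__name__

# Typical throw-away use: a sort key that is needed only at this call site.
pairs = [("pear", 3), ("apple", 1), ("fig", 2)]
by_count = sorted(pairs, key=lambda pair: pair[1])

print(same_result, lambda_name, def_name)
print(by_count)
```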

The following example of a lambda function returns the sum of its two arguments:

>>> sum = lambda x, y : x + y
>>> sum(3,4)
7

The above example might look like a game for a mathematician: a formalization that turns a straightforward operation into an abstract one.

The same effect can be achieved with the following conventional function definition:

>>> def sum(x,y):
...     return x + y
...
>>> sum(3,4)
7

However, once you learn how to use the map() function, the advantages of the lambda operator become apparent.

  • The map function 

The advantage of the lambda operator will be obvious when it is used in combination with the map() function. 

map() is a function which takes two arguments: 

r = map(func, seq)

The first argument func is the name of a function and the second argument seq is a sequence (e.g. a list).

map() applies the function func to all the elements of the sequence seq. Before Python 3, map() returned a list in which each element was the result of func applied to the corresponding element of seq. In Python 3, map() returns an iterator.

The following examples illustrate how the map() function works:

>>> def fahrenheit(T):

...   return ((float(9)/5)*T + 32)

# hit Return/Enter to exit to the >>> in your terminal.

>>> def celsius(T):

...   return (float(5)/9)*(T-32)

# hit Return/Enter to exit to the >>> in your terminal.

>>> temperatures = (36.5, 37, 37.5, 38, 39)

>>> F = map(fahrenheit, temperatures)

>>> print(F)

<map object at 0x106d1c3c8>

>>> temperatures_in_Fahrenheit = list(F)

>>> print(temperatures_in_Fahrenheit) 
[97.7, 98.60000000000001, 99.5, 100.4, 102.2]

>>> C = map(celsius, map(fahrenheit, temperatures))

>>> print(C)

<map object at 0x106d1c438>

>>> temperatures_in_Celsius = list(C)

>>> print(temperatures_in_Celsius)
[36.5, 37.00000000000001, 37.5, 38.00000000000001, 39.0]


In the example above we haven’t used lambda. When using lambda, we do not need to define and name the functions fahrenheit() and celsius(). You can see this in the following interactive session:

>>> C = [39.2, 36.5, 37.3, 38, 37.8]

>>> F = list(map(lambda x: (float(9)/5)*x + 32, C))

>>> print(F)

[102.56, 97.7, 99.14, 100.4, 100.03999999999999]

>>> C = list(map(lambda x: (float(5)/9)*(x-32), F))

>>> print(C)

[39.2, 36.5, 37.300000000000004, 38.00000000000001, 37.8]


map() can be applied to more than one list.

The lists don’t have to have the same length.

map() applies its function to the elements of the argument lists position by position (i.e., first to the elements at index 0, then to the elements at index 1, and so on until the last index is reached). The following example illustrates this:

>>> a = [1, 2, 3, 4]
>>> b = [17, 12, 11, 10]
>>> c = [-1, -4, 5, 9]
>>> list(map(lambda x, y : x+y, a, b))
[18, 14, 14, 14]
>>> list(map(lambda x, y, z : x+y+z, a, b, c))
[17, 10, 19, 23]
>>> list(map(lambda x, y, z : 2.5*x + 2*y - z, a, b, c))
[37.5, 33.0, 24.5, 21.0]

We can see in the example above that the parameter x gets its values from the list a, while y gets its values from b, and z from list c. 

If one list has fewer elements than the others, map() stops as soon as the shortest list is exhausted:

>>> a = [1, 2, 3]
>>> b = [17, 12, 11, 10]
>>> c = [-1, -4, 5, 9]
>>> list(map(lambda x, y, z : 2.5*x + 2*y - z, a, b, c))
[37.5, 33.0, 24.5]
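Two properties of map() described in this section, that it returns a single-use iterator in Python 3 and that it stops at the shortest input, can be checked with a short sketch. The function square and the lists are made up for the illustration.

```python
def square(x):
    return x * x

numbers = [1, 2, 3, 4]

squares = map(square, numbers)   # nothing is computed yet
first_pass = list(squares)       # consumes the iterator
second_pass = list(squares)      # the iterator is now exhausted

# With several iterables, map() stops at the shortest one.
short = [1, 2, 3]
longer = [10, 20, 30, 40]
sums = list(map(lambda x, y: x + y, short, longer))

print(first_pass)
print(second_pass)
print(sums)
```

If the results are needed more than once, it is usually simplest to store list(map(...)) in a variable, as the sessions above do.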


  • The filter function 

filter(function, sequence) 

offers an elegant way to keep only those elements of sequence for which function returns a true value; these elements are produced by the iterator that filter() returns.

In other words: the function filter(f, l) needs a function f as its first argument. f should return a Boolean value, i.e. True or False (any value that Python treats as true or false also works). This function is applied to every element of the list l. An element is produced by the iterator returned by filter() only if f returns a true value for it.

In the following example, we filter out first the odd and then the even elements of the sequence of the first 10 Fibonacci numbers: 

>>> fibonacci = [0,1,1,2,3,5,8,13,21,34]
>>> odd_numbers = list(filter(lambda x: x % 2, fibonacci))
>>> print(odd_numbers)
[1, 1, 3, 5, 13, 21]
>>> even_numbers = list(filter(lambda x: x % 2 == 0, fibonacci))
>>> print(even_numbers)
[0, 2, 8]

  • An example of combining filter() and lambda functions
>>> filter(lambda x: x % 2 == 0, list(range(10,100)))

<filter object at 0x106d1c208>

# This returns all even numbers from 10 to 98 (range(10, 100) does not include 100).
>>> list(filter(lambda x: x % 2 == 0, list(range(10,100))))

[10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]
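For comparison, here is a minimal sketch showing that filter() relies on truthiness rather than strict Booleans, and that the same selection can be written as a list comprehension. The variable names are chosen for this illustration only.

```python
fibonacci = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# x % 2 returns 1 or 0, which filter() treats as true or false.
odd_via_filter = list(filter(lambda x: x % 2, fibonacci))

# The same selection written as a list comprehension.
odd_via_comprehension = [x for x in fibonacci if x % 2]

# Passing None as the function keeps the elements that are themselves true.
non_zero = list(filter(None, fibonacci))

print(odd_via_filter)
print(odd_via_comprehension)
print(non_zero)
```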



  • The reduce() function 

reduce() was dropped from the core of Python in the migration to Python 3 and moved into the functools module.

reduce(func, seq) 

continually applies the function func() to the sequence seq. It returns a single value. 

If seq = [ s1, s2, s3, … , sn ], calling reduce(func, seq) works like this:

  • First, func is applied to the first two elements of seq, i.e. func(s1, s2). The list that reduce() works on now looks like this: [ func(s1, s2), s3, … , sn ]
  • Then, func is applied to the previous result and the third element of the list, i.e. func(func(s1, s2), s3).
    The list now looks like this: [ func(func(s1, s2), s3), … , sn ]
  • These steps are repeated until just one element is left, which is returned as the result of reduce().

If n is equal to 4, the previous explanation amounts to: func(func(func(s1, s2), s3), s4)
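The steps above can be sketched as a hand-written loop. The helper my_reduce and the function show_call are hypothetical names introduced for this illustration; the sketch assumes a non-empty sequence and no initial value, and it is not how functools implements reduce() internally.

```python
from functools import reduce
from itertools import accumulate

def my_reduce(func, seq):
    """Hand-written equivalent of reduce(func, seq) for a non-empty seq."""
    iterator = iter(seq)
    result = next(iterator)          # start with s1
    for item in iterator:
        result = func(result, item)  # func(previous result, next element)
    return result

# A function that records how it was called makes the nesting visible.
show_call = lambda a, b: "func({}, {})".format(a, b)

symbols = ["s1", "s2", "s3", "s4"]
nested = my_reduce(show_call, symbols)
matches_functools = nested == reduce(show_call, symbols)

# accumulate() yields every intermediate result instead of only the last one.
running_totals = list(accumulate([1, 2, 3, 4], lambda x, y: x + y))

print(nested)
print(matches_functools)
print(running_totals)
```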

The following simple example illustrates how reduce() works.

>>> import functools
>>> functools.reduce(lambda x,y: x+y, [47,11,42,13])
113

The intermediate steps of the calculation are 47 + 11 = 58, then 58 + 42 = 100, then 100 + 13 = 113.


See below for some examples of using the reduce() function.

# Get the maximum number from a list using reduce():

>>> from functools import reduce
>>> f = lambda a,b: a if (a > b) else b
>>> reduce(f, [47,11,42,102,13])
102

# Calculate the sum of the numbers from 1 to 100:
>>> from functools import reduce
>>> reduce(lambda x, y: x+y, range(1,101))
5050

It’s very straightforward to change the previous example to calculate the product of the numbers from 1 up to a given number (its factorial). We just need to change the “+” into “*”. For example, range(1,5) gives 1*2*3*4:

>>> reduce(lambda x, y: x*y, range(1,5))
24






Print multiple variables in Python3

This post introduces several ways to print multiple arguments in Python 3.

  • Pass it as a tuple:
print("The cost for %s is %s" % (name, cost))
  • Pass it as a dictionary:
print("The cost for %(n)s is %(c)s" % {'n': name, 'c': cost})
  • Use the new-style string formatting:
print("The cost for {} is {}".format(name, cost))
  • Use the new-style string formatting with numbers (useful for reordering or printing the same one multiple times):
print("The cost for {0} is {1}".format(name, cost))
  • Use the new-style string formatting with explicit names:
print("The cost for {n} is {c}".format(n=name, c=cost))
  • Pass the values as parameters and print will do it:
print("The cost for", name, "is", cost)

If you don’t want spaces to be inserted automatically by print in the above example, change the sep parameter:

print("The cost for ", name, " is ", cost, sep='')
  • Use string concatenation
print("The cost for " + name + " is " + cost)

NOTE: If cost is an int, you should convert it to str:

print("The cost for " + name + " is " + str(cost))
  • Note that the %s mentioned above can be replaced by %d or %f.

Use %d if cost is an integer, %f if it is a float, and %s if it is a string (%s also works for numbers, because it converts any value with str()):

print("The cost for %s is %d" % (name, cost))   # cost is an integer
print("The cost for %s is %f" % (name, cost))   # cost is a float
print("The cost for %s is %s" % (name, cost))   # cost is a string

  • Use f-string formatting (available since Python 3.6):
print(f'The cost for {name} is {cost}')
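As a quick check that the approaches listed above are interchangeable, the sketch below builds the same message with each of them and compares the results. The values of name and cost are hypothetical example data.

```python
name = "coffee"   # hypothetical example values
cost = 3

variants = [
    "The cost for %s is %s" % (name, cost),
    "The cost for %(n)s is %(c)s" % {"n": name, "c": cost},
    "The cost for {} is {}".format(name, cost),
    "The cost for {0} is {1}".format(name, cost),
    "The cost for {n} is {c}".format(n=name, c=cost),
    "The cost for " + name + " is " + str(cost),
    f"The cost for {name} is {cost}",
]

all_equal = len(set(variants)) == 1
print(variants[0], all_equal)

# Format specifiers control how numbers are rendered.
price = 3.14159
two_decimals_percent = "%.2f" % price
two_decimals_fstring = f"{price:.2f}"
print(two_decimals_percent, two_decimals_fstring)
```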

Parallel Programming using MPI in Python

This post introduces parallel programming using MPI in Python.

The library is mpi4py (MPI for Python, i.e. Python bindings for MPI); see here for its code repo on Bitbucket.

Laurent Duchesne provides an excellent step-by-step guide for parallelizing your Python code using multiple processors and MPI. Craig Finch has a more practical example for high throughput MPI on GitHub. See here for more mpi4py examples from Craig Finch.

An example of TensorFlow using MPI can be found here.



Overcoming frustration: Correctly using unicode in python2

>>> string = unicode(raw_input(), 'utf8')
>>> log = open('/var/tmp/debug.log', 'w')
>>> log.write(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

Okay, this is simple enough to solve: Just convert to a byte str and we’re all set:

>>> string = unicode(raw_input(), 'utf8')
>>> string_for_output = string.encode('utf8', 'replace')
>>> log = open('/var/tmp/debug.log', 'w')
>>> log.write(string_for_output)

Deal exclusively with unicode objects as much as possible by decoding things to unicode objects when you first get them and encoding them as necessary on the way out.

If your string is actually a unicode object, you’ll need to encode it to a byte string (e.g. UTF-8) before writing it to a file:

(If you are not sure what type your string is, use type(your_string) to check it; a value displayed as u'…' is a unicode string.)

foo = u'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
f = open('test', 'w')
f.write(foo.encode('utf8'))  # encode the unicode object to UTF-8 bytes before writing
f.close()

When you read that file again, you’ll get a UTF-8-encoded byte string that you can decode to a unicode object:

f = file('test', 'r')
print(f.read().decode('utf8'))  # decode the bytes back to a unicode object
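The snippets in this post are Python 2. For comparison, here is a minimal Python 3 sketch of the same "decode on the way in, encode on the way out" rule. It assumes UTF-8 throughout and writes to a temporary file whose name is made up for the example.

```python
import os
import tempfile

text = "caf\u00e9"  # a str in Python 3 is already Unicode text

# Encode explicitly when bytes are needed ...
encoded = text.encode("utf8")
# ... and decode bytes back to text on the way in.
decoded = encoded.decode("utf8")

# Or let open() do both by stating the encoding.
path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical file name
with open(path, "w", encoding="utf8") as f:
    f.write(text)
with open(path, "r", encoding="utf8") as f:
    read_back = f.read()

# Encoding to ASCII fails for the same reason as in the traceback above.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(encoded, decoded == text, read_back == text, ascii_failed)
```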


Image format conversion and change of image quality and size using Python

The code below reads the tif files in a folder and converts them to jpeg automatically (with the same or reduced quality, and with reduced size).

See here for a pretty good handbook of Python Imaging Library (PIL).

import os
from PIL import Image

current_path = os.getcwd()
for root, dirs, files in os.walk(current_path, topdown=False):
    for name in files:
        print(os.path.join(root, name))
        #if os.path.splitext(os.path.join(root, name))[1].lower() == ".tiff":
        if os.path.splitext(os.path.join(root, name))[1].lower() == ".tif":
            if os.path.isfile(os.path.splitext(os.path.join(root, name))[0] + ".jpg"):
                print("A jpeg file already exists for %s" % name)
            # If a jpeg with the name does *NOT* exist, convert one from the tif.
            else:
                outputfile = os.path.splitext(os.path.join(root, name))[0] + ".jpg"
                try:
                    im = Image.open(os.path.join(root, name))
                    print("Converting jpeg for %s" % name)
                    im.save(outputfile, "JPEG", quality=100)
                except Exception as e:
                    print(e)

The above code converts each tif file to a jpg file with the same or reduced quality (set the quality number to less than 100 to reduce the quality and the file size), but the resulting image keeps the same dimensions (the height and width of the image).

To convert a tif file to a smaller image, use the size attribute and the resize method provided by PIL.

Note that in the resize() method, be sure to use the ANTIALIAS filter (ANTIALIAS is a high-quality downsampling filter) unless speed is much more important than quality. In Pillow 10 and later, Image.ANTIALIAS has been removed; use Image.LANCZOS instead, which is the same filter under its current name. The bilinear and bicubic filters in the version of PIL this was written for are not well suited to large downsampling ratios (e.g. when creating thumbnails).

im = Image.open("my_image.jpg")
size = im.size   # get the size of the input image
ratio = 0.9  # reduce the size to 90% of the input image
reduced_size = int(size[0] * ratio), int(size[1] * ratio)

im_resized = im.resize(reduced_size, Image.ANTIALIAS)
im_resized.save("my_image_resized.jpg", "JPEG")
#im_resized.save("my_image_resized.jpg", "JPEG", quality=100) # uncomment this line if you want to reduce image size without quality loss.
#im_resized.save("my_image_resized.jpg", "JPEG", quality=100, optimize=True) # uncomment this line if you want to optimize the result image.

Code snippet to save with a different dpi:

from PIL import Image
im = Image.open("test.jpg")
im.save("test_600.jpg", dpi=(600,600))

Referenced materials:

The Image Module (resize, size, save)

os.path document

Using Python to Reduce JPEG and PNG Image File Sizes Without Loss of Quality (Jan 3, 2015)

Install gensim and nltk into Virtualenv

This post shows how to install gensim and nltk into a virtualenv. It is a good strategy to install the packages/libraries that you often use together into a separate virtualenv, so that they are not disturbed by other libraries (different libraries may depend on different versions of the same library).

$ sudo apt-get install python-pip python-dev python-virtualenv
  • Create a Virtualenv environment in the directory ~/gensim-venv:

Note that you can set up your virtualenv in any directory you want; just replace ~/gensim-venv with your preferred directory path, and be sure to change it accordingly when you follow the rest of the tutorial.

$ virtualenv --system-site-packages -p python ~/gensim-venv # for Python 2 (the default python interpreter)

$ virtualenv --system-site-packages -p python3 ~/gensim-venv3 # for Python 3; it is better to add "venv3" when naming your virtualenv, so you know it is for Python 3

The --system-site-packages Option

If you build with virtualenv --system-site-packages ENV, your virtual environment will inherit packages from /usr/lib/python2.7/site-packages (or wherever your global site-packages directory is).

This can be used if you have control over the global site-packages directory and you want to depend on the packages there. If you want isolation from the global system, do not use this flag (and note that without the --system-site-packages flag, you need to install everything you need into the virtualenv that you just created).

  • Activate the virtual environment:
$ source ~/gensim-venv/bin/activate  # If using bash
(gensim-venv)$  # Your prompt should change
  • Install gensim in the virtualenv for python:
pip install --upgrade gensim
  • After the install, you need to activate the Virtualenv environment each time you want to use gensim.
  • With the Virtualenv environment activated, you can now test your gensim installation.
  • When you are done using gensim, deactivate the environment.
(gensim-venv)$ deactivate

$  # Your prompt should change back

To use gensim later you will have to activate the Virtualenv environment again:

$ source ~/gensim-venv/bin/activate  # If using bash.

(gensim-venv)$  # Your prompt should change.
# Run Python programs that use gensim.
# When you are done using gensim, deactivate the environment.
(gensim-venv)$ deactivate
  • To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf ~/gensim-venv.)
  • To install nltk into the virtualenv gensim-venv, issue this command after you have activated gensim-venv:
(gensim-venv)$ pip install -U nltk

Note: commands issued inside a virtualenv (i.e., after you have created and activated it) do not need sudo, because they are all confined to the virtualenv folder you created. That is also why they do not affect packages installed system-wide or in other virtualenvs, and why they are not affected by packages installed in other virtualenvs. You can create as many virtualenvs as you want, so give each one a meaningful name; otherwise, over time, you will forget what was installed in which virtualenv.
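For readers on Python 3, a similar environment can be created from Python itself. The sketch below uses the standard-library venv module rather than the virtualenv tool used in this post, so it is an alternative technique, not the same command. The directory is a temporary, hypothetical location, and with_pip=False is chosen only to keep the sketch fast.

```python
import os
import tempfile
import venv

# Hypothetical location; the post uses ~/gensim-venv instead.
env_dir = os.path.join(tempfile.mkdtemp(), "gensim-venv")

# system_site_packages=True plays the role of --system-site-packages.
# with_pip=False keeps the sketch fast; pass with_pip=True for real use.
venv.create(env_dir, system_site_packages=True, with_pip=False)

# The choice is recorded in pyvenv.cfg inside the environment.
with open(os.path.join(env_dir, "pyvenv.cfg")) as cfg:
    settings = cfg.read()

inherits_global = "include-system-site-packages = true" in settings
print(env_dir, inherits_global)
```

Setting system_site_packages=False gives the isolated behaviour described above, in which nothing is inherited from the global site-packages directory.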


Books and Courses and Tutorials: Python

This page provides some useful resources (books and courses) with Python. (See this post for text processing related resources, and this post for a collection of python packages and related resources)

(I have created a page for this topic, so stop editing this post, BUT KEEP this post, because not every book on this page has been moved to the page.)