Image format conversion and change of image quality and size using Python

The code below reads the tif files in a folder (recursively) and converts them into jpeg automatically (with the same or reduced quality, and optionally with reduced size).

See here for a pretty good handbook of Python Imaging Library (PIL).

import os
from PIL import Image

current_path = os.getcwd()
for root, dirs, files in os.walk(current_path, topdown=False):
    for name in files:
        print(os.path.join(root, name))
        #if os.path.splitext(os.path.join(root, name))[1].lower() == ".tiff":
        if os.path.splitext(os.path.join(root, name))[1].lower() == ".tif":
            if os.path.isfile(os.path.splitext(os.path.join(root, name))[0] + ".jpg"):
                print("A jpeg file already exists for %s" % name)
            # If a jpeg with the same name does *NOT* exist, convert one from the tif.
            else:
                outputfile = os.path.splitext(os.path.join(root, name))[0] + ".jpg"
                try:
                    im =, name))
                    print("Converting jpeg for %s" % name)
          , "JPEG", quality=100)
                except Exception as e:
                    print(e)

The above code will convert tif files to jpg files with the same or reduced quality (change the quality number to less than 100 to reduce the quality, and thus the file size); the resulting image keeps the same dimensions (the height and width of the image).

To convert a tif file to a reduced size, use the size attribute and the resize() method provided by PIL.

Note that in the resize() method, be sure to use the ANTIALIAS filter (ANTIALIAS is a high-quality downsampling filter) unless speed is much more important than quality. The bilinear and bicubic filters in the current version of PIL are not well-suited for large downsampling ratios (e.g. when creating thumbnails).

im ="my_image.jpg")
size =im.size   # get the size of the input image
ratio = 0.9  # reduced the size to 90% of the input image
reduced_size = int(size[0] * ratio), int(size[1] * ratio)     

im_resized = im.resize(reduced_size, Image.ANTIALIAS)"my_image_resized.jpg", "JPEG"), "JPEG", quality=100) # uncomment this line if you want to reduce image size without quality loss., "JPEG", quality=100, optimize=True) # uncomment this line if you want to optimize the result image.

Code snippet to save as different dpi

from PIL import Image 
im ="test.jpg")"test_600.jpg", dpi=(600,600) )

Referenced materials:

The Image Module (resize, size, save)

os.path document

Using Python to Reduce JPEG and PNG Image File Sizes Without Loss of Quality (Jan 3, 2015)

Install gensim and nltk into Virtualenv

This post introduces how to install gensim and nltk into a virtualenv. It is always a good strategy to install the package(s)/library(s) you often use together into a separate virtualenv, so they will not be disrupted by other libraries (different libraries may depend on different versions of the same library).

$ sudo apt-get install python-pip python-dev python-virtualenv
  • Create a virtualenv environment for Python in the directory ~/gensim-venv:

Note that you can set up your virtualenv in any directory you want; just replace ~/gensim-venv with your preferred directory path. But be sure to change it accordingly when you follow the rest of the tutorial.

$ virtualenv --system-site-packages -p python ~/gensim-venv # for python 

$ virtualenv --system-site-packages -p python3 ~/gensim-venv3 # for python 3, it is better to add "venv3" when naming your virtualenv, so you know it is for python3

The --system-site-packages Option

If you build with virtualenv --system-site-packages ENV, your virtual environment will inherit packages from /usr/lib/python2.7/site-packages (or wherever your global site-packages directory is).

This can be used if you have control over the global site-packages directory and you want to depend on the packages there. If you want isolation from the global system, do not use this flag (and note that if you do not use the --system-site-packages flag, you NEED to install every package you need inside the virtualenv you just created).
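An easy way to check whether you are actually running inside a virtualenv is to compare sys.prefix with the base interpreter's prefix from Python itself (a sanity-check sketch; on Python 3 virtual environments the two prefixes differ, while older py2 virtualenvs set sys.real_prefix instead):

```python
import sys

print(sys.prefix)        # inside a virtualenv this points into the venv folder
print(sys.base_prefix)   # the underlying system interpreter (Python 3)

# If the two differ, you are running inside a virtual environment.
in_venv = sys.prefix != sys.base_prefix
print(in_venv)
```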

  • Activate the virtual environment:
$ source ~/gensim-venv/bin/activate  # If using bash
(gensim-venv)$  # Your prompt should change
  • Install gensim in the virtualenv for python:
pip install --upgrade gensim
  • After the install you will activate the Virtualenv environment each time you want to use gensim.
  • With the Virtualenv environment activated, you can now test your gensim installation.
  •  When you are done using gensim, deactivate the environment.
(gensim-venv)$ deactivate

$  # Your prompt should change back

To use gensim later you will have to activate the Virtualenv environment again:

$ source ~/gensim-venv/bin/activate  # If using bash.

(gensim-venv)$  # Your prompt should change.
# Run Python programs that use gensim.
# When you are done using gensim, deactivate the environment.
(gensim-venv)$ deactivate
  • To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf gensim-venv.)
  • To install nltk into the virtualenv gensim-venv, just issue this command after you activated the gensim-venv
(gensim-venv)$ pip install -U nltk

Note: you may notice that for commands issued inside a virtualenv (i.e., after you create and activate it), you do not need to prepend sudo. All those commands are confined to the virtualenv folder you created; that is why they will not affect packages installed system-wide or in other virtualenvs, and will not be affected by packages installed in other virtualenvs. You can create as many virtualenvs as you want, so please give each one a meaningful name; otherwise, over time, you will forget what was actually installed in each virtualenv :)


TensorFlow: raise ValueError(“GraphDef cannot be larger than 2GB.”)

While I was using TensorFlow’s imageNet trained model to extract the last pooling layer’s features as representation vectors for a new dataset of images, it worked just fine for around 21 – 22 images but then crashed with the following error:

File ".../lib/python2.7/site-packages/tensorflow/python/framework/", line 2152, in _as_graph_def
     raise ValueError("GraphDef cannot be larger than 2GB.")
 ValueError: GraphDef cannot be larger than 2GB.

The cause of the error: each call to run_inference_on_image() adds nodes to the same TensorFlow graph, which eventually exceeds the maximum size (i.e., 2 GB).

The efficient solution:

Modify run_inference_on_image() to run on multiple images: create the session once and call inside a for loop that iterates over your image files. This way, we no longer reconstruct the entire model on each call, which makes processing each image much faster.

See the snippet below for some hints:

Rewrite the function run_inference_on_image()

I renamed it to run_inference_on_multiple_images(path_to_your_image_files), which takes one parameter.

In the main function:

directory = os.path.dirname(os.getcwd()) + "/path-to-images/"

In the run_inference_on_multiple_images function

def run_inference_on_multiple_images(path_to_image_files):
  # Build the graph and create the session once, then reuse them for every image.
  with tf.Session() as sess:
    for filename in os.listdir(path_to_image_files):
      if filename.endswith(".jpg"):
        image = path_to_image_files + filename
        # ... run on this image here ...



Books and Courses and Tutorials: Python

This page provides some useful resources (books and courses) with Python. (See this post for text processing related resources, and this post for a collection of python packages and related resources)


A collection of useful Python packages and related resources

This page provides some useful Python packages and related resources. (See this post for how to install them on Ubuntu, and this post for books and courses resources about using Python. This post for text processing related resources. See this post for python libraries for NLP.)

Integrate the contents of this post here, and then delete that post.

(Stay tuned, the list is growing over time)

Check this link (pdf) and organize the libraries into this post and into the post for NLP.

NumPy is a Python package which provides an array data structure that is widely used in the Python community to represent a two-dimensional table of data. Such a structure is useful in representing a text corpus as a table of document-term frequencies.
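For example, a tiny document-term frequency table can be built directly as a NumPy array (the two toy documents below are just for illustration):

```python
import numpy as np

docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
vocab = sorted({w for d in docs for w in d})   # ['cat', 'dog', 'sat', 'the']

# Row i, column j holds how often vocab[j] occurs in document i.
dtm = np.array([[d.count(w) for w in vocab] for d in docs])

print(vocab)
print(dtm)
```

Each row is a document vector, which is the representation that packages like gensim build on.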

xarray (formerly xray) is an open source project and Python package that aims to bring the labeled data power of pandas to the physical sciences, by providing N-dimensional variants of the core pandas data structures.