Saving IPython/Jupyter notebook as PDF on Ubuntu

If you would like to save your Jupyter notebook as a PDF file on Ubuntu but run into an error complaining that xelatex cannot be found, this post is for you.

The solution:

XeLaTeX is part of the texlive-xetex package.

To install on Ubuntu, run the following command: 

$ sudo apt-get install texlive-xetex

Now you can download your .ipynb file as a PDF!
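If you prefer converting from a script instead of the notebook's download menu, nbconvert can also be driven from Python. Below is a minimal sketch, assuming nbconvert and nbformat are installed and your notebook is named example.ipynb (the file names are placeholders):

import nbformat
from nbconvert import PDFExporter

# Read the notebook (version 4 is the current notebook format)
nb = nbformat.read('example.ipynb', as_version=4)

# Convert to PDF; this calls xelatex under the hood, so texlive-xetex must be installed
pdf_data, resources = PDFExporter().from_notebook_node(nb)

with open('example.pdf', 'wb') as f:
    f.write(pdf_data)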

Import CSV using Pandas to Django models

This post introduces how to import CSV data into Django models using Pandas.

Python has a built-in csv module, but do not use it here; it is not flexible enough for CSV data that mixes string and numeric columns. See the reasons below:

The built-in csv module is very primitive at handling mixed data types: it does all its type conversion at import time and, even then, offers a very restrictive menu of options that will mangle most real-world datasets (inconsistent quoting and escaping; missing or incomplete values in Booleans and factors; mismatched Unicode encodings resulting in phantom quote or escape characters inside fields; incomplete lines raising exceptions). Fixing CSV import is one of the countless benefits of pandas, so the ultimate answer is indeed: stop using the built-in csv import and start using pandas.
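As a quick illustration of the difference, pandas infers a sensible type for each column and keeps track of missing values when it reads a file. A minimal sketch (the file name and the ';' separator match the example further below):

import pandas as pd

df = pd.read_csv('test_csv.txt', sep=';')
print(df.dtypes)        # per-column types inferred at load time (object, int64, float64, ...)
print(df.isna().sum())  # number of missing values detected in each column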

Do not import the data from the CSV file into Django models row by row; that is far too slow.

Django (version 1.4 and later) provides bulk_create as a manager method that takes a list of objects created with the model's class constructor.

See my example code below:

import pandas as pd

df = pd.read_csv('test_csv.txt', sep=';')
# print(df)

row_iter = df.iterrows()

objs = [
    myClass_in_model(
        field_1=row['Name'],
        field_2=row['Description'],
        field_3=row['Notes'],
        field_4=row['Votes'],
    )
    for index, row in row_iter
]

myClass_in_model.objects.bulk_create(objs)

# Note: myClass_in_model is the model class (i.e., the table you want to populate from the CSV) defined in your Django models.py.
# Note: field_1 to field_4 are the fields you defined in that model.
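For reference, a hypothetical models.py that matches the fields used above might look like the sketch below (the field types are assumptions; adjust them to your actual CSV columns):

# models.py -- a hypothetical model matching the bulk_create example above
from django.db import models

class myClass_in_model(models.Model):
    field_1 = models.CharField(max_length=255)  # Name
    field_2 = models.TextField(blank=True)      # Description
    field_3 = models.TextField(blank=True)      # Notes
    field_4 = models.IntegerField(default=0)    # Votes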

 

References:

Import csv data into django models

How to write a Pandas Dataframe to Django model

Django bulk_create function example

Changing strings to Floats in an imported .csv

Install and use htop on Ubuntu 16.04 Desktop and Server

This post introduces an interactive tool for visually monitoring memory and process usage on your Ubuntu 16.04 Desktop or Server in real time.

  • What is Htop?

Htop is an interactive system monitor, process viewer, and process manager. It is designed as an alternative to the Unix program top. It shows a frequently updated list of the processes running on a computer, normally ordered by the amount of CPU usage. Unlike top, htop provides a full list of running processes instead of only the top resource-consuming ones. Htop uses color and gives visual information about processor, swap and memory status.

  • Install Htop on Ubuntu 16.04 LTS Desktop and Server

(This works on both Ubuntu 16.04 Desktop and Server.)

Installing the htop package on Ubuntu 16.04 (Xenial Xerus) is as easy as running the following commands in a terminal:

Step 1. First, make sure that all your system packages are up to date by running the following apt-get command in the terminal.

$ sudo apt-get update

Step 2. Install the htop process-monitoring tool using apt-get:

$ sudo apt-get install htop
  • Use Htop to monitor your Ubuntu 16.04 LTS Desktop and Server in real-time

Now that htop is installed, you can start the program by running the following at a command prompt:

$ htop

This will open the program and show a color-coded, real-time overview of CPU, memory, and swap usage along with the list of running processes.

Leave this terminal open; you can press CTRL + ALT + T to open another terminal for your other work while htop keeps monitoring your memory usage in real time. Enjoy!

Compile and Run C/C++ Programs on Linux

This post provides instructions on how to compile and run C/C++ code on Linux.

Check whether gcc is installed

The following commands will display the installation path and version of gcc compiler.

$ whereis gcc
$ which gcc
$ gcc -v

 

Compile And Run C/C++ Programs In Linux

Write your program in your favorite text editor. Use the extension .c for C programs or .cpp for C++ programs.

Here is a simple “C” program.

$ cat hello.c
#include <stdio.h>
int main()
{
   printf("hello world!\n");
   return 0;
}

To compile the program, run:

$ gcc hello.c -o hello1

For a C++ program (a .cpp file), use the C++ compiler g++ instead:

$ g++ hello.cpp -o hello1

You can also invoke the system's default C compiler via cc:

$ cc hello.c -o hello1

If there are any syntax or semantic errors in your code, they will be displayed on the screen. You need to fix them before proceeding further.

If there are no errors, the compiler will generate an executable file named hello1 in the current working directory (the directory from which you ran the compiler).

Now you can execute the program using the following command:

$ ./hello1

To compile multiple source files (e.g., sourcecode1.c and sourcecode2.c) into a single executable, run:

$ gcc sourcecode1.c sourcecode2.c -o executable

To enable compiler warnings and include debug information in the output:

$ gcc sourcecode.c -Wall -g -Og -o executable

To compile the source code into Assembler instructions:

$ gcc -S sourcecode.c

To compile the source code without linking:

$ gcc -c sourcecode.c

The above command will create an object file called sourcecode.o rather than an executable; it must be linked (e.g., gcc sourcecode.o -o executable) before it can be run.

If your program uses math functions from <math.h>, link against the math library with -lm:

$ gcc sourcecode.c -o executable -lm

For more details, refer to the man pages.

$ man gcc

 

Quotes About Data and Information

This post provides some quotes about data and information.

“We are moving slowly into an era where big data is the starting point, not the end.” – Pearl Zhu, author of the “Digital Master” book series.

 

“We’re entering a new world in which data may be more important than software.” – Tim O’Reilly, founder, O’Reilly Media.

 

“Information is the oil of the 21st century, and analytics is the combustion engine.” – Peter Sondergaard, senior vice president, Gartner Research.

“Getting information off the internet is like taking a drink from a firehose.” – Mitchell Kapor, founder of Lotus Development Corporation and designer of Lotus 1-2-3, co-founder of the Electronic Frontier Foundation.

“Data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.” – Mike Loukides, editor, O’Reilly Media.

 

“A data scientist is someone who can obtain, scrub, explore, model, and interpret data, blending hacking, statistics, and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product.” – Hilary Mason, founder, Fast Forward Labs.

 

“Think analytically, rigorously, and systematically about a business problem and come up with a solution that leverages the available data.” – Michael O’Connell, chief analytics officer, TIBCO.

“Errors using inadequate data are much less than those using no data at all.” – Charles Babbage, mathematician, engineer, inventor, and philosopher.

“Without big data, you are blind and deaf in the middle of a freeway.” – Geoffrey Moore, management consultant and theorist.

“Data are just summaries of thousands of stories – tell a few of those stories to help make the data meaningful.” – Chip and Dan Heath, authors of “Made to Stick” and “Switch.”

“The goal is to turn data into information, and information into insight.” – Carly Fiorina, former executive, president, and chair of Hewlett-Packard Co.

“You can have data without information, but you cannot have information without data.” – Daniel Keys Moran, American computer programmer and science fiction writer.

“When we have all data online it will be great for humanity. It is a prerequisite to solving many problems that humankind faces.” – Robert Cailliau, Belgian informatics engineer and computer scientist who, together with Tim Berners-Lee, developed the World Wide Web.

“Data is a precious thing and will last longer than the systems themselves.” – Tim Berners-Lee, inventor of the World Wide Web.

“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.” – Geoffrey Moore, author and consultant.

“Things get done only if the data we gather can inform and inspire those in a position to make [a] difference.” – Mike Schmoker, former school administrator, English teacher, football coach, and author.

“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” – Jim Barksdale, former Netscape CEO.

“Passion provides purpose, but data drives decisions.” – Andy Dunn.

“You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment.” – Alvin Toffler.

“The world is one big data problem.” – Andrew McAfee, principal research scientist, MIT.

“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” – Aaron Levenstein, business professor at Baruch College.

“Big data will replace the need for 80% of all doctors.” – Vinod Khosla, co-founder of Sun Microsystems and founder of Khosla Ventures.

“Every company has big data in its future, and every company will eventually be in the data business.” – Thomas H. Davenport, American academic and author specializing in analytics, business process innovation, and knowledge management.

“Everything we do in the digital realm—from surfing the web to sending an email to conducting a credit card transaction to, yes, making a phone call—creates a data trail. And if that trail exists, chances are someone is using it—or will be soon enough.” – Douglas Rushkoff, author of “Throwing Rocks at the Google Bus.”

 

“You happily give Facebook terabytes of structured data about yourself, content with the implicit tradeoff that Facebook is going to give you a social service that makes your life better.” – John Battelle, founder, Wired magazine.

 

“It’s so cheap to store all data. It’s cheaper to keep it than to delete it. And that means people will change their behavior because they know anything they say online can be used against them in the future.” – Mikko Hypponen, security and privacy expert.

 

 


Install and run IPython and Jupyter Notebook in virtualenv on Ubuntu 16.04 (Desktop and Remote Server)

This post introduces how to install IPython and Jupyter Notebook in virtualenv on Ubuntu 16.04 (both local Desktop and remote server.)

Step 0: install virtualenv and setup virtualenv environment

If you have not installed virtualenv yet, you need to do so before proceeding.

Check my post for more details about how to set up a Python virtual environment and why it is better to install Python libraries inside one.

  • Install pip and Virtualenv for python 2.x and python 3.x:
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev
  • Create a Virtualenv environment in the directory for python2.x and python 3.x:
#for python 2.x
virtualenv --system-site-packages -p python ~/ipy-jupyter-venv

# for python 3.x 
virtualenv --system-site-packages -p python3 ~/ipy-jupyter-venv3

(Note: To delete a virtual environment, just delete the corresponding folder. In our case, that would be rm -rf ipy-jupyter-venv or rm -rf ipy-jupyter-venv3.)

Step 1: Install IPython

Before installing IPython and Jupyter, be sure to activate your python virtual environment first.

# for python 2.x
$ source ~/ipy-jupyter-venv/bin/activate  # If using bash
(ipy-jupyter-venv)$  # Your prompt should change

# for python 3.x
$ source ~/ipy-jupyter-venv3/bin/activate  # If using bash
(ipy-jupyter-venv3)$  # Your prompt should change

Use the following command to install IPython

#for python 2.x

(ipy-jupyter-venv) liping:~$ pip install ipython

#for python 3.x

(ipy-jupyter-venv3) liping:~$ pip3 install ipython

Step 2: Install Jupyter

Use the following command to install Jupyter Notebook

#for python 2.x 

(ipy-jupyter-venv) liping:~$ pip install jupyter

#for python 3.x 

(ipy-jupyter-venv3) liping:~$ pip3 install jupyter

Step 3: Test

#for python 2.x


(ipy-jupyter-venv) liping:~$ which python
/home/liping/ipy-jupyter-venv/bin/python


#for python 3.x
(ipy-jupyter-venv3) liping:~$ which python3

/home/liping/ipy-jupyter-venv3/bin/python3


#for python 2.x
(ipy-jupyter-venv) liping:~$ which ipython

/home/liping/ipy-jupyter-venv/bin/ipython

#for python 3.x
(ipy-jupyter-venv3) liping:~$ which ipython3

/home/liping/ipy-jupyter-venv3/bin/ipython3

#for python 2.x
(ipy-jupyter-venv) liping:~$ which jupyter-notebook

/home/liping/ipy-jupyter-venv/bin/jupyter-notebook

#for python 3.x
(ipy-jupyter-venv3) liping:~$ which jupyter-notebook

/home/liping/ipy-jupyter-venv3/bin/jupyter-notebook

Step 4: Add Kernel

The Jupyter Notebook and other frontends automatically ensure that the IPython kernel is available. However, if you want to use a kernel with a different version of Python, or in a virtualenv or conda environment, you’ll need to install that manually. 

We are using virtualenv, so we need to install the IPython kernel in the virtualenv we created in Step 0 above.

(ipy-jupyter-venv) liping:~$  python -m ipykernel install --user --name myipy_jupter_env --display-name "ipy-jupyter-venv"

Installed kernelspec myipy_jupter_env in /home/liping/.local/share/jupyter/kernels/myipy_jupter_env

(ipy-jupyter-venv3) liping:~$  python3 -m ipykernel install --user --name myipy_jupter_env3 --display-name "ipy-jupyter-venv3"

Installed kernelspec myipy_jupter_env3 in /home/liping/.local/share/jupyter/kernels/myipy_jupter_env3
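To double-check that the new kernel really uses the interpreter inside your virtualenv, you can run a quick sanity check in a notebook cell once Jupyter is up (a minimal sketch; the path in the comment assumes the python 3.x environment created in Step 0):

import sys

print(sys.executable)  # should point inside the virtualenv, e.g., ~/ipy-jupyter-venv3/bin/python3
print(sys.version)     # the Python version the kernel is running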

Step 5: Run Jupyter Notebook

#for python2.x

 (ipy-jupyter-venv) liping:~$ jupyter-notebook

#for python 3.x

 (ipy-jupyter-venv3) liping:~$ jupyter-notebook

If you are running Jupyter Notebook on a local Linux computer (not on a remote server), you can simply navigate to localhost:8888 to connect to Jupyter Notebook. If you are running Jupyter Notebook on a remote server, you will need to connect to the server using SSH tunneling, as outlined in Step 5-2 below.

At this point, you can keep the SSH connection open and keep Jupyter Notebook running or can exit the app and re-run it once you set up SSH tunneling. Let’s keep it simple and stop the Jupyter Notebook process. We will run it again once we have SSH tunneling working. To stop the Jupyter Notebook process, press CTRL+C, type Y, and hit ENTER to confirm. The following will be displayed:

[C 08:08:04.232 NotebookApp] Shutdown confirmed

[I 08:08:04.232 NotebookApp] Shutting down 0 kernels

Step 5-2: Connecting to a remote Server Using SSH Tunneling

This step is only for those who are connecting to a Jupyter Notebook installed on a remote server.

Now we will learn how to connect to the Jupyter Notebook web interface using SSH tunneling. Since Jupyter Notebook is running on a specific port on the remote server (such as :8888, :8889, etc.), SSH tunneling enables you to connect to that port securely.

Below I will describe how to create an SSH tunnel from a Mac or Linux machine (Windows users can check step 4 introduced here). Note that the following instructions in this step refer to your local computer (not the remote server).

SSH Tunneling with a Mac or Linux

Open a new terminal window on your Mac or Linux.

Issue the following ssh command to start SSH tunneling:

$ ssh -L 8000:localhost:8888 your_server_username@your_server_ip

Note: The ssh command opens an SSH connection, and -L specifies that the given port on the local (client) host is to be forwarded to the given host and port on the remote server. This means that whatever is running on the second port number (i.e., 8888) on the remote server will appear on the first port number (i.e., 8000) on your local computer. Change 8888 to the port Jupyter Notebook is running on, and optionally change 8000 to a port of your choosing (for example, if 8000 is already used by another process). Use a port greater than or equal to 8000 (i.e., 8001, 8002, etc.) to avoid using a port already in use by another process.

If no error shows up after running the ssh -L command, be sure to activate your Python virtual environment first.

# for python 2.x 

$ source ~/ipy-jupyter-venv/bin/activate  # If using bash
(ipy-jupyter-venv)$  # Your prompt should change

# for python 3.x 

$ source ~/ipy-jupyter-venv3/bin/activate  # If using bash
(ipy-jupyter-venv3)$  # Your prompt should change

Then you can run Jupyter Notebook by issuing the following command:

#for python2.x  
(ipy-jupyter-venv) liping:~$ jupyter-notebook 

#for python 3.x  
(ipy-jupyter-venv3) liping:~$ jupyter-notebook

Now, from a web browser on your local machine, open the Jupyter Notebook web interface with http://localhost:8000 (or whatever port number you chose above when you ssh -L into your remote server).

Note: see the following instructions for how to change the startup folder of your Jupyter notebook.

  • In your terminal window:
  • Enter the startup folder by typing cd /some_folder_name.
  • Type jupyter notebook to launch the Jupyter Notebook App (it will appear in a new browser window or tab).

Step 6: Using Jupyter Notebook

By this point you should have Jupyter Notebook running and be connected to it using a web browser. Jupyter Notebook is very powerful and has many features. Below I will outline a few of the basic features to get you started. By default, Jupyter Notebook shows all of the files and folders in the directory it is run from.

To create a new notebook file, select New > Python 3 or New > ipy-jupyter-venv3 from the pull-down menu at the top right (Note: this is the kernel we installed in Step 4 above):

This will open a notebook. We can now run Python code in the cell or change the cell to markdown. For example, change the first cell to accept Markdown by clicking Cell > Cell Type > Markdown from the top navigation bar. We can now write notes using Markdown and even include equations written in LaTeX by putting them between the $$ symbols. For example, type the following into the cell after changing it to markdown:

# Simple Equation

Let us now implement the following equation:
$$ y = x^2$$

where $x = 2$

To render the Markdown as rich text, press CTRL+ENTER; the cell will then display the formatted notes with the typeset equation.

Note: You can use the markdown cells to make notes and document your code.

Now let’s implement that simple equation and print the result. Select Insert > Insert Cell Below to insert a cell and enter the following code:

#for python 2.x
x = 2
y = x*x
print y

#for python 3.x
x = 2
y = x*x
print (y)

To run the code, press CTRL+ENTER; the result (4) will be printed below the cell.

You can now import Python libraries and use the notebook as you would with any other Python development environment!

You should now be able to write reproducible Python code and Markdown notes in a Jupyter notebook running on a remote server. To get a quick tour of Jupyter Notebook, select Help > User Interface Tour from the top navigation menu.

Happy learning and coding!

Step 7: Deactivate your virtualenv

Each time you would like to use IPython and Jupyter, activate the virtual environment into which they were installed, and deactivate it when you are done.

# for python 2
(ipy-jupyter-venv)$ deactivate
$  # Your prompt should change back

#for python 3
(ipy-jupyter-venv3)$ deactivate
$  # Your prompt should change back

Note: To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf ipy-jupyter-venv or rm -rf ipy-jupyter-venv3.)

References:

Installing Jupyter in a virtualenv (pdf)

Running iPython cleanly inside a virtualenv (pdf)

Using a virtualenv in an IPython notebook

Installing the IPython kernel

How To Set Up a Jupyter Notebook to Run IPython on Ubuntu 16.04 (pdf)

Running the Jupyter Notebook (Change Jupyter Notebook startup folder (OS X))

The Data Science Venn Diagram by Drew Conway

What is Data Science?

Data Science is a surprisingly hard definition to nail down, especially given how ubiquitous the term has become.

Vocal critics have variously dismissed the term as a superfluous label (after all, what science doesn’t involve data?), but these critiques miss something important.

Data science is perhaps the best label we have for the cross-disciplinary set of skills that are becoming increasingly important in many applications across industry and academia. This cross-disciplinary piece is the key.

In Jake VanderPlas’s opinion, the best existing definition of data science is illustrated by Drew Conway’s Data Science Venn Diagram, first published on Drew Conway’s blog in September 2010.

Conway’s Venn Diagram captures the essence of what people mean when they say “data science”: it is fundamentally an interdisciplinary subject. Data science comprises three distinct and overlapping areas:

the skills of a statistician who knows how to model and summarize (big) datasets;

the skills of a computer scientist who can design and use algorithms to efficiently store, process, and visualize this data; and

the domain expertise — what we might think of as “classical” training in a subject — necessary both to formulate the right questions and to put their answers in context.

With this in mind, it would be better to think of data science not as a new domain of knowledge to learn, but as a new set of skills that you can apply within your current area of expertise.

(If you want to get started with your data science journey and apply it in your area of expertise, check out this page for some useful resources that I have collected for you.)

 


Printing data in nice format in python

This post introduces how to print data in a nicely formatted way in Python using the built-in module pprint.

The pprint module provides a capability to “pretty-print” arbitrary Python data structures in a well-formatted and more readable way.

>>> from pprint import pprint

>>> my_list = ["1","2","3","4"]

>>> print(my_list)

['1', '2', '3', '4']

>>> pprint(my_list)

['1', '2', '3', '4']

# You may wonder whether pprint is working at all, because its output looks the same as the output of print.

# however, the output is entirely correct and expected. From the pprint module documentation:
# The formatted representation keeps objects on a single line if it can, and breaks them onto multiple lines if they don’t fit within the allowed width.

# You can set the width keyword argument to 1 to force each element to be printed on a separate line:

>>> pprint(my_list, width =1)

['1',

 '2',

 '3',

 '4']

>>> 

Note that if you use import pprint instead of from pprint import pprint, call it as follows:

>>> import pprint 
>>> my_list = ["1","2","3","4"]  
>>> pprint.pprint(my_list)

See below for an example of printing JSON-like data (a Python dictionary).

>>> from pprint import pprint

>>> my_json = { "fruit": "Apple", "size": "Large", "color": "Red" }

>>> print(my_json)

{'color': 'Red', 'fruit': 'Apple', 'size': 'Large'}

>>> pprint(my_json)

{'color': 'Red', 'fruit': 'Apple', 'size': 'Large'}

>>> pprint(my_json, width =1)

{'color': 'Red',

 'fruit': 'Apple',

 'size': 'Large'}

>>> from pprint import pprint

>>> my_json = {'children': [], 'lastName': 'Smith', 'phoneNumbers': [{'number': '212 555-1234', 'type': 'home'}, {'number': '646 555-4567', 'type': 'office'}, {'number': '123 456-7890', 'type': 'mobile'}], 'address': {'city': 'New York', 'postalCode': '10021-3100', 'state': 'NY', 'streetAddress': '21 2nd Street'}, 'firstName': 'John', 'age': 27}

>>> print(my_json)

{'children': [], 'lastName': 'Smith', 'phoneNumbers': [{'number': '212 555-1234', 'type': 'home'}, {'number': '646 555-4567', 'type': 'office'}, {'number': '123 456-7890', 'type': 'mobile'}], 'address': {'city': 'New York', 'streetAddress': '21 2nd Street', 'state': 'NY', 'postalCode': '10021-3100'}, 'firstName': 'John', 'age': 27}

>>> pprint(my_json)

{'address': {'city': 'New York',

             'postalCode': '10021-3100',

             'state': 'NY',

             'streetAddress': '21 2nd Street'},

 'age': 27,

 'children': [],

 'firstName': 'John',

 'lastName': 'Smith',

 'phoneNumbers': [{'number': '212 555-1234', 'type': 'home'},

                  {'number': '646 555-4567', 'type': 'office'},

                  {'number': '123 456-7890', 'type': 'mobile'}]}

>>> 

As you can see, the output is now well formatted and more readable.

All we did was import the pprint function from the pprint module and use pprint() rather than the built-in print function. :)
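If you need the pretty-printed text as a string (for example, to write it to a log file) instead of printing it directly, the pprint module also provides pformat(). A minimal sketch, reusing the my_json dictionary from above (the output file name is just a placeholder):

from pprint import pformat

formatted = pformat(my_json, width=1)  # same formatting rules as pprint(), but returned as a string
with open('my_json.txt', 'w') as f:
    f.write(formatted)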

List Node process and kill specific process if needed

This post provides instructions on how to list running Node.js processes and how to kill a specific process.

  • List node processes:
$ ps -e|grep node
  • If you want to know how many Node.js processes are running, you can use the following command:
$ ps -aef | grep node

It will give a list of Node.js processes along with their project names. This is helpful when you are running multiple Node.js applications and want to kill the process for a specific project.

  • Kill a specific process using the following command:
$ kill -9 XXXX

XXXX is the process ID (PID) of the process to be killed.

  • You can kill all node processes using 
$ pkill node

If none of the kill commands mentioned above work for you, you probably need to check whether you are using another package to run your node process.

For example, if you were running your node process with PM2 (an npm package), the kill [processID] command will indeed stop the process, but PM2 will keep the port that your Node.js app was using occupied. You will need to go into PM2 and dump/delete all node processes to free up the port again.

 

You can use the following command to check all running processes, including node and java apps, etc.

$ ps -ef

You will see a list with a header line like the following:

UID   PID   PPID   C   STIME   TTY   TIME   CMD

Among these columns:

STIME is the time the process started,

TIME is the total CPU time used by the process, and

CMD is the command running (i.e., the name of the process, including arguments, if any).

 

If you only want to list node processes, use the following command:

$ ps -ef | grep node

If you only want to list java processes, use the following command:

$ ps -ef | grep java

You get the idea…

-e and -f are options to the ps command, and pipes  (i.e., | ) take the output of one command and pass it as the input to another. Here is a full breakdown of this command:

ps – list processes

-e – show all processes, not just those belonging to the user

-f – show processes in full format (more detailed than default)

command 1 | command 2 – pass output of command 1 as input to command 2

grep – find lines containing a pattern

processname – the pattern for grep to search for in the output of ps -ef

So, the following command means:

look for lines containing processname in a detailed overview/snapshot of all current processes, and display those lines

ps -ef | grep processname

 

Commonly used commands for Node.js (Ubuntu)

This post provides some commonly used commands for Node.js

  • Start a Node.js application
$ node server.js

Note: After you start your node application, go to your server's host name (e.g., example.com), followed by a colon and the port number. (For example, if it is running on port 8081, type example.com:8081/.) You should then see your node application's web page.

  • Stop a Node.js application
Press CTRL + C in the terminal where the application is running.