Quotes About Data and Information

This post provides some quotes about data and information.

“We are moving slowly into an era where big data is the starting point, not the end.” – Pearl Zhu, author of the “Digital Master” book series.


“We’re entering a new world in which data may be more important than software.” – Tim O’Reilly, founder, O’Reilly Media.


“Information is the oil of the 21st century, and analytics is the combustion engine.” – Peter Sondergaard, senior vice president, Gartner Research.
“Getting information off the internet is like taking a drink from a firehose.” – Mitchell Kapor, founder of Lotus Development Corporation and designer of Lotus 1-2-3, co-founder of the Electronic Frontier Foundation.
Data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.” – Mike Loukides, editor, O’Reilly Media.


“A data scientist is someone who can obtain, scrub, explore, model, and interpret data, blending hacking, statistics, and machine learning. Data scientists not only are adept at working with data, but appreciate data itself as a first-class product.” – Hillary Mason, founder, Fast Forward Labs.


“Think analytically, rigorously, and systematically about a business problem and come up with a solution that leverages the available data.” – Michael O’Connell, chief analytics officer, TIBCO.
“Errors using inadequate data are much less than those using no data at all.” – Charles Babbage, mathematician, engineer, inventor, and philosopher.
 “Without big data, you are blind and deaf in the middle of a freeway.” – Geoffrey Moore, management consultant and theorist.


Data are just summaries of thousands of stories–tell a few of those stories to help make the data meaningful.” – Chip and Dan Heath, authors of “Made to Stick” and “Switch.”
The goal is to turn data into information, and information into insight.” – Carly Fiorina, former executive, president, and chair of Hewlett-Packard Co.
You can have data without information, but you cannot have information without data.” – Daniel Keys Moran, an American computer programmer and science fiction writer.
When we have all data online it will be great for humanity. It is a prerequisite to solving many problems that humankind faces.” – Robert Cailliau, Belgian informatics engineer and computer scientist who, together with Tim Berners-Lee, developed the World Wide Web.
“Data is a precious thing and will last longer than the systems themselves.” Tim Berners-Lee, inventor of the World Wide Web.
“Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway.” Geoffrey Moore, author and consultant.
“Things get done only if the data we gather can inform and inspire those in a position to make [a] difference.” Mike Schmoker, former school administrator, English teacher and football coach, author.
“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” Jim Barksdale, former Netscape CEO
Passion provides purpose, but data drives decisions. 
 -- Andy Dunn

You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment. 

-- Alvin Toffler

“The world is one big data problem.” – Andrew McAfee, principal research scientist, MIT.
Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” – Aaron Levenstein, business professor at Baruch College.
“Big data will replace the need for 80% of all doctors.” – Vinod Khosla, co-founder of Sun Microsystems and founder of Khosla Ventures.
“Every company has big data in its future, and every company will eventually be in the data business.” – Thomas H. Davenport, American academic and author specializing in analytics, business process innovation, and knowledge management.
“Everything we do in the digital realm—from surfing the web to sending an email to conducting a credit card transaction to, yes, making a phone call—creates a data trail. And if that trail exists, chances are someone is using it—or will be soon enough.” – Douglas Rushkoff, author of “Throwing Rocks at the Google Bus.”


“You happily give Facebook terabytes of structured data about yourself, content with the implicit tradeoff that Facebook is going to give you a social service that makes your life better.” – John Battelle, founder, Wired magazine.


“It’s so cheap to store all data. It’s cheaper to keep it than to delete it. And that means people will change their behavior because they know anything they say online can be used against them in the future.”- Mikko Hypponen, security and privacy expert.






Install and run IPython and Jupyter Notebook in virtualenv on Ubuntu 16.04 (Desktop and Remote Server)

This post introduces how to install IPython and Jupyter Notebook in virtualenv on Ubuntu 16.04 (both local Desktop and remote server.)

Step 0: install virtualenv and setup virtualenv environment

If you have not installed virtualenv yet, you need to do so before proceed.

Check my post for more details about how to setup python virtual environment and why it is better to install python libraries in Python virtual environment.

  • Install pip and Virtualenv for python 2.x and python 3.x:
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev
  • Create a Virtualenv environment in the directory for python2.x and python 3.x:
#for python 2.x
virtualenv --system-site-packages -p python ~/ipy-jupyter-venv

# for python 3.x 
virtualenv --system-site-packages -p python3 ~/ipy-jupyter-venv3

(Note: To delete a virtual environment, just delete the corresponding folder.  For example, In our cases, it would be rm -rf ipy-jupyter-venv or rm -rf ipy-jupyter-venv3.)

Step 1: Install IPython

Before installing IPython and Jupyter, be sure to activate your python virtual environment first.

# for python 2.x
$ source ~/ipy-jupyter-venv/bin/activate  # If using bash
(ipy-jupyter-venv)$  # Your prompt should change

# for python 3.x
$ source ~/ipy-jupyter-venv3/bin/activate  # If using bash
(ipy-jupyter-venv3)$  # Your prompt should change

Use the following command to install IPython

#for python 2.x

(ipy-jupyter-venv) liping:~$ pip install ipython

#for python 3.x

(ipy-jupyter-venv3) liping:~$ pip3 install ipython

Step 2: Install Jupyter

Use the following command to install Jupyter Notebook

#for python 2.x 

(ipy-jupyter-venv) liping:~$ pip install jupyter

#for python 3.x 

(ipy-jupyter-venv3) liping:~$ pip3 install jupyter

Step 3: Test

#for python 2.x

(ipy-jupyter-venv) liping:~$ which python

#for python 3.x
(ipy-jupyter-venv3) liping:~$ which python3


#for python 2.x
(ipy-jupyter-venv) liping:~$ which ipython


#for python 3.x
(ipy-jupyter-venv3) liping:~$ which ipython3


#for python 2.x
(ipy-jupyter-venv) liping:~$ which jupyter-notebook


#for python 3.x
(ipy-jupyter-venv3) liping:~$ which jupyter-notebook


Step 4: Add Kernel

The Jupyter Notebook and other frontends automatically ensure that the IPython kernel is available. However, if you want to use a kernel with a different version of Python, or in a virtualenv or conda environment, you’ll need to install that manually. 

We are using virutalenv, so we need to install IPython kernel in the virtualenv we created in Step 0 above.

(ipy-jupyter-venv) liping:~$  python -m ipykernel install --user --name myipy_jupter_env --display-name "ipy-jupyter-venv"

Installed kernelspec myipy_jupter_env in /home/liping/.local/share/jupyter/kernels/myipy_jupter_env

(ipy-jupyter-venv3) liping:~$  python3 -m ipykernel install --user --name myipy_jupter_env3 --display-name "ipy-jupyter-venv3"

Installed kernelspec myipy_jupter_env3 in /home/liping/.local/share/jupyter/kernels/myipy_jupter_env3

Step 5: Run Jupyter Notebook

#for python2.x

 (ipy-jupyter-venv) liping:~$ jupyter-notebook

#for python 3.x

 (ipy-jupyter-venv3) liping:~$ jupyter-notebook

If you are running Jupyter Notebook on a local Linux computer (not on a remote server), you can simply navigate to localhost:8888 to connect to Jupyter Notebook. If you are running Jupyter Notebook on a remote server, you will need to connect to the server using SSH tunneling as outlined in the Step 5-2 below.

At this point, you can keep the SSH connection open and keep Jupyter Notebook running or can exit the app and re-run it once you set up SSH tunneling. Let’s keep it simple and stop the Jupyter Notebook process. We will run it again once we have SSH tunneling working. To stop the Jupyter Notebook process, press CTRL+C, type Y, and hit ENTER to confirm. The following will be displayed:

[C 08:08:04.232 NotebookApp] Shutdown confirmed

[I 08:08:04.232 NotebookApp] Shutting down 0 kernels

Step 5-2: Connecting to a remote Server Using SSH Tunneling

This step is only for those who are connecting to a Jupyter Notebook installed on a remote server.

Now we will learn how to connect to the Jupyter Notebook web interface using SSH tunneling. Since Jupyter Notebook is running on a specific port on the remote server (such as :8888:8889 etc.), SSH tunneling enables you to connect to the remote server’s port securely.

Below I will describe how to create an SSH tunnel from  a Mac or Linux (Windows users can check step 4 introduced at here). Note the following instructions in this step refer to your local computer (not the remote server).

SSH Tunneling with a Mac or Linux

Open a new terminal window on your Mac or Linux.

Issue the following ssh command to start SSH tunneling:

$ ssh -L 8000:localhost:8888 your_server_username@your_server_ip

Note: The ssh command opens an SSH connection, but -L specifies that the given port on the local (client) host is to be forwarded to the given host and port on the remote server. This means that whatever is running on the second port number (i.e. 8888) on the remote server will appear on the first port number (i.e. 8000) on your local computer. You should change 8888 to the port which Jupyter Notebook is running on. Optionally change port 8000 to one of your choosing (for example, if 8000 is used by another process). Use a port greater or equal to 8000 (ie 80018002, etc.) to avoid using a port already in use by another process. 

If no error shows up after running the ssh -L command, now be sure to activate your python virtual environment first.

# for python 2.x 

$ source ~/ipy-jupyter-venv/bin/activate  # If using bash (ipy-jupyter-venv)$  # Your prompt should change 

# for python 3.x 

$ source ~/ipy-jupyter-venv3/bin/activate  # If using bash (ipy-jupyter-venv3)$  # Your prompt should change

you can run Jupyter Notebook by issuing the following command:

#for python2.x  
(ipy-jupyter-venv) liping:~$ jupyter-notebook 

#for python 3.x  
(ipy-jupyter-venv3) liping:~$ jupyter-notebook

Now, from a web browser on your local machine, open the Jupyter Notebook web interface with http://localhost:8000 (or whatever port number you chose above when you ssh -L into your remote server).

Note: see the following instructions for how to change the startup folder of your Jupyter notebook.

  • In your terminal window
  • Enter the startup folder by typing cd /some_folder_name.
  • Type jupyter notebook to launch the Jupyter Notebook App (it will appear in a new browser window or tab).

Step 6 — Using Jupyter Notebook

By this point you should have Jupyter Notebook running, and you should be connected to it using a web browser. Jupyter Notebook is very powerful and has many features. Below I will outline a few of the basic features to get you started using the notebook. Automatically, Jupyter Notebook will show all of the files and folders in the directory it is run from.

To create a new notebook file, select New > Python 3 or New > ipy-jupyter-venv3 from the top right pull-down menu (Note: this is the so called kernel we installed in Step 4 above):

This will open a notebook. We can now run Python code in the cell or change the cell to markdown. For example, change the first cell to accept Markdown by clicking Cell > Cell Type > Markdown from the top navigation bar. We can now write notes using Markdown and even include equations written in LaTeX by putting them between the $$ symbols. For example, type the following into the cell after changing it to markdown:

# Simple Equation

Let us now implement the following equation:
$$ y = x^2$$

where $x = 2$

To turn the markdown into rich text, press CTRL+ENTER, and the following should be the results:

Note: You can use the markdown cells to make notes and document your code.

Now Let’s implement that simple equation and print the result. Select Insert > Insert Cell Below to insert and cell and enter the following code:

#for python 2.x
x = 2
y = x*x
print y

#for pyton 3.x
x = 2
y = x*x
print (y)

To run the code, press CTRL+ENTER. The following should be the results:

You now can include python libraries and use the notebook as you would with any other Python development environment! 

You should be now able to write reproducible Python code and notes using markdown using Jupyter notebook running on a remote server. To get a quick tour of Jupyter notebook, select Help > User Interface Tour from the top navigation menu.

Happy learning and coding!

Step 7: Deactivate your virtualenv

Each time you would like to use iPython and Jupyter, you need to activate the virtual environment into which it installed, and when you are done using iPython and Jupyter, deactivate the environment.

# for python 2
(ipy-jupyter-venv)$ deactivate
$  # Your prompt should change back

#for python 3
(ipy-jupyter-venv3)$ deactivate
$  # Your prompt should change back

Note: To delete a virtual environment, just delete its folder. (In this case, it would be rm -rf ipy-jupyter-venv or rm -rf ipy-jupyter-venv3.)


Intalling Jupyter in a virtualenv (pdf)

Running iPython cleanly inside a virtualenv (pdf)

Using a virtualenv in an IPython notebook

Installing the IPython kernel

How To Set Up a Jupyter Notebook to Run IPython on Ubuntu 16.04 (pdf)

 Running the Jupyter Notebook (Change Jupyter Notebook startup folder (OS X))




The Data Science Venn Diagram by Drew Conway

What is Data Science?

Data Science is a surprisingly hard definition to nail down, especially given the fact that how ubiquitous the term has become.

Vocal critics have variously dismissed the term as a superfluous label (after all, what science doesn’t involve data?)But, these critiques miss something important.

Data science, is perhaps the best label we have for the cross-disciplinary set of skills that are becoming increasingly important in many applications across industry and academia. This cross-disciplinary piece is the key.

In VanderPlas’s opinion, the best existing definition of data science is illustrated by Drew Conway’s Data Science Venn Diagram (see the figure below), first published on Drew Conway’s blog in September 2010.

The Data Science Venn Diagram above captures the essence of what people mean when they say “data science”:

it is fundamentally an interdisciplinary subject. Data science comprises three distinct and overlapping areas:

the skills of a statistician who knows how to model and summarize (big) datasets;

the skills of a computer scientist who can design and use algorithms to efficiently store, process, and visualize this data; and

the domain expertise — what we might think of as “classical” training in a subject — necessary both to formulate the right questions and to put their answers in context.

With this in mind, it would be better to think of data science not as a new domain of knowledge to learn, but as a new set of skills that you can apply within your current area of expertise.

(If you want to get started with your data science journey and apply it in your area of expertise, check out this page for some useful resources that I have collected for you.)


References and Further Reading List: