Install Ubuntu 16.04 on Oracle VirtualBox that runs on Windows or Mac

This post provides some notes and useful resources about installing Ubuntu 16.04 on Oracle VirtualBox that runs on your Mac or Windows.

Note: check the RAM and hard disk size of your machine before creating a virtual machine on it.

Notes about which version of Ubuntu to download and install:

For Ubuntu, the newest release is not always the wisest choice. Unless you know that you need a particular version, I suggest downloading and installing the latest LTS (Long Term Support) version (see the picture below from the Ubuntu wiki page). An Ubuntu LTS version is released every two years and is supported with updates for five years. For example, as of this writing, Ubuntu 16.04 LTS is the latest LTS version.

The two main things you need to pay attention to when you create a virtual machine:

  • Memory allocation for your virtual machine.

You can set it to about half of your RAM (e.g., if your machine has 8 GB of RAM, giving the virtual machine 4 GB or 5 GB should be fine).

  • Storage type: Select “Dynamically allocated” if you are not sure how much storage you will actually need.
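
The memory rule of thumb above can be sketched as a tiny shell helper. This is only an illustration: the function name suggest_vm_memory_mb and the VM name "ubuntu1604" are made up for this example. VBoxManage modifyvm with --memory is VirtualBox's command-line way to set a VM's memory in MB (the VM has to be powered off); here the command is only printed, not run.

```shell
# Suggest VM memory as half of the host RAM (both values in MB).
# suggest_vm_memory_mb is a hypothetical helper name, not a VirtualBox tool.
suggest_vm_memory_mb() {
    echo $(( $1 / 2 ))
}

host_ram_mb=8192    # assumed host RAM: 8 GB
vm_ram_mb=$(suggest_vm_memory_mb "$host_ram_mb")

# Print (do not run) the matching command; "ubuntu1604" is a placeholder VM name.
echo "VBoxManage modifyvm \"ubuntu1604\" --memory $vm_ram_mb"
```

Half is a starting point rather than a hard rule; the point is to leave enough memory for the host system itself.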

There are already several very good tutorials about this along with snapshots, so I won’t write another one. See below for some useful resources I collected, together with my notes on some of them.

My notes: This one is very good (with snapshots), including Guest Additions and Shared folders settings. (Note that Guest Additions are required if you want to set up a shared folder, so be sure to install Guest Additions first.)

If you installed your Ubuntu VM a while ago and are not sure whether Guest Additions were installed on it, you can check with the following command. (Note: even if Guest Additions were installed on an older VM, you still need to install them on each newly installed VM; otherwise the Shared folders won’t work there.)

Use lsmod from the command line; it tells you not only whether the module is installed, but also whether it is properly loaded:

$ lsmod | grep vboxguest
vboxguest             282624  6 vboxsf
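
If you want to script this check, here is a minimal sketch. The helper name module_loaded is made up for this example; it reads lsmod-style text from standard input, and the sample line is the one shown above so the sketch also runs outside a VM.

```shell
# Report whether a kernel module appears in lsmod-style output read from stdin.
# module_loaded is a hypothetical helper name used only for this sketch.
module_loaded() {
    # lsmod prints the module name in the first column
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Sample line copied from the lsmod output above.
sample='vboxguest             282624  6 vboxsf'

if printf '%s\n' "$sample" | module_loaded vboxguest; then
    echo "vboxguest is loaded"
else
    echo "vboxguest is NOT loaded; install Guest Additions"
fi

# On a real VM, feed it live data instead:  lsmod | module_loaded vboxguest
```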

I have tested the Shared folders instructions (with pictures) in this tutorial on my Ubuntu 16.04 VM, and they work. The only difference is that on an Ubuntu 16.04 VM, after you issue the following command in your terminal and restart the Ubuntu guest machine, you do not need to do anything else the tutorial describes: the shared folder is mounted automatically each time you start your Ubuntu VM. (After you restart, click the Files icon on the task bar, and you will see the shared folder you just set up mounted there.)

  • sudo adduser brb vboxsf   # Replace 'brb' with your account name on Ubuntu. 

One more note: Although the Shared Folder setting is very convenient, using a VirtualBox shared folder directly as the location for fastq data, annotation files, or output can significantly reduce performance compared to a native (Ubuntu) system or the VM’s own file system, so my recommendation is to use the shared folder only to transfer files between Windows/Mac and your Ubuntu VM.
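
The recommendation above boils down to "copy in, work locally, copy results out". Here is a minimal sketch of that workflow. I am assuming the usual VirtualBox behaviour of auto-mounting a share at /media/sf_<share name>; the directory names and the wc step are placeholders, and temporary directories stand in for the real folders so the sketch runs anywhere.

```shell
# Stand-ins so the sketch runs anywhere; on a real VM the shared folder is
# typically /media/sf_<share name> (an assumption about your setup).
shared_dir=$(mktemp -d)    # pretend this is the shared folder
work_dir=$(mktemp -d)      # a directory on the VM's own virtual disk

echo "reads" > "$shared_dir/sample.fastq"

# 1. Copy the input from the shared folder to the VM's local disk.
cp -r "$shared_dir/." "$work_dir/"

# 2. Do the heavy work on the local copy (wc is a placeholder for the real analysis).
wc -l < "$work_dir/sample.fastq" > "$work_dir/result.txt"

# 3. Copy only the results back to the shared folder for the host to pick up.
cp "$work_dir/result.txt" "$shared_dir/"
```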

P.S. If some tutorials tell you to enter a command like “sudo mount -t vboxsf sharing /mnt/share” so that the shared folder is mounted each time you start your Ubuntu VM, those instructions are outdated.

Fortunately, newer VirtualBox versions (4.x+) have a (GUI) Auto-mount option (see pics below) when you set up your shared folder. (Note that you can choose any folder you like to share, instead of a predefined system folder such as Documents or Downloads.)

If you want to share the clipboard between your host and your virtual machine, check out the picture below.

 

Answers to some frequently asked questions:

Q: Do I need to back up my files when I upgrade VirtualBox to a newer version?

A: Just install the latest version and you will have all your files in the new one. You do not need to uninstall the old VirtualBox first.

Q: After I installed the Windows 10 updates, my VirtualBox won’t start…

A: Just install the latest version and you will have all your files in the new one. You do not need to uninstall the old VirtualBox first.

 

My notes:  this one is very good (with snapshots) on Mac. My notes above about VM settings running on Windows work the same for VM settings running on Mac.

 

Commonly used Linux commands (Ubuntu)

This page lists commonly used Linux commands to help those who are not very familiar with the Linux command-line environment. I have been collecting and recording them from my own experience.

I was once a beginner, so I understand the pain Linux beginners go through. I have not yet seen a post with a comprehensive collection of Linux commands, so I thought I could help out. That is why you see this post. Here you go. Happy learning!

You can find a further reading list at the end of this post.

Note: Do not use spaces in file or directory names; use underscores instead.
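
If you already have files with spaces in their names, a small loop can rename them. This is only a sketch: it works on a temporary demo directory with made-up file names, so point it at your own directory (and check for name clashes) before using it for real.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/my data file.csv" "$demo_dir/already_ok.csv"

for f in "$demo_dir"/*; do
    name=$(basename "$f")
    case $name in
        *" "*)
            # Replace every space in the name with an underscore.
            new_name=$(printf '%s' "$name" | tr ' ' '_')
            mv -- "$f" "$demo_dir/$new_name"
            ;;
    esac
done

ls "$demo_dir"
```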

======Basic commands:

  • cd 

this command goes back to the home directory of your account, no matter where your current directory is in the terminal.

  • cd ../

this command goes to the parent directory of your current directory in the terminal

  • rmdir 

remove/delete an empty folder.

example:

first cd into the parent folder of the empty folder to remove

rmdir test

  • rm -rf ./*

    first cd into a directory; this command then permanently deletes everything under the current directory, so double-check where you are before running it

  • ls  

list all the files and folders under current path

  • ls -l  

list all the files and folders with details like dates.

  • ls -l -t

list files and folders ordered by time.

  • ls -ltrh 

list all the files and folders under the current directory in long format, sorted by modification time in reverse order (oldest first), with file sizes in human-readable format (e.g., in MB or GB instead of bytes)

  • -l List in long format. If the output is to a terminal, a total sum for all the file sizes is output on a line before the long listing.
  • -r Reverse the order of the sort to get reverse lexicographical order or the oldest entries first (or largest files last, if combined with sort by size).
  • -t Sort by time modified (most recently modified first) before sorting the operands by lexicographical order.
  • -h Print file sizes in human-readable units (e.g., K, M, G).
  • ls -ltrh *.csv

list all the csv files under the current directory in long format, sorted by modification time in reverse order, with file sizes in human-readable format (e.g., in MB or GB instead of bytes)

  • find

find . -name '*.jpg' -exec cp {} ./test/ \;

Find all jpg files and copy them to the folder test, which is a subfolder of the current path.
Note: the current path should be the path where the files to search are located (i.e., use cd to go to the directory that contains the files before typing the command above into the terminal).
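
Here is a runnable sketch of the same idea on a throwaway directory tree (all file and folder names are made up). One addition of mine: the ! -path './test/*' part tells find to skip files that are already inside ./test, which avoids cp complaining when the command is run a second time.

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/photos/2016" "$demo_dir/test"
touch "$demo_dir/a.jpg" "$demo_dir/photos/2016/b.jpg" "$demo_dir/notes.txt"

(
    cd "$demo_dir" || exit 1
    # Copy every .jpg below the current directory into ./test,
    # skipping files that are already inside ./test.
    find . -name '*.jpg' ! -path './test/*' -exec cp {} ./test/ \;
)

ls "$demo_dir/test"
```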

  • rm -r -f

-r means recursive, it will remove folders and subfolders and files within the folders and subfolders

-f means force

  • mkdir [folder name]

create new folder

example:

mkdir image

  • cp [filename] [new filename]

copy and rename file

  • cp [filename] [path/to/new/location/filename]

copy the file to another location

if you use this command to copy a directory, you will get this error:

cp: omitting directory ...

The message means that you told cp to copy files, not directories: cp found a directory and is informing you that it will be skipped.

  • cp -r [directory] [path/to/new/location/directory]

copy a directory to another location.

cp -r means recursive and this option will make cp also include sub-directories.

If you get a permission denied error, add sudo before the command; it will ask for your password.
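
To see the difference between cp and cp -r for yourself, here is a small sketch on a temporary directory (the folder and file names are made up for the example).

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/project/data"
echo "hello" > "$demo_dir/project/data/readme.txt"

# Without -r, cp refuses to copy a directory and returns an error status.
cp "$demo_dir/project" "$demo_dir/backup" 2>/dev/null || echo "plain cp skipped the directory"

# With -r, the directory and everything below it is copied.
cp -r "$demo_dir/project" "$demo_dir/backup"
cat "$demo_dir/backup/data/readme.txt"
```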

  • mv [directory] [path/to/new/location/directory]

move a directory to another location.

If you get a permission denied error, add sudo before the command; it will ask for your password.

  • nano [new file name or /path/to/new file/new file name]

example:

nano myexample   # opens a new empty file named “myexample” in the current directory (it is written to disk when you save)

  • nano [file name]

If the file name already exists, it will open the file and you can edit it.

Note: Ctrl+O to save the file, and then hit Enter, and then Ctrl +X to close the file.

 

 

======More advanced commands:

  • cd into directory without having permission

When you cd into a directory and the following error occurs

bash: cd: your-directory: Permission denied

The solution is:

Enter superuser mode, then cd into the directory you do not have permission to enter. sudo will ask for your password.

sudo su  
cd directory  # your prompt changes once you have entered your password; now you can cd into the directory.

# to exit "super user" mode, type exit.
  • lspci

check GPU information on Ubuntu

look for “VGA compatible controller:”…

  • sudo nvidia-smi

check GPU info and GPU usage.

  • sudo reboot   (or, equivalently: sudo shutdown -r now)

reboot a server from terminal

  • sudo shutdown -h now

shut down a server from terminal

Note: If your Ubuntu Server 16.04 LTS shows a black screen after a reboot or shutdown, try pressing Ctrl + Alt + F2 simultaneously to see whether you can switch to a different console.

  • vncserver -kill :1  

This is a vncserver command. It kills the VNC server session (GUI desktop) running on a given display, where :1 is the display number of the session you would like to kill.

  • vncviewer -via username@yourserver_hostname :1

connect to a server via vncviewer from a Linux-based client. Change the display number 1 to yours.

  • echo

echo is a built-in command in the bash and C shells that writes its arguments to standard output.

See here, and here, and here for example usage of it.

  • cat

See here for example usage of cat command.

  • chmod

see here for example usage of chmod command.

 

  • check supercomputing Cluster’s Linux distribution and version

$ lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID:    RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.4 (Santiago)
Release:    6.4
Codename:    Santiago

  • show the list of top processes ordered by RAM and CPU use in descending order

(remove the pipe and head if you want to see the full list):

$ ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head

Brief explanation of the options used in the command above:

The -o (or --format) option of ps allows us to specify the output format. Here it selects:

  • the processes’ PIDs (pid) and PPIDs (ppid),
  • the name of the executable file associated with the process (cmd), and
  • the RAM and CPU utilization (%mem and %cpu, respectively).

We can use --sort to sort by either %mem or %cpu. By default, the output is sorted in ascending order, but usually we prefer to reverse that by adding a minus sign in front of the sort criterion so that the list is in descending order.

To add other fields to the output, or change the sort criteria, refer to the OUTPUT FORMAT CONTROL section in the man page of ps command.
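
If you switch between sorting by memory and by CPU a lot, you could wrap the command in a small function. This is only a sketch: top_processes is a name I made up, and it assumes the procps version of ps that ships with Ubuntu and most other Linux distributions.

```shell
# Show the header plus the top N processes, sorted by the given ps field.
# top_processes is a hypothetical helper name used only for this sketch.
top_processes() {
    sort_field=$1    # e.g. %mem or %cpu
    count=$2         # how many processes to show
    ps -eo pid,ppid,cmd,%mem,%cpu --sort="-$sort_field" | head -n "$(( count + 1 ))"
}

# Top 5 processes by CPU use.
top_processes %cpu 5
```

Calling top_processes %mem 10 gives the same listing as the original command, with the header line plus ten processes.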


======File Transfer: getting files to/from your account on a server

  • On Linux, the usual tool is the command-line scp command.
 Examples of using it are:

scp -p file_name username@yourserver_hostname:destination/directory

  • or for a full directory tree:

scp -pr dir_name username@yourserver_hostname:destination/directory

 

Note that if you want to transfer files from the server to your client computer, just swap the source and the destination.

e.g., scp -pr username@yourserver_hostname:source/directory dir_name_on_your_client

 

======download files

  • wget (tool for downloading files)  (pdf)
  • See Linux wget command (pdf), which gives a detailed and comprehensive description of the options to use with the wget command.

======Save terminal output to a file

  • sudo command -option | tee log

This command will show output on the terminal and save it to a file at the same time.

  • Save terminal output to a file

redirect the output to a file:

someCommand > someFile.txt

Or if you want to append data:

someCommand >> someFile.txt

If you want stderr too, use this:

someCommand &> someFile.txt

or this to append:

someCommand &>> someFile.txt

  • tail -f log.txt

follow the log file and print new lines as they are appended to it.
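
Here is a small runnable sketch that puts tee and the redirections side by side. The file names are made up and everything is written to a temporary directory. I use 2>&1 instead of &> because it also works in shells other than bash.

```shell
log_dir=$(mktemp -d)

# Show output on the terminal and save it to a file at the same time.
echo "first run" | tee "$log_dir/log.txt"

# tee -a appends instead of overwriting.
echo "second run" | tee -a "$log_dir/log.txt"

# > overwrites, >> appends (nothing is shown on the terminal).
echo "line 1" >  "$log_dir/out.txt"
echo "line 2" >> "$log_dir/out.txt"

# Capture stdout and stderr together; ls fails on the missing file on purpose.
ls "$log_dir/out.txt" "$log_dir/no_such_file" > "$log_dir/both.txt" 2>&1 || true

tail -n 1 "$log_dir/log.txt"
```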

======Python related commands

  • enter python environment

type python in the terminal; it shows the Python 2.7 version info and enters the Python 2 interactive environment

  • enter python 3 environment

python3 

and it shows the Python 3 version info (e.g., Python 3.5.2) and enters the Python 3 interactive environment.

 

======Git related commands

  • git clone [URL of the repository to clone]

for example:

first cd into the folder you want the models to be cloned into in your terminal, and then issue this command. It will clone the models folder from https://github.com/tensorflow/models into your current folder.

git clone https://github.com/tensorflow/models

 

======Some useful shortcuts on linux

  • you can open multiple terminals

open each terminal by pressing Ctrl + Alt + T.

  • Shortcut to bring all open terminals to the front

After you bring one terminal window to the front, press Alt+~ to bring the other terminal windows to the front one by one.

  • CTRL + C − terminate the current command. 

======References and further reading list:

Linux and Unix top 10 command pages  (See here for links to more commands intro)

Below is a listing of the top 10 Unix command pages by the amount of times they have been accessed on the Computer Hope server.

  1. Linux and Unix tar command help
  2. Linux and Unix chmod command help
  3. Linux and Unix ls command help
  4. Linux and Unix find command help
  5. Information about the Linux and Unix grep command
  6. Linux and Unix cp command help
  7. Linux and Unix vi command help
  8. Linux and Unix ifconfig command help
  9. Linux and Unix date command help
  10. Linux and Unix kill command help

======TOC of the nice tutorial: Linux Shell Commands: A Tutorial Quick Reference for Desktop Users

Table of Contents

1. A Short Intro to the Command Line

This chapter will acquaint you with the basics of the command line. To maximize your learning, you should follow along by typing in the example commands given. Every major Linux distribution has a menu item called “shell”, “console”, “terminal” or the like, which will give you a window with a command line interface. In this book, I assume that readers work in a graphical desktop environment and use the Bash shell in a terminal window. Bash is the default shell in all major Linux distributions.

2. Getting Information

The commands presented in this chapter provide valuable information on the state and configuration of your system.

3. Managing Files and Directories

The command line offers you great flexibility in creating, copying, moving and editing files and directories, as this chapter shows.

  • cd (change directory)  (pdf)
  • chgrp (change group ownership)  (pdf)
  • chmod (change file permissions)  (pdf)
  • chown (change file ownership)  (pdf)
  • cp (copy files and directories)  (pdf)
  • dd (write data to devices)  (pdf)
  • find (search for files)  (pdf)
  • ln (make links between files)  (pdf)
  • locate (find files by name)  (pdf)
  • mkdir (create a directory)  (pdf)
  • mount (mount file systems)  (pdf)
  • mv (rename files)  (pdf)
  • rm (remove files or directories)  (pdf)
  • rmdir (remove empty directories)  (pdf)
  • shred (delete a file securely)  (pdf)
  • touch (change file timestamps)  (pdf)
  • umount (unmount file systems)  (pdf)

4. Managing Processes

Linux provides powerful tools for controlling the execution of your programs. Some of the most important tools are presented in this chapter.

  • disown (detach a job from the shell)   (pdf)
  • kill (terminate a process)  (pdf)
  • ps (list running processes)  (pdf)
  • pstree (display a tree of processes)  (pdf)
  • shutdown (halt or reboot the system)  (pdf)
  • sudo (execute a command as root)  (pdf)

5. Working with Text

Processing plain text files is a big strength of Linux. The commands presented in this chapter allow you to display particular parts of files (e.g. head, tail), reorder their contents (e.g. sort), carry out search/replace operations (e.g. grep, sed), and much more.

  • cat (concatenate and output files)  (pdf)
  • cut (output columns from files)  (pdf)
  • diff (show differences between files)  (pdf)
  • grep (print lines matching a pattern)  (pdf)
  • head (output the first part of files)  (pdf)
  • less (view file by pages)  (pdf)
  • pdftk (manipulate PDF files)  (pdf)
  • sed (search and replace text)  (pdf)
  • sort (sort lines of text files)  (pdf)
  • tail (output the last part of files)  (pdf)
  • wc (count lines, words and characters)  (pdf)

6. Being Productive

This chapter collects some commands that can help you accomplish everyday tasks quickly and efficiently. Many of the commands are faster or more reliable replacements for popular graphical applications. For example, wget can replace a graphical download manager.

  • alias (define command shortcuts)  (pdf)
  • alsamixer (audio mixer)  (pdf)
  • bc (command line calculator)  (pdf)
  • history (display command history)  (pdf)
  • rsync (fast, versatile file copying tool)  (pdf)
  • tar (Linux archiving utility)  (pdf)
  • unrar (extract files from RAR archives)  (pdf)
  • unzip (extract files from ZIP archives)  (pdf)
  • wget (tool for downloading files)  (pdf)
  • xmodmap (change key bindings)  (pdf)

======The end of the TOC of the nice tutorial: Linux Shell Commands: A Tutorial Quick Reference for Desktop Users

======apt-get usages

======curl command examples

cURL can be used in many different and useful ways. Using cURL, we can download, upload and manage files, check email address, or even update status on some of the social media websites or check the weather outside.

cURL is a very useful command-line tool for transferring data to or from a server. cURL supports various protocols, including FILE, HTTP, HTTPS, IMAP, IMAPS, LDAP, DICT, LDAPS, TELNET, FTP, FTPS, GOPHER, RTMP, RTSP, SCP, SFTP, POP3, POP3S, SMB, SMBS, SMTP, SMTPS, and TFTP.

This tutorial covers five of the most useful and basic uses of the cURL tool:

–Check URL

One of the most common and simplest uses of cURL is typing the command itself, followed by the URL you want to check

curl https://example.com
#This command will display the content of the URL on your terminal

–Save the output of the URL to a file

The output of the cURL command can be easily saved to a file by adding the -o option to the command, as shown below

curl -o website https://example.com
#the output will be saved to a file named ‘website’ in the current working directory

–Download files with cURL

curl -O https://example.com/file.zip
# the -O option saves the file to the current working directory under its remote name,
# e.g., the ‘file.zip’ zip archive will be downloaded to the current working directory.

curl -o archive.zip https://domain.com/file.zip
# the ‘file.zip’ archive will be downloaded and saved as ‘archive.zip’.

curl -O https://domain.com/file.zip -O https://domain.com/file2.zip
# cURL can also download multiple files with a single command

curl -u user sftp://server.domain.com/path/to/file
# cURL can also be used to download files securely via SSH (SFTP).
# Note that the full path of the file to be downloaded is required
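
If you want to try the -o option without touching the network, here is a small sketch of mine that uses a file:// URL, since FILE is one of the protocols cURL supports. The file names are made up, and the local file simply stands in for a remote one.

```shell
work_dir=$(mktemp -d)
echo "hello from curl" > "$work_dir/source.txt"

# -s silences the progress meter, -f makes curl return an error on HTTP failures,
# -o picks the local file name. The file:// URL stands in for an https:// one.
curl -sf -o "$work_dir/copy.txt" "file://$work_dir/source.txt"

cat "$work_dir/copy.txt"
```

With a real download you would replace the file:// URL with the https:// address of the file.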

–Get HTTP header information from a website

You can easily get HTTP header information from any website you want by adding the -I option (capital ‘i’) to cURL.

curl -I http://example.com

–Access an FTP server

curl ftp://ftp.domain.com --user username:password
# access your FTP server with cURL: it will connect to the FTP server and list all files and directories in the user’s home directory

curl ftp://ftp.domain.com/file.zip --user username:password
# download a file via FTP using curl

curl -T file.zip ftp://ftp.domain.com/ --user username:password
# upload a file to the FTP server

–check cURL manual page to see all available cURL options and functionalities

man curl

This post covers detailed and comprehensive explanation of different options to use with curl command.

  • $ free -m

Linux has the habit of caching lots of things for faster performance, so that memory can be freed and used if needed.

  • $ cat /proc/meminfo
  • $ vmstat -s
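
Because of the caching mentioned above, the "free" number alone can look alarmingly small. As a rough sketch, the helper below (meminfo_mb is a name I made up) pulls single fields such as MemTotal or MemAvailable out of a /proc/meminfo-style file and converts them from kB to MB. A two-line sample file with invented numbers is used so the example runs on any system.

```shell
# Print one field from a meminfo-style file, converted from kB to MB.
# meminfo_mb is a hypothetical helper name used only for this sketch.
meminfo_mb() {
    field=$1
    file=${2:-/proc/meminfo}
    awk -v f="$field:" '$1 == f { printf "%d\n", $2 / 1024 }' "$file"
}

# A two-line sample in /proc/meminfo format (the numbers are invented).
sample=$(mktemp)
printf 'MemTotal:        8167848 kB\nMemAvailable:    4083924 kB\n' > "$sample"

meminfo_mb MemTotal "$sample"        # prints 7976
meminfo_mb MemAvailable "$sample"    # prints 3988
```

On a real Linux machine you can leave out the file argument, e.g. meminfo_mb MemAvailable, to read /proc/meminfo directly.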

Great Apache Spark tutorial videos on YouTube

This post collects great Apache Spark tutorial videos available on YouTube.

Sameer Farooqui delivers a hands-on tutorial using Spark SQL and DataFrames to retrieve insights and visualizations from datasets published by the City of San Francisco. [Topics Indexed Below]

The labs are targeted at an audience with some general programming or SQL query experience, but little to no experience with Spark. Sameer will begin with some brief theory and lecture on Spark, before diving into several demos performing visualizations and analysis on calls made to the San Francisco Fire Department on July 4th.

Follow Along:
+ Databricks Community Edition: https://databricks.com/try
+ Labs: https://bit.ly/sfopenlabs
+ Learning Material: https://bit.ly/sfopenreadalong

—–Jump to Topic—–
00:00:06 – Workshop Intro & Environment Setup
00:13:06 – Brief Intro to Spark
00:17:32 – Analysis Overview: SF Fire Department Calls for Service
00:23:22 – Analysis with PySpark DataFrames API
00:29:32 – Doing Date/Time Analysis
00:47:53 – Memory, Caching and Writing to Parquet
01:00:40 – SQL Queries
01:21:11 – Convert a Spark DataFrame to a Pandas DataFrame
—–Q & A—–
01:24:43 – Spark DataFrames vs. SQL: Pros and Cons?
01:26:57 – Workflow for Chaining Databricks notebooks into Pipeline?
01:30:27 – Is Spark 2.0 ready to use in production?

———————————————————————————————-
SPARK 2.0 TRAINING | NewCircle | Onsite & Public Classes
———————————————————————————————-
+ Programming for Spark 2.0 (3 days)
+ Spark 2.0 for Machine Learning & Data Science (3 days)
Learn more: https://newcircle.com/category/apache…

++Code for San Francisco++
http://www.meetup.com/Code-for-San-Fr…

++Learn more about Databricks++
https://databricks.com/product/databr…

All Apache Spark Courses from newcircle training:

Adam Breindel, lead Spark instructor at NewCircle, talks about which APIs to use for modern Spark with a series of brief technical explanations and demos that highlight best practices, latest APIs, and new features. (Topics Indexed Below)

We’ll look at how Dataset and DataFrame behave in Spark 2.0, Whole-Stage Code Generation, and go through a simple example of Spark 2.0 Structured Streaming (Streaming with DataFrames) that you can run in your own free instance of Databricks.

00:00:40 – Intro: What is “Modern Spark”
00:01:26 – DataFrame
00:05:07 – Why not use RDD?
00:09:15 – Intro to DataFrame and Dataset
00:10:13 – DataFrame versus Dataset
00:14:42 – Dataset Queries and Dataset with Scala classes
00:19:07 – Spark Query Optimizer
00:23:26 – Whole-Stage Codegen
00:27:21 – Hive integration
00:29:28 – Wrapping Up DataFrame/Dataset Benefits
00:30:54 – One More Thing – Structured Streaming
00:36:47 – Conclusion

Try the Examples:
+ Databricks Community Edition: https://databricks.com/try
+ Get this Notebook: https://bit.ly/get-notebook

———————————————————————————————-
SPARK 2.0 TRAINING | NewCircle | Onsite & Public Classes
———————————————————————————————-
+ Programming for Spark 2.0 (3 days):
http://bit.ly/spark-prog-newcircle

+ Spark 2.0 for Machine Learning & Data Science (3 days):
http://bit.ly/spark-ml-newcircle

 

“As Apache Spark becomes more widely adopted, we have focused on creating higher-level APIs that provide increased opportunities for automatic optimization. In this talk, I give an overview of some of the exciting new API’s available in Spark 2.0, namely Datasets and Structured Streaming. Together, these APIs are bringing the power of Catalyst, Spark SQL’s query optimizer, to all users of Spark. I’ll focus on specific examples of how developers can build their analyses more quickly and efficiently simply by providing Spark with more information about what they are trying to accomplish.” – Michael

Slides: http://www.slideshare.net/databricks/…

Databricks Blog: “Deep Dive into Spark SQL’s Catalyst Optimizer”
https://databricks.com/blog/2015/04/1…

// About the Presenter //
Michael Armbrust is the lead developer of the Spark SQL project at Databricks. He received his PhD from UC Berkeley in 2013, and was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage and query optimization.

Follow Michael on –
Twitter: https://twitter.com/michaelarmbrust
LinkedIn: https://www.linkedin.com/in/michaelar…