Essential (Python) Resources to get on the Data Science “express train”

(Stay tuned, as I keep updating this page while I grow and plow in my deep learning garden:))

If you do not have any experience with machine learning or deep learning, check out this set of cheat sheets on the topics here (a website version is also available for better readability).

As I stated in this Data Science Venn Diagram post, “It would be better to think of data science not as a new domain of knowledge to learn, but as a new set of skills that you can apply within your current area of expertise.” In other words, people from different fields can apply Data Science to the domains they already work in.

With that in mind, this page lists some essential (Python) resources that I have been collecting to help folks in different fields get on the Data Science “express train”.

  • If you already have some basic Python programming background, perhaps the best resource for getting into Data Science (that I have seen so far) is the following:

Python Data Science Handbook: Essential Tools for Working with Data (by Jake VanderPlas)

We are lucky: this great book can be read in its entirety online at https://jakevdp.github.io/PythonDataScienceHandbook/

You can find its accompanying GitHub repository here.

(Thanks to the author, Jake VanderPlas, for his generosity :))

Note: The book uses Python 3, which is a good thing, because support for Python 2.x will end in 2020.

Two great (Amazon) reviews of this book:

This is an excellent reference book for people working with data science. Remember, 80% of the effort in machine learning, data analysis or data science in general is about processing data and understanding data. This book is for that purpose and I think it’s the best book out there about data processing, analysis and visualization using python. If you are looking for hardcore machine learning, go for other books. (on June 9, 2017)

When I first received this book, I was surprised that it didn’t get to scikit-learn until the last third of the book. The first third is about numpy and pandas, and the middle third is about matplotlib. Now that I’ve been applying it at work, however, I’ve found that the items covered in the first two thirds were really essential. I wouldn’t be nearly as productive if I had just jumped straight to the sections on scikit-learn. The author does an excellent job covering broad terrain with enough detail that you are able to apply it to your problems. You will find yourself going back to use this book as a reference. (on August 5, 2017)

See below for some other (online) books related to Python and Scientific Computing.

Note: If you do not know which version of Python to use (i.e., Python 2.x or Python 3.x), I encourage you to start with Python 3.x, because support for Python 2.x will end in 2020.
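
If you are not sure which version is installed on your machine, a minimal sketch like the one below can tell you. It only uses the standard `sys` module; the helper name `is_python3` is my own and purely illustrative, not something taken from the book.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3.x or later."""
    # version_info behaves like a tuple: (major, minor, micro, ...)
    return version_info[0] >= 3


if __name__ == "__main__":
    # sys.version starts with the version number, e.g. "3.x.y ..."
    print("Running Python " + sys.version.split()[0])
    if not is_python3():
        print("Consider switching to Python 3.x.")
```

Save it as a script and run it with whichever `python` command you normally use; the first line of output shows the interpreter version that command points to.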