(Stay tuned, as I keep updating this page while I grow and plow in my deep learning garden:))
As I noted in this Data Science Venn Diagram post, “It would be better to think of data science not as a new domain of knowledge to learn, but as a new set of skills that you can apply within your current area of expertise”. In other words, people in any field can apply Data Science within their own domain.
With that in mind, this page collects some essential Python resources to help folks in different fields get on the Data Science “express train”.
- If you already have some basic Python programming background, perhaps the best resource for getting into Data Science (that I have seen so far) is the following:
Python Data Science Handbook: Essential Tools for Working with Data (by Jake VanderPlas)
We are lucky: we can read this great book in its entirety online at https://jakevdp.github.io/PythonDataScienceHandbook/
You can find its accompanying GitHub repository here.
(Thanks to the author, Jake VanderPlas, for his generosity :))
Note: The book uses Python 3, which is a good thing, because support for Python 2.x will end in 2020.
Two great (Amazon) reviews of this book:
This is an excellent reference book for people working with data science. Remember, 80% of the effort in machine learning, data analysis or data science in general is about processing data and understanding data. This book is for that purpose and I think it’s the best book out there about data processing, analysis and visualization using python. If you are looking for hardcore machine learning, go for other books. (on June 9, 2017)
When I first received this book, I was surprised that it didn’t get to scikit-learn until the last third of the book. The first third is about numpy and pandas, and the middle third is about matplotlib. Now that I’ve been applying it at work, however, I’ve found that the items covered in the first two thirds were really essential. I wouldn’t be nearly as productive if I had just jumped straight to the sections on scikit-learn. The author does an excellent job covering broad terrain with enough detail that you are able to apply it to your problems. You will find yourself going back to use this book as a reference. (on August 5, 2017)
- If you have a programming background in a language other than Python and are looking for a guide to the Python language itself, A Whirlwind Tour of the Python Language would be a very good starting point. This short report provides a tour of the essential features of the Python language, aimed at data scientists who are already familiar with one or more other programming languages.
- If you have experience in another language and prefer learning by video, check out the video Python Programming - Learn Python in One Video (duration: 43 mins, by Derek Banas). Of course, the goal of watching this video is not to learn everything about Python and programming. Instead, the focus is on building intuition.
- If you have no prior programming background and want to get started with Python, check out Python 3 Programming Introduction Tutorial, and then Intermediate Python Programming introduction. The free online book Automate the Boring Stuff with Python is an excellent way to get started with Python, too.
- And, of course, check out Python Graph Gallery.
- S. Raschka. Python Machine Learning. Packt Publishing Ltd., 2015 & 2017 (1st edition book code repository and info resource, 2nd edition book code repository and info resource).
===See below for some other (online) books related to Python and Scientific Computing.
Note: If you do not know which version of Python to use (i.e., Python 2.x or Python 3.x), I encourage you to start with Python 3.x, because support for Python 2.x will end in 2020.
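If you are wondering what actually differs between the two versions, here is a minimal sketch (my own illustration, not taken from any of the resources listed here) of a few well-known Python 3 behaviors. The function name `python3_basics` is made up for this example.

```python
import sys


def python3_basics():
    """Collect a few results that illustrate standard Python 3 behavior."""
    return {
        # "/" is true division in Python 3 (Python 2 truncated int / int).
        "true_division": 7 / 2,
        # "//" is floor division in both versions.
        "floor_division": 7 // 2,
        # Text is Unicode "str"; raw binary data is a separate "bytes" type.
        "text_type": type("data").__name__,
        "bytes_type": type("data".encode("utf-8")).__name__,
        # range() returns a lazy sequence object rather than a list.
        "range_is_lazy": not isinstance(range(3), list),
    }


if __name__ == "__main__":
    # print is a function in Python 3, so the parentheses are required.
    print("Running Python", sys.version_info.major)
    for name, value in python3_basics().items():
        print(name, "=", value)
```

Running this under Python 3 is a quick way to confirm which interpreter you have and to see these behaviors for yourself before working through the books below.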
- An Introduction to Python (See here for all other topics from Cornell Virtual Workshop, pdf)
Python is a programming language designed with ease of programming and readable code as its foremost goals. Python has risen to prominence in scientific computing as the ideal tool for doing data conversions, scripting parameter studies, and in facilitating the scientific workflow. In this online course, a quick overview of the language is presented, along with a few tricks to maximize the utility of Python for engineering and science modeling.
- Python for High Performance (See here for all other topics from Cornell Virtual Workshop, pdf)
While Python is a scripting language, it has plenty of facilities for high performance computing. This article covers some of its features and libraries that are particularly helpful when moving scientific code to a large cluster resource. It also includes specific recipes for compilation and execution on the TACC clusters.
- Book: Python Programming for the Humanities by Folgert Karsdorp and Maarten van Gompel (e.g., Chapter 3 deals with preprocessing text); this book uses Python 3.4
- Course: (TOC of) Computational Statistics in Python (e.g., Basics of Python, Working with text, Preprocessing text data)
- (Fundamental) Python 3 Programming Introduction Tutorial – basics (pythonprogramming.net)
- Python 3 Tutorial (python-course.eu)
- Intermediate Python Programming introduction (pythonprogramming.net)
- Python 2 Tutorial (python-course.eu)
- Advanced Topics (python-course.eu)
- Good website about Python resources
- pythonprogramming.net (Robotics tutorials, Data analysis tutorials, Tokenizing Words and Sentences with NLTK, machine learning tutorials)