Regularization and Bias/Variance


in category: Machine Learning_tricks4better performance

Source: from the paper by Prof. Domingos:

Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78-87.


source for the content below: from machine-learning on Coursera by Dr. Andrew Ng

See this post for how regularization can help prevent overfitting. But how does it affect the bias and variance of a learning algorithm? This post will go deeper into the issue of bias and variance and talk about how they interact with and are affected by the regularization of your learning algorithm.

If lambda is small, we're not using much regularization and we run a larger risk of overfitting. If lambda is large, that is, if we are on the right part of the horizontal axis, we run a higher risk of having a high bias problem. So if you plot Jtrain and Jcv against lambda, what you find is that for small values of lambda you can fit the training set relatively well, because you're not regularizing: the regularization term basically goes away and you're pretty much just minimizing the squared error. So when lambda is small, you end up with a small value for Jtrain. If lambda is large, you have a high bias problem and you might not fit your training set that well, so you end up with a larger value of Jtrain. Jtrain of theta will therefore tend to increase when lambda increases, because a large value of lambda corresponds to high bias, where you might not even fit your training set well, whereas a small value of lambda corresponds to being able to fit, let's say, a very high degree polynomial to your data. As for the cross validation error, we end up with a figure like this.
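For reference, the quantities in the paragraph above can be written out as follows. This is a sketch in the usual notation of the Coursera course, and it assumes symbols that do not appear in the text itself: a hypothesis h_theta, m training examples, m_cv cross validation examples and n features. Note that only the objective J carries the regularization term; Jtrain and Jcv are measured without it.

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
          + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

J_{\text{train}}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2

J_{\text{cv}}(\theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2
```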



When I’m trying to pick the regularization parameter lambda for a learning algorithm, I often find that plotting a figure like the one shown below helps me understand better what’s going on and helps me verify that I am indeed picking a good value for the regularization parameter lambda.
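One way to produce the numbers behind such a figure is sketched below. It is only an illustration under stated assumptions: the data is synthetic (a noisy sine curve), the model is a degree-8 polynomial fitted by regularized linear regression in closed form, and the helper names (`poly_features`, `fit_ridge`, `cost`) and the list of lambda values are made up for this example rather than taken from the course.

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1-D input to the columns [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def fit_ridge(X, y, lam):
    """Closed-form regularized linear regression; the bias term is not penalized."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def cost(X, y, theta):
    """Squared-error cost WITHOUT the regularization term (used for Jtrain and Jcv)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((Xb @ theta - y) ** 2) / 2.0)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 30)
x_cv = rng.uniform(-1, 1, 30)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 30)
y_cv = np.sin(3 * x_cv) + rng.normal(0, 0.2, 30)

X_train = poly_features(x_train, 8)
X_cv = poly_features(x_cv, 8)

lambdas = [0.0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
j_train, j_cv = [], []
for lam in lambdas:
    theta = fit_ridge(X_train, y_train, lam)   # fit WITH regularization
    j_train.append(cost(X_train, y_train, theta))
    j_cv.append(cost(X_cv, y_cv, theta))

# Pick the lambda with the lowest cross validation error.
best_lambda = lambdas[int(np.argmin(j_cv))]

for lam, jt, jc in zip(lambdas, j_train, j_cv):
    print(f"lambda={lam:<6} Jtrain={jt:.4f} Jcv={jc:.4f}")
print("lambda with lowest Jcv:", best_lambda)
```

Plotting `j_train` and `j_cv` against `lambdas` (for example with a plotting library of your choice) gives the kind of curve described above: Jtrain grows as lambda grows, and the lambda with the lowest Jcv is the candidate to pick.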

Overfitting and Underfitting With Machine Learning Algorithms

source: from  (Good posts sometimes disappear, so I repost it here for my information and yours.)

The cause of poor performance in machine learning is either overfitting or underfitting the data.

In this post you will discover the concept of generalization in machine learning and the problems of overfitting and underfitting that go along with it.

Let’s get started.

Approximate a Target Function in Machine Learning

Supervised machine learning is best understood as approximating a target function (f) that maps input variables (X) to an output variable (Y).

Y = f(X)

This characterization describes the range of classification and prediction problems and the machine learning algorithms that can be used to address them.

An important consideration in learning the target function from the training data is how well the model generalizes to new data. Generalization is important because the data we collect is only a sample; it is incomplete and noisy.

Generalization in Machine Learning

In machine learning we describe the learning of the target function from training data as inductive learning.

Induction refers to learning general concepts from specific examples, which is exactly the problem that supervised machine learning aims to solve. This is different from deduction, which works the other way around and seeks to learn specific concepts from general rules.

Generalization refers to how well the concepts learned by a machine learning model apply to specific examples not seen by the model when it was learning.

The goal of a good machine learning model is to generalize well from the training data to any data from the problem domain. This allows us to make predictions in the future on data the model has never seen.

There is a terminology used in machine learning when we talk about how well a machine learning model learns and generalizes to new data, namely overfitting and underfitting.

Overfitting and underfitting are the two biggest causes for poor performance of machine learning algorithms.

Statistical Fit

In statistics a fit refers to how well you approximate a target function.

This is good terminology to use in machine learning, because supervised machine learning algorithms seek to approximate the unknown underlying mapping function for the output variables given the input variables.

Statistics often describes the goodness of fit, which refers to measures used to estimate how well the approximation of the function matches the target function.

Some of these methods are useful in machine learning (e.g. calculating the residual errors), but some of these techniques assume we know the form of the target function we are approximating, which is not the case in machine learning.

If we knew the form of the target function, we would use it directly to make predictions, rather than trying to learn an approximation from samples of noisy training data.

Overfitting in Machine Learning

Overfitting refers to a model that models the training data too well.

Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model. The problem is that these concepts do not apply to new data and negatively impact the model’s ability to generalize.

Overfitting is more likely with nonparametric and nonlinear models that have more flexibility when learning a target function. As such, many nonparametric machine learning algorithms also include parameters or techniques to limit and constrain how much detail the model learns.

For example, decision trees are a nonparametric machine learning algorithm that is very flexible and is subject to overfitting training data. This problem can be addressed by pruning a tree after it has learned in order to remove some of the detail it has picked up.

Underfitting in Machine Learning

Underfitting refers to a model that can neither model the training data nor generalize to new data.

An underfit machine learning model is not a suitable model, and this will be obvious because it will have poor performance on the training data.

Underfitting is often not discussed as it is easy to detect given a good performance metric. The remedy is to move on and try alternate machine learning algorithms. Nevertheless, it does provide a good contrast to the problem of overfitting.
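To make the contrast between underfitting and overfitting a little more concrete, here is a small sketch that fits polynomials of increasing degree to the same noisy sample. Everything in it is an assumption chosen for illustration: the sine-shaped target function, the noise level, the sample sizes and the three degrees. It is not taken from the original post.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(1)

def target(x):
    """The 'unknown' target function f (hypothetical, for illustration)."""
    return np.sin(2 * np.pi * x)

# A small noisy training sample and a larger sample of unseen data.
x_train = rng.uniform(0, 1, 20)
y_train = target(x_train) + rng.normal(0, 0.2, 20)
x_test = rng.uniform(0, 1, 200)
y_test = target(x_test) + rng.normal(0, 0.2, 200)

def errors_for_degree(degree):
    """Least-squares polynomial fit; returns (training MSE, test MSE)."""
    coeffs = P.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((P.polyval(x_train, coeffs) - y_train) ** 2))
    test_mse = float(np.mean((P.polyval(x_test, coeffs) - y_test) ** 2))
    return train_mse, test_mse

# Degree 1 is too rigid for a sine curve, degree 9 is very flexible.
results = {degree: errors_for_degree(degree) for degree in (1, 3, 9)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree}: train MSE={train_mse:.4f}, test MSE={test_mse:.4f}")
```

The training error can only go down as the degree goes up, because each more flexible model contains the simpler ones. What matters is the test column: an underfit model is poor on both columns, while an overfit model looks good on the training data and worse on the unseen data.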

A Good Fit in Machine Learning

Ideally, you want to select a model at the sweet spot between underfitting and overfitting.

This is the goal, but is very difficult to do in practice.

To understand this goal, we can look at the performance of a machine learning algorithm over time as it learns from the training data. We can plot both the skill on the training data and the skill on a test dataset we have held back from the training process.

Over time, as the algorithm learns, the error for the model on the training data goes down and so does the error on the test dataset. If we train for too long, the error on the training dataset may continue to decrease because the model is overfitting and learning the irrelevant detail and noise in the training dataset. At the same time the error for the test set starts to rise again as the model’s ability to generalize decreases.

The sweet spot is the point just before the error on the test dataset starts to increase where the model has good skill on both the training dataset and the unseen test dataset.
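The experiment described above can be sketched in a few lines. The example below is an assumption-laden illustration, not the author's code: it trains a degree-9 polynomial model by plain gradient descent on synthetic data, records the error after every step, and reports the step with the lowest held-back error. It uses a held-back validation split rather than a final test set, and the learning rate and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(0, 0.3, 60)

# Columns x^0 .. x^9: a flexible model that is able to overfit 30 points.
X = np.column_stack([x ** d for d in range(10)])
X_train, y_train = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]   # held back from training

def mse(features, labels, weights):
    return float(np.mean((features @ weights - labels) ** 2))

weights = np.zeros(X.shape[1])
learning_rate = 0.05   # small enough here that the training error cannot increase
train_curve, val_curve = [], []
for step in range(5000):
    gradient = 2.0 * X_train.T @ (X_train @ weights - y_train) / len(y_train)
    weights -= learning_rate * gradient
    train_curve.append(mse(X_train, y_train, weights))
    val_curve.append(mse(X_val, y_val, weights))

# The "sweet spot" is the step with the lowest error on the held-back data.
best_step = int(np.argmin(val_curve))
print("best step:", best_step)
print("validation MSE at best step:", val_curve[best_step])
print("validation MSE at last step:", val_curve[-1])
print("training MSE at last step:  ", train_curve[-1])
```

Whether the validation curve visibly turns upward depends on the data and the model; the point of the sketch is only the bookkeeping: track both curves and look at where the held-back error is lowest.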

You can perform this experiment with your favorite machine learning algorithms. This is often not a useful technique in practice, because choosing the stopping point for training using the skill on the test dataset means that the test set is no longer “unseen” or a standalone objective measure. Some knowledge (a lot of useful knowledge) about that data has leaked into the training procedure.

There are two additional techniques you can use to help find the sweet spot in practice: resampling methods and a validation dataset.

How To Limit Overfitting

Both overfitting and underfitting can lead to poor model performance. But by far the most common problem in applied machine learning is overfitting.

Overfitting is such a problem because the evaluation of machine learning algorithms on training data is different from the evaluation we actually care the most about, namely how well the algorithm performs on unseen data.

There are two important techniques that you can use when evaluating machine learning algorithms to limit overfitting:

  1. Use a resampling technique to estimate model accuracy.
  2. Hold back a validation dataset.

The most popular resampling technique is k-fold cross validation. It allows you to train and test your model k times on different subsets of training data and build up an estimate of the performance of a machine learning model on unseen data.

A validation dataset is simply a subset of your training data that you hold back from your machine learning algorithms until the very end of your project. After you have selected and tuned your machine learning algorithms on your training dataset you can evaluate the learned models on the validation dataset to get a final objective idea of how the models might perform on unseen data.

Using cross validation is a gold standard in applied machine learning for estimating model accuracy on unseen data. If you have the data, using a validation dataset is also an excellent practice.
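The two techniques can be combined, as in the sketch below: hold back a validation set first, estimate accuracy with k-fold cross validation on the rest, and only touch the held-back data at the very end. This is a minimal NumPy illustration under assumptions of my own: the data is synthetic, the model is plain least-squares linear regression, and the helper names (`k_fold_indices`, `cross_val_mse`, `fit_linear`, `predict_linear`) are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_val_mse(X, y, k, fit, predict):
    """Train on k-1 folds, score on the remaining fold, for each of the k folds."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        residuals = predict(model, X[val_idx]) - y[val_idx]
        scores.append(float(np.mean(residuals ** 2)))
    return scores

def fit_linear(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict_linear(weights, X):
    return X @ weights

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(3)
X_all = rng.normal(size=(120, 3))
y_all = X_all @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 120)

# 1. Hold back a validation set and do not look at it during model selection.
X_work, y_work = X_all[:100], y_all[:100]
X_holdout, y_holdout = X_all[100:], y_all[100:]

# 2. Estimate performance on unseen data with 5-fold cross validation.
fold_scores = cross_val_mse(X_work, y_work, 5, fit_linear, predict_linear)
cv_estimate = float(np.mean(fold_scores))

# 3. At the very end, check the chosen model once on the held-back data.
final_model = fit_linear(X_work, y_work)
holdout_mse = float(np.mean((predict_linear(final_model, X_holdout) - y_holdout) ** 2))

print("per-fold MSE:", [round(s, 4) for s in fold_scores])
print("cross validation estimate:", round(cv_estimate, 4))
print("held-back validation MSE: ", round(holdout_mse, 4))
```

Passing `fit` and `predict` as functions keeps the resampling loop independent of the algorithm, so the same loop can be reused to compare several candidate models on identical folds.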


In this post you discovered that machine learning is solving problems by the method of induction.

You learned that generalization is a description of how well the concepts learned by a model apply to new data. Finally, you learned the terminology used for generalization in machine learning, overfitting and underfitting:

  • Overfitting: Good performance on the training data, poor generalization to other data.
  • Underfitting: Poor performance on the training data and poor generalization to other data.


Diagnosing Bias vs. Variance

source: from machine-learning on Coursera by Dr. Andrew Ng 

(See this post for a deeper introduction to bias and variance and how they interact with and are affected by the regularization of your learning algorithm.)

If you run the learning algorithm and it doesn’t do as well as you are hoping, almost all the time it will be because you have either a high bias problem or a high variance problem. In other words, you have either an underfitting problem or an overfitting problem.

In this case it’s very important to figure out which of these two problems, bias or variance or a bit of both, you actually have, because knowing which of them is happening gives a very strong indicator of which ways to improve your algorithm are useful and promising.
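As a rough sketch of how that diagnosis can be read off the training and cross validation errors: a high training error points to bias, while a low training error with a much higher cross validation error points to variance. The function name `diagnose` and the two thresholds are hypothetical; what counts as "high" depends on your problem, so both are left as explicit arguments rather than fixed values.

```python
def diagnose(j_train, j_cv, target_error, gap_tolerance):
    """Label a model from its training and cross validation errors.

    target_error:  the training error you would consider acceptable (assumption).
    gap_tolerance: how far j_cv may exceed j_train before it counts as
                   a generalization gap (assumption).
    """
    high_bias = j_train > target_error
    high_variance = (j_cv - j_train) > gap_tolerance
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "no obvious bias or variance problem"

# Training and cross validation error are both high and close together.
print(diagnose(j_train=0.30, j_cv=0.32, target_error=0.10, gap_tolerance=0.05))
# Training error is low, cross validation error is much higher.
print(diagnose(j_train=0.02, j_cv=0.25, target_error=0.10, gap_tolerance=0.05))
```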





Machine Learning for Programmers

source: from  (Good posts sometimes disappear, so I repost it here for my information and yours.)

Leap From Developer To Machine Learning Practitioner

or, my answer to the question:

How Do I Get Started In Machine Learning?

I’m a developer. I have read a book or some posts on machine learning. I have watched some of the Coursera machine learning course. I still don’t know how to get started…

Does this sound familiar?

Frustrated with machine learning books and courses? How do you get started in machine learning?
Photo by Peter Alfred Hess, some rights reserved

The most common question I’m asked by developers on my newsletter is:

How do I get started in machine learning?

I honestly cannot remember how many times I have answered it.

In this post, I lay out all of my very best thinking on this topic.

  • You will discover why the traditional approach to teaching machine learning does not work for you.
  • You will discover how to flip the entire model on its head.
  • And you will discover my simple but very effective antidote that you can use to get started.

Let’s get into it…

A Developer Interested in Machine Learning

You are a developer and you’re interested in getting into machine learning. And why not? It’s a hot topic at the moment, and it’s a fascinating and fast growing field.

You read some blog posts. You tried to go deeper but the books are dreadful. Math focused. Theory focused. Algorithm focused.


Sound familiar? Have you tried books, MOOCs and blog posts and still don’t know how to get started in machine learning?

You try some video courses. You sign up and make an honest attempt at the oft-cited Coursera Stanford Machine Learning MOOC. It’s not much better than the books and detailed blog posts. You can’t see what all the fuss is about, why it is recommended to beginners.

You may have even attempted some small data sets, perhaps an entry level Kaggle competition.

The problem is you can’t connect the theory, algorithms and math from the books and courses to the problem. There’s a huge gap. A gulf. How ARE you supposed to get started in machine learning?

Machine Learning Engineer

When you think forward into the future, once you have captured this elusive understanding of machine learning, what does your job look like? How are you using your newfound machine learning skills in your day-to-day?

I think I can see it. You’re a machine learning engineer. You’re a developer that knows how to “do” machine learning.

Do you want to transition from developer to a developer that can do machine learning?

Scenario 1: The one-off model

Your boss walks over and says:

Hey, you know machine learning, right? Can you use the customer data from last year to predict which current customers in our sales pipeline are likely to convert? I want to use it in a presentation to the board next week…

I call this the one-off model.

The problem is well defined by your boss. She gives you the data, which is small enough to look at and understand in MS Excel if you had to. She wants accurate and reliable predictions.

You can deliver. And more importantly, you can explain all the relevant caveats on the results.

Scenario 2: The embedded model

You and your team are collecting requirements from stakeholders on a software project. There is a requirement for the user to be able to freehand draw shapes in the software, and for the software to figure out which shape it is and turn it into a crisp unambiguous version and label it appropriately.

You quickly see that the best (and only viable?) way to solve this problem is to devise and train a predictive model and embed it in your software product.

I call this the embedded model. There are variations (such as whether the model is static or updated, and whether it is local or called remotely via an API), but that’s just detail.

What’s key in this scenario is that you have the experience to notice a problem that is best solved with a predictive model and the skills to devise, train and deploy it.

Scenario 3: The deep model

You have started a new job and the system you are working on is made up of at least one predictive model. Maintenance and the addition of features to this system require an understanding of the model, its inputs and its outputs. The accuracy of the model is a feature of the software product and part of your job will be to improve it.

For example, as a part of regular pre-release system testing, you must demonstrate that the accuracy of the model (when validated on historical data) has the same or better skill than the previous version.
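A pre-release check of this kind can be as small as the sketch below. It is a hypothetical illustration: the function names, the tiny hand-made label lists and the optional `tolerance` argument are all assumptions, and a real system would load its historical data and model predictions from wherever they actually live.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the known historical labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

def passes_release_gate(candidate_preds, previous_preds, labels, tolerance=0.0):
    """True if the candidate model is at least as accurate as the previous one."""
    return accuracy(candidate_preds, labels) + tolerance >= accuracy(previous_preds, labels)

# Hand-made historical data (hypothetical, for illustration only).
historical_labels = [1, 0, 1, 1, 0, 1, 0, 0]
previous_model_preds = [1, 0, 0, 1, 0, 1, 1, 0]    # 6 of 8 correct
candidate_model_preds = [1, 0, 1, 1, 0, 1, 1, 0]   # 7 of 8 correct

print("previous accuracy: ", accuracy(previous_model_preds, historical_labels))
print("candidate accuracy:", accuracy(candidate_model_preds, historical_labels))
print("release gate passed:",
      passes_release_gate(candidate_model_preds, previous_model_preds, historical_labels))
```

Wiring a check like this into the regular test suite turns "the same or better skill than the previous version" into something the build can verify automatically.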

I call this the deep model. You will be expected to build a deep understanding of one specific predictive model and use your experience and skill to improve and verify its accuracy as part of your routine duties.

The Developer That “Does” Machine Learning

These scenarios give you a glimpse at what it’s like to be a developer that knows how to do machine learning. They’re realistic because they are all variations on scenarios I’ve been in or tasks that I have had to complete.

All three of these scenarios make one thing very clear. Although machine learning is a fascinating area, to a developer machine learning algorithms are just another bag of tricks, like multi-threading or 3d graphics programming. Nevertheless, they are a powerful group of methods that are absolutely required for a specific class of problem.

Traditional Answer To: “How Do I Get Started?”

So how do you get started in machine learning?

If you crack a book on machine learning seeking an answer to this question, you’ll get a shock. They start with definitions and move on to mathematical descriptions of concepts and algorithms of ever increasing complexity.

The traditional answer to the question “how do I get started in machine learning” is bottom-up.

Definitions and mathematical descriptions are clear, succinct and often unambiguous. The thing is, they are dry, boring and require the requisite mathematical background to parse and interpret.

There is a reason why machine learning is often taught as a graduate level subject at university. It’s because this “first principles” way of teaching the subject requires years of prerequisites to understand.

For example, it is advisable that you have a good footing in:

  • Statistics
  • Probability
  • Linear Algebra
  • Multivariate Statistics
  • Calculus

This gets worse if you stray slightly into some of the more exotic and interesting algorithms.

This bottom-up and algorithm fixated approach to machine learning is pervasive.

Online courses, MOOCs and YouTube videos mimic the university approach to teaching machine learning. Again, this is great if you have the background or you’ve already put in your half-to-full-decade of studies to earn those higher degrees. It does not help your average developer.

If you skulk off to a question and answer forum like Quora, StackExchange or Reddit and meekly ask how to get started, you’re slapped with the same response. Often this response comes from fellow developers who are just as lost. It’s one big echo chamber of the same bad advice.

It’s no wonder that honest and hard working developers seeking to do the right thing think they have to go back to school and get a Masters or Ph.D. before they feel qualified to “do” machine learning.

The Traditional Approach is DEAD WRONG!

Think about this bottom-up approach to teaching machine learning for a second. It’s rigorous and systematic and sounds like the right idea on the surface. How could it be wrong?

Bottom-Up Programming (or, how to kill off budding programmers)

Imagine you’re a young developer. You’ve picked up some of this and that language and you’re starting to learn how to create standalone software.

You tell friends and family that you want to get into a career where you get to program every day. They tell you that you need to do a degree in computer science before you can get a job as a programmer.

You sign up and start a computer science degree. Semester after semester you are exposed to more and more esoteric algebra, calculus and discrete math. You use antiquated programming languages. Your passion for programming and building software wavers.

The traditional approach to getting started in machine learning has a gap on the path to practitioner.

Perhaps you somehow make it to the other side. Looking back, you realize you were not taught one thing about modern software development practices, languages, tooling, or anything that you can use in your pursuit of creating and delivering software.

See the parallels to the teaching of machine learning?

Thankfully, programming has been around long enough, is popular enough and is important enough to the economy that we have found other ways to give budding young (or old) programmers the skills they need to actually do the thing they want to do – e.g. create software.

It does not make sense to load up a budding programmer’s head with theory on computability or computational complexity, or even deep details of algorithms and data structures. Some of this useful material (the latter on algorithmic complexity and data structures) can come later. Perhaps with focused material – but importantly in the context of an engineer that is already programming and delivering software, not in isolation.

Thankfully we have focused software engineering degrees. We also have resources like codecademy where you learn to program by… yep, actually programming.

If a developer wants to “do” machine learning, should they really have to go and spend a bunch of years and tens or hundreds of thousands of dollars to get the requisite math and higher degrees?

The answer is of course not! There is a better way.

A Better Approach

As with computer science, you can’t just invert the model and teach the same material top-down.

The reason is, like a computer science course never making it to the subjects that cover the practical concerns of developing and delivering software, machine learning courses and books fall well short. They stop at algorithms.

You need a top-down approach to machine learning. An approach where you focus on the actual result you want: working real machine learning problems from end-to-end using modern and “best of breed” tools and platforms.

A better approach to learning machine learning that starts with working machine learning problems end-to-end.

Here’s what I think your yellow brick road looks like.

1. Repeatable Results with a Systematic Process

Once you know some tooling, it is relatively easy to blast a problem with a machine learning algorithm and call it “done”.

This could be dangerous.

How do you know you’re done? How do you know the results are any good? How do you know the results are reliable on the dataset?

You need to be systematic when working a machine learning problem. It’s a project, like a software project, and good processes can make achieving a high-quality result repeatable from project to project.

Contemplating such a process you can think of some clear requirements, such as:

  • A process that guides you from end-to-end, from problem specification to presentation or deployment of results. Like a software project, you can think you’re done, but you’re probably not. Having the end deliverable in mind from the beginning sets an unambiguous project stop condition and focuses effort.
  • A process that is step-by-step so that you always know what to do next. Not knowing what to do next is a project killer.
  • A process that guarantees “good” results, e.g. better than average or good enough for the needs of the project. It is very common for projects to need good results delivered reliably with known confidence levels, not necessarily the very best accuracy possible.
  • A process that is invariant to the specific tools, programming languages and algorithm fads. Tools come and go and the process must be adaptive. Given the algorithm obsession in the field, there are always new and powerful algorithms coming out of academia.

Select a systematic and repeatable process that you can use to deliver results consistently.

There are many great processes out there, including some older processes that you can adapt to your needs.

Pick or adapt a process that works best for you and meets the requirements above.

2. Mapping of “Best of Breed” Tools onto Your Process

Machine learning tools and libraries come and go, but at any single point in time you have to use something that best maps onto your chosen process of delivering results.

You don’t want to evaluate and select any old algorithm or library. You want the so-called “best of breed” that is going to give you fast, reliable and high-quality results and automate as much of your process as you can afford.

Again, you are going to have to make these selections yourself. If you ask anyone, you’re going to hear their biases, often the latest tool they’re using.

I have my own biases, and I like to use different tools and platforms for different types of work.

For example, in the scenarios listed above, I would advise the following best of breed tools:

  • One-off predictive model: The Weka platform, because I can load a CSV, design an experiment and get the best model in no time at all without a line of programming (see my mapping onto the process).
  • Embedded predictive model: Python with scikit-learn, because I can develop the model in the same language in which it is deployed. IPython is a great way to demonstrate your pipeline and results to the broader team. An MLaaS is also an option for bigger data.
  • Deep-dive model: R with the caret package, because I can quickly and automatically try a lot of state-of-the-art models and devise more and more elaborate feature selection, feature engineering and algorithm tuning experiments using the whole R platform.

In reality, these three tools bleed across the three scenarios depending on the specifics of a situation.

Map your preferred machine learning tools onto your chosen systematic process for working through problems.

Like development, you need to study your tools to get the most from them. You also need to keep your ear to the ground and jump to newer better tools if and when they are available, forever adapting them to your repeatable process.

3. Targeted Practice with Semi-Formal Work Product

You get good at development by practicing – by developing lots of software. Use this familiar approach to get good at machine learning. The more of the process you practice in each project, the better (ideally, work problems end-to-end).

Carefully Select Your Practice Datasets

You want to pick datasets that are realistic rather than contrived. There are hundreds of free datasets out there of ever increasing complexity.

  • I would advise starting with small in-memory datasets from places like the UCI Machine Learning Repository. They are well known, relatively clean and a good place to start as you feel out your new process and tooling.
  • From there, I would recommend larger in-memory datasets, like those from some Kaggle and KDD cup competitions. They are a little dirtier and require you to flex more and different skills.

Stick with tabular data; this is what I advise all of my students.

Image and text data belong to different fields in their own right (computer vision and natural language processing respectively) that require you to learn methods and tooling specialized to those fields. If these are the types of problems you want or need to work on, then it might be best to start there, and there are great resources available.

I go into a lot more detail on how to do targeted practice in the post “Practice Machine Learning with Small In-Memory Datasets from the UCI Machine Learning Repository”.

Write-Up Your Results and Build A Public Portfolio of Work

Create and retain semi-formal outcomes (I refer to outcomes as “work product”) from each project. By this I mean, write up what you did and what you learned into some kind of standalone document so that you can refer back and leverage the results on future and following projects.

This is akin to keeping a directory for each programming project as a developer and reusing code and ideas from previous projects. It speeds up the journey a lot, and I strongly recommend it.

Keep any scripts, code and generated images, but it is also important to write up your findings. Think of it as akin to comments in your code. A standalone write-up could be a simple PPT or text file, or as elaborate as a presentation at a meet-up or video on YouTube.

Work through and complete discrete projects, write up results and build a portfolio of projects.


Save each project in a public version control repository (like GitHub) so that other beginners can learn from you and extend your work. Link to the projects from your blog, LinkedIn or wherever and use the public portfolio to demonstrate your growing skills and capabilities.

See more on this important idea in my post titled “Build a Machine Learning Portfolio: Complete Small Focused Projects and Demonstrate Your Skills”.

A portfolio of public GitHub repositories is fast becoming the resume in the hiring process at companies that actually care about skills and delivering results.

Yes, This Approach Is Tailored For Developers

What we have laid out above is an approach that you can use as a developer to learn, get started and make progress in machine learning.

It’s natural you may have some doubts about whether this approach is really suited to you. Let me address some of your concerns.

You do not need to write code

You may be a web developer or similar where you do not write a lot of code. You can use this approach to get started and apply machine learning. Tools like Weka make it easy to devise machine learning experiments and build models without any programming at all.

Writing code can unlock more and different tools and capabilities, but it is not required, and it does not need to come first.

You do not need to be good at mathematics

Just like development where you don’t need to know a thing about computability or big-O notation to write code and ship useful and reliable software, you can work machine learning problems end-to-end without a background in statistics, probability and linear algebra.

It is important to note that we do not start with theory, but we do not ignore it. Dive in and pull out what you need on a method or algorithm, when you need it. In fact, you won’t be able to hold yourself back. The reason is, working machine learning problems is addictive and consuming. In the pursuit of getting better results and more accurate predictions, you will draw from any resources you can find, learning just enough to extract the nuggets of wisdom for you to apply on your problem.

If your goal is to master the theory, this approach is slower and less efficient. And this is why it is so uncomfortable when seen through that lens. When viewed from the goal of being a developer that does machine learning, it makes a lot of sense.

You do not need a higher degree

There are no gatekeepers to this knowledge. It’s all available and you can study it yourself, today, now. You do not need to trade a lot of time and money for a degree before you can start working on machine learning problems.

If your heart is set on getting that higher degree, why not just start working on machine learning problems first and take a look at a degree in a few weeks or months, after you have a small portfolio of completed projects built up? You will have a much clearer idea of the extent of the field and the parts you like.

I did go back and get those higher degrees. I love doing research, but I love working real problems and delivering results that clients actually care about a whole lot more. Also, I was working machine learning problems before I started the degree, I just didn’t realize I already had the resources and a path in front of me.

It’s one of the reasons I’m so passionate to convince developers like you that you have what you need to get started right now.

It is so very easy to come up with excuses to not get started in machine learning.

You do not need big data

Machine learning algorithms were developed and are best understood on small data. Data small enough for you to review in MS Excel, to load into memory and to work through on your desktop workstation.

Big data != machine learning. You can build predictive models using big data, but see this as a specialization of your skill set to a domain. I generally advise my students to start with small in-memory datasets when starting in machine learning.

If big data machine learning is the area you want to work, then start there.

You do not need a desktop supercomputer

It is true that some of the state-of-the-art algorithms like deep learning require very powerful bazillion-core GPUs. They are powerful and exciting algorithms. They are also algorithms that work on smaller problems that you can compute with your desktop CPU.

You do not need to hold off getting started in machine learning until you have access to a big, fast computer.

Before you go off and buy a desktop supercomputer or rent very large EC2 instances, it might be worth spending some time learning how to get the most from these algorithms on smaller better-understood datasets.

You do not need a lot of time

We all have busy lives, but if you really want something you need to put in the time.

I’ve said it before, working machine learning problems is addictive. If you get caught up in machine learning competitions you will gladly sacrifice a month of evening television to squeeze a few more percent from your algorithm.

That being said, if you start small with a clear process and a best-of-breed tool, you can work a dataset from end-to-end in an hour or two, perhaps spread over one or two nights. A few of these and you have a beachhead on a portfolio of completed machine learning projects that you can begin to leverage on larger and more interesting problems.

Break it down into snack-size tasks on your Kanban board and make the time to get started.

Biggest Mistakes Developers Make and How To Avoid Them

I have been giving variations on this advice for close to two years now since I launched Machine Learning Mastery. Over that time I’ve seen five common pitfalls that I want you to avoid.

  1. Not Taking Action: It’s all laid out and yet I see so many developers not take action. It is so much easier to watch TV or read news than to build a new and valuable skill in a fascinating field of study. You can lead a horse to water…
  2. Picking Problems that are Too Big: I commonly see the first or second dataset a developer selects to work on being too difficult. It’s too large, too complex or too dirty and they’re not ready for the challenge. The awful thing is that the “failure” kills motivation and the developer flunks out of the process. Pick small problems that you can finish and write up in 60 minutes. Do that for a while before you take on something bigger.
  3. Implementing Algorithms from Scratch: We have algorithm implementations. It’s done. At least done enough for you to do interesting things for the next few years. If your goal is to learn how to develop and deliver reliable and accurate predictive models, do not spend time implementing algorithms from scratch, use a library. On the other hand, if you want to focus on implementing algorithms, then clearly make this your objective and focus in on it.
  4. Not Sticking to a Process: As with agile software development, if you deviate from the process, the wheels can come off your project pretty quickly and the result is often a big mess. Sticking to a process from start-to-finish that systematically works through a problem end-to-end is key. You can revisit “that interesting thing you found…” as a follow-up mini-project (an “ideas for follow-up work” section in your write-up), but finish the process and deliver.
  5. Not Using Resources: There are many great papers, books and blog posts on machine learning. You can leverage these resources to improve your process, usage of tools and accuracy of your results. Use third party resources to get more from your algorithm and your dataset. Get ideas for algorithms and framings of the problem. A nugget of wisdom can change the course of your project. Remember, if you adopt a top-down process, the theory has to come in at the back-end. Take the time to understand your final models.

Don’t let any of these happen to you!

Your Next Step

We have covered a lot of ground and I hope I am starting to convince you that you can get started and make progress in machine learning, and that a future in which you are a developer who can do machine learning is very real and very obtainable.

Your very next steps are:

  1. Select a process (or just use this one).
  2. Select a tool or platform (or just use this one).
  3. Select your first dataset (or just use this one).
  4. Report back in the comments below and execute!

Hey, did you find this post useful? Leave a comment!

Update: Check out this handy mind map that summarizes the important concepts in this post (thanks for the suggestion Simeon!).

Machine Learning For Programmers Mind Map

A handy mind map that summarizes the important concepts in this post.

Machine Learning Performance Improvement Cheat Sheet

source: from  (Good posts sometimes disappear, so I repost it here for my and for your information.)

in category: Machine Learning_tricks4better performance

32 Tips, Tricks and Hacks That You Can Use To Make Better Predictions.

The most valuable part of machine learning is predictive modeling.

This is the development of models that are trained on historical data and make predictions on new data.

And the number one question when it comes to predictive modeling is:

How can I get better results?

This cheat sheet contains my best advice distilled from years of my own application and studying top machine learning practitioners and competition winners.

With this guide, you will not only get unstuck and lift performance, you might even achieve world-class results on your prediction problems.

Let’s dive in.

Note, the structure of this guide is based on an earlier guide that you might find useful on improving performance for deep learning titled: How To Improve Deep Learning Performance.

Machine Learning Performance Improvement Cheat Sheet
Photo by NASA, some rights reserved.


This cheat sheet is designed to give you ideas to lift performance on your machine learning problem.

All it takes is one good idea to get a breakthrough.

Find that one idea, then come back and find another.

I have divided the list into 4 sub-topics:

  1. Improve Performance With Data.
  2. Improve Performance With Algorithms.
  3. Improve Performance With Algorithm Tuning.
  4. Improve Performance With Ensembles.

The gains often get smaller the further you go down the list.

For example, a new framing of your problem or more data is often going to give you more payoff than tuning the parameters of your best performing algorithm. Not always, but in general.

1. Improve Performance With Data

You can get big wins with changes to your training data and problem definition. Perhaps even the biggest wins.

Strategy: Create new and different perspectives on your data in order to best expose the structure of the underlying problem to the learning algorithms.

Data Tactics

  • Get More Data. Can you get more or better quality data? Modern nonlinear machine learning techniques like deep learning continue to improve in performance with more data.
  • Invent More Data. If you can’t get more data, can you generate new data? Perhaps you can augment or permute existing data or use a probabilistic model to generate new data.
  • Clean Your Data. Can you improve the signal in your data? Perhaps there are missing or corrupt observations that can be fixed or removed, or outlier values outside of reasonable ranges that can be fixed or removed in order to lift the quality of your data.
  • Resample Data. Can you resample data to change the size or distribution? Perhaps you can use a much smaller sample of data for your experiments to speed things up or over-sample or under-sample observations of a specific type to better represent them in your dataset.
  • Reframe Your Problem. Can you change the type of prediction problem you are solving? Reframe your data as a regression, binary or multiclass classification, time series, anomaly detection, rating, recommender, etc. type problem.
  • Rescale Your Data. Can you rescale numeric input variables? Normalization and standardization of input data can result in a lift in performance on algorithms that use weighted inputs or distance measures.
  • Transform Your Data. Can you reshape your data distribution? Making input data more Gaussian or passing it through an exponential function may better expose features in the data to a learning algorithm.
  • Project Your Data. Can you project your data into a lower dimensional space? You can use an unsupervised clustering or projection method to create an entirely new compressed representation of your dataset.
  • Feature Selection. Are all input variables equally important? Use feature selection and feature importance methods to create new views of your data to explore with modeling algorithms.
  • Feature Engineering. Can you create and add new data features? Perhaps there are attributes that can be decomposed into multiple new values (like categories, dates or strings) or attributes that can be aggregated to signify an event (like a count, binary flag or statistical summary).

Outcome: You should now have a suite of new views and versions of your dataset.

Next: You can evaluate the value of each with predictive modeling algorithms.
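As one illustration of the "Rescale Your Data" tactic above, here is a minimal sketch of normalization and standardization using only the Python standard library. The function names and the `ages` column are made up for this example; in practice a library implementation would normally be used.

```python
from statistics import mean, pstdev


def normalize(values):
    """Rescale values to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to rescale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical numeric input column (made-up values for illustration).
ages = [20, 30, 40, 50, 60]

normalized_ages = normalize(ages)
standardized_ages = standardize(ages)
print(normalized_ages)
print(standardized_ages)
```

Each rescaled column is one more "view" of the dataset to evaluate. When a model is evaluated on held-out data, the minimum, maximum, mean and standard deviation should be computed on the training rows only and then applied to the held-out rows.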

2. Improve Performance With Algorithms

Machine learning is all about algorithms.

Strategy: Identify the algorithms and data representations that perform above a baseline of performance and better than average. Remain skeptical of results and design experiments that make it hard to fool yourself.

Algorithm Tactics

  • Resampling Method. What resampling method is used to estimate skill on new data? Use a method and configuration that makes the best use of available data. The k-fold cross-validation method with a hold out validation dataset might be a best practice.
  • Evaluation Metric. What metric is used to evaluate the skill of predictions? Use a metric that best captures the requirements of the problem and the domain. It probably isn’t classification accuracy.
  • Baseline Performance. What is the baseline performance for comparing algorithms? Use a random algorithm or a zero rule algorithm (predict mean or mode) to establish a baseline by which to rank all evaluated algorithms.
  • Spot Check Linear Algorithms. What linear algorithms work well? Linear methods are often more biased, are easy to understand and are fast to train. They are preferred if you can achieve good results. Evaluate a diverse suite of linear methods.
  • Spot Check Nonlinear Algorithms. What nonlinear algorithms work well? Nonlinear algorithms often require more data, have greater complexity but can achieve better performance. Evaluate a diverse suite of nonlinear methods.
  • Steal from Literature. What algorithms are reported in the literature to work well on your problem? Perhaps you can get ideas of algorithm types or extensions of classical methods to explore on your problem.
  • Standard Configurations. What are the standard configurations for the algorithms being evaluated? Each algorithm needs an opportunity to do well on your problem. This does not mean tune the parameters (yet) but it does mean to investigate how to configure each algorithm well and give it a fighting chance in the algorithm bake-off.

Outcome: You should now have a short list of well-performing algorithms and data representations.

Next: The next step is to improve performance with algorithm tuning.
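The "Resampling Method" and "Baseline Performance" tactics above can be sketched together: the snippet below estimates the accuracy of a zero rule classifier (always predict the most common class) with k-fold cross-validation. It is a simplified sketch with hypothetical function names and made-up labels, and it leaves out the separate hold-out validation dataset mentioned above.

```python
import random
from collections import Counter


def k_fold_indices(n_rows, k, seed=1):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def zero_rule_predict(train_labels, n_predictions):
    """Zero rule for classification: always predict the most common training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n_predictions


def cross_validate_baseline(labels, k=5, seed=1):
    """Return the accuracy of the zero rule baseline on each of the k folds."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_labels = [labels[i] for i in range(len(labels)) if i not in held_out_set]
        test_labels = [labels[i] for i in held_out]
        predictions = zero_rule_predict(train_labels, len(test_labels))
        correct = sum(p == t for p, t in zip(predictions, test_labels))
        scores.append(correct / len(test_labels))
    return scores


# Hypothetical imbalanced labels: 80 negatives and 20 positives.
labels = [0] * 80 + [1] * 20
fold_scores = cross_validate_baseline(labels, k=5)
baseline_accuracy = sum(fold_scores) / len(fold_scores)
print(fold_scores, baseline_accuracy)
```

On these made-up labels the baseline reaches about 80% accuracy without learning anything, which is one reason classification accuracy alone can be a misleading metric. Any algorithm on the short list should beat this number under the same folds.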

3. Improve Performance With Algorithm Tuning

Algorithm tuning might be where you spend most of your time. It can be very time-consuming. You can often unearth one or two well-performing algorithms quickly from spot-checking. Getting the most from those algorithms can take days, weeks or months.

Strategy: Get the most out of well-performing machine learning algorithms.

Tuning Tactics

  • Diagnostics. What diagnostics can you review about your algorithm? Perhaps you can review learning curves to understand whether the method is over or underfitting the problem, and then correct. Different algorithms may offer different visualizations and diagnostics. Review what the algorithm is predicting right and wrong.
  • Try Intuition. What does your gut tell you? If you fiddle with parameters for long enough and the feedback cycle is short, you can develop an intuition for how to configure an algorithm on a problem. Try this out and see if you can come up with new parameter configurations to try on your larger test harness.
  • Steal from Literature. What parameters or parameter ranges are used in the literature? Evaluating the performance of standard parameters is a great place to start any tuning activity.
  • Random Search. What parameters can use random search? Perhaps you can use random search of algorithm hyperparameters to expose configurations that you would never think to try.
  • Grid Search. What parameters can use grid search? Perhaps there are grids of standard hyperparameter values that you can enumerate to find good configurations, then repeat the process with finer and finer grids.
  • Optimize. What parameters can you optimize? Perhaps there are parameters like structure or learning rate that can be tuned using a direct search procedure (like pattern search) or stochastic optimization (like a genetic algorithm).
  • Alternate Implementations. What other implementations of the algorithm are available? Perhaps an alternate implementation of the method can achieve better results on the same data. Each algorithm has a myriad of micro-decisions that must be made by the algorithm implementor. Some of these decisions may affect skill on your problem.
  • Algorithm Extensions. What are common extensions to the algorithm? Perhaps you can lift performance by evaluating common or standard extensions to the method. This may require implementation work.
  • Algorithm Customizations. What customizations can be made to the algorithm for your specific case? Perhaps there are modifications that you can make to the algorithm for your data, from loss function, internal optimization methods to algorithm specific decisions.
  • Contact Experts. What do algorithm experts recommend in your case? Write a short email summarizing your prediction problem and what you have tried to one or more expert academics on the algorithm. This may reveal leading edge work or academic work previously unknown to you with new or fresh ideas.

Outcome: You should now have a short list of highly tuned algorithms on your machine learning problem, maybe even just one.

Next: One or more models could be finalized at this point and used to make predictions or put into production. Further lifts in performance can be gained by combining the predictions from multiple models.
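To make the grid search and random search tactics above concrete, here is a minimal sketch using only the Python standard library. The `evaluate` function is a hypothetical stand-in for a cross-validated skill score, and the parameter names `learning_rate` and `depth` are assumptions chosen for illustration, not tied to any particular algorithm.

```python
import itertools
import random


def evaluate(config):
    """Hypothetical stand-in for a cross-validated skill score (higher is better).

    In a real project this would train and evaluate a model using `config`.
    """
    return -((config["learning_rate"] - 0.1) ** 2) - ((config["depth"] - 6) ** 2) / 100


def grid_search(grid):
    """Evaluate every combination of the listed values and return the best one."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
    return max(candidates, key=evaluate)


def random_search(sample_config, n_trials, seed=1):
    """Evaluate n_trials randomly sampled configurations and return the best one."""
    rng = random.Random(seed)
    candidates = [sample_config(rng) for _ in range(n_trials)]
    return max(candidates, key=evaluate)


def sample_config(rng):
    # Sample the learning rate on a log scale and the depth uniformly.
    return {"learning_rate": 10 ** rng.uniform(-3, 0), "depth": rng.randint(1, 12)}


# Grid search: enumerate a small grid of standard values.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 6, 10]}
best_from_grid = grid_search(grid)

# Random search: sample 50 configurations from the ranges above.
best_from_random = random_search(sample_config, n_trials=50)
print(best_from_grid, best_from_random)
```

Grid search only ever tries the values that were listed, while random search can land on values in between. Repeating grid search with a finer grid around the best configuration is the "finer and finer grids" idea from the list above.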

4. Improve Performance With Ensembles

You can combine the predictions from multiple models. After algorithm tuning, this is the next big area for improvement. In fact, you can often get good performance from combining the predictions from multiple “good enough” models rather than from multiple highly tuned (and fragile) models.

Strategy: Combine the predictions of multiple well-performing models.

Ensemble Tactics

  • Blend Model Predictions. Can you combine the predictions from multiple models directly? Perhaps you could use the same or different algorithms to make multiple models. Take the mean or mode from the predictions of multiple well-performing models.
  • Blend Data Representations. Can you combine predictions from models trained on different data representations? You may have many different projections of your problem which can be used to train well-performing algorithms, whose predictions can then be combined.
  • Blend Data Samples. Can you combine models trained on different views of your data? Perhaps you can create multiple subsamples of your training data, train a well-performing algorithm on each, then combine the predictions. This is called bootstrap aggregation or bagging and works best when the predictions from each model are skillful but in different ways (uncorrelated).
  • Correct Predictions. Can you correct the predictions of well-performing models? Perhaps you can explicitly correct predictions or use a method like boosting to learn how to correct prediction errors.
  • Learn to Combine. Can you use a new model to learn how to best combine the predictions from multiple well-performing models? This is called stacked generalization or stacking and often works well when the submodels are skillful but in different ways and the aggregator model is a simple linear weighting of the predictions. This process can be repeated multiple layers deep.

Outcome: You should have one or more ensembles of well-performing models that outperform any single model.

Next: One or more ensembles could be finalized at this point and used to make predictions or put into production.
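The simplest tactic above, "Blend Model Predictions", takes the mean (for numeric predictions) or the mode (for class labels) across models. Below is a minimal sketch, assuming each model's predictions are already available as a list with one entry per row; the function names and prediction values are made up for illustration.

```python
from collections import Counter
from statistics import mean


def blend_mean(predictions_per_model):
    """Average numeric predictions row by row (regression values or probabilities)."""
    return [mean(row) for row in zip(*predictions_per_model)]


def blend_mode(predictions_per_model):
    """Take a majority vote over class labels, row by row."""
    return [Counter(row).most_common(1)[0][0] for row in zip(*predictions_per_model)]


# Hypothetical predictions from three regression models on the same three rows.
regression_predictions = [
    [10.0, 20.0, 30.0],
    [12.0, 18.0, 33.0],
    [14.0, 22.0, 27.0],
]

# Hypothetical predictions from three classifiers on the same three rows.
class_predictions = [
    ["cat", "dog", "dog"],
    ["cat", "cat", "dog"],
    ["dog", "cat", "dog"],
]

blended_values = blend_mean(regression_predictions)
blended_labels = blend_mode(class_predictions)
print(blended_values)
print(blended_labels)
```

Using an odd number of models avoids tied votes for binary labels. Stacking replaces these fixed rules with a model that learns how to weight each submodel's predictions.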

Final Word

This cheat sheet is jam-packed full of ideas to try to improve performance on your problem.

How To Get Started

You do not need to do everything. You just need one good idea to get a lift in performance.

Here’s how to handle the overwhelm:

  1. Pick one group
    1. Data.
    2. Algorithms.
    3. Tuning.
    4. Ensembles.
  2. Pick one method from the group.
  3. Pick one thing to try from the chosen method.
  4. Compare the results, keep if there was an improvement.
  5. Repeat.
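The loop in steps 2 to 5 can be sketched as a simple keep-if-better procedure. This is only an illustration: the idea names and their lifts are invented, and the `evaluate` function stands in for re-running your own test harness with the chosen ideas applied.

```python
def improve(baseline_score, ideas, evaluate):
    """Try ideas one at a time and keep only those that lift the score."""
    best_score = baseline_score
    kept = []
    for idea in ideas:
        score = evaluate(kept + [idea])
        if score > best_score:
            # The idea helped on top of what was already kept, so keep it.
            best_score = score
            kept.append(idea)
    return kept, best_score


# Hypothetical lifts for illustration only; real numbers come from your test harness.
hypothetical_lift = {
    "rescale inputs": 0.03,
    "add date features": -0.01,
    "tune depth": 0.02,
}


def evaluate(applied_ideas):
    """Stand-in for re-evaluating the model with the given ideas applied."""
    return 0.70 + sum(hypothetical_lift[idea] for idea in applied_ideas)


kept, final_score = improve(0.70, list(hypothetical_lift), evaluate)
print(kept, final_score)
```

Each idea is compared against the best score so far rather than the original baseline, so an idea is only kept if it still helps after the earlier improvements are in place.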