Testing the model’s accuracy in ML

Before proceeding, we hope you have covered our previous articles:
- Starting in Machine Learning.
- Preparing training data in Machine Learning.
- Selecting an algorithm for processing training data.
- Training the model in Machine Learning.
In this article, we are going to test the accuracy of the model, that is, how well it can predict the likelihood of a person developing diabetes. To do so, we are going to use the testing data from the previous article.

Specifically, we are going to:
- Evaluate the model against the testing data.
- Interpret the results.
- Improve the results.
We start by evaluating the performance of the model against the training data:

Next, we evaluate the performance of the model against the testing data:
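
As a minimal sketch of what such an evaluation computes, the snippet below scores a set of predictions against the true labels in plain Python. The labels are made up for illustration (with 1 assumed to mean "diabetes" and 0 "no diabetes"), and the helper names `accuracy` and `confusion_counts` are hypothetical; none of this reproduces the results of this article.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


# Made-up labels for illustration only (assumed: 1 = diabetes, 0 = no diabetes).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
counts = confusion_counts(y_true, y_pred)
# Recall: of the people who really have diabetes, how many did we catch?
recall = counts["tp"] / (counts["tp"] + counts["fn"])

print(f"Accuracy: {acc:.2f}")
print(f"Confusion counts: {counts}")
print(f"Recall: {recall:.2f}")
```

Accuracy alone can hide a model that misses many true diabetes cases, which is why the confusion counts and recall are worth looking at when interpreting the results.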

The above results show that the performance needs to improve. For that, we have the following options:
- Adjust the current algorithm.
- Get more data or improve the data.
- Improve training.
- Switch algorithms.
At this stage, the only option available to us is the last one: switching algorithms. So, we are switching to the Random Forest algorithm, which:
- Is an ensemble algorithm.
- Fits multiple decision trees on subsets of the data.
- Averages the results of the trees to improve performance and control overfitting.
So, let's check the performance on the training data using the Random Forest algorithm:

We get excellent accuracy on the training data with Random Forest. Now, let's evaluate it on the test data:

That means our model has learned the training data too well and performs noticeably worse on the test data. This is called overfitting.
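
A minimal sketch of how this train/test gap can be measured with scikit-learn is shown below. It assumes a synthetic dataset from `make_classification` as a stand-in for the diabetes data, a 70/30 split, and default Random Forest settings; these are illustrative assumptions, so the printed numbers are not the results discussed in this article.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 600 rows, 8 features, some label noise.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, flip_y=0.1, random_state=42
)

# Assumed 70/30 split between training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Fit an ensemble of decision trees, each on a bootstrap sample of the data.
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

rf_train_acc = accuracy_score(y_train, rf_model.predict(X_train))
rf_test_acc = accuracy_score(y_test, rf_model.predict(X_test))

print(f"Random Forest training accuracy: {rf_train_acc:.4f}")
print(f"Random Forest test accuracy:     {rf_test_acc:.4f}")
# A large positive gap is the usual symptom of overfitting.
print(f"Gap (train - test):              {rf_train_acc - rf_test_acc:.4f}")
```

Comparing the two accuracies side by side, rather than looking at the training accuracy alone, is what exposes overfitting.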
So, we are going to switch to a different algorithm, Logistic Regression:
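
The same comparison can be sketched for Logistic Regression. As before, the dataset is a synthetic stand-in generated with `make_classification`, and the 70/30 split and `max_iter=1000` are assumptions chosen for illustration, so the printed numbers are not the results of this article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 600 rows, 8 features, some label noise.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, flip_y=0.1, random_state=42
)

# Assumed 70/30 split between training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# max_iter is raised (assumption) so the solver has room to converge.
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)

lr_train_acc = accuracy_score(y_train, lr_model.predict(X_train))
lr_test_acc = accuracy_score(y_test, lr_model.predict(X_test))

print(f"Logistic Regression training accuracy: {lr_train_acc:.4f}")
print(f"Logistic Regression test accuracy:     {lr_test_acc:.4f}")
print(f"Gap (train - test):                    {lr_train_acc - lr_test_acc:.4f}")
```

A simpler, linear model like this has less capacity to memorize the training data, which is one reason its training and test accuracies tend to sit closer together.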

As we can see, it has better accuracy than the previously selected algorithms, and the gap between its accuracy on the training data and on the test data is very small.
Hence, this is the model we are going to adopt for predicting whether a person is likely to develop diabetes.