Training the model in Machine Learning
In this article, we are going to train a model using the algorithm selected in the previous article.
Before proceeding, we hope you have covered our previous articles:
- Starting in Machine Learning.
- Preparing training data in Machine Learning.
- Selecting an algorithm for processing training data.
Let’s first understand the term Machine Learning Training:
Letting specific data teach a Machine Learning algorithm to create a specific prediction model.
That is the purpose of this article. Here the algorithm is “Naive Bayes” and the data is the “prepared data” that we selected in the previous articles.
So, we are going to provide the “prepared data” to the “Naive Bayes” algorithm to get the trained model in the end.
We are also going to split the “prepared data” into two parts: 70% training data and the remaining 30% testing data. The reason is that all the data we have describes real-world entities, so we need to hold back part of it to test the model and check that it can accurately predict which people will develop diabetes.
In Python, the scikit-learn package gives us what we need to get the trained model.
We have to write the following code in a Jupyter notebook to split the data:
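The original code for this step is not reproduced here, so the following is only a minimal sketch of how such a split might look with scikit-learn's train_test_split. The DataFrame is a small synthetic stand-in for the prepared data, and the names feature_col_names, predicted_class_names and split_test_size, as well as the exact set of columns, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data from the earlier articles.
# The column names and values are assumptions, not the real dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "num_preg": rng.integers(0, 10, size=100),
    "thickness": rng.integers(0, 50, size=100),
    "insulin": rng.integers(0, 300, size=100),
    "diabetes": rng.integers(0, 2, size=100),  # 0 = no diabetes, 1 = diabetes
})

feature_col_names = ["num_preg", "thickness", "insulin"]
predicted_class_names = ["diabetes"]

X = df[feature_col_names].values                 # feature columns
y = df[predicted_class_names].values.ravel()     # predicted class as a 1-D array
split_test_size = 0.30                           # hold back 30% for testing

# random_state fixes the shuffle so the same split is produced on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=split_test_size, random_state=42
)
```

With the real prepared data, only the DataFrame and the list of feature columns would need to change.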
In the above code we have taken the feature columns from our data and defined the predicted class, which has two values: 0 for no diabetes and 1 for diabetes. We have set the split test size to 0.30 because we need 30% of the data for testing. We have set random_state = 42 to seed the random shuffling, so that the split is the same every time the code runs; any fixed integer would work.
And to verify that the split came out as intended:
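One simple way to check the split, sketched here under the assumption that the arrays are named as in a typical train_test_split call, is to print the share of rows that ended up in each part. The arrays below are placeholders, not the real prepared data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the prepared features and labels.
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Share of all rows that ended up in each part, as a percentage.
train_pct = len(X_train) / len(X) * 100
test_pct = len(X_test) / len(X) * 100
print(f"{train_pct:.2f}% in training set")
print(f"{test_pct:.2f}% in test set")
```

The two percentages should come out close to 70 and 30.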
Now, we have the data divided. Let’s run the code.
df.head()
Above we can see that there are many zero values in columns such as thickness, num_preg and insulin, and these can reduce the accuracy of our model. So, we have the following options:
- Ignore the values.
- Drop the observations (rows).
- Replace the values (impute).
We will go with the last option, because ignoring or deleting the rows with missing or zero values could introduce a large bias. So, we are going to replace the zero values with the mean of the values in that particular column. This operation is called imputing, and scikit-learn provides an imputer class for it (Imputer in older releases, replaced by SimpleImputer in sklearn.impute from version 0.20).
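As a rough sketch of mean imputation, the example below uses SimpleImputer from sklearn.impute and treats 0 as the missing value. The tiny arrays and the variable name fill_0 are assumptions for illustration only. The imputer is fitted on the training data and then applied to both parts, so the test data does not influence the column means.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Tiny illustrative arrays; 0.0 marks a missing measurement.
X_train = np.array([[0.0, 35.0],
                    [2.0, 0.0],
                    [4.0, 25.0]])
X_test = np.array([[0.0, 0.0],
                   [1.0, 30.0]])

# Replace every 0 with the mean of the non-zero values in the same column.
fill_0 = SimpleImputer(missing_values=0, strategy="mean")

X_train = fill_0.fit_transform(X_train)  # learn column means from training data
X_test = fill_0.transform(X_test)        # reuse those means for the test data

print(X_train)
print(X_test)
```

In this toy example the column means learned from the training rows are 3.0 and 30.0, so those are the values that replace the zeros in both arrays.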
Finally, we have reached the last step: getting the trained model by importing Naive Bayes from scikit-learn and fitting it to the training data.
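The article does not say which Naive Bayes variant is used, so the sketch below assumes GaussianNB, which suits continuous numeric features. The training arrays are made-up placeholders and the name nb_model is chosen only for this example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Made-up training data: two features per row, label 0 or 1.
X_train = np.array([[1.0, 20.0],
                    [2.0, 22.0],
                    [8.0, 40.0],
                    [9.0, 42.0]])
y_train = np.array([0, 0, 1, 1])

# Create the Gaussian Naive Bayes model and train it on the training data.
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Printing the fitted estimator shows the trained model object.
print(nb_model)
```

After fit returns, nb_model holds the per-class statistics learned from the training rows and is ready for predict.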
In the output above, we have the trained model, which we will test in the next article and finally use to make predictions.
If you’ve found this article useful, please hit the clap button and share it with others.