To find out more about the Pima Indians diabetes dataset challenge, visit Kaggle.
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases.
The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.
Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.
import pandas as pd # for reading the csv file
import matplotlib.pyplot as plt # for plotting graphs
import numpy as np # for numerical manipulation
diabetes_csv = pd.read_csv("diabetes.csv") # read the csv file into a DataFrame
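As a quick illustration of what `pd.read_csv` does (using a tiny in-memory CSV with made-up values, not the real `diabetes.csv`), it parses comma-separated text into a DataFrame with named columns:

```python
import io
import pandas as pd

# Toy CSV text in the same style as diabetes.csv (invented values)
csv_text = "Glucose,BMI,Outcome\n148,33.6,1\n85,26.6,0\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)          # (2, 3): two rows, three columns
print(list(df.columns))  # ['Glucose', 'BMI', 'Outcome']
```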
%pylab inline
"""
-> this is a magic function
-> this is an Ipython command, that allows graphs to be embedded in the notebook.
-> %matplotlib, %pyplot and %pylab wotk the same way only that %pylab imports all needed
libraries for graphing using matplotlib
"""
dataset = diabetes_csv # assigning the csv to the dataset variable
dataset.head() # prints the first 5 rows of our csv
dataset.plot() # plots every column of the DataFrame against the row index
plt.show() # renders the plot in the notebook, though it can be omitted because of %pylab inline
# setting a random seed ensures reproducibility of the results
seed = 7
np.random.seed(seed)
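Seeding the generator means that any sequence of random draws can be reproduced exactly, which is what makes runs repeatable. A minimal demonstration:

```python
import numpy as np

np.random.seed(7)
a = np.random.rand(3)

np.random.seed(7)   # re-seeding restarts the generator from the same state
b = np.random.rand(3)

print(np.array_equal(a, b))  # True: both draws are identical
```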
dataset.shape # shows the number of rows and columns
dataset.dtypes # shows the data types
# descriptions; change display precision to 3 decimal places
pd.set_option('display.precision', 3)
pd.set_option('display.width', 200)
dataset.describe()
# class distribution
dataset.groupby('Outcome').size() # counts how many rows fall into each Outcome class (0 or 1)
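On a toy DataFrame (invented values), `groupby(...).size()` returns one count per class, which is how the balance between the 0 and 1 outcomes is checked:

```python
import pandas as pd

# Toy stand-in for the Outcome column
toy = pd.DataFrame({"Outcome": [0, 1, 0, 0, 1]})

counts = toy.groupby("Outcome").size()
print(counts[0], counts[1])  # 3 2: three negatives, two positives
```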
# correlation
dataset.corr(method='pearson')
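For intuition on the correlation matrix (toy data, not the diabetes columns), two perfectly linearly related columns have a Pearson correlation of exactly 1.0:

```python
import pandas as pd

# y is exactly 2 * x, so the linear relationship is perfect
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})

corr = toy.corr(method="pearson")
print(corr.loc["x", "y"])  # 1.0
```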
dataset.head(10)
# Prepare Data
array = dataset.values
X = array[:,0:8]
Y = array[:,8]
# splitting the array separates the input features (X, columns 0-7) from the target outcome (Y, column 8)
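The slicing above can be checked on a small stand-in array with the same 8-features-plus-outcome layout:

```python
import numpy as np

# Toy stand-in for dataset.values: 3 rows, 8 feature columns + 1 outcome column
array = np.arange(27).reshape(3, 9)

X = array[:, 0:8]  # all rows, columns 0-7 (features)
Y = array[:, 8]    # all rows, column 8 (outcome)

print(X.shape, Y.shape)  # (3, 8) (3,)
```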
import tensorflow as tf # importing tensorflow
from tensorflow import keras # importing keras from the tensorflow library to run any Keras-compatible code
The tf.keras version bundled with the latest TensorFlow release might not be the same as the latest keras version from PyPI (the pip installation).
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping
from keras.models import model_from_json
import os
from keras.models import Sequential
In Keras, you assemble layers to build models. A model is (usually) a graph of layers.
The most common type of model is a stack of layers: the tf.keras.Sequential model.
from keras.layers import Dense
Just your regular densely-connected NN layer
from keras.callbacks import ModelCheckpoint, EarlyStopping
Callbacks are essentially a set of functions to be applied at different stages of the training procedure.
ModelCheckpoint
- Save the model after every epoch.
EarlyStopping
- Stop training when a monitored quantity has stopped improving.
from keras.models import model_from_json
model_from_json re-creates a model's architecture from a JSON string; the weights must then be loaded separately with load_weights. By contrast, saving with model.save produces a single HDF5 file which contains:
-the architecture of the model, allowing it to be re-created
-the weights of the model
-the training configuration (loss, optimizer)
-the state of the optimizer, allowing you to resume training exactly where you left off.
model = Sequential()
model.add(Dense(1024, input_dim=8, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1024, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1024, kernel_initializer='uniform', activation='relu'))
model.add(Dense(512, kernel_initializer='uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
print(model.summary())
%%time
model.compile(loss='binary_crossentropy', optimizer='adamax', metrics=['accuracy'])
# Save the model according to the conditions
checkpoint = ModelCheckpoint(filepath="diabetes.h5", monitor='acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
"""
Save the model after every epoch.
monitor: quantity to monitor, e.g. 'acc', 'val_acc' or 'val_loss' (the 'val_*' quantities require a validation set)
verbose: verbosity mode, 0 or 1.
save_best_only: if save_best_only=True, the latest best model according to the quantity monitored will not be overwritten.
mode: one of {auto, min, max}
--> auto - will infer from the quantity to monitor
--> min - save when the monitored quantity decreases (e.g. a loss)
--> max - save when the monitored quantity increases (e.g. an accuracy)
save_weights_only: if True, only the model's weights will be saved;
otherwise the full model is saved
period: Interval (number of epochs) between checkpoints.
"""
model.save("diabetes.h5")
#early stopping in the event there is no improvement in val_acc
early = EarlyStopping(monitor='acc', min_delta=0, patience=10, verbose=1, mode='auto')
# patience is the number of epochs to wait without improvement before stopping
"""
Stop training when a monitored quantity has stopped improving.
monitor: quantity to be monitored.
min_delta: minimum change in the monitored quantity to qualify as an improvement,
i.e. an absolute change of less than min_delta, will count as no improvement.
patience: number of epochs with no improvement after which training will be stopped.
verbose: verbosity mode.
mode: one of {auto, min, max}.
In min mode, training will stop when the quantity monitored has stopped decreasing;
in max mode it will stop when the quantity monitored has stopped increasing;
in auto mode, the direction is automatically inferred from the name of the monitored quantity.
"""
%%time
model.fit(X, Y, epochs=250, batch_size=10, callbacks = [checkpoint, early])
scores = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")
# load json and create model
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
# evaluate loaded model on test data
loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
score = loaded_model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))