Hands-on Activity Machine Learning for Diabetes Prediction

Quick Look

Grade Level: 11 (10-12)

Time Required: 1 hour 45 minutes

(two 50-minute class periods)

Expendable Cost/Group: US $0.00

Group Size: 2

Activity Dependency: None

Subject Areas: Biology, Computer Science, Data Analysis and Probability, Life Science, Measurement, Reasoning and Proof, Science and Technology

A visualization of machine learning input-output in a three step process. The process starts with visual concepts inputted through a stylized “brain” and outputted into phrases.
The machine learning input-output process, visualized.
Copyright © 2018 Francesco Pettini, CC-BY-4.0, Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Machine_learning.jpg


Machine learning is an exciting method that engineers can use to understand large data sets. In this hands-on activity, students put on their computer science hats to tackle a real-world problem: designing a machine learning model that can predict whether a patient has diabetes. Students first learn about the diabetes epidemic and the relationship between machine learning and healthcare. They then design a simple program that uses machine learning to predict whether a patient has diabetes based on various symptoms and measurements. The goal is to expose students not only to machine learning but also to the realities of the diabetes epidemic.
This engineering curriculum aligns to Next Generation Science Standards (NGSS).

Engineering Connection

Engineers use machine learning in applications across many industries. Recent research studies have explored the benefits of using machine learning to detect anomalies in healthcare data. Machine learning engineers continually refine the technology to improve the performance and accuracy of its results, for example in detecting diabetes cases. Much research has gone into non-invasive, automated detection of diabetes using machine learning techniques. A typical machine learning workflow follows three steps: data preprocessing, feature selection, and classification.
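As a rough illustration of the preprocessing, feature selection, and classification steps, the sketch below chains them together with scikit-learn. The records, the column order, and the choice of a chi-squared feature selector and a decision tree classifier are assumptions made for illustration; they are not taken from the activity's notebook.

```python
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Made-up records for illustration only (not real patient data).
# Assumed column order: age, polyuria, sudden weight loss, obesity.
raw_rows = [
    [45, "Yes", "Yes", "No"],
    [52, "Yes", "No", "Yes"],
    [38, "No", "No", "No"],
    [61, "Yes", "Yes", "Yes"],
    [29, "No", "No", "Yes"],
    [57, "No", "Yes", "No"],
    [33, "No", "No", "No"],
    [48, "Yes", "Yes", "No"],
]
labels = ["Positive", "Positive", "Negative", "Positive",
          "Negative", "Positive", "Negative", "Positive"]


def preprocess(row):
    """Step 1, data preprocessing: keep age, turn Yes/No answers into 1/0."""
    return [row[0]] + [1 if answer == "Yes" else 0 for answer in row[1:]]


X = [preprocess(row) for row in raw_rows]

model = make_pipeline(
    SelectKBest(chi2, k=2),                  # Step 2, feature selection
    DecisionTreeClassifier(random_state=0),  # Step 3, classification
)
model.fit(X, labels)

# Predict the class of one new, made-up patient.
predictions = model.predict([preprocess([50, "Yes", "No", "No"])])
print(predictions[0])
```

With so few rows the prediction itself means little; the point is only to show where each of the three steps sits in the code.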

Learning Objectives

After this activity, students should be able to:

  • Participate in an inquiry-based activity in which they hypothesize which columns of a dataset, each representing a different symptom, best predict whether a person has diabetes.
  • Use machine learning skills to design a model to make the predictions they hypothesized.
  • Use the model to test other symptom data and refine their research.
  • Use Google Colab and Python as introductory machine learning tools for predicting diabetes.

Educational Standards

Each TeachEngineering lesson or activity is correlated to one or more K-12 science, technology, engineering or math (STEM) educational standards.

All 100,000+ K-12 STEM standards covered in TeachEngineering are collected, maintained and packaged by the Achievement Standards Network (ASN), a project of D2L (www.achievementstandards.org).

In the ASN, standards are hierarchically structured: first by source; e.g., by state; within source by type; e.g., science or mathematics; within type by subtype, then by grade, etc.

NGSS Performance Expectation

HS-ETS1-4. Use a computer simulation to model the impact of proposed solutions to a complex real-world problem with numerous criteria and constraints on interactions within and between systems relevant to the problem. (Grades 9 - 12)

This activity focuses on the following Three Dimensional Learning aspects of NGSS:
Science & Engineering Practices: Use mathematical models and/or computer simulations to predict the effects of a design solution on systems and/or the interactions between systems.

Disciplinary Core Ideas: Both physical models and computers can be used in various ways to aid in the engineering design process. Computers are useful for a variety of purposes, such as running simulations to test different ways of solving a problem or to see which one is most efficient or economical; and in making a persuasive presentation to a client about how a given design will meet his or her needs.

Crosscutting Concepts: Models (e.g., physical, mathematical, computer models) can be used to simulate systems and interactions, including energy, matter, and information flows, within and between systems at different scales.

  • Connect technological progress to the advancement of other areas of knowledge and vice versa. (Grades 9 - 12)

Materials List

Equipment and Materials

Worksheets and Attachments

Visit [www.teachengineering.org/activities/view/rice-2622-machine-learning-diabetes-prediction] to print or download.

Pre-Req Knowledge

Have basic computer skills.

Be familiar with Excel and Excel spreadsheet data.

Have the ability to read a histogram and interpret values in a dataset.


What is machine learning? [Let students raise their hands and give answers.] Machine learning is an application of artificial intelligence that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. It focuses on developing computer algorithms that learn from data to perform specific tasks.

Who was the first person to invent machine learning? Can you think of anything that could need machine learning? Self-driving cars, mobile voice assistants from technology companies like Apple (Siri), Amazon (Alexa) or Microsoft (Cortana) or even your email inbox all rely on some form of machine learning. How do we use machine learning across many disciplines? What role does machine learning play in improving our lives?

Machine learning is used in internet search engines, email filters to sort out spam, websites to make personalized recommendations, banking software to detect unusual transactions, and many apps on our phones such as voice recognition and predictive text autocomplete.

There are three types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

So, why is machine learning important in healthcare? Machine learning might positively affect patient care outcomes. One example of a machine learning application in healthcare includes disease identification and diagnosis by analyzing thousands of healthcare records and other patient data. This can help increase healthcare access in underserved communities and developing countries by streamlining disease diagnosis and treatment in a low-cost way.

Machine learning could be a computer algorithm that predicts a medical imaging diagnosis by performing image processing on a patient’s high-resolution x-rays, CT scans, or MRIs. Intelligent systems built on machine learning algorithms can learn from experience or historical data to make decisions like a medical diagnosis.

What do you want the machine learning system to do? The possibilities are limitless.



In this hands-on activity, students put on their computer science hats to tackle a real-world problem: creating a machine learning model that can predict whether a patient has diabetes. Machine learning is a rapidly growing field of computer science that is attracting heavy investment from companies in almost every industry, but it is often left unexplored in the classroom because of its complexity. In this activity, students create a simple program that uses machine learning to predict whether a patient has diabetes based on various symptoms and measurements. The goal is to expose students not only to machine learning but also to the realities of the diabetes epidemic. The activity makes machine learning accessible and highlights possible applications of the student-created program in a health and wellness context.

Students use the engineering design process to design, create, and test machine learning models. They explore the connections among computer science, machine learning, and healthcare through various articles and web resources. Students use Python (a programming language), Google Colab (a programming environment), and two popular Python libraries available within Google Colab, scikit-learn (for machine learning) and Seaborn (for data visualization), to build their prototype and visualize their data. The challenge exposes students to programming, machine learning, and various libraries, and helps them see how programming and engineering can be used to solve health problems. Finally, students test their machine learning model by hypothesizing which features in the dataset will provide the most accurate prediction model, conducting some simple experimentation, and comparing their results.

Before the Activity

With the Students

Day 1 (50 minutes)

  1. Warm-Up/Hook (5 min): How many people in the US suffer from diabetes? What about the entire world?
  2. Class Discussion (5 min): Have students do a think-pair-share on the impact of diabetes in their local community.
    • Discussion Goal: Students should understand that diabetes is an epidemic that affects many people around the world. There is also a higher impact on African-American and Latino populations.
  3. Provide students with the Pre-Activity Reading Assessment.
  4. Divide the class into groups of two or three students each.
  5. Small-Group Reading (10 min): Pass out the Pre-Activity Reading Worksheet and ask students to work in pairs to read the first three articles, discuss them, and take notes about Type 1 and Type 2 Diabetes.
  6. Formative Assessment (10 min): After students have finished the readings and taken notes, pass out the Pre-Activity Assessment and have students complete their answers to Part 1. Ask students to share their answers with another group and the class.
  7. Small-Group Reading (10 min): Allow students to read the last article about machine learning and take notes.
  8. Formative Assessment (10 min): After students have finished the readings and taken notes, allow them time to complete their answers to Part 2.
    • Go over student responses to Part 2, taking time to cover questions 9 and 10.

Day 2 (50 minutes)

  1. Warm-Up/Hook (5 min): Distribute the diabetes_data_upload dataset to the students and explain that each row contains the data for one patient and each column represents a different piece of information. Ask students to review the data with their group and write a hypothesis about which columns could best be used to create a machine learning model that predicts whether a patient has diabetes. The last column, class, tells whether the patient is diabetic and should not be included in the hypothesis.
  2. Guided Coding Activity (25 min): This activity requires the Google Colab notebook (https://colab.research.google.com/drive/18rp9-NWgS8GaCymg7_8LxXgWyPWud68P?usp=sharing). Students access the notebook and complete the guided coding activity.
  3. Summative Assessment (10 min): If time permits, ask the students to create a rapid digital artifact (e.g., a Flipgrid video or Google Slides) to present to the class the results of their experiments with their machine learning model.
  4. Gallery Walk/Wrap Up (5 min): Allow students to walk around the room and observe other students’ digital artifacts. Discuss as a class whether machine learning can reliably be used to predict whether a patient has diabetes.
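
The dataset layout described in the Day 2 warm-up (one patient per row, one piece of information per column, with class as the last column) can be sketched in a few lines of Python. The rows below are made up, and the column names other than class are assumptions about the file, so treat this as an illustration rather than a copy of the real dataset.

```python
import csv
import io

# A few made-up rows in the assumed layout of the dataset file.
SAMPLE_CSV = """Age,Polyuria,Obesity,class
40,No,Yes,Positive
58,No,No,Negative
41,Yes,No,Positive
45,No,No,Negative
"""

# Each row is one patient; each column is one piece of information.
patients = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# The class column is the answer the model should predict,
# so it is kept apart from the columns used in a hypothesis.
label_column = "class"
feature_columns = [name for name in patients[0] if name != label_column]
labels = [patient[label_column] for patient in patients]

print("Candidate columns for a hypothesis:", feature_columns)
print("Labels:", labels)
```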

A bar chart showing two columns. One taller blue column representing positive diabetes patients and one shorter orange column representing negative diabetes patients. The graph shows there are more patients with diabetes in the dataset.
Visualize your data and show how many patients have diabetes
Copyright © 2021 Daniel Angel, Rice University RET (adapted from 2021 D-Angel, CC BY-SA-3.0)
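
A chart like the one described above could be produced roughly as follows. This is a minimal sketch that assumes the data is held in a pandas DataFrame with a class column; the eight rows are made up and the title is only a suggestion.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; remove this line in Colab, where plots show inline

import pandas as pd
import seaborn as sns

# Made-up rows for illustration only (not real patient data).
df = pd.DataFrame({
    "Age": [40, 58, 41, 45, 60, 55, 35, 48],
    "class": ["Positive", "Negative", "Positive", "Negative",
              "Positive", "Positive", "Negative", "Positive"],
})

# Count the patients in each class, then draw one bar per class.
counts = df["class"].value_counts()
ax = sns.countplot(data=df, x="class")
ax.set_title("Patients by diabetes class")

print(counts)
```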

A bar chart showing the number of patients per age category. Each column contains an orange section with the number of negative diabetes patients and a blue section showing positive diabetes patients in the dataset. Most patients are in the 30-70 age group.
Using Seaborn in Google Colab to visualize your data is easy!
Copyright © 2021 Daniel Angel, Rice University RET (adapted from 2021 D-Angel, CC BY-SA-3.0)

A green rectangle asking if the patient is obese. Arrows show a yes response leads to a positive diabetes diagnosis and a no response leads to a negative diabetes diagnosis.
A simple decision tree
Copyright © 2021 Daniel Angel, Rice University RET (adapted from 2021 D-Angel, CC BY-SA-3.0)
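
The one-question tree in the figure can be written out by hand as a short function. The sketch below does that in plain Python and checks it against four made-up patients; the Obesity column name is an assumption, and a real model would learn its questions from the data instead of having them written in.

```python
def simple_decision_tree(patient):
    """One-question tree: predict from the obesity answer alone."""
    if patient["Obesity"] == "Yes":
        return "Positive"
    return "Negative"


# Made-up patients for illustration only.
patients = [
    {"Obesity": "Yes", "class": "Positive"},
    {"Obesity": "No", "class": "Negative"},
    {"Obesity": "No", "class": "Positive"},
    {"Obesity": "Yes", "class": "Positive"},
]

predictions = [simple_decision_tree(p) for p in patients]

# Accuracy is the share of patients whose prediction matches the true class.
correct = sum(pred == p["class"] for pred, p in zip(predictions, patients))
accuracy = correct / len(patients)
print(accuracy)  # 0.75: the third patient is misclassified
```

A single question misclassifies the third patient, which is one reason practical decision trees ask several questions in sequence.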


classification: The process of identifying a set of categories based on observations and then assigning each observation to the category where it belongs.

dataset: A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer.

decision trees: A decision support tool that uses a model with branches (tree-like), organized by their possible outcomes; a popular supervised learning method used for classification problems in machine learning.

diabetes: A disease affecting a person’s blood glucose or blood sugar; levels of blood sugar that are too high cause serious health problems over time.

features: An individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating and independent features is a crucial element of effective algorithms in pattern recognition, classification, and regression.

machine learning: The design of computers or computing systems that can discover how to perform tasks without being explicitly programmed to do so.


Pre-Activity Assessment

Students will be assigned readings to learn about Type 1 and Type 2 Diabetes and Machine Learning. To create a model that determines whether a person has diabetes, students will need to know about the disease, the symptoms, the risks, and the complications. They will also need to understand what machine learning is to create their model. Students will be provided with a checklist to complete before taking the Pre-Activity Reading Assessment. Example questions:

  • What are the causes of type 1 and 2 Diabetes?
  • List 2 risk factors for type 1 and type 2 Diabetes.
  • Define machine learning and list some examples of how it can be used.

Activity Embedded (Formative) Assessment

Students run code that uses Seaborn, a popular data visualization library, to create histograms from the provided dataset. For example, students could write code that plots a histogram of the ages of the patients who have diabetes and then answer the following question: What observations can you make about the patients' age data?

Post-Activity (Summative) Assessment

Students run their own experiments, testing different features from the dataset. For example, the original model uses the patient’s age and weight loss; students can change which features the model uses.

Run your experiments!

Try using different features to see if you can improve the performance of your model.
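
An experiment of this kind might look like the sketch below, which trains the same kind of model on two different feature sets and compares the accuracy of each. The records are made up and already encoded as numbers, the column names are assumptions, and the helper run_experiment is hypothetical; the guided notebook may organize its code differently.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up, already-encoded records (1 = Yes, 0 = No); not real patient data.
COLUMNS = ["Age", "Polyuria", "sudden weight loss", "Obesity"]
ROWS = [
    [45, 1, 1, 0], [52, 1, 0, 1], [38, 0, 0, 0], [61, 1, 1, 1],
    [29, 0, 0, 1], [57, 0, 1, 0], [33, 0, 0, 0], [48, 1, 1, 0],
    [66, 1, 0, 0], [41, 0, 0, 1], [54, 1, 1, 1], [36, 0, 0, 0],
]
LABELS = ["Positive", "Positive", "Negative", "Positive",
          "Negative", "Positive", "Negative", "Positive",
          "Positive", "Negative", "Positive", "Negative"]


def run_experiment(feature_names):
    """Train a decision tree on the chosen columns and return its test accuracy."""
    indexes = [COLUMNS.index(name) for name in feature_names]
    X = [[row[i] for i in indexes] for row in ROWS]
    X_train, X_test, y_train, y_test = train_test_split(
        X, LABELS, test_size=0.25, random_state=0, stratify=LABELS
    )
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


# Compare two hypotheses about which features predict diabetes best.
results = {
    "age + weight loss": run_experiment(["Age", "sudden weight loss"]),
    "polyuria + obesity": run_experiment(["Polyuria", "Obesity"]),
}
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}")
```

With only a dozen made-up rows the scores are not meaningful; on the full dataset the same comparison shows which hypothesis holds up better.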

Investigating Questions

What symptoms are highly correlated to diabetes?

Can we use machine learning to predict whether a patient has diabetes?

How can machine learning affect the health outcomes of people who live in underserved communities?
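
For the first question, one simple starting point is to compare how often patients with and without a symptom are diabetic. The sketch below does this in plain Python on made-up records; it is a comparison of rates, not a formal correlation coefficient, and the symptom names are assumptions about the dataset's columns.

```python
# Made-up records for illustration only (not real patient data).
records = [
    {"Polyuria": "Yes", "Itching": "Yes", "class": "Positive"},
    {"Polyuria": "Yes", "Itching": "Yes", "class": "Positive"},
    {"Polyuria": "Yes", "Itching": "No", "class": "Positive"},
    {"Polyuria": "Yes", "Itching": "No", "class": "Positive"},
    {"Polyuria": "No", "Itching": "No", "class": "Positive"},
    {"Polyuria": "No", "Itching": "Yes", "class": "Negative"},
    {"Polyuria": "No", "Itching": "Yes", "class": "Negative"},
    {"Polyuria": "No", "Itching": "No", "class": "Negative"},
]


def positive_rate(rows, symptom, answer):
    """Share of diabetic patients among those who gave this answer."""
    group = [row for row in rows if row[symptom] == answer]
    return sum(row["class"] == "Positive" for row in group) / len(group)


# A large gap between the "Yes" and "No" rates suggests the symptom
# is strongly related to the diagnosis in this sample.
gaps = {}
for symptom in ["Polyuria", "Itching"]:
    gaps[symptom] = (positive_rate(records, symptom, "Yes")
                     - positive_rate(records, symptom, "No"))

print(gaps)  # {'Polyuria': 0.75, 'Itching': -0.25}
```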

Troubleshooting Tips

It is recommended that the teacher take on the role of lead learner. Familiarize yourself with the Google Colab environment by completing the guided coding activity before the lesson. Consider watching the Get Started with Colab YouTube video and going over the Overview of Colab Features notebook.

Activity Scaling

  • For English language learners, chunk and scaffold the readings. Introduce vocabulary before reading. Consider selecting excerpts from the articles and modifying language to reduce the reading load.
  • For more advanced students, allow them the opportunity to experiment with the code and create different data visualizations using the Seaborn library, or to try different supervised learning models from scikit-learn.

Additional Multimedia Support

Overview of Colab Features (https://colab.research.google.com/notebooks/basic_features_overview.ipynb)

Get Started with Colab Video (https://www.youtube.com/watch?v=inN8seMm7UI)



NW Primary Care. What You Need to Know About Type 1 and Type 2 Diabetes. NWPC Blog. 2020.

Pant, Ayush. Introduction to Machine Learning for Beginners. Towards Data Science. January 7, 2019.

Mayo Clinic. Type 1 Diabetes - Symptoms and Causes; Type 2 Diabetes - Symptoms and Causes. 1998-2021 Mayo Foundation for Medical Education and Research (MFMER).


© 2022 by Regents of the University of Colorado; original © 2021 Rice University


Daniel Angel; Sheryl Adams; Amira Wallace

Supporting Program

Precise Advanced Technology and Health Systems for Underserved Populations (PATHS-UP) Research Experience for Teachers, Rice Office of STEM Engagement and Department of Electrical and Computer Engineering, Rice University


This activity was developed as part of the Research Experience for Teachers through the Office of STEM Engagement and the Department of Electrical and Computer Engineering at Rice University supported by the National Science Foundation under grant no. NSF EEC – 1648451. Any opinions, findings, and conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or Rice University.

Special acknowledgments to Carolyn Nichol and Allen Antoine with the Rice University Office of STEM Engagement, and Tianyi Zhang with the Computational Imaging Lab at Rice University, for their help and support in developing this activity.

Last modified: June 10, 2022
