The demand for machine learning skills is increasing in the biomedical research community, as these approaches provide a useful framework for asking and answering questions of large, multi-dimensional datasets.
fredhutch.io maintains three short courses to introduce machine learning applications for biomedical research:
- Concepts in Machine Learning: does not require coding
- Intermediate Python: Machine Learning: (currently under revision) requires prior experience with Python
- Intermediate R: Machine Learning: (currently in development) requires prior experience with R
The courses listed above provide a general introduction to machine learning, but you may also find it useful to practice applying these skills to biomedically relevant questions. The resources on this page help bridge the gap between introductory learning and applying the methods to research questions. The following resources apply machine learning to biomedical research data and include both explanations of the code as it is introduced and challenge questions to assess your knowledge. They include materials suitable for researchers at beginning, intermediate, and advanced levels of coding expertise:
If you are looking for additional datasets for practicing machine learning skills, the following links include biomedically relevant data that you may find useful.
Previously published data from Data Dryad:
- Glaucoma diagnosis
- Triple negative breast cancer imaging
- Immunohistochemical typing of adenocarcinomas
- Computer aided diagnosis in breast and prostate cancer
The following sites list multiple datasets, and/or have aggregated data from multiple sources:
- 15 Open Datasets for Healthcare from ODSC - Open Data Science
- 15 Open Healthcare and Medical Datasets for Machine Learning - Lionbridge
- OpenNeuro: A free and open platform for sharing MRI, MEG, EEG, iEEG, ECoG, and ASL data
- Medical Data for Machine Learning - Andrew Beam
- CT Colonography - Cancer Imaging Archive
- UC Irvine Machine Learning Repository
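
As a starting point for practice, the sketch below fits a simple baseline classifier in Python. It is only an illustration and is not taken from any of the courses or resources above. It assumes scikit-learn is installed and uses the copy of the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn, a dataset that is also hosted by the UC Irvine Machine Learning Repository. The choice of logistic regression, the 25% held-out split, and the random seed are arbitrary assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the bundled Breast Cancer Wisconsin (Diagnostic) data:
# X holds numeric features per sample, y holds the binary diagnosis label.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples for evaluation (an arbitrary choice),
# keeping the class balance the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize the features, then fit a logistic regression baseline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Report accuracy on the held-out samples only.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

The same pattern (load, split, fit, evaluate on held-out data) should carry over to the datasets linked above, although each one will need its own loading and cleaning steps.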