fasterrisk

This repository contains the source code for our NeurIPS 2022 paper:

FasterRisk: Fast and Accurate Interpretable Risk Scores

Documentation: https://fasterrisk.readthedocs.io
GitHub: https://github.com/jiachangliu/FasterRisk
PyPI: https://pypi.org/project/fasterrisk/
Free and open source software: BSD 3-Clause license

Table of Contents

Introduction
Installation
Python Usage
R tutorial
License
Contributing
Credits
Citing Our Work

Introduction

Over the last century, risk scores have been the most popular form of predictive model used in healthcare and criminal justice. Risk scores are sparse linear models with integer coefficients; often these models can be memorized or placed on an index card. Below is an example risk score that FasterRisk created on the third fold of the Adult dataset to predict whether salary > 50K.


1. No High School Diploma	-4 points	...
2. High School Diploma	-2 points	+ ...
3. Age 22 to 29	-2 points	+ ...
4. Any Capital Gains	3 points	+ ...
5. Married	4 points	+ ...
	SCORE	=


SCORE	-8	-6	-5	-4	-3	-2	-1
RISK	0.1%	0.4%	0.7%	1.2%	2.3%	4.2%	7.6%
SCORE	0	1	2	3	4	5	7
RISK	13.3%	22.3%	34.9%	50.0%	65.1%	77.7%	92.4%
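
The SCORE/RISK table above is consistent with a logistic link of the form risk = 1 / (1 + exp(-(SCORE + intercept) / multiplier)). The sketch below reproduces it. Note the assumptions: the values intercept = -3 and multiplier = 1.6 are not stated in this README and were inferred from the table itself (50.0% at SCORE 3, 13.3% at SCORE 0), and the dictionary keys are illustrative names for the five scorecard rows.

```python
import math

# Points per feature, copied from the scorecard above (keys are illustrative names).
POINTS = {
    "no_high_school_diploma": -4,
    "high_school_diploma": -2,
    "age_22_to_29": -2,
    "any_capital_gains": 3,
    "married": 4,
}

# Assumed values, inferred from the SCORE/RISK table rather than given in the text.
INTERCEPT = -3
MULTIPLIER = 1.6


def total_score(active_features):
    """Add up the points of every feature that applies to a person."""
    return sum(POINTS[name] for name in active_features)


def risk(score):
    """Map an integer score to a probability with a logistic link."""
    return 1.0 / (1.0 + math.exp(-(score + INTERCEPT) / MULTIPLIER))


# A married person with a high school diploma and some capital gains.
score = total_score(["high_school_diploma", "any_capital_gains", "married"])
print(score, f"{100 * risk(score):.1f}%")  # prints: 5 77.7%
```

Under these assumed values, every entry of the table is recovered to within rounding, which is why a risk score can be applied by hand: add the points, then look up the risk.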

Typically, risk scores have been created either without data or by rounding logistic regression coefficients, but these methods do not reliably produce high-quality risk scores. Recent work used mathematical programming, which is computationally slow.

We introduce an approach for efficiently producing a collection of high-quality risk scores learned from data. Specifically, our approach uses a beam-search algorithm to produce a pool of almost-optimal sparse continuous solutions, each with a different support set. Each of these continuous solutions is transformed into a separate risk score through a "star ray" search, where a range of multipliers is considered before rounding the coefficients sequentially to maintain low logistic loss. Our algorithm returns all of these high-quality risk scores for the user to consider. This method completes within minutes and can be valuable in a broad variety of applications.
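
To make the multiplier idea concrete, here is a deliberately simplified sketch. It is not the algorithm from the paper: the beam search is omitted, and plain nearest-integer rounding stands in for the sequential rounding step. The function names, the toy data, and the continuous solution (w, b) are all hypothetical. For each candidate multiplier, the continuous coefficients are scaled, rounded to integers, and scored by logistic loss with the score divided by the same multiplier.

```python
import math


def logistic_loss(X, y, coefs, intercept, multiplier):
    """Sum over samples of log(1 + exp(-y * (intercept + x.coefs) / multiplier)), y in {-1, +1}."""
    total = 0.0
    for row, label in zip(X, y):
        score = intercept + sum(c * v for c, v in zip(coefs, row))
        margin = label * score / multiplier
        # Numerically stable form of log(1 + exp(-margin)).
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total


def multiplier_search(X, y, w, b, multipliers):
    """Try each multiplier, round the scaled solution, and keep the lowest-loss result."""
    best = None
    for m in multipliers:
        coefs = [round(m * wj) for wj in w]  # naive rounding (assumption, see lead-in)
        intercept = round(m * b)
        loss = logistic_loss(X, y, coefs, intercept, m)
        if best is None or loss < best[0]:
            best = (loss, m, intercept, coefs)
    return best


# Toy data: two binary features, labels in {-1, +1}.
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
y = [1, -1, -1, 1, 1, -1]
w, b = [0.62, -1.3], 0.2  # hypothetical continuous solution
multipliers = [1.0, 1.5, 2.0, 2.5, 3.0]

loss, mult, intercept, coefs = multiplier_search(X, y, w, b, multipliers)
print(f"multiplier={mult}, intercept={intercept}, coefficients={coefs}, loss={loss:.3f}")
```

Larger multipliers give the integer coefficients a finer grid to land on, so rounding with multiplier 1.0 alone is only one of the candidates considered here.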

Installation

conda create -n FasterRisk python=3.9 # create a virtual environment
conda activate FasterRisk # activate the virtual environment
python -m pip install fasterrisk # pip install the fasterrisk package
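
To confirm that the install succeeded inside the activated environment, one option is to query the package metadata from Python. This is a small sketch using only the standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("fasterrisk")
print("fasterrisk version:", version if version else "not installed")
```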

Python Usage

Please see the example.ipynb Jupyter notebook on GitHub or Example Usage on Read the Docs for a detailed tutorial on how to use FasterRisk in a Python environment. Detailed descriptions of the key functions can be found in the API Reference on Read the Docs.

There are two major classes for users to interact with:

RiskScoreOptimizer

# import path assumed here; confirm it against the API Reference for your version
from fasterrisk.fasterrisk import RiskScoreOptimizer

sparsity = 5 # produce a risk score model with 5 nonzero coefficients

# import data
X_train, y_train = ...

# initialize a risk score optimizer
m = RiskScoreOptimizer(X = X_train, y = y_train, k = sparsity)

# perform optimization
m.optimize()

# get all solutions from the final diverse pool; with n_models solutions and p features,
# arr_multiplier.shape=(n_models, ), arr_intercept.shape=(n_models, ), arr_coefficients.shape=(n_models, p)
arr_multiplier, arr_intercept, arr_coefficients = m.get_models()

# get the first solution (smallest logistic loss) by passing an optional model_index;
# models are ranked in order of increasing logistic loss;
# multiplier.shape=(1, ), intercept.shape=(1, ), coefficients.shape=(p, )
multiplier, intercept, coefficients = m.get_models(model_index = 0)
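
Because the pool is ranked by logistic loss on the training data, it can be useful to compare its members on held-out data before choosing one. The sketch below shows one way to do that with plain Python lists standing in for the arrays returned by get_models(). The pool values and validation data are made up for illustration, and the scoring rule (predict +1 when (intercept + x·coefficients) / multiplier is non-negative) is an assumption about the convention rather than a statement of the library's internals.

```python
def predict_label(row, multiplier, intercept, coefficients):
    """Predict +1 or -1 from the sign of the scaled integer score (assumed convention)."""
    score = intercept + sum(c * v for c, v in zip(coefficients, row))
    return 1 if score / multiplier >= 0 else -1


def accuracy(X, y, multiplier, intercept, coefficients):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict_label(row, multiplier, intercept, coefficients) == label
               for row, label in zip(X, y))
    return hits / len(y)


# Hypothetical pool of three models over p = 3 features.
arr_multiplier = [1.6, 2.0, 1.2]
arr_intercept = [-3, -1, 0]
arr_coefficients = [[3, 4, 0], [2, 0, 1], [0, 1, -1]]

# Hypothetical validation data with labels in {-1, +1}.
X_val = [[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1], [0, 0, 0]]
y_val = [1, -1, 1, 1, -1]

accuracies = [
    accuracy(X_val, y_val, mult, icpt, coefs)
    for mult, icpt, coefs in zip(arr_multiplier, arr_intercept, arr_coefficients)
]
best_index = max(range(len(accuracies)), key=accuracies.__getitem__)
print(accuracies, best_index)  # prints: [1.0, 0.8, 0.8] 0
```

The selected index can then be passed as model_index to get_models() to retrieve that model.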

RiskScoreClassifier

# import data
X_featureNames = ... # X_featureNames is a list of strings, each of which is a feature name

# create a classifier
clf = RiskScoreClassifier(multiplier = multiplier, intercept = intercept, coefficients = coefficients, featureNames = X_featureNames)

# get the predicted label
y_pred = clf.predict(X = X_train)

# get the probability of predicting y[i] with label +1
y_pred_prob = clf.predict_prob(X = X_train)

# compute the logistic loss
logisticLoss_train = clf.compute_logisticLoss(X = X_train, y = y_train)

# get accuracy and area under the ROC curve (AUC)
acc_train, auc_train = clf.get_acc_and_auc(X = X_train, y = y_train) 

# print the risk score model card
clf.print_model_card()
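
For readers who want to see what the two reported metrics measure, here is a minimal hand-rolled version on toy values. It is an illustrative sketch, not the library's implementation; the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions. AUC is computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher predicted probability, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == -1]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Toy labels in {-1, +1} and predicted probabilities of the +1 class.
y_true = [1, -1, 1, -1]
y_prob = [0.9, 0.4, 0.35, 0.2]
y_pred = [1 if p >= 0.5 else -1 for p in y_prob]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), auc(y_true, y_prob))  # prints: 0.75 0.75
```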

R tutorial

FasterRisk can also be easily used inside R. See the R tutorial on how to apply FasterRisk on an example dataset.

License

fasterrisk was created by Jiachang Liu. It is licensed under the terms of the BSD 3-Clause license.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Credits

fasterrisk was created with cookiecutter and the py-pkgs-cookiecutter template.

Citing Our Work

If you find our work useful in your research, please consider citing the following paper:

@inproceedings{liu2022fasterrisk,
  title={FasterRisk: Fast and Accurate Interpretable Risk Scores},
  author={Liu, Jiachang and Zhong, Chudi and Li, Boxuan and Seltzer, Margo and Rudin, Cynthia},
  booktitle={Proceedings of Neural Information Processing Systems},
  year={2022}
}
