
Application to implement the Joint mutual information maximisation (JMIM) algorithm by CLI given a path to a CSV file.


DeraUchenwoke/jmim


Joint Mutual Information Maximisation (JMIM) CLI Application

A bit of context

This repository follows on from my final year project. The aim of the project was to reduce the memory consumption of a LightGBM model used for malware detection. To accomplish this, I used the Joint Mutual Information Maximisation (JMIM) algorithm to select the $K$ most 'important' variables to train the model on. Since no public, pre-built JMIM package was available at the time, I wrote the algorithm in Python from scratch.

The Problem

My original implementation has the following issues:

  1. It is not language-agnostic, i.e., it can only be used from Python programs.
  2. It takes a slow, naive approach to handling discrete and continuous variables.

The Solution

The purpose of this repository is to provide a CLI tool that uses JMIM to select the $K$ most 'important' variables from a CSV file of discrete and/or continuous random variables.
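To make the selection step concrete, here is a minimal Python sketch of greedy JMIM selection. It is not the code in this repository: the function names (`entropy`, `jmim_select`, and so on) and the dict-of-columns input are hypothetical, and it assumes every column is already discrete, so continuous variables would need to be binned first. The idea is to seed the selection with the variable that shares the most information with the target, then repeatedly add the candidate whose smallest joint mutual information with any already selected variable is largest.

```python
import math
from collections import Counter


def entropy(*columns):
    """Shannon entropy (in bits) of the joint distribution of the given columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mutual_information(x, target):
    """I(X; C) = H(X) + H(C) - H(X, C) for discrete columns."""
    return entropy(x) + entropy(target) - entropy(x, target)


def joint_mutual_information(x, y, target):
    """I(X, Y; C) = H(X, Y) + H(C) - H(X, Y, C) for discrete columns."""
    return entropy(x, y) + entropy(target) - entropy(x, y, target)


def jmim_select(features, target, k):
    """Greedily pick k column names using the JMIM criterion.

    features: dict mapping a (hypothetical) column name to a list of discrete values.
    target:   list of discrete class labels, one per row.
    """
    remaining = dict(features)
    # Seed with the single variable that shares the most information with the target.
    first = max(remaining, key=lambda name: mutual_information(remaining[name], target))
    selected = [first]
    del remaining[first]

    while remaining and len(selected) < k:
        def score(name):
            # Worst-case joint information of this candidate paired with each selected variable.
            return min(
                joint_mutual_information(remaining[name], features[s], target)
                for s in selected
            )

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected
```

Calling `jmim_select(columns, labels, k)` returns the chosen column names in selection order. Ties are broken by column order, because `max` returns the first maximal item it sees. The sketch recomputes every entropy from scratch on each iteration, which keeps it short but is the kind of naive approach a faster implementation would avoid.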

A dash of mathematics

For those interested in how the algorithm was derived, see section 3.4.1 of my final year project paper.
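For quick reference, the JMIM selection criterion as it is usually stated in the feature selection literature is given below. The notation is an assumption and may differ from the paper: $F$ is the full set of variables, $S$ the set already selected, $C$ the target variable, $I$ mutual information and $H$ Shannon entropy.

$$
f_{\mathrm{JMIM}} = \arg\max_{f_i \in F \setminus S} \; \min_{f_s \in S} \; I(f_i, f_s; C)
$$

$$
I(f_i, f_s; C) = H(f_i, f_s) + H(C) - H(f_i, f_s, C)
$$

The first variable is chosen as the one maximising $I(f_i; C)$, and the step above is repeated until $K$ variables have been selected.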

Installation

For Windows users:

  1. Clone the repository: git clone https://github.com/DeraUchenwoke/jmim.git
  2. In a PowerShell terminal, run cd scripts followed by .\setup.ps1.

Usage

Examples

Style guide & practices

Google style guide. PowerShell practice and style.

The tool was written in line with the C++ Core Guidelines.
