Module qsarmodelingpy.filter

Implements filtering routines for dimensionality reduction.

Functions

def autocorrelation_cut(X: pandas.core.frame.DataFrame, y, cut)

Perform the matrix filtering based on autocorrelation.

Args

X : DataFrame
The matrix to be filtered
y : DataFrame
The vector of the dependent variable
cut : float
The filtering amount. If cut == 1, no cutting will be performed; if cut == 0, an empty list will be returned.

Returns

list[int]
A list of all selected indexes.

Example of use:

Let X and y be the dataset (matrix and vector). Then to perform an autocorrelation cut of 50%:

filtered_columns = autocorrelation_cut(X, y, 0.5)
filtered_X = X.loc[:,filtered_columns]
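
The return value is documented as a list of integer indexes, so how it is applied depends on how the columns of X are labelled. Whether those indexes are positions or labels is not stated here. The sketch below uses only pandas and a hand-written index list (hypothetical, not actual output of autocorrelation_cut) to contrast positional selection with iloc and label-based selection with loc:

```python
import pandas as pd

# Toy descriptor matrix with named columns (made-up values).
X = pd.DataFrame({
    "d1": [0.1, 0.2, 0.3],
    "d2": [1.0, 0.9, 0.8],
    "d3": [5.0, 5.5, 6.5],
})

# Hypothetical index list standing in for a filter's return value.
selected = [0, 2]

# Positional selection works regardless of how the columns are labelled.
by_position = X.iloc[:, selected]

# Label-based selection with the same list needs integer column labels.
X_int = X.copy()
X_int.columns = range(X_int.shape[1])
by_label = X_int.loc[:, selected]
```

With string column names such as "d1", passing the same integer list to loc would raise a KeyError, so iloc is the safer choice if the indexes turn out to be positions.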
def autoscale1(X)
def correlation_cut(X, y, cut)

Select only the columns of X whose correlation with the dependent variable y reaches the minimum given by cut.

Args

X : DataFrame
The dataframe to be cut
y : DataFrame
The vector of the dependent variable
cut : float
The minimum correlation allowed

Returns

list[int]
A list of all selected indexes.
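
To make the idea of a minimum-correlation filter concrete, here is a minimal pandas sketch. It is not the library's implementation: the helper name correlation_cut_sketch, the use of the absolute Pearson coefficient, and the >= comparison are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def correlation_cut_sketch(X: pd.DataFrame, y: pd.Series, cut: float) -> list[int]:
    """Hypothetical helper: positions of columns with |Pearson r| >= cut."""
    r = X.corrwith(y).abs()        # one coefficient per column of X
    keep = (r >= cut).to_numpy()   # a NaN coefficient compares as False
    return np.flatnonzero(keep).tolist()


# Made-up data: two columns track y (one inversely), one does not.
y = pd.Series([1.0, 2.0, 3.0, 4.0])
X = pd.DataFrame({
    "up": [2.0, 4.0, 6.0, 8.0],
    "down": [4.0, 3.0, 2.0, 1.0],
    "unrelated": [1.0, -1.0, -1.0, 1.0],
})

selected = correlation_cut_sketch(X, y, 0.5)
```

Taking the absolute value keeps strongly anti-correlated descriptors; whether correlation_cut() does the same is not stated in this page.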
def filter_matrix(X: pandas.core.frame.DataFrame, y, lj_transform: bool = False, var_cut: float = 0, corr_cut: float = 0, auto_corrcut: float = 1) ‑> pandas.core.frame.DataFrame

Perform Lennard-Jones data transformation, variance cut, correlation cut and autocorrelation cut.

While the other functions in this module return a list of indexes, filter_matrix() returns the filtered matrix (pandas.DataFrame).

Args

X : DataFrame
The matrix to be filtered.
y : DataFrame
The dependent variable
lj_transform : bool, optional
Whether or not to perform the Lennard-Jones data transformation (see qsarmodelingpy.lj_cut). Defaults to False.
var_cut : float, optional
See variance_cut(). Defaults to 0.
corr_cut : float, optional
See correlation_cut(). Defaults to 0.
auto_corrcut : float, optional
See autocorrelation_cut(). Defaults to 1.

Returns

DataFrame
The filtered matrix.
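
The difference between returning indexes and returning a filtered DataFrame can be sketched with pandas alone. The helper below is hypothetical and covers only a variance step followed by a correlation step, in the order given in the description above. It omits the Lennard-Jones transformation and the autocorrelation cut, and its thresholds (sample variance, absolute Pearson coefficient, >= comparisons) are assumptions rather than the library's documented behaviour.

```python
import pandas as pd


def filter_matrix_sketch(X: pd.DataFrame, y: pd.Series,
                         var_cut: float = 0.0,
                         corr_cut: float = 0.0) -> pd.DataFrame:
    """Hypothetical two-step filter that returns a DataFrame, not indexes."""
    # Step 1: keep columns whose sample variance reaches var_cut.
    X = X.loc[:, X.var() >= var_cut]
    # Step 2: keep columns whose |Pearson r| with y reaches corr_cut.
    X = X.loc[:, X.corrwith(y).abs() >= corr_cut]
    return X


# Made-up data: one constant column, one informative, one uncorrelated.
y = pd.Series([1.0, 2.0, 3.0, 4.0])
X = pd.DataFrame({
    "constant": [3.0, 3.0, 3.0, 3.0],
    "signal": [1.1, 1.9, 3.2, 3.9],
    "unrelated": [1.0, -1.0, -1.0, 1.0],
})

filtered = filter_matrix_sketch(X, y, var_cut=0.5, corr_cut=0.5)
```

Running the variance step first removes constant columns before any correlation is computed, which avoids undefined coefficients for them.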
def variance_cut(X, cut)

Select only the columns of X whose variance reaches the minimum given by cut.

Args

X : DataFrame
The dataframe to be cut
cut : float
The minimum variance allowed

Returns

list[int]
A list of all selected indexes.
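
A variance threshold of this kind can be illustrated in a few lines of pandas. This is a sketch under stated assumptions, not the library's code: the helper name variance_cut_sketch is hypothetical, the variance is the sample variance (pandas default, ddof=1), and columns are kept when their variance is greater than or equal to cut.

```python
import numpy as np
import pandas as pd


def variance_cut_sketch(X: pd.DataFrame, cut: float) -> list[int]:
    """Hypothetical helper: positions of columns with variance >= cut."""
    variances = X.var()                   # sample variance per column (ddof=1)
    keep = (variances >= cut).to_numpy()
    return np.flatnonzero(keep).tolist()


# Made-up data with variances 0, 1 and 100.
X = pd.DataFrame({
    "constant": [1.0, 1.0, 1.0],
    "small": [1.0, 2.0, 3.0],
    "large": [0.0, 10.0, 20.0],
})

selected = variance_cut_sketch(X, 0.5)
```

Because variance depends on the scale of each descriptor, a single threshold treats columns in different units very differently; the sketch does no scaling before comparing.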