Module qsarmodelingpy.filter
Implements filtering routines for dimensionality reduction.
Functions
def autocorrelation_cut(X: pandas.core.frame.DataFrame, y, cut)
Perform the matrix filtering based on autocorrelation.
Args
X:DataFrame- The matrix to be filtered
y:DataFrame- The vector of the dependent variable
cut:float- The filtering amount. If cut == 1, no cutting will be performed; if cut == 0, an empty list will be returned.
Returns
list[int]- A list of all selected indexes.
Example of use:
Let X and y be the dataset (matrix and vector). Then, to perform an autocorrelation cut of 50%:
    filtered_columns = autocorrelation_cut(X, y, 0.5)
    filtered_X = X.loc[:, filtered_columns]
def autoscale1(X)
def correlation_cut(X, y, cut)
Select only columns of X with a minimum correlation cut with the dependent variable y.
Args
X:DataFrame- The dataframe to be filtered
y:DataFrame- The vector of the dependent variable
cut:float- The minimum correlation allowed
Returns
list[int]- A list of all selected indexes.
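The selection rule described above can be illustrated with plain pandas. This is a minimal sketch and not the library's implementation: the name correlation_cut_sketch is hypothetical, y is taken to be a pandas Series, and the use of the absolute Pearson correlation with a >= comparison is an assumption, since the text only says "minimum correlation".

```python
import pandas as pd


def correlation_cut_sketch(X: pd.DataFrame, y: pd.Series, cut: float) -> list[int]:
    """Hypothetical stand-in: positions of columns whose |Pearson r| with y is >= cut."""
    # Absolute Pearson correlation of every column of X with y, in column order.
    corr = X.corrwith(y).abs()
    # A NaN correlation (e.g. from a constant column) fails the comparison and is dropped.
    return [i for i, r in enumerate(corr) if r >= cut]


X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],    # perfectly correlated with y
    "b": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with y
    "c": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with y
})
y = pd.Series([2.0, 4.0, 6.0, 8.0])

selected = correlation_cut_sketch(X, y, 0.9)
# The sketch returns positions, so positional indexing is used here.
filtered_X = X.iloc[:, selected]
```

Because the sketch compares absolute values, the anti-correlated column "c" is kept alongside "a". If the library compares signed correlations instead, "c" would be dropped.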
def filter_matrix(X: pandas.core.frame.DataFrame, y, lj_transform: bool = False, var_cut: float = 0, corr_cut: float = 0, auto_corrcut: float = 1) -> pandas.core.frame.DataFrame
Perform Lennard-Jones data transformation, variance cut, correlation cut and autocorrelation cut.
While the other functions in this module return a list of indexes, filter_matrix() returns the filtered matrix (pandas.DataFrame).
Args
X:DataFrame- The matrix to be filtered.
y:DataFrame- The dependent variable
lj_transform:bool, optional- Whether or not to perform Lennard-Jones Data Transformation (see qsarmodelingpy.lj_cut). Defaults to False.
var_cut:float, optional- See variance_cut(). Defaults to 0.
corr_cut:float, optional- See correlation_cut(). Defaults to 0.
auto_corrcut:float, optional- See autocorrelation_cut(). Defaults to 1.
Returns
DataFrame- The filtered matrix.
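To show the difference between returning indexes and returning a filtered matrix, the sketch below chains two of the steps and returns a DataFrame. It is an illustration under stated assumptions, not the library's code: filter_matrix_sketch is a hypothetical name, the Lennard-Jones transformation and the autocorrelation cut are left out, the steps are assumed to run in the order listed above, and the variance and correlation rules (sample variance, absolute Pearson correlation, >= comparisons) are guesses.

```python
import pandas as pd


def filter_matrix_sketch(X: pd.DataFrame, y: pd.Series,
                         var_cut: float = 0.0, corr_cut: float = 0.0) -> pd.DataFrame:
    """Hypothetical two-step filter: variance cut, then correlation cut."""
    # Step 1: keep the positions of columns whose sample variance reaches var_cut.
    keep = [i for i, v in enumerate(X.var()) if v >= var_cut]
    X = X.iloc[:, keep]
    # Step 2: among the survivors, keep columns whose |Pearson r| with y reaches corr_cut.
    keep = [i for i, r in enumerate(X.corrwith(y).abs()) if r >= corr_cut]
    # Return the filtered matrix itself, not the list of positions.
    return X.iloc[:, keep]


X = pd.DataFrame({
    "constant": [1.0, 1.0, 1.0, 1.0],  # zero variance, removed in step 1
    "signal": [1.0, 2.0, 3.0, 4.0],    # tracks y exactly
    "noise": [1.0, -1.0, 1.0, -1.0],   # weak correlation, removed in step 2
})
y = pd.Series([2.0, 4.0, 6.0, 8.0])

filtered = filter_matrix_sketch(X, y, var_cut=0.01, corr_cut=0.9)
```

Running the variance step first removes constant columns, whose correlation with y is undefined, before the correlation step sees them.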
def variance_cut(X, cut)
Select only columns of X with a minimum variance cut.
Args
X:DataFrame- The dataframe to be filtered
cut:float- The minimum variance allowed
Returns
list[int]- A list of all selected indexes.
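A variance filter of this kind can be sketched in a few lines of pandas. The function name variance_cut_sketch is hypothetical, and both the sample variance (ddof=1, the pandas default) and the >= comparison are assumptions. The library may use a different estimator or a strict inequality.

```python
import pandas as pd


def variance_cut_sketch(X: pd.DataFrame, cut: float) -> list[int]:
    """Hypothetical stand-in: positions of columns whose sample variance is >= cut."""
    # Column-wise sample variance, returned in column order.
    variances = X.var()
    return [i for i, v in enumerate(variances) if v >= cut]


X = pd.DataFrame({
    "constant": [1.0, 1.0, 1.0, 1.0],  # variance 0
    "narrow": [1.0, 1.1, 0.9, 1.0],    # variance about 0.0067
    "wide": [0.0, 2.0, 4.0, 6.0],      # variance about 6.67
})

selected = variance_cut_sketch(X, 0.01)
# The sketch returns positions, so positional indexing is used here.
filtered_X = X.iloc[:, selected]
```

Only the "wide" column clears the 0.01 threshold in this toy dataset.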