Module qsarmodelingpy.filter
Implements filtering routines for dimensionality reduction.
Functions
def autocorrelation_cut(X: pandas.core.frame.DataFrame, y, cut)
-
Perform the matrix filtering based on autocorrelation.
Args
X
:DataFrame
- The matrix to be filtered
y
:DataFrame
- The vector of the dependent variable
cut
:float
- The filtering amount. If cut == 1, no cutting will be performed; if cut == 0, an empty list will be returned.
Returns
list[int]
- A list of all selected indexes.
Example of use:
Let X and y be the dataset (matrix and vector). Then to perform an autocorrelation cut of 50%:
    filtered_columns = autocorrelation_cut(X, y, 0.5)
    filtered_X = X.loc[:, filtered_columns]
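The documentation above does not describe how the autocorrelation filter chooses columns. As a rough illustration only, the sketch below uses one common approach: whenever two descriptors are correlated with each other above the threshold, the one less correlated with y is dropped. The function name autocorrelation_cut_sketch, the pairwise rule, and the use of absolute Pearson correlation are all assumptions, not the library's implementation; only the documented edge cases (cut == 1 and cut == 0) are taken from the text.

```python
import pandas as pd


def autocorrelation_cut_sketch(X: pd.DataFrame, y: pd.Series, cut: float) -> list:
    """Hypothetical stand-in, NOT the qsarmodelingpy implementation."""
    n_cols = X.shape[1]
    # Documented edge cases: cut == 0 returns an empty list, cut == 1 cuts nothing.
    if cut <= 0:
        return []
    if cut >= 1:
        return list(range(n_cols))

    # Assumption: absolute Pearson correlation between every pair of columns.
    inter = X.corr().abs().to_numpy()
    # Assumption: absolute Pearson correlation of each column with y.
    with_y = X.corrwith(y).abs().fillna(0.0).to_numpy()

    keep = set(range(n_cols))
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            if i in keep and j in keep and inter[i, j] > cut:
                # Drop whichever column of the pair is less correlated with y.
                keep.discard(i if with_y[i] < with_y[j] else j)
    return sorted(keep)


# Toy data: "b" is almost a copy of "a"; "c" is weakly related to both.
X_demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, 3.0, 4.0, 6.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
y_demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

selected = autocorrelation_cut_sketch(X_demo, y_demo, 0.9)
filtered_demo = X_demo.iloc[:, selected]  # positional selection of the kept columns
```

Because this sketch returns positional indexes, it selects columns with iloc; whether the library's returned indexes are positions or labels is not stated in the text above.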
def autoscale1(X)
def correlation_cut(X, y, cut)
-
Select only the columns of X that have a minimum correlation of cut with the dependent variable y.
Args
X
:DataFrame
- The DataFrame to be filtered
y
:DataFrame
- The vector of the dependent variable
cut
:float
- The minimum correlation allowed
Returns
list[int]
- A list of all selected indexes.
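A minimal sketch of a correlation threshold of this kind, assuming absolute Pearson correlation and an inclusive comparison (neither is confirmed by the documentation). The name correlation_cut_sketch is hypothetical and stands in for the library function.

```python
import pandas as pd


def correlation_cut_sketch(X: pd.DataFrame, y: pd.Series, cut: float) -> list:
    """Hypothetical stand-in: keep columns whose |Pearson r| with y is >= cut."""
    # Correlation of every column of X with y, taken in absolute value (assumption).
    corr = X.corrwith(y).abs()
    # Positional indexes of the columns that reach the threshold.
    # NaN correlations (e.g. constant columns) compare False and are dropped.
    return [i for i, r in enumerate(corr) if r >= cut]


# Toy data: "a" and "c" track y perfectly (directly and inversely), "b" only weakly.
X_demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, -1.0, 1.0, -1.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})
y_demo = pd.Series([2.0, 4.0, 6.0, 8.0])

selected = correlation_cut_sketch(X_demo, y_demo, 0.5)
```

In this sketch the inversely correlated column "c" is kept because the absolute value is used; a signed comparison would drop it.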
def filter_matrix(X: pandas.core.frame.DataFrame, y, lj_transform: bool = False, var_cut: float = 0, corr_cut: float = 0, auto_corrcut: float = 1) ‑> pandas.core.frame.DataFrame
-
Perform Lennard-Jones data transformation, variance cut, correlation cut and autocorrelation cut.
While the other functions in this module return a list of indexes, filter_matrix() returns the filtered matrix (a pandas.DataFrame).
Args
X
:DataFrame
- The matrix to be filtered.
y
:DataFrame
- The dependent variable
lj_transform
:bool, optional
- Whether or not to perform the Lennard-Jones data transformation (see qsarmodelingpy.lj_cut). Defaults to False.
var_cut
:float, optional
- See variance_cut(). Defaults to 0.
corr_cut
:float, optional
- See correlation_cut(). Defaults to 0.
auto_corrcut
:float, optional
- See autocorrelation_cut(). Defaults to 1.
Returns
DataFrame
- The filtered matrix.
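To illustrate the difference between returning index lists and returning a matrix, the sketch below chains two threshold stages and hands back a DataFrame. It is an assumption-laden simplification: the name filter_matrix_sketch is hypothetical, the stages are applied in the order listed above, and the Lennard-Jones transformation and the autocorrelation stage are left out.

```python
import pandas as pd


def filter_matrix_sketch(X: pd.DataFrame, y: pd.Series,
                         var_cut: float = 0.0, corr_cut: float = 0.0) -> pd.DataFrame:
    """Hypothetical two-stage pipeline, NOT the qsarmodelingpy implementation."""
    # Stage 1: positional indexes of columns passing the variance threshold.
    var_idx = [i for i, v in enumerate(X.var()) if v >= var_cut]
    X = X.iloc[:, var_idx]

    # Stage 2: positional indexes of the surviving columns passing the
    # correlation threshold (absolute Pearson correlation is an assumption).
    corr = X.corrwith(y).abs().fillna(0.0)
    corr_idx = [i for i, r in enumerate(corr) if r >= corr_cut]

    # Unlike the individual stages, the pipeline returns the filtered matrix.
    return X.iloc[:, corr_idx]


# Toy data: "k" is constant, "a" follows y, "b" is weakly related to y.
X_demo = pd.DataFrame({
    "k": [1.0, 1.0, 1.0, 1.0],
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, -1.0, 1.0, -1.0],
})
y_demo = pd.Series([2.0, 4.0, 6.0, 8.0])

filtered_demo = filter_matrix_sketch(X_demo, y_demo, var_cut=0.5, corr_cut=0.5)
```

Each stage works on the output of the previous one, so the indexes computed in the second stage refer to the already reduced matrix, not to the original X.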
def variance_cut(X, cut)
-
Select only the columns of X with a minimum variance of cut.
Args
X
:DataFrame
- The DataFrame to be filtered
cut
:float
- The minimum variance allowed
Returns
list[int]
- A list of all selected indexes.
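A minimal sketch of a variance threshold, assuming the sample variance (pandas default, ddof=1) and an inclusive comparison; the documentation does not say which estimator or comparison the library uses. The name variance_cut_sketch is hypothetical.

```python
import pandas as pd


def variance_cut_sketch(X: pd.DataFrame, cut: float) -> list:
    """Hypothetical stand-in: keep columns whose sample variance is >= cut."""
    # DataFrame.var() computes the per-column sample variance (ddof=1 by default).
    variances = X.var()
    # Positional indexes of the columns that reach the threshold.
    return [i for i, v in enumerate(variances) if v >= cut]


# Toy data: "a" is constant, "b" and "c" vary.
X_demo = pd.DataFrame({
    "a": [1.0, 1.0, 1.0, 1.0],
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [0.0, 10.0, 0.0, 10.0],
})

selected = variance_cut_sketch(X_demo, 1.0)
filtered_demo = X_demo.iloc[:, selected]  # positional selection of the kept columns
```

Because variance depends on the scale of each descriptor, a threshold like this is usually chosen with the units of the columns in mind.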