SVM with variable selection (clone selection) using the L1-norm penalty, implemented via the fast Newton algorithm NLPSVM of Fung and Mangasarian.

Usage

lpsvm(A, d, k = 5, nu = 0, output = 1, delta = 10^-3, epsi = 10^-4,
  seed = 123, maxIter = 700)

Arguments

A

n-by-d data matrix to train (n chips/patients, d clones/genes).

d

vector of class labels, -1 or 1, for the n chips/patients.

k

number of folds for cross-validation; default: k = 5. See Details.

nu

weighting parameter: 1 for easy estimation, 0 for hard estimation; any other value is used directly as nu by the algorithm. Default: 0.

output

0 - no output, 1 - produce output; default is 1 (see Usage).

delta

some small value, default: \(10^{-3}\).

epsi

tuning parameter; default: \(10^{-4}\).

seed

random seed; default: 123.

maxIter

maximal iterations, default: 700.

Details

k, the number of folds for cross-validation, determines how the data set is divided into training and test sets:
if k = 0: simply run the algorithm without any correctness calculation.
if k = 1: run the algorithm and calculate correctness on the whole data set.
if 1 < k < number of rows in the data set: divide the data set into training and test sets using the k-fold method (default: k = 5).
if k = number of rows in the data set: use the leave-one-out (LOO) method.
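The k options above can be sketched as follows (a usage sketch, assuming the penalizedSVM package, which provides lpsvm and sim.data, is installed; object names are illustrative):

```r
library(penalizedSVM)

# simulate a small training set: 20 samples, 100 genes, 10 of them significant
train <- sim.data(n = 20, ng = 100, nsg = 10, corr = FALSE, seed = 12)

# k = 0: fit only, no correctness calculation
fit0 <- lpsvm(A = t(train$x), d = train$y, k = 0, output = 0)

# k = 1: fit and evaluate correctness on the whole data set
fit1 <- lpsvm(A = t(train$x), d = train$y, k = 1, output = 0)

# 1 < k < n: 5-fold cross-validation (the default)
fit5 <- lpsvm(A = t(train$x), d = train$y, k = 5, output = 0)

# k = n: leave-one-out cross-validation
fitLoo <- lpsvm(A = t(train$x), d = train$y, k = nrow(t(train$x)), output = 0)

# average test correctness from the cross-validated fit
fit5$testCorr
```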

Value

a list of

w

coefficients of the hyperplane

b

intercept of the hyperplane

xind

the index of the selected features (genes) in the data matrix.

epsi

optimal tuning parameter epsilon

iter

number of iterations

k

k-fold for cv

trainCorr

for cv: average train correctness

testCorr

for cv: average test correctness

nu

weighting parameter

References

Fung, G. and Mangasarian, O. L. (2004). A feature selection Newton method for support vector machine classification. Computational Optimization and Applications Journal 28(2), pp. 185-202.

Author

Natalia Becker

Note

Adapted from the MATLAB code at http://www.cs.wisc.edu/dmi/svm/lpsvm/

Examples

set.seed(123)
train <- sim.data(n = 20, ng = 100, nsg = 10, corr = FALSE, seed = 12)
print(str(train)) 
#> List of 3
#>  $ x   : num [1:100, 1:20] -0.64379 -0.00486 -0.08606 -0.2183 2.45035 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:100] "pos1" "pos2" "pos3" "pos4" ...
#>   .. ..$ : chr [1:20] "1" "2" "3" "4" ...
#>  $ y   : Named num [1:20] -1 1 -1 -1 -1 -1 -1 1 1 1 ...
#>   ..- attr(*, "names")= chr [1:20] "1" "2" "3" "4" ...
#>  $ seed: num 12
#> NULL
  
# train data  
model <- lpsvm(A = t(train$x), d = train$y, k = 5, nu = 0, output = 0, delta = 10^-3, epsi = 0.001, seed = 12)
print(model)
#> 
#> Bias =  178.3869
#> Selected Variables=  pos3 neg2 bal10 bal11 bal12 bal21 bal22 bal24 bal30 bal32 bal35 bal46 bal48 bal50 bal51 bal56 bal66 bal71 bal74 bal77 bal81
#> Coefficients:
#>         pos3       neg2      bal10      bal11      bal12      bal21      bal22 
#>  899.92693 -231.19038  241.69383  101.85423 -250.31708  308.99564   82.03101 
#>      bal24      bal30      bal32      bal35      bal46      bal48      bal50 
#> -571.25835  112.53478   16.61886  678.41629  302.98743 -337.51707   35.23294 
#>      bal51      bal56      bal66      bal71      bal74      bal77      bal81 
#>  643.03151   26.97797   -3.53579  326.13807   22.94525  296.81442  285.79581 
#>