=======================
LinkPropagation.R
=======================

"LinkPropagation.R" contains the following functions:

- LP(...): main routine of the Link Propagation algorithm
- sampleCV(): sample script using LP() for model evaluation by cross validation
- sample(): sample script using LP() for actual prediction

and several utility functions.

=======================
LP(...) function
=======================

LP(...) is the main routine of the Link Propagation algorithm. Its arguments and outputs are as follows.

Arguments:

- networkFileNameList:= list of filenames of the adjacency matrices of the biological networks of multiple species. Its [[k]]-th element is the filename of the adjacency matrix for the k-th species. The adjacency matrix is space-delimited; its (m,n)-th element is 1 if the m-th and n-th proteins have a link, -1 if they do not have a link, and 0 if the link is unknown.
- similarityFileNameList:= list of filenames of the similarity matrices among proteins. Its [[k]]-th element is the filename of the similarity matrix between the i-th species and the j-th species, where k=(i-1)*number_of_species + j and i<=j. (See sample() or sampleCV() for a concrete example.) The (m,n)-th element of the similarity matrix is the non-negative similarity value between the m-th protein of the i-th species and the n-th protein of the j-th species.
- proportionOfTrainingData:= proportion of the known part of the network that is sampled as training data; the rest is used as test data for evaluation. If it is 1, all of the data is used for training in order to fill in the 0 entries of the adjacency matrices.
- fold:= seed for generating the random numbers used to choose the training data.
- sigma:= regularization parameter of Link Propagation (default value:=0.001)
- epsilon:= convergence parameter of the conjugate gradient algorithm (default value:=0.00001)
- simultaneous:= If it is TRUE, LP(...) performs simultaneous prediction of the multiple networks. If it is FALSE, LP(...) performs individual prediction of each of the networks.

Outputs:

If proportionOfTrainingData < 1, the output is a list including
- LP(...)[[1]] := total test AUC value
- LP(...)[[2]] := a vector of test AUC values
- LP(...)[[3]] := execution time (used for training+prediction)

If proportionOfTrainingData == 1, the output is a list of matrices, each of which is a matrix of predicted link strengths for one species.

=======================
sample codes
=======================

- sampleCV() and sample() give examples of using LP().
- sampleCV() performs training with 75% (proportionOfTrainingData <- 0.75) of the whole data and calculates the test AUC for the rest. sampleCV() repeats this five times (numberOfFolds <- 5) and returns the mean and standard deviation of the obtained AUCs.
- sample() uses all data (proportionOfTrainingData <- 1) to predict the links with 0 values in the adjacency matrices. sample() returns a list of matrices of predicted link strengths.
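
The indexing rule for similarityFileNameList, k=(i-1)*number_of_species + j with i<=j, is easy to get wrong by hand, so the following is a minimal sketch of building both file name lists for two species. The helper similarityIndex() and all file names are hypothetical and are not part of LinkPropagation.R. How the unused positions (those with i>j) should be filled is not stated here; the sketch simply leaves them empty, so check sample() or sampleCV() for the convention actually used.

```r
# Hypothetical helper (not part of LinkPropagation.R): position of the
# similarity file for the species pair (i, j), following the rule
# k = (i-1)*number_of_species + j with i <= j.
similarityIndex <- function(i, j, numberOfSpecies) {
  stopifnot(i >= 1, j >= i, j <= numberOfSpecies)
  (i - 1) * numberOfSpecies + j
}

numberOfSpecies <- 2

# Placeholder file names; replace them with your own space-delimited files.
networkFileNameList <- list("network_species1.txt", "network_species2.txt")

# Pre-allocate numberOfSpecies^2 slots and fill only the pairs with i <= j.
similarityFileNameList <- vector("list", numberOfSpecies^2)
for (i in seq_len(numberOfSpecies)) {
  for (j in i:numberOfSpecies) {
    k <- similarityIndex(i, j, numberOfSpecies)
    similarityFileNameList[[k]] <- sprintf("similarity_%d_%d.txt", i, j)
  }
}

# Assumed call, using the argument names listed in this README; it needs
# LinkPropagation.R to be sourced and the files above to exist.
# result <- LP(networkFileNameList = networkFileNameList,
#              similarityFileNameList = similarityFileNameList,
#              proportionOfTrainingData = 0.75, fold = 1,
#              sigma = 0.001, epsilon = 0.00001, simultaneous = TRUE)
```

With two species, the pairs (1,1), (1,2) and (2,2) land at positions 1, 2 and 4, and position 3 stays unused.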