*** COMMENTS ARE BETWEEN THREE STARS ***
________________________________________________________________________________
***
PART OF THE ALGORITHM AND BASIS SETS ARE YET TO BE PUBLISHED.
SOME WERE PRESENTED AT THE BIOPHYSICAL SOCIETY MEETING, BALTIMORE, 1999

IF YOU ENCOUNTER A BUG OR A PROBLEM OR IF YOU HAVE SUGGESTIONS TO IMPROVE
THE METHOD OR PROGRAM PLEASE SEND AN e-mail TO sreeram@lamar.colostate.edu
***

THE FRACTIONS OF SECONDARY STRUCTURES FROM THE SELF-CONSISTENT METHOD
Ref: Sreerama and Woody, Anal. Biochem. (1993), 209, 32
     Sreerama and Woody, Biochemistry, 33, 10022-25, (1994),
     Sreerama et al. Protein Science, 8, 370-380, (1999),
     Johnson W.C. Jr., XXXXXXXXXXXXX XX, 1XXXXXXX, (1999),

Reference PROTEIN Set Selected: CDDATA.29; SSDATA.29
Structures: Helix1, Helix2, Strand1, Strand2, Turns and Unordered
________________________________________________________________________________
*** THIS SECTION DEFINES THE INPUT PARAMETERS ***

SAMPLE INPUT: Lactate Dehydrogenase CD DATA (178-260 nm)
Beginning wavelength: 260.00; Ending wavelength: 178.00
Total No of CD points 83
Secondary structural elements= 6
Multiplication FACTOR for CD spectrum= 1.000

SAMPLE CD: FILE from TEST
  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  0.00  0.00  0.00 -0.03 -0.07 -0.12 -0.18 -0.25 -0.33 -0.43
 -0.54 -0.67 -0.82 -0.99 -1.19 -1.41 -1.65 -1.92 -2.20 -2.49
 -2.79 -3.11 -3.43 -3.71 -3.93 -4.10 -4.22 -4.29 -4.32 -4.31
 -4.28 -4.22 -4.15 -4.06 -3.97 -3.87 -3.78 -3.71 -3.67 -3.64
 -3.60 -3.53 -3.36 -3.05 -2.60 -2.04 -1.40 -0.68  0.22  1.38
  2.68  4.00  5.32  6.63  7.74  8.46  8.91  9.20  9.27  9.02
  8.45  7.60  6.63  5.69  4.80  3.95  3.15  2.40  1.71  1.11
  0.58  0.12 -0.30

Number of Proteins in the database= 29
The SAMPLE data from: TEST

Ordered DELTA(CD) values :
 0.0000 0.6577 0.7873 1.2756 1.4219 1.6179 1.6343 1.6980 1.7051 1.7881
 1.8070 2.0072 2.0468 2.3761 2.4768 2.5857 2.7137 2.7612 2.9066 3.0717
 3.1368 3.3616 3.3675 3.6036 4.1668 4.2391 4.2979 4.3736 4.6301

Ordered List of PROTEINS :
 TEST THML TPI  SUBN ECOR LYSM PGK  CYTC T4LS SUBB
 GPD  FLVD HMRT AZU  BLAC RNAS HBN  PAPN PRAL CONA
 PPSN SUDS TNF  CHYT ELAS BNJN GCR  MGLB BTOX

IGUESS = 0; The structure of the Protein with closest CD spectrum: THML
Initial Guess: 0.282 0.133 0.070 0.095 0.215 0.206
________________________________________________________________________________
***
This corresponds to the original Hennessey and Johnson Method;
all reference proteins and 5 SVD vectors.
The helix content is used in the THIRD STAGE.
No other use is made of this result.
***

Solution Corresponding to All proteins in the Basis and 5 SVD vectors
This corresponds to Hennessey and Johnson Method

  Str1  Str2  Str3  Str4  Str5  Str6   Sum
  .227  .144  .169  .105  .221  .275 1.140

Helix From H & J Method: .370
________________________________________________________________________________
***
THIS SECTION PERFORMS SELCON METHOD
- A SLIGHTLY MODIFIED VERSION OF THE ORIGINAL METHOD Sreerama & Woody 1993 -
***

FIRST STAGE: Solutions With FIRST TWO Selection Rules:
This is for the First 19 Iterations only
Constraints relaxed to get a MIN # of Solns

ITER: 1; AVE OF 65 SOLN: 0.263 0.157 0.077 0.069 0.198 0.246
SUM of SECNDRY STRUCTURE: 1.010 RMSD with Previous Guess: 0.0242
ITER: 2; AVE OF 63 SOLN: 0.262 0.160 0.077 0.066 0.196 0.250
SUM of SECNDRY STRUCTURE: 1.010 RMSD with Previous Guess: 0.0028

SOLN. CONVERGED: 2 ITERATIONS MinSol: 1 SUM < 0.050

Solution at the END of FIRST STAGE:
  STR1  STR2  STR3  STR4  STR5  STR6   SUM
 0.262 0.160 0.077 0.066 0.196 0.250 1.010

First Part Completed. The results roughly correspond to
SELCON and SELCON1 with only 2 selection rules
________________________________________________________________________________
***
THIS IS AN ADDITION TO THE ORIGINAL SELCON METHOD.
THE CD SPECTRUM CORRESPONDING TO THE SOLUTION IS CALCULATED ALONG WITH
THE RECONSTRUCTED CD SPECTRUM (Corresponding to the Bas and NS values)
AND THE RMS DIFFERENCES BETWEEN THESE AND THE SAMPLE CD SPECTRUM
ARE USED IN SELECTING THE SOLUTION.
THIS FORMS THE THIRD SELECTION RULE.
THE RMS-CD (Minimum of either of these) MUST BE less than 0.25
***

***
ALSO THE LATEST DEVELOPMENT IN THE CD ANALYSIS --
ESTIMATION OF NUMBER OF SEGMENTS IS IMPLEMENTED.
THE DETAILS WILL BE PUBLISHED IN PROTEIN SCIENCE
(ACCEPTED FOR PUBLICATION) 1999.
FOR YOUR PROTEIN MULTIPLY THE NUMBER GIVEN FOR 100 RESIDUES BY THE FACTOR
Number of RESIDUES in YOUR PROTEIN / 100 ;
AVERAGE LENGTH IS DETERMINED FOR YOUR PROTEIN BY THE PROGRAM
***

***
I - Index; NSol - Number of Solution;
Bas - Number of Proteins used in the Solution
NS - Number of SVD vectors Used in Solution
Str1 to Str6 - Fractions of 6 secondary structures
Sum - Sum of Secondary structures
RmsRCN - Rms between sample CD and reconstructed CD
RmsEXP - Rms between sample CD and CALCULATED CD
***

SECOND STAGE: SOLUTIONS THAT SATISFY THREE SELECTION RULES:

  I NSol Bas NS  Str1  Str2  Str3  Str4  Str5  Str6   Sum RmsRCN RmsEXP
  1    1   7  1 0.223 0.159 0.067 0.072 0.220 0.278 1.019  0.405  0.012
  2    3   7  3 0.228 0.147 0.084 0.076 0.185 0.274 0.993  0.345  0.181
  3    7   8  3 0.227 0.139 0.092 0.085 0.180 0.250 0.973  0.345  0.166
  4    8   8  4 0.238 0.149 0.091 0.091 0.207 0.240 1.017  0.308  0.201
  5    9   8  5 0.207 0.132 0.091 0.090 0.227 0.266 1.012  0.127  0.408
  6   10   9  1 0.247 0.169 0.057 0.059 0.196 0.251 0.979  0.405  0.071
  7   11   9  2 0.252 0.168 0.058 0.058 0.184 0.238 0.959  0.355  0.163
  8   13   9  4 0.238 0.144 0.088 0.091 0.208 0.230 0.999  0.308  0.144
  9   14   9  5 0.235 0.142 0.089 0.091 0.208 0.235 0.999  0.127  0.159
 10   15   9  6 0.213 0.141 0.084 0.083 0.220 0.249 0.990  0.097  0.558
 11   16  10  1 0.249 0.172 0.061 0.062 0.203 0.260 1.007  0.405  0.077
 12   17  10  2 0.254 0.170 0.061 0.060 0.189 0.245 0.979  0.355  0.182
 13   23  11  1 0.251 0.172 0.066 0.065 0.210 0.270 1.033  0.405  0.079
 14   24  11  2 0.256 0.170 0.065 0.062 0.192 0.250 0.994  0.355  0.195
 15   32  12  2 0.261 0.171 0.069 0.067 0.202 0.255 1.026  0.355  0.178
 16   39  13  1 0.268 0.166 0.057 0.056 0.192 0.250 0.990  0.405  0.058
 17   40  13  2 0.272 0.164 0.055 0.054 0.181 0.239 0.966  0.355  0.158
 18   41  13  3 0.272 0.169 0.062 0.061 0.203 0.257 1.024  0.345  0.174
 19   47  14  1 0.266 0.166 0.062 0.060 0.201 0.258 1.012  0.405  0.101
 20   48  14  2 0.268 0.164 0.063 0.060 0.197 0.254 1.006  0.355  0.174
 21   55  15  1 0.264 0.167 0.070 0.063 0.205 0.262 1.031  0.405  0.119
 22   56  15  2 0.267 0.165 0.069 0.062 0.200 0.256 1.017  0.355  0.177
 23   64  16  2 0.267 0.164 0.070 0.062 0.198 0.254 1.016  0.355  0.200
 24   67  16  5 0.235 0.152 0.086 0.080 0.215 0.269 1.038  0.127  0.414
 25   68  16  6 0.241 0.156 0.082 0.078 0.216 0.263 1.035  0.097  0.412
 26   71  17  1 0.275 0.162 0.062 0.053 0.182 0.232 0.966  0.405  0.108
 27   73  17  3 0.274 0.162 0.068 0.058 0.193 0.239 0.995  0.345  0.160
 28   79  18  1 0.275 0.163 0.064 0.055 0.185 0.240 0.982  0.405  0.137
 29   81  18  3 0.274 0.161 0.069 0.059 0.194 0.239 0.996  0.345  0.219
 30   84  18  6 0.239 0.159 0.082 0.079 0.220 0.266 1.046  0.097  0.543
 31   87  19  1 0.275 0.163 0.069 0.057 0.187 0.245 0.996  0.405  0.136
 32   88  19  2 0.276 0.159 0.065 0.054 0.179 0.231 0.964  0.355  0.209
 33   89  19  3 0.273 0.161 0.080 0.063 0.197 0.249 1.023  0.345  0.221
 34   95  20  1 0.274 0.163 0.074 0.059 0.190 0.248 1.008  0.405  0.140
 35   96  20  2 0.273 0.160 0.077 0.059 0.188 0.241 0.999  0.355  0.201
 36   97  20  3 0.271 0.161 0.087 0.065 0.199 0.253 1.036  0.345  0.208
 37  103  21  1 0.274 0.165 0.077 0.061 0.192 0.251 1.019  0.405  0.149
 38  104  21  2 0.273 0.162 0.081 0.062 0.190 0.244 1.011  0.355  0.201
 39  105  21  3 0.272 0.163 0.086 0.065 0.196 0.251 1.033  0.345  0.210
 40  111  22  1 0.273 0.164 0.080 0.062 0.195 0.254 1.029  0.405  0.149
 41  112  22  2 0.274 0.161 0.079 0.060 0.188 0.242 1.004  0.355  0.208
 42  113  22  3 0.272 0.162 0.085 0.064 0.194 0.249 1.027  0.345  0.218
 43  119  23  1 0.273 0.164 0.083 0.064 0.197 0.257 1.037  0.405  0.150
 44  120  23  2 0.273 0.161 0.084 0.063 0.192 0.248 1.021  0.355  0.204
 45  121  23  3 0.272 0.162 0.090 0.066 0.198 0.254 1.041  0.345  0.216
 46  127  24  1 0.273 0.165 0.084 0.064 0.198 0.260 1.044  0.405  0.165
 47  128  24  2 0.273 0.161 0.080 0.061 0.189 0.244 1.009  0.355  0.238
 48  129  24  3 0.271 0.162 0.089 0.066 0.197 0.253 1.038  0.345  0.250
 49  135  25  1 0.273 0.165 0.084 0.064 0.198 0.259 1.043  0.405  0.180
 50  143  26  1 0.273 0.164 0.082 0.063 0.197 0.258 1.037  0.405  0.187
 51  151  27  1 0.273 0.164 0.083 0.063 0.197 0.258 1.038  0.405  0.189
 52  152  27  2 0.272 0.163 0.087 0.065 0.194 0.254 1.036  0.355  0.249
 53  161  28  3 0.271 0.156 0.097 0.068 0.185 0.247 1.023  0.345  0.232
 54  169  29  3 0.272 0.155 0.090 0.064 0.179 0.241 1.002  0.345  0.234

TotSOL > 1
Limits: ABS(Sum-1.0) < 0.050; Each Fraction > -0.030
        RmsCD(Exp,Cal) < 0.250

SOLUTION AT SECOND STAGE (SELCON2):
Average Solution From 54 Solutions:
  STR1  STR2  STR3  STR4  STR5  STR6   SUM
 0.261 0.161 0.076 0.066 0.197 0.251 1.011

Second Part Completed. The results roughly correspond to
SELCON2 with three selection rules

+--------------------------------------------------+
| Based on the SOLUTION the Number of SEGMENTS of  |
| HELICES (Per 100 Residues) are: 4.014            |
| STRANDS (Per 100 Residues) are: 3.291            |
+--------------------------------------------------+
For YOUR PROTEIN multiply No. of SEGMENTS by (Number of Residues) / 100
e.g., If Number of Residues = 153, Use a FACTOR of 1.53
Average Length of Segments Remains as Estimated
+--------------------------------------------------+
| Based on the SOLUTION Obtained                   |
| The AVERAGE LENGTH of HELICES : 10.498           |
| The AVERAGE LENGTH of STRANDS :  4.317           |
+--------------------------------------------------+
________________________________________________________________________________
***
The CALCULATED CD Spectrum at the end of SECOND STAGE
(Average of all selected solutions) compared with SAMPLE CD spectrum
***

COMPARISON OF EXPT. AND CALC. CD SPECTRA (Ave. of 54 Solns) OF: TEST
LEGEND: **** Calc. CD; oooo Expt. CD; .... Diff. CD
[Character plot: vertical axis from -9. to 9.; horizontal axis from 178. to 260.,
 Wavelength at Intervals of: 8.2 nm]
________________________________________________________________________________
***
THIS IS AN ADDITION TO THE SELCON2 METHOD.
The solutions obtained at the SECOND STAGE are further screened using a
NEW SELECTION RULE developed by W.C.Johnson (Proteins, 1999), which
utilizes the helix content from Hennessey and Johnson Analysis
***

***
I - Index; NSol - Number of Solution;
Bas - Number of Proteins used in the Solution
NS - Number of SVD vectors Used in Solution
Str1 to Str6 - Fractions of 6 secondary structures
Sum - Sum of Secondary structures
RmsRCN - Rms between sample CD and reconstructed CD
RmsEXP - Rms between sample CD and CALCULATED CD
***

*** THIRD STAGE: ***
Now the FOURTH SELECTION RULE is applied to the Solutions
That SATISFY the first THREE rules.
The HELIX FRACTION should be within HLIMITS
Hmin .339 Hmax .441 HelHJ .370
HELIX: .406 HLIMITS: .376 ---> .436

  1    1   7  1  .223  .159  .067  .072  .220  .278 1.019  .012
  2    4   8  4  .238  .149  .091  .091  .207  .240 1.017  .201
  3    6   9  1  .247  .169  .057  .059  .196  .251  .979  .071
  4    7   9  2  .252  .168  .058  .058  .184  .238  .959  .163
  5    8   9  4  .238  .144  .088  .091  .208  .230  .999  .144
  6    9   9  5  .235  .142  .089  .091  .208  .235  .999  .127
  7   11  10  1  .249  .172  .061  .062  .203  .260 1.007  .077
  8   12  10  2  .254  .170  .061  .060  .189  .245  .979  .182
  9   13  11  1  .251  .172  .066  .065  .210  .270 1.033  .079
 10   14  11  2  .256  .170  .065  .062  .192  .250  .994  .195
 11   15  12  2  .261  .171  .069  .067  .202  .255 1.026  .178
 12   16  13  1  .268  .166  .057  .056  .192  .250  .990  .058
 13   19  14  1  .266  .166  .062  .060  .201  .258 1.012  .101
 14   20  14  2  .268  .164  .063  .060  .197  .254 1.006  .174
 15   21  15  1  .264  .167  .070  .063  .205  .262 1.031  .119
 16   22  15  2  .267  .165  .069  .062  .200  .256 1.017  .177
 17   23  16  2  .267  .164  .070  .062  .198  .254 1.016  .200
 18   24  16  5  .235  .152  .086  .080  .215  .269 1.038  .127
 19   25  16  6  .241  .156  .082  .078  .216  .263 1.035  .097
 20   30  18  6  .239  .159  .082  .079  .220  .266 1.046  .097
 21   32  19  2  .276  .159  .065  .054  .179  .231  .964  .209
 22   33  19  3  .273  .161  .080  .063  .197  .249 1.023  .221
 23   35  20  2  .273  .160  .077  .059  .188  .241  .999  .201
 24   36  20  3  .271  .161  .087  .065  .199  .253 1.036  .208
 25   38  21  2  .273  .162  .081  .062  .190  .244 1.011  .201
 26   39  21  3  .272  .163  .086  .065  .196  .251 1.033  .210
 27   41  22  2  .274  .161  .079  .060  .188  .242 1.004  .208
 28   42  22  3  .272  .162  .085  .064  .194  .249 1.027  .218
 29   44  23  2  .273  .161  .084  .063  .192  .248 1.021  .204
 30   45  23  3  .272  .162  .090  .066  .198  .254 1.041  .216
 31   47  24  2  .273  .161  .080  .061  .189  .244 1.009  .238
 32   48  24  3  .271  .162  .089  .066  .197  .253 1.038  .250
 33   52  27  2  .272  .163  .087  .065  .194  .254 1.036  .249
 34   53  28  3  .271  .156  .097  .068  .185  .247 1.023  .232
 35   54  29  3  .272  .155  .090  .064  .179  .241 1.002  .234

Final Solution: Aver. of 35 Solns
PROT  Str1  Str2  Str3  Str4  Str5  Str6   Sum RmsCD
TEST  .260  .162  .076  .066  .198  .251 1.013  .386

+--------------------------------------------------+
| Based on the SOLUTION the Number of SEGMENTS of  |
| HELICES (Per 100 Residues) are: 4.038            |
| STRANDS (Per 100 Residues) are: 3.317            |
+--------------------------------------------------+
For YOUR PROTEIN multiply No. of SEGMENTS by (Number of Residues) / 100
e.g., If Number of Residues = 153, Use a FACTOR of 1.53
Average Length of Segments Remains as Estimated
+--------------------------------------------------+
| Based on the SOLUTION Obtained                   |
| The AVERAGE LENGTH of HELICES : 10.445           |
| The AVERAGE LENGTH of STRANDS :  4.301           |
+--------------------------------------------------+
________________________________________________________________________________
***
The FINAL CALCULATED CD Spectrum
(Average of all selected solutions) compared with SAMPLE CD spectrum
***

COMPARISON OF EXPT. AND CALC. CD SPECTRA (Average of 35 Solutions) OF: TEST
LEGEND: **** Calc. CD; oooo Expt. CD; .... Diff. CD
[Character plot: vertical axis from -9. to 9.; horizontal axis from 178. to 260.,
 Wavelength at Intervals of: 8.2 nm]
________________________________________________________________________________
Final Note: It can be seen that the number of valid solutions decreases
from 63 to 54 to 35 from the first to the second to the third stage.

***
The digital values of the CD data -- WaveL Expt_CD Calc_CD --
are written to the file CDOUT, to enable importing into a plotting routine.
This was suggested by Dr. Don Gray and Dr. N. Greenfield.
***
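
***
The selection limits and the segment scaling quoted in this output can be
sketched in Python as below. This is a minimal sketch under stated
assumptions, not the program's own code. The names Solution,
passes_first_three, passes_helix_rule and segments_for_protein are
hypothetical. The helix fraction is assumed to be Str1 + Str2
(Helix1 + Helix2), and the RMS test uses the smaller of RmsRCN and RmsEXP,
as the comment on the third selection rule describes. The listed values are
rounded to three decimals, so rows that sit on a limit may not reproduce
the program's own selection.
***

```python
from typing import NamedTuple, Sequence


class Solution(NamedTuple):
    """One row of the SECOND STAGE table (hypothetical container)."""
    fractions: Sequence[float]  # Str1 .. Str6
    rms_rcn: float              # RmsRCN column
    rms_exp: float              # RmsEXP column


def passes_first_three(sol: Solution,
                       sum_tol: float = 0.050,
                       min_fraction: float = -0.030,
                       rms_limit: float = 0.250) -> bool:
    """Apply the printed limits: ABS(Sum-1.0), Each Fraction, RmsCD."""
    sum_ok = abs(sum(sol.fractions) - 1.0) < sum_tol
    fractions_ok = all(f > min_fraction for f in sol.fractions)
    # Assumption: the RMS-CD is the minimum of the two RMS values.
    rms_ok = min(sol.rms_rcn, sol.rms_exp) < rms_limit
    return sum_ok and fractions_ok and rms_ok


def passes_helix_rule(sol: Solution, h_low: float, h_high: float) -> bool:
    """Fourth rule: helix fraction inside HLIMITS (assumed Str1 + Str2)."""
    helix = sol.fractions[0] + sol.fractions[1]
    return h_low <= helix <= h_high


def segments_for_protein(per_100_residues: float, n_residues: int) -> float:
    """Scale a per-100-residue segment count to a protein of n_residues."""
    return per_100_residues * n_residues / 100.0


# Rows I = 1 and I = 5 of the SECOND STAGE table.
row_1 = Solution((0.223, 0.159, 0.067, 0.072, 0.220, 0.278), 0.405, 0.012)
row_5 = Solution((0.207, 0.132, 0.091, 0.090, 0.227, 0.266), 0.127, 0.408)
# Hypothetical row, not from the output: both RMS values exceed the limit.
made_up = Solution((0.30, 0.10, 0.10, 0.10, 0.20, 0.20), 0.40, 0.30)

H_LOW, H_HIGH = 0.376, 0.436  # HLIMITS printed in the THIRD STAGE

for name, sol in (("row 1", row_1), ("row 5", row_5), ("made up", made_up)):
    print(name, passes_first_three(sol), passes_helix_rule(sol, H_LOW, H_HIGH))

# Helix segments for a 153-residue protein, from 4.038 per 100 residues.
print(segments_for_protein(4.038, 153))
```

***
In this sketch row 1 satisfies all four tests, while row 5 satisfies the
first three but has a helix fraction below the lower HLIMIT, which agrees
with its absence from the THIRD STAGE list.
***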