Sequence List:
Paste your sequence below (FASTA or other format). All possible constructs are generated, and the pI, GRAVY, and MCSG Z-score of each construct are calculated and displayed as heat maps. One of the heat maps is normalized.
The calculated values (pI, GRAVY, Z-score) are listed below. Large positive Z-scores indicate a higher likelihood of crystallization when using the MCSG pipeline protocol.
Expression vector (for tag inclusion):
Minimum Z-score       Maximum Z-score

Please click here for more details and links to associated utilities
Method Summary
The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein's propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties was computed for over 1300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein's isoelectric point (pI) and grand average of hydropathy (GRAVY) with crystallizability was analyzed for full-length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses, we identified a set of attributes that correlate with a protein's propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on them. We have also created applications that analyze query sequences, suggest optimal construct boundaries, and visualize the data.
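As a rough illustration of the sequence-derived properties mentioned above, the following Python sketch computes GRAVY with the standard Kyte-Doolittle hydropathy scale and enumerates candidate constructs. It is not the server's code: the function names, the `min_len` parameter, and the assumption that a "construct" is a contiguous N-/C-terminal truncation of the query are all hypothetical, and the pI and MCSG Z-score calculations are not reproduced here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KD[aa] for aa in seq) / len(seq)


def constructs(seq, min_len=50):
    """Yield (start, end, subsequence) for every contiguous truncation.

    Assumption: a construct is seq[start:end] with at least min_len
    residues; the real server may define constructs differently.
    """
    n = len(seq)
    for start in range(n - min_len + 1):
        for end in range(start + min_len, n + 1):
            yield start, end, seq[start:end]
```

A grid suitable for a heat map could then be built as `{(s, e): gravy(sub) for s, e, sub in constructs(query)}`, with the start position on one axis and the end position on the other.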
Additional Utilities

generate single-line sequences from MFASTA file (they can be pasted into Excel)

calculate pI/MW/GRAVY for single-line sequences (they can be pasted back into Excel)

generate 2D histogram from x:y values using various bin-sizes (the results can be visualized in the following utility)

display 2D histograms (2D histograms can be generated by the previous utility)

generate MCSG Z-scores for single-line sequences (they can be pasted back into Excel)

metrics generation (ROC, AUC, and other metrics)
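
For readers who want to reproduce two of the utilities above offline, here is a minimal Python sketch of the MFASTA-to-single-line conversion and the 2D histogram binning. The function names and the bin-indexing convention (floor division by the bin size) are assumptions made for illustration, not a description of how the hosted utilities work.

```python
from collections import Counter


def mfasta_to_single_line(text):
    """Return (header, sequence) pairs with each sequence on one line."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def histogram_2d(points, x_bin, y_bin):
    """Count (x, y) points per bin; keys are (x_index, y_index)."""
    counts = Counter()
    for x, y in points:
        # Floor division assigns each value to the bin starting at index * bin size.
        counts[(int(x // x_bin), int(y // y_bin))] += 1
    return counts
```

Writing each record as `header + "\t" + sequence` gives tab-separated lines that paste into Excel as two columns; the `(x, y)` points passed to `histogram_2d` could be, for example, pI and GRAVY pairs.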

Questions? (gbabnigg)
