
Microarray Data Visualization


Navigation and visualization for microarray data are available at different levels. These views help users choose suitable parameters for gene filtering and analysis, and they also aid in diagnosing microarray experiment quality.

Each type of visualization is reached in a different way. Please refer to the text below for detailed instructions.


1. Experiment Level Visualization:

Purpose: Box plots are used for showing the distribution of expression values among the hybridizations.

 

Entry page: Experiment visualization page; use the links on the top half of the page.

 

Menu Access: From top menu, select "Analysis & Viz." --> "Visualization" --> "Experiment".

Box plots are also called box-and-whisker plots. They display the distribution of the values in each sample. Five values from each data set are used to draw a plot: the minimum, the first quartile, the median, the third quartile, and the maximum. The length of the box represents the interquartile range (IQR), which is the range of the middle 50% of the data. Box plots for different samples are usually drawn together to compare their data distributions. The notches mark an approximate confidence interval around each median, thus serving as a rough measure of the difference between data sets. If the notches of two plots do not overlap, the medians of the two samples differ significantly at roughly the 5 percent level.
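The values behind a box plot can be sketched in a few lines of Python. This is only an illustration, not the code used by this site: the function names are hypothetical, the quartiles use one of several common interpolation rules, and the notch half-width of 1.58 * IQR / sqrt(n) is an assumed convention.

```python
import math
import statistics


def box_plot_stats(values):
    """Return the five plotted values, the IQR, and assumed notch limits."""
    data = sorted(values)
    # Quartiles by linear interpolation; other quartile rules give
    # slightly different results on small samples.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed notch convention: median +/- 1.58 * IQR / sqrt(n).
    half_notch = 1.58 * iqr / math.sqrt(len(data))
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "notch": (median - half_notch, median + half_notch),
    }


def notches_overlap(a, b):
    """True if the notch intervals of two box_plot_stats results overlap."""
    lo_a, hi_a = a["notch"]
    lo_b, hi_b = b["notch"]
    return lo_a <= hi_b and lo_b <= hi_a
```

Two hybridizations whose notch intervals do not overlap under this sketch would be the ones to inspect more closely.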

Box plots of raw Perfect Match (PM) intensities for all chips are useful for checking the raw probe-level intensity distribution and the extent of outliers. Because of variation in starting RNA amount, synthesis and labeling efficiency, and hybridization and washing conditions, the distributions usually differ somewhat among the hybridizations. Some variation does not by itself indicate bad quality, and the normalization methods (Affymetrix MAS 5.0, Bioconductor's RMA) usually compensate for minor variations and produce uniform estimates. Hybridizations that differ markedly from the rest, however, need to be checked carefully. For example, BB1_H10 shows a much higher median than the others. Visual inspection of its PM image revealed a very high background, which may have resulted from poor washing.

Box plots of normalized RMA or MAS 5.0 expression estimates are usually very uniform in median and in box shape. This is expected, because in the MAS 5.0 estimates all hybridizations are scaled to a target mean expression value of 500. For the RMA estimates, the algorithm first estimates and removes background globally and then applies quantile normalization to the supposedly "true" PM signals, thus bringing all similar hybridizations to the same medians and means.
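Why the normalized box plots look so uniform can be seen from a minimal sketch of the two ideas. The functions below are hypothetical simplifications: the real MAS 5.0 and RMA implementations include further steps (for example background correction and probe-level summarization) that are not shown, and ties are handled naively here.

```python
def scale_to_target(values, target=500.0):
    """Rescale one hybridization so that its mean equals the target value."""
    mean = sum(values) / len(values)
    factor = target / mean
    return [v * factor for v in values]


def quantile_normalize(columns):
    """Give every hybridization (column) the same set of sorted values.

    columns: list of equal-length lists, one list per hybridization.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: the mean across hybridizations at each rank.
    rank_means = [sum(sc[i] for sc in sorted_cols) / len(columns)
                  for i in range(n)]
    normalized = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After quantile normalization every column holds the same values in a different order, so medians, means, and box shapes coincide by construction.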

Despite the uniform medians and means of the normalized expression values, some putative outliers still show up in the MAS 5.0 and RMA estimates. RMA is designed to down-weight variation in the low expression zone, so on the box plots its "outliers" appear in the high expression zone. MAS 5.0 estimates, on the other hand, have many putative outliers in the low expression zone, where MAS 5.0 usually declares the probe sets "Absent" and excludes them from further analysis.

Histogram density plots of all Perfect Match (PM) probe intensities in all chips aid quick inspection of signal distributions across hybridizations. They can also help in detecting saturation in scanning, which may occur if the scanner PMT was set too high.

Bioconductor's affy package provides functions for producing RNA degradation plots and statistics. They are very useful for showing consistency in RNA preparation and cRNA synthesis quality. Usually the 5' end of the RNA degrades faster, so the RNA degradation plots show increasing signal from 5' to 3'. This trend is very obvious in ATH1 GeneChips. For the Barley1 GeneChip, the signal also drops at the 3' end. If some hybridizations on the plot have a very different slope from the others, they may differ from the others in RNA quality or in synthesis.
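The idea behind the slope comparison can be illustrated with a simplified sketch. It is not the affy package's implementation, which may shift and scale the values differently. The function names are hypothetical, and each probe set is assumed to list its PM intensities in 5' to 3' order.

```python
def position_means(probe_sets):
    """Average intensity at each probe position across probe sets.

    probe_sets: list of equal-length lists of PM intensities,
    each ordered from the 5' end to the 3' end (an assumption).
    """
    n_positions = len(probe_sets[0])
    return [sum(ps[k] for ps in probe_sets) / len(probe_sets)
            for k in range(n_positions)]


def least_squares_slope(ys):
    """Slope of a straight line fitted to ys against positions 0, 1, 2, ..."""
    xs = range(len(ys))
    mean_x = sum(xs) / len(ys)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

Computing one slope per hybridization and comparing them gives a numeric counterpart to looking for lines that stand apart on the plot.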


2. Hybridization Level Visualization:

Purpose: (1) Scatter plots and MvA plots show reproducibility/variability among hybridizations or treatment means.

         (2) A pseudo-color image of PM intensities is used for visual detection of spatial abnormality.

 

Entry page: hybridization visualization page.


Menu Access: From top menu, select "Analysis & Viz." --> "Visualization" --> "Hybridization".

 

Scatter plots can be obtained for any two hybridizations or treatments. There are two types. The scatter plot with density uses brightness to represent data point density: the brighter the region, the denser the points. In the colored scatter plot, the numbers of genes changed 2~4 fold, 4~8 fold, and over 8 fold are shown in the legend, and the corresponding data points are colored accordingly for easier viewing.
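The legend counts can be reproduced with a small sketch. The bin boundaries below (whether exactly 4 or exactly 8 fold falls in the lower or upper bin) are an assumption, as are the function names. Intensities are assumed to be positive and not log-transformed.

```python
def fold_change(x, y):
    """Symmetric fold change between two positive intensities (always >= 1)."""
    return max(x / y, y / x)


def count_fold_bins(xs, ys):
    """Count genes changed 2~4 fold, 4~8 fold, and over 8 fold."""
    counts = {"2~4": 0, "4~8": 0, ">8": 0}
    for x, y in zip(xs, ys):
        fc = fold_change(x, y)
        if fc > 8:
            counts[">8"] += 1
        elif fc >= 4:
            counts["4~8"] += 1
        elif fc >= 2:
            counts["2~4"] += 1
    return counts
```

Genes changed less than 2 fold fall outside all three bins and are not counted.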

The M vs A plot is common in microarray data visualization. M is the log ratio, M = log2(X) - log2(Y), and A is the average log intensity, A = 0.5*(log2(X) + log2(Y)). It can be regarded as a 45 degree clockwise rotation of the log-scale scatter plot, which makes differential expression easier to see and skewed data quicker to identify. Data points from similar hybridizations will be centered on the M = 0 axis.
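The two formulas translate directly into code. The following is a minimal sketch with a hypothetical function name, assuming X and Y are positive intensities for the same probe set in two hybridizations.

```python
import math


def ma_values(x, y):
    """Return (M, A) for one pair of positive intensities."""
    m = math.log2(x) - math.log2(y)          # log ratio
    a = 0.5 * (math.log2(x) + math.log2(y))  # average log intensity
    return m, a
```

For example, X = 8 and Y = 2 give M = 2 (a 4 fold difference) and A = 2, while equal intensities always give M = 0.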

The MvA pairs plot matrix for all replicates from a treatment shows the reproducibility among replicates. The variance shown on each plot is the variance of the M values, which will be small for similar hybridizations. So the smaller the variance, the better the reproducibility.
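The variance reported for each pair of replicates can be sketched as follows. The function names are hypothetical, and the use of the sample variance (dividing by n - 1) is an assumption. The plots on this site may use a different estimator.

```python
import math
import statistics
from itertools import combinations


def m_variance(x, y):
    """Variance of M = log2(X) - log2(Y) over all probe sets."""
    ms = [math.log2(a) - math.log2(b) for a, b in zip(x, y)]
    return statistics.variance(ms)  # sample variance (assumption)


def pairwise_m_variances(replicates):
    """Variance of M for every pair of replicates.

    replicates: dict mapping a replicate name to its list of
    positive intensities, all in the same probe set order.
    """
    return {
        (n1, n2): m_variance(replicates[n1], replicates[n2])
        for n1, n2 in combinations(sorted(replicates), 2)
    }
```

A pair of identical replicates gives a variance of zero, and larger values point to the less reproducible pairs.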

The MvA pairs plot matrix for selected hybridizations shows the variation among the selected hybridizations.

The pseudo-color image of PM intensities, available from the full-detail page, is for visually checking global quality and detecting spatial abnormality. Log-transformed intensities are used for drawing PM images.

The histograms of PM and MM intensities of the hybridization are plotted together. Usually, PM can be expected to be distributed more toward the higher intensity zone.


3. Filtered Gene List Visualization:

Entry page: My Gene Lists page, use the "Visualization" link.

Menu Access: From top menu, select "Analysis & Viz." --> "Visualization" --> "Genes".

  • Expression line graph for all, or for selected, probe sets from the filtered data set.

  • Heatmap with dendrogram for both probe sets and samples.

  • Part or all probe sets from the data set can be selected for plotting.



4. Single Probe Set (Gene) Visualization:

Entry page 1: On any probe set query result page, click the probe set names.

Entry page 2: For a saved data set on the My Data Set page, use the "View Probe Set Annotation" link, then click the probe set names.

Menu Access: From top menu, select "Database Usage" --> "Data Set & Analysis" --> "My Dataset".

Entry page 3: Select a probe set from the Browse Probe Set page, then click the probe set name.

Menu Access: From top menu, select "Database Usage" --> "Browse Probe Set".

  • Usually, users arrive here after some kind of probe set filtering or search. It may be more meaningful to check probe sets that belong to a data set returned from filtering.

  • A line graph of the RMA and MAS 5.0 estimates across the hybridizations in an experiment, plotted together.

  • Expression view (heatmap) of expression profile neighbors, which may represent co-regulated genes. Anti-regulated probe sets will also be added in the future.

  • For an example, please view the Contig15950_at page.
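One way to think about expression profile neighbors is as the probe sets whose profiles correlate most strongly with the query probe set. The sketch below assumes Pearson correlation as the similarity measure, which this page does not specify. The function names and the probe set names in the usage note are hypothetical, and constant profiles are assumed not to occur, since they would cause a division by zero.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sv = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (su * sv)


def profile_neighbors(query, profiles, top=5):
    """Return up to `top` (correlation, name) pairs, most similar first.

    profiles: dict mapping a probe set name to its expression values
    across the same hybridizations, in the same order.
    """
    scores = [(pearson(profiles[query], values), name)
              for name, values in profiles.items() if name != query]
    scores.sort(reverse=True)
    return scores[:top]
```

With hypothetical profiles named "query", "up", "flat", and "down", a profile that rises with the query ranks first, and one that moves in the opposite direction gets a correlation near -1, which is how anti-regulated probe sets would show up under this assumption.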



5. Probe Level Visualization:

Entry page: From the probe set detail page above, use the "View Probe Level Value and Barplot" link.

  • Available from the probe set detail page; follow the single probe set view above.

  • Bar plots with standard deviation shown allow comparison of intensities across hybridizations for the same probe, or across probe pairs for the same hybridization.

  • For an example, click the "Get barplot for Contig15950_at" link on the Contig15950_at page.



 

Copyright@2001-2005 The BarleyBase Group
All rights reserved.

For problems with the webpages, contact barleybasewebmaster