4/10/2022

Affymetrix microarray data normalization and quality assessment

Denis Puthier and Jacques van Helden

This tutorial is just a brief tour of the language capabilities and is intended to give some clues to begin with the R programming language. For a more detailed overview, see R for Beginners (E. Paradis).

Bioconductor

From Wikipedia:

Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.

Most Bioconductor components are distributed as R packages, which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel Affymetrix and two or more channel cDNA/Oligo microarrays. As the project has matured, the functional scope of the software packages broadened to include the analysis of all types of genomic data, such as SAGE, X-seq data (RNA-Seq, ChIP-Seq, ...), or SNP data.

The broad goals of the project are to:

  • Provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
  • Facilitate the inclusion of biological metadata in the analysis of genomic data, e.g. annotation data from UCSC or GO database.
  • Provide a common software platform that enables the rapid development and deployment of plug-able, scalable, and interoperable software.
  • Further scientific understanding by producing high-quality documentation and reproducible research.
What is reproducible research? How can R contribute to reproducibility?

Some areas covered by the Bioconductor project, with some representative packages:

  • Affymetrix GeneChip analysis: affy, simpleaffy
  • Affymetrix exon arrays: xmapcore, xps
  • Probe metadata: annotate, hgu133aprobe, hgu95av2probe, ABPkgBuilder
  • Microarray data filtering: genefilter
  • Statistical analysis of microarrays: SAMR, siggenes, multtest, DEDS, pickgene
  • Tiling arrays: AffyTiling, tilingArray
  • CGH array analysis: CGHbase, snapCGH
  • NGS quality control/filtering: ShortRead
  • RNA-Seq: easyRNASeq, DESeq
  • ChIP-Seq: chipseq
  • High level plotting functions: geneplotter
  • Functional enrichment analysis: GO, GOstats, goCluster, geneplotter
  • Genome coordinates: GenomicFeatures, genomeIntervals, GenomeGraphs, GenomicRanges
  • Graphs: graph, Rgraphviz, biocGraph
  • Flow cytometry: flowCore, flowViz
  • Variant calling: VariantTools
  • Proteomics: MassSpecWavelet
  • Image analysis: EBImage

Installing Bioconductor

To install Bioconductor, you need to retrieve the biocLite function from the BioC web site. We will also check that some annotation packages for Affymetrix GeneChips are available on your computer. The commands are typed at the R prompt (start R by typing R in a terminal).
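The biocLite function has since been retired from Bioconductor; recent releases install packages through the BiocManager package from CRAN. The following is a minimal sketch under that assumption. The helper name install_if_missing is hypothetical, and the package list in the commented call is simply the set of packages named in this practical.

```r
# Hypothetical helper: install the Bioconductor packages that cannot be loaded yet.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns FALSE when a package is not installed.
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    # BiocManager itself comes from CRAN.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(to_install)
  }
  invisible(to_install)
}

# Uncomment to install the packages used in this practical:
# install_if_missing(c("affy", "geneplotter", "hgu133a.db"))
```

Wrapping the installation in a function keeps the check separate from the download, so the script can be sourced on a machine without network access as long as the call stays commented out.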

S4 objects

We have seen several classes of objects so far (vector, factor, matrix, data.frame...). In R, one can also create custom classes of objects in order to store and interrogate more complex objects.

Let us say one needs to store an experiment related to two-color microarrays. We then have to store values from the red and green channels for both foreground and background signals. We might also want to store the symbols of the genes measured, the kind of microarray platform used, a description of the experiment, and so on. The interesting point is that a single instance of such a class would store all the information related to the experiment, making it easier to manipulate and share.

Let us design such a class, which we will call microarrayBatch. We will use the setClass function, which stores the class definition.
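One possible definition is sketched here. The slot names and types (R, G, Rb, Gb, genes, platform, description) are assumptions chosen to match the description above, not the layout used by the authors.

```r
library(methods)

# Assumed slot layout for the microarrayBatch class.
setClass(
  "microarrayBatch",
  slots = c(
    R           = "matrix",     # red channel, foreground signal
    G           = "matrix",     # green channel, foreground signal
    Rb          = "matrix",     # red channel, background signal
    Gb          = "matrix",     # green channel, background signal
    genes       = "character",  # symbols of the genes measured
    platform    = "character",  # kind of microarray platform
    description = "character"   # free-text description of the experiment
  )
)

# The definition is now registered and can be queried by class name.
getSlots("microarrayBatch")
```

Declaring a type for every slot lets R reject an object whose content does not fit the declared type, for example a character vector passed where a matrix is expected.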

Now that the class is defined, we can create an instance of this class (an object). Inside R, this object is viewed as an S4 object of class 'microarrayBatch'. Like any S4 object, it contains a set of slots whose names can be accessed with the slotNames function.

The type of object stored in each slot can be accessed using the getClassDef function.

Let us store an artificial data set with two microarrays, each containing 10 genes.

As with every S4 object, each slot can be accessed using the @ operator.

We can link functions (called methods) to this object. For instance, we can define a method getGreen() for the class microarrayBatch. This will retrieve the data stored in slot G (green channel of the two-color microarray).

Now let's call this function.

We can check that the function returns, as expected, the content of the slot G of our microarrayBatch object.
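The steps above can be sketched as follows. To keep the block self-contained, the class is declared again in a compact form with three assumed slots (R, G, genes), and the intensities are random numbers rather than real measurements.

```r
library(methods)

# Compact, assumed version of the class: only three slots are kept here.
setClass("microarrayBatch",
         slots = c(R = "matrix", G = "matrix", genes = "character"))

# Artificial data set: 2 microarrays x 10 genes, random intensities.
set.seed(1)
genes  <- paste0("gene", 1:10)
arrays <- c("array1", "array2")
mb <- new("microarrayBatch",
          R = matrix(runif(20, 100, 1000), nrow = 10,
                     dimnames = list(genes, arrays)),
          G = matrix(runif(20, 100, 1000), nrow = 10,
                     dimnames = list(genes, arrays)),
          genes = genes)

slotNames(mb)                    # names of the slots
getClassDef("microarrayBatch")   # slot names together with their types
mb@G                             # direct slot access with the @ operator

# Declare a generic function, then attach a method for microarrayBatch objects.
setGeneric("getGreen", function(object) standardGeneric("getGreen"))
setMethod("getGreen", "microarrayBatch", function(object) object@G)

# The method returns the content of slot G.
identical(getGreen(mb), mb@G)
```

Going through an accessor such as getGreen() rather than the @ operator means calling code does not need to change if the internal slot layout is reorganized later.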

As shown in this example, we can easily define new objects and methods within R. This S4 formalism is used throughout the Bioconductor project.

The dataset from Den Boer (2009)

Here we will use a subset of the GSE13425 experiment, which can be retrieved from the Gene Expression Omnibus (GEO) public database. In this experiment, the authors were interested in the molecular classification of acute lymphoblastic leukemia (ALL), which is characterized by an abnormal clonal proliferation, within the bone marrow, of lymphoid progenitors blocked at a precise stage of their differentiation.

Data were produced using Affymetrix GeneChips (Affymetrix Human Genome U133A Array, HGU133A). Information about this platform is available on the GEO website under identifier GPL96.

  • Go to the GEO website to get information about the experiment GSE13425.
  • What tumor types were analyzed?
  • What does HGU133A stand for?

Reading Affymetrix data

Retrieving data

  • Open a terminal
  • Create a directory GSE13425.
  • Move to this directory.
  • Download the subset of the Affymetrix raw files: GSE13425_sub.tar
  • Uncompress the files
  • Download data related to sample phenotypes phenoData_sub.txt.
  • Have a look at the phenoData_sub.txt file.

Note: we won't perform pre-processing of the full dataset due to memory and time issues.

Loading data into R

  • Launch R
  • Load the affy library
  • Using the ReadAffy function, create an object named affy.s13 that holds the result (note: the object name is arbitrary; we chose it to indicate that the object is of type 'AffyBatch' and contains the intensity values for 13 selected samples of the Den Boer dataset).
  • Print the affy.s13 object.
  • What is the class of this object?
  • What slots does this object contain?
  • How many probes does the microarray contain?
  • Ask for help about the corresponding class.
  • What does the assayData slot contain?
  • Have a look at the methods associated with this class.
  • What does the exprs method return? What are its dimensions?
  • Does the expression matrix contain as many rows as there are cells on the array?

Loading phenotypic data

By default, the ReadAffy function does not load phenotypic data. These data can be loaded using the read.AnnotatedDataFrame function, which returns an object of class AnnotatedDataFrame.

Given that the phenoData slot of affy.s13 (our instance of class AffyBatch) is also an AnnotatedDataFrame, assign the result of read.AnnotatedDataFrame to the phenoData slot of our affy.s13 object.

Indexing an AffyBatch object

The indexing operator '[' (which is in fact a function) is also redefined in the source code of the affy library. The code stipulates that the indexing function always returns an AffyBatch object. Selecting two microarrays, for instance, selects both the expression values and the corresponding phenotypic data.

Affy library: graphics

The image function

  • Generate a pseudo image of the first and second arrays using the image function.

The barplot.ProbeSet() function

The probeSet names can be accessed through the geneNames function.

Note: the method geneNames() returns probeset identifiers rather than actual 'gene names'.

Given one or several probeSet IDs, the probeset method allows one to extract the corresponding probe expression values.

  • Use the function barplot.ProbeSet() to visualize the intensity values for the perfect match (PM) and mismatch (MM) probes of the probeSet with identifier '200000_s_at'.
  • Do the same for the probesets '221798_x_at' and '209380_s_at'.
  • What can we conclude about the PM and MM values for these probesets?

Quality control of raw data

Descriptive statistics

  • Create an object named affyLog2, which will contain the expression values transformed to a logarithmic scale (base 2).
  • Display the distribution of the first array using the hist function (use the affyLog2 object).
  • Use the plotDensity function to display microarray distributions (use the affyLog2 object).
  • Use the boxplot function to display microarray distributions (use the affyLog2 object and pch='.' as argument).
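The same descriptive plots can be sketched with base R alone. The block below is an illustration on simulated intensities, not the solution of the practical: the matrix raw is a made-up stand-in for the raw intensity matrix (with the affy library loaded, it would typically be obtained with exprs(affy.s13)), and base density() replaces plotDensity.

```r
# Simulated stand-in for the raw intensity matrix (1000 probes x 3 arrays).
set.seed(42)
raw <- matrix(2^rnorm(3000, mean = 8, sd = 2), ncol = 3,
              dimnames = list(NULL, paste0("array", 1:3)))

# Log2 transformation of the intensities.
affyLog2 <- log2(raw)

# Distribution of the first array.
hist(affyLog2[, 1], breaks = 50, main = "array1", xlab = "log2(intensity)")

# One density curve per array, overlaid on a single plot.
dens <- lapply(seq_len(ncol(affyLog2)), function(j) density(affyLog2[, j]))
plot(dens[[1]], main = "Density per array", xlab = "log2(intensity)",
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))))
for (j in seq_along(dens)[-1]) lines(dens[[j]], col = j)

# One box per array; outliers are drawn as small dots.
boxplot(as.data.frame(affyLog2), pch = ".", ylab = "log2(intensity)")
```

Raw intensities are strongly right-skewed, so the log2 scale is what makes the histogram, density curves and box plots readable and comparable between arrays.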

AffyRNAdeg

The box plots and histograms generated above indicate the global distribution of intensity values for all probes. A well-known pitfall of Affymetrix technology is the degradation effect: for a given gene, the intensity tends to decrease from the distalmost (3') to the less distal (5') probes. The affy library implements a specific quality control criterion that plots the change in mean intensity from 5' to 3' probes (AffyRNAdeg function).

Present/absent calls

It is generally important to select a set of genes that are above the background in at least a given number of samples. The Affymetrix reference method computes an Absent/Marginal/Present call (A/M/P) for each probeSet. However, this method relies on comparing the signals emitted by PM and MM probes, and MM signals tend to follow the PM signal. The method is implemented in the mas5calls function (it was originally part of the MAS5 normalization algorithm).
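Once a matrix of A/M/P calls is available, the selection step itself only needs base R. The sketch below uses a simulated call matrix; in the practical, such a matrix would presumably be extracted from the object returned by mas5calls. The cutoff min_samples is an arbitrary assumption.

```r
# Simulated stand-in for a matrix of detection calls (10 probesets x 6 samples).
set.seed(7)
calls <- matrix(sample(c("A", "M", "P"), 60, replace = TRUE,
                       prob = c(0.4, 0.1, 0.5)),
                nrow = 10,
                dimnames = list(paste0("probeset", 1:10),
                                paste0("sample", 1:6)))

# Number of samples in which each probeset is called Present.
n_present <- rowSums(calls == "P")

# Keep probesets called Present in at least min_samples samples (assumed cutoff).
min_samples <- 3
selected <- names(n_present)[n_present >= min_samples]

table(calls)   # overall count of A, M and P calls
selected       # identifiers of the probesets that pass the filter
```

Marginal calls are counted as not present here; treating them as present would only require replacing the comparison with calls %in% c("M", "P").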

Data normalization

Numerous methods have been proposed for Affymetrix data normalization (mas5, PLIER, Li-Wong, rma, gcrma, ...). These methods rely on elaborate treatment, including inter-sample normalization. A detailed description and comparison of these methods is beyond the scope of this course. For this practical, we will use the rma() function. Note that RMA normalization includes a log2 transformation of the raw data.

  • What object is returned by rma?
  • Which slots does the object contain?
  • Ask for some help about the class of this object.
  • Use the smoothScatter function (library geneplotter) to compare normalized values from the first and second microarrays.

The ExpressionSet object

The ExpressionSet class is central to BioC, as many packages produce ExpressionSet instances. This simple object is intended to store normalized data from various technologies.

Checking the normalization results

Relative Log Expression (RLE)

One can use classical diagrams to visualize the normalization results. Another way to check the normalization of an expression matrix is the Relative Log Expression (RLE) plot.
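The RLE of a probeset in a given array is usually defined as its log expression minus the median log expression of that probeset across all arrays. For well-normalized data, the per-array box plots of RLE values should be centered on zero with similar spreads. A minimal base R sketch on a simulated matrix (a stand-in for the normalized expression matrix) could look like this:

```r
# Simulated stand-in for a normalized log2 expression matrix
# (500 probesets x 4 arrays).
set.seed(3)
m <- matrix(rnorm(2000, mean = 8, sd = 1.5), ncol = 4,
            dimnames = list(paste0("probeset", 1:500), paste0("array", 1:4)))

# Median expression of each probeset across arrays.
probeset_median <- apply(m, 1, median)

# Relative log expression: deviation of each value from its probeset median.
rle_values <- sweep(m, 1, probeset_median, FUN = "-")

# One box per array, with a reference line at zero.
boxplot(as.data.frame(rle_values), pch = ".", ylab = "Relative log expression")
abline(h = 0, lty = 2)
```

An array whose box is shifted away from zero or is markedly wider than the others is a candidate for closer inspection.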

MA plot diagram

One popular diagram in DNA chip analysis is the M versus A plot (MA plot). In this diagram:

    M is the log intensity ratio calculated for any gene.

    A is the average log intensity which corresponds to an estimate of the gene expression level.

If the data were perfectly normalized, M values would not depend on A values. To draw the MA plot, we will first compute values for a pseudo-microarray that will serve as the reference. This pseudo-microarray will be highly representative of the series, as it will contain the median expression value for each gene.

  • Calculate $A_{1..n}$ and $M_{1..n}$ for sample 1 versus the reference, given that for each gene $g$ with intensities $I_{g,1}$ and $I_{g,\mathrm{ref}}$, A and M can be computed as follows:
    $A_g = (I_{g,1} + I_{g,\mathrm{ref}})/2$
    $M_g = I_{g,1} - I_{g,\mathrm{ref}}$
  • Using the abline function (h argument), add the lines M=1, M=-1 and M=0.
  • Using the text function, display the names of probesets for which the absolute value of the ratio is above 4.
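The MA plot steps can be sketched in base R as follows. The matrix m is simulated and stands in for the normalized log2 expression matrix. The cutoff is applied to |M| directly, which is an assumption: if a fold change of 4 is meant instead, the corresponding cutoff on the log2 scale would be 2.

```r
# Simulated stand-in for a normalized log2 expression matrix
# (1000 probesets x 5 arrays).
set.seed(123)
m <- matrix(rnorm(5000, mean = 8, sd = 2), ncol = 5,
            dimnames = list(paste0("probeset", 1:1000), paste0("array", 1:5)))

# Pseudo-microarray used as reference: median of each probeset across arrays.
ref <- apply(m, 1, median)

# A and M for sample 1 versus the reference (values are already on a log2 scale).
A <- (m[, 1] + ref) / 2
M <- m[, 1] - ref

plot(A, M, pch = ".", main = "MA plot: array1 vs reference")
abline(h = c(-1, 0, 1), lty = c(2, 1, 2), col = "red")

# Label probesets whose |M| exceeds the cutoff (assumed to apply to M itself).
cutoff <- 4
flagged <- abs(M) > cutoff
if (any(flagged)) {
  text(A[flagged], M[flagged], labels = names(M)[flagged], pos = 3, cex = 0.6)
}
```

Because the intensities are log-transformed, the difference M corresponds to a log ratio and the mean A to a log geometric mean of the two intensities.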

Probe annotations

As you have probably noticed, the gene names are available neither in the AffyBatch object nor in the eset object. Each Affymetrix microarray has its own annotation library that can be used to link probesets to gene symbols and to retrieve additional information about genes. Here we need to load the hgu133a.db library. If it is not already installed, use the biocLite function.

This library gives access to a set of annotation sources that can be listed using the hgu133a function.

The following commands can be used to retrieve gene symbols for the hgu133a GeneChip.

  • Create a new object, m, that will contain the normalized expression matrix.
  • Change the row names (rownames function) of m so that they will contain both the probe names and gene symbols (use the paste function with ' ' as separator).
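The renaming step can be illustrated on a toy matrix. In the sketch below, the expression values are invented and the probeset-to-symbol mapping is typed by hand for illustration; in the practical, the mapping would come from the hgu133a.db annotation library.

```r
# Small stand-in for the normalized expression matrix (values are invented).
m <- matrix(c(10.2, 7.5, 6.1, 10.4, 7.3, 6.0), nrow = 3,
            dimnames = list(c("1007_s_at", "1053_at", "117_at"),
                            c("array1", "array2")))

# Probeset-to-symbol mapping, typed by hand for illustration.
symbols <- c("1007_s_at" = "DDR1", "1053_at" = "RFC2", "117_at" = "HSPA6")

# Look up the symbols in row order, then combine probeset ID and symbol.
rownames(m) <- paste(rownames(m), symbols[rownames(m)], sep = " ")
rownames(m)
```

Indexing the mapping by rownames(m) keeps the symbols aligned with the rows even if the mapping is stored in a different order. A probeset without a symbol would yield NA in the lookup and may need separate handling.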

Writing data onto disk

R objects can be saved using the save function (and subsequently reloaded using the load function). For tab-delimited file output, one may use the write.table function.
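Both approaches are sketched below on a small made-up matrix. Temporary files are used so that the example leaves nothing behind; in practice one would give explicit file names.

```r
# Small made-up expression matrix.
set.seed(5)
m <- matrix(round(rnorm(6, mean = 8), 3), nrow = 3,
            dimnames = list(paste0("probeset", 1:3), c("array1", "array2")))

# Binary R format: save() stores the object under its name, load() restores it.
rdata_file <- tempfile(fileext = ".RData")
save(m, file = rdata_file)
rm(m)
load(rdata_file)

# Tab-delimited text: col.names = NA adds an empty header cell above the row names.
txt_file <- tempfile(fileext = ".txt")
write.table(m, file = txt_file, sep = "\t", quote = FALSE, col.names = NA)

# Read the table back to check the round trip.
m_back <- as.matrix(read.delim(txt_file, row.names = 1, check.names = FALSE))
all.equal(m_back, m)
```

The save/load pair preserves any R object exactly, whereas the tab-delimited file is readable by other programs but only suits tabular data.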

Additional exercises

Using box plots and density plots, compare the effect on raw PM data of quantile normalization versus median-centering, median-centering and scaling, and median-centering and scaling with the MAD.
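As a starting point for this exercise, the four transformations can be written in a few lines of base R. This is a sketch on simulated log2 intensities, not on the real PM data; the function names are hypothetical, and plain "scaling" is assumed here to mean division by the standard deviation.

```r
# Simulated stand-in for log2 PM intensities:
# 3 arrays with different locations and spreads.
set.seed(11)
pm_log <- cbind(array1 = rnorm(1000, 8, 1.0),
                array2 = rnorm(1000, 9, 1.5),
                array3 = rnorm(1000, 7, 2.0))

# Quantile normalization: give every array the same empirical distribution,
# namely the mean of the sorted columns.
quantile_normalize <- function(x) {
  ref <- rowMeans(apply(x, 2, sort))
  out <- apply(x, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(x)
  out
}

# Median-centering, optionally followed by scaling with sd or MAD.
median_center <- function(x) sweep(x, 2, apply(x, 2, median), "-")
scale_by <- function(x, f) sweep(x, 2, apply(x, 2, f), "/")

normalized <- list(
  raw          = pm_log,
  quantile     = quantile_normalize(pm_log),
  centered     = median_center(pm_log),
  centered_sd  = scale_by(median_center(pm_log), sd),
  centered_mad = scale_by(median_center(pm_log), mad)
)

# One panel of box plots per method.
op <- par(mfrow = c(2, 3))
for (name in names(normalized)) {
  boxplot(as.data.frame(normalized[[name]]), pch = ".", main = name)
}
par(op)
```

Median-centering only aligns the locations of the arrays, scaling additionally aligns their spreads, and quantile normalization forces the whole distributions to coincide.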

References

Den Boer et al. A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: a genome-wide classification study. Lancet Oncol (2009) vol. 10 (2) pp. 125-34.
    The Names of an Object

    Functions to get or set the names of an object.

    Keywords
    attribute
    Usage
    names(x)
    names(x) <- value

    Arguments
    x

    an R object.

    value

    a character vector of up to the same length as x, or NULL.

    Details

    names is a generic accessor function, and names<- is a generic replacement function. The default methods get and set the 'names' attribute of a vector (including a list) or pairlist.

    For an environment env, names(env) gives the names of the corresponding list, i.e., names(as.list(env, all.names = TRUE)) which are also given by ls(env, all.names = TRUE, sorted = FALSE). If the environment is used as a hash table, names(env) are its “keys”.

    If value is shorter than x, it is extended by character NAs to the length of x.

    It is possible to update just part of the names attribute via the general rules: see the examples. This works because the expression there is evaluated as z <- 'names<-'(z, '[<-'(names(z), 3, 'c2')).

    The name "" is special: it is used to indicate that there is no name associated with an element of a (atomic or generic) vector. Subscripting by "" will match nothing (not even elements which have no name).

    A name can be character NA, but such a name will never be matched and is likely to lead to confusion.

    Both are primitive functions.

    Value

    For names, NULL or a character vector of the same length as x. (NULL is given if the object has no names, including for objects of types which cannot have names.) For an environment, the length is the number of objects in the environment but the order of the names is arbitrary.

    For names<-, the updated object. (Note that the value of names(x) <- value is that of the assignment, value, not the return value from the left-hand side.)

    Note

    For vectors, the names are one of the attributes with restrictions on the possible values. For pairlists, the names are the tags and converted to and from a character vector.

    For a one-dimensional array the names attribute really is dimnames[[1]].

    Formally classed aka “S4” objects typically have slotNames() (and no names()).

    References

    Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

    See Also

    slotNames, dimnames.

    Aliases
    • names
    • names.default
    • names<-
    • names<-.default
    Examples
    library(base)
    # NOT RUN {
    # print the names attribute of the islands data set
    names(islands)
    # remove the names attribute
    names(islands) <- NULL
    islands
    rm(islands) # remove the copy made

    z <- list(a = 1, b = 'c', c = 1:3)
    names(z)
    # change just the name of the third element.
    names(z)[3] <- 'c2'
    z

    z <- 1:3
    names(z)
    ## assign just one name
    names(z)[2] <- 'b'
    z
    # }
    Documentation reproduced from package base, version 3.6.2, License: Part of R 3.6.2

    Community examples

    richie@datacamp.com at Jan 17, 2017 base v3.3.2

    A better way to remove the `names` attribute is to use [`unname()`](https://www.rdocumentation.org/packages/base/topics/unname).

    ```{r}
    unname(islands)
    ```

    The main advantage of having names is that it gives you an easy-to-read way of subsetting.

    ```{r}
    islands[c('South America', 'Southampton')]
    ```

    Or, more fancily, you can use a regular expression to extract all islands with names beginning with `'A'`, for example.

    ```{r}
    islands[grepl('^A', names(islands))]
    ```

    Lists can also have names.

    ```{r}
    (l <- list(a = 1, b = letters[1:5], c = list(d = 1:3)))
    names(l) # only the top level element names, not 'd'
    names(unlist(l)) # unlist gives a name for every element
    ```

    You can overwrite all the names.

    ```{r}
    (l <- list(a = 1, b = letters[1:5], c = list(d = 1:3)))
    names(l) <- LETTERS[1:3]
    l
    ```

    … or just some of them.

    ```{r}
    (l <- list(a = 1, b = letters[1:5], c = list(d = 1:3)))
    names(l)[1:2] <- c('Alpha', 'Beta')
    l
    ```

    Setting names on an object, then returning that object can be done in a single step using [`setNames()`](https://www.rdocumentation.org/packages/stats/topics/setNames).

    ```{r}
    (l <- list(a = 1, b = letters[1:5], c = list(d = 1:3)))
    setNames(l, c('Alef', 'Bet', 'Gimel'))
    ```

    If an object has no names, then the `names()` function returns `NULL`.

    ```{r}
    v <- 1:3
    names(v)
    ```

    If an object has some names, then the names function returns a character vector with missing values where there are no names.

    ```{r}
    v <- 1:3
    names(v)[2] <- '2nd'
    names(v)
    v
    ```
