Using Next-Generation Sequencing (NGS) to Survey the Transcriptome in Different Seed Compartments, Tissues, and Regions Throughout Soybean Seed Development

Background

The Affymetrix soybean genome array was used to study gene activity in different compartments of the soybean seed at various stages of development. The array was designed using publicly available ESTs. Most of these ESTs originate from reproductive and vegetative organs, and very few come from libraries constructed from soybean seeds throughout development. As a result, genes active during many stages of seed development are most likely under-represented on the array. To uncover additional genes active during soybean seed development, we used next-generation sequencing to survey the transcriptome in different seed compartments, tissues, and regions across soybean seed development.

Methods

Soybean plants were grown in the UCLA Plant Growth Center with a 16:8 light-dark cycle at 22°C.

  • Total RNA isolated from soybean whole seeds was subjected to two rounds of poly(A) selection using a Dynabeads oligo(dT) system (Invitrogen). The poly(A)-selected RNA was used to make an Illumina sequencing library with the Illumina RNA-Seq Kit.
  • Total RNA isolated from soybean seed compartments, tissues, and regions captured by Laser Capture Microdissection (LCM) was amplified with the Ovation RNA-Seq System kit from NuGEN. The resulting cDNAs served as the template for the second half of the Illumina RNA-Seq procedure: end repair, polyA tailing, adapter ligation, and PCR amplification.

Data Analysis

Briefly, raw reads generated by the Illumina sequencing machine are processed to remove low-quality and rRNA reads. The filtered reads are then mapped to several references using Bowtie (Langmead et al., Genome Biology (2009)). The references used for mapping are the entire assembled genome sequence (version 1.0), the predicted gene models, and the predicted transcripts. Including all three references allows gene models to be identified from reads that map to exon-exon junctions, novel exons (e.g., within predicted introns), and novel untranslated regions.

Raw unprocessed sequences generated in this study have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/projects/geo/). The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomic data submitted by the scientific community. The Illumina raw sequence files, which contain the raw sequences and sequence quality information, can be accessed through the GEO web site under GEO series accession number GSE29163 or sample accession number GSM721725, or through the GEO accession numbers listed below.
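The low-quality read filtering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the mean-quality criterion, the threshold of 20, and the Phred+33 quality encoding are all assumptions made for illustration (Illumina instruments of that period often wrote Phred+64 instead). rRNA removal, which would require comparing reads against rRNA sequences, is not shown.

```python
from typing import Iterable, Iterator, Tuple

# A read is (header, sequence, quality string); this layout is an assumption.
Read = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from well-formed 4-line FASTQ records."""
    stripped = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (offset 33 is assumed)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_reads(reads: Iterable[Read],
                 min_mean_quality: float = 20.0,
                 offset: int = 33) -> Iterator[Read]:
    """Keep reads whose mean quality meets the (hypothetical) threshold."""
    for header, seq, qual in reads:
        if qual and mean_quality(qual, offset) >= min_mean_quality:
            yield header, seq, qual


# Two toy records: read1 has uniformly high quality, read2 uniformly low.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "!!!!",
]
kept = [header for header, _, _ in filter_reads(parse_fastq(example))]
```

In this toy input only `@read1` passes the filter. A real workflow would stream reads from the sequence files rather than hold them in a list, and would pick the threshold and quality offset to match the sequencing run.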

Summary

As shown in Figure 1, we surveyed:
1. Soybean seed mRNAs from five developmental stages (GEO Accession - GSE29163):

2. Soybean seed compartment mRNAs captured by LCM (GEO Accession - GSE29162):

  • Embryo proper region from globular stage seed - GSM721717
  • Suspensor region from globular stage seed - GSM721718
  • Seed coat parenchyma tissue from early maturation stage seed - GSM721719

3. Soybean seed cotyledon mRNAs from maturation through germination (GEO Accession - GSE29134):

  • Embryonic cotyledons of mid-maturation stage embryos - GSM721277
  • Embryonic cotyledons from late-maturation stage embryos - GSM721278
  • Cotyledons from seedlings 6 days after imbibition - GSM721280
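The GEO accession numbers listed above can be turned into record links programmatically. The sketch below assumes the standard GEO accession viewer address (`https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=...`); the helper name `geo_url` and the simple format check are hypothetical and only meant as an illustration.

```python
from urllib.parse import urlencode

# GEO accession viewer endpoint (assumed to be the standard public address).
GEO_ACCESSION_ENDPOINT = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_url(accession: str) -> str:
    """Build the GEO record URL for a series (GSE) or sample (GSM) accession."""
    acc = accession.strip().upper()
    # Simple sanity check: prefix GSE or GSM followed by digits only.
    if not (acc.startswith(("GSE", "GSM")) and acc[3:].isdigit()):
        raise ValueError(f"not a GSE/GSM accession: {accession!r}")
    return f"{GEO_ACCESSION_ENDPOINT}?{urlencode({'acc': acc})}"


# Series accessions named in this summary.
series = ["GSE29163", "GSE29162", "GSE29134"]
series_urls = {acc: geo_url(acc) for acc in series}
```

The same helper works for the sample accessions, for example `geo_url("GSM721717")` for the embryo proper region.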

The major conclusion to date (shown in Figure 2) is:

  • Using next-generation sequencing technology, we estimate that there are at least 53,000 diverse mRNAs required for the differentiation of all soybean seed compartments, regions, and tissues across development (i.e., genes required to "make a soybean seed").
Figure 1. RNASeq Analysis of Soybean Seed mRNAs At Different Developmental Stages. (A) Seed stages used for RNASeq analysis are boxed. GLOB, HRT, COT, EM, MM, and LM refer to globular, heart, cotyledon, early maturation, mid-maturation, and late-maturation stages of development, respectively. 6DAI and cot refer to six days after seed imbibition and post-germination cotyledons, respectively. (B) Cartoons of globular and early-maturation stage seed cross-sections used to capture specific seed regions and tissues for RNASeq analysis (red boxes). (C) Summary of seed mRNA sequencing reads using RNASeq. (D) Accession numbers of soybean seed mRNA populations analyzed using RNASeq. EP, SUS, and SCPY refer to embryo proper, suspensor, and seed coat parenchyma region, respectively, and are highlighted by the red boxes in (B).
Figure 2. Gene Activity During Soybean Seed Development and in Specific Seed Compartments and Tissues. Data obtained from the RNASeq studies summarized in Figure 1. Whole seed, whole cotyledons, and LCM refer to mRNA sequences obtained from the entire seed, cotyledons only (Figure 1A), and specific seed compartments and regions (Figure 1B), respectively. Abbreviations are defined in the legend to Figure 1.