Summary of All Project GenBank Submissions

Dataset           BS-DNA-SEQ   RNA-SEQ   SMALL RNA-SEQ
Total Reads       1,967M       985M      56M
Total Bases       190.2Gb      74.6Gb    4.2Gb
No. Datasets      11           17        2

Dataset           GENECHIP
No. Compartments  76
No. GeneChips     166
No. Datasets      10
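The summary totals are the column sums of the per-dataset values listed in the tables below. The short Python sketch that follows recomputes the read, dataset, compartment, and GeneChip totals from those per-dataset figures (copied by hand from the tables). It is only a consistency check, not part of any data-processing pipeline, and the variable names are ours. Summed bases can differ slightly from the summary because of rounding; the per-dataset RNA-SEQ base counts, for example, add up to 74.7Gb.

```python
# Per-dataset read counts (millions of reads), copied from the tables below.
READS_M = {
    "BS-DNA-SEQ": [102, 195, 77, 191, 336, 191, 144, 277, 154, 150, 150],
    "RNA-SEQ": [89, 40, 52, 123, 42, 45, 40, 19, 59, 33,  # life cycle (GSE29163)
                74, 68, 73,                               # LCM compartments (GSE29162)
                49, 83, 42, 54],                          # cotyledon (GSE29134)
    "SMALL RNA-SEQ": [28, 28],
}

# GeneChip series (soybean then Arabidopsis), in table order.
COMPARTMENTS = [8, 8, 8, 16, 6, 7, 6, 6, 5, 6]
GENECHIPS = [24, 19, 16, 32, 12, 15, 14, 12, 10, 12]


def summarize(reads_m):
    """Return {assay: (total reads in millions, number of datasets)}."""
    return {assay: (sum(vals), len(vals)) for assay, vals in reads_m.items()}


if __name__ == "__main__":
    for assay, (total, n) in summarize(READS_M).items():
        print(f"{assay}: {total:,}M reads in {n} datasets")
    print(f"GENECHIP: {sum(COMPARTMENTS)} compartments, "
          f"{sum(GENECHIPS)} GeneChips, {len(GENECHIPS)} datasets")
```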


All GenBank Submissions Categorized By Project

Please click on an accession number below to download the data from NCBI (GSE and GSM accessions are GEO records; SRX accessions are SRA records).
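If a link is unavailable, a record can also be opened directly from its accession. The sketch below builds an NCBI landing-page URL for the GEO series/sample (GSE, GSM) and SRA experiment (SRX) accessions used in these tables. The URL patterns are our assumption, based on NCBI's public GEO and SRA pages rather than on anything stated here, and may change; "Pending" entries have no accession and are rejected.

```python
import re

# Assumed NCBI landing-page URL patterns (not taken from this page).
GEO_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={}"
SRA_URL = "https://www.ncbi.nlm.nih.gov/sra/{}"


def accession_url(acc):
    """Return an NCBI URL for a GEO (GSE/GSM) or SRA experiment (SRX) accession."""
    acc = acc.strip().upper()
    if re.fullmatch(r"GS[EM]\d+", acc):
        return GEO_URL.format(acc)
    if re.fullmatch(r"SRX\d+", acc):
        return SRA_URL.format(acc)
    raise ValueError(f"not a GEO or SRA experiment accession: {acc!r}")


if __name__ == "__main__":
    print(accession_url("GSE34637"))
    print(accession_url("SRX039376"))
```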

BS DNA-SEQ (Methylome Profiling of Soybean Seed Development Using Next-Generation Sequencing)

Study Dataset No. Reads No. Bases Accessions (GEO/SRA)
Methylation Changes During Soybean Seed Development
(GSE34637)
Globular Stage Seeds (BR1) 102M 8.4Gb SRX039376
Globular Stage Seeds (BR2) 195M 19.5Gb GSM852274
Early Maturation Stage Seeds (BR1) 77M 6.3Gb SRX039377
Early Maturation Stage Seeds (BR2) 191M 19.1Gb GSM852275
Mid-Maturation Stage Seeds 336M 33.6Gb GSM852276
Mid-Maturation Stage Axis 191M 19.1Gb GSM852277
Late-Maturation Stage Seeds 144M 14.4Gb GSM852278
Dry Seeds 277M 27.7Gb GSM852279
Methylation Changes in Soybean Early Maturation Seed Compartments Using LCM
Seed Coat Parenchyma 154M 12.1Gb Pending
Cotyledon Abaxial Parenchyma 150M 15.0Gb Pending
Cotyledon Adaxial Parenchyma 150M 15.0Gb Pending

Note: BR - Biological Replicate; LCM - Laser Capture Microdissection.

RNA-SEQ (Transcriptome Profiling of Soybean Seed Development Using Next-Generation Sequencing)

Study Dataset No. Reads No. Bases GEO Accessions
Transcriptome Profiling of the Soybean Life Cycle
(GSE29163)
Globular Stage Seeds 89M 6.8Gb GSM721725
Heart Stage Seeds 40M 3.0Gb GSM721726
Cotyledon Stage Seeds 52M 4.0Gb GSM721727
Early Maturation Stage Seeds 123M 9.3Gb GSM721728
Dry Seeds 42M 3.2Gb GSM721729
Trifoliate Leaves 45M 3.4Gb GSM721730
Roots 40M 3.0Gb GSM721731
Stems 19M 1.4Gb GSM721732
Floral Buds 59M 4.5Gb GSM721733
Whole Seedlings Six Days After Imbibition 33M 2.5Gb GSM721734
Transcriptome Profiling of Soybean Seed Compartments Using LCM
(GSE29162)
Globular Stage Embryo Proper 74M 5.6Gb GSM721717
Globular Stage Suspensor 68M 5.2Gb GSM721718
Early Maturation Seed Coat Parenchyma 73M 5.5Gb GSM721719
Transcriptome Profiling of Soybean Embryonic Cotyledon Before and After Germination
(GSE29134)
Mid-Maturation Cotyledon 49M 3.7Gb GSM721277
Late-Maturation Cotyledon 83M 6.3Gb GSM721278
Dry Seed 42M 3.2Gb GSM721279
Seedling Cotyledon 54M 4.1Gb GSM721280


SMALL RNA-SEQ (Small RNA Profiling During Soybean Seed Development)

Study Dataset No. Reads No. Bases GEO Accessions
Small RNA Profiling of Soybean Seed Compartment Using LCM
(GSE34638)
Early Maturation Whole Seed 28M 2.1Gb GSM852281
Early Maturation Seed Coat Parenchyma 28M 2.1Gb GSM852280


GENECHIP (Transcriptome Profiling of Seed Development Using GeneChip Arrays)

Study Seed Stage No. Compartments Studied No. GeneChip Experiments GEO Series Accessions
Transcriptome Profiling of Soybean Seed Compartments Using LCM
Globular 8 24 GSE6414
Heart 8 19 GSE7511
Cotyledon 8 16 GSE7881
Early Maturation 16 32 GSE8112
Transcriptome Profiling of Arabidopsis Seed Compartments Using LCM
Pre-globular 6 12 GSE12402
Globular 7 15 GSE11262
Heart 6 14 GSE15160
Linear Cotyledon 6 12 GSE12403
Bending Cotyledon 5 10 GSE20039
Mature Green 6 12 GSE15165


A Soybean Seed Transcription Factor RNAi Knock-Out Collection

To study the functions of transcription factor genes active during soybean seed development, we collaborated with Dr. David Somers (Monsanto) to generate a collection of soybean seed RNAi knock-out lines. We used the CaMV 35S promoter to generate RNAi lines for 63 transcription factor genes that are expressed in specific seed regions at the globular, heart, cotyledon, and early-maturation stages of development. RNAi transgenes were integrated into the soybean genome, and R0 lines containing a single RNAi transgene were isolated and grown to maturity in the greenhouse. The developing R1 and R2 seed populations were screened for abnormalities in seed and vegetative development. A preliminary screen identified three lines with significant phenotypes; the remaining lines appeared similar to wild type under our screening conditions. The table below lists each transcription factor knock-out line together with the gene's seed expression profile and the RNAi phenotype. All of the RNAi lines are available to the soybean community from Monsanto through a standard MTA process. To initiate this process, please contact either Bob Goldberg (bobg@ucla.edu) or David Somers at Monsanto (david.a.somers@monsanto.com).

Phenotypes of Some RNAi Knock-Out Lines

A Complete Summary of RNAi Knock-Out Lines

Click here to see the abbreviation of stages and compartments.

To find the expression profile of a target gene during seed development, first go to the "Browse Soybean mRNAs Profiling Database" page. Next, enter the gene model ID (e.g., Glyma04g41710) in the "Predicted Gene Model ID" field and click the "Submit Query" button to search the database. Finally, on the search results page, click the probe set corresponding to the target gene to view its expression profile.
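Before submitting a query, it can help to check that an ID has the same shape as the example. The sketch below is illustrative only: the pattern ("Glyma" + two-digit chromosome number + "g" + five-digit locus number) is our reading of the example ID Glyma04g41710, not a rule documented on this page, and IDs for genes on unanchored scaffolds are deliberately not handled.

```python
import re

# Assumed Glyma 1.x chromosome-based gene ID shape, inferred from the example
# Glyma04g41710: "Glyma" + 2-digit chromosome + "g" + 5-digit locus number.
GLYMA1_GENE_ID = re.compile(r"Glyma(\d{2})g(\d{5})")


def parse_glyma1_id(gene_id):
    """Return (chromosome number, locus number string) for a Glyma 1.x gene ID."""
    match = GLYMA1_GENE_ID.fullmatch(gene_id.strip())
    if match is None:
        raise ValueError(f"unexpected gene model ID format: {gene_id!r}")
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_glyma1_id("Glyma04g41710"))  # (4, '41710')
```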

Soybean IVT Array Annotation

Sequences used for BLAST came from the Affymetrix Soybean target sequences, which can be obtained directly from Affymetrix. The Affymetrix Soybean target sequences were based on NCBI UniGene Build 13 (November 2003). Probe design was based on this UniGene build as well as on the Affymetrix in-house clustering algorithm; probe sets derived from the in-house clustering are designated with the prefix "GmaAffx".

BLASTX analysis was carried out by searching the soybean target sequences against all Arabidopsis proteins (TAIR ATH1_pep_cm_20040228). We discarded any hit with an e-value greater than 1e-02. Because one soybean sequence can hit many different Arabidopsis sequences, we selected the top Arabidopsis hit from each BLAST result as the corresponding Arabidopsis sequence, and the e-value for that hit is displayed in the annotation file. Each soybean probe set therefore has an associated Arabidopsis annotation (if available), with the e-value indicating the degree of similarity between the soybean and Arabidopsis sequences. Where no Arabidopsis hit was identified (~9,000 soybean probe sets had no significant similarity to any Arabidopsis protein), we searched the soybean sequence against rice proteins (Build #2 from TIGR) and the NCBI non-redundant protein database. We annotated only the soybean probe sets; features on the GeneChip derived from H. glycines (soybean cyst nematode) or P. sojae were not annotated.
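The filtering and top-hit selection described above can be sketched as follows. This is a minimal illustration, not the original analysis script: it assumes BLAST+ tabular output (-outfmt 6, with the e-value in column 11 and the bit score in column 12), reads the cutoff "e-02" as 1e-2, and breaks ties by bit score, which is our own choice. Queries left without a hit are the candidates for the follow-up rice and NCBI nr searches.

```python
import csv
import io

EVALUE_CUTOFF = 1e-2  # hits with a larger e-value are discarded


def best_hits(blast_tabular, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue)} for the top hit of each query.

    Expects BLAST tabular output (-outfmt 6): column 11 is the e-value and
    column 12 the bit score. Lowest e-value wins; ties go to the higher bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        rank = (evalue, -bitscore)  # smaller is better
        if query not in best or rank < best[query][0]:
            best[query] = (rank, subject, evalue)
    return {q: (subject, evalue) for q, (_, subject, evalue) in best.items()}


def without_hits(queries, hits):
    """Queries with no hit passing the cutoff (candidates for a rice/nr search)."""
    return sorted(set(queries) - set(hits))


# Synthetic example rows (hypothetical IDs, not real results).
EXAMPLE = "\n".join([
    "q1\tsubjA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t120",
    "q1\tsubjB\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-10\t60",
    "q2\tsubjC\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t25",
])

if __name__ == "__main__":
    hits = best_hits(EXAMPLE)
    print(hits)                              # {'q1': ('subjA', 1e-30)}
    print(without_hits(["q1", "q2"], hits))  # ['q2']
```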

We have also annotated the soybean GeneChip against the draft soybean genome sequence (Phytozome.net).


ANNOTATION UPDATE:

Sept. 25, 2009 - We mapped individual probes to the soybean predicted gene models (generated by the Department of Energy (DOE) Joint Genome Institute, Glyma version 1.01, released April 7, 2009) using BLASTN (≥ 23/25 nucleotide identity) to associate soybean array probe sets with soybean gene models. Probe sets with at least 9 of 11 probes mapping to the same genomic locus are included in the files below; probe sets that did not meet these criteria (≥ 23/25 nucleotide identity per probe, ≥ 9/11 probes per probe set) were excluded. The results are split into two files based on the confidence of the soybean gene model predictions (ftp://ftp.jgi-psf.org/pub/JGI_data/Glycine_max/Glyma1/annotation/highConfidence/Glyma1_highConfidence.transcriptList). Click the files below to download the associations between soybean array probe sets and soybean gene models.
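The two thresholds above can be expressed as a small filter. This sketch assumes the BLASTN output has already been reduced, for one probe set, to a list of (locus, identical nucleotides) matches per probe; that data structure and the function name are hypothetical. It returns every locus that meets the threshold rather than a single one, because the page does not say how probe sets matching more than one locus (for example, duplicated genes) were handled.

```python
from collections import Counter

MIN_IDENTICAL_NT = 23     # of 25 nucleotides in a probe
MIN_PROBES_AT_LOCUS = 9   # of 11 probes in a probe set


def assigned_loci(probe_matches):
    """Return the loci supported by enough probes of one probe set.

    probe_matches: {probe_id: [(locus, identical_nt), ...]} for one probe set.
    A probe supports a locus if it matches it with >= MIN_IDENTICAL_NT
    identical nucleotides; a locus is kept if >= MIN_PROBES_AT_LOCUS probes
    support it. An empty list means the probe set is left out of the files.
    """
    support = Counter()
    for matches in probe_matches.values():
        # count each locus at most once per probe
        support.update({locus for locus, ident in matches if ident >= MIN_IDENTICAL_NT})
    return sorted(locus for locus, n in support.items() if n >= MIN_PROBES_AT_LOCUS)


if __name__ == "__main__":
    # Hypothetical probe set: 9 perfect matches to locusA, one weak match, one elsewhere.
    probe_set = {f"probe{i}": [("locusA", 25)] for i in range(9)}
    probe_set["probe9"] = [("locusA", 22)]
    probe_set["probe10"] = [("locusB", 25)]
    print(assigned_loci(probe_set))  # ['locusA']
```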

Feb. 1, 2009 - We updated the annotation of the soybean array information based on information from TAIR 7.0, TIGR, and Peking Transcription Factor databases as of October 2007. The updated information is available from the following link.

Distribution of All Probe Sets on the Soybean Array (2007)

Soybean Whole Transcript Genome Array

Motivation:

We created this Soybean Whole Transcript (WT) Array to interrogate all the genes in the genome. The first-generation Affymetrix Soybean Genome array was designed by the Soybean Consortium using publicly available soybean full-length cDNAs and ESTs; it consists of 37,000 probe sets interrogating ~25,000 distinct genes/transcripts. The release of the soybean whole-genome sequence (available at Phytozome.net) [Schmutz et al., Nature 463 pp. 178-83 (2010)] allowed the creation of an array that can survey all the genes in the genome, including both high- and low-confidence gene models.

Design:

The design of the Soybean WT array differs from that of the Soybean Genome array. On the Soybean Genome array, probes were selected to correspond to the 3’ end of the transcript or cDNA. On the Soybean WT array, by contrast, probes were selected to span every exon of the predicted gene models/transcripts, where possible. This approach allows the whole transcript to be interrogated (from 5’ to 3’) and can help determine exon usage in different splice variants that may be differentially expressed in specific tissues or compartments. For more information on this type of array design, see the Affymetrix technical note (http://media.affymetrix.com:80/support/technical/technotes/gene_1_0_st_technote.pdf).
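To see why exon-spanning probes let each exon be measured separately, the toy sketch below places up to four evenly spaced, non-overlapping 25-mer probes inside each exon of a transcript. It is only an illustration of exon-level probe placement under our own assumptions (half-open exon coordinates, a fixed number of probes per exon). It is not the Affymetrix probe selection algorithm developed by Christopher Davies and Brant Wong, which is not described here.

```python
PROBE_LEN = 25  # Affymetrix probes are 25-mers


def tile_exons(exons, probes_per_exon=4, probe_len=PROBE_LEN):
    """Place up to probes_per_exon non-overlapping probes evenly in each exon.

    exons: list of (start, end) half-open coordinates along a transcript.
    Returns {exon_index: [probe start positions]}; an exon shorter than one
    probe gets an empty list.
    """
    layout = {}
    for i, (start, end) in enumerate(exons):
        length = end - start
        n = min(probes_per_exon, length // probe_len)  # probes that fit without overlap
        if n == 0:
            layout[i] = []
        elif n == 1:
            layout[i] = [start + (length - probe_len) // 2]  # center a single probe
        else:
            span = length - probe_len  # room between first and last probe start
            # integer spacing is >= probe_len here, so probes never overlap
            layout[i] = [start + k * span // (n - 1) for k in range(n)]
    return layout


if __name__ == "__main__":
    print(tile_exons([(0, 100), (150, 170), (200, 260)]))
    # {0: [0, 25, 50, 75], 1: [], 2: [200, 235]}
```

Because each exon carries its own probes, a splice variant that skips an exon shows reduced signal only at that exon's probes, which is what makes exon-usage comparisons possible.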

Note: This array was designed for studying both soybean and Medicago (i.e., it is a legume array), so it also carries sequences corresponding to Medicago cDNAs. Our main focus, however, is on the soybean sequences on the array.

Sequence Data:

All sequence data used to design probes on the array were obtained from the Department of Energy Joint Genome Institute (DOE-JGI) web site (Phytozome: http://phytozome.net). Probes were designed from the first draft assembly of the soybean genome (version 1.0). The probe selection algorithm was developed by Christopher Davies and Brant Wong at Affymetrix.

Publication Acknowledgement:

The array was designed in a collaboration between our lab (Goldberg Lab) and Affymetrix, with advice and suggestions from other members of the soybean community, including Randy Shoemaker.

Please acknowledge the following people for the design of this array:

Goldberg Lab: Bob Goldberg, Brandon Le, Chen Cheng, Min Chen, and Anhthu Bui

Affymetrix: Gene Tanimoto, Christopher Davies, Stan Trask, Brant Wong, Eric Schell, Xue Mei Zhou, and Patricia Chan


Files for Download


[Probe Association File]

We've created a text file that maps each Affymetrix probe ID to its probe sequence, gene and exon information, and other details.
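As an example of working with a file like this, the sketch below groups probe IDs by gene and exon. The column names (probe_id, gene, exon) and the tab delimiter are hypothetical; check the header line of the downloaded Probe Association File and adjust them to match before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names: check the header line of the downloaded
# Probe Association File and adjust these to match.
PROBE_COL, GENE_COL, EXON_COL = "probe_id", "gene", "exon"


def probes_by_exon(handle, delimiter="\t"):
    """Group probe IDs by (gene, exon) from a delimited text file with a header row."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        groups[(row[GENE_COL], row[EXON_COL])].append(row[PROBE_COL])
    return dict(groups)


if __name__ == "__main__":
    demo = io.StringIO("probe_id\tgene\texon\np1\tgeneA\t1\np2\tgeneA\t1\np3\tgeneA\t2\n")
    print(probes_by_exon(demo))
    # {('geneA', '1'): ['p1', 'p2'], ('geneA', '2'): ['p3']}
```

For the real file, open it with `open(path, newline="")` and pass the file handle to `probes_by_exon`.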

Probe Association File: [Click Here to Download]

[Soybean SENSE WT Array]

This array design is available to the general public and can be purchased through Affymetrix.

Library File: SoyGene-1_0-st-v1-rev02.zip [Click Here to Download]

Labeling Protocol: Check the Affymetrix Website for labeling and hybridization kits [Go to Affymetrix Website]

[Soybean ANTISENSE WT Array]

This array was created for our lab and is a custom-designed antisense WT array. Please use the library file and protocols listed below for this array only.

Library File: SoyGene-1_0-antisense_rev02.zip [Click Here to Download]

Labeling Protocols:

  • Labeling Protocol One: Nugen Ovation Pico WTA System

    Click on the link to go to the product web site [Link]

  • Labeling Protocol Two: Ambion WT Expression Kit with Affymetrix Second Strand cDNA Synthesis

    [Click Here to Download Protocol]

This labeling protocol is provided as-is and is not regularly supported by the Affymetrix Technical Support team. It requires an Ambion WT Expression kit, the Affymetrix Fragmentation and Terminal Labeling kit, and second-strand cDNA synthesis reagents from the vendors listed in the attached protocol. With this protocol, you generate cRNA using the Ambion WT Expression kit (through Day 2 Workflow, Step 2). After cRNA synthesis, follow the Affymetrix protocol (starting on page 9) to make the second-cycle cDNA and the terminally labeled targets.

[Hybridization Program]

For array washing, staining, and scanning, follow the GeneChip Expression Analysis Technical Manual (Section 2: Eukaryotic Sample and Array Processing), using the fluidics protocol EuKGE-WS2v5_450 for the wash and stain steps.

Arabidopsis ATH1 Array Annotation

The Arabidopsis ATH1 array was annotated in 2003 using all the publicly available resources at the time. To incorporate the large amount of information generated in the four years since that annotation, we re-annotated the ATH1 array in parallel with the soybean genome array.

The strategy for the re-annotation of the ATH1 array is as follows:

1. We updated the descriptions for each probe set on the array using the TAIR Affymetrix array descriptions (affy_ATH1_array_elements-2007-5-2.txt), downloaded from the TAIR web site: ftp://ftp.arabidopsis.org/home/tair/Microarrays. The descriptions were based on TAIR 7 (released 04-11-07), then the latest release of the Arabidopsis genome.

Note from TAIR: The mapping to the TAIR7 Transcripts was performed using the BLASTN program with e-value cutoff < 9.9e-6. For the 25-mer oligo probes used on the Affy chips, the required match length to achieve this e-value is 23 or more identical nucleotides. To assign a probe set to a given locus, at least 9 of the probes included in the probe set were required to match a transcript at that locus. Otherwise, the probe set was not assigned a locus and was given the description "no match".

2. In addition to updating the descriptions for each probe set, we also updated gene ontology (GO) information provided by Affymetrix.

3. We gathered information about putative transcription factors from many publicly available TF databases for Arabidopsis, including:

Transcription factors and transcription factor families were associated with each probe set on the array. The information obtained in points 1-3 was compiled into an annotation file that also contains the 2003 ATH1 annotations. Transcription factor assignments were updated automatically based on the information obtained from the databases in point 3.

4. We focused on probe sets that were previously assigned to the "unclassified" category, reasoning that many of these sequences might now have updated information that could be used to reassign them to a different category. Sequences previously assigned to categories such as "protein synthesis" or "metabolism" are unlikely to change. Therefore, we first focused on re-assigning the 11,145 probe sets classified as "unclassified" in 2003.

5. After re-examining the "unclassified" category, we re-examined all 22,746 probe sets on the array for consistent assignment of functional categories. We sorted all the probe sets by their description and made sure that probe sets with similar descriptions were assigned the same functional category.
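Step 5 was carried out by sorting and inspection. As a sketch of how that consistency check could be partly automated (our illustration, not the procedure used), the function below groups probe sets by description text after normalizing case and whitespace, and flags any description whose probe sets carry more than one functional category. Descriptions that are similar but worded differently would still need manual review; the example probe sets and categories are hypothetical.

```python
from collections import defaultdict


def conflicting_descriptions(probe_sets):
    """Flag descriptions whose probe sets were given different functional categories.

    probe_sets: iterable of (probe_set_id, description, category) tuples.
    Returns {normalized description: {category: [probe_set_ids]}}, keeping
    only descriptions with more than one category.
    """
    by_description = defaultdict(lambda: defaultdict(list))
    for probe_set_id, description, category in probe_sets:
        key = " ".join(description.lower().split())  # ignore case and extra spaces
        by_description[key][category].append(probe_set_id)
    return {desc: dict(cats) for desc, cats in by_description.items() if len(cats) > 1}


if __name__ == "__main__":
    rows = [  # hypothetical probe sets and categories
        ("ps1", "Heat shock protein 70", "stress"),
        ("ps2", "heat shock protein  70", "protein folding"),
        ("ps3", "expressed protein", "unclassified"),
    ]
    print(conflicting_descriptions(rows))
    # {'heat shock protein 70': {'stress': ['ps1'], 'protein folding': ['ps2']}}
```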

6. We further examined the "unclassified" category, which is divided into three groups:

  • Unclassified - hypothetical proteins with no cDNA support
  • Unclassified - hypothetical proteins with cDNA support
  • Unclassified - proteins with unknown function

To distinguish the different sequences within the unclassified category, we downloaded the following files from the TAIR site:

  • TAIR7_protein_coding_no_transcript_support_09_30_07
  • TAIR7_protein_coding_with_transcript_support_09_30_07
  • TAIR7_unknown_proteins_no_transcript_support_09_30_07
  • TAIR7_proteins_of_undefined_function_03_07
  • TAIR7_unknown_proteins_03_07
  • TAIR7_locus_type

These files were compiled into one main table listing all the transcripts detected and/or predicted in the Arabidopsis genome. This table helps distinguish whether a sequence has cDNA support, represents a pseudogene/transposon, or is unknown, and was used to re-assign the probe sets to the appropriate unclassified groups.

Download

The updated information is available from the following link.

Distribution of All Probe Sets on the Arabidopsis ATH1 Array (2007)


Presentations Relevant to This Project

Bob Goldberg

  • Using Genomics to Dissect Soybean Seed Development, University of Arizona, Tucson, Arizona (2011) [Download pdf]
  • Using Genomics to Dissect Soybean Seed Development, 13th Biennial Molecular & Cellular Biology of the Soybean Conference, Durham, North Carolina (2010) [Download pdf]
  • What Are The Genes Required to Make a Seed? Important For Food, Fuel, & Engineering New Crops, Faculty Science Research Colloquium Lecturer, UCLA (2008) [Download pdf]
  • Using Genomics to Dissect Seed Development, Ueli Wobus at 65 Seed Biology Symposium, Gatersleben, Germany (2008) [Download pdf]
  • Using Genomics to Dissect Seed Development, XX International Congress on Sexual Plant Reproduction, Brasilia, Brazil (2008) [Download pdf]
  • Genetic Engineering Novel Crop Plants: Unlimited Horizons, Mount Hood, Oregon (2008) [Download pdf]
  • Genetic Engineering New Crops: Importance for Food, Fuel, and Sustainable Crops, Peers Undergraduate Orientation Research Lecture, UCLA (2008) [Download pdf]

John Harada

  • College of Biological Science, Peking University, Beijing, China (2008)
  • Institute of Genetics and Development, Chinese Academy of Sciences, Beijing, China (2008)
  • Institute of Botany, Chinese Academy of Sciences, Beijing, China (2008)
  • Sonoma State University, Rohnert Park, CA (2008)
  • International Congress on Sexual Plant Reproduction, Brasilia, Brazil (2008)
  • BASF, Research Triangle Park (2008)
  • Monsanto, AgraCetus Campus (2008)
  • University of Missouri (2008)
  • Texas A & M University (2008)
  • University of Arizona (2008)
  • National Chung-Hsing University, Taichung City, Taiwan (2007)
  • Academia Sinica, Taipei, Taiwan (2007)
  • National Taiwan University (2007)

Miscellaneous Videos

These movies are best viewed in QuickTime. To download a video, Mac users should press Control and click the link, then choose the download option; PC users should right-click the link and select the download option.

  • Seed Development Movie (2008) Developed by Brandon Le and Bob Goldberg [Download video]

  • Laser-capture microdissection of Arabidopsis seed compartments [Download video]

  • Laser-capture microdissection of soybean seed compartments [Download video]

People

If you have any questions or comments about this project, or data presented in this web site, please contact Bob Goldberg or John Harada.

For questions or comments about this web site, please contact Brandon Le or Min Chen.


Current NSF Project - Gene Regulatory Processes Required to Make a Soybean Seed

Bob Goldberg
Principal Investigator, UCLA
bobg@ucla.edu
Matteo Pellegrini
Co-Principal Investigator, UCLA
matteop@mcdb.ucla.edu
Jungim Hur
Post-Doc, UCLA
jihur@ucla.edu
Jer-Young Lin
Post-Doc, UCLA
lin51@ucla.edu
Brandon Le
Graduate student, UCLA
ble@ucla.edu
Stephen Douglass
Graduate student, UCLA
m.chen@ucla.edu
Min Chen
Technician, UCLA
m.chen@ucla.edu
Weihong Yan
System administrator/Web developer, UCLA
wyan@ucla.edu
John Harada
Co-Principal Investigator, UCD
jjharada@ucdavis.edu
Julie Pelletier
Technician, UCD
jpelletier@ucdavis.edu
Ryan Kirkbride
Graduate student, UCD
rkirkbride@ucdavis.edu
Tina Wang
Technician, UCD
tywang@ucdavis.edu
Meryl Hashimoto
Technician, UCD
mhashimoto@ucdavis.edu

Previous NSF Project - Genes Required to Make a Soybean Seed

UCLA

Bob Goldberg, Principal Investigator, UCLA
Steve Horvath, Co-Principal Investigator, UCLA
Anhthu Bui
Shundai Li
Javier Wagmaister
Xinjun Wang
Jungim Hur
Brandon Le
Chen Cheng
Min Chen
Weihong Yan

UCD

John Harada, Co-Principal Investigator, UCD
Sandra Stone
Mark Belmonte
Julie Pelletier
Ryan Kirkbride
Tina Wang
Meryl Hashimoto
Jiong Fei
Xiaohua Lu

Monsanto

Dave Somers
John Danzer