Analysis of the genome sequence of the flowering plant ...


Nature special article

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana

Abstract

The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Main

The plant and animal kingdoms evolved independently from unicellular eukaryotes and represent highly contrasting life forms. The genome sequences of C. elegans (ref. 1) and Drosophila (ref. 2) reveal that metazoans share a great deal of genetic information required for developmental and physiological processes, but these genome sequences represent a limited survey of multicellular organisms. Flowering plants have unique organizational and physiological properties in addition to ancestral features conserved between plants and animals. The genome sequence of a plant provides a means for understanding the genetic basis of differences between plants and other eukaryotes, and provides the foundation for detailed functional characterization of plant genes.

Arabidopsis thaliana has many advantages for genome analysis, including a short generation time, small size, large number of offspring, and a relatively small nuclear genome. These advantages promoted the growth of a scientific community that has investigated the biological processes of Arabidopsis and has characterized many genes (ref. 3). To support these activities, an international collaboration (the Arabidopsis Genome Initiative, AGI) began sequencing the genome in 1996. The sequences of chromosomes 2 and 4 have been reported (refs 4, 5), and the accompanying Letters describe the sequences of chromosomes 1 (ref. 6), 3 (ref. 7) and 5 (ref. 8).

Here we report analysis of the completed Arabidopsis genome sequence, including annotation of predicted genes and assignment of functional categories. We also describe chromosome dynamics and architecture, the distribution of transposable elements and other repeats, the extent of lateral gene transfer from organelles, and the comparison of the genome sequence and structure to that of other Arabidopsis accessions (distinctive lines maintained by single-seed descent) and plant species. This report is the summation of work by experts interested in many biological processes selected to illuminate plant-specific functions including defence, photomorphogenesis, gene regulation, development, metabolism, transport and DNA repair. The identification of many new members of receptor families, cellular components for plant-specific functions, genes of bacterial origin whose functions are now integrated with typical
eukaryotic components, independent evolution of several families of transcription factors, and suggestions of as yet uncharacterized metabolic pathways are a few more highlights of this work. The implications of these discoveries are not only relevant for plant biologists, but will also affect agricultural science, evolutionary biology, bioinformatics, combinatorial chemistry, functional and comparative genomics, and molecular medicine.

The Arabidopsis Genome Initiative

Three groups contributed to the work reported here. The Genome Sequencing groups, arranged here in order of sequence contribution, sequenced and annotated assigned chromosomal regions. The Genome Analysis group carried out the analyses described. The Contributing Authors interpreted the genome analyses, incorporating other data and analyses, with respect to selected biological topics.

Genome Sequencing Groups

Samir Kaul, Hean L. Koo, Jennifer Jenkins, Michael Rizzo, Timothy Rooney, Luke J. Tallon, Tamara Feldblyum, William Nierman, Maria-Ines Benito, M-I Xiaoying Lin, Christopher D. Town, J. Craig Venter, Claire M. Fraser, Satoshi Tabata, Yasukazu Nakamura, Takakazu Kaneko, Shusei Sato, Erika Asamizu, Tomohiko Kato, Hirokazu Kotani, Shigemi Sasamoto, Joseph R. Ecker, Athanasios Theologis, Nancy A. Federspiel, Curtis J. Palm, Brian I. Osborne, Paul Shinn, Aaron B. Conway, Valentina S. Vysotskaia, Ken Dewar, Lane Conn, Catherine A. Lenz, Christopher J. Kim, Nancy F. Hansen, Shirley X. Liu, Eugen Buehler, Hootan Altafi, Hitomi Sakano, Patrick Dunn, Bao Lam, Paul K. Pham, Qimin Chao, Michelle Nguyen, Guixia Yu, Huaming Chen, Audrey Southwick, Jeong Mi Lee, Molly Miranda, Mitsue J. Toriumi, Ronald W. Davis.

European Union Chromosome 4 and 5 Sequencing Consortium

R. Wambutt, G. Murphy, A. Düsterhöft, W. Stiekema, T. Pohl, K.-D. Entian, N. Terryn, G. Volckaert.

European Chromosome 3 Sequencing Consortium

M. Salanoubat, N. Choisne, M. Rieger, W. Ansorge, M. Unseld, B. Fartmann, G. Valle, F. Artiguenave, J. Weissenbach, F. Quetier.
The Cold Spring Harbor and Washington University Genome Sequencing Center Consortium

Richard K. Wilson, Melissa de la Bastide, M. Sekhon, Emily Huang, Lori Spiegel, Lidia Gnoj, K. Pepin, J. Murray, D. Johnson, Kristina Habermann, Neilay Dedhia, Larry Parnell, Raymond Preston, L. Hillier, Ellson Chen, M. Marra, Robert Martienssen, W. Richard McCombie.

Genome Analysis Group

Klaus Mayer, Owen White, Michael Bevan, Kai Lemcke, Todd H. Creasy, Cord Bielke, Brian Haas, Dirk Haase, Rama Maiti, Stephen Rudd, Jeremy Peterson, Heiko Schoof, Dimitrij Frishman, Burkhard Morgenstern, Paulo Zaccaria, Maria Ermolaeva, Mihaela Pertea, John Quackenbush, Natalia Volfovsky, Dongying Wu, Todd M. Lowe, Steven L. Salzberg, Hans-Werner Mewes.

Contributing Authors

Comparative analysis of the genomes of A. thaliana accessions: S. Rounsley, D. Bush, S. Subramaniam, I. Levin, S. Norris.
Comparative analysis of the genomes of A. thaliana and other genera: R. Schmidt, A. Acarkan, I. Bancroft.
Integration of the three genomes in the plant cell: the extent of protein and nucleic acid traffic between nucleus, plastids and mitochondria: F. Quetier, A. Brennicke, J. A. Eisen.
Transposable elements: T. Bureau, B.-A. Legault, Q.-H. Le, N. Agrawal, Z. Yu, R. Martienssen.
rDNA, telomeres and centromeres: G. P. Copenhaver, S. Luo, C. S. Pikaard, D. Preuss.
Membrane transport: I. T. Paulsen, M. Sussman.
DNA repair and recombination: A. B. Britt, J. A. Eisen.
Gene regulation: D. A. Selinger, R. Pandey, D. W. Mount, V. L. Chandler, R. A. Jorgensen, C. Pikaard.
Cellular organization: G. Juergens.
Development: E. M. Meyerowitz.
Signal transduction: J. R. Ecker, A. Theologis.
Recognition of and response to pathogens: J. Dangl, J. D. G. Jones.
Photomorphogenesis and photosynthesis: M. Chen, J. Chory.
Metabolism: C. Somerville.

Overview of sequencing strategy

We used large-insert bacterial artificial chromosome (BAC), phage (P1) and transformation-competent artificial chromosome (TAC) libraries (refs 9–12) as the primary substrates for sequencing. Early stages of genome sequencing used 79 cosmid clones. Physical maps of the genome of accession Columbia were assembled by restriction fragment ‘fingerprint’ analysis of BAC clones (ref. 13), by hybridization (ref. 14) or polymerase chain reaction (PCR) (ref. 15) of sequence-tagged sites and by hybridization and Southern blotting (ref. 16). The resulting maps were integrated (http://nucleus/cshl.org/arabmaps/) with the genetic map and provided a foundation for assembling sets of contigs into sequence-ready tiling paths. End sequence (http://www.tigr.org/tdb/at/abe/bac_end_search.html) of 47,788 BAC clones was used to extend contigs from BACs anchored by marker content and to integrate contigs. Ten contigs representing the chromosome arms and centromeric heterochromatin were assembled from 1,569 BAC, TAC, cosmid and P1 clones (average insert size 100 kilobases (kb)). Twenty-two PCR products were amplified directly from genomic DNA and sequenced to link regions not covered by cloned DNA or to optimize the minimal tiling path. Telomere sequence was obtained from specific yeast artificial chromosome (YAC) and phage clones, and from inverse polymerase chain reaction (IPCR) products derived from genomic DNA.

Clone fingerprints, together with BAC end sequences, were generally adequate for selection of clones for sequencing over most of the genome. In the centromeric regions, these physical mapping methods were supplemented with genetic mapping to identify contig positions and orientation (ref. 17). Selected clones were sequenced on both strands and assembled using standard techniques. Comparison of independently derived sequence of overlapping regions and independent reassembly of sequenced clones revealed accuracy rates between 99.99 and 99.999%. Over half of the sequence differences were between genomic and BAC clone sequence. All available sequenced genetic markers were integrated into sequence assemblies to verify sequence contigs (refs 4–8). The total length of sequenced regions, which extend from either the telomeres or ribosomal DNA repeats to the 180-base-pair (bp) centromeric repeats, is 115,409,949 bp (Table 1). Estimates of the unsequenced centromeric and rDNA repeat regions measure roughly 10 megabases (Mb), yielding a genome size of about 125 Mb, in the range of the 50–150 Mb haploid content estimated by different methods (ref. 18). In general, features such as gene density, expression levels and repeat distribution are very consistent across the five chromosomes (Fig. 1), and these are described in detail in reports on individual chromosomes (refs 4–8) and in the analysis of centromere, telomere and rDNA sequences.

Table 1: Summary statistics of the Arabidopsis genome

Figure 1: Representation of the Arabidopsis chromosomes. Each chromosome is represented as a coloured bar. Sequenced portions are red, telomeric and centromeric regions are light blue, heterochromatic knobs are shown black and the rDNA repeat regions are magenta. The unsequenced telomeres 2N and 4N are depicted with dashed lines. Telomeres are not drawn to scale. Images of DAPI-stained chromosomes were kindly supplied by P. Fransz. The frequency of features was given pseudo-colour assignments, from red (high density) to deep blue (low density). Gene density (‘Genes’) ranged from 38 per 100 kb to 1 gene per 100 kb; expressed sequence tag matches (‘ESTs’) ranged from more than 200 per 100 kb to 1 per 100 kb. Transposable element densities (‘TEs’) ranged from 33 per 100 kb to 1 per 100
kb. Mitochondrial and chloroplast insertions (‘MT/CP’) were assigned black and green tick marks, respectively. Transfer RNAs and small nucleolar RNAs (‘RNAs’) were assigned black and red tick marks, respectively.

We used tRNAscan-SE 1.21 (ref. 19) and manual inspection to identify 589 cytoplasmic transfer RNAs, 27 organelle-derived tRNAs and 13 pseudogenes, more than in any other genome sequenced to date. All 46 tRNA families needed to decode all possible 61 codons were found, defining the completeness of the functional set. Several highly amplified families of tRNAs were found on the same strand (ref. 6); excluding these, each amino acid is decoded by 10–41 tRNAs.

The spliceosomal RNAs (U1, U2, U4, U5, U6) have all been experimentally identified in Arabidopsis. The previously identified sequences for all RNAs were found in the genome, except for U5 where the most similar counterpart was 92% identical. Between 10 and 16 copies of each small nuclear RNA (snRNA) were found across all chromosomes, dispersed as singletons or in small groups.

The small nucleolar RNAs (snoRNAs) consist of two subfamilies, the C/D box snoRNAs, which include 36 Arabidopsis genes, and the H/ACA box snoRNAs, for which no members have been identified in Arabidopsis. U3 is the most numerous of the C/D box snoRNAs, with eight copies found in the genome. We identified forty-five additional C/D box snoRNAs using software (www.rna.wustl.edu/snoRNAdb/) that detects snoRNAs that guide ribose methylation of ribosomal RNA.

A combination of algorithms, all optimized with parameters based on known Arabidopsis gene structures, was used to define gene structure. We used similarities to known protein and expressed sequence tag (EST) sequence to refine gene models. Eighty per cent of the gene structures predicted by the three centres involved were completely consistent, 93% of ESTs matched gene models, and less than 1% of ESTs matched predicted non-coding regions, indicating that most potential genes were identified. The sensitivity and selectivity of the gene prediction software used in this report have been comprehensively and independently assessed (ref. 20).

The 25,498 genes predicted (Table 1) form the largest gene set published to date: C. elegans (ref. 1) has 19,099 genes and Drosophila (ref. 2) 13,601 genes. Arabidopsis and C. elegans have similar gene density, whereas
Drosophila has a lower gene density; Arabidopsis also has a significantly greater extent of tandem gene duplications and segmental duplications, which may account for its larger gene set.

The rDNA repeat regions on chromosomes 2 and 4 were not sequenced because of their known repetitive structure and content. The centromeric regions are not completely sequenced owing to large blocks of monotonic repeats such as 5S rDNA and 180-bp repeats. The sequence continues to be extended further into centromeric and other regions of complex sequence.

Characterization of the coding regions

To assess the similarities and differences of the Arabidopsis gene complement compared with other sequenced eukaryotic genomes, we assigned functional categories to the complete set of Arabidopsis genes. For chromosome 4 genes and the yeast genome, predicted functions were previously manually assigned (refs 5, 21). All other predicted proteins were automatically assigned to these functional categories (ref. 22), assuming that conserved sequences reflect common functional relationships. The functions of 69% of the genes were classified according to sequence similarity to proteins of known function in all organisms; only 9% of the genes have been characterized experimentally (Fig.
2a). Generally similar proportions of gene products were predicted to be targeted to the secretory pathway and mitochondria in Arabidopsis and yeast, and up to 14% of the gene products are likely to be targeted to the chloroplast (Table 1). The significant proportion of genes with predicted functions involved in metabolism, gene regulation and defence is consistent with previous analyses (ref. 5). Roughly 30% of the 25,498 predicted gene products (Fig. 2a), comprising both plant-specific proteins and proteins with similarity to genes of unknown function from other organisms, could not be assigned to functional categories.

Figure 2: Functional analysis of Arabidopsis genes. a, Proportion of predicted Arabidopsis genes in different functional categories. b, Comparison of functional categories between organisms. Subsets of the Arabidopsis proteome containing all proteins that fall into a common functional class were assembled. Each subset was searched against the complete set of translations from Escherichia coli, Synechocystis sp. PCC6803, Saccharomyces cerevisiae, Drosophila, C. elegans and a Homo sapiens non-redundant protein database. The percentage of Arabidopsis proteins in a particular subset that had a BLASTP match with E ≤ 10^-30 to the respective reference genome is shown. This reflects the measure of sequence conservation of proteins within this particular functional category between Arabidopsis and the respective reference genome. y axis, 0.1 = 10%.

To compare the functional categories in more detail, we compared data from the complete genomes of Escherichia coli (ref. 23), Synechocystis sp. (ref. 24), Saccharomyces cerevisiae (ref. 21), C. elegans (ref. 1) and Drosophila (ref. 2), and a non-redundant protein set of Homo sapiens, with the Arabidopsis genome data (Fig. 2b), using a stringent BLASTP threshold value of E < 10^-30. The proportion of Arabidopsis proteins having related counterparts in eukaryotic genomes varies by a factor of 2 to 3 depending on the functional category. Only 8–23% of Arabidopsis proteins involved in transcription have related genes in other eukaryotic genomes, reflecting the independent evolution of many plant transcription factors. In contrast, 48–60% of genes involved in protein synthesis have counterparts in the other eukaryotic genomes, reflecting highly conserved gene functions. The relatively high proportion of matches between Arabidopsis and bacterial proteins in the categories ‘metabolism’ and ‘energy’ reflects both the acquisition of bacterial genes from the ancestor of the plastid and high conservation of sequences across all species. Finally, a comparison between unicellular and multicellular eukaryotes indicates that Arabidopsis genes involved in cellular communication and signal transduction have more counterparts in multicellular eukaryotes than in yeast, reflecting the need for sets of genes for communication in multicellular organisms.

Pronounced redundancy in the Arabidopsis genome is evident in segmental duplications and tandem arrays, and many other genes with high levels of sequence conservation are also scattered over the genome. Sequence similarity exceeding a BLASTP value E
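
The per-category comparison described for Fig. 2b (the share of Arabidopsis proteins in a functional class that have a BLASTP match at or below the E-value cutoff in one reference genome) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the protein identifiers and the toy hit list are hypothetical, and the hits are assumed to have been parsed already into (query, subject, E-value) tuples.

```python
E_VALUE_CUTOFF = 1e-30  # cutoff quoted in the text (E <= 10^-30)


def fraction_with_match(categories, hits, cutoff=E_VALUE_CUTOFF):
    """Return, for each functional category, the fraction of proteins
    that have at least one hit with an E-value at or below the cutoff.

    categories: dict mapping category name -> set of protein IDs
                (hypothetical IDs, for illustration only)
    hits:       iterable of (query_id, subject_id, e_value) tuples for
                one reference genome (assumed pre-parsed BLASTP output)
    """
    # Proteins with at least one sufficiently significant match.
    matched = {query for query, _subject, e_value in hits if e_value <= cutoff}
    return {
        name: (len(proteins & matched) / len(proteins) if proteins else 0.0)
        for name, proteins in categories.items()
    }


# Toy data, invented for this example.
categories = {
    "transcription": {"p1", "p2", "p3", "p4"},
    "protein synthesis": {"p5", "p6"},
}
hits = [
    ("p1", "y1", 1e-45),  # passes the cutoff
    ("p2", "y2", 1e-10),  # too weak
    ("p5", "y3", 1e-80),  # passes the cutoff
    ("p6", "y4", 1e-31),  # passes the cutoff
    ("p6", "y5", 0.5),    # too weak, but p6 already counted
]

fractions = fraction_with_match(categories, hits)
print(fractions)
```

With this toy input, one of the four "transcription" proteins and both "protein synthesis" proteins pass the cutoff, giving fractions of 0.25 and 1.0. Repeating the calculation once per reference genome would give one bar per organism, which corresponds to the layout the Fig. 2b caption describes.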


