Python: List Files in a Directory - Stack Abuse


Frank Hofmann

I prefer to work with Python because it is a very flexible programming language, and allows me to interact with the operating system easily. This also includes file system functions. To simply list files in a directory, the modules os, subprocess, fnmatch, and pathlib come into play. The following solutions demonstrate how to use these modules effectively.

Using os.walk()

The os module contains a long list of methods that deal with the file system and the operating system. One of them is walk(), which generates the filenames in a directory tree by walking the tree either top-down or bottom-up (with top-down being the default setting).

os.walk() yields a tuple of three items for each directory it visits: the path of that directory, a list of the names of its subdirectories, and a list of the filenames in it. Listing 1 shows how to write this with only a few lines of code. Note that the walk is recursive, so files in subdirectories are printed as well. This works with both Python 2 and 3 interpreters.

Listing 1: Traversing the current directory using os.walk()

    import os

    for root, dirs, files in os.walk("."):
        for filename in files:
            print(filename)

Using the Command Line via Subprocess

Note: While this is a valid way to list files in a directory, it is not recommended as it introduces the opportunity for command injection attacks.

As already described in the article Parallel Processing in Python, the subprocess module allows you to execute a system command and collect its result. The system command we call in this case is the following one:

Example 1: Listing the files in the current directory

    $ ls -p . | grep -v /$

The command ls -p . lists directory files for the current directory, and adds the delimiter / at the end of the name of each subdirectory, which we'll need in the next step. The output of this call is piped to the grep command that filters the data as we need it.
The parameters -v /$ exclude all the names of entries that end with the delimiter /. Actually, /$ is a regular expression that matches all the strings that contain the character / as the very last character before the end of the string, which is represented by $.

The subprocess module allows you to build real pipes, and to connect the input and output streams as you do on a command line. Calling the method subprocess.Popen() opens a corresponding process, and accepts the two parameters named stdin and stdout.

Listing 2 shows how to program that. The first variable ls is defined as a process executing ls -p . that outputs to a pipe. That's why the stdout channel is defined as subprocess.PIPE. The second variable grep is defined as a process, too, but executes the command grep -v /$ instead.

To read the output of the ls command from the pipe, the stdin channel of grep is defined as ls.stdout. Finally, the variable endOfPipe refers to grep.stdout, from which the output of grep is read and printed to stdout line by line in the for loop below. The output is seen in Example 2.

Listing 2: Defining two processes connected with a pipe

    import subprocess

    # define the ls command
    ls = subprocess.Popen(["ls", "-p", "."],
                          stdout=subprocess.PIPE)

    # define the grep command; universal_newlines=True makes its output
    # text rather than bytes, so the names print cleanly under Python 3
    grep = subprocess.Popen(["grep", "-v", "/$"],
                            stdin=ls.stdout,
                            stdout=subprocess.PIPE,
                            universal_newlines=True)

    # read from the end of the pipe (stdout)
    endOfPipe = grep.stdout

    # output the files line by line, without the trailing newline
    for line in endOfPipe:
        print(line.rstrip("\n"))

Example 2: Running the program

    $ python find-files3.py
    find-files2.py
    find-files3.py
    find-files4.py
    ...

This solution works quite well with both Python 2 and 3, but can we improve it somehow? Let us have a look at the other variants, then.

Combining os and fnmatch

As you have seen before, the solution using subprocesses is elegant but requires lots of code. Instead, let us combine the methods from the two modules os and fnmatch. This variant works with Python 2 and 3, too.

As the first step, we import the two modules os and fnmatch. Next, we read the entries of the directory we would like to list using os.listdir(), and define the pattern used to filter the files. In a for loop we iterate over the list of entries stored in the variable listOfFiles.
Finally, with the help of fnmatch we filter for the entries we are looking for, and print the matching entries to stdout. Listing 3 contains the Python script, and Example 3 the corresponding output.

Listing 3: Listing files using os and fnmatch module

    import os, fnmatch

    listOfFiles = os.listdir('.')
    pattern = "*.py"
    for entry in listOfFiles:
        if fnmatch.fnmatch(entry, pattern):
            print(entry)

Example 3: The output of Listing 3

    $ python2 find-files.py
    find-files.py
    find-files2.py
    find-files3.py
    ...

Using os.listdir() and Generators

In simple terms, a generator is a powerful iterator that keeps its state. To learn more about generators, check out one of our previous articles, Python Generators.

The following variant combines the listdir() method of the os module with a generator function. The code works with both versions 2 and 3 of Python.

As you may have noted before, the listdir() method returns the list of entries for the given directory. The method os.path.isfile() returns True if the given entry is a file. The yield statement suspends the function while keeping its current state, and hands back only the name of an entry detected as a file. This allows us to loop over the generator function (see Listing 4). The output is identical to the one from Example 3.

Listing 4: Combining os.listdir() and a generator function

    import os

    def files(path):
        for file in os.listdir(path):
            if os.path.isfile(os.path.join(path, file)):
                yield file

    for file in files("."):
        print(file)

Use pathlib

The pathlib module describes itself as a way to "Parse, build, test, and otherwise work on filenames and paths using an object-oriented API instead of low-level string operations". This sounds cool - let's do it. Starting with Python 3.4, the module belongs to the standard distribution.

In Listing 5, we first define the directory. The dot (".") defines the current directory. Next, the iterdir() method returns an iterator that yields a path object for every entry in the directory, subdirectories included. In a for loop we print the entries one after the other.
Listing 5: Reading directory contents with pathlib

    import pathlib

    # define the path
    currentDirectory = pathlib.Path('.')

    for currentFile in currentDirectory.iterdir():
        print(currentFile)

Again, the output is identical to the one from Example 3, provided the directory contains no subdirectories, since iterdir() yields those as well.

As an alternative, we can retrieve files by matching their filenames using something called a glob. This way we retrieve only the files we want. For example, in the code below we only want to list the Python files in our directory, which we do by specifying "*.py" in the glob.

Listing 6: Using pathlib with the glob method

    import pathlib

    # define the path
    currentDirectory = pathlib.Path('.')

    # define the pattern
    currentPattern = "*.py"

    for currentFile in currentDirectory.glob(currentPattern):
        print(currentFile)

Using os.scandir()

In Python 3.5, a new function became available in the os module. It is named scandir(), and significantly simplifies the call to list files in a directory. Using it in a with statement, as Listing 7 does, requires Python 3.6 or later.

Having imported the os module first, use the getcwd() method to detect the current working directory, and save this value in the path variable. Next, scandir() returns an iterator of entries for this path, which we test for being a file using the is_file() method.

Listing 7: Reading directory contents with scandir()

    import os

    # detect the current working directory
    path = os.getcwd()

    # read the entries
    with os.scandir(path) as listOfEntries:
        for entry in listOfEntries:
            # print all entries that are files
            if entry.is_file():
                print(entry.name)

Again, the output of Listing 7 is identical to the one from Example 3.

Conclusion

There is disagreement about which version is the best, which is the most elegant, and which is the most "pythonic" one. I like the simplicity of the os.walk() method as well as the usage of both the fnmatch and pathlib modules.

The two versions with the processes/piping and the iterator require a deeper understanding of UNIX processes and Python knowledge, so they may not be best for all programmers due to their added (and unnecessary) complexity.

To find an answer to which version is the quickest one, the timeit module is quite handy. This module measures the time that elapses while a piece of code runs.
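As a rough illustration of how timeit can be called from within Python, the sketch below times a few of the variants discussed above against a throwaway directory. The function names (list_with_walk, compare, make_sample_dir, and so on), the sample file names, and the number of repetitions are hypothetical choices made for this example, not part of the original scripts. The os.walk() variant is cut off after the top-level directory so that every variant does comparable work.

```python
import os
import pathlib
import tempfile
import timeit


def list_with_walk(path):
    # Only the first tuple from os.walk() is used (the top-level directory).
    for root, dirs, files in os.walk(path):
        return sorted(files)
    return []


def list_with_listdir(path):
    # os.listdir() returns names only, so each one is checked with isfile().
    return sorted(
        name for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )


def list_with_scandir(path):
    # The with statement around os.scandir() needs Python 3.6 or later.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())


def list_with_pathlib(path):
    # iterdir() also yields subdirectories, hence the is_file() filter.
    return sorted(p.name for p in pathlib.Path(path).iterdir() if p.is_file())


def compare(path, number=200):
    """Return a dict mapping each variant to the seconds taken by `number` calls."""
    variants = {
        "os.walk": list_with_walk,
        "os.listdir": list_with_listdir,
        "os.scandir": list_with_scandir,
        "pathlib": list_with_pathlib,
    }
    return {
        name: timeit.timeit(lambda func=func: func(path), number=number)
        for name, func in variants.items()
    }


def make_sample_dir(root, file_count=20):
    """Create a small test tree: `file_count` empty .py files and one subdirectory."""
    for i in range(file_count):
        pathlib.Path(root, "find-files{}.py".format(i)).touch()
    pathlib.Path(root, "subdir").mkdir()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sample:
        make_sample_dir(sample)
        for name, seconds in compare(sample).items():
            print("{:12s} {:.4f} s for 200 calls".format(name, seconds))
```

The absolute numbers depend heavily on the file system, the directory size, and caching, so a comparison like this is best repeated on the directory you actually care about.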
To compare all of our solutions without modifying them, we use a Python functionality: call the Python interpreter with the name of the module, and the appropriate Python code to be executed. To do that for all the Python scripts at once, a shell script helps (Listing 8).

Listing 8: Evaluating the execution time using the timeit module

    #!/bin/bash

    for filename in *.py; do
        echo "$filename:"
        cat "$filename" | python3 -m timeit
        echo " "
    done

The tests were taken using Python 3.5.3. The results are as follows, with os.walk() giving the best result. Running the tests with Python 2 returns different values but does not change the order - os.walk() is still on top of the list.

Note that python3 -m timeit takes the statement to be timed from its command-line arguments and does not read it from standard input. Invoked as in Listing 8, it times an empty pass statement, which is why the figures below are nearly identical; they should be read with that in mind.

    Method                  Result for 100,000,000 loops
    os.walk                 0.0085 usec per loop
    subprocess/pipe         0.00859 usec per loop
    os.listdir/fnmatch      0.00912 usec per loop
    os.listdir/generator    0.00867 usec per loop
    pathlib                 0.00854 usec per loop
    pathlib/glob            0.00858 usec per loop
    os.scandir              0.00856 usec per loop

Acknowledgements

The author would like to thank Gerold Rupprecht for his support and comments while preparing this article.

Last Updated: February 25th, 2019

Frank Hofmann, Author. IT developer, trainer, and author. Coauthor of the Debian Package Management Book (http://www.dpmb.org/).


