07 Spark RDD Programming: Comprehensive Example, English Word Frequency Count

Posted 123wen


>>> # Count occurrences of each lowercase, whitespace-separated token in txt
>>> s = txt.lower().split()
>>> dd = {}
>>> for word in s:
...     if word not in dd:
...         dd[word] = 1
...     else:
...         dd[word] = dd[word] + 1
...
>>> ss = sorted(dd.items(),key=operator.itemgetter(1),reverse=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'operator' is not defined
>>> import operator
>>> # Sort the (word, count) pairs by count, highest first
>>> ss = sorted(dd.items(),key=operator.itemgetter(1),reverse=True)
>>> print(ss)
[(\'the\', 136), (\'and\', 111), (\'of\', 82), (\'to\', 71), (\'our\', 68), (\'we\', 59), (\'that\', 49), (\'a\', 46), (\'is\', 36), (\'in\', 26), (\'this\', 24), (\'for\', 23), (\'are\', 22), (\'but\', 20), (\'--\', 17), (\'they\', 17), (\'on\', 17), (\'it\', 17), (\'will\', 17), (\'not\', 16), (\'have\', 15), (\'us\', 14), (\'has\', 14), (\'can\', 13), (\'with\', 13), (\'who\', 13), (\'be\', 12), (\'as\', 11), (\'or\', 11), (\'(applause.)\', 11), (\'those\', 11), (\'nation\', 10), (\'you\', 10), (\'their\', 10), (\'new\', 9), (\'these\', 9), (\'us,\', 9), (\'so\', 8), (\'by\', 8), (\'than\', 8), (\'must\', 8), (\'because\', 8), (\'what\', 8), (\'every\', 8), (\'all\', 8), (\'its\', 8), (\'been\', 7), (\'at\', 7), (\'when\', 7), (\'no\', 6), (\'less\', 6), (\'cannot\', 6), (\'let\', 6), (\'too\', 6), (\'common\', 6), (\'was\', 5), (\'time\', 5), (\'people\', 5), (\'only\', 5), (\'know\', 5), (\'nor\', 5), (\'now\', 5), (\'from\', 5), (\'seek\', 4), (\'work\', 4), (\'greater\', 4), (\'whether\', 4), (\'america\', 4), (\'more\', 4), (\'before\', 4), (\'power\', 4), (\'which\', 4), (\'long\', 4), (\'through\', 4), (\'men\', 4), (\'meet\', 4), (\'women\', 4), (\'journey\', 3), (\'up\', 3), (\'between\', 3), (\'were\', 3), (\'say\', 3), (\'where\', 3), (\'an\', 3), (\'god\', 3), (\'may\', 3), (\'last\', 3), (\'economy\', 3), (\'hard\', 3), (\'do\', 3), (\'today\', 3), (\'there\', 3), (\'founding\', 3), (\'hope\', 3), (\'crisis\', 3), (\'words\', 3), (\'carried\', 3), (\'them\', 3), (\'future\', 3), (\'come\', 3), (\'shall\', 3), (\'most\', 3), (\'generation\', 3), (\'day,\', 3), (\'you.\', 3), (\'things\', 3), (\'upon\', 3), (\'force\', 3), (\'i\', 3), (\'spirit\', 3), (\'just\', 3), (\'over\', 3), (\'father\', 3), (\'question\', 3), (\'your\', 3), (\'once\', 3), (\'across\', 3), (\'face\', 2), (\'better\', 2), (\'do,\', 2), (\'why\', 2), (\'did\', 2), (\'do.\', 2), (\'world\', 2), (\'role\', 2), (\'nothing\', 2), (\'oath.\', 2), (\'america:\', 2), (\'each\', 2), 
(\'faced\', 2), (\'rather\', 2), (\'lower\', 2), (\'peace\', 2), (\'faith\', 2), (\'grows\', 2), (\'ambitions,\', 2), (\'like\', 2), (\'service\', 2), (\'again\', 2), (\'lines\', 2), (\'taken\', 2), (\'thank\', 2), (\'fail\', 2), (\'greatness\', 2), (\'willingness\', 2), (\'trust\', 2), (\'work,\', 2), (\'life.\', 2), (\'mutual\', 2), (\'find\', 2), (\'day\', 2), (\'far\', 2), (\'freedom\', 2), (\'never\', 2), (\'moment\', 2), (\'feed\', 2), (\'one\', 2), (\'duties\', 2), (\'man\', 2), (\'wealth\', 2), (\'small\', 2), (\'earth.\', 2), (\'new.\', 2), (\'care\', 2), (\'prosperity\', 2), (\'bless\', 2), ("we\'ll", 2), (\'(applause)\', 2), (\'some\', 2), (\'my\', 2), (\'understand\', 2), (\'promise\', 2), (\'icy\', 2), (\'child\', 2), (\'friend\', 2), (\'schools\', 2), (\'light\', 2), (\'many\', 2), (\'still\', 2), (\'forth\', 2), (\'restore\', 2), (\'workers\', 2), (\'path\', 2), (\'enduring\', 2), (\'age.\', 2), (\'jobs\', 2), (\'children\', 2), (\'way\', 2), (\'market\', 2), (\'challenges\', 2), (\'american\', 2), (\'part\', 2), (\'west,\', 2), (\'something\', 2), (\'ourselves\', 2), (\'false\', 2), (\'planet.\', 2), (\'ideals\', 2), (\'even\', 2), (\'also\', 2), (\'knowledge\', 2), (\'cooperation\', 2), (\'peace.\', 2), (\'remain\', 2), (\'courage\', 2), (\'stronger\', 2), (\'brave\', 2), (\'extend\', 2), (\'today,\', 2), (\'stand\', 2), (\'government\', 2), (\'war\', 2), (\'out\', 2), (\'meaning\', 2), (\'old\', 2), (\'world,\', 2), (\'big\', 2), (\'charter\', 2), (\'might\', 2), (\'end\', 2), (\'calls\', 2), (\'make\', 2), (\'take\', 2), (\'willing\', 2), (\'generations.\', 2), (\'confidence\', 2), (\'back\', 2), (\'answer\', 2), (\'americans\', 2), ("america\'s", 2), (\'remember\', 2), (\'throughout\', 2), (\'begin\', 2), (\'longer\', 2), (\'era\', 2), (\'health\', 2), (\'success\', 2), (\'america.\', 2), (\'waters\', 2), (\'without\', 2), (\'quiet\', 1), (\'packed\', 1), (\'lose\', 1), (\'specter\', 1), (\'dark\', 1), (\'could\', 1), (\'afford\', 1), 
(\'generosity\', 1), (\'city\', 1), (\'then\', 1), (\'month,\', 1), (\'nation,\', 1), (\'hard-earned\', 1), (\'spoken\', 1), (\'task\', 1), (\'irresponsibility\', 1), (\'see\', 1), (\'law\', 1), (\'gettysburg,\', 1), (\'advance\', 1), (\'scripture,\', 1), (\'more.\', 1), (\'advancing.\', 1), (\'bind\', 1), (\'risk-takers,\', 1), (\'sacred\', 1), (\'many.\', 1), (\'decisions\', 1), (\'birth\', 1), (\'storms.\', 1), (\'needed\', 1), (\'promises,\', 1), (\'true.\', 1), (\'yes,\', 1), (\'broken\', 1), (\'conflict\', 1), (\'celebrated,\', 1), (\'whose\', 1), (\'perils\', 1), (\'recriminations\', 1), (\'swill\', 1), (\'politics.\', 1), (\'faithful\', 1), (\'done,\', 1), (\'interest\', 1), (\'depends\', 1), (\'world;\', 1), (\'earth;\', 1), (\'sum\', 1), (\'starting\', 1), (\'presidential\', 1), (\'prosperous,\', 1), (\'generation:\', 1), (\'drawn\', 1), (\'riches\', 1), (\'believe\', 1), (\'campfires\', 1), (\'effect.\', 1), (\'jews\', 1), (\'choose\', 1), (\'political\', 1), (\'survive...\', 1), (\'system\', 1), (\'birth,\', 1), (\'themselves.\', 1), (\'apologize\', 1), (\'eyes\', 1), (\'serious\', 1), (\'peoples\', 1), (\'muslims,\', 1), (\'blood.\', 1), (\'suggest\', 1), (\'told\', 1), (\'united,\', 1), (\'whisper\', 1), (\'prosper\', 1), (\'return\', 1), (\'joined\', 1), (\'band\', 1), (\'culture,\', 1), (\'decides\', 1), (\'defeat\', 1), (\'rule\', 1), (\'blood\', 1), (\'former\', 1), (\'revolution\', 1), (\'began.\', 1), (\'capital\', 1), (\'time.\', 1), (\'tolerate\', 1), (\'normandy\', 1), (\'responsibility\', 1), (\'grandest\', 1), (\'reform\', 1), ("you\'ve", 1), (\'yet,\', 1), (\'works\', 1), (\'remained\', 1), (\'inducing\', 1), (\'protecting\', 1), (\'innocents,\', 1), (\'unmatched.\', 1), (\'recall\', 1), (\'endure\', 1), (\'traveled\', 1), (\'beneath\', 1), (\'dangers,\', 1), (\'fuel\', 1), (\'history,\', 1), (\'met\', 1), (\'liberty,\', 1), (\'inevitable,\', 1), (\'distant\', 1), (\'ages.\', 1), (\'year.\', 1), (\'narrow\', 1), (\'hours\', 1), 
(\'purpose,\', 1), (\'charity,\', 1), (\'reveal\', 1), (\'safety\', 1), (\'does\', 1), (\'served\', 1), (\'raw\', 1), (\'hours.\', 1), (\'reach\', 1), (\'consider\', 1), (\'citizenship.\', 1), (\'arguments\', 1), (\'cynics\', 1), (\'heroes\', 1), (\'hatred.\', 1), (\'indicators\', 1), (\'sow\', 1), (\'relies.\', 1), (\'endured\', 1), (\'sees\', 1), (\'watching\', 1), (\'inventive,\', 1), ("world\'s", 1), (\'some,\', 1), (\'services\', 1), (\'build,\', 1), (\'danger,\', 1), (\'true\', 1), (\'ushering\', 1), (\'friends\', 1), (\'poor\', 1), (\'darkest\', 1), (\'lie\', 1), (\'alone\', 1), (\'watchful\', 1), (\'task.\', 1), (\'wage,\', 1), (\'how\', 1), (\'remembrance\', 1), (\'short,\', 1), (\'memories\', 1), (\'emanates\', 1), (\'nagging\', 1), (\'stairway\', 1), (\'faint-hearted,\', 1), (\'choice\', 1), (\'seeks\', 1), (\'ordered\', 1), (\'everywhere\', 1), (\'inhabit\', 1), (\'discord.\', 1), (\'winter,\', 1), (\'scarcely\', 1), (\'individual\', 1), (\'differences\', 1), (\'woman\', 1), (\'would\', 1), (\'factories.\', 1), (\'met.\', 1), (\'flourish\', 1), (\'shown\', 1), (\'bodies\', 1), (\'liberty\', 1), (\'places\', 1), (\'silencing\', 1), (\'said\', 1), (\'unity\', 1), (\'gratitude\', 1), (\'lost,\', 1), (\'size\', 1), (\'tirelessly\', 1), (\'understanding\', 1), (\'surely\', 1), (\'nations\', 1), (\'manage\', 1), (\'digital\', 1), (\'humbled\', 1), (\'passed\', 1), (\'clouds\', 1), (\'greed\', 1), (\'stale\', 1), (\'pat,\', 1), (\'satisfying\', 1), (\'cause,\', 1), (\'entitle\', 1), (\'plenty,\', 1), (\'humanity\', 1), (\'fellow\', 1), (\'virtue\', 1), (\'progress\', 1), (\'keepers\', 1), (\'traveled.\', 1), (\'he\', 1), (\'week,\', 1), (\'network\', 1), (\'unpleasant\', 1), (\'spin\', 1), (\'words.\', 1), (\'documents.\', 1), (\'terror\', 1), (\'achieve\', 1), (\'failure\', 1), (\'use;\', 1), (\'grids\', 1), (\'muslim\', 1), (\'truths.\', 1), (\'tested\', 1), (\'smaller,\', 1), (\'change\', 1), (\'demanded,\', 1), (\'dignified.\', 1), (\'destiny.\', 1), 
(\'cling\', 1), (\'leisure\', 1), (\'action,\', 1), (\'span\', 1), (\'saw\', 1), (\'loyalty\', 1), (\'vision\', 1), (\'brings\', 1), (\'expedience\', 1), (\'precious\', 1), (\'forgotten\', 1), (\'dignity.\', 1), (\'putting\', 1), (\'crisis,\', 1), ("society\'s", 1), (\'civil\', 1), (\'clean\', 1), (\'starved\', 1), (\'favors\', 1), (\'amidst\', 1), (\'security\', 1), (\'collective\', 1), (\'fathers,\', 1), (\'firm\', 1), (\'rather,\', 1), (\'forge\', 1), (\'abandoned.\', 1), (\'huddled\', 1), (\'today.\', 1), (\'destroy.\', 1), (\'based\', 1), (\'raise\', 1), (\'example,\', 1), (\'bad\', 1), (\'fate.\', 1), (\'far-reaching\', 1), (\'embody\', 1), (\'plowed\', 1), (\'play,\', 1), (\'reminded\', 1), (\'help\', 1), (\'depended\', 1), (\'hungry\', 1), (\'transition.\', 1), (\'restaurant\', 1), (\'use\', 1), (\'well\', 1), (\'[it]."\', 1), (\'toiled\', 1), (\'set\', 1), (\'ill.\', 1), (\'sake.\', 1), (\'consequence\', 1), (\'americans.\', 1), (\'threaten\', 1), (\'fathers\', 1), (\'roll\', 1), (\'nurture\', 1), (\'threat,\', 1), (\'smoke,\', 1), (\'capitals\', 1), (\'corruption\', 1), (\'strength,\', 1), (\'precisely\', 1), (\'dust\', 1), (\'drafted\', 1), (\'difficult\', 1), (\'standing\', 1), (\'freedom.\', 1), (\'land;\', 1), (\'character\', 1), (\'carry\', 1), (\'apply.\', 1), (\'build\', 1), (\'forward.\', 1), (\'strangled\', 1), (\'universities\', 1), (\'missiles\', 1), (\'giving\', 1), (\'always\', 1), (\'energy\', 1), (\'worldly\', 1), (\'now,\', 1), (\'defining\', 1), (\'earned.\', 1), (\'ready\', 1), (\'restraint.\', 1), (\'language\', 1), (\'ground\', 1), (\'end,\', 1), (\'tempering\', 1), (\'guided\', 1), (\'storms\', 1), (\'prosperous.\', 1), (\'generate\', 1), (\'consumed\', 1), (\'proclaim\', 1), (\'side\', 1), (\'sacrificed\', 1), (\'reaffirm\', 1), (\'forty-four\', 1), (\'principles\', 1), (\'his\', 1), (\'falter;\', 1), (\'fascism\', 1), (\'alongside\', 1), (\'statistics.\', 1), (\'government.\', 1), (\'decent\', 1), (\'fame.\', 1), (\'world...that\', 
1), (\'wield\', 1), (\'bigger\', 1), (\'often,\', 1), (\'gift\', 1), (\'nuclear\', 1), (\'few\', 1), (\'sapping\', 1), (\'here\', 1), (\'rising\', 1), (\'them,\', 1), (\'outcome\', 1), (\'productive\', 1), (\'lessen\', 1), (\'life,\', 1), (\'protect\', 1), (\'fear\', 1), (\'account,\', 1), (\'data\', 1), (\'humility\', 1), (\'violence\', 1), (\'people:\', 1), (\'create\', 1), (\'ask\', 1), (\'surest\', 1), (\'cost.\', 1), (\'minds.\', 1), (\'ancestors.\', 1), (\'live\', 1), (\'look,\', 1), (\'jobs,\', 1), (\'off,\', 1), (\'patrol\', 1), (\'hands\', 1), (\'spend\', 1), (\'grateful\', 1), (\'arlington\', 1), (\'job\', 1), (\'next\', 1), (\'obscure\', 1), (\'against\', 1), (\'profound,\', 1), (\'selflessness\', 1), (\'changed,\', 1), (\'yet\', 1), (\'measure\', 1), (\'transform\', 1), (\'nations.\', 1), (\'chapter\', 1), (\'leaders\', 1), (\'convictions.\', 1), (\'itself;\', 1), (\'indifference\', 1), (\'nation.\', 1), (\'courage.\', 1), (\'business\', 1), (\'moment,\', 1), (\'gather\', 1), (\'patriots\', 1), (\'prudent\', 1), (\'legacy.\', 1), (\'intend\', 1), (\'free\', 1), (\'so,\', 1), (\'humble\', 1), (\'conflict,\', 1), (\'waver\', 1), (\'state\', 1), (\'honor\', 1), (\'vital\', 1), (\'unfolds\', 1), (\'undiminished.\', 1), (\'christians\', 1), (\'instruments\', 1), (\'families\', 1), (\'remaking\', 1), (\'shuttered.\', 1), (\'easily\', 1), (\'winter\', 1), (\'doers,\', 1), (\'settling\', 1), (\'pledge\', 1), (\'growth.\', 1), (\'towards\', 1), (\'stained\', 1), (\'minds\', 1), (\'it.\', 1), (\'simply\', 1), (\'whip,\', 1), (\'demand\', 1), (\'chosen\', 1), (\'idea\', 1), (\'done.\', 1), (\'science\', 1), (\'doubt,\', 1), (\'route\', 1), (\'fair\', 1), (\'moments,\', 1), (\'further\', 1), (\'deserts\', 1), (\'knew\', 1), (\'turn\', 1), (\'respect.\', 1), (\'ourselves,\', 1), (\'warming\', 1), (\'source\', 1), (\'cars\', 1), (\'purpose\', 1), (\'blame\', 1), (\'levees\', 1), (\'together.\', 1), (\'labor\', 1), (\'gross\', 1), (\'shed,\', 1), (\'swift.\', 1), 
(\'come.\', 1), (\'hour\', 1), (\'worked\', 1), (\'curiosity,\', 1), (\'sacrifices\', 1), (\'seize\', 1), (\'grudgingly\', 1), (\'virtue,\', 1), (\'prefer\', 1), (\'roads\', 1), (\'threats\', 1), (\'afford,\', 1), (\'deserve\', 1), (\'tribe\', 1), (\'strengthen\', 1), (\'sun\', 1), (\'raging\', 1), ("care\'s", 1), (\'tides\', 1), (\'non-believers.\', 1), (\'young\', 1), (\'shifted\', 1), (\'nations,\', 1), ("public\'s", 1), (\'tasted\', 1), (\'storm\', 1), (\'pursue\', 1), (\'define\', 1), (\'understood\', 1), (\'fist.\', 1), (\'petty\', 1), (\'sights.\', 1), (\'bold\', 1), (\'price\', 1), (\'generations\', 1), (\'filled\', 1), (\'ills\', 1), (\'months,\', 1), (\'tell\', 1), (\'rights\', 1), (\'programs\', 1), (\'president\', 1), ("children\'s", 1), (\'gladly,\', 1), (\'powerful\', 1), (\'rightful\', 1), (\'justness\', 1), (\'judge\', 1), (\'alliances\', 1), (\'forward,\', 1), (\'safely\', 1), (\'prepare\', 1), (\'end.\', 1), (\'worn-out\', 1), (\'values\', 1), (\'well+++\', 1), (\'reject\', 1), (\'soil\', 1), (\'deceit\', 1), (\'noble\', 1), (\'up,\', 1), (\'subject\', 1), (\'"let\', 1), (\'flow;\', 1), (\'dollars\', 1), (\'oceans\', 1), (\'forebears\', 1), (\'far-off\', 1), (\'required\', 1), (\'join\', 1), (\'short-cuts\', 1), (\'around\', 1), (\'eye,\', 1), (\'consume\', 1), (\'history.\', 1), (\'free,\', 1), (\'chance\', 1), (\'adversaries\', 1), (\'winds\', 1), (\'sturdy\', 1), (\'determination\', 1), (\'mindful\', 1), (\'mountains.\', 1), (\'states\', 1), (\'country,\', 1), (\'down\', 1), (\'relative\', 1), (\'play\', 1), (\'expand\', 1), (\'scale\', 1), (\'faction.\', 1), (\'creed,\', 1), (\'communism\', 1), (\'dying\', 1), (\'weakness.\', 1), (\'enemy\', 1), (\'refused\', 1), (\'regard\', 1), (\'magnificent\', 1), (\'year\', 1), (\'heritage\', 1), (\'bestowed,\', 1), (\'much\', 1), (\'leave\', 1), (\'generation,\', 1), (\'very\', 1), (\'old.\', 1), (\'good\', 1), (\'ago\', 1), (\'plans.\', 1), (\'decline\', 1), (\'borders,\', 1), (\'god-given\', 1), 
(\'lash\', 1), (\'businesses\', 1), (\'already\', 1), (\'tolerance\', 1), (\'ideals.\', 1), (\'came\', 1), (\'segregation,\', 1), (\'badly\', 1), (\'spirit;\', 1), (\'guardians\', 1), (\'lead\', 1), (\'during\', 1), (\'spirit,\', 1), (\'slaughtering\', 1), (\'emerged\', 1), (\'grievances\', 1), (\'reaffirming\', 1), (\'citizens:\', 1), (\'born,\', 1), (\'dissent,\', 1), (\'happiness.\', 1), (\'honesty\', 1), (\'fought\', 1), (\'expanded\', 1), (\'grace\', 1), (\'great\', 1), (\'place,\', 1), (\'necessity\', 1), (\'patchwork\', 1), (\'people,\', 1), (\'demands\', 1), (\'sahn.\', 1), (\'costly,\', 1), (\'electric\', 1), (\'gift,\', 1), (\'globe\', 1), ("god\'s", 1), (\'wonders\', 1), (\'prosperity,\', 1), (\'hindus,\', 1), (\'assure\', 1), (\'ultimately\', 1), (\'often\', 1), (\'local\', 1), (\'midst\', 1), (\'bush\', 1), (\'equal,\', 1), (\'search\', 1), (\'wisely,\', 1), (\'ways\', 1), (\'short\', 1), (\'coldest\', 1), (\'if\', 1), (\'continue\', 1), (\'afghanistan.\', 1), (\'enjoy\', 1), (\'heart\', 1), (\'stranger\', 1), (\'outside\', 1), (\'history;\', 1), (\'nourish\', 1), (\'aside\', 1), (\'till\', 1), (\'office,\', 1), (\'interests\', 1), (\'recognition\', 1), (\'no,\', 1), (\'dissolve;\', 1), ("parent\'s", 1), (\'act,\', 1), (\'makers\', 1), (\'harness\', 1), (\'things.\', 1), (\'shaped\', 1), (\'product,\', 1), (\'foundation\', 1), (\'kindness\', 1), (\'been;\', 1), (\'all.\', 1), (\'unclench\', 1), (\'mark\', 1), (\'move\', 1), (\'timeless\', 1), (\'currents,\', 1), (\'passed.\', 1), (\'iraq\', 1), (\'full\', 1), (\'choices\', 1), (\'depth\', 1), (\'fallen\', 1), (\'wrong\', 1), (\'ours\', 1), (\'give\', 1), (\'concord\', 1), (\'other\', 1), (\'hardship,\', 1), (\'helps\', 1), (\'khe\', 1), (\'weakened,\', 1), (\'man,\', 1), (\'good.\', 1), (\'sweatshops,\', 1), (\'remains\', 1), (\'earlier\', 1), (\'we,\', 1), (\'accept,\', 1), (\'suffering\', 1), (\'skill\', 1), (\'borne\', 1), (\'shores\', 1), ("firefighter\'s", 1), (\'united\', 1), (\'patriotism\', 1), 
(\'cut\', 1), (\'opportunity\', 1), (\'mall;\', 1), (\'dogmas\', 1), (\'someday\', 1), (\'foes,\', 1), (\'ability\', 1), (\'fear,\', 1), (\'forward\', 1), (\'uncertain\', 1), (\'instead\', 1), (\'possessions\', 1), (\'hatreds\', 1), (\'shape\', 1), (\'evidence\', 1), (\'held\', 1), (\'small,\', 1), (\'finally\', 1), (\'settled\', 1), (\'governments\', 1), (\'bridges,\', 1), (\'effort,\', 1), (\'habits,\', 1), (\'years\', 1), (\'domestic\', 1), (\'oath\', 1), (\'read\', 1), (\'please.\', 1), (\'fixed\', 1), (\'60\', 1), (\'homes\', 1), (\'died\', 1), (\'river.\', 1), (\'responsibly\', 1), (\'pick\', 1), (\'capacity\', 1), (\'horizon\', 1), (\'country\', 1), (\'outlast\', 1), (\'bitter\', 1), (\'tanks,\', 1), (\'imagine,\', 1), (\'celebration\', 1), (\'hand\', 1), (\'delivered\', 1), (\'resources\', 1), (\'then,\', 1), (\'qualities\', 1), (\'goods\', 1), (\'colleges\', 1), (\'less.\', 1), (\'lay\', 1), (\'given.\', 1), (\'snow\', 1), (\'real.\', 1), (\'break,\', 1), (\'imagination\', 1), (\'off\', 1), (\'race\', 1), (\'village\', 1), (\'childish\', 1), (\'control.\', 1), (\'gathering\', 1), (\'commerce\', 1), (\'defense.\', 1), (\'struggled\', 1), (\'rugged\', 1), (\'alarmed\', 1), (\'run\', 1), ("technology\'s", 1), (\'pass;\', 1), (\'retirement\', 1), (\'understood.\', 1), (\'soon\', 1), (\'farms\', 1), (\'pleasures\', 1), (\'quality\', 1), (\'defense,\', 1), (\'high\', 1), (\'measurable,\', 1), (\'aims\', 1)]
>>>
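
In the output above, tokens such as 'us' and 'us,' are counted separately because the text is split on whitespace only. The following is a minimal sketch of one way to tidy this up, assuming it is acceptable to strip leading and trailing punctuation from each token. The names `word_frequencies` and `sample` are hypothetical and not part of the original exercise.

```python
import string
from collections import Counter


def word_frequencies(text):
    """Count words case-insensitively, ignoring surrounding punctuation."""
    counts = Counter()
    for token in text.lower().split():
        # Strip punctuation from both ends, e.g. 'us,' becomes 'us'.
        word = token.strip(string.punctuation)
        if word:  # tokens such as '--' become empty and are skipped
            counts[word] += 1
    return counts


# Hypothetical sample text, used only for illustration.
sample = "Us, the people. The people -- and us!"
freq = word_frequencies(sample)
print(freq.most_common())
```

`Counter.most_common()` returns the pairs sorted by descending count, so the separate `sorted(...)` call with `operator.itemgetter` is not needed in this variant.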

2. Compare the advantages, disadvantages, and suitable use cases of programming under different computing frameworks.

–Python

–MapReduce

–Hive

–Spark

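MapReduce consists, at its core, of two phases: Map and Reduce. Map is used when each element needs a one-to-one transformation, such as truncating, filtering, or any other conversion; these element-wise transformations are what is called Map. Reduce is about aggregation: many elements are combined into one, for example by computing a sum.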
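
The two phases described above can be imitated in plain Python. This is only a single-machine sketch of the idea, not how Hadoop implements it; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample `lines` are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map: emit one (word, 1) pair for every word in the input."""
    return [(word, 1) for line in lines for word in line.lower().split()]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: aggregate each key's list of values into a single sum."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


# Hypothetical input, used only for illustration.
lines = ["the cat and the hat", "the end"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)
```

In a real cluster, the map and reduce functions run in parallel on different machines and the grouping step moves data across the network; here everything runs in one process.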

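MapReduce was the core of Hadoop 1.0, and Spark has gradually been replacing it. So why is MapReduce still in use? Because many existing applications still depend on it: it does not stand alone but has become a hard-to-replace part of the wider ecosystem, for example Pig and Hive.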

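Although MapReduce greatly simplified big data analysis, user requirements have grown as big data demands and usage patterns have expanded:

1. More complex multi-stage processing (for example iterative computation, ML, graph processing);

2. Low-latency interactive queries (for example ad-hoc queries).

The architecture of the MapReduce computing model makes both kinds of application inherently slow, so users urgently needed a faster computing model to make up for MapReduce's inherent shortcomings.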

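Spark emerged to fill these gaps. Some of its advantages:

1. Scheduling: the steps of a computation can be organized into a single graph (a DAG) and, since they depend on one another, scheduled together, which is fast.

2. Intermediate data is kept in memory wherever possible, which is why Spark is commonly described as an in-memory iterative computing framework.

3. Spark provides a richer set of operators, which makes operations more convenient.

4. An easier API: Python, Scala, and Java are supported.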

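Spark can also express MapReduce-style computation, but here MapReduce is not a single algorithm: Spark provides a map phase and a reduce phase, and offers many operators for each. Examples are map, flatMap, filter, and keyBy on the map side, and reduceByKey, sortByKey, mean, groupBy, and sortBy on the reduce side.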
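
As a rough sketch of how these operators combine, the word count from the first part could be written with the PySpark RDD API along the following lines. This assumes PySpark is installed and that running in local mode is acceptable; the function names `tokenize` and `spark_word_count`, the application name, and the `path` argument are hypothetical.

```python
def tokenize(line):
    """Split one line into lowercase words (used by flatMap below)."""
    return line.lower().split()


def spark_word_count(path):
    """Return (word, count) pairs sorted by descending count."""
    # Imported here so that tokenize() stays usable without Spark installed.
    from pyspark import SparkConf, SparkContext

    # Assumption: local mode; on a cluster the master would be set differently.
    conf = SparkConf().setAppName("WordCount").setMaster("local[*]")
    sc = SparkContext.getOrCreate(conf)
    return (
        sc.textFile(path)
        .flatMap(tokenize)                          # map side: line -> words
        .map(lambda word: (word, 1))                # map side: word -> (word, 1)
        .reduceByKey(lambda a, b: a + b)            # reduce side: sum per word
        .sortBy(lambda kv: kv[1], ascending=False)  # most frequent first
        .collect()
    )
```

Calling `spark_word_count` with the path of the speech text should produce a list of the same shape as the pure-Python result, although words with equal counts may appear in a different order.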

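Hive is arguably the de facto standard for big data warehousing. Hive can access data on HDFS and HBase, and Impala, Spark SQL, and Presto can all read data from warehouses built with Hive. In general, Hive is still used for batch jobs, while Impala, Spark SQL, or Presto are widely used for interactive ("hot") queries that feed data presentation.

Hive provides three access interfaces: CLI, Web UI, and HiveServer2.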
