How to make each reducer handle a single key in a Python Hadoop MapReduce job

Question

My map function produces two kinds of keys: NY and Others. So the mapper output is either `NY 1` or `Others 1`; these are the only two cases.

My map function:

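    #!/usr/bin/env python
    import sys
    import csv

    # A well-formed record has 22 comma-separated fields;
    # field 16 holds the registration state.
    reader = csv.reader(sys.stdin, delimiter=',')
    for entry in reader:
        if len(entry) == 22:
            registration_state = entry[16]
            # Emit "<state><TAB>1" for the reducer to count.
            print('{0}\t{1}'.format(registration_state, 1))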

Now I need a reducer to process the map output. My reducer:

    #!/usr/bin/env python
    import sys

    ny = 0
    other = 0
    # Input comes from STDIN: one "key<TAB>value" pair per line
    for line in sys.stdin:
        # Remove leading and trailing whitespace
        line = line.strip()
        if not line:
            continue

        # Get key/value
        key, value = line.split('\t', 1)
        value = int(value)

        # Add the value to the NY total, or to the total for every other key
        if key == 'NY':
            ny += value
        else:
            other += value

    # Output both totals
    print('{0}\t{1}'.format('NY', ny))
    print('{0}\t{1}'.format('Other', other))

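With these scripts, MapReduce produces two output files, each containing both an NY count and an Others count. For example, one file contains NY 1248, Others 4677, and the other contains NY 0, Others 1000. This happens because two reducers split the map output between them, so two partial results are generated, and the final totals are obtained only by combining (merging) the two files.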
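The split described above can be reproduced locally with a small simulation. This is only a sketch: `partition` and `simulate` are hypothetical helpers, and `zlib.crc32` is a stand-in for whatever hash the cluster's partitioner really uses, so the actual partition numbers on Hadoop will differ. It assumes keys are assigned to reducers by hashing the key, which means every record with the same key reaches the same reducer while different keys may be spread across reducers.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    # Stand-in for hash partitioning (NOT Hadoop's actual hash function):
    # identical keys always map to the same reducer index.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def simulate(pairs, num_reducers=2):
    """Tally NY vs. non-NY values per simulated reducer."""
    tallies = defaultdict(lambda: {"NY": 0, "Other": 0})
    for key, value in pairs:
        bucket = "NY" if key == "NY" else "Other"
        tallies[partition(key, num_reducers)][bucket] += value
    return dict(tallies)


if __name__ == "__main__":
    # Mapper output with raw state codes as keys (made-up sample data).
    sample = [("NY", 1)] * 3 + [("NJ", 1), ("CT", 1), ("PA", 1), ("FL", 1)]
    for reducer_id, counts in sorted(simulate(sample).items()):
        print(reducer_id, counts)
```

In this model all NY records end up in exactly one reducer, while the other state codes can be scattered over both, which would be consistent with one file reporting all of NY and both files reporting a share of the rest.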

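However, I want to change my reduce or map function so that each reducer processes only one key: one reducer handles only NY, and the other handles only Others. I would like the results to be: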

One file: NY 1258, Others 0; the other: NY 0, Others 5677.

How can I adjust my functions to get the result I want?
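One direction that might be worth trying, sketched here under assumptions rather than as a confirmed fix: have the mapper collapse every non-NY state into a single `Others` key, so that only two distinct keys ever reach the shuffle, and let the reducer sum per key. The names `map_rows`, `reduce_pairs`, `state_col` and `n_cols` are hypothetical; the column index 16 and field count 22 are taken from the mapper above.

```python
from itertools import groupby


def map_rows(rows, state_col=16, n_cols=22):
    """Yield ('NY', 1) or ('Others', 1) for each well-formed row."""
    for row in rows:
        if len(row) == n_cols:
            key = "NY" if row[state_col] == "NY" else "Others"
            yield key, 1


def reduce_pairs(pairs):
    """Sum values per key.

    Assumes the pairs arrive sorted by key, which is how a reducer
    receives its input after the shuffle/sort phase.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


def run_mapper(stdin, stdout):
    """Mapper entry point: CSV lines in, "key<TAB>1" lines out."""
    import csv
    for key, value in map_rows(csv.reader(stdin, delimiter=",")):
        stdout.write("{0}\t{1}\n".format(key, value))


def run_reducer(stdin, stdout):
    """Reducer entry point: sorted "key<TAB>value" lines in, totals out."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in stdin if line.strip())
    typed = ((key, int(value)) for key, value in pairs)
    for key, total in reduce_pairs(typed):
        stdout.write("{0}\t{1}\n".format(key, total))
```

A mapper script would then call `run_mapper(sys.stdin, sys.stdout)` and a reducer script `run_reducer(sys.stdin, sys.stdout)`. Note that with this reducer a file only lists the keys it actually received, so a zero count is not printed. Also, with only two keys there is no guarantee that they hash to different reducers; if both land on the same one, the second output file would simply be empty, and forcing a fixed key-to-reducer assignment would presumably need a custom partitioner, which is not shown here.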