Showing posts with label collections. Show all posts

Friday, July 3, 2009

Count the number of occurrences using Counter() in Python

Counter is a class in the collections module. It is a dict subclass for counting hashable objects, available in Python 3.1 and later (and in Python 2.7).

Count the occurrences of each element in a list

>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> Counter(z)
Counter({'blue': 3, 'red': 2, 'yellow': 1})
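
Because Counter is a dict subclass, individual counts can be read with ordinary key lookups. The following is a small sketch using the same list; the name counts is only illustrative. Note that a missing key returns 0 instead of raising KeyError.

```python
from collections import Counter

z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
counts = Counter(z)

print(counts['blue'])         # 3
print(counts['green'])        # 0: a missing key counts as zero, no KeyError
print(sum(counts.values()))   # 6, the total number of elements
print(len(counts))            # 3, the number of distinct elements
```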

Count the occurrences of each character in a text

>>> from collections import Counter
>>> Counter("This is a test")
Counter({' ': 3, 's': 3, 'i': 2, 't': 2, 'a': 1, 'e': 1, 'h': 1, 'T': 1})
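
As the output above shows, Counter treats spaces and upper/lower case as distinct characters. If only letters matter, one possible approach (a sketch, not part of the original post) is to filter with str.isalpha() and fold case before counting:

```python
from collections import Counter

text = "This is a test"

# Keep only alphabetic characters and lowercase them before counting,
# so that 'T' and 't' are counted together and spaces are ignored.
letters = Counter(ch.lower() for ch in text if ch.isalpha())

print(letters['t'])            # 3
print(letters['s'])            # 3
print(sum(letters.values()))   # 11 letters in total
```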

To find the n most common words in a text

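n is the number of most common words to return (3 in this example), and the_text is a string. Split the string into words first; passing the string directly would count characters instead of words.

>>> Counter(the_text.split()).most_common(3)

For more information, read The Python Standard Library.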
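
A self-contained sketch of the most-common-words lookup follows. The sample sentence and the names n, words and top are made up for illustration and are not from the original post.

```python
from collections import Counter

# Hypothetical sample text, used only for this illustration.
the_text = "the cat and the hat and the bat"
n = 2

# Split on whitespace so that Counter sees words, not characters.
words = the_text.split()
top = Counter(words).most_common(n)

print(top)  # [('the', 3), ('and', 2)]
```

most_common() returns a list of (word, count) tuples ordered from the highest count to the lowest.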

Monday, June 1, 2009

Count the number of words using Python

This article presents a way to count the number of words, the number of unique words, and the number of occurrences of each word in a text file.
  1. Read the text file.
    with open("filename.txt") as f:
        text = f.read()
  2. Replace non-alphanumeric characters with whitespace.
    import re
    # [\W_] matches any character that is not a letter or digit
    text = re.sub(r'[\W_]', ' ', text)
  3. Change all characters to lowercase.
    text = text.lower()
  4. Split words into a list.
    text = text.split()
  5. Display the number of words in the text file.
    len(text)
  6. Display the number of unique words in the text file.
    len(set(text))
  7. Display the number of occurrences of each word.
    from collections import defaultdict
    wordsCount = defaultdict(int)
    for word in text:
        wordsCount[word] += 1
    # Print the totals once, after all words have been counted
    for word, num in wordsCount.items():
        print(word, num)
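
The steps above can also be collected into a single function. The sketch below swaps defaultdict for Counter and works on an in-memory string so that it runs without a file; the function name word_stats and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

def word_stats(text):
    """Return (total words, unique words, Counter of word occurrences)."""
    # Replace every character that is not a letter or digit with a space,
    # lowercase the result, then split on whitespace.
    words = re.sub(r'[\W_]', ' ', text).lower().split()
    counts = Counter(words)
    return len(words), len(counts), counts

# Hypothetical sample input; in practice this would be the file contents.
sample = "The quick fox. The lazy dog; the END!"
total, unique, counts = word_stats(sample)

print(total, unique)  # 8 6
for word, num in counts.most_common():
    print(word, num)
```

Using Counter here also gives the words sorted by frequency via most_common(), which the defaultdict version would need an explicit sort for.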