- Read the text file.
    with open("filename.txt") as f:
        text = f.read()
- Replace non-alphanumeric characters with whitespace.
    import re
    # \W matches any non-word character; "_" is listed too because \w counts it as a word character
    text = re.sub(r'[\W_]', ' ', text)
- Change all characters to lowercase.
    text = text.lower()
- Split the text into a list of words.
    text = text.split()
- Display the number of words in the text file.
    print(len(text))
- Display the number of unique words in the text file.
    print(len(set(text)))
- Display the number of occurrences of each word.
    from collections import defaultdict
    wordsCount = defaultdict(int)  # missing words start at 0
    for word in text:
        wordsCount[word] += 1
    for word, num in wordsCount.items():
        print(word, num)
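
The steps above can be combined into a single function. This is only a minimal sketch, not the post's own code: the function name `count_words` and the sample string are hypothetical, and `collections.Counter` is swapped in for the `defaultdict` loop because it does the same tallying in one call.

```python
import re
from collections import Counter


def count_words(text):
    """Return (total words, unique words, per-word counts) for a string.

    Hypothetical helper for illustration; it is not part of the original post.
    """
    # Replace every non-alphanumeric character (including "_") with a space.
    cleaned = re.sub(r"[\W_]", " ", text)
    # Lowercase so that "The" and "the" are counted as the same word.
    words = cleaned.lower().split()
    # Counter tallies occurrences, replacing the manual defaultdict loop.
    counts = Counter(words)
    return len(words), len(counts), counts


# Made-up sample text; in practice this would come from reading a file.
sample = "The cat saw the dog. The dog ran!"
total, unique, counts = count_words(sample)
print(total, unique)
for word, num in counts.most_common(3):
    print(word, num)
```

To apply it to a file, read the file contents into a string first and pass that string to `count_words`. `Counter.most_common(n)` returns the `n` most frequent words, which is often more useful than printing every word.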
Monday, June 1, 2009
Count the number of words using Python
This article presents a way to count the total number of words, the number of unique words, and the number of occurrences of each word in a text file.
Labels: collections, count, defaultdict, defaultdict.items, file, len, number, open, Python, re, read, Set, words