- Read the text file.
    with open("filename.txt") as f:
        text = f.read()
- Replace non-alphanumeric characters with whitespace. (\W matches any non-word character; _ is listed as well because \w treats the underscore as a word character.)
    import re
    text = re.sub(r'[\W_]', ' ', text)
- Change all characters to lowercase.
    text = text.lower()
- Split the text into a list of words.
    text = text.split()
- Display the number of words in the text file.
    print(len(text))
- Display the number of unique words in the text file.
    print(len(set(text)))
- Display the number of occurrences of each word.
    from collections import defaultdict
    wordsCount = defaultdict(int)
    for word in text:
        wordsCount[word] += 1
    for word, num in wordsCount.items():
        print(word, num)
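The steps above can also be gathered into a single function. The following is only a minimal sketch: the function name count_words and the sample sentence are made up for illustration, and it swaps defaultdict for collections.Counter, which does the same tallying in one call.

```python
import re
from collections import Counter


def count_words(text):
    """Return (total words, unique words, per-word counts) for a string.

    count_words is a hypothetical helper name, not part of the original post.
    """
    # Replace every non-alphanumeric character (including "_") with a space.
    cleaned = re.sub(r"[\W_]", " ", text)
    # Lowercase so that "The" and "the" count as the same word, then split.
    words = cleaned.lower().split()
    # Counter tallies occurrences, much like the defaultdict(int) loop.
    counts = Counter(words)
    return len(words), len(counts), counts


# Illustrative sample text; to count a file, pass the string read from it.
sample = "The cat saw the dog. The dog ran!"
total, unique, counts = count_words(sample)
print(total, unique)
for word, num in counts.most_common():
    print(word, num)
```

Counter.most_common() lists the words from most to least frequent, which is often more useful than the arbitrary order of a plain dictionary loop.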
Showing posts with label Set. Show all posts
Monday, June 1, 2009
Count the number of words using Python
This article presents a way to count the total number of words, the number of unique words, and the number of occurrences of each word in a text file.
Labels: collections, count, defaultdict, defaultdict.items, file, len, number, open, Python, re, read, Set, words
Saturday, May 16, 2009
The intersection of two Python lists using Set
A set is an unordered collection of unique objects.
We can use sets to find the intersection of two lists or tuples.
Suppose that we have two lists.
>>> x = [1, 2, 3, 4, 5]
>>> y = [2, 6, 7, 3]
To find the intersection of the two lists, first convert both x and y to sets, then apply the intersection operator (&).
>>> set(x) & set(y)
{2, 3}
If we need the output as a list, pass the result to list().
>>> list(set(x) & set(y))
[2, 3]
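Because a set is unordered, the list built from a set intersection is not guaranteed to follow the order of either input. When that order matters, one option is to keep a set only for the membership test and walk the first list in sequence. This is a minimal sketch; the name ordered_intersection is made up for illustration.

```python
def ordered_intersection(x, y):
    """Return the items of x that also appear in y, in x's order, without duplicates.

    ordered_intersection is a hypothetical helper name, not a built-in.
    """
    y_set = set(y)   # set gives fast membership tests
    seen = set()     # items already added to the result
    result = []
    for item in x:
        if item in y_set and item not in seen:
            seen.add(item)
            result.append(item)
    return result


x = [5, 4, 3, 2, 1]
y = [2, 6, 7, 3]
print(ordered_intersection(x, y))  # [3, 2]
```

The function accepts tuples as well as lists, since it only iterates over x and builds a set from y.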
Labels: intersection, list, Python, Set, tuple