Tips for Python 3 string.punctuation

Preface
When manipulating strings, if you find yourself writing something complicated, take a look at the string module, which provides a number of useful constants and helpers.

>>> import string
>>> dir(string)
['Formatter', 'Template', '_ChainMap', '_TemplateMetaclass', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_re', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']
>>> string.ascii_lowercase  #All lowercase letters
'abcdefghijklmnopqrstuvwxyz'
>>> string.ascii_uppercase  #All uppercase letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.hexdigits        #All hexadecimal characters
'0123456789abcdefABCDEF'
>>> string.whitespace       #All whitespace characters
' \t\n\r\x0b\x0c'
>>> string.punctuation      #All punctuation characters
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
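
As a small illustration of how these constants are typically used, here is a minimal sketch that classifies a single character with membership tests. The function name classify and its category labels are made up for this example and are not part of the string module.

```python
import string

def classify(ch):
    """Return a rough category for a single character (hypothetical helper)."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch in string.punctuation:
        return "punctuation"
    if ch in string.whitespace:
        return "whitespace"
    # Anything outside the ASCII constants, e.g. accented letters.
    return "other"

for ch in ["a", "7", "!", " "]:
    print(repr(ch), classify(ch))
```

Note that these constants only cover ASCII, so a character such as "é" falls into the "other" branch here.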

The problem
Count the number of occurrences of each word in a file or a string. Because sentences contain punctuation, splitting the string directly leaves the punctuation attached to the words, so that "time," and "time." are treated as different words from "time".

Splitting on each punctuation mark explicitly quickly becomes tedious when the sentence is long or contains many different punctuation marks.
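
To make the problem concrete, here is a minimal sketch of what a plain split() produces. The shortened sentence and the variable names are chosen only for illustration.

```python
# Naive approach: split on whitespace only.
s = "We met at the wrong time, but separated at the right time."
words = s.split()
print(words)

# 'time,' and 'time.' keep their punctuation, so a word count
# would treat them as two different words, and neither as 'time'.
print("time" in words)
```
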
The solution
Idea: first replace the punctuation marks in the sentence with spaces, then split the result with split(). This is where string.punctuation comes in handy.
The code:

>>> import string    #Be sure to import the string module before using it.

>>> s="We met at the wrong time, but separated at the right time. The most urgent is to take the most beautiful scenery!!! the deepest wound was the most real emotions."
>>> for i in s:
...     if i in string.punctuation:  #If the character is punctuation, replace it with a space.
...         s = s.replace(i," ")
...
>>> s
'We met at the wrong time  but separated at the right time  The most urgent is to take the most beautiful scenery    the deepest wound was the most real emotions '
>>> s.split()  #Split on whitespace
['We', 'met', 'at', 'the', 'wrong', 'time', 'but', 'separated', 'at', 'the', 'right', 'time', 'The', 'most', 'urgent', 'is', 'to', 'take', 'the', 'most', 'beautiful', 'scenery', 'the', 'deepest', 'wound', 'was', 'the', 'most', 'real', 'emotions']
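
The same idea (turn punctuation into spaces, then split) can also be written without an explicit loop. The following is a sketch of one alternative, not the approach used above: it builds a translation table with str.maketrans and applies it in a single pass with str.translate. The variable name table is chosen for illustration.

```python
import string

s = ("We met at the wrong time, but separated at the right time. "
     "The most urgent is to take the most beautiful scenery!!! "
     "the deepest wound was the most real emotions.")

# Map every punctuation character to a single space.
# Both arguments to str.maketrans must have the same length.
table = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Replace punctuation in one pass, then split on whitespace.
words = s.translate(table).split()
print(words)
```

This avoids calling replace() once for every punctuation character found in the sentence, which may matter for long texts.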

Of course, this problem can also be solved with a regular expression:

>>> import re
>>> s="We met at the wrong time, but separated at the right time. The most urgent is to take the most beautiful scenery!!! the deepest wound was the most real emotions."
>>> re.findall(r'\b\w+\b',s)
['We', 'met', 'at', 'the', 'wrong', 'time', 'but', 'separated', 'at', 'the', 'right', 'time', 'The', 'most', 'urgent', 'is', 'to', 'take', 'the', 'most', 'beautiful', 'scenery', 'the', 'deepest', 'wound', 'was', 'the', 'most', 'real', 'emotions']
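
Both approaches stop at the list of words, while the original task was to count occurrences. One possible way to finish the job is collections.Counter from the standard library, sketched below. Lowercasing the text first is an assumption made here so that "The" and "the" are counted as the same word; drop the lower() call if case should matter.

```python
import re
from collections import Counter

s = ("We met at the wrong time, but separated at the right time. "
     "The most urgent is to take the most beautiful scenery!!! "
     "the deepest wound was the most real emotions.")

# Assumption: counting is case-insensitive, so lowercase before matching.
words = re.findall(r'\b\w+\b', s.lower())

# Counter maps each word to the number of times it occurs.
counts = Counter(words)
print(counts.most_common(2))   # [('the', 6), ('most', 3)]
print(counts["time"])          # 2
```

Keep in mind that \w also matches digits and the underscore, so the regular expression and the string.punctuation approach can give different results on text that contains underscores.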

There are many ways to solve a problem, and trying several of them is good exercise. When manipulating strings, if the code you are writing feels too cumbersome, remember the string module and check whether it offers an easier solution.