Markov chains to generate text
This is my Python 3 code to generate text using a Markov chain.
The chain first randomly selects a word from a text file. Out of all the occurrences of that word in the text file, the program finds the most popular next word for the first randomly selected word. It continues the process to form very readable text.
The best thing about this code is that it copies the style of writing in the text file. In the first trial of the code, I used three of Shakespeare's most famous plays: Macbeth, Julius Caesar, and The Comedy of Errors. When I generated text from it, the outcome read very much like a Shakespeare poem.
My knowledge of Python is between intermediate and expert. Please review my code and make changes as you like. I want suggestions from both experts and beginners.
# Markov Chain Poetry
import random
import sys

poems = open("text.txt", "r").read()
poems = ''.join([i for i in poems if not i.isdigit()]).replace("\n\n", "\n").split(' ')
# This processes the list of poems. Double line breaks separate poems, so they are removed.
# Splitting along spaces creates a list of all words.
index = 1
chain = {}
count = 1000  # Desired word count of output
# This loop creates a dictionary called "chain". Each key is a word, and the value of each key
# is an array of the words that immediately followed it.
for word in poems[index:]:
    key = poems[index - 1]
    if key in chain:
        chain[key].append(word)
    else:
        chain[key] = [word]
    index += 1

word1 = random.choice(list(chain.keys()))  # random first word
message = word1.capitalize()
# Picks the next word over and over until the word count is reached
while len(message.split(' ')) < count:
    word2 = random.choice(chain[word1])
    word1 = word2
    message += ' ' + word2

# Creates a new file with the output and prints it to the terminal
with open("output.txt", "w") as file:
    file.write(message)
output = open("output.txt", "r")
print(output.read())
Thanks!!!
python python-3.x random file
asked May 2 at 6:39 by AnanthaKrishna K; edited May 2 at 7:04 by Phrancis
1 Answer
Functions
Split the code into functions, and separate the generation from the presentation. Your algorithm has some clearly distinct tasks, so split along these lines:
- read input
- assemble chain
- construct new poem
- output
This way, you can reuse parts of the code, save intermediary results and test the parts individually.
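As a concrete illustration of that testability, once chain assembly is its own function its logic can be checked on a tiny in-memory sample, with no file I/O at all (the function name and the simple dict-based body here are just a sketch for the example):

```python
def build_chain(words):
    """Map each word to the list of words that immediately followed it."""
    chain = {}
    prev = None
    for word in words:
        if prev is not None:
            chain.setdefault(prev, []).append(word)
        prev = word
    return chain

# Testable without touching the filesystem:
chain = build_chain("the cat sat on the mat".split())
print(chain["the"])  # ['cat', 'mat']
```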
Generators
Instead of keeping all the intermediary lists in memory, generators can be a lot more memory-efficient. I try to use them as much as possible. Materializing them into a list or dict when needed is easy.
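As a quick illustration of the memory difference (exact sizes are CPython implementation details, but the ordering holds): a generator expression stores only its iteration state, while the equivalent list comprehension materializes every element up front:

```python
import sys

squares_list = [i * i for i in range(100_000)]  # all 100,000 ints referenced in memory
squares_gen = (i * i for i in range(100_000))   # produced one at a time, on demand

list_size = sys.getsizeof(squares_list)  # hundreds of kilobytes
gen_size = sys.getsizeof(squares_gen)    # a couple hundred bytes
print(list_size > gen_size)  # True
```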
Read the input
There is no need to assemble the intermediary list in ''.join([i for i in poems if not i.isdigit()]); join is perfectly capable of handling any iterable, so a generator expression works just as well.
Use the with statement to open files:
def read_input(filename):
    """Reads `filename`, yielding its consecutive words."""
    with open(filename, 'r') as file:
        for line in file:
            for word in line.split(' '):
                if word and not word.isdigit():
                    yield word
With regular expressions, and by hoisting the I/O, you can simplify this method even more:
import re

def read_input_re(file):
    pattern = re.compile("[a-zA-Z][a-zA-Z']+")
    for line in file:
        for word in pattern.finditer(line):
            yield word.group()
which can then be called with a file (note the yield from: returning the generator directly would close the file before it is consumed):

def read_file(filename):
    with open(filename, 'r') as file:
        yield from read_input_re(file)
or with any iterable that yields strings. For example, if poem holds a multi-line string containing a poem: words = read_input_re(poem.split('\n'))
This refactoring also makes loading different poems from different text files almost trivial:

import itertools

filenames = ['file1.txt', 'file2.txt', ...]
parsed_files = (read_file(filename) for filename in filenames)
words = itertools.chain.from_iterable(parsed_files)
If you want all the words in the chain lowercase, so that FROM and from are treated as the same word, just add:

words = map(str.lower, words)
Assemble the chain
Here a collections.defaultdict(list) is the natural data structure for the chain.
Instead of using hard indexing to get the subsequent words, which is impossible with a generator, you can do it like this:
from collections import defaultdict

def assemble_chain(words):
    chain = defaultdict(list)
    try:
        word, following = next(words), next(words)
        while True:
            chain[word].append(following)
            word, following = following, next(words)
    except StopIteration:
        return chain
or using some of itertools' useful functions:
from itertools import tee, islice

def assemble_chain_itertools(words):
    chain = defaultdict(list)
    words, followings = tee(words, 2)
    for word, following in zip(words, islice(followings, 1, None)):
        chain[word].append(following)
    return chain
Or even using a deque:

from collections import deque

def assemble_chain_deque(words):
    chain = defaultdict(list)
    queue = deque(islice(words, 1), maxlen=2)
    for new_word in words:
        queue.append(new_word)
        word, following = queue
        chain[word].append(following)
    return chain
Whichever is clearer is a matter of habit and experience. If performance is important, you will need to time them.
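A quick way to do that timing is the timeit module; this sketch compares the tee/zip and deque variants from above on a synthetic word list (absolute numbers will vary by machine, so only the relative ordering is meaningful):

```python
import timeit
from collections import defaultdict, deque
from itertools import islice, tee

def assemble_chain_itertools(words):
    chain = defaultdict(list)
    words, followings = tee(words, 2)
    for word, following in zip(words, islice(followings, 1, None)):
        chain[word].append(following)
    return chain

def assemble_chain_deque(words):
    chain = defaultdict(list)
    queue = deque(islice(words, 1), maxlen=2)
    for new_word in words:
        queue.append(new_word)
        word, following = queue
        chain[word].append(following)
    return chain

sample = ['word%d' % (i % 50) for i in range(10_000)]

# Both variants consume an iterator, so hand them a fresh one on every run.
for fn in (assemble_chain_itertools, assemble_chain_deque):
    seconds = timeit.timeit(lambda: fn(iter(sample)), number=100)
    print(fn.__name__, round(seconds, 3))
```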
Create the poem
Since you will be asking for a new word a lot, it can pay to extract that into its own function:

def get_random_word(choices):
    return random.choice(list(choices))
Then you can make an endless generator yielding subsequent words:

def generate_words(chain):
    word = get_random_word(chain)
    while True:
        yield word
        if word in chain:
            word = get_random_word(chain[word])
        else:
            word = get_random_word(chain)
We then use islice to gather the number of words we need, which can then be pasted together with ' '.join():

length = 10
poem = islice(generate_words(chain), length)
poem = ' '.join(poem)
"be tatter'd we desire famine where all eating ask'd where"
Once you have that, making a poem with a set number of lines of a set length is also easy:

def construct_poem(chain, lines, line_length):
    for _ in range(lines):
        yield ' '.join(islice(generate_words(chain), line_length))

lines = construct_poem(chain, 4, 10)
lines = map(str.capitalize, lines)
print('\n'.join(lines))
Be tatter'd we desire famine where all eating ask'd where
Deep trenches that thereby the riper substantial fuel shall beseige
Treasure of small pity the riper eyes were to the
Foe to the riper by time spring within and make
I think it makes sense to do the capitalization after the line has been assembled. Yet another separation of generation and presentation:
def construct_poem2(chain, line_lengths):
    for line_length in line_lengths:
        yield ' '.join(islice(generate_words(chain), line_length))

line_lengths = [10, 8, 8, 10]
lines = construct_poem2(chain, line_lengths)
lines = map(str.capitalize, lines)
print('\n'.join(lines))
Be tatter'd we desire famine where all eating ask'd where
Deep trenches that thereby the riper substantial fuel
Shall beseige treasure of small pity the riper
Eyes were to the riper memory but eyes were to
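Putting the review's pieces together, a complete run of the refactored pipeline might look like this sketch (the corpus is inlined as a list of lines so the example is self-contained; with the read_file version you would pass filenames instead):

```python
import random
import re
from collections import defaultdict
from itertools import islice

def read_input_re(file):
    # Works with a real file or any iterable of strings.
    pattern = re.compile("[a-zA-Z][a-zA-Z']+")
    for line in file:
        for word in pattern.finditer(line):
            yield word.group()

def assemble_chain(words):
    chain = defaultdict(list)
    try:
        word, following = next(words), next(words)
        while True:
            chain[word].append(following)
            word, following = following, next(words)
    except StopIteration:
        return chain

def get_random_word(choices):
    return random.choice(list(choices))

def generate_words(chain):
    word = get_random_word(chain)
    while True:
        yield word
        word = get_random_word(chain[word]) if word in chain else get_random_word(chain)

def construct_poem2(chain, line_lengths):
    for line_length in line_lengths:
        yield ' '.join(islice(generate_words(chain), line_length))

corpus = [
    "Shall I compare thee to a summer's day",
    "Thou art more lovely and more temperate",
]
chain = assemble_chain(read_input_re(corpus))
poem = '\n'.join(map(str.capitalize, construct_poem2(chain, [6, 4])))
print(poem)
```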
answered May 2 at 11:00 by Maarten Fabré; edited May 2 at 12:27