Markov chains to generate text

This is my Python 3 code to generate text using a Markov chain.
The chain first randomly selects a word from a text file. Out of all the occurrences of that word in the text file, the program then picks one of the words that followed it, so the most popular next words are the most likely to be chosen. It continues this process to form a fairly readable text.

The best thing about this code is that it copies the style of writing in the text file. For the first trial of the code, I fed it three of Shakespeare's most famous plays: Macbeth, Julius Caesar and The Comedy of Errors. When I generated text from it, the outcome read very much like a Shakespeare poem.

My knowledge of Python is between intermediate and expert. Please review my code and make changes as you like. I want suggestions from both experts and beginners.



# Markov Chain Poetry

import random
import sys

poems = open("text.txt", "r").read()
poems = ''.join([i for i in poems if not i.isdigit()]).replace("\n\n", " ").split(' ')
# This processes the list of poems. Double line breaks separate poems, so they are removed.
# Splitting along spaces creates a list of all words.

index = 1
chain = {}
count = 1000  # Desired word count of output

# This loop creates a dictionary called "chain". Each key is a word, and the value of each key
# is an array of the words that immediately followed it.
for word in poems[index:]:
    key = poems[index - 1]
    if key in chain:
        chain[key].append(word)
    else:
        chain[key] = [word]
    index += 1

word1 = random.choice(list(chain.keys()))  # random first word
message = word1.capitalize()

# Picks the next word over and over until the word count is reached
while len(message.split(' ')) < count:
    word2 = random.choice(chain[word1])
    word1 = word2
    message += ' ' + word2

# Creates a new file with the output and prints it to the terminal
with open("output.txt", "w") as file:
    file.write(message)
output = open("output.txt", "r")
print(output.read())


Thanks!!!







asked May 2 at 6:39 by AnanthaKrishna K, edited May 2 at 7:04 by Phrancis




















1 Answer






Functions

Split the code into functions, and separate the generation from the presentation. Your algorithm has some clear, distinct tasks, so split along these lines:



• read input

• assemble chain

• construct new poem

• output

This way you can reuse parts of the code, save intermediate results, and test the parts individually; a minimal sketch of how the pieces could fit together follows below.
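As an illustration only, here is one way the four parts could be wired together once they exist. The function names (read_file, assemble_chain, construct_poem) follow the suggestions further down in this answer, and the filename is just a placeholder:

def main():
    words = read_file('text.txt')                            # read input
    chain = assemble_chain(words)                            # assemble chain
    lines = construct_poem(chain, lines=4, line_length=10)   # construct new poem
    print('\n'.join(map(str.capitalize, lines)))             # output

if __name__ == '__main__':
    main()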



generators

Instead of keeping all the intermediate lists in memory, generators can be a lot more memory efficient. I try to use them as much as possible; turning them into a list or dict when needed is easy.
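A tiny, self-contained illustration of the difference (not from the original answer): a list comprehension materializes every element up front, while the equivalent generator expression only stores its iteration state.

import sys

squares_list = [n * n for n in range(1_000_000)]   # all values held in memory at once
squares_gen = (n * n for n in range(1_000_000))    # values produced one at a time on demand

print(sys.getsizeof(squares_list))  # on the order of megabytes
print(sys.getsizeof(squares_gen))   # roughly a hundred bytes, regardless of length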



read the input

There is no need to assemble the intermediate list in ''.join([i for i in poems if not i.isdigit()]). join is perfectly capable of handling any iterable, so a generator expression works just as well.
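For the snippet from the question that simply means dropping the square brackets:

poems = ''.join(i for i in poems if not i.isdigit())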



Use the with statement to open files:

def read_input(filename):
    """Reads `filename`, yields the consecutive words."""
    with open(filename, 'r') as file:
        for line in file:
            for word in line.split(' '):
                if word and not word.isdigit():
                    yield word


With regular expressions, and by hoisting the IO, you can simplify this method even more:

import re

def read_input_re(file):
    pattern = re.compile("[a-zA-Z][a-zA-Z']+")
    for line in file:
        for word in pattern.finditer(line):
            yield word.group()


which can then be called with a file:

def read_file(filename):
    with open(filename, 'r') as file:
        # yield from keeps the file open until the generator is exhausted
        yield from read_input_re(file)


or with any iterable that yields strings as argument. For example, if poem holds a multi-line string with a poem: words = read_input_re(poem.split('\n'))
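As a quick, made-up check that it really does accept any iterable of strings (note that this pattern skips one-letter words such as I and a):

sample_lines = ["Shall I compare thee to a summer's day?", "Thou art more lovely"]
print(list(read_input_re(sample_lines)))
# ['Shall', 'compare', 'thee', 'to', "summer's", 'day', 'Thou', 'art', 'more', 'lovely']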



This refactoring also makes loading the different poems from different text files almost trivial:

import itertools

filenames = ['file1.txt', 'file2.txt', ...]
parsed_files = (read_file(filename) for filename in filenames)
words = itertools.chain.from_iterable(parsed_files)


If you want all the words in the chain lowercase, so FROM and from are marked as the same word, just add

words = map(str.lower, words)


assemble the chain

Here a collections.defaultdict(list) is the natural data structure for the chain.
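A short demonstration of what that buys you (illustrative, not from the original answer): missing keys start out as empty lists, so the if key in chain branch from the original code disappears.

from collections import defaultdict

chain = defaultdict(list)
chain['the'].append('cat')   # 'the' was missing, so defaultdict created [] first
chain['the'].append('dog')
print(chain['the'])          # ['cat', 'dog']
print(chain['unseen'])       # [] (merely looking up a key also creates it)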



Instead of using hard indexing to get the subsequent words, which is impossible with a generator, you can do it like this:

from collections import defaultdict

def assemble_chain(words):
    chain = defaultdict(list)
    try:
        word, following = next(words), next(words)
        while True:
            chain[word].append(following)
            word, following = following, next(words)
    except StopIteration:
        return chain


or using some of itertools' useful functions:

from itertools import tee, islice

def assemble_chain_itertools(words):
    chain = defaultdict(list)
    words, followings = tee(words, 2)
    for word, following in zip(words, islice(followings, 1, None)):
        chain[word].append(following)
    return chain


Or even using a deque:

from collections import deque

def assemble_chain_deque(words):
    chain = defaultdict(list)
    queue = deque(islice(words, 1), maxlen=2)
    for new_word in words:
        queue.append(new_word)
        word, following = queue
        chain[word].append(following)
    return chain


Whichever is more clear is a matter of habit and experience. If performance is important, you will need to time them; a rough sketch of how you might do that follows.
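A rough timing sketch, assuming the three assemble_chain* variants above are defined and a text.txt corpus exists. The input is materialized into a list once so every variant sees the same words, and each call gets a fresh iterator:

import timeit

sample = list(read_file('text.txt'))   # read the corpus once

for builder in (assemble_chain, assemble_chain_itertools, assemble_chain_deque):
    seconds = timeit.timeit(lambda: builder(iter(sample)), number=100)
    print(f'{builder.__name__}: {seconds:.3f}s for 100 runs')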



create the poem

Since you will be asking for a new word a lot, it can pay to extract it into its own function:

import random

def get_random_word(choices):
    return random.choice(list(choices))


Then you can make an endless generator yielding subsequent words:

def generate_words(chain):
    word = get_random_word(chain)
    while True:
        yield word
        if word in chain:
            word = get_random_word(chain[word])
        else:
            word = get_random_word(chain)


We then use islice to gather the number of words we need, which can then be joined together with ' '.join():

length = 10
poem = islice(generate_words(chain), length)
poem = ' '.join(poem)



          "be tatter'd we desire famine where all eating ask'd where"



Once you have that, making a poem with a set number of lines of a set length is also easy:

def construct_poem(chain, lines, line_length):
    for _ in range(lines):
        yield ' '.join(islice(generate_words(chain), line_length))

lines = construct_poem(chain, 4, 10)
lines = map(str.capitalize, lines)
print('\n'.join(lines))



Be tatter'd we desire famine where all eating ask'd where
Deep trenches that thereby the riper substantial fuel shall beseige
Treasure of small pity the riper eyes were to the
Foe to the riper by time spring within and make



I think it makes sense to do the capitalization after the line has been assembled. Yet another separation of generation and presentation:

def construct_poem2(chain, line_lengths):
    for line_length in line_lengths:
        yield ' '.join(islice(generate_words(chain), line_length))

line_lengths = [10, 8, 8, 10]
lines = construct_poem2(chain, line_lengths)
lines = map(str.capitalize, lines)
print('\n'.join(lines))



Be tatter'd we desire famine where all eating ask'd where
Deep trenches that thereby the riper substantial fuel
Shall beseige treasure of small pity the riper
Eyes were to the riper memory but eyes were to






answered May 2 at 11:00 by Maarten Fabré, edited May 2 at 12:27