The State of the Union Machine uses statistical models of language to randomly generate text based on different presidents' previous speeches. These models are called "n-gram models" or "Markov models," and are used in many places, from machine translation to DNA sequencing. There is one model for each president. Each sentence is generated by a single president's model, which takes words from the previous sentence as its starting context.
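The generation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the toy sentences, the bigram (one-word context) order, the names `train_bigram_model` and `generate_sentence`, and the exact way the previous sentence is used as context are all hypothetical.

```python
import random
from collections import defaultdict

# Toy corpora invented for illustration; these are not real speech excerpts.
TOY_SPEECHES = {
    "president_a": [
        "the state of our union is strong",
        "our union is strong because our people are strong",
    ],
    "president_b": [
        "we must build a strong economy",
        "a strong economy needs strong schools",
    ],
}

START, END = "<s>", "</s>"  # sentence boundary markers


def train_bigram_model(sentences):
    """Map each word to the list of words observed immediately after it."""
    followers = defaultdict(list)
    for sentence in sentences:
        tokens = [START] + sentence.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev].append(nxt)
    return followers


def generate_sentence(model, context=(), rng=random, max_words=30):
    """Random walk through one model, seeded by the previous sentence.

    Assumption: the "context" is the most recent word of the previous
    sentence that this model has seen; otherwise we begin at START.
    """
    known = [w for w in context if w in model]
    current = known[-1] if known else START
    words = []
    for _ in range(max_words):
        # Repeated followers appear repeatedly in the list, so frequent
        # continuations are proportionally more likely to be chosen.
        current = rng.choice(model[current])
        if current == END:
            break
        words.append(current)
    # If the context led straight to a sentence end, start fresh instead.
    return words or generate_sentence(model, (), rng, max_words)


if __name__ == "__main__":
    models = {name: train_bigram_model(s) for name, s in TOY_SPEECHES.items()}
    previous = []
    # Alternate presidents: each sentence comes from exactly one model.
    for name in ["president_a", "president_b", "president_a"]:
        previous = generate_sentence(models[name], previous)
        print(name, "->", " ".join(previous))
```

Because each word is drawn at random from the words that followed the current one in training, the output differs from run to run, but every adjacent word pair within a sentence is one the chosen model has actually seen.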
The models are trained on a corpus of text, from which they learn the probability of each word given the words that precede it. It's a bit like a robot that learns how to fill in the blanks. For instance, models trained on recent presidents have learned that the words "my fellow" are frequently followed by the word "Americans."
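To make the "fill in the blanks" idea concrete, here is a small sketch of how such a probability could be estimated by counting. It assumes a trigram model (two words of context) and plain relative-frequency estimates with no smoothing; the toy sentences and the function names `train_trigram_counts` and `next_word_probability` are invented for illustration and are not taken from the project.

```python
from collections import Counter, defaultdict

# Toy training sentences invented for illustration (lowercased, no punctuation).
TOY_CORPUS = [
    "my fellow americans we meet tonight",
    "my fellow americans the state of our union is strong",
    "my fellow citizens thank you",
]


def train_trigram_counts(sentences):
    """Count how often each word follows each two-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            counts[(w1, w2)][w3] += 1
    return counts


def next_word_probability(counts, context, word):
    """Relative-frequency estimate of P(word | context); 0.0 if context unseen."""
    following = counts.get(tuple(context))
    if not following:
        return 0.0
    return following[word] / sum(following.values())


if __name__ == "__main__":
    counts = train_trigram_counts(TOY_CORPUS)
    # In this toy corpus, 2 of the 3 occurrences of "my fellow"
    # are followed by "americans" and 1 by "citizens".
    print(next_word_probability(counts, ("my", "fellow"), "americans"))
    print(next_word_probability(counts, ("my", "fellow"), "citizens"))
```

Real language models usually add smoothing so that word sequences never seen in training do not get a probability of exactly zero; that step is left out here to keep the counting logic visible.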
Our models were trained on previous State of the Union addresses, which are archived by researchers at the University of California, Santa Barbara's American Presidency Project. They were trained using the Natural Language Toolkit's language modeling tools. For more information on technical details, please visit our GitHub repo!
About the Sunlight Foundation
The Sunlight Foundation is a nonpartisan nonprofit that advocates for open government globally and uses technology to make government more accountable to all. We do so by creating tools, open data, policy recommendations and journalism that dramatically expand access to vital government information and help hold public officials accountable. Our vision is to use technology to enable more complete, equitable and effective democratic participation.