Automatically Generating AP Music Theory Chorales

Western music theory comes with a lot of rules: When writing counterpoint, no parallel fifths or octaves are allowed. The soprano may not make a leap in similar motion with the bass into a perfect fifth or octave. In a second inversion chord, the root (chordal fifth) must be doubled. For a beginner, all of these rules feel arbitrary and confusing!

In my AP music theory class, I decided to try writing a computer program that could automatically solve these constraints. In doing so, I ended up learning a lot about music, computing, and creativity.

The Search Problem

One of the tasks on the exam is to realize a four-part (SATB) chorale from a bass line and implied harmony (figured bass). The curriculum defines strict rules for correct voice leading. Here are the most important criteria to visualize the problem:

For any given harmonization, all notes spelled out by the chord must be represented.
- Exceptions are made for dominant seventh triads or the final cadence, where omission of the fifth is allowed.
Voices must always play chord tones implied by the harmony.
- The bass must use the given bass note.
Voices must maintain independent motion.
- Parallel fifths and octaves between any two voices are forbidden.
- Direct fifths and octaves between a leaping soprano and bass in similar motion are also forbidden.
- The leading tone, as well as the chordal seventh must resolve in stepwise motion and may never be doubled.
Cross relations (one voice singing a diatonic pitch followed by another voice singing a chromatic alteration of it) are not allowed.
The soprano must have a valid melodic contour.
- The soprano line may not use augmented or diminished melodic intervals.
- Leaps in the soprano voice larger than a third must be approached and exited with conjunct motion in the opposite direction.
- The soprano may not make a leap greater than a perfect fifth.
- Tendency tones in the soprano must resolve correctly in stepwise motion.
Voices must be correctly spaced.
- Each voice must stay within its own range. The exact ranges vary; I kept my soprano between C4 and G5, my alto between G3 and C5, and my tenor between C3 and G4.
- The soprano and alto may never be more than an octave apart.
- The alto and tenor may never be more than an octave apart.
- Voice crossings are not allowed.
- No voice may move beyond the pitch played by a neighboring voice on the previous note.

Writing these Bach-style chorales can be thought of as a constraint satisfaction problem: the computer needs to find solutions (voicings) that use the given chords to satisfy the constraints of eighteenth-century voice leading conventions.

Following the AP curriculum, I decided to start out by implementing only soprano-bass counterpoint, where I only had to generate one valid soprano line that fit with the given figured bass, not thinking about the other voices.

Even with a single voice, the solution space can be huge. The number of potential candidates grows exponentially as the chorale gets longer; for each chord, the soprano voice must choose one of the notes in the harmony to play. As a naïve estimate, if the soprano voice can choose between four different notes in a chord, a sixteen-chord chorale could have 4^16 = 4,294,967,296 possible combinations. That’s a lot! In order to generate chorales, I needed an optimised algorithm that could find a solution without having to brute-force enumerate every single combination of notes.

The Backtracking Algorithm

Humans are able to solve this kind of puzzle more efficiently because of a strategy called backtracking. When writing a chorale by hand, we only focus on one part at a time. We start from the beginning, choose the next note, and check if it is valid. If it is valid, we keep going. If not, we try a different choice. If we discover at any point that there are no valid harmonizations for the next note, we have to backtrack and find a different previous note that might lead to a solution. With this strategy, we can build our chorale incrementally; we don’t actually need to evaluate 4,294,967,296 solutions. Backtracking solves these problems much more efficiently.

This strategy works pretty intuitively: If the first two notes of a chorale already contain voice leading errors, the solution is invalid regardless of what we write for the rest of the notes. With a backtracking search, we can eliminate this entire set of incorrect solutions from the search tree, take a single step back, and try a different option. Because of the “compositional” nature of this problem, working step-by-step makes us much more likely to find the valid solution, as we avoid wasting time on obvious mistakes or completely restarting every time we run into a dead end.

With this in mind, I implemented my first attempt at generating chorales. I wrote the program fairly quickly in one afternoon, and it seemed fairly successful.

The basic soprano-bass counterpoint worked! But in order to implement the full four-voice chorale generator, I had to make several major revamps to my code design.

Modeling Tonality in Code

One of the major limitations in my original implementation was the fact that my code modeled pitches and intervals as pure numbers. While the output music does only care about the absolute pitches for the sound, my code needed to be able to analyze the function of each note in the diatonic scale in order to rigorously check for melodic contour and cross relations. For example, C♯ and D♭ are enharmonic, but the diatonic interval between B and C♯ is a Major Second, while the diatonic interval between B and D♭ is a Diminished Third. This means that a soprano voice would be allowed to make the leap between these two pitches if the key signature spelled the note as C♯, but not if it was written as a D♭, due to it being a diminished interval. A chorale generator needs to keep track of not only the pitch of the sound, but its function with respect to the diatonic scale.

In order to track the function of each note, I designed a python data class which tracks both the diatonic step size and chromatic size of each interval. Then, I implemented all the necessary arithmetic operations, strictly keeping track of all the interactions between the quality and size of the chord. In my system, a Minor Third (3 semitones, 2 scale steps) plus another Minor Third (3 semitones, 2 scale steps) would be equivalent to a Diminished Fifth (6 semitones, 4 scale steps), whereas a Major Third (4 semitones, 2 scale steps) plus a Minor Second (2 semitones, 1 scale step) would be equivalent to an Augmented Fourth (6 semitones, 3 scale steps).

After implementing and testing my new tonality library, I was ready to tackle the four-part chorale.

Generating Four-Part Harmony

Generating the parts for all four voices introduced extra challenges. Now that I was in charge of writing all four parts, I had to make sure that all the notes in the chord were represented, and that the voices were spaced correctly. There would be even more possible permutations for each chord, while adding extra limitations.

Again, my solution ended up being pretty similar to the strategies that we were taught in class. This time, instead of writing one voice at a time, we would write an entire vertical harmonization at once, then move on. If we found that a harmonization made it hard to voice the next chord, we would go back and tweak the previous voicing. In my code, I implemented this as a backtracking algorithm that would choose between entire vertical harmonization choices. This way, I didn’t have to think about the specific strategies used to plan out the triad, as long as the end result was allowed.

This rewrite took me around two days of coding. I already knew roughly how the algorithm would work from my first experience, but it took me a while to actually write the algorithms and encode all the voice leading rules.

Improving the Four-Part Harmony

My chorale generator worked, and the outputted chorales were technically valid, but sometimes the computer would make awkward decisions when choosing the voicings. For example, the inner voices in this chorale make awkward leaps of a fourth, making the counterpoint feel unrefined in this C Major chorale:

       S     A      T      B
 E5    E4     G3     C3   
 F5    A4     C4     C3   
 E5    E4     G3     C3   
 D5    G4     D4     B2   
 E5    E4     G3     C3   
 E5    C5     C4     A2   
 F5    F4     A3     D2   
 D5    G4     B3     G2   
 E5    E4     G3     C2   

I IV6/4 I V6 I vi ii V I

In order to improve the quality of the output, I had to encode a way to prefer more favorable solutions. I had to give my algorithm taste.

The Heuristic

When writing chorales, we not only want to obey the rules, but we want to make our voice leading as smooth as possible. We prefer to have the inner voices move as little as possible, while employing contrary and oblique motion for voice independence.

In order to encode these preferences into the generator, I used a heuristic algorithm to evaluate the quality of each candidate solution, and then sorted all the candidate solutions by how favorable they were. This way, the backtracking solver always prioritizes the most favorable solutions, and tries the next best option if it doesn’t work.

I wrote a custom evaluation function that assigns different costs and penalties to different choices: similar motion between the soprano and bass incurs a small penalty of 1 point, leaps in the inner voices incur a medium penalty of 3 points, etc. The heuristic weighs the tradeoffs between chordal doubling, smoothness, and contrary motion to try to choose a good solution. Here’s the same chorale, except with the voicings chosen based on my heuristic:

       S     A      T      B
 E5    G4     C4     C3   
 F5    A4     C4     C3   
 G5    C5     E4     C3   
 D5    D4     G3     B2   
 E5    G4     G3     C3   
 E5    A4     C4     A2   
 F5    A4     D4     D2   
 D5    B4     D4     G2   
 C5    G4     E4     C2

I IV6/4 I V6 I vi ii V I

Uniqueness

Even still, always choosing the best voicings may cause the generated chorales to be boring. Simulating human creativity requires some spontaneity and variation. In order to add some randomness into the chorales, I added a “temperature” setting to the generator, which tweaks each voicing evaluation by a random offset. If the computer gets lucky, occasionally a less favorable, but more quirky voicing is chosen. The higher the temperature, the more randomness is introduced into the system. Here is the same C Major chorale, realized four times with varying temperatures:

The random generator is also seeded, so that subsequent runs with the same parameters will yield the same results. More chorales are shown in the gallery at the bottom of this page.

Conclusion

When I started this project, I secretly believed that if I just encoded all the rules, my computer would start writing beautiful music. Western music theory gives a long checklist of criteria: no parallel fifths, resolve the leading tone, and so on. I imagined following the rules would be enough to make a decent chorale.

Once I actually implemented the algorithm, I learned that the rules don’t generate good music on their own. My first chorales were technically correct, but they sounded stiff and awkward. They passed all the requirements, but lacked flow. Even though music theory can explain how good music behaves, it isn’t enough to make good music on its own. I had to code my own heuristic, my own value judgements, in order to get the program to generate elegant voice leadings, and then add random noise to imitate creativity.

This project made me understand what makes music truly meaningful. A computer can easily identify parallel fifths and search enormous problem spaces, but has a hard time finding human meaning in their logic. My custom heuristic was only a crude approximation of what tasteful writing looks like. The computer only thinks about the next chord. It doesn’t plan an outline for the piece, or build any tension. It can be taught to resolve tendency tones, but it will never understand the true beauty of a Phrygian Half Cadence.

Perhaps that’s the moral of the story. The rules don’t make good music, they only serve to explain it. Only taste can tell you whether music sounds right.

In the era of computing, when execution is cheap and correctness is abundant, personal taste is the moat.