Structured Concurrency in Robot Control

Robots need a way to control their various mechanisms to perform tasks, either step-by-step, or in parallel. Most FTC teams do this in a similar way. You might occasionally see some teams playing around with custom task queues and callbacks, but they usually have the same idea at the core: they allow the robot to work on a specific task repeatedly in a non-blocking manner and return so that the caller can run some other logic in parallel while the mechanisms make progress. But our new library, Shuttle, is weird. If we wanted to execute two tasks in parallel, we would write this:

public void releaseAndCollapse() throws InterruptedException {
  try (var scope = HardwareTaskScope.open()) {
    scope.fork(arm::collapse);
    scope.fork(() -> slides.goToPosition(RETRACTED));

    scope.join();
  }
}

When experienced programmers first encounter this code, they might get frustrated with the InterruptedException this method throws. Why do we need to deal with these icky exceptions? Then, they might correctly point out that all of these methods must be blocking, and that means that we have to deal with multithreading. They might even find it pretentious to use such an advanced concept to accomplish a simple task.

In this post, we will explain why this system isn’t overcomplicated, annoying, or even unconventional, but instead a more intuitive way to structure control flow. In fact, we will even argue that other approaches should be entirely removed from robot code, and replaced with our conventions.

Synchronous and asynchronous code

If you already know what the difference between “asynchronous” and “synchronous” processes is, you can skip this section.

A synchronous method is a method that performs an action and blocks until it is complete. Most methods in Java are synchronous. Everything that happens on a computer, from number crunching to file creation, is naturally asynchronous, but layers of abstraction around the asynchronous processes create an easy-to-use, synchronous interface for programmers. For example, Math#sqrt does not return until the calculation is complete, but deep down within the CPU, the sqrt() calculation happens in parallel with many other calculations. We don’t see this because the CPU carefully keeps track of them so that the burden of tracking these processes never falls on the programmer. We only need to think about the top-level, synchronous Math.sqrt(double) method that Java gives us.

The FTC SDK method DcMotor#setTargetPosition is asynchronous because there is no guarantee that the motor will have reached its target position when the method returns. A call to setTargetPosition corresponds to a LynxSetMotorTargetPositionCommand sent from the FTC SDK to the REV hub’s Lynx controller; most FTC SDK methods have 1:1 correspondences with Lynx commands. These methods do not indicate when the task might finish, so we need to write extra code to track completion ourselves. The lack of synchronous methods makes it difficult to compose actions under this programming model, and it can feel like you’re fighting the language when doing so. Patterns like finite state machines and command-based architectures can help compose sequences of these asynchronous methods, but they lack the intuitive, sequential control flow that programmers are used to. Synchronous methods are easier to debug and understand.

The FTC SDK does not include such abstractions, but this doesn’t mean we can’t build our own. Similar to how the CPU hides out-of-order execution away from Java programmers, our own abstractions will hide asynchronous hardware commands from whoever is writing our robot’s mechanisms.

(The problem with) Finite-state machines

A finite state machine (FSM) is a model with a set of possible states and transitions that can be made from one state to another. Most FTC teams implement their higher-level mechanism logic using FSMs. For example, the movement of a robot’s mechanism could be modeled via the evolution of its current state, propagated by a periodically-polled update() method which determines whether a transition should be made.
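To make the polling model concrete, here is a minimal sketch of a single component written as an FSM. The class name SlidesFsm, its three states, and the simulated encoder count are all assumptions made for illustration; they do not come from the FTC SDK or from Shuttle.

```java
/** Hypothetical slides component modeled as a polled finite state machine. */
class SlidesFsm {
  enum State { RETRACTED, LIFTING, LIFTED }

  private static final int LIFTED_TICKS = 3; // made-up distance for the simulation

  private State state = State.RETRACTED;
  private int ticks = 0; // simulated encoder reading

  /** Requests a transition and returns immediately, without waiting for motion. */
  void lift() {
    if (state == State.RETRACTED) {
      state = State.LIFTING;
    }
  }

  /** Called once per control-loop iteration; decides whether to transition. */
  void update() {
    if (state == State.LIFTING) {
      ticks++; // stand-in for real motor progress
      if (ticks >= LIFTED_TICKS) {
        state = State.LIFTED;
      }
    }
  }

  State state() {
    return state;
  }
}
```

Nothing happens unless the caller keeps invoking update(), and the caller must also inspect state() to learn when the motion has finished.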

While FSMs are ubiquitous throughout hardware and embedded design, they become a burden to manage at a higher level of abstraction. Let’s consider a hypothetical robot outtake mechanism composed of two components: a pivoted arm mounted on the end of some linear slides. The robot would look something like this:

[Figure: Our hypothetical outtake mechanism has a set of linear slides and an arm.]

With an FSM, the linear slides could move between the RETRACTED and LIFTED states, and the arm could move between the COLLAPSED and EXTENDED states. The overall outtake mechanism is considered CLOSED when the linear slides are retracted and the arm is collapsed, and OPEN when the slides are lifted and the arm is extended. We can compose the outtake FSM using two smaller FSMs which represent each component, like this:

class Outtake {
  private final OuttakeArm arm;
  private final OuttakeSlides slides;

  public enum State { CLOSED, OPEN };
  private State targetState;
  private boolean arrived;

  // ...

  public void update() {
    slides.update();
    arm.update();

    switch (targetState) {
      case OPEN:
        if (slides.timedOut()) {
          slides.setTargetPosition(RETRACTED); // what if this also fails?
          break;
        }
        if (slides.isBusy())
          break;
        else if (slides.targetPosition() != LIFTED) {
          slides.setTargetPosition(LIFTED);
          break;
        }
        if (arm.timedOut()) {
          arm.setTargetPosition(COLLAPSED);
          break;
        }
        if (arm.isBusy())
          break;
        else if (arm.targetPosition() != EXTENDED) {
          arm.setTargetPosition(EXTENDED);
          break;
        }
        arrived = true;
        break;

      case CLOSED:
        if (arm.timedOut()) {
          arm.setTargetPosition(EXTENDED);
          break;
        }
        if (arm.isBusy())
          break;
        else if (arm.targetPosition() != COLLAPSED) {
          arm.setTargetPosition(COLLAPSED);
          break;
        }
        if (slides.timedOut()) {
          slides.setTargetPosition(LIFTED);
          break;
        }
        if (slides.isBusy())
          break;
        else if (slides.targetPosition() != RETRACTED) {
          slides.setTargetPosition(RETRACTED);
          break;
        }
        arrived = true;
        break;
    }
  }
}

While it is possible to program an entire robot using this pattern, it becomes increasingly tedious to compose FSMs and implement error handling. The code feels very contrived: we’re handling the mechanism logic out of order, and we’re funneling all the behavior of our code into one long, non-blocking method. When we write code this way, the language can’t help us. We end up forfeiting the natural benefits of blocking code, and are forced to instead rely on messy patterns in order to implement basic synchronization policy.

It’s not just FSMs.

Ultimately, this style of coding is difficult to reason about because we’re thinking about real-time hardware actions asynchronously. Many other asynchronous design patterns such as command-based architectures are susceptible to the same problem: we’re forcing our intuitive mental model of the robot into an unnatural framework. Although libraries and abstractions can make these patterns easier to use, they still lack the natural control flow that programmers can easily understand. The Java language is more useful to us when we can understand our mechanisms through a structured control flow of step-by-step actions, rather than contrived asynchronous logic.

A brief history of structured programming

In the ancient past, programming languages were fairly basic and closely mimicked the raw capabilities of computers. You could execute a series of instructions, or use goto to jump to another instruction location. In the 1960s and 1970s, computer scientists such as Edsger W. Dijkstra became obsessed with how programmers could write complex programs that were bigger than what they could contemplate all at once. Dijkstra was concerned about abstraction: the idea of treating an arbitrarily complex section of code as a “black box”. He wanted programmers to be able to use method calls without having to fully understand how the code behind them worked. For example, when analyzing this code:

System.out.println("hello, world!");

A programmer does not have to fully understand the code for System.out.println to know whether the control flow will work. System.out.println can be treated as a black box that will run some code and eventually return to the original program to continue running. However, Dijkstra saw that goto statements destroy this form of abstraction. Any code that uses a goto statement might send the program wherever it wants and never return. No matter how cautious programmers were with these techniques, a single uncaught mistake could compromise their entire system. Dijkstra therefore reasoned that any code that might use goto statements could never be trusted: in order to inspect it, a programmer would have to read through the entire system’s code to see whether a single code snippet would flow correctly. Dijkstra argued that goto statements broke encapsulation, and advocated for their removal from high-level programming.

In the end, Dijkstra won. The once-fashionable goto statements were eliminated from most programming languages and replaced with conditional blocks and for/while loops. Unrestricted control flow access was confined to the bottom level of computer programming, and structured programming reigned supreme.

We aren’t done yet

Similar control flow problems still affect the way we program our robots (and any concurrent code, for that matter). Every time our code uses an asynchronous device, we risk losing control of our program. A single logic error could trigger a chain reaction that causes the robot to do unexpected actions indefinitely, and we would never know by looking at our top-level code. No debugger trace or profiler could reveal the issue in our code, and we’d be forced again to debug the entire system’s code to find the problem. Just like goto statements, asynchronous code fundamentally breaks the control flow guarantees that enable safe abstraction.

The solution: make things synchronous

The obvious solution is to make each robot action block until it has finished, just like ordinary code. This means that every task can be traced back to the original method call, and the control flow of the code is entirely contained. Not only is the control flow safer, but we can also use debuggers to pinpoint the source of any potential issues.

But how exactly can we do that with our robot code? Our robot interacts directly with hardware, so at the lowest level, it must use an FSM. Just like how computer scientists restricted goto statements to the bottom level of their programming languages, we can restrict the usage of FSMs to the very bottom-level of our code. Only our most basic motor abstractions will use FSMs to interface with direct hardware commands, and they will be wrapped with a blocking API. This way, the entirety of our mechanism logic can be kept synchronous and easily debuggable.
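As a rough sketch of what “wrapped with a blocking API” could look like, the code below hides a polling loop behind a single blocking method. AsyncMotor, BlockingMotor, and FakeMotor are hypothetical names invented for this example; AsyncMotor only imitates the shape of an asynchronous motor interface, and this is not Shuttle’s actual implementation.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for an asynchronous motor interface. */
interface AsyncMotor {
  void setTargetPosition(int ticks); // returns immediately
  boolean isBusy();                  // true while the motor is still moving
}

/** Wraps the asynchronous interface in a blocking call. */
class BlockingMotor {
  private final AsyncMotor motor;
  private final long timeoutMillis;

  BlockingMotor(AsyncMotor motor, long timeoutMillis) {
    this.motor = motor;
    this.timeoutMillis = timeoutMillis;
  }

  /** Blocks until the motor stops, the timeout passes, or the thread is interrupted. */
  void goToPosition(int ticks) throws InterruptedException, TimeoutException {
    motor.setTargetPosition(ticks);
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (motor.isBusy()) {
      if (System.nanoTime() - deadline > 0) {
        throw new TimeoutException("motor did not reach " + ticks);
      }
      Thread.sleep(5); // the only polling loop; throws InterruptedException on cancel
    }
  }
}

/** Simulated motor that reports busy for a fixed number of polls. */
class FakeMotor implements AsyncMotor {
  private final int pollsPerMove;
  private int pollsRemaining;
  int target;

  FakeMotor(int pollsPerMove) {
    this.pollsPerMove = pollsPerMove;
  }

  @Override
  public void setTargetPosition(int ticks) {
    target = ticks;
    pollsRemaining = pollsPerMove;
  }

  @Override
  public boolean isBusy() {
    return pollsRemaining-- > 0;
  }
}
```

Everything above BlockingMotor can then call goToPosition like any other Java method, and the polling stays confined to this one class.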

Synchronous abstractions

Once we make a synchronous foundation for our motors, we can neatly compose our methods using a hierarchy of individual, blocking tasks. Our previously complex code can be greatly simplified to this:

class Outtake {
  private final OuttakeSlides slides;
  private final OuttakeArm arm;
  // ...

  public void goToDeposit() throws InterruptedException {
    slides.goToPosition(LIFTED);
    arm.extend();
  }

  public void releaseAndCollapse() throws InterruptedException {
    arm.collapse();
    slides.goToPosition(RETRACTED);
  }
}

The use of exceptions such as InterruptedException actually becomes very intuitive with this code! Because we are using imperative code to represent tasks, Java already has a built-in exception model that can help us. Tasks can be easily canceled when necessary by throwing an exception from the bottom-level motors and letting it cascade up the call stack until it gets caught and appropriately handled, giving us a neat abstraction hierarchy without any extra work.
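The cascade described above can be illustrated with plain threads. In this sketch, CancellationDemo and its methods are hypothetical, and Thread.sleep stands in for a blocking motor call. The point is that goToDeposit contains no cancellation logic of its own, yet interrupting the thread still unwinds it cleanly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class CancellationDemo {
  static final List<String> log = Collections.synchronizedList(new ArrayList<>());

  /** Bottom-level blocking action: stands in for waiting on a motor. */
  static void moveSlides() throws InterruptedException {
    log.add("slides started");
    Thread.sleep(10_000); // throws InterruptedException when the thread is interrupted
    log.add("slides finished");
  }

  /** Higher-level sequence with no cancellation logic of its own. */
  static void goToDeposit() throws InterruptedException {
    moveSlides();
    log.add("arm extended"); // skipped if moveSlides is cancelled
  }

  /** Runs the sequence on a worker thread, cancels it, and returns the log. */
  static List<String> runAndCancel() {
    log.clear();
    CountDownLatch started = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      started.countDown();
      try {
        goToDeposit();
      } catch (InterruptedException e) {
        log.add("cancelled"); // handled once, at the top of the call stack
      }
    });
    worker.start();
    try {
      started.await();    // make sure the worker is running before cancelling
      worker.interrupt(); // request cancellation
      worker.join();      // wait for the worker to unwind
    } catch (InterruptedException e) {
      throw new IllegalStateException("demo itself was interrupted", e);
    }
    return new ArrayList<>(log);
  }
}
```

Calling CancellationDemo.runAndCancel() returns the entries "slides started" and "cancelled": the exception thrown at the bottom level skips the remaining steps and is handled in one place.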

Why isn’t everybody using it?

The above example looks great, but it’s missing something critically important: concurrency. Blocking code is great at modeling sequences that run step-by-step, but it’s not great at modeling parallel sequences. This weakness is why blocking APIs are rare in robotics: concurrent actions with hardware are so common that they must be easy to create. This is fairly unusual: in most high-level programs, concurrency is rare enough that developers can afford to use low-level synchronization primitives, which are notoriously difficult to test and debug. Traditional concurrency involves spawning threads and managing them through a series of locks and queues, and requires a very different style of programming than regular, single-threaded execution. This style of programming is extremely powerful, but is also unbelievably difficult to write and maintain. Using raw threading would again destroy the safety and encapsulation that our synchronous code promised.

Structured concurrency

Structured concurrency, however, is a design pattern that allows parallel operations to run without the complex, messy, and unsafe synchronization patterns that usually come with them. Structured concurrency vastly simplifies concurrent programming by trading a small loss in customizability for a huge gain in ease of use. It lets you easily spawn parallel subtasks, on the condition that you must clean them up before you move on. Each subtask acts as a “black box” (which could itself nest even more subtasks), and the parent scope blocks until all of them have finished. The control flow of our mechanisms could be confined to a flow chart that looks like this:

[Figure: Structured concurrency creates strict entry and exit points for a scope based on the lifetime of its spawned subtasks.]
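The entry and exit rule can be sketched in a few lines of plain Java. SimpleScope below is a hypothetical, stripped-down scope written only to show the idea; it is not Shuttle’s HardwareTaskScope and it leaves out error handling entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical minimal scope: subtasks cannot outlive the try block that owns them. */
class SimpleScope implements AutoCloseable {
  private final List<Thread> subtasks = new ArrayList<>();

  /** Entry point for a subtask: starts it in the background and remembers it. */
  void fork(Runnable task) {
    Thread thread = new Thread(task);
    subtasks.add(thread);
    thread.start();
  }

  /** Blocks until every forked subtask has finished. */
  void join() throws InterruptedException {
    for (Thread thread : subtasks) {
      thread.join();
    }
  }

  /** Exit point: interrupts stragglers and waits until none are left running. */
  @Override
  public void close() {
    for (Thread thread : subtasks) {
      thread.interrupt();
    }
    boolean interrupted = false;
    for (Thread thread : subtasks) {
      while (thread.isAlive()) {
        try {
          thread.join();
        } catch (InterruptedException e) {
          interrupted = true; // keep waiting, restore the flag afterwards
        }
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt();
    }
  }
}

class SimpleScopeDemo {
  /** Forks two subtasks and returns how many had completed when the scope exited. */
  static int runTwoSubtasks() throws InterruptedException {
    AtomicInteger completed = new AtomicInteger();
    try (SimpleScope scope = new SimpleScope()) {
      scope.fork(completed::incrementAndGet);
      scope.fork(completed::incrementAndGet);
      scope.join();
    }
    return completed.get();
  }
}
```

Because close() runs whenever the try block exits, even on an exception, no subtask can keep running after control has left the scope.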

HardwareTaskScope

Shuttle’s HardwareTaskScope is a lightweight structured concurrency implementation based on Project Loom’s StructuredTaskScope, a resource with a confined scope and lifetime. It’s specifically designed for robot tasks which often require the successful completion of all subtasks. This pattern allows us to easily “fork” subtasks to start running in the background, and wait for all of them to “join”. For example, we could simultaneously move the slides and arm like this:

public void goToDeposit() throws InterruptedException {
  try (var scope = HardwareTaskScope.open()) {
    scope.fork(() -> slides.goToPosition(LIFTED));
    scope.fork(arm::extend);

    scope.join();
  }
}

This system always encapsulates subtasks, and even preserves the exception model we used earlier: an exception in any one of the subtasks will automatically shut down the other tasks, and then get appropriately rethrown in the main thread. If we want to handle another type of exception such as TimeoutException, we could declare it when opening our scope:
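To show the fail-fast behavior in isolation, here is a sketch built on java.util.concurrent. FailFastScope and ThrowingTask are hypothetical names, and wrapping the failure in IllegalStateException is a simplification chosen for this example; Shuttle’s real scope may rethrow exceptions differently.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical task type that may throw checked exceptions. */
interface ThrowingTask {
  void run() throws Exception;
}

class FailFastScope implements AutoCloseable {
  private final ExecutorService executor = Executors.newCachedThreadPool();
  private final ExecutorCompletionService<Void> completions =
      new ExecutorCompletionService<>(executor);
  private int pending = 0;

  void fork(ThrowingTask task) {
    completions.submit(() -> {
      task.run();
      return null;
    });
    pending++;
  }

  /** Waits for all subtasks; on the first failure, interrupts the rest and rethrows. */
  void join() throws InterruptedException {
    while (pending > 0) {
      pending--;
      try {
        completions.take().get(); // next subtask to finish, in completion order
      } catch (ExecutionException e) {
        executor.shutdownNow(); // interrupts the sibling subtasks
        throw new IllegalStateException("subtask failed", e.getCause());
      }
    }
  }

  /** Interrupts anything still running and waits for it to stop. */
  @Override
  public void close() {
    executor.shutdownNow();
    boolean interrupted = false;
    while (true) {
      try {
        if (executor.awaitTermination(1, TimeUnit.SECONDS)) break;
      } catch (InterruptedException e) {
        interrupted = true;
      }
    }
    if (interrupted) Thread.currentThread().interrupt();
  }
}

class FailFastDemo {
  /** Returns true if the slow sibling was interrupted after the other subtask failed. */
  static boolean siblingCancelledOnFailure() throws InterruptedException {
    AtomicBoolean siblingInterrupted = new AtomicBoolean(false);
    try (FailFastScope scope = new FailFastScope()) {
      scope.fork(() -> {
        try {
          Thread.sleep(10_000); // stands in for a long motor movement
        } catch (InterruptedException e) {
          siblingInterrupted.set(true);
          throw e;
        }
      });
      scope.fork(() -> { throw new TimeoutException("jammed"); });
      scope.join();
    } catch (IllegalStateException e) {
      // the subtask failure resurfaced in the parent thread
    }
    return siblingInterrupted.get();
  }
}
```

FailFastDemo.siblingCancelledOnFailure() returns true: the simulated jam in one subtask interrupts the other, and the scope has fully shut down before the parent handles the failure.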

private void goToPosition0(double position) throws InterruptedException, TimeoutException {
  try (var scope = HardwareTaskScope.open(TimeoutException.class)) {
    scope.fork(() -> left.goToPosition(position));
    scope.fork(() -> right.goToPosition(position));

    scope.join();
  }
}

This system drastically reduces the amount of work we need to do when error handling. We could use this to implement a complex, fault-tolerant linear slides controller with custom error handling like this:

class OuttakeSlides {
  private final OuttakeMotor left;
  private final OuttakeMotor right;

  // ...

  public void goToPosition(OuttakeSlidePosition position) throws InterruptedException {
    double oldPosition = left.getPositionMeters();
    try {
      ensureMotorsEngaged();
      goToPosition0(position.position);
    } catch (TimeoutException e) {
      handleJam(position.position, oldPosition);
    }
  }

  /** basic example: go back and try again */
  private void handleJam(double targetPosition, double oldPosition) throws InterruptedException {
    double currentPosition = left.getPositionMeters();
    try {
      double recoveryPosition = currentPosition < targetPosition ?
        currentPosition - 0.1 // retract a little
        : MAX_HEIGHT; // fully extend slides to unjam

      goToPosition0(recoveryPosition);
      goToPosition0(targetPosition); // try again
      // if successfully recovered, return without exception
    } catch (TimeoutException e) {
      try {
        goToPosition0(oldPosition);
        throw new RuntimeException("Slides failed: returned to previous position");
      } catch (TimeoutException e2) { // renamed: e is already in scope from the outer catch
        disengageMotors();
        throw new RuntimeException("Slides completely stuck: temporarily disengaged motors");
      }
    }
  }
}

The best part about this is that we’re writing ordinary Java code! Anyone who knows basic Java syntax could take a guess at what the code does: code gets run line-by-line, and can also be run in parallel when forked. This makes it much easier for people to contribute to code, and drastically reduces the amount of overhead that developers have to deal with.

It’s really intuitive

Switching to a synchronous code base might be difficult for robotics programmers who are used to writing asynchronous APIs. It involves learning a new way to program robots that might feel complex, just like how many programmers in the 1970s had to learn to restructure their code using if statements and while loops instead of goto statements. As Donald Knuth wrote in 1974:

Probably the worst mistake any one can make with respect to the subject of go to statements is to assume that “structured programming” is achieved by writing programs as we always have and then eliminating the go to’s. Most go to’s shouldn’t be there in the first place! What we really want is to conceive of our program in such a way that we rarely even think about go to statements, because the real need for them hardly ever arises. The language in which we express our ideas has a strong influence on our thought processes. Therefore, Dijkstra asks for more new language features – structures which encourage clear thinking – in order to avoid the go to’s temptations towards complications.

The same applies to programming robots. Although asynchronous logic is highly customizable, it tends to create unnecessary complications. Modeling robots intuitively as a synchronous system is what we should be doing in the first place! Ordinary Java code is blocking, and there shouldn’t be any reason to program robots any differently. Synchronous code can be just as powerful as asynchronous code, and help encourage clear, step-by-step thinking when programming the higher-level logic of our robots. The mental gymnastics that we had to do when converting our mental model of our robot into FSM code could be entirely eliminated if we programmed our robots with ordinary Java code.

HardwareTaskScope in practice

The only way to really know how well it works is to try it out yourself! Structured concurrency is a relatively new concept and, as far as we know at the time of writing, has not previously been applied to programming physical mechanisms or coordinating moving parts. As with anything new, it might need refinement and further modification before being adopted. We would greatly appreciate it if you tried using synchronous code on your robots, and provided feedback on your experience.

We programmed our entire robot with a blocking API at Kuriosity Robotics (FTC 12635) for the Center Stage (2023-2024) season. Although the mechanisms were more complicated than we would have hoped, the limitations that synchronous programming created definitely encouraged clear thinking and made our code more robust. The Java language frequently helped us out while coding by ensuring that our code appropriately handled exceptions, and when an exception occurred while driving the actual robot, a full stack trace showed exactly what had triggered it. Using structured concurrency has greatly improved code quality across the board.

The code for the Shuttle library is available on GitHub, and has also been packaged into a Gradle dependency that can be easily imported into the FTC SDK. Our Center Stage FTC robot code, which uses structured concurrency, is also available here.

One of the most cumbersome parts of programming an FTC robot is making it work during the Autonomous period. It often requires complex timing and parallelism, which can be very difficult to implement using non-blocking code. We were actually surprised at how simple the auto sequence could be. Structured concurrency made it very intuitive to organize tasks, enabling lightweight and easy customization, with no pesky while loops or wait conditions.

Conclusion

Asynchronous patterns such as finite state machines are powerful, but also incredibly dangerous and tricky to use. In practice, they can easily lead to uncontained control flow, and become difficult to compose at a higher level, making code needlessly complicated. Asynchronous code can’t use many of Java’s basic control flow tools, even ones as simple as sequential method calls and branching if statements. Often, it leads to tangled and unreadable logic, obscuring the original intent of your code.

Synchronous code, on the other hand, allows us to regain the many benefits of our programming language, such as stack traces, cancellations, and exceptions directly in Java, making our code much easier to understand and debug. However, we can only get these benefits by eliminating FSMs from all higher-level code and abstractions. This means creating new concurrency frameworks, just like how computer scientists had to invent new programming languages to get rid of the goto.

Structured concurrency provides a safe and convenient way to parallelize code that preserves all the benefits of synchronous code, and can improve overall productivity, reliability, and readability. Our experiments with Shuttle, which pioneers the use of structured concurrency in FTC, demonstrate that this is a practical and viable model for controlling robot mechanisms.