What are Real-Time Systems?

Explaining research ideas clearly was perhaps my most difficult undertaking while pursuing a PhD. More often than not, my explanations were frankly incomprehenible and that is entirely my own fault. But sometimes my audience had inaccurate ideas and misplaced expectations about the subject matter. I found real-time systems to be one of those misunderstood topics. In the remaining of this post, I will attempt to clear up some of the confusion and explain what real-time systems are.

NOTE: This is the first post of a two-part series about real-time systems. The following post will mostly talk about hard real-time analysis.

What are Real-Time Systems?

Every applications imposes a set of constrains that computer engineers are constantly battling to meet. Of course, a constraint is (almost!) always that the program should produce the correct result. But it is also common to want a system to perform a task as fast as possible, these are run-time constraints, or to ensure that a device’s battery life lasts longer, as in a constraint on power consumption. A real-time system is simply a system that has timing constraints in addition to correctness and (potentially) run-time or power requirements. In a real-time system, the time when a result is produced (i.e. before a deadline) is as important as ensuring that the result is correct.

Lets consider a couple of use-cases. Classic examples of real-time applications are safety-critical systems such as cardiac pacemakers, engine control systems and avionics. Clearly, these systems must always operate correctly and deliver results on time (i.e. before a deadline) otherwise patients may die, your car engine might explote or a plane could crash. Interactive systems are also good examples of real-time systems, although not quite as critical, such as mobile phones or desktop PCs. For example, we like responsive PCs that keep the mouse pointer moving smoothly, windows switching quickly and videos or music playing clearly. These are all real-time tasks and the deadline usually is “before the user gets annoyed”. Unresponsive PCs make unhappy and frustrated users, and I have been known to angrily strike the keyboard once or twice when this happens!

NOTE: A coincidentally pedantic note on the use of the words run-time and runtime. Lots of people (myself included!) often use these words interchangeable, but I was very fortunate to have been corrected by someone peer reviewing my journal submission. So nowadays I use the word run-time to mean the length of time that something takes to run. I also refer to a program’s status of being running as run-time, e.g. as in variables are resolved at run-time. In contrast, I use runtime to mean the environment in which a program is running, e.g. as in a runtime library. These definitions are by no means perfect, but I strive to be consistent!

Real-Time Systems are NOT…

Fast, power-efficient, secure, small, etc. Or rather, real-time systems are not necessarily fast, power-efficient, secure, small, etc. These are individual constraints that a system could be required to meet, but neither of them implies the other. Many people are particularly puzzled to hear that a fast system, i.e. the run-time of a task is very short, does not mean that it is real-time. In fact, real-time are systems often slower than their non-real-time counterparts due to the overheads inherent to concurrent operation, such as context switching, synchronization, etc.

Garbage collection is an example that neatly illustrates this real-time and run-time performance tradeoff. The collector is a component of the runtime environment responsible for automatically reclaiming unused memory and repurposing it to fulfil future dynamic memory allocation requests. Its work is critical to both the real-time and run-time performance of many programming languages like Python, JavaScript, Java, etc.

Stop-the-world garbage collectors.

Stop-the-world garbage collectors. Each bar represents the execution in a single core. White regions are the execution of the user's program threads and shaded regions correspond to the collector's execution.

The simplest and fastest collectors are stop-the-world: the user’s program is stopped while the collector does its job and then the program is restarted as shown above. The run-time performance of these collectors is great because there is little interaction with the user’s program. In fact, it is straight-forward to run these collectors in a single-core or in parallel across multiple cores to reclaim memory faster. But the real-time performance of the system is atrocious either way because the user’s program is stopped (i.e. paused, does not run) while the collector runs, so any task that the program is supposed to do is delayed. Concurrent collectors mitigate this real-time problem by either running collection operations interleaved or in parallel with the program as shown below.

Concurrent garbage collectors.

Concurrent garbage collectors.

However, concurrent collectors require a lot of synchronization to make sure the system works properly. Sadly, the extra synchronization often incurs high run-time overheads although this is unavoidable for the system to operate in real-time.

You can clearly see this tradeoff for yourself in this presentation about the JavaScript V8 Engine which is used in Chrome. Ideally, the web browser runs at 60 fps to avoid jank, so each frame must be generated in about 16 ms. But using a stop-the-world garbage collector caused earlier Chrome versions to often miss these deadlines. More recent Chrome versions use a mostly concurrent collector. The video below compares the very visible effects of both collectors.

Types of Real-Time Systems

You might have noticed that not all real-time systems are the same. It is important to consider the kind and strictness of timing requirements when engineering such systems. So it is common to clasify real-time systems as:

Soft Real-Time: There are many systems, like phones, tablets, etc, where it is desirable to meet deadlines, but there is no harm if deadlines are missed occasionally. For example, we said that ideally Chrome runs at 60 fps although nobody will die if sometimes it does 55, 58, 56 fps! In other words, it is desirable for the real-time system to meet all its deadlines, but in practice, its acceptable if some deadlines are missed because the worst consequece is perhaps an annoyed user.
Hard Real-Time: The system must always meet deadlines if hard real-time operation is required. Failure to meet a deadline is equivalent to an incorrect calculation and can result in catastrophic consequences such as the death of a patient or an airplane crashing. Cardiac pacemakers and avionics are good examples of hard real-time systems.

Naturally, the approach for engineering and testing soft real-time systems is a lot more informal. The system is developed and empirical experiments are run in various settings to ensure that most deadlines are met i.e. that the frequency of missed deadlines is below a threshold. I have even heard of engineers testing soft real-time systems with slow processors and then switching to faster ones (or cranking up the clock!) just to make extra sure that the final product does meet most deadlines. In contrast, there are formal static analysis methods to evaluate hard real-time systems and guarantee that no deadlines are missed under any circumstance. I will describe one such method in the next blog post.

Conclusion

In conclusion, real-time systems are simply systems that have timing contraints. That is, the device must perform a task before a deadline. Broadly speaking, real-time devices are either soft or hard real-time depending on how strict we need to be about meeting deadlines. In the following post, we will focus almost exclusively on how to statically analyze hard real-time programs to guarantee that timing requirements are always met.