Introduction

Concurrent programming allows a C++ program to execute multiple tasks simultaneously (or at least appear to). In practice, this means breaking a program into threads that run independently. Threads are units of execution that can run concurrently and potentially in parallel on multi-core systems. C++17 builds on the threading facilities introduced in C++11, providing standard tools to create threads, synchronise access to shared data, and avoid race conditions. In this post, we’ll explore key multithreading concepts in modern C++: starting threads with std::thread, using mutual exclusion with std::mutex, coordinating threads with condition variables (std::condition_variable), and leveraging atomic operations with std::atomic. Along the way, we’ll look at code examples and discuss best practices to write safe and efficient multi-threaded C++17 code.

What is Concurrency? In simple terms, concurrency is the ability for different parts of a program to execute out-of-order or in partial overlap, without affecting the final outcome. This is distinct from parallelism – truly executing at the same time – but on modern multi-core processors, threads may indeed run in parallel. The C++ standard library’s concurrency support (largely introduced in C++11) gives developers a portable way to write multi-threaded code, moving beyond platform-specific APIs. Let’s start by looking at how to spawn new threads in C++.

Starting Threads with std::thread

C++ represents and manages threads of execution with the std::thread class (defined in the <thread> header). A std::thread object starts running as soon as it’s created (std::thread - cppreference.com), invoking the function or callable that you provide. For example, we can launch a simple thread that prints a message:

#include <iostream>
#include <string>
#include <thread>

void printMessage(const std::string& msg, int id) {
    std::cout << "Thread " << id << ": " << msg << "\n";
}

int main() {
    std::thread t1(printMessage, "Hello from thread", 1);
    std::thread t2(printMessage, "Hello from thread", 2);
    std::cout << "Threads launched\n";
    // Wait for the threads to finish execution
    t1.join();
    t2.join();
}

In this example, the main program spawns two threads (t1 and t2), each executing the printMessage function with different arguments. The threads begin executing immediately upon construction (subject to OS scheduling) (std::thread - cppreference.com). The main thread then continues to its next statement (printing “Threads launched”) without waiting for the new threads to complete. We call join() on each thread to block the main thread until those threads finish. The output might interleave the thread messages with the main thread’s output, e.g.:

Threads launched  
Thread 2: Hello from thread  
Thread 1: Hello from thread  

The exact order may vary on each run, which is a hallmark of concurrent execution.

Joining and Detaching Threads: It is crucial to either join or detach every thread that you spawn. Calling join() waits for the thread to finish, as shown above. Alternatively, you can call detach() to allow the thread to run independently (in the background, with no way to synchronise with it thereafter). If a std::thread object is destroyed while still joinable (i.e., it represents an active thread that hasn’t been joined or detached), the C++ runtime will call std::terminate() and likely abort the program (multithreading - C++ - std::thread crashes upon execution - Stack Overflow). This design prevents difficult bugs that could arise from threads continuing after their std::thread object goes out of scope. In summary, always ensure each thread is either joined (if you need to wait for it) or detached (if it should run on its own) before the thread object is destroyed.
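
To make the join-or-detach rule harder to forget, a small RAII wrapper can join automatically on scope exit. Below is a minimal sketch (the ThreadGuard name is our own illustration, not a standard type; C++20’s std::jthread provides this behaviour out of the box):

#include <thread>

// Minimal RAII joiner: guarantees the wrapped thread is joined before
// destruction, so a joinable std::thread never triggers std::terminate().
class ThreadGuard {
    std::thread t_;
public:
    explicit ThreadGuard(std::thread t) : t_(std::move(t)) {}
    ~ThreadGuard() { if (t_.joinable()) t_.join(); }
    ThreadGuard(const ThreadGuard&) = delete;
    ThreadGuard& operator=(const ThreadGuard&) = delete;
};

int main() {
    ThreadGuard guard{std::thread([]{ /* background work */ })};
    // The thread is joined automatically when guard goes out of scope,
    // even if the code that follows throws an exception.
}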

Passing Arguments: As seen above, std::thread can take a function pointer or any callable (such as a lambda) plus arguments for that function. The arguments are copied (or moved) into the new thread’s storage; to pass an argument by reference, wrap it in std::ref (or std::cref). Be mindful when passing pointers or references to avoid accessing data that might go out of scope. For example, passing a pointer to a local variable into a detached thread leads to undefined behaviour if the local variable is destroyed while the thread is still running.
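
As a concrete illustration, the sketch below passes a variable by reference with std::ref (the addTo function is just for demonstration). Without std::ref the code would not even compile, because the thread’s decayed copy of the argument cannot bind to int&:

#include <functional>
#include <iostream>
#include <thread>

void addTo(int& total, int amount) {
    total += amount;  // mutates the caller's variable
}

int main() {
    int total = 0;
    // std::ref wraps total in a reference_wrapper; std::thread would
    // otherwise copy the argument, and a copy cannot bind to int&.
    std::thread t(addTo, std::ref(total), 42);
    t.join();  // join before reading total to avoid a data race
    std::cout << "total = " << total << "\n";  // prints: total = 42
}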

Race Conditions and Mutual Exclusion (Mutexes)

When multiple threads access the same data without proper coordination, you may hit a race condition – a situation where the program’s outcome depends on the unpredictable timing of threads. Let’s illustrate a simple race condition:

#include <thread>
#include <vector>
#include <iostream>

int counter = 0;  // shared data

void increment() {
    for (int i = 0; i < 100000; ++i) {
        ++counter;  // increment shared counter
    }
}

int main() {
    std::vector<std::thread> threads;
    // Launch multiple threads that increment the counter
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back(increment);
    }
    // Wait for all threads to finish
    for (auto& t : threads) {
        t.join();
    }
    std::cout << "Final counter value: " << counter << "\n";
}

We might expect the final counter to be 400000 (since four threads increment it 100000 times each), but in practice, the result is often lower and non-deterministic. This is a classic race condition: the threads interfere with each other when updating counter. The ++counter operation is not atomic – internally it involves a read, an addition, and a write, which can be interleaved between threads. In a run of this program, two threads might read the same old value of counter and then both write back updates, losing one increment. A data race like this leads to undefined behaviour in C++ (Multi-threaded executions and data races (since C++11) - cppreference.com), so we need to prevent it.

The typical solution is to use a mutex to protect the shared resource. A mutex (mutual exclusion) is a synchronisation primitive that allows only one thread to access a section of code at a time (std::mutex - cppreference.com). Think of it as a lock: a thread must acquire the mutex before entering a critical section (the code that accesses shared data), and release it when done, so that other threads can proceed. In C++, we use std::mutex from <mutex>:

#include <mutex>
std::mutex counterMutex;

void incrementSafe() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(counterMutex);
        ++counter;
        // mutex is automatically released at end of scope (lock_guard destructor)
    }
}

Here we introduced a global counterMutex. In incrementSafe(), each iteration locks the mutex before incrementing and unlocks upon leaving the scope (thanks to std::lock_guard). std::lock_guard is a convenient RAII wrapper that locks a given mutex upon construction and unlocks it when destroyed (when leaving scope) (std::lock_guard - cppreference.com). By using lock_guard, we ensure the mutex is released even if an exception occurs within the block, making our code exception-safe. Only one thread can hold the mutex at a time, so the increments on counter are serialised – one thread’s loop iteration will exclude others until it finishes incrementing. This guarantees correct results, at the cost of some performance due to threads waiting their turn.

Applying the fix, we launch the threads with incrementSafe instead of increment:

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back(incrementSafe);
    }
    for (auto& t : threads) t.join();
    std::cout << "Final counter value: " << counter << "\n";
}

Now the final counter value will reliably be 400000. We have removed the race condition by preventing concurrent access to the shared variable. Note that locking has a runtime cost, and if the critical section is very small (like just incrementing a variable), a mutex might be overkill – we’ll discuss alternatives like atomic variables shortly. However, for more complex shared structures, mutexes are the go-to tool.

Mutex Basics: A std::mutex provides exclusive ownership: when one thread locks it, other threads attempting to lock will block until it is unlocked (std::mutex - cppreference.com). If a thread tries to lock the same mutex twice (without unlocking first), the behaviour is undefined – in practice it usually deadlocks itself (standard std::mutex is non-recursive; std::recursive_mutex exists for the rare cases that need re-entrant locking). Always design your locking carefully to avoid deadlocks, where two or more threads wait indefinitely for each other to release locks. A simple deadlock scenario: thread A locks mutex X then mutex Y, while thread B locks mutex Y then mutex X – each will wait forever for the other. In general, to prevent deadlock, ensure all threads lock multiple mutexes in a consistent global order, or use higher-level concurrency primitives when possible.

Deadlocks can be subtle in larger programs. As a rule, keep the duration for which a mutex is locked as short as possible (hold a lock only for the necessary operations, then release). This maximises concurrency and minimises the chance of cyclic lock dependencies. If you need to lock multiple mutexes, C++17 offers std::scoped_lock to acquire several at once (avoiding interleaved acquisition), as the sketch below shows, or you can use std::lock() to lock a group of mutexes safely.
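
Here is a minimal sketch of the multi-mutex case, assuming an illustrative Account struct. std::scoped_lock acquires both mutexes with a built-in deadlock-avoidance algorithm, so two threads transferring in opposite directions cannot deadlock:

#include <mutex>

struct Account {
    std::mutex m;
    int balance = 0;
};

void transfer(Account& from, Account& to, int amount) {
    // Locks both mutexes deadlock-free, regardless of the order in which
    // different callers name the two accounts.
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

int main() {
    Account a, b;
    a.balance = 100;
    transfer(a, b, 30);  // safe even against a concurrent transfer(b, a, ...)
}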

(Side note: “deadlock” formally means that a set of threads are all blocked, each waiting for a resource owned by another thread in the set, such that none can proceed. Avoiding deadlocks is a key part of multithreaded program design.)

Condition Variables for Thread Synchronisation

Mutexes provide mutual exclusion, but they don’t by themselves provide a mechanism for threads to wait for certain conditions or events. This is where condition variables come in. A std::condition_variable (from <condition_variable>) allows one or more threads to wait (sleep) until some condition is met and another thread signals them to wake up (std::condition_variable - cppreference.com). Condition variables are often used in conjunction with a mutex to coordinate producer-consumer scenarios or other situations where threads need to rendezvous.

An illustration of a producer-consumer setup: one thread (the producer) inserts data into a shared buffer, and another thread (the consumer) removes data from it. A condition variable allows the consumer thread to wait until new data is produced before consuming.

For example, imagine one thread is producing data and another is consuming it. The consumer should wait (not loop continuously) until data is available. We can achieve this with a condition variable:

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::queue<int> dataQueue;
std::mutex dataMutex;
std::condition_variable dataCond;
bool finished = false;  // flag to indicate producer is done

// Consumer thread function
void consumer() {
    std::unique_lock<std::mutex> lock(dataMutex);
    while (!finished) {
        // Wait until dataCond is notified and condition (queue not empty or finished) is true
        dataCond.wait(lock, []{ return !dataQueue.empty() || finished; });
        // Now we have the lock again and either queue is not empty or finished is true
        while (!dataQueue.empty()) {
            int item = dataQueue.front();
            dataQueue.pop();
            lock.unlock();  // unlock while processing item
            std::cout << "Consumed: " << item << "\n";
            lock.lock();
        }
    }
}
// Producer thread function
void producer() {
    for (int i = 1; i <= 5; ++i) {
        {
            std::lock_guard<std::mutex> lock(dataMutex);
            dataQueue.push(i);
            std::cout << "Produced: " << i << "\n";
        } // release lock before notifying
        dataCond.notify_one();  // wake up consumer
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    // Signal the consumer that production is finished
    {
        std::lock_guard<std::mutex> lock(dataMutex);
        finished = true;
    }
    dataCond.notify_one();
}

In this example, the consumer thread waits on dataCond for the condition “queue not empty or finished” to become true. The wait(lock, predicate) call atomically releases the mutex and suspends the thread until notify_one() is called on the condition variable and the predicate returns true. When another thread (the producer) pushes data and calls notify_one, the consumer wakes up, reacquires the lock, and checks the condition. We use a loop to handle spurious wake-ups – a condition variable may wake up without a notification, so the condition must be re-checked (std::condition_variable - cppreference.com). The predicate version of wait does this logic internally, looping until the condition is satisfied.
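
For completeness, a minimal driver for this example simply runs the two functions on separate threads (a sketch assuming the declarations above are in scope):

int main() {
    std::thread prod(producer);
    std::thread cons(consumer);
    prod.join();
    cons.join();  // consumer exits once finished is true and the queue is drained
}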

A few important points about condition variables in C++:

  • A condition variable must always be paired with a mutex that guards the shared state (in our case, dataMutex guards access to dataQueue and finished). The waiting thread should hold the mutex before waiting, and the mutex will be released while waiting and reacquired upon wake-up (std::condition_variable - cppreference.com).
  • std::unique_lock<std::mutex> is typically used for waiting, rather than a lock_guard. A unique_lock can be unlocked and locked, and wait requires a unique_lock so it can atomically unlock the mutex during wait (std::condition_variable - cppreference.com).
  • Use notify_one() to wake a single waiting thread, or notify_all() to wake all waiting threads (if, for example, multiple consumers might be waiting for work).
  • Always check the condition in a loop after waking, because notifications can be lost or wake-ups can happen without a notification (i.e., spurious wake-ups). The helper wait(lock, predicate) takes care of this by looping internally until the predicate is true.
  • Condition variables are lower-level primitives; higher-level message-passing or task frameworks can sometimes simplify thread coordination, but condition variables are versatile for many patterns like the producer-consumer.

In our producer-consumer code, the output might look like:

Produced: 1  
Produced: 2  
Consumed: 1  
Produced: 3  
Consumed: 2  
Produced: 4  
Consumed: 3  
Produced: 5  
Consumed: 4  
Consumed: 5  

The consumer prints items as they become available. We carefully unlocked the mutex while processing an item (around the std::cout) to allow the producer to run concurrently – holding locks only when necessary is a good practice.

Atomic Operations

Mutexes ensure exclusive access but come with overhead and complexity (risk of deadlocks, etc.). For simple shared variables like counters, atomic operations can be a lighter alternative. The <atomic> header (part of the standard library since C++11) defines atomic types such as std::atomic<int>. An atomic variable provides operations (like ++, store, and load) that are indivisible and thread-safe. In other words, if one thread modifies an atomic variable while another reads it, the outcome is well-defined and no data race occurs (Concurrency support library (since C++11) - cppreference.com) (c++ - What exactly is std::atomic? - Stack Overflow). This can offer a more efficient way to coordinate simple state.

Revisiting our earlier counter example, we can eliminate the mutex by using an atomic counter:

#include <atomic>
std::atomic<int> atomicCounter(0);

void incrementAtomic() {
    for (int i = 0; i < 100000; ++i) {
        atomicCounter++;
    }
}

Now multiple threads can increment atomicCounter concurrently without data races. Under the hood, these increments might use special CPU instructions to ensure atomicity. Each atomic operation is guaranteed to happen fully before any other atomic operation on the same object begins (Concurrency support library (since C++11) - cppreference.com). After running four threads with this incrementAtomic function, atomicCounter will reliably end up as 400000, just like the mutex-protected version, but likely with less overhead.

It’s important to note that while atomics avoid explicit locking, they are not a magic bullet for all concurrency problems. They work best for simple shared variables or flags. If you have multiple related variables that need to be kept in sync, a mutex (to make larger sections of code atomic) or higher-level constructs might be needed. Moreover, atomic operations obey C++’s memory ordering rules – by default they enforce a total order (sequential consistency) on operations, which is the safest but not always the fastest. C++ allows relaxed or acquire-release memory orders for advanced use cases, but those are beyond the scope of this post. For most purposes, std::atomic with default memory order provides a straightforward way to get thread-safe operations on a single variable without the fuss of locks.
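
As a quick tour, the sketch below exercises the most common std::atomic operations, all with the default sequentially consistent ordering:

#include <atomic>
#include <iostream>

int main() {
    std::atomic<int> value(0);
    value.store(5);                 // atomic write
    int seen = value.load();        // atomic read (seen == 5)
    value.fetch_add(2);             // atomic read-modify-write (value == 7)
    int expected = 7;
    // Replace 7 with 100 only if value still holds 7; on failure,
    // `expected` is updated to the value actually observed.
    bool swapped = value.compare_exchange_strong(expected, 100);
    std::cout << seen << " " << value.load() << " " << swapped << "\n";  // 5 100 1
}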

Best Practices and Common Pitfalls

Writing correct multithreaded code is challenging. Here are some best practices and potential pitfalls to keep in mind when using C++17 concurrency features:

  • Avoid Data Races: Any unsynchronised access to shared data (where at least one access is a write) is a bug. Data races lead to undefined behaviour (Multi-threaded executions and data races (since C++11) - cppreference.com). Use mutexes or atomics to protect shared variables. If a piece of data is only touched by one thread (or is immutable), then you don’t need synchronisation for that data.

  • Use RAII for Managing Locks: Prefer using std::lock_guard or std::unique_lock instead of manually calling mutex.lock() and mutex.unlock(). RAII wrappers ensure that locks are released when a scope is exited, even if exceptions are thrown, preventing deadlocks caused by forgotten unlocks. Similarly, consider wrapping thread management in RAII classes (or at least in try/catch) to ensure threads are joined even if exceptions occur.

  • Minimise Lock Granularity: Hold locks for the shortest duration necessary. Only lock around the critical section that truly needs exclusive access. This reduces contention and the chance of deadlocks. If possible, do work (especially expensive I/O or computation) outside the locked section.

  • Consistent Lock Ordering: When a situation requires multiple mutexes, always lock them in a consistent global order. Inconsistent ordering between threads can easily cause deadlock. If locking multiple mutexes at once, use C++17’s std::scoped_lock (or std::lock followed by lock_guards constructed with std::adopt_lock) to acquire them without risking interleaved locking.

  • Beware of Deadlocks: Deadlocks occur when threads cyclically wait on each other and none can proceed. Aside from lock ordering issues, deadlock can also happen if a thread tries to lock a mutex twice, or if there’s a circular wait involving condition variables or other resources. Avoid long-held locks and design clear ownership of resources.

  • Condition Variable Usage: Always use a loop when waiting on a condition variable (or use the predicate overload of wait). Ensure the condition and associated state are protected by the same mutex. Also, be careful to notify after updating the condition state (as seen in the producer-consumer example, we unlocked before notifying to avoid waking the consumer before the state was ready).

  • Thread Lifetime and Exceptions: As noted, make sure threads are joined or detached properly. A common pitfall is to forget to join a thread before a function returns (leading to std::terminate). If your program throws exceptions, consider what happens to threads – you might need to catch exceptions, join threads, then rethrow, or use std::jthread (introduced in C++20) which automatically joins on destruction. C++17 doesn’t have jthread, so manual care is needed.

  • Use Atomics for Simple Flags/Counters: If you just need to signal a boolean flag or count events, use std::atomic instead of a full mutex+condition pair. For example, an atomic bool can be used to publish a “stop” flag to worker threads (see the sketch after this list). This is simpler and often faster. But don’t mix atomic and non-atomic accesses to the same variable.

  • Tools and Debugging: Multithreading bugs can be non-deterministic and hard to reproduce. Use tools like ThreadSanitizer (available in many compilers) to catch data races. When debugging, try to simplify and enforce ordering (e.g., with logging or sleeps) to reproduce issues, but be aware that adding debug output can sometimes “heal” a race due to timing changes.
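
As promised above, here is a minimal sketch of an atomic “stop” flag (the stopRequested name and the timings are illustrative):

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<bool> stopRequested(false);

void worker() {
    // Poll the flag between units of work; no mutex is needed because
    // the flag is a single atomic variable.
    while (!stopRequested.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));  // stand-in for real work
    }
    std::cout << "Worker stopping\n";
}

int main() {
    std::thread t(worker);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    stopRequested.store(true);  // publish the stop request to the worker
    t.join();
}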

In conclusion, C++17 provides robust low-level constructs for concurrency that make cross-platform multithreading achievable in standard C++. We have std::thread for creating threads, mutexes and condition variables for coordinating access to shared resources, and atomic types for lock-free operations on single variables. By understanding these tools and following best practices, you can harness the power of multi-core systems, making your C++ programs faster and more responsive. Just remember that with great power comes great responsibility – always consider the complexity that threads introduce, and strive to write clear, well-synchronised code. Happy threading!