
Suppose there are two threads accessing a global variable var that is protected by a lock:

#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int var;

void *thread1_func(void *ptr) {
    pthread_mutex_lock(&lock);
    var = var + 1;
    printf("Thread 1: %d\n", var);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main() {
    pthread_t thread1;

    var = 0;
    pthread_create(&thread1, NULL, thread1_func, NULL);

    pthread_mutex_lock(&lock);
    printf("Thread 0: %d\n", var);
    pthread_mutex_unlock(&lock);

    pthread_join(thread1, NULL);

    return 0;
}

Assume the instructions are interleaved as follows:

Thread 0:                                                  Thread 1:
1. var = 0;
2. pthread_create(&thread1, NULL, thread1_func, NULL);
                                                           3. pthread_mutex_lock(&lock);
                                                           4. var = var + 1;
                                                           5. printf("Thread 1: %d\n", var);
                                                           6. pthread_mutex_unlock(&lock);
7. pthread_mutex_lock(&lock);
8. printf("Thread 0: %d\n", var); // Should print 1!
9. pthread_mutex_unlock(&lock);

The C standard (prior to C11) avoids formally defining a concurrency model, so it appears that a standards-compliant C compiler is allowed to cache the value of var in a register across the call to pthread_mutex_lock(), causing "0" to be printed at step 8. In practice, however, I would assume implementations of the pthreads API do something to prevent this.
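
Concretely, the transformation I am worried about would look roughly like this (a hand-written sketch of what an optimising compiler might do, not actual compiler output; cached_var is just an illustrative name for the register copy):

int main() {
    pthread_t thread1;
    int cached_var;                 /* illustrative: var kept in a register */

    var = 0;
    cached_var = var;               /* the only load of var that thread 0 performs */
    pthread_create(&thread1, NULL, thread1_func, NULL);

    pthread_mutex_lock(&lock);
    printf("Thread 0: %d\n", cached_var);   /* prints the stale 0 at step 8 */
    pthread_mutex_unlock(&lock);

    pthread_join(thread1, NULL);

    return 0;
}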

What do pthreads implementations do to prevent the compiler from caching the value of var in a register across a call to pthread_mutex_lock()? I'm interested in the specific annotation in the pthreads source code that conveys the necessary information to the compiler. Since there are many implementations of pthreads, feel free to restrict yourself to one in your answer. For example, glibc compiled for aarch64.

For this question, assume pthread_mutex_lock() is defined in the same file and the compiler is able to prove that it does not access var.

Jorge

2 Answers

1

For this question, assume pthread_mutex_lock() is defined in the same file and the compiler is able to prove that it does not access var.

This is the key point: most extant implementations of pthreads avoid this problem by using external definitions of synchronising functions like pthread_mutex_lock() that are not available to the compiler, preventing it from proving anything about their access to global memory.
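
As a rough illustration of that point (a sketch, not taken from any particular implementation; g and show_g are hypothetical names):

#include <pthread.h>
#include <stdio.h>

int g;   /* hypothetical global with external linkage */

void show_g(pthread_mutex_t *mtx) {
    /* Only the declaration of pthread_mutex_lock() (from <pthread.h>) is
       visible here, so the compiler must assume the call may read or write g. */
    pthread_mutex_lock(mtx);       /* opaque external call */
    printf("%d\n", g);             /* g has to be (re)loaded from memory here */
    pthread_mutex_unlock(mtx);
}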

If you want to implement pthreads using a definition of pthread_mutex_lock() that is available to the compiler, you will need to create or use some compiler-specific annotation that conveys the necessary information to the compiler.
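
One well-known annotation of this kind is the GCC/Clang inline-asm "memory" clobber, which acts as a compiler-level barrier. The sketch below is a toy illustration of the idea, not how any real pthreads implementation is written; toy_mutex_t and toy_mutex_lock are hypothetical names:

typedef struct { volatile int locked; } toy_mutex_t;

void toy_mutex_lock(toy_mutex_t *m) {
    while (__sync_lock_test_and_set(&m->locked, 1))   /* GCC atomic builtin with acquire semantics */
        ;                                             /* spin until the previous value was 0 */
    __asm__ __volatile__("" ::: "memory");            /* "memory" clobber: the compiler must assume
                                                         memory may have changed, so globals cached
                                                         in registers are reloaded after this point */
}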

caf
  • What if you use an external definition of `pthread_mutex_lock()` but enable link time optimizations? The linker would see the whole of the program, so wouldn't it be able to prove that it does not access `var`? – Jorge Jun 17 '21 at 08:45
  • @Jorge: With LTO things are a little different. Firstly the pthreads library is in most cases dynamically linked, and even when statically linked the pthreads library wouldn't itself be compiled with LTO, so only its imports would be visible to the linker. Most importantly though, common use of LTO post-dates pthreads, so from a practical perspective if a compiler wants to continue to be part of a working pthreads implementation it has to implement LTO in a manner that allows this to continue to work, which likely means conservative escape analysis for globals. – caf Jun 18 '21 at 00:43
  • Could you elaborate on what you mean by "only its imports would be visible to the linker"? I'm not sure what "imports" means in this context. Also, why wouldn't the pthreads library be compiled with LTO? – Jorge Jun 18 '21 at 03:33
  • Going back to my previous question: Suppose `var` is defined as `static`. In this case, external definitions cannot access `var`, so wouldn't the compiler infer that `pthread_mutex_lock()` does not access `var`? – Jorge Jun 18 '21 at 05:29
  • @Jorge: By imports I meant undefined symbols that will be resolved at link time. Although of course that doesn't mean the linked code couldn't use `dlsym()` to retrieve other symbol values at runtime. For a symbol with internal linkage (`static`), external modules can access it if its address escapes. This is what I meant by conservative escape analysis - assuming these addresses *do* escape rather than trying to prove they don't. The upshot is that if LTO had been in common use when pthreads was proposed, it would likely have required explicit compiler support to work as it does. – caf Jun 18 '21 at 12:27
  • Thanks for your answer. It seems like defining `var` as `static` isn't sufficient because its address could escape. Therefore, in practice the compiler doesn't even try to prove that the address of `var` doesn't escape. It just assumes it escapes, preventing the compiler from caching the value of `var` in a register. – Jorge Jun 18 '21 at 16:33
  • From your answer I'm gathering that there are no annotations in the pthreads source code for preventing the compiler from caching variables in registers across calls to `pthread_mutex_lock()`. It seems like there are no requirements in the C standard that prevent this either. However, in practice compiler writers validate their changes by running test suites that use pthreads. If a new optimization breaks pthreads, they don't push the change to upstream, even if the optimization doesn't violate requirements in the C standard. Is my understanding correct? – Jorge Jun 18 '21 at 16:49
  • @Jorge: It's hard to make any definitive statement like that because there isn't a single instance of "the pthreads source code" - it's a standard implemented by many different implementations, and perhaps some of those *do* use special compiler annotations (the ones I am aware of do not, though). If a C compiler aims to be part of a complete POSIX implementation, then the developers are going to ensure that the complete system doesn't violate the requirements in POSIX standard as well as in the C standard, yes. – caf Jun 21 '21 at 02:37
  • Would it be safe to say that compiler writers adhere to two standards, the C standard and the POSIX standard, where the POSIX standard is a superset of the C standard? If so, does this seem weird to you? I'm asking because I thought compiler writers only had to worry about the C standard. – Jorge Jun 21 '21 at 22:37
  • @Jorge: Compiler writers that have the goal of their compiler being part of a complete POSIX system do, sure (the C standard is in fact included within the POSIX standard by reference, so it is formally a superset). Compiler writers targeting non-POSIX systems, like QNX or Windows likely don't care, though. It doesn't seem weird to me: compiler writers concern themselves with whatever their users/customers concern themselves with. – caf Jun 22 '21 at 00:12
  • I find it weird that some of the compiler requirements are defined by C libraries (like pthreads). To me, it would seem more natural for compiler requirements to stand by themselves, without dependencies on external components. In any case, is the requirement that the compiler must not cache a variable in a register across a call to `pthread_mutex_lock()` explicitly written in the pthreads standard? If it is, could you point me to the relevant document? – Jorge Jun 22 '21 at 03:36
  • @Jorge: The requirement is that `pthread_mutex_lock()` is listed in the set of functions that "synchronize memory with respect to other threads" (https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12). The requirement is on the complete implementation, of which the C compiler is just one part. You could of course write a conforming C compiler that did cache variables across such calls, but you couldn't combine it with existing pthreads implementations to create a POSIX-conforming resulting system. – caf Jun 22 '21 at 07:53
  • @Jorge: Note also that even if you did write a C compiler that introduced an extension to mark a function as "synchronizes memory in the POSIX meaning of the term" then wrote a pthreads implementation for some kernel that used this extension, you'd have to propagate that function attribute recursively to all callers of such functions and of course make it available to the LTO. – caf Jun 22 '21 at 07:57
  • I dug into the glibc implementation of pthreads. It seems like glibc relies on built-in functions for `pthread_mutex_lock()` (see my answer below). Maybe these built-in functions guarantee that variables are not cached in registers across calls to `pthread_mutex_lock()`? – Jorge Jun 22 '21 at 23:25
  • @Jorge: There is more to it than that, because if the compiler only sees the declaration of `pthread_mutex_lock()` and not the definition which contains the call to the builtins, it still has to do the right thing. This is also the case if it only sees the declaration of a user-defined function where the implementation may call `pthread_mutex_lock()`. – caf Jun 24 '21 at 04:53
0

In the glibc version of pthreads, pthread_mutex_lock calls __lll_lock.

If no other threads are holding the lock, __lll_lock calls atomic_compare_and_exchange_bool_acq, which calls __arch_compare_and_exchange_bool_32_int, which calls __atomic_compare_exchange_n (this flow is for aarch64).

If the lock is held by another thread, __lll_lock also calls __lll_lock_wait_private, which calls atomic_exchange_acquire, which calls __atomic_exchange_n.

__atomic_compare_exchange_n and __atomic_exchange_n are built-in functions provided by GCC. Depending on the memory order passed to them, they act as compiler-level barriers: the compiler may not move memory accesses across them, so values cached in registers have to be written back to memory before the operation and/or reloaded from memory after it.

To answer the question, there is no compiler annotation in the glibc version of pthreads per se, but the implementation relies on GCC built-in functions that guarantee that variables are not cached in registers across calls to pthread_mutex_lock.

Notice that __lll_lock passes __ATOMIC_ACQUIRE to the GCC built-in functions; together with the release-semantics store performed when the lock is released, this creates a happens-before relationship from the releasing store to the acquiring load. These barriers also constrain the compiler: it cannot keep register copies of shared variables live across them, so values are flushed to memory before a release operation and reloaded from memory after an acquire operation.
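
A minimal sketch of that pattern written directly against the GCC built-ins (a simplification, not glibc's actual __lll_lock code; toy_lll_lock is a hypothetical name and the slow path spins instead of doing a futex wait):

int toy_lll_lock(int *futex_word) {
    int expected = 0;
    /* Fast path: try to change *futex_word from 0 (unlocked) to 1 (locked).
       __ATOMIC_ACQUIRE prevents later memory accesses from being moved before
       this operation, so a stale register copy of a shared variable such as
       var cannot be reused once the lock is taken. */
    if (__atomic_compare_exchange_n(futex_word, &expected, 1, 0,
                                    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        return 0;
    /* Slow path: glibc would call __lll_lock_wait_private(), which uses
       __atomic_exchange_n() and then waits on the futex; here we just spin. */
    while (__atomic_exchange_n(futex_word, 2, __ATOMIC_ACQUIRE) != 0)
        ;
    return 0;
}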

Jorge