What is causing this error: SSE register return with SSE disabled?

Question

I'm new to kernel development, and I need to write a Linux kernel module that performs several matrix multiplications (I'm working on an x86_64 platform). I'm trying to use fixed-point values for these operations, but during compilation the compiler reports this error:

error: SSE register return with SSE disabled

I don't know that much about SSE or this issue in particular, but from what I've found, and according to most answers to questions about this problem, it is related to the use of floating-point (FP) arithmetic in kernel space, which is rarely a good idea (hence my use of fixed-point arithmetic). The error puzzles me because I'm pretty sure I'm not using any FP values or operations, yet it keeps popping up in places I don't expect. For instance, I have this block of code:

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

const int scale = 16;
#define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
#define FIXED_TO_DOUBLE(x) ((x) / (1 << scale))
#define MULT(x, y) ((((x) >> 8) * ((y) >> 8)) >> 0)
#define DIV(x, y) (((x) << 8) / (y) << 8)

#define OUTPUT_ROWS 6
#define OUTPUT_COLUMNS 2

struct matrix {
    int rows;
    int cols;
    double *data;
};

double outputlayer_weights[OUTPUT_ROWS * OUTPUT_COLUMNS] = 
{ 
         0.7977986,  -0.77172316,
        -0.43078753,  0.67738613,
        -1.04312621,  1.0552227 ,
        -0.32619684,  0.14119884,
        -0.72325027,  0.64673559,
         0.58467862, -0.06229197
};

...

void matmul (struct matrix *A, struct matrix *B, struct matrix *C) {
    int i, j, k, a, b, sum, fixed_prod;
    
    if (A->cols != B->rows) {
        return;
    }

    for (i = 0; i < A->rows; i++) {

        for (j = 0; j < B->cols; j++) {
            sum = 0;
            for (k = 0; k < A->cols; k++) {
                a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
                b = DOUBLE_TO_FIXED(B->data[k * B->rows + j]);
                fixed_prod = MULT(a, b);
                sum += fixed_prod;
            }
            /* Commented the following line, causes error */
            //C->data[i * C->rows + j] = sum;
        }
    }
}


...

static int __init insert_matmul_init (void)
{
    printk(KERN_INFO "INSERTING MATMUL");
    return 0;
}

static void __exit insert_matmul_exit (void)
{
    printk(KERN_INFO "REMOVING MATMUL");
}

module_init (insert_matmul_init);
module_exit (insert_matmul_exit);

which compiles with no errors (I left out code that I found irrelevant to the problem). I have made sure to comment any error-prone lines to get to a point where the program can be compiled with no errors, and I am trying to solve each of them one by one. However, when uncommenting this line:

C->data[i * C->rows + j] = sum;

I get this error message in a previous (unmodified) line of code:

 error: SSE register return with SSE disabled
 sum += fixed_prod;
 ~~~~^~~~~~~~~~~~~

From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error. Maybe my fixed-point implementation is flawed (I'm no expert in that matter either), or maybe I'm missing something obvious. Just in case, I have tested the same logic in a user-space program (using Floating-Point values) and it seems to work fine. In either case, any help in solving this issue would be appreciated. Thanks in advance!

Edit: I have included the definition of matrix and an example matrix. I have been using the default kbuild command for building external modules, here is what my Makefile looks like:

obj-m = matrix_mult.o
KVERSION = $(shell uname -r)

all:
    make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules

Does `struct matrix` hold floating point data? With regards to the commented out line: if the function produces no output then the whole function probably gets optimized away. — aqrit, Jan 23 '22 at 22:06
Your compiler could be auto-vectorizing your inner loop. Can you [edit] your question to contain a [mre] (what is `matrix` and what compilation options are you using?) — chtz, Jan 23 '22 at 22:38
*"I'm pretty sure I'm not using any FP values or operations"* - hmm, your `DOUBLE_TO_FIXED(A->data[i * A->rows + k])` seems to suggest that you *are* indeed manipulating floating-point values in kernel code. What is the definition of `struct matrix`? — Marco Bonelli, Jan 23 '22 at 23:10
Without a definition for `struct matrix`, this isn't a [mcve]. Can you reproduce the same error on https://godbolt.org/z/sdPs57vxG with `gcc -O3 -mgeneral-regs-only` (which implies `-mno-sse` and so on; Linux kernel code is compiled with that option). With a simple test case that uses `float` with those options, I can provoke that exact error message from just *using* float, not returning it from a function: https://godbolt.org/z/581Wszv38 — Peter Cordes, Jan 23 '22 at 23:49
Probably without the assignment, `sum` and all the calculations leading up to get it get optimized away, so even reading `A->data[i * A->rows + k]` doesn't lead to GCC trying to generate any FP-using asm. The FP->integer conversion is already optimized away by the time it gets to the point of emitting instructions. So that would explain how reading the matrix can be in your source code without causing the same error. (I'm assuming that `matrix::data` is a float or double array.) — Peter Cordes, Jan 23 '22 at 23:52
I have edited the question and included the missing code and my Makefile to make it reproducible (sorry about that). @MarcoBonelli yes, my mistake, I am indeed working with doubles. What I meant was that I'm not doing arithmetic operations on any floating-point operands, or at least it's not my intention to do so. I figured as long as all arithmetic operations are handled as fixed-point, there should be no problem, but to be honest this topic isn't really that clear to me. Please let me know if I am wrong. — kelos, Jan 24 '22 at 06:55
@PeterCordes Thanks, I have reproduced the error [here](https://godbolt.org/z/xdz31aE5W). — kelos, Jan 24 '22 at 06:59
Ok, it seems pretty obvious that your code uses `double`, and converting it to integer and back uses FP instructions (https://godbolt.org/z/fW45fjTvd). If you stop the compiler from using FP instructions, it refuses to compile your file instead of doing software floating-point. The only odd thing is that it pins down the error to an integer `+=` instead of one of the statements that compiles to a `mulsd` and `cvttsd2si` — Peter Cordes, Jan 24 '22 at 07:16
Does this answer your question? [SSE register return with SSE disabled](https://stackoverflow.com/questions/1556142/sse-register-return-with-sse-disabled) — Tsyvarev, Jan 24 '22 at 07:28

score 2 · Answer 1 · answered Jan 24 '22 at 07:48

Linux compiles kernel code with -mgeneral-regs-only on x86, which produces this error in functions that do anything with FP or SIMD. (Except via inline asm, because then the compiler doesn't see the FP instructions, only the assembler does.)

"From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error."

GCC optimizes whole functions when optimization is enabled, and you are using FP inside that function. You're doing FP multiply and truncating conversion to integer with your macro and assigning the result to an int, since the MCVE you eventually provided shows struct matrix containing double *data.
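To make the hidden FP work concrete, here is a minimal user-space sketch (not kernel code) of what the question's macro means when its argument is a `double`. `scale` and `DOUBLE_TO_FIXED` are copied from the question; the two helper function names are hypothetical and exist only for illustration.

```c
/* User-space illustration only; this is not meant to be built as a module. */

static const int scale = 16;
#define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))

/* The pattern from the question: the macro result is assigned to an int. */
int to_fixed_via_macro(double d)
{
    int a = DOUBLE_TO_FIXED(d);   /* expands to ((d) * (1 << scale)) */
    return a;
}

/* The same computation with the implicit steps written out (hypothetical helper). */
int to_fixed_spelled_out(double d)
{
    /* The int operand is converted to double, so this is an FP multiply. */
    double scaled = d * (double)(1 << scale);

    /* Assigning to int is a truncating FP-to-integer conversion. */
    return (int)scaled;
}
```

Both functions compute the same value, for example 32768 for an input of 0.5. Everything after the conversion is integer arithmetic, but the conversion itself needs FP instructions, which is what the compiler is refusing to emit.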

If you stop the compiler from using FP instructions (like Linux does by building with -mgeneral-regs-only), it refuses to compile your file instead of doing software floating-point.

The only odd thing is that it pins down the error to an integer += instead of one of the statements that compiles to a mulsd and cvttsd2si

If you disable optimization (-O0 -mgeneral-regs-only) you get a more obvious location for the same error (https://godbolt.org/z/Tv5nG6nd4):

<source>: In function 'void matmul(matrix*, matrix*, matrix*)':
<source>:9:33: error: SSE register return with SSE disabled
    9 | #define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
      |                            ~~~~~^~~~~~~~~~~~~~~
<source>:46:21: note: in expansion of macro 'DOUBLE_TO_FIXED'
   46 |                 a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
      |                     ^~~~~~~~~~~~~~~

If you really want to know what's going on with the GCC internals, you could dig into it with -fdump-tree-... options, e.g. on the Godbolt compiler explorer there's a dropdown for GCC Tree / RTL output that would let you look at the GIMPLE or RTL internal representation of your function's logic after various analyzer passes.

But if you just want to know whether there's a way to make this function work as written: no, not unless you compile the file without -mgeneral-regs-only. All functions in a file compiled with FP / SIMD allowed must only be called by callers that have used kernel_fpu_begin() before the call (and kernel_fpu_end() after).

You can't safely use kernel_fpu_begin() inside a function that is itself compiled to allow SSE / x87 registers; after optimization, the compiler might already have emitted FP instructions that corrupt user-space FPU state before the call to kernel_fpu_begin(). The symptom of getting this wrong is not a fault, it's silently corrupted user-space state, so don't assume that "happens to work" means "correct". Also, depending on how GCC optimizes, the code-gen might be fine with your version, but might be broken with earlier or later GCC or clang versions. I somewhat expect that kernel_fpu_begin() at the top of this function would get called before the compiler did anything with FP instructions, but that doesn't mean it would be safe and correct.

See also Generate and optimize FP / SIMD code in the Linux Kernel on files which contains kernel_fpu_begin()?

Apparently -msse2 overrides -mgeneral-regs-only, so -mgeneral-regs-only is probably just an alias for -mno-mmx -mno-sse and whatever option disables x87. So you might be able to use __attribute__((target("sse2"))) on a function without changing build options for it, but that would be x86-specific. Of course, so is -mgeneral-regs-only. And there isn't a -mno-general-regs-only option to override the kernel's normal CFLAGS.

I don't have a specific suggestion for the best way to set up a build option if you really do think it's worth using kernel_fpu_begin at all, here (rather than using fixed-point the whole way through).

Obviously if you do save/restore the FPU state, you might as well use it for the loop instead of using FP to convert to fixed-point and back.
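For the other route, fixed-point the whole way through, a minimal sketch might look like the following. It assumes the weights are converted to Q16.16 integers ahead of time (offline or in user space), so no `double` ever appears in the kernel source. The names `fixed_t`, `struct fixed_matrix`, `fixed_mul` and `fixed_matmul` are hypothetical, and the sketch uses `<stdint.h>` types so it can be tried in user space; in a module you would presumably use `s32` / `s64` from `<linux/types.h>` instead. It also indexes row-major with the column count as the stride, which differs from the indexing in the question.

```c
#include <stdint.h>

#define FIXED_SHIFT 16                 /* Q16.16: 16 fractional bits */

typedef int32_t fixed_t;               /* hypothetical fixed-point type */

struct fixed_matrix {
    int rows;
    int cols;
    fixed_t *data;                     /* row-major, values already in Q16.16 */
};

/*
 * Multiply two Q16.16 values through a 64-bit intermediate.
 * Note: >> on a negative value is implementation-defined in ISO C;
 * GCC and clang perform an arithmetic shift.
 */
static inline fixed_t fixed_mul(fixed_t x, fixed_t y)
{
    return (fixed_t)(((int64_t)x * y) >> FIXED_SHIFT);
}

/*
 * C = A * B using integer arithmetic only.
 * Returns 0 on success, -1 if the shapes are incompatible.
 */
int fixed_matmul(const struct fixed_matrix *A,
                 const struct fixed_matrix *B,
                 struct fixed_matrix *C)
{
    int i, j, k;

    if (A->cols != B->rows || C->rows != A->rows || C->cols != B->cols)
        return -1;

    for (i = 0; i < A->rows; i++) {
        for (j = 0; j < B->cols; j++) {
            /* Accumulate at full precision, rescale once at the end. */
            int64_t sum = 0;

            for (k = 0; k < A->cols; k++)
                sum += (int64_t)A->data[i * A->cols + k] *
                       B->data[k * B->cols + j];

            C->data[i * C->cols + j] = (fixed_t)(sum >> FIXED_SHIFT);
        }
    }
    return 0;
}
```

With this layout a weight such as 0.5 would be stored as 32768 (0.5 * 65536) in the initializer, and nothing in the file gives the compiler a reason to touch FP or SSE registers. Overflow of the final 32-bit result is not checked here, so the value range of the real data would need to be considered.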

Thanks a lot, I guess I had the wrong idea about FP operations. I wasn't sure because conversions from double to fixed weren't pointed out by the compiler. I'll look into the correct usage of `kernel_fpu_begin` as you pointed out, but I guess for now using fixed-point the whole way through seems like a simpler solution. — kelos, Jan 24 '22 at 18:16
