0

I'd like to "equate" two arrays, one of which is inside a fixed union (which must not be changed). Instead of using memcpy, I'd like to simply point the head of myUnion.RawBytes to the head of array. But the compiler throws an error for the myUnion.RawBytes = &array[0]; assignment. Why is this so? Is there any way I can circumvent this problem?

The faulty code below tries to illustrate this.

#include <stdio.h>

typedef union{
    unsigned char  RawBytes[2];
    unsigned short RawWord;
} MyUnion;

int main(){
    MyUnion myUnion;

    char array[2] = {1, 1};
    myUnion.RawBytes = &array[0];

    printf("%d", myUnion.RawWord);

    return 0;
}

Error:

main.c: In function ‘main’:
main.c:12:22: error: assignment to expression with array type
     myUnion.RawBytes = &array[0];
davidanderle
  • 2
    You cannot copy array contents with the assignment operator; use `memcpy`, a `for` loop, or a pointer. – KBlr Sep 07 '18 at 11:39
  • Yes, but I don't need a "true" copy. I simply want the two arrays to point to the same location, not to 2 separate blocks of memory. – davidanderle Sep 07 '18 at 11:41
  • 1
    Arrays do not "point"; they are allocated by the linker like variables. Pointers do point. Change the member to a pointer and point it to the first element of the array. Many things you can do with arrays will be possible then. – Yunnosch Sep 07 '18 at 11:43
  • thought about that but that changes the size of my union (from 2 to 8), and so RawWord returns a false value. – davidanderle Sep 07 '18 at 11:52

3 Answers

1

The correct way of union punning.

#include <stdio.h>

typedef union{
    unsigned char  RawBytes[2];
    unsigned short RawWord;
} MyUnion;

int main(){
    MyUnion myUnion;

    char array[2] = {1, 1};
    myUnion.RawBytes[0] = array[0];
    myUnion.RawBytes[1] = array[1];

    printf("%d", myUnion.RawWord);

    return 0;
}
0___________
  • This is the very same thing as memcpy though. – Lundin Sep 07 '18 at 11:57
  • @Lundin there is no other way of union punning without breaking the aliasing rules – 0___________ Sep 07 '18 at 11:58
  • That's not entirely clear to me. As proven in comments to another answer, `myUnion = (MyUnion*)array; ... *myUnion` is just fine and doesn't break strict aliasing. The question is whether the same could be said about `myUnion = (MyUnion*)array; ... myUnion->RawWord`, where the lvalue access is of type `unsigned short`. Probably not. Then what about `myUnion = (MyUnion*)array; ... MyUnion foo = *myUnion; ... foo.RawWord`? I believe that would be well-defined. – Lundin Sep 07 '18 at 14:14
  • @P__J__: The Standard imposes no requirements on what happens if code uses an lvalue of type `unsigned short` to access an object of type `MyUnion`. The authors of the Standard likely thought it sufficiently obvious how any quality implementation should behave that there was no need to add condescending language to that effect. On the other hand, the fact that it should be obvious how implementations should behave in some case doesn't mean some compiler writers won't have other ideas. That can happen even when the rationale *says* how common implementations are expected to behave. – supercat Sep 07 '18 at 20:19
1

I read the question as: can one take any array of 2 characters and interpret its value as an unsigned short without copying, by using this clever union trick? The answer is no, you can't.

The reason is not strict aliasing, but that it can break alignment requirements. Almost all platforms have an alignment requirement of at least 2 for unsigned short. The behaviour is undefined if a pointer is converted to a pointer type and the result is not correctly aligned for the referenced type.

Yes, this can crash on x86. Forget about being able to access unaligned objects with machine language - you're programming in C, not in assembly.


The correct way to do this is to use memcpy, which tells the compiler that the access can be unaligned, i.e.

char array[2] = {1, 1};
uint16_t raw_word;
memcpy(&raw_word, array, 2);

Do note that memcpy is a standard library function, and the compiler is allowed to generate any kind of machine code as long as it behaves as if the memcpy function from the standard library was called.

  • I am doing this in Embedded C, with a well-known architecture and without an underlying OS – davidanderle Sep 07 '18 at 12:22
  • @davidanderle and you're using a *C* compiler. There is no language called "Embedded C". Better read the compiler manual thoroughly. Many have been wrong before. – Antti Haapala -- Слава Україні Sep 07 '18 at 12:29
  • Yes, that is true. I'll dig deeper – davidanderle Sep 07 '18 at 12:33
  • Well, in embedded there's a whole lot of 8 and 16 bitters that have no alignment requirements at all, so the code will work just fine there, as far as alignment is concerned. – Lundin Sep 07 '18 at 14:11
  • @Lundin it still comes from the C compiler, not the platform. – Antti Haapala -- Слава Україні Sep 07 '18 at 14:39
  • @AnttiHaapala: Any quality compiler that is designed and intended to be suitable for embedded-systems programming will define behaviors beyond those required by the Standard; in cases where the Standard doesn't mandate particular behaviors, but the execution environment documents them, a quality implementation designed for low-level programming will either expose that characteristic behavior of the environment or document a good reason for doing otherwise. The authors of the Standard were well aware of this, but thought compiler writers would realize it as well without having to be told. – supercat Sep 07 '18 at 20:14
-1

To achieve what you want, you can use the approach below.

Note: the approach below doesn't follow the strict aliasing rule.

#include <stdio.h>

typedef union{
    unsigned char  RawBytes[2];
    unsigned short RawWord;
} MyUnion;

int main(){
    MyUnion *myUnion;

    unsigned char array[2] = {1, 1};
    myUnion = &array;

    printf("%d", myUnion->RawWord);
    printf("\n%d %d", myUnion->RawBytes[0], myUnion->RawBytes[1]);

    return 0;
}

I strongly recommend keeping the array inside the union and using memcpy or a for loop.

KBlr
  • 1
    You must use a cast `myUnion = (MyUnion*)&array;`. And your remark about strict aliasing is not correct: `MyUnion` is a union type that includes an `unsigned char[2]` among its members. _However_, those who aren't sure about how strict aliasing works should definitely not use this method. – Lundin Sep 07 '18 at 11:54
  • 1
    This approach will break pointer alignment rules. `MyUnion` and `unsigned char` possibly have different alignment requirements. (In particular, the `unsigned short RawWord` member possibly has different alignment requirements.) The pointers are simply not compatible. – Ian Abbott Sep 07 '18 at 11:57
  • @Lundin actually I am not sure that the standard says that this is OK, when it talks about type punning it talks about *storing* into a union object, but here an array[2] of `unsigned char` is type-punned into a union first. – Antti Haapala -- Слава Україні Sep 07 '18 at 11:59
  • @Lundin perhaps you should write a QA about this :D Indeed it seems that 6.5/7 allows this. – Antti Haapala -- Слава Україні Sep 07 '18 at 12:02
  • @AnttiHaapala Seems pretty clear? `unsigned char array[2]` is _an object_ that _shall have its stored value accessed only by an lvalue expression_ such as `myUnion->...` _that has one of the following types:_ ... _an aggregate or union type that includes one of the aforementioned types among its members_. Italics being taken from 6.5 in the standard. Or are you saying that the lvalue should only be accessed by for example `*myUnion` rather than `myUnion->...`? – Lundin Sep 07 '18 at 12:05
  • There is still one more thing: it can break alignment, i.e. it is still nono. – Antti Haapala -- Слава Україні Sep 07 '18 at 12:06
  • I guess if `myUnion->RawWord` is an lvalue expression of type `unsigned short`, it _could_ be a strict aliasing violation, as this is not the union type? – Lundin Sep 07 '18 at 12:07
  • @KBlr, the solution you provided indeed works, but I am not entirely sure what potential errors its usage would impose. I must add that `array` would never be used again after the `myUnion = &array;`. Also, for clarity, I think using `myUnion = (MyUnion*)array;` is a better interpretation. – davidanderle Sep 07 '18 at 12:48
  • @Lundin: Given something like `union blob {uint32_t w[4]; uint16_t h[8];} blob; uint32_t i,j,temp;` gcc won't recognize the possibility that the statements `temp=*(blob.w+i);` and `*(blob.h+j)=2;` might access the same part of `blob` unless type-based aliasing analysis is disabled, rendering the rules moot. – supercat Sep 07 '18 at 20:05