move instruction from eax to es register gives error

Question

I can't figure out a way to move code from one location in memory to another.

So I wrote something like this, but it doesn't work:

extern _transfer_code_segment
extern _kernel_segment
extern _kernel_reloc
extern _kernel_reloc_segment
extern _kernel_para_size

    section .text16

    global transfer_to_kernel

transfer_to_kernel:

    ;cld

    ;
    ; Turn off interrupts -- the stack gets destroyed during this routine.
    ; kernel must set up its own stack.
    ;
    ;cli
    ; stack frame for this function only

    push ebp
    mov ebp, esp

    mov eax, _kernel_segment             ; source segment
    mov ebx, _kernel_reloc_segment       ; dest segment
    mov ecx, _kernel_para_size

.loop:

    ; XXX: Will changing the segment registers this many times have
    ; acceptable performance?

    mov ds, eax  ; this is where the error occurs
    mov es, ebx  ; this one too
    xor esi, esi
    xor edi, edi
    movsd
    movsd
    movsd
    movsd
    inc eax
    inc ebx
    dec ecx
    jnz .loop

    leave
    ret

Is there another way to do this, or how can I solve this problem?

Cody Gray - on strike · Answer 1 · 2017-07-10T04:28:21.960

The segment registers are all 16 bits in size. Compare that to the e?x registers, which are 32 bits in size. Obviously, these two things are not the same size, prompting your assembler to generate an "operand size mismatch" error—the sizes of the two operands do not match.

Presumably, you want to initialize the segment register with the lower 16 bits of the register, so you would do something like:

mov  ds, ax
mov  es, bx

Also, no, you don't actually need to initialize the segment registers on each iteration of the loop. What you're doing now is incrementing the segment and forcing the offset to 0, then copying 4 DWORDs. What you should be doing is leaving the segment alone and just incrementing the offset (which the MOVSD instruction does implicitly).

    mov eax, _kernel_segment             ; TODO: see why these segment values are not
    mov ebx, _kernel_reloc_segment       ;        already stored as 16 bit values
    mov ecx, _kernel_para_size

    mov ds, ax
    mov es, bx

    xor esi, esi
    xor edi, edi

.loop:

    movsd
    movsd
    movsd
    movsd

    dec  ecx
    jnz .loop
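To see why the rewritten loop reaches the same bytes as the original, it may help to model real-mode address translation (linear = segment * 16 + offset) in a few lines of C. This is only a sketch: the function names are hypothetical and not part of the question's code, and it assumes the copy stays within one 64 KiB segment so that SI does not wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Original loop, iteration i: segment incremented i times, offset reset to 0. */
static uint32_t addr_bump_segment(uint16_t base_segment, uint16_t i)
{
    return linear_address((uint16_t)(base_segment + i), 0);
}

/* Rewritten loop, iteration i: segment fixed, SI advanced by 16 per iteration. */
static uint32_t addr_bump_offset(uint16_t base_segment, uint16_t i)
{
    assert(i < 0x1000);  /* 0x1000 paragraphs = 64 KiB; past that SI wraps */
    return linear_address(base_segment, (uint16_t)(i * 16u));
}
```

For any paragraph index below 0x1000, both functions return the same linear address, which is why the segment reloads inside the loop buy nothing.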

But note that adding the REP prefix to the MOVSD instruction would allow you to do this even more efficiently. This basically does MOVSD a total of ECX times. For example:

mov ds, ax
mov es, bx
xor esi, esi
xor edi, edi
shl ecx, 2         ; adjust size since we're doing 1 MOVSD for each ECX, rather than 4
rep movsd

Somewhat counter-intuitively, if your processor implements the ERMSB optimization (Intel Ivy Bridge and later), REP MOVSB may actually be faster than REP MOVSD, so you could do:

mov ds, ax
mov es, bx
xor esi, esi
xor edi, edi
shl ecx, 4
rep movsb
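The two shift counts follow from the paragraph size: one paragraph is 16 bytes, i.e. 4 DWORDs. The C sketch below spells out that arithmetic. It also checks one caveat, under the assumption (not shown in the question) that `.text16` is assembled as 16-bit code without an address-size prefix: in that case REP counts in CX and indexes with SI/DI, so a single REP can cover at most 64 KiB. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count for REP MOVSD: 4 DWORDs per paragraph, same as `shl ecx, 2`. */
static uint32_t paragraphs_to_dwords(uint32_t paragraphs)
{
    assert(paragraphs <= (UINT32_MAX >> 2));  /* avoid 32-bit overflow */
    return paragraphs << 2;
}

/* Count for REP MOVSB: 16 bytes per paragraph, same as `shl ecx, 4`. */
static uint32_t paragraphs_to_bytes(uint32_t paragraphs)
{
    assert(paragraphs <= (UINT32_MAX >> 4));  /* avoid 32-bit overflow */
    return paragraphs << 4;
}

/* 16-bit address size: REP MOVSD with CX <= 0x4000 moves at most 64 KiB. */
static bool fits_rep_movsd_16(uint32_t paragraphs)
{
    return paragraphs <= 0x1000u;
}

/* 16-bit address size: REP MOVSB is limited by CX <= 0xFFFF bytes. */
static bool fits_rep_movsb_16(uint32_t paragraphs)
{
    return paragraphs <= 0x0FFFu;
}
```

If the kernel is larger than these limits, the count has to be split into chunks or the copy has to use 32-bit addressing.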

Finally, although you've commented out the CLD instruction in your code, you do need to have this in order to ensure that the moves happen according to plan. You cannot rely on the direction flag having a particular value; you need to initialize it yourself to the value that you want.

(Another alternative would be streaming SIMD instructions or even floating-point stores, neither of which would care about the direction flag. This has the advantage of increasing memory copy bandwidth because you'd be doing 64-bit, 128-bit, or larger copies at a time, but introduces other disadvantages. In a kernel, I'd stick with MOVSD/MOVSB unless you can prove it is a significant bottleneck and/or you want to have optimized paths for different processors.)

The _kernel_segment value is a 32-bit address, so if I use only the lower 16 bits, as you said with "mov ds, ax", that would be a problem — sakura, Jul 08 '17 at 16:57
Why is it a 32-bit value? Segments are only 16 bits. Well, I guess you'll have to do the segment arithmetic yourself; see: http://thestarman.pcministry.com/asm/debug/Segments.html — Cody Gray - on strike, Jul 08 '17 at 16:59
Although legitimate, 32 bit values `esi, edi and ecx` are of no consequence. Even if you did set ECX to 1FC00 let's say, pointers would just simply wrap around on every 65536 iteration or 32768 if moving words or 16384 if moving dwords. There is no way to circumvent this unless there is some logic to adjust segment registers accordingly. — Shift_Left, Jul 08 '17 at 17:00
What I am saying is that, for example, take 0xc0000000: even if I right-shift it by 4 to remove the last digit, I end up with a value that can't be stored in 16 bits — sakura, Jul 08 '17 at 17:14
`rep movsb` is only faster on Intel IvB and later. So you should probably define what you mean by "modern". `rep movsb` is significantly worse than `rep movsd` on Sandybridge and earlier, since the microcode is optimized for small copies. IDK what AMD does. — Peter Cordes, Jul 09 '17 at 11:59
Thanks, @Peter! In fact, I had used the weasel word "modern" because I'd forgotten exactly when that optimization was introduced. I was thinking circa Sandy Bridge, since Ivy Bridge was just a die-shrink, but I was too lazy to actually look it up. — Cody Gray - on strike, Jul 10 '17 at 04:30
Turns out that [`rep movsd` is also fast in practice on CPUs with ERMSB](https://stackoverflow.com/questions/42558907/why-is-stdfill0-slower-than-stdfill1/45018779?noredirect=1#comment77014359_45018779), so if it's easier to always use `rep movsd` there's no performance downside on real hardware. In a simple piece of software, that's certainly easier than doing CPU detection / dispatching. — Peter Cordes, Jul 10 '17 at 21:19

Peter Cordes · Accepted Answer · 2017-07-12T14:38:20.000

That will have horrible performance. Agner Fog says mov sr, r has a throughput of one per 13 cycles on Nehalem, and I'd guess that if anything it's worse on more recent CPUs since segmentation is obsolete. Agner stopped testing mov to/from segment register performance after Nehalem.

Are you doing this to let you copy more than 64kiB total? If so, at least copy a full 64kiB before changing a segment register.
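A rough C model of that chunking, assuming the source and destination are paragraph-aligned (as the question's paragraph counts suggest). The names are hypothetical. The sketch converts a linear address into a segment:offset pair and counts how many segment reloads a 64 KiB chunk size needs. It also shows why an address such as 0xC0000000 cannot be expressed as a real-mode segment at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_BYTES 0x10000u  /* 64 KiB: the most one real-mode segment spans */

struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* True if linear >> 4 still fits in a 16-bit segment register. */
static bool segment_fits(uint32_t linear)
{
    return (linear >> 4) <= 0xFFFFu;
}

/* Split a linear address into segment:offset with the offset in 0..15. */
static struct far_ptr normalize(uint32_t linear)
{
    assert(segment_fits(linear));
    struct far_ptr p = { (uint16_t)(linear >> 4), (uint16_t)(linear & 0xFu) };
    return p;
}

/* Segment reloads needed when each chunk covers up to 64 KiB. */
static uint32_t chunks_needed(uint32_t bytes)
{
    return bytes / CHUNK_BYTES + (bytes % CHUNK_BYTES != 0 ? 1u : 0u);
}
```

For a 320 KiB copy this gives 5 reloads per segment register, compared with 20480 when the segment is reloaded for every 16-byte paragraph.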

I think you can use 32-bit addressing modes to avoid messing with segments, but segments that you set in 16-bit mode implicitly have a "limit" of 64k. (i.e. mov eax, [esi] is encodable in 16-bit mode, with an operand-size and address-size prefix. But with a value in esi of more than 0xFFFF, I think it would fault for violating the ds segment limit.) See the osdev link below for more.

As Cody says, use rep movsd to let the CPU use an optimized microcoded memcpy. (or rep movsb, but only on CPUs with the ERMSB feature. In practice, most CPUs that support ERMSB give the same performance benefit for rep movsd too, so it's probably easiest to just always use rep movsd. But IvyBridge might not.) It's much faster than separate movsd instructions (which are slower than separate mov loads/stores). A loop with SSE 16B vector loads/stores might go almost as fast as rep movsd on some CPUs, but you can't use AVX for 32B vectors in 16-bit mode.

Another option for big copies: huge unreal mode

In 32-bit protected mode, the values you put in segment registers are selectors, not the actual segment base itself. mov es, ax triggers the CPU to use the value as an index into the GDT or LDT and get the segment base / limit from the descriptor stored there.
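As a small illustration of that lookup, the 16-bit value written to a segment register in protected mode is split into a table index (bits 15..3), a table indicator (bit 2: 0 = GDT, 1 = LDT), and a requested privilege level (bits 1..0). The C sketch below decodes those fields. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    uint16_t index;  /* which descriptor in the table */
    uint8_t  table;  /* 0 = GDT, 1 = LDT */
    uint8_t  rpl;    /* requested privilege level, 0..3 */
};

/* Split a protected-mode selector into its three fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.table = (uint8_t)((sel >> 2) & 1u);
    f.rpl   = (uint8_t)(sel & 3u);
    return f;
}

/* Byte offset of the 8-byte descriptor inside the GDT. */
static uint32_t gdt_byte_offset(uint16_t sel)
{
    assert(((sel >> 2) & 1u) == 0);  /* this helper only handles GDT selectors */
    return (uint32_t)(sel >> 3) * 8u;
}
```

For example, selector 0x10 refers to GDT entry 2 with privilege level 0, which sits 16 bytes into the table.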

If you do this in 32-bit mode and then switch back to 16-bit mode, you're in huge unreal mode with segments that can be larger than 64k. The segment base/limit/permissions stay cached until something writes a segment register in 16-bit mode and puts it back to the usual 16*seg with a 64k limit. (If I'm describing this correctly). See http://wiki.osdev.org/Unreal_Mode for more.

Then you may be able to use rep movsd in 16-bit mode with operand-size and address-size prefixes so you can copy more than 64kiB in one go.

This works well for ds and es, but interrupts will set cs:ip, so this isn't convenient for big flat code address space, just data.

thanks man, big real mode helped but when i am returning from the function it's not returning — sakura, Jul 09 '17 at 12:34
@sakura: I have near zero interest in obsolete 16-bit mode. I keep reading about it in SO questions, which is the only reason I was able to write this answer (glad it helped, BTW). All I can recommend is to get a good debugger, e.g. the one built-in to BOCHS, so you can single-step your kernel. — Peter Cordes, Jul 09 '17 at 12:36
After more details about this question have become clear, I can't help but wonder why you don't just switch into 32-bit protected mode *before* doing this copy. Presumably that's what the kernel is ultimately going to do anyway. Why would you want to switch back to real mode? Even in "unreal mode", as far as I know, you can't go past 1 MB, so this isn't exactly a silver-bullet solution. @sakura — Cody Gray - on strike, Jul 10 '17 at 04:33
@CodyGray: According to the wiki page, you can get things set up so you can use 32-bit addressing modes and access 4G of data, but still only 64k of code: "*The DS and ES segment registers are set to 0, so C pointers can work as flat 32-bit physical addresses and address data or memory-mapped devices anywhere in the first 4GB of memory.*" But yeah, seems like just switching to 32-bit mode would be more sensible. You don't have to enable paging right away if you don't want. — Peter Cordes, Jul 10 '17 at 10:50
@PeterCordes : You can set up a 16-bit code segment with a limit of 4gb (or size other than 64kb) just like a data segment. Unfortunately you have to deal with issues in such an environment like interrupts only saving CS:IP and not CS:EIP. The moniker for such an environment is usually _huge unreal mode_ — Michael Petch, Jul 12 '17 at 13:33
@MichaelPetch: Thanks for the correction. I have a bad habit of misremembering and/or making up names :P I meant to just turn a quick comment into a quick answer, but hopefully it's not too sloppy at this point. — Peter Cordes, Jul 12 '17 at 14:40
