On the 8086 it is possible to override the segment used with the source index SI, so that ES is used instead of DS. In a book (the old Scanlon) I found this MASM code:

LEA SI,ES:HERE
LEA DI,ES:THERE
MOVSB

Since LEA retrieves just the OFFSET of a memory address (16 bits on the 8086), how does MOVSB know that SI refers to the ES segment and not the DS segment? Does LEA change the default segment for SI? I have not read anything about that in the many pages and manuals I found.

Claus
  • It doesn't. This code looks wrong. – fuz Jan 10 '22 at 23:59
  • The "effective address" is the offset part of a seg:off addressing mode, so ES overrides have exactly zero effect on the value that LEA puts into the destination register. You can specify segment overrides for LEA as padding instead of later NOPs, if you want to align some later code. – Peter Cordes Jan 11 '22 at 01:39
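A toy Python model of the point in the comment above (a sketch, not an emulator; the helper names `effective_address`, `lea` and `physical_address` are hypothetical): LEA only ever computes the 16-bit offset, so a segment override cannot change its result, whereas a real memory access combines segment and offset into a 20-bit physical address.

```python
from typing import Optional

MASK16 = 0xFFFF   # offsets and registers are 16 bits on the 8086
MASK20 = 0xFFFFF  # the 8086 has a 20-bit physical address space

def effective_address(disp: int, base: int = 0, index: int = 0) -> int:
    """Offset part of a seg:off memory operand, wrapped to 16 bits."""
    return (base + index + disp) & MASK16

def lea(disp: int, base: int = 0, index: int = 0,
        seg_override: Optional[str] = None) -> int:
    """Toy LEA: the segment override is accepted but has no effect on the result."""
    del seg_override  # deliberately ignored: no segment enters the effective address
    return effective_address(disp, base, index)

def physical_address(segment: int, offset: int) -> int:
    """Address actually used by a memory access: segment * 16 + offset."""
    return ((segment << 4) + offset) & MASK20

# HERE at offset 0000h: writing ES: in front of it changes nothing.
assert lea(0x0000, seg_override="ES") == lea(0x0000) == 0x0000
# Only an actual memory access involves a segment (DS=1000h is a made-up value).
print(hex(physical_address(0x1000, lea(0x0004))))  # 0x10004
```

The segment only matters later, when an instruction such as MOVSB uses SI or DI to touch memory, and which segment it uses is decided by that instruction's own prefixes, not by how SI was loaded.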

2 Answers

That code looks wrong. Without a segment override prefix, movsb always reads from DS:SI and writes to ES:DI. Unless you have to worry about errata of ancient processors, you can make this code work by giving movsb a segment override prefix: es:MOVSB tells it to use ES:SI rather than DS:SI. The destination is always ES:DI; no segment override prefix will change it.

The code could actually be right if DS is guaranteed to equal ES at this point. Old assemblers had their own ideas about such things, and sometimes odd-looking segment overrides were needed just to keep the assembler happy.
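The rule in this answer can be written down as a small Python model. It is a sketch under simplifying assumptions (only byte moves, no REP, and the class name `CPU8086` and its fields are made up for illustration): the source segment defaults to DS and may be replaced by a prefix, while the destination is hard-wired to ES.

```python
from typing import Optional

class CPU8086:
    """Hypothetical minimal model of the state MOVSB touches."""

    def __init__(self, ds: int, es: int, si: int, di: int, cs: int = 0, ss: int = 0):
        self.seg = {"CS": cs, "DS": ds, "ES": es, "SS": ss}
        self.si = si
        self.di = di
        self.df = 0                      # direction flag clear: SI/DI increment
        self.mem = bytearray(1 << 20)    # 1 MiB real-mode address space

    def phys(self, seg_name: str, offset: int) -> int:
        """20-bit physical address for seg_name:offset."""
        return ((self.seg[seg_name] << 4) + offset) & 0xFFFFF

    def movsb(self, src_override: Optional[str] = None) -> None:
        # Source: DS:SI by default; a segment prefix (e.g. ES:) replaces DS.
        src_seg = src_override or "DS"
        value = self.mem[self.phys(src_seg, self.si)]
        # Destination: always ES:DI, whatever prefixes are present.
        self.mem[self.phys("ES", self.di)] = value
        step = -1 if self.df else 1
        self.si = (self.si + step) & 0xFFFF
        self.di = (self.di + step) & 0xFFFF

if __name__ == "__main__":
    # Made-up segment values; any two distinct paragraphs would do.
    cpu = CPU8086(ds=0x1000, es=0x2000, si=0x0000, di=0x0004)
    cpu.mem[cpu.phys("DS", 0x0000)] = ord("A")
    cpu.mem[cpu.phys("ES", 0x0000)] = ord("Z")

    cpu.movsb()                          # plain MOVSB reads DS:0000
    print(chr(cpu.mem[cpu.phys("ES", 0x0004)]))   # A

    cpu.si, cpu.di = 0x0000, 0x0004
    cpu.movsb(src_override="ES")         # ES: MOVSB reads ES:0000
    print(chr(cpu.mem[cpu.phys("ES", 0x0004)]))   # Z
```

If DS and ES hold the same value, both calls read the same byte, which is the only situation in which the book's code behaves as it appears to intend.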

Joshua
  • To be clear, `movsb` does take a segment override, but of course the code in the question put it onto the `lea` which is indeed pointless (unless maybe as a hint to an assembler which uses `assume`). – Jester Jan 11 '22 at 01:16
  • @Jester: If you can determine which processor added it that would improve this answer. My 16 bit manual from ages ago says no. – Joshua Jan 11 '22 at 01:18
  • Does it explicitly say no? Interesting. Does it say yes for other instructions? :) – Jester Jan 11 '22 at 01:25
  • @Jester: It does. Although I just found my other book says yes, so I wonder if there was some CPU erratum somewhere that I just don't know about or if the first book is wrong. – Joshua Jan 11 '22 at 01:26
  • I found this: http://matthieu.benoit.free.fr/cross/data_sheets/8086_family_Users_Manual.pdf This says on page 2-13 _"the only exception is the destination operand of a string instruction which must be in the extra segment"_ – Jester Jan 11 '22 at 01:34
  • The only weirdness I'm aware of with segment overrides for string instructions is that 8086 (or some early 8086?) would save CS:IP for interrupt-return on the last prefix, not first, so `rep es movsb` would resume as `es movsb`. Which you could handle with a `loop` instruction around it or something, as long as you had prefixes in that order, not the disastrous other order. Maybe you're remembering reading that segment overrides were unusable for rep-string instructions on 8086 because of this design flaw? Other than that, they follow the manual, allowing override of DS:SI. – Peter Cordes Jan 11 '22 at 01:36
  • @PeterCordes: That would certainly explain why the one book said no, because there is a case where it didn't work. – Joshua Jan 11 '22 at 01:39
  • `es movsb` always works; it's only with a 2nd prefix (e.g. `rep`) that there's a problem. If your book was talking about `rep`-string stuff, that would be it. Or if they glossed over the issue instead of explaining the details, yeah they might just say that segment prefixes didn't work with string instructions in general. Fun fact: the 8086 doesn't remember instruction starts at all; there are no cases where it actually pushes the instruction start on a faulting insn. (IIRC, #DE pushes the end of the faulting divide, unlike later x86 where it pushes the CS:IP of the div) – Peter Cordes Jan 11 '22 at 01:47
  • @PeterCordes: I'm going with the author of the one book had encountered it not working due to data corruption for the very reason you mentioned and couldn't figure it out and listed it as not working. – Joshua Jan 11 '22 at 01:49
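Peter Cordes's description of the 8086 prefix problem can be sketched as a Python model of what each prefix order turns into after the interrupt returns. This rests on the assumption stated in his comment (the saved return address points at the last prefix, so earlier prefixes are lost); the function names are hypothetical and the sketch assumes at most one segment prefix.

```python
from typing import List, Tuple

SEGMENT_PREFIXES = {"CS", "DS", "ES", "SS"}

def semantics(prefixes: List[str]) -> Tuple[bool, str]:
    """(repeats?, source segment) of a MOVSB carrying these prefixes."""
    rep = "REP" in prefixes
    # Assumes at most one segment prefix; DS is the default source segment.
    source = next((p for p in prefixes if p in SEGMENT_PREFIXES), "DS")
    return rep, source

def resumed_prefixes(prefixes: List[str]) -> List[str]:
    """Prefixes still in effect after IRET, ASSUMING the 8086 saved the
    address of the last prefix, so every earlier prefix is dropped."""
    return prefixes[-1:]

for order in (["REP", "ES"], ["ES", "REP"]):
    before = semantics(order)
    after = semantics(resumed_prefixes(order))
    print(" ".join(order), before, "->", after)
```

With `rep es movsb` the resumed instruction is a single `es movsb`: the source segment is still right and CX is left nonzero, so a check and branch after the instruction can restart it. With `es rep movsb` the resumed instruction is `rep movsb`, which keeps copying from DS and silently produces wrong data.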

I installed MASM 6.11 in DOSBox and did some experiments. Here is the listing of the data segments:

 0000               dseg segment para public 'data'
 0000 41 42 43 44   src db 'ABCD'
 0004               dseg ends

 0000               eseg segment para public 'data'
 0000 5A 5A 5A 5A   dummy db 'ZZZZ'
 0004 31 32 33 34   dst db '1234'
 0008               eseg ends

 0000               cseg segment para public 'code'
                    assume cs:cseg, ds:dseg, es:eseg

The results are that the code:

LEA SI,ES:HERE
LEA DI,ES:THERE
MOVSB

is wrong: the ES: overrides on LEA are not encoded at all, and MOVSB copies from DS:SI to ES:DI as usual (the opcode is A4):

8D 36 0000 R
8D 3E 0004 R
A4

To copy from ES to ES, use the explicit-operand form of MOVS:

LEA SI,ES:HERE
LEA DI,ES:THERE
MOVS ES:THERE, ES:HERE

which translates to:

8D 36 0000 R
8D 3E 0004 R
26: A4

The ES MOVSB and ES:MOVSB syntaxes suggested in the answers are not accepted by MASM 6.11, although they correspond to what the MOVS form above is translated to: 26 is the ES segment-override prefix.
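The bytes in these listings follow a simple rule, sketched below in Python. The prefix values are the standard 8086 segment-override bytes (26h ES, 2Eh CS, 36h SS, 3Eh DS), REP is F3h and MOVSB is A4h; the helper `encode_movsb` itself is hypothetical and only illustrates what MASM derives from the operands of `MOVS ES:THERE, ES:HERE`.

```python
from typing import Optional

# Standard 8086 segment-override prefix bytes.
SEGMENT_PREFIX = {"ES": 0x26, "CS": 0x2E, "SS": 0x36, "DS": 0x3E}
REP_PREFIX = 0xF3     # REP (REPE/REPZ) prefix
MOVSB_OPCODE = 0xA4

def encode_movsb(src_segment: Optional[str] = None, rep: bool = False,
                 dst_segment: str = "ES") -> bytes:
    """Hypothetical helper: machine code for MOVSB with an optional source override."""
    if dst_segment != "ES":
        # The destination of a string instruction is always ES:DI.
        raise ValueError("MOVSB destination cannot be overridden")
    code = bytearray()
    if rep:
        # REP before the segment prefix: the safer order on the 8086 if interrupted.
        code.append(REP_PREFIX)
    if src_segment is not None:
        code.append(SEGMENT_PREFIX[src_segment])
    code.append(MOVSB_OPCODE)
    return bytes(code)

print(encode_movsb().hex())        # a4
print(encode_movsb("ES").hex())    # 26a4
```

An explicit `"DS"` argument still emits 3Eh here, even though DS is the default; the sketch simply encodes whatever override it is asked for.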

Claus
  • Oh weird, ASM86 ignores the (useless) `es:` override for LEA, not emitting a useless `26` prefix like one might expect from that source syntax. (And like NASM does in practice for `lea si, [es: foo]`). Does ASM86 complain if you *don't* use an `ES:` with LEA for those symbols, given whatever `assume` settings make sense? – Peter Cordes Jan 12 '22 at 02:04
  • @Peter, in my experiments I see that LEA ignores the segment overrides in this case, as the assembler knows where each label is located; after removing them from the code I get the same hex codes. But if I use just the offsets, MASM complains about lines of the form LEA SI,[00], while it does not complain about LEA SI,ES:[00]. – Claus Jan 12 '22 at 17:33
  • Oh, the latter is because of MASM's weird rules for `[]` - [Confusing brackets in MASM32](https://stackoverflow.com/q/25129743) - `[0]` is parsed as an immediate `0` because MASM's syntax design is IMO insane. You could fix that with `byte ptr [0]` I think. So this is all with MASM, not [ASM86](https://winworldpc.com/product/intel-asm86-macro-assembler/31)? You should tag your question and fix the title to say MASM, not ASM86, if that's the case. (I wasn't sure if you were just also trying this with MASM as well as ASM86.) – Peter Cordes Jan 12 '22 at 17:51
  • "as the assembler knows where each label is located," - more like, the assembler knows that LEA ignores segment overrides in machine code. I'd guess that `ES:symbol` maybe means offset of the symbol wrt. the segment that it's assuming for ES, but I haven't checked the manual. (And hopefully it doesn't matter for anyone in the future; if you want to write programs larger than 64k, it's usually easier or at least more efficient to switch to protected mode or write for a modern OS.) – Peter Cordes Jan 12 '22 at 17:55
  • Sorry Peter, I'd like your expertise to understand a few points. 1) Why do you say "the assembler knows that LEA ignores segment overrides in machine code"? I see that `8d 36` is the code for `LEA esi,[esi]` and `8d 3e` is for `LEA edi,[esi]`. It seems to me that LEA encodes segment overrides in its OPCODES. 2) The offset of `[4]` is `4`, and actually `LEA si,[4]` puts `4` into `SI`, like `MOV SI, 4` (that's an immediate 4). Is `byte ptr [0]` just an immediate offset 0 in this case? 3) I checked and can say your assumption about `ES:symbol` is correct. Thank you. – Claus Jan 12 '22 at 18:37
  • (1) no, `8d` is the opcode, the following byte is ModRM that encodes the destination register and the source addressing mode. `36` and `3e` in *that* position are still just ModRM bytes. The fact that those bytes are prefixes when they come before the opcode is completely irrelevant. Just like `mov eax, 0x90909090` has nothing to do with NOPs, even though it involves four `0x90` bytes. x86 machine code is a byte stream that's not self-synchronizing, and context matters. It's just total coincidence that the ModRM encoding for an EDI destination and an `[esi]` source is `3e`, not DS. – Peter Cordes Jan 12 '22 at 18:56
  • (2) yes, `lea si, byte ptr [4]` is a less efficient way to do the same thing as `mov si, 4`. In MASM syntax, `byte ptr [4]` is one way to specify that a numeric literal should be treated as a memory address, rather than an immediate. (4) LEA can't take an immediate, only a memory addressing mode. e.g. `mov si, word ptr [0]` would load from `ds:0`, while `mov si, 0` would set SI = 0 without accessing memory at all. i.e. that source operand doesn't involve an effective address, so it isn't a valid source for an LEA. – Peter Cordes Jan 12 '22 at 18:58
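To illustrate the ModRM explanation above, here is a small Python decoder for 16-bit LEA encodings (a hypothetical sketch covering only opcode 8Dh with 16-bit addressing). It splits the byte after the opcode into mod, reg and r/m fields; in 16-bit mode, mod=00 with r/m=110 means a bare 16-bit displacement, which is why `8D 36 0000` and `8D 3E 0004` from the listing are `lea si, [0000h]` and `lea di, [0004h]`, with no segment involved. The `[esi]` reading only applies to 32-bit addressing, where r/m=110 means ESI.

```python
# 16-bit register names in ModRM.reg order, and 16-bit r/m address forms.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]

def decode_lea16(code: bytes) -> str:
    """Decode one 16-bit-addressing LEA (opcode 8Dh) into text."""
    if code[0] != 0x8D:
        raise ValueError("not an LEA opcode")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    dst = REG16[reg]
    if mod == 3:
        # A register "source" has no effective address, so it is invalid for LEA.
        raise ValueError("LEA needs a memory operand")
    if mod == 0 and rm == 6:
        # Special case: no base register, just a 16-bit displacement.
        disp = int.from_bytes(code[2:4], "little")
        return f"lea {dst}, [0x{disp:04x}]"
    base = RM16[rm]
    if mod == 0:
        return f"lea {dst}, [{base}]"
    width = 1 if mod == 1 else 2          # disp8 (sign-extended) or disp16
    disp = int.from_bytes(code[2:2 + width], "little", signed=True)
    return f"lea {dst}, [{base}{disp:+#x}]"

print(decode_lea16(bytes([0x8D, 0x36, 0x00, 0x00])))  # lea si, [0x0000]
print(decode_lea16(bytes([0x8D, 0x3E, 0x04, 0x00])))  # lea di, [0x0004]
```

The same byte values 36h and 3Eh would be SS: and DS: prefixes only if they appeared before the opcode; after 8Dh they are decoded purely as ModRM.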