Assembly string instructions register DS and ES in real mode

Question

I have been studying this assembly program from my book and I have a question about it. The purpose of this program is to simply copy string1 to string2. My question relates to the following two instructions:

mov    AX,DS        
mov    ES,AX

I see without them, the program doesn't work properly, but I would have thought by pointing ESI to string1 and EDI to string2, that would be all you need to do. Then just increment ESI and EDI and move it character by character. What exactly does DS hold and why do we need to move it to ES?

.DATA
string1    db    'The original string',0
strLen     EQU   $ - string1
.UDATA
string2    resb    80
.CODE
    .STARTUP
    mov    AX,DS          ; set up ES
    mov    ES,AX          ;  to the data segment
    mov    ECX,strLen     ; strLen includes NULL
    mov    ESI,string1
    mov    EDI,string2
    cld                   ; forward direction
    rep    movsb

Is this 16-bit code? If it is, you really should use CX, SI, and DI. And I assume you mean `mov ESI,offset string1` and `mov EDI,offset string2`? — Michael Petch, Nov 30 '17 at 05:38
So that will be 16-bit real mode code. You need to copy DS to ES since `movsb` requires a full segment:offset for source and destination. A segment and offset combine together to form a physical address. Since string1 and string2 are in the same data segment (using NASM/IO.MAC and building for DOS) you need to initialize ES to DS. _DS_ gets set by the `.STARTUP` macro defined in `IO.MAC`. — Michael Petch, Nov 30 '17 at 05:58
More on segment:offset addressing and how they relate to physical addresses can be found here: [starmans real mode segmentation](http://thestarman.pcministry.com/asm/debug/Segments.html) — Michael Petch, Nov 30 '17 at 05:59
If you are using Windows, DOSBox and NASM, I highly doubt that you need to copy DS to ES. In that environment you'd be making a COM program, and at startup CS=DS=ES=SS. You'd also have to have an `org 0x100` statement and include IO.MAC. Did you actually test this code? As it is, it's not complete. I'd like to see your entire program, not just the snippet you show here. As it stands, the `mov AX,DS` `mov ES,AX` shouldn't be needed unless you clobbered ES elsewhere. — Michael Petch, Nov 30 '17 at 15:03

Peter Cordes · Accepted Answer · 2017-11-30T21:38:11.867

All the string instructions that use EDI use it as ES:EDI (or ES:DI / ES:RDI, depending on the address size).

Explicit addressing modes using EDI (like `[edi]`) default to DS, but `movs` / `stos` / `scas` / `cmps` (with or without `rep` / `repz` / `repnz`) all use `es:edi`. `lods` only uses `ds:esi`. (`rep lods` "works", but is rarely useful. With `cx` = 0 or 1 it can work as a slow conditional load because, unlike `loop`, `rep` checks `cx` before decrementing.)
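As a minimal sketch of those defaults (NASM syntax, 16-bit real mode, assuming a DOS .COM build so that DS and ES start out equal; the labels `src` and `dst` are hypothetical):

```nasm
bits 16
org 0x100                 ; assumption: DOS .COM program, CS=DS=ES=SS at entry

start:
    cld                   ; DF=0, so SI and DI increment
    mov   si, src
    mov   di, dst
    lodsb                 ; AL = [DS:SI], SI += 1
    stosb                 ; [ES:DI] = AL, DI += 1
    mov   al, [di]        ; explicit addressing mode: reads [DS:DI], not ES
    movsb                 ; copies one byte from [DS:SI] to [ES:DI]
    es lodsb              ; the source segment can be overridden: AL = [ES:SI]

    mov   ax, 0x4C00      ; DOS exit, return code 0
    int   0x21

src: db 'ab', 0           ; hypothetical source bytes
dst: times 4 db 0         ; hypothetical destination buffer
```

Because DS equals ES here, the distinction is invisible at run time; it only starts to matter once the two segment registers hold different values.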

Note that even though `scas` is read-only, it still uses ES:(r|e)di. This makes it work well with `lods`: load from one array with `lods`, then use `scas` to compare against a different array (optionally with some kind of processing of (r|e)ax before the compare).
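A rough sketch of that `lods` / `scas` pairing, again assuming a 16-bit DOS .COM build where ES = DS. The labels `input`, `ref` and `len` are hypothetical, and the `or al, 0x20` step is only a crude ASCII lower-casing that is correct for letters, used here as a stand-in for "some kind of processing":

```nasm
bits 16
org 0x100                 ; assumption: DOS .COM program, so ES = DS

start:
    cld
    mov   si, input       ; lodsb reads from DS:SI
    mov   di, ref         ; scasb reads from ES:DI
    mov   cx, len         ; len is a non-zero assemble-time constant
.next:
    lodsb                 ; AL = [DS:SI], SI += 1
    or    al, 0x20        ; crude lower-casing (valid for ASCII letters only)
    scasb                 ; set flags from AL - [ES:DI], DI += 1
    jne   .done           ; ZF=0: mismatch
    loop  .next           ; loop does not modify flags
.done:
    mov   ax, 0x4C00      ; mov leaves the flags untouched
    je    .exit           ; ZF=1: every byte matched, exit code 0
    inc   ax              ; mismatch: exit code 1
.exit:
    int   0x21

input: db 'HeLLo'
len    equ $ - input
ref:   db 'hello'         ; reference is already lower case
```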

Normally when you can use 32-bit addresses, you have a flat memory model where all segments have the same base and limit. Or if you're making a .COM flat binary with NASM, you have the tiny real-mode memory model where all segments have the same value. See @MichaelPetch's comments on this answer and on the question. If your program doesn't work without setting ES, you're doing something weird. (like maybe clobbering es somewhere?)

Note that `rep movsb` in 16-bit mode without an address-size prefix uses CX, DS:SI, and ES:DI, regardless of whether you used an operand-size prefix to write `edi` instead of `di` when setting up the pointers.
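For comparison, here is a sketch of the question's copy loop written with the 16-bit registers that `rep movsb` actually uses in this mode. It assumes a plain NASM flat-binary .COM build (for example `nasm -f bin copy.asm -o copy.com`, where the file name is hypothetical) rather than the book's `io.mac` macros:

```nasm
bits 16
org 0x100                 ; .COM programs are loaded at offset 0x100

start:
    ; DOS starts a .COM program with CS=DS=ES=SS, so these two lines are
    ; redundant here. They matter only when ES differs from DS.
    mov   ax, ds
    mov   es, ax

    mov   cx, strLen      ; byte count, including the terminating 0
    mov   si, string1     ; source:      DS:SI
    mov   di, string2     ; destination: ES:DI
    cld                   ; forward direction
    rep   movsb

    mov   ax, 0x4C00      ; DOS exit, return code 0
    int   0x21

string1: db 'The original string', 0
strLen   equ $ - string1
string2: times 80 db 0
```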

Also note that `rep` string instructions (and especially the non-`rep` versions) are often not the fastest way to do things. They're good for code size, but often slower than SSE/AVX loops.

`rep stos` and `rep movs` have fast microcoded implementations that store or copy in chunks of 16 or 32 bytes (or 64 bytes on Skylake-AVX512?). See Enhanced REP MOVSB for memcpy. With 32-byte aligned pointers and medium to large buffer sizes, they can be as fast as optimized AVX loops. With sizes below 128 or 256 bytes on modern CPUs, or with unaligned pointers, AVX copy loops typically win. Intel's optimization manual has a section on this.

But `repe cmpsb` is definitely not the fastest way to implement memcmp: use SSE2 or AVX2 SIMD compares (`pcmpeqb`), because the microcode still only compares a byte at a time. (Beware of reading past the end of the buffer; in particular, avoid crossing a page boundary, or preferably even a cache-line boundary.) Anyway, `repne` / `repe` don't have "fast strings" optimizations in Intel or AMD CPUs, unfortunately.
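A minimal sketch of one such SIMD compare step, written as a hypothetical x86-64 helper `eq16` for the System V calling convention (assemble with, say, `nasm -f elf64`). It checks exactly 16 bytes and assumes both pointers have at least 16 readable bytes, so the caller is responsible for the end-of-buffer caveat above:

```nasm
; Hypothetical helper, not from the original answer:
;   int eq16(const void *a /* rdi */, const void *b /* rsi */);
; Returns 1 in EAX if the 16 bytes at a and b are equal, else 0.
bits 64
global eq16
section .text
eq16:
    movdqu   xmm0, [rdi]      ; unaligned 16-byte load from a
    movdqu   xmm1, [rsi]      ; unaligned 16-byte load from b
    pcmpeqb  xmm0, xmm1       ; each byte becomes 0xFF where equal, 0x00 where not
    pmovmskb eax, xmm0        ; collect the top bit of each byte into bits 0..15
    cmp      eax, 0xFFFF      ; did all 16 bytes match?
    sete     al
    movzx    eax, al
    ret
```

A full memcmp replacement would loop over this step and handle the tail separately; that part is deliberately left out of the sketch.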

Just-To-Confuse-Op-With-More-Information: the "source"-like part `ds:si` of string instructions can be overridden by a segment prefix opcode, i.e. `es lodsb` will load a byte from `[es:si]`. But the "destination"-like part is hard-wired to `es:di` only; a segment prefix like `ds stosb` will be ignored. — Ped7g, Nov 30 '17 at 11:48
The SIMD compares have the disadvantage that they may read past the end of the memory area unless you are extra careful when writing your code. This is reasonably annoying to deal with. — fuz, Nov 30 '17 at 12:30
I think a better question is: if the user is using the environment he says he is in, why did he need to copy DS to ES at all? NASM will generate COM programs, not EXEs, and a COM program starts with CS=DS=ES=SS. Personally, the question doesn't pass a sniff test and it isn't a minimal complete example. The code posted won't assemble as is (since it needs io.mac), and to work properly it needs an `org 0x100` directive. — Michael Petch, Nov 30 '17 at 15:12
@MichaelPetch: Updated with a section about the OP's code being weird. The basic question about (e|r)di always using ES with string instructions is still what they're asking about, but your point is what they maybe should be asking about but aren't :P — Peter Cordes, Nov 30 '17 at 21:33
No Peter, the OP specifically says in his question _I see without them, the program doesn't work properly,_ . The reality is that if that is what he's observing, then that is wrong. Period. It makes me wonder if he actually attempted to assemble the code. If he had added `%include "io.mac"` and an `org 0x100` directive to his file (assuming he has the IO.MAC which his code is based on), then it should work without copying to ES. Since NASM can only directly generate COM programs, ES=DS=CS=SS is what DOS sets up before control is transferred to the program. — Michael Petch, Nov 30 '17 at 21:35
