I am trying to write a DOS clone in 16-bit real mode, although once I am done with the current issue, I may just learn 32-bit assembly instead. My questions have been negatively received in the past, but this one last time, I'm afraid I do need to consult this site. Since my previous question, I have made more of an effort to learn about pointers and the stack in my assembly code.
Apparently, the instruction cmpsb compares two strings, located in ES and DS respectively. I am trying to move the value of INPUT_STRING to DS and my value for shutmsg to ES, which seem like the correct registers (these are variables declared before). My instructions seem OK and they compile fine, but it doesn't work when I type shutdown, and when I run it through GDB, all it shows is

0x0000fff0 in ?? ()

I don't know what's wrong. I really don't. Here is my code:

prompts:
    mov si, prompt
    call prints
    call scans
    push ds
    mov ds, [INPUT_STRING]
    mov es, [shutmsg]  
    mov cx, 0xFFFF
    cld
    repe cmpsb
    pop ds
    je shutdown
    mov si, newline
    call prints
    mov si, INPUT_STRING
    call prints
    mov si, newline
    call prints
    jmp prompts

Thanks in advance and sorry if this is once again a bad question.

Sep Roland
Safal Aryal
  • 4
    DS and ES are _segment_ registers, only half the pointer. The full pointers to the two strings are DS:SI and ES:DI – cHao Nov 10 '17 at 15:43
  • 1
    Where did you read that `cmpsb` compares two strings "located in ES and DS respectively"? Seems like you needed to keep reading. – davmac Nov 10 '17 at 15:43
  • @cHao Hmmm... when tried, the assembler throws 'invalid combination of opcode and operands' – Safal Aryal Nov 10 '17 at 15:44
  • @davmac I think on an online documentation website somewhere, however, checking again, I see it's DS:SI and ES:DI – Safal Aryal Nov 10 '17 at 15:45
  • ....Tried what? – cHao Nov 10 '17 at 15:46
  • @cHao `mov ds:si, [INPUT_STRING]` and `mov es:di, [shutmsg]` – Safal Aryal Nov 10 '17 at 15:46
  • 2
    @SafalAryal I do wish you the best, but you're not going to get far with this if you skim-read stuff and jump to stackoverflow every time you hit the slightest obstacle. You need to do your own research. – davmac Nov 10 '17 at 15:47
  • @davmac Ok.. I'm sorry... – Safal Aryal Nov 10 '17 at 15:49
  • if you write `mov al,[bx]` the CPU will load a byte from address `ds:bx` (`ds` is the default for addressing by `bx`). If you use `mov al,[es:bx]`, nasm will add an "es" prefix to the same instruction, making the CPU fetch the byte value from address `es:bx`. `mov al,[bp]` by default loads a byte from address `ss:bp` (the stack segment is the default for `bp`) => lots of tiny details, which you should remember, or check in the docs. Valid 16-bit addressing: https://stackoverflow.com/a/12474190/4271923 ... – Ped7g Nov 10 '17 at 15:54
  • @SafalAryal: You can't set both parts at once with `mov`. You have to set each part separately, or use `lds` and `les` to load a pointer from memory. – cHao Nov 10 '17 at 15:54
  • and it looks like you don't understand the segment:offset 20-bit memory addressing: http://thestarman.pcministry.com/asm/debug/Segments.html ... finally, in your code you set `ds` to `0x2000` at the start, which looks like your data segment = probably correct, if the linker script defines it as that. Then, as both buffers are in the data segment, you can load `es` with the same value (in the same way as you set `ds`). Then to load the offset part just `mov di,shutmsg` is enough. ... actually I'm getting less and less sure about that 0x2000 data segment... hm hm. – Ped7g Nov 10 '17 at 15:56

1 Answer

In fact, cmpsb compares the byte at DS:SI with the byte at ES:DI, so with a rep prefix it compares the two strings that those register pairs point to. By ES:DI I mean the pair of registers ES and DI, where ES holds the segment address and DI holds the offset. (Do you understand segmented addressing? It is not clear from your question.)
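
As a rough illustration of what a segment:offset pair means in real mode (the values below are arbitrary and not taken from your code), the physical address is segment × 16 + offset:

```nasm
; Illustrative values only: 0x2000 and 0x0010 are assumptions for this sketch.
    mov ax, 0x2000
    mov es, ax          ; a segment register cannot be loaded with an immediate, so go through ax
    mov di, 0x0010
    mov al, [es:di]     ; reads the byte at physical address 0x2000 * 16 + 0x0010 = 0x20010
```

Note that the segment and the offset are always set by separate instructions; there is no single mov that loads a pair such as ES:DI.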

Looking at your code there are several things not right, for instance:

mov ds, [INPUT_STRING]

... would load the value in the memory location INPUT_STRING into the ds register. First, it's not clear from your question whether INPUT_STRING really contains a pointer to the string or the string itself, but I suspect (based on your subsequent code) that it is the location of the string itself. So, not only are you loading the wrong value, but you're only loading ds and not si.

The correct value for ds depends on the segment address of the string, and again that's not apparent from the question (so it is difficult to give you useful advice). If it's the same as the code segment, which is likely for a small program, you can set es and ds to the same value as cs, something like:

mov ax, cs
mov ds, ax
mov es, ax

(But again, that's just an example. The correct solution depends on the rest of your code). You would then need to load the si and di registers:

mov si, INPUT_STRING
mov di, shutmsg

You are loading cx with 0xFFFF:

mov cx, 0xFFFF

Why? Are the strings exactly 0xFFFF bytes long? If not, repe cmpsb can keep comparing past the end of the strings and give you a bogus result.
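
Putting those points together, a minimal sketch of the comparison might look like the following. It rests on several assumptions that your question does not confirm: that both strings live in the same segment as the code, that scans stores a nul-terminated string at INPUT_STRING, and that shutmsg is defined as shown. SHUTMSG_LEN is a name made up for this example.

```nasm
; Sketch only: assumes the data shares the code segment and that
; INPUT_STRING holds a nul-terminated string written by scans.
    mov ax, cs
    mov ds, ax              ; ds = segment of INPUT_STRING (assumption)
    mov es, ax              ; es = segment of shutmsg (assumption)
    mov si, INPUT_STRING    ; ds:si -> what the user typed
    mov di, shutmsg         ; es:di -> the command to match
    mov cx, SHUTMSG_LEN     ; compare at most 9 bytes, nul included
    cld                     ; make si and di count upwards
    repe cmpsb              ; stops at the first mismatch or when cx reaches 0
    je shutdown             ; ZF is set only if every compared byte matched

; Hypothetical data definitions, kept with the other data and
; outside the instruction stream:
shutmsg:        db 'shutdown', 0
SHUTMSG_LEN     equ $ - shutmsg     ; 9: eight letters plus the nul
```

Including the nul terminator in the count means that input such as "shutdownx" does not match, and shorter input stops the comparison at its own terminator.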

In summary:

  • you need to understand the segmented addressing model, and the purpose of the segment registers
  • you need to understand the difference between string values and string pointers, and between values and pointers generally
  • you need to understand how strings are stored in memory
  • you need to read and understand the instruction behaviour/semantics
prl
davmac
  • Thanks! According to a previous question I asked, both values are pointers to the string. If you want to (by no means do, I can sort this out now) my full code is https://github.com/safsom/kerryos. And I was told on another SO question (not one of my own) that C-style strings created by my input function have that length – Safal Aryal Nov 10 '17 at 16:08
  • Ok, I'm reading on the segmented addressing model now. I do need to learn this. – Safal Aryal Nov 10 '17 at 16:10
  • @SafalAryal from your code I see that `INPUT_STRING` is a label to a reserved area of bytes. So, `mov si, INPUT_STRING` loads `si` with a pointer to that area, yes. But `mov si, [INPUT_STRING]` does something else - it loads a word-sized value from that memory area. If `INPUT_STRING` is a variable, then we would _not_ say that it is a pointer. However different people might phrase this slightly differently. – davmac Nov 10 '17 at 16:11
  • @SafalAryal in your case `mov si, INPUT_STRING` is correct, yes. – davmac Nov 10 '17 at 16:13
  • 1
    @SafalAryal I am glad to help. However: there are plenty of code examples out there - read them, understand them. They will help you even more. – davmac Nov 10 '17 at 16:15
  • 1
    @SafalAryal also: If `scans` is your input function, it most certainly does not always create strings of length `0xffff`. If you type `shutdown` and that is then terminated by a nul, how could the length be 0xffff? It is 8, or 9 if you count the nul terminator. – davmac Nov 10 '17 at 16:22
  • 1
    So I will put 9 in the cx register! – Safal Aryal Nov 11 '17 at 00:32