2

Do string literals have a particular type in C# (like const char* in C++), or does C# just create a new string object for each string literal that appears in a program? I am curious to understand what happens behind the scenes when the following statement is executed:

string s1 = "OldValue";

Does this call a particular method in the string class (a constructor, an implicit conversion operator, ...), or does C# create a new string object that contains "OldValue" and then just assign its reference to s1, as it would for any reference type?

I am trying to understand what it is in the design of the string class that guarantees the value of s1 remains "OldValue":

string s2 = s1;
s2 = "NewValue";
quetzalcoatl
sam95
  • 1
    http://stackoverflow.com/questions/4286614/c-sharp-do-string-literals-get-optimised-by-the-compiler – Tim Schmelter Jun 29 '15 at 11:45
  • 1
    Do you know what [reference types](https://msdn.microsoft.com/en-US/library/490f96s2.aspx) are? – Sinatr Jun 29 '15 at 11:48
  • @Sinatr: yes, I know what reference types are. I am just trying to understand: when I write s2 = "NewValue", does it mean that a string object with value "NewValue" is created, and then a reference to it is assigned to s2? It seems to be the case, judging from quetzalcoatl's answer below. – sam95 Jun 30 '15 at 13:39

3 Answers

6

To your last question, why the values were preserved: it is not in the String class. It is in the way that object references work.

The String class is not a value type, it is a reference type. It is a full-featured object that is not copied around when "passed into/from variables".

When you write:

string s1 = "mom";
string s2 = s1;
string s3 = s1;
s3 = "dad";

there is only one instance of "mom", that is first created somewhere in the heap, then a reference to it is assigned to s1. Then another reference is created and assigned to s2. Then another reference is created and assigned to s3. No copies. References. Like for any real, normal CLR object.

Finally, in the last line, another string is created on the heap and then a reference to it is assigned to the s3 variable. Note that this sentence says absolutely nothing about the "mom" object or the s1/s2 variables. They didn't notice a thing.

Remember that String is not a value type. It is just a normal immutable object that has some handy Equals and GetHashCode overrides. The String class has a little magic inside, but it is not relevant here.
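
The walkthrough above can be checked with a minimal sketch. The class and method names here (`ReferenceDemo`, `Run`) are made up for illustration; only `object.ReferenceEquals` is a real framework call. It tests whether two variables point at the same object, regardless of content.

```csharp
using System;

public static class ReferenceDemo
{
    // Hypothetical helper: replays the mom/dad snippet and reports what happened.
    public static string Run()
    {
        string s1 = "mom";
        string s2 = s1;   // copies the reference, not the characters
        string s3 = s1;   // a third reference to the same object

        s3 = "dad";       // rebinds s3 only; s1 and s2 are not touched

        bool s1AndS2Share = object.ReferenceEquals(s1, s2); // same object
        bool s1AndS3Share = object.ReferenceEquals(s1, s3); // different objects now

        return $"{s1AndS2Share} {s1AndS3Share} {s1} {s2} {s3}";
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "True False mom mom dad"
    }
}
```

Nothing string-specific is involved: the same pattern with any other reference type would give the same result, because assignment to a variable only changes which object that variable refers to.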

Yuval Itzchakov
quetzalcoatl
  • 1
    Nice answer, in looking up details for this question I discovered something I didn't know. The CLR implements a flyweight pattern as far as different string literals of an exact same value are concerned. Duplicate string literals are actually assigned to the same reference object. – Antonio Haley Jun 29 '15 at 11:56
  • 1
    @AntonioHaley That's called string interning. You can invoke it yourself using [`string.Intern()`](https://msdn.microsoft.com/en-us/library/system.string.intern(v=vs.110).aspx) – Yuval Itzchakov Jun 29 '15 at 12:01
  • @quetzalcoatl: thanks! So regarding my first question: string s1 = "mom" means that a string object with value "mom" is first created, then s1 is set to "point" to it... so string literals are treated just like normal string objects (meaning every time a string literal is mentioned in the code, a string object is created to contain it). – sam95 Jun 30 '15 at 13:43
  • @sam95: you're starting to get it, but no, it is not like that. String literals are (or may be) **interned automatically**. This means that in `var s1 = "mom"; var s2 = "mom"; var s3 = "mom"` and even in a completely different class `var tadam = "mom"`, the compiler will notice that the exact same string literal exists 4 times and will create **only one** string-literal entry in the compiled assembly. Later, when executing, the runtime notices that a string from the literal table is used and will **not create duplicates** but will reuse the same string object whenever that literal entry is referenced. – quetzalcoatl Jun 30 '15 at 15:02
  • @sam95: the reason is simple: `string` objects are assumed and designed to be immutable. Hence, there is no real reason **not to intern** strings that are hardcoded in the code. Since they are written directly in code, they **will** be used at some point in time, they will always be the same, and since they are 'code' you want them to be *fast*. Other strings created at runtime from user/file/network/etc. data can be **larger** and **their contents vary a lot**, hence there's no automatic interning of them. It would spam the intern pool too much, so be careful with the `string.Intern()` method. – quetzalcoatl Jun 30 '15 at 15:02
  • And one more thing: string interning **has nothing to do** with the original question of `var s1="mom"; var s2 = s1; s2 = "dad"`. Here the literals "mom" and "dad" each appear only once, so you wouldn't even notice any interning. The "preservation" and "single-instance" of the "mom" object here is completely due to the way the code is written: one fetch-literal("mom"), reference-assign, reference-assign, one fetch-literal("dad"), reference-assign. – quetzalcoatl Jun 30 '15 at 15:08
  • In the answer I wrote "in the last line, another string is created on the heap" very loosely to make you focus on **references vs. objects**. The fact that the string was not actually "created" at that point and only a ref-to-intern-pool was used was irrelevant. Moreover, IIRC, the interning might be skipped altogether for some very small strings, but I wouldn't dig into that now. In (100%-epsilon) of cases it's completely transparent to you. – quetzalcoatl Jun 30 '15 at 15:12
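
The interning behaviour discussed in these comments can be observed with a small sketch. The class and method names (`InternDemo`, `LiteralsShareObject`, and so on) are invented for illustration; `string.Intern` and `object.ReferenceEquals` are the real framework calls. The result of the `string.Intern` check is deliberately not stated, since it depends on the literal having been placed in the runtime's intern pool.

```csharp
using System;

public static class InternDemo
{
    // Two occurrences of the same literal in one assembly refer to one object.
    public static bool LiteralsShareObject()
    {
        string a = "mom";
        string b = "mom";
        return object.ReferenceEquals(a, b);
    }

    // A string built at run time has equal content but is a separate object.
    public static bool RuntimeStringSharesObject()
    {
        string literal = "mom";
        string built = new string(new[] { 'm', 'o', 'm' });
        return literal == built && object.ReferenceEquals(literal, built);
    }

    // string.Intern returns the pooled instance for the given content,
    // which is normally the same object the literal refers to.
    public static bool InternedSharesObject()
    {
        string literal = "mom";
        string built = new string(new[] { 'm', 'o', 'm' });
        return object.ReferenceEquals(literal, string.Intern(built));
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareObject());       // True
        Console.WriteLine(RuntimeStringSharesObject()); // False
        Console.WriteLine(InternedSharesObject());
    }
}
```

Note that `==` on strings compares contents, which is why reference identity has to be checked with `object.ReferenceEquals` here.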
0

Good question.

Actually, in C# strings are stored as a character buffer, where every string requires 20 bytes of overhead plus 2 bytes for each character. So whenever you declare a string, e.g. string s1 = "Bhushan";, one string buffer is created with the following memory requirements:

Overhead: 20 bytes. Characters: 2 bytes each, so 2 * 7 = 14 bytes. Overall, it requires 20 + 14 = 34 bytes.

0

string is an immutable class, which means that every time we change the value, a new instance is created.

string s2 = s1;
s2 = "NewValue";

It can be explained like this:

string s2 = s1;
s2 = new string("NewValue"); // It doesn't compile, just an example.

And for string modification, it can be explained like this.

string s = "blah";
s = s.Insert(0, "blah"); // Insert returns a new instance; s now refers to it

The same like:

string s = "blah";
s = new string("blah") + new string("blah"); // Doesn't compile, just an explanation
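
A runnable sketch of the same point, with made-up names (`ImmutabilityDemo`, `Run`): `Insert` never changes the string it is called on. It returns a new string, and the variable only changes if the result is assigned back.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Hypothetical helper: shows that Insert leaves the original string alone.
    public static string Run()
    {
        string s = "blah";
        string original = s;       // second reference to the same "blah" object

        s.Insert(0, "blah");       // result is discarded; s is unchanged
        string afterDiscard = s;

        s = s.Insert(0, "blah");   // s now refers to the new string

        return $"{afterDiscard} {s} {original}";
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "blah blahblah blah"
    }
}
```

The `original` variable still sees "blah" at the end, which is the same guarantee the question asks about for s1.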
Yohanes Nurcahyo