
I'm just getting into the native applications world and I can't figure out how to assign a pointer to part of the string.

For example I have the following:

std::string data = "put returns between paragraphsfor linebreak add 2 spaces at end_italic_ or **bold**indent code by 4 spacesbacktick escapes `like _so_`quote by placing > at start of line to make links";

The first char of the word returns has index 4 and the last has index 10. My question is how I can point to the word returns, for example, without copying it to a new string.

Thanks.

EDIT:

struct Node {
     const char * ptrToNodeStart;  // points into the original string, not a copy
     int nodeLen;                  // number of characters that belong to the node
};

I found this snippet, but can anyone please explain to me how to use it?

Deepsy
  • You can point to the characters, but there won't be a C-string style `NULL` terminator there without modifying the original string. You're going to have to copy to make a self-contained string that contains only "return". – tadman Apr 15 '14 at 17:50
  • Well my texts contain 10k+ chars and I might need to do 1000+ links to single text, it won't be really memory efficient. I edited my question. – Deepsy Apr 15 '14 at 17:53
  • How many of these do you have? How many copies will you be creating? Some simple math might show it's not a big deal, that you might use megabytes of data at most. Your `Node` idea could work, but it's going to be a big hassle to make those work reliably. A lot of things can invalidate the results of [`c_str`](http://en.cppreference.com/w/cpp/string/basic_string/c_str), so your `const char*` pointer might end up in an invalid memory location if it's not kept in perfect sync. – tadman Apr 15 '14 at 18:04
  • For an implementation strategy, I'd start with something awful that works, then see how much memory it uses. If it's mostly acceptable, stick with it. If not, you might want to go down your `Node` path. I'd create a container object for both `const std::string` and a list of `Node` objects to co-exist in a safe environment. – tadman Apr 15 '14 at 18:07

2 Answers


1. "memory efficient"

"my texts contain 10k+ chars and I might need to do 1000+ links to single text, it won't be really memory efficient" - let's say you have 1000 strings of 10 000 characters each: that's 10 million bytes, or about 9.54 MiB. Even with some overhead on top, this stays on the order of 10 MB of memory.

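The estimate above can be checked with a few lines of arithmetic. This is only a sketch: `totalBytes` and `toMiB` are hypothetical helper names, and the calculation assumes one byte per character and ignores the per-string overhead of `std::string`.

```cpp
#include <cstddef>

// Total payload in bytes for `count` strings of `length` chars (1 byte per char).
constexpr std::size_t totalBytes(std::size_t count, std::size_t length) {
    return count * length;
}

// Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double toMiB(std::size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// 1000 strings of 10 000 characters each are 10 million bytes.
static_assert(totalBytes(1000, 10000) == 10000000, "10 million bytes expected");
```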
2. pros and cons

Of course you can work with these "links" in the form of structures, but spend a while thinking about why you are doing it. What advantages will this approach have? If memory efficiency is the only reason to do it, you are most likely entering the dungeons of premature optimization.

3. "how to assign a pointer to part of the string"

std::string str = "My simple string";
Node n;
n.ptrToNodeStart = &str[3];
n.nodeLen = 6;
// n is expected to refer to string "simple" here

But actually it's not that simple. str is an object with automatic storage duration: once execution leaves its scope, the memory where the string was stored is freed. That is problem 1: lifetime. If your string is gone (or has been modified in a way that made it reallocate its buffer) and you still keep your Nodes, these become nothing but a bunch of invalid / dangling pointers.

problem 2: Since there is no null-terminating character, you are unable to treat this pointer in Node as a string. If you are going to need it, you are most likely about to end up with C-style error-prone code that will be doing evil stuff like:

std::string tempStr(n.nodeLen, '\0');   // std::string adds its own terminator
memcpy(&tempStr[0], n.ptrToNodeStart, n.nodeLen);
// yehey !!! I can finally use this annoying Node as a normal string now ....

or even worse:

char *buffer = new char[n.nodeLen + 1];
memcpy(buffer, n.ptrToNodeStart, n.nodeLen);
buffer[n.nodeLen] = '\0';
// using buffer here ...
delete[] buffer;
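If the Node approach is used anyway, neither workaround above is strictly necessary, because `std::string` has a constructor that takes a pointer and a character count, and `std::ostream::write` can print a range that has no terminator. The following is a minimal sketch under those assumptions; `makeNode`, `toString` and `print` are hypothetical helper names, not part of the snippet from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <string>

// Same layout as the Node struct from the question.
struct Node {
    const char* ptrToNodeStart;
    int nodeLen;
};

// Hypothetical helper: build a Node over text[start, start + len).
// The Node is only valid while `text` is alive and unmodified.
inline Node makeNode(const std::string& text, std::size_t start, std::size_t len) {
    assert(start <= text.size() && len <= text.size() - start);
    return Node{text.data() + start, static_cast<int>(len)};
}

// Copy the referenced characters into an owning std::string on demand.
inline std::string toString(const Node& n) {
    return std::string(n.ptrToNodeStart, static_cast<std::size_t>(n.nodeLen));
}

// Write exactly nodeLen characters; no null terminator is required.
inline std::ostream& print(std::ostream& os, const Node& n) {
    return os.write(n.ptrToNodeStart, n.nodeLen);
}
```

With `std::string str = "My simple string";`, calling `toString(makeNode(str, 3, 6))` yields a copy holding "simple", while `print(std::cout, node)` outputs the same six characters without copying. The lifetime problem described above still applies to every Node.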

My advice:
  • avoid using pointers if possible
  • stick to neat objects from the standard library
  • go for std::string objects, std::vector<std::string> stringParts ...
  • use std::string::substr():

    std::string str = "My simple string";
    std::string part = str.substr(3,6);
    // part holds string "simple" now
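One way to combine the substr() advice with the container idea from the comments is to store offsets instead of pointers and copy a part out only when it is actually needed. This is a sketch, not a definitive design: the class name `LinkedText` and its member functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: owns the text and remembers parts as offsets.
class LinkedText {
public:
    explicit LinkedText(std::string text) : text_(std::move(text)) {}

    // Remember the range [start, start + len); returns the index of the new part.
    std::size_t addPart(std::size_t start, std::size_t len) {
        assert(start <= text_.size() && len <= text_.size() - start);
        parts_.push_back(std::make_pair(start, len));
        return parts_.size() - 1;
    }

    // Copy a part out only when a real std::string is needed.
    std::string part(std::size_t index) const {
        const std::pair<std::size_t, std::size_t>& p = parts_.at(index);
        return text_.substr(p.first, p.second);
    }

    std::size_t partCount() const { return parts_.size(); }

private:
    const std::string text_;  // never modified, so the offsets stay meaningful
    std::vector<std::pair<std::size_t, std::size_t>> parts_;
};
```

Because the text and the offsets live in the same object, no part can outlive the string it refers to, and each stored part costs two `std::size_t` values rather than a copy of the characters.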
    
LihO
  • But why, every time I pass &str[3] to a function and cout the value, do I get the whole string and not just this char? – Deepsy Apr 16 '14 at 15:48
  • @Deepsy: Because that's the thing that I referred to in my answer as "problem 2". The string is not null-terminated, and that's why you have to abuse the `std::string`'s constructor and `memcpy` to its internal buffer to retrieve the substring that you need... this whole thing is a really bad design decision. – LihO Apr 16 '14 at 16:36
  • Hmm, so the last answer http://stackoverflow.com/questions/23072690/parse-simple-html-with-pure-c/23075222?noredirect=1#23075222 here is wrong? – Deepsy Apr 16 '14 at 16:40
  • @Deepsy: Well the linked answer is definitely not the ideal approach and definitely not a general "pattern" how to "optimize". Take your time to read what Steve Jessop wrote in his answer in that question. He came to the same conclusion that I did, that the most reasonable would probably be a vector of objects without optimizing it in advance at all (that's premature optimization...). – LihO Apr 16 '14 at 16:58

You could define an object that holds a pair of positions marking the start and end in your big string, and then give it a string-like interface. However, when the original string goes out of scope, the view will point to nirvana and your program hopefully crashes, so using it can be dangerous.

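#include <string>

// Non-owning view of the range [from, to) of an existing std::string.
// The referenced string must outlive the view and must not be modified.
class StringView {
public:
    typedef std::string::const_iterator iterator;
    typedef std::string::const_iterator const_iterator;
    typedef std::string::size_type size_type;

    StringView(std::string const& s,
               size_type from,
               size_type to): data_(&s), first_(from), last_(to) {}

    // The view is empty when its range is empty, whatever the full string holds.
    bool empty() const {return first_ == last_;}
    size_type length() const {return last_ - first_;}

    // data_ is a pointer, so it is dereferenced with ->.
    iterator begin() const {return data_->begin() + first_;}
    iterator end() const {return data_->begin() + last_;}

    // add more functions for strings here

private:
    std::string const* data_;
    std::string::size_type first_, last_;
};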

When you start doing this, you could also have a look at Herb Sutter's GotW #84, Monoliths "Unstrung", which explains why the std::string class interface should be improved and how.

A string_view class is proposed for standardization. STLport also contains an example implementation.
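That proposal was later adopted: C++17 provides `std::string_view` in the `<string_view>` header. Assuming a C++17 compiler, a view onto part of a string can be sketched as follows; `viewOf` is a hypothetical helper name, and the same lifetime caveat applies as for the hand-written class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: view of text[start, start + len) without copying.
// The view dangles if `text` is destroyed or its buffer is reallocated,
// so it must not be called with a temporary string.
inline std::string_view viewOf(const std::string& text,
                               std::size_t start,
                               std::size_t len) {
    assert(start <= text.size());
    // string_view::substr clamps len to the characters that remain.
    return std::string_view(text).substr(start, len);
}
```

For the string from the question, `viewOf(data, 4, 7)` refers to the characters of "returns" inside `data` itself; an owning copy is made only when it is explicitly converted with `std::string(view)`.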

Jens