
I'd like to write a simple string split function.

The function should take one std::basic_string and a delimiter (possibly a CharT or std::basic_string), and put the result into a ContainerT.

My first attempt is:

template <typename StringT, typename DelimiterT, typename ContainerT>
void split(
    const StringT &str, const DelimiterT &delimiters, ContainerT &conts) {
    conts.clear();
    std::size_t start = 0, end;
    std::size_t len = delimiters.size();
    while ((end = str.find(delimiters, start)) != StringT::npos) {
        if (end - start) {
            conts.emplace_back(str, start, end - start);
        }
        start = end + len;
    }

    if (start != StringT::npos && start < str.size()) {
        conts.emplace_back(str, start, str.size() - start);
    }
}

My final goal is to extend this function to achieve:

  1. The results are always std::basic_string<CharT> objects put into some container.
  2. The first argument str could be std::basic_string<CharT>, const CharT* or a string literal.
  3. The second argument, the delimiter, could be a single char, or a std::basic_string<CharT>/const CharT*/string literal, in which case the delimiter can be longer than one character, e.g. splitting aaa,,bbb,c by ,, gives aaa/bbb,c.
  4. The third argument can be any sequence container from STL.

Since one usually deals with modern strings in C++, requirement 2 may be restricted to std::basic_string<CharT> for simplicity.

Given that the function (template) can be overloaded, I wonder:

  1. At minimum, how many functions would I need in this situation?
  2. What's the best practice for designing such functions (i.e. how do I write more generic functions)? For example, to make the above function work with a const CharT* delimiter, must the line std::size_t len = delimiters.size(); be changed to something like std::distance(...)?
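A sketch of one possible answer to question 2, under my own assumptions (it is not taken from the answers below): std::basic_string::find already has overloads for a single CharT, a const CharT* and a std::basic_string, so in the function above only the delimiter length has to change. The helper delim_length is hypothetical; it uses if constexpr (C++17) and std::basic_string_view rather than std::distance.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

// Hypothetical helper: length of the delimiter, whatever form it takes.
template <typename CharT, typename DelimT>
std::size_t delim_length(const DelimT& d) {
    if constexpr (std::is_same_v<DelimT, CharT>)
        return 1;                                        // a single character
    else
        return std::basic_string_view<CharT>(d).size();  // const CharT*, literal, basic_string
}

// The question's function with .size() replaced by delim_length.
// basic_string::find already accepts CharT, const CharT* and basic_string.
template <typename StringT, typename DelimiterT, typename ContainerT>
void split(const StringT& str, const DelimiterT& delimiters, ContainerT& conts) {
    using CharT = typename StringT::value_type;
    conts.clear();
    const std::size_t len = delim_length<CharT>(delimiters);
    if (len == 0) {                      // an empty delimiter would loop forever
        if (!str.empty()) conts.emplace_back(str);
        return;
    }
    std::size_t start = 0, end;
    while ((end = str.find(delimiters, start)) != StringT::npos) {
        if (end != start) conts.emplace_back(str, start, end - start);  // skip empty pieces
        start = end + len;
    }
    if (start < str.size()) conts.emplace_back(str, start, str.size() - start);
}
```

With this change a string literal such as ",,", a const char*, a std::string and a single ',' all work as the delimiter, and the output can be any sequence container that provides emplace_back.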

Update:

A relevant code review is added here.

Saddle Point
  • I find this tricky because of all the potential combinations. I have tried many approaches. The current one I use has a **master** template function taking only iterators for the input string and for the delimiters (four iterators s_begin, s_end, d_begin, d_end). Then I have a whole lot of different overloads that convert their input parameters into that most basic form. – Galik May 27 '18 at 07:00
  • Have you seen the top answer to https://stackoverflow.com/questions/236129/the-most-elegant-way-to-iterate-the-words-of-a-string – KarlM May 27 '18 at 23:04
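A minimal sketch of the iterator-based design Galik describes in the first comment above. The details are my own assumptions: split_range is a hypothetical name for the "master" template, it uses std::search to locate each occurrence of the delimiter, and the split overloads show how the other argument kinds can be turned into iterator pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Master version: text in [s_begin, s_end), delimiter in [d_begin, d_end).
// Appends each non-empty piece to out (out is not cleared).
template <typename TextIt, typename DelimIt, typename ContainerT>
void split_range(TextIt s_begin, TextIt s_end, DelimIt d_begin, DelimIt d_end,
                 ContainerT& out) {
    if (d_begin == d_end) {                    // empty delimiter: whole text is one piece
        if (s_begin != s_end) out.emplace_back(s_begin, s_end);
        return;
    }
    const auto d_len = std::distance(d_begin, d_end);
    while (true) {
        TextIt hit = std::search(s_begin, s_end, d_begin, d_end);
        if (hit != s_begin) out.emplace_back(s_begin, hit);  // skip empty pieces
        if (hit == s_end) break;
        s_begin = std::next(hit, d_len);
    }
}

// Thin overloads that convert their arguments into iterator pairs.
template <typename CharT, typename ContainerT>
void split(const std::basic_string<CharT>& text, CharT delim, ContainerT& out) {
    split_range(text.begin(), text.end(), &delim, &delim + 1, out);
}

template <typename CharT, typename ContainerT>
void split(const std::basic_string<CharT>& text, const CharT* delim, ContainerT& out) {
    split_range(text.begin(), text.end(), delim,
                delim + std::char_traits<CharT>::length(delim), out);
}

template <typename CharT, typename ContainerT>
void split(const std::basic_string<CharT>& text, const std::basic_string<CharT>& delim,
           ContainerT& out) {
    split_range(text.begin(), text.end(), delim.begin(), delim.end(), out);
}
```

Because the master template only sees iterators, every new input form (a string_view, a std::array of characters, and so on) costs one more small overload rather than another copy of the splitting loop.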

3 Answers


You can use std::string_view for both the text to be split and the delimiter. Additionally, you can use a template template parameter to choose the type of the elements in the result:

template<typename Char, template<typename> class Container, typename String>
Container<String> split_impl(std::basic_string_view<Char> text, std::basic_string_view<Char> delim)
{
    Container<String> result;
    //...
    result.push_back(String(text.substr(start, count)));
    //...
    return result;
}

template<template<typename> class Container, typename String = std::string_view>
Container<String> split(std::string_view text, std::string_view delim)
{ return split_impl<char, Container, String>(text, delim); }

template<template<typename> class Container, typename String = std::u16string_view>
Container<String> split(std::u16string_view text, std::u16string_view delim)
{ return split_impl<char16_t, Container, String>(text, delim); }

This way, it can be used with std::string, std::string_view and const char* without redundant allocations:

// vector of std::string_view objects:
auto words_1 = split<std::vector>("hello world", " ");

// list of std::string objects:
auto words_2 = split<std::list, std::string>(std::string("hello world"), " ");

// vector of std::u16string_view objects:
auto words_3 = split<std::vector>(u"hello world", u" ");
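The //... parts of split_impl are left out above. A self-contained sketch of one way to fill them in follows; the loop is my own assumption, not joe_chip's code. I declare the template template parameter as template <typename...> class so that std::vector and std::list, which also take an allocator parameter, bind to it regardless of whether the compiler implements the relaxed C++17 matching rules for template template arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <string_view>
#include <vector>

template <typename Char, template <typename...> class Container, typename String>
Container<String> split_impl(std::basic_string_view<Char> text,
                             std::basic_string_view<Char> delim)
{
    Container<String> result;
    if (delim.empty()) {                       // avoid an endless loop on ""
        if (!text.empty()) result.push_back(String(text));
        return result;
    }
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(delim, start);
        if (end == std::basic_string_view<Char>::npos) end = text.size();
        std::size_t count = end - start;
        if (count != 0)                        // skip empty pieces, as in the question
            result.push_back(String(text.substr(start, count)));
        start = end + delim.size();
    }
    return result;
}

template <template <typename...> class Container, typename String = std::string_view>
Container<String> split(std::string_view text, std::string_view delim)
{ return split_impl<char, Container, String>(text, delim); }

template <template <typename...> class Container, typename String = std::u16string_view>
Container<String> split(std::u16string_view text, std::u16string_view delim)
{ return split_impl<char16_t, Container, String>(text, delim); }
```

When String is a view type, the pieces point into the original text and must not outlive it. That is why the std::list example above asks for std::string when splitting a temporary std::string.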

Edit: added overloads for char and char16_t

Edit 2

In the code above, split_impl does the actual work. The split overloads are provided only to simplify user code, so that you don't have to specify the character type explicitly. Without them you would have to, because the compiler can't deduce Char when the parameter type is basic_string_view<Char> and you pass an argument of a different type (for example, const char* or std::wstring): implicit conversions are not considered during template argument deduction. In general, I don't think this is a big problem; you probably want four overloads (char, char16_t, char32_t, wchar_t), if not fewer.

However, for completeness, here's an alternative that doesn't use overloads:

template<typename ContainerT, typename TextT, typename DelimT>
ContainerT split(const TextT& text, const DelimT& delim)
{
    // text[0] is a const CharT&, so strip both the reference and the const
    using CharT = std::remove_cv_t<std::remove_reference_t<decltype(text[0])>>;

    std::basic_string_view<CharT> textView(text);
    std::basic_string_view<CharT> delimView(delim);

    ContainerT result;

    // actual implementation, but using textView and delimView instead of text and delim

    // emplace_back also works when ContainerT holds std::basic_string,
    // whose constructor from a string view is explicit
    result.emplace_back(textView.substr(start, count));

    return result;
}

// usage:
auto words = split<std::vector<std::string_view>>("some text", " ");

With this approach you cannot give the String template parameter a default value as above, because it would have to depend on TextT; for this reason, I removed it. Also, this code assumes that text and delim use the same character type and can be converted to basic_string_view.
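A self-contained sketch of this overload-free variant, with the elided loop filled in by me (an assumption, not the answer's code). Two details matter: text[0] yields a const CharT& for std::string as well as for string literals, so both the reference and the const must be stripped before the type is used with basic_string_view; and emplace_back lets the same code fill a container of std::basic_string, whose constructor from a string view is explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>

template <typename ContainerT, typename TextT, typename DelimT>
ContainerT split(const TextT& text, const DelimT& delim)
{
    // text[0] is const CharT&; strip both the reference and the const.
    using CharT = std::remove_cv_t<std::remove_reference_t<decltype(text[0])>>;

    std::basic_string_view<CharT> textView(text);
    std::basic_string_view<CharT> delimView(delim);

    ContainerT result;
    if (delimView.empty()) {                   // avoid an endless loop on ""
        if (!textView.empty()) result.emplace_back(textView);
        return result;
    }
    std::size_t start = 0;
    while (start <= textView.size()) {
        std::size_t end = textView.find(delimView, start);
        if (end == textView.npos) end = textView.size();
        if (end != start)                      // skip empty pieces
            result.emplace_back(textView.substr(start, end - start));
        start = end + delimView.size();
    }
    return result;
}
```

The element type is chosen by the caller through ContainerT, e.g. split<std::vector<std::string>>(s, ",,") for owning strings or split<std::vector<std::string_view>>("some text", " ") for views.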

Personally, I prefer version 1. It doesn't use template types for function parameters, which IMHO is better, as it gives the caller a clearer idea of what should be passed in. In other words, the interface of the first split is better specified. Also, as noted above, I don't consider having to add four overloads of split a problem.

joe_chip
  • Thanks for your answer. This version doesn't seem to handle `wstring/u16string/u32string`. While `string` can be converted to `string_view` implicitly, `basic_string` can't be converted to `basic_string_view` during template argument deduction. – Saddle Point May 27 '18 at 07:33
  • I've updated answer to include overloads for char and char16_t. I think that solves the problem. – joe_chip May 27 '18 at 09:03
  • Is it possible to merge these `char/wchar_t/char16_t...` in some way? – Saddle Point May 27 '18 at 11:03
  • Updated again with one way I can think of. – joe_chip May 27 '18 at 20:22

Construct a basic_string_view from each input string, then operate on those. basic_string_view has a non-explicit constructor taking a const CharT*, and basic_string has a conversion operator to basic_string_view.

Sneftel
  • Should one use `std::basic_string_view` as the template function parameter? It seems that a `std::string` argument can't be converted implicitly in that case, which loses some convenience when passing arguments. – Saddle Point May 27 '18 at 07:02
  • `std::string` has a conversion operator for converting to `std::string_view` so this shouldn't be a problem. – joe_chip May 27 '18 at 07:29
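A small example of my own to illustrate the exchange above; the names takes_view, takes_any_view and as_view are hypothetical. A non-template function taking std::string_view accepts a std::string through its conversion operator, but a function template taking std::basic_string_view<CharT> cannot deduce CharT from a std::string or a const char*, because implicit conversions are not considered during deduction. A pair of as_view helpers that deduce CharT from the actual argument type bridges the gap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Non-template: std::string and const char* both convert implicitly.
std::size_t takes_view(std::string_view s) { return s.size(); }

// Template: CharT must be deduced, so only an actual basic_string_view matches.
template <typename CharT>
std::size_t takes_any_view(std::basic_string_view<CharT> s) { return s.size(); }

// Hypothetical helpers that deduce CharT from the real argument type.
template <typename CharT>
std::basic_string_view<CharT> as_view(const CharT* s) { return s; }

template <typename CharT, typename Traits, typename Alloc>
std::basic_string_view<CharT, Traits> as_view(const std::basic_string<CharT, Traits, Alloc>& s) {
    return s;
}

// Usage:
//   std::string str = "hello";
//   takes_view(str);               // OK
//   takes_any_view(str);           // error: CharT cannot be deduced
//   takes_any_view(as_view(str));  // OK, CharT = char
```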

My suggestion would be to use two template parameters, one for the character type of the input string and one for the output container, because in almost all cases the input string, the delimiter and the elements of the output container will use the same character type. So you can define your function something like this:

template<typename charT, typename Container>
void split(const std::basic_string<charT> &input,
    const charT delimiter,
    Container &cont);   // Container holds std::basic_string<charT>

For the 2nd scenario, the delimiter parameter can be of type std::basic_string<charT> instead.

Vikash Kesarwani