C++ - Strings & Text Processing

Overview

Estimated time: 60–80 minutes

Manipulate text with std::string and std::string_view. Learn to search, replace, trim, and split strings, and pick up best practices for performance.

Learning Objectives

  • Use std::string and std::string_view effectively.
  • Implement trim and split helpers and understand typical pitfalls (encoding, lifetime).
  • Choose between copying strings and viewing them.

Prerequisites

std::string basics

#include <string>
#include <iostream>
int main(){
  std::string s = "hello";
  s += " world";
  std::cout << s.size() << " " << s << "\n";
}

Expected Output: 11 hello world

string_view for non-owning views

#include <string_view>
#include <string>
#include <iostream>
void print_sv(std::string_view sv){ std::cout << sv << "\n"; }
int main(){
  std::string s = "example";
  print_sv(s);           // ok, view into s
  print_sv("literal");   // ok, view into literal
}

Find and replace

#include <string>
#include <iostream>
int main(){
  std::string s = "bananarama";
  auto pos = s.find("ana");     // 1
  if (pos != std::string::npos) s.replace(pos, 3, "ANA");
  std::cout << s << "\n"; // bANAnarama
}

Trim helpers

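#include <string>
#include <algorithm>
#include <cctype>
#include <iostream>
// Removes leading and trailing whitespace in place (byte-oriented).
static inline void trim_inplace(std::string& s){
  // Take unsigned char: passing a negative char value to std::isspace is undefined behavior.
  auto not_space = [](unsigned char ch){ return !std::isspace(ch); };
  s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));       // leading
  s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end()); // trailing
}
int main(){
  std::string s = "  \t padded text \n";
  trim_inplace(s);
  std::cout << '[' << s << "]\n"; // [padded text]
}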

Split into tokens

#include <vector>
#include <string>
#include <iostream>
std::vector<std::string> split(const std::string& s, char delim){
  std::vector<std::string> out; std::string cur;
  for (char ch : s) {
    if (ch == delim){ out.push_back(cur); cur.clear(); }
    else cur.push_back(ch);
  }
  out.push_back(cur);
  return out;
}
int main(){
  for (auto& t : split("a,b,,c", ',')) std::cout << '[' << t << "]\n";
}

Expected Output (one token per line): [a] [b] [] [c]

Beginner Boosters

#include <string>
#include <iostream>
#include <algorithm>
#include <cctype>
int main(){
  std::string s = "  Hello  ";
  // trim and lowercase
  auto lower = [](unsigned char c){ return char(std::tolower(c)); };
  s.erase(0, s.find_first_not_of(" \t\n\r"));
  s.erase(s.find_last_not_of(" \t\n\r")+1);
  std::transform(s.begin(), s.end(), s.begin(), lower);
  std::cout << s << "\n"; // hello
}

Common Pitfalls

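  • string_view does not own data; never return a view into a local or temporary std::string, because the view dangles as soon as that string is destroyed.
  • Trimming and case conversion are byte-oriented (std::isspace, std::tolower, and size() all work on single bytes); Unicode text such as UTF-8 needs specialized libraries.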

Checks for Understanding

  1. When is string_view preferable over string?
  2. How do you avoid returning a dangling view?
Answers
  1. When you only need to read from an existing string or literal without copying.
  2. Ensure the underlying data outlives the view; otherwise return a string (by value).

Exercises

  1. Write a split function that returns string_views referencing the original string; discuss lifetime constraints.
  2. Write a replace_all function that replaces all occurrences of a substring.