C++ - Strings & Text Processing
Overview
Estimated time: 60–80 minutes
Manipulate text with std::string and std::string_view. Learn searching, replacing, trimming, splitting, and best practices for performance.
Learning Objectives
- Use std::string and std::string_view effectively.
- Implement trim and split helpers and understand typical pitfalls (encoding, lifetime).
- Choose between copying strings and viewing them.
Prerequisites
std::string basics
#include <string>
#include <iostream>
int main() {
    std::string s = "hello";
    s += " world";  // append in place
    std::cout << s.size() << " " << s << "\n";  // size() counts chars (bytes)
}
Expected Output: 11 hello world
string_view for non-owning views
#include <string_view>
#include <string>
#include <iostream>
// Taking std::string_view by value accepts strings and literals without copying the text.
void print_sv(std::string_view sv) { std::cout << sv << "\n"; }
int main() {
    std::string s = "example";
    print_sv(s);          // ok, view into s
    print_sv("literal");  // ok, view into literal
}
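A view can also be narrowed without copying. The following is a minimal sketch, assuming C++17; first_field is a hypothetical helper name, not part of the standard library. It returns a sub-view of its argument, so the result is only valid while the caller's original text is alive.

```cpp
#include <string_view>

// Hypothetical helper: view of everything before the first `delim`,
// or of the whole input if `delim` does not occur. No characters are copied.
std::string_view first_field(std::string_view line, char delim) {
    const auto pos = line.find(delim);  // npos if delim is absent
    return line.substr(0, pos);         // substr clamps the count, so npos means "to the end"
}
```

For example, first_field("key=value", '=') views the first three characters of the literal. Because nothing is copied, the caller must keep the source string around for as long as the returned view is used.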
Find and replace
#include <string>
#include <iostream>
int main() {
    std::string s = "bananarama";
    auto pos = s.find("ana");  // 1 (index of the first match)
    if (pos != std::string::npos) s.replace(pos, 3, "ANA");  // replace 3 chars starting at pos
    std::cout << s << "\n";  // bANAnarama
}
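find also accepts a starting position, which makes it possible to continue searching after a match. The sketch below counts non-overlapping matches; count_occurrences is a hypothetical name chosen for illustration, and treating an empty needle as zero matches is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts non-overlapping occurrences of `needle` in `haystack`.
std::size_t count_occurrences(std::string_view haystack, std::string_view needle) {
    if (needle.empty()) return 0;  // assumption: an empty needle counts as no match
    std::size_t count = 0;
    std::size_t pos = haystack.find(needle);
    while (pos != std::string_view::npos) {
        ++count;
        pos = haystack.find(needle, pos + needle.size());  // resume after this match
    }
    return count;
}
```

With "bananarama" and "ana" this reports one match, because the second "ana" (starting at index 3) overlaps the first and the search resumes at index 4.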
Trim helpers
#include <string>
#include <algorithm>
#include <cctype>
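static inline void trim_inplace(std::string& s) {
    // Take unsigned char: passing a negative char value to std::isspace is undefined behavior.
    auto not_space = [](unsigned char ch) { return !std::isspace(ch); };
    // Erase leading whitespace, up to the first non-space character.
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    // Erase trailing whitespace; base() points just past the last non-space character.
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
}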
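When the input should not be modified, trimming can be expressed as narrowing a view instead. This is a minimal sketch, assuming C++17; trim_view is a hypothetical helper, and it only treats space, tab, newline, and carriage return as whitespace (a narrower set than std::isspace).

```cpp
#include <string_view>

// Hypothetical helper: returns a view of `sv` without leading and trailing
// ASCII whitespace. The underlying characters are neither copied nor changed.
std::string_view trim_view(std::string_view sv) {
    constexpr std::string_view ws = " \t\n\r";
    const auto first = sv.find_first_not_of(ws);
    if (first == std::string_view::npos) return {};  // empty or all whitespace
    const auto last = sv.find_last_not_of(ws);       // cannot be npos here
    return sv.substr(first, last - first + 1);
}
```

The trade-off is the usual one for views: the result is cheap to produce, but it must not outlive the string it was taken from.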
Split into tokens
#include <vector>
#include <string>
#include <iostream>
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::string cur;  // token being built
    for (char ch : s) {
        if (ch == delim) {
            out.push_back(cur);  // a delimiter ends the current token (which may be empty)
            cur.clear();
        } else {
            cur.push_back(ch);
        }
    }
    out.push_back(cur);  // last token; empty if the input is empty or ends with delim
    return out;
}
int main() {
    for (const auto& t : split("a,b,,c", ',')) std::cout << '[' << t << "]\n";
}
Expected Output:
[a]
[b]
[]
[c]
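A common alternative uses std::getline on a std::istringstream. The sketch below is not the lesson's split; split_getline is a hypothetical name. It is shown mainly because its edge cases differ: it yields no token for an empty input and drops a trailing empty token.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical alternative to split(): lets std::getline find the delimiters.
std::vector<std::string> split_getline(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::istringstream in(s);
    std::string token;
    // getline fails once no characters remain, so "a,b," produces only "a" and "b".
    while (std::getline(in, token, delim)) out.push_back(token);
    return out;
}
```

For "a,b,,c" both versions produce the same four tokens. Which behavior is right for a trailing delimiter depends on the data format, so it is worth deciding explicitly.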
Beginner Boosters
#include <string>
#include <iostream>
#include <algorithm>
#include <cctype>
int main() {
    std::string s = " Hello ";
    // Trim ASCII whitespace from both ends, then lowercase.
    auto lower = [](unsigned char c) { return char(std::tolower(c)); };
    s.erase(0, s.find_first_not_of(" \t\n\r"));  // drop leading whitespace
    s.erase(s.find_last_not_of(" \t\n\r") + 1);  // drop trailing whitespace
    std::transform(s.begin(), s.end(), s.begin(), lower);
    std::cout << s << "\n";  // hello
}
Common Pitfalls
- string_view does not own data; do not return a view to a temporary.
- The trimming and case-conversion helpers here work one byte at a time, so they are only reliable for ASCII; Unicode text (for example UTF-8) needs a specialized library.
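To make the byte-oriented pitfall concrete, the sketch below contrasts the byte count with the number of code points. utf8_code_points is a hypothetical helper, and it assumes the input is already valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in text assumed to be valid UTF-8
// by skipping continuation bytes, which have the bit pattern 10xxxxxx.
std::size_t utf8_code_points(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++n;  // count only bytes that start a code point
    }
    return n;
}
```

For the UTF-8 bytes "caf\xC3\xA9" (the word "café"), size() reports 5 while this helper reports 4. Per-byte helpers such as std::tolower see the two bytes of the accented letter as unrelated values.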
Checks for Understanding
- When is string_view preferable over string?
- How do you avoid returning a dangling view?
Answers
- When you only need to read from an existing string or literal without copying.
- Ensure the underlying data outlives the view; otherwise return a string (by value).
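The two answers can be illustrated side by side. This is a sketch under C++17 with hypothetical function names: one function builds new text and therefore returns std::string by value, the other only narrows the caller's data and can safely return a view.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: builds new text, so it returns an owning std::string.
std::string greeting(std::string_view name) {
    std::string out = "hello, ";
    out += name;
    return out;
    // Returning std::string_view here instead would hand back a view of `out`,
    // which is destroyed when the function returns, leaving the view dangling.
}

// Hypothetical helper: the result views the caller's buffer, so it stays valid
// for as long as that buffer does.
std::string_view without_prefix(std::string_view s, std::string_view prefix) {
    if (s.substr(0, prefix.size()) == prefix) s.remove_prefix(prefix.size());
    return s;
}
```

A caller of without_prefix still has to keep the original string alive; for instance, passing a temporary std::string and storing the returned view would dangle once the temporary is destroyed.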
Exercises
- Write a split function that returns string_views referencing the original string; discuss lifetime constraints.
- Write a replace_all function that replaces all occurrences of a substring.