Hashing strings
Use character frequency counters to check anagrams, group anagrams with a sorted-tuple key, and understand Python's O(n) string hashing cost.
- Implement anagram checking with character frequency comparison
- Group anagrams using a sorted-tuple key in a defaultdict
- Explain why Python string hashing is O(n)
Hashing turns a string into a compact key so that lookups, membership tests, and grouping become O(1) per query (amortised). The catch: building the hash value in the first place requires reading the entire string, so the initial cost is O(n). Understanding that distinction prevents subtle performance mistakes.
Character frequency as a fingerprint
Two strings are anagrams if and only if they contain exactly the same characters
in the same quantities. A Counter (frequency map) captures this perfectly.
Why tuple(sorted(s)) works as a key
Sorting a string produces its characters in alphabetical order. Any two
anagrams produce the same sorted sequence. Wrapping it in a tuple makes it
hashable so it can be used as a dict key. The cost per word of length m is
O(m log m) for sorting.
An alternative is a frozen Counter: frozenset(Counter(s).items()). This
is also O(m) but slightly more verbose.
Python interns (caches) the hash of str objects, so if you put a string
into a set or dict and then look it up again, the second lookup does not
re-hash. But string concatenation (s + t) produces a new object whose hash
has not been computed yet, so it always pays the O(n) cost. Be careful when
building keys by concatenation inside tight loops.
Where to go next
Next: lab — string problems — three exercises combining sliding window, two-pointer, and hashing: character replacement, palindrome checking on raw input, and finding all anagram positions.
Two-pointer on strings
Use two pointers moving inward from both ends, understand palindrome checking as the canonical example, and contrast with sliding window.
Lab: string problems
Three guided exercises combining sliding window, two-pointer, and hashing — character replacement, palindrome check, and anagram search.