Code of the Day
AdvancedString Algorithms

Hashing strings

Use character frequency counters to check anagrams, group anagrams with a sorted-tuple key, and understand Python's O(n) string hashing cost.

CS FundamentalsAdvanced10 min read
By the end of this lesson you will be able to:
  • Implement anagram checking with character frequency comparison
  • Group anagrams using a sorted-tuple key in a defaultdict
  • Explain why Python string hashing is O(n)

Hashing turns a string into a compact key so that lookups, membership tests, and grouping become O(1) per query (amortised). The catch: building the hash value in the first place requires reading the entire string, so the initial cost is O(n). Understanding that distinction prevents subtle performance mistakes.

Character frequency as a fingerprint

Two strings are anagrams if and only if they contain exactly the same characters in the same quantities. A Counter (frequency map) captures this perfectly.

Python — editable, runs in your browser

Why tuple(sorted(s)) works as a key

Sorting a string produces its characters in alphabetical order. Any two anagrams produce the same sorted sequence. Wrapping it in a tuple makes it hashable so it can be used as a dict key. The cost per word of length m is O(m log m) for sorting.

An alternative is a frozen Counter: frozenset(Counter(s).items()). This is also O(m) but slightly more verbose.

Python interns (caches) the hash of str objects, so if you put a string into a set or dict and then look it up again, the second lookup does not re-hash. But string concatenation (s + t) produces a new object whose hash has not been computed yet, so it always pays the O(n) cost. Be careful when building keys by concatenation inside tight loops.

Where to go next

Next: lab — string problems — three exercises combining sliding window, two-pointer, and hashing: character replacement, palindrome checking on raw input, and finding all anagram positions.

Finished reading? Mark it complete to track your progress.

On this page