Duplicate Code: Writing the Same Address on 50 Wedding Cards
Learn the Duplicate Code smell with a wedding card story. Understand DRY, the Rule of Three, and how Extract Method removes dangerous copy-paste code.
Fifty wedding cards, one aching hand
Big news in the Sharma family of Jaipur: Anjali didi is getting married! The whole house smells of laddoos. Relatives are calling from three states. And the wedding cards have just arrived from the printer, in a big cardboard box tied with golden thread. They look beautiful: cream paper, red lettering, a little peacock embossed in the corner.
But there is one problem. Papa opens a card, reads it twice, and his face falls. The printer forgot to print the venue address. Fifty cards, and not one of them says where the wedding is.
There is no time to reprint. So Anjali's younger brother Kabir, fourteen years old and famous for his neat handwriting, is given the job. "Write the address on every card," says Papa. "All fifty of them. Neatly."
So Kabir sits down at the dining table with a blue pen. "Shubham Garden, Plot 14, Tonk Road, Jaipur - 302018." He writes it once. Beautiful. Twice. Still good. Ten times. His hand starts aching. By card twenty, his little sister is watching cartoons and he wants to join her. By card thirty, his writing is getting lazy and slanted. On card thirty-four, he writes "Plot 41" instead of "Plot 14". On card forty-two, he forgets the PIN code entirely. He does not notice either mistake. Nobody checks fifty cards one by one. Who has the time?
Two weeks later, Sharma uncle's family reaches Plot 41 (an empty plot with one sleeping dog) in their full wedding clothes. They are very annoyed. They phone Papa from the empty plot. Papa apologises eleven times.
And then the worst news arrives: the venue changes. Shubham Garden got double-booked, and the wedding moves to "Mangal Vatika, Ajmer Road". Now somebody must find all fifty cards (half of them already posted!) and correct every single one by hand. It is impossible. The family ends up calling every guest one by one, and even then, two families show up at the old venue.
Now compare this with what the printer should have done in the first place: keep the address in one place, the printing plate, and stamp all fifty cards from it. One correction on the plate, and all fifty cards are correct. No aching hand, no Plot 41, no missing PIN code, no calling sixty relatives.
This is exactly the Duplicate Code smell. Writing the same thing by hand in many places feels simple, but every copy is a chance to make a mistake, and every future change must hunt down every copy.
What is this smell?
Duplicate Code is the same idea expressed in more than one place in a program. It may be an exact copy-paste, or two blocks that look slightly different but carry the same rule. Whenever that rule changes, every copy must be found and changed, correctly, or the copies start to disagree.
Duplicate Code is famous. It is the very first smell described in the first edition of Martin Fowler's book Refactoring. Fowler and Kent Beck put it first for a reason: it is one of the most common smells and one of the most damaging.
Its cure is connected to a famous principle from the book The Pragmatic Programmer by Andy Hunt and Dave Thomas: DRY, short for Don't Repeat Yourself. DRY says that every piece of knowledge in a system should have a single, authoritative home. The venue address is one piece of knowledge. It should live on one printing plate, not in fifty handwritten copies.
Read DRY carefully: it talks about knowledge, not text. Two blocks of code that look identical but represent different business rules are not true duplication; they only look alike by accident today. True duplication is one rule living in many places. Merge rules, not lookalikes.
College corner: Research on code clones gives this smell a formal vocabulary. A Type-1 clone is an exact copy (whitespace aside). Type-2 renames identifiers but keeps structure. Type-3 adds or deletes a few statements. Type-4 is semantically equivalent code with different syntax: the same job done a different way. Clone-detection tools (PMD CPD, SonarQube, jscpd, Simian) are reliable on Types 1 and 2, weaker on Type 3, and nearly blind to Type 4. That asymmetry matters: the clones that tools cannot find are exactly the ones only a human reader who understands the meaning can catch. Studies on industrial codebases routinely measure 5 to 20% cloned code, and find that inconsistent edits to clone groups are a significant source of defects.
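To make the knowledge-versus-text distinction concrete, here is a minimal TypeScript sketch. The function names and both age rules are hypothetical, invented only for illustration: the two thresholds happen to match today, but they belong to different rule books and may change independently.

```typescript
// Hypothetical rules, invented for this sketch: both thresholds happen
// to be 18 today, but they come from different rule books.
const MIN_VOTING_AGE = 18;
const MIN_DRIVING_AGE = 18;

// These two functions look like duplicates, but they encode different
// knowledge. Merging them would couple two unrelated rules.
function canVote(age: number): boolean {
  return age >= MIN_VOTING_AGE;
}

function canDrive(age: number): boolean {
  return age >= MIN_DRIVING_AGE;
}

// True duplication: the voting rule written out a second time, by hand.
function canVoteDuplicated(age: number): boolean {
  return age >= 18; // a second copy of the voting rule: this is the smell
}
```

If the driving age ever changes, only `MIN_DRIVING_AGE` should move. If the voting age changes, `canVoteDuplicated` is the copy that gets forgotten.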
Here is the whole territory in one map:
How to spot it
Run through this checklist on any codebase:
- Two methods that look almost the same, differing only in one number, one type, or one called method.
- The same sequence of statements appearing in several subclasses of the same parent.
- A bug that was "fixed" but appears again somewhere else, because a copy of the buggy code was never fixed.
- Team habit of "I copied the existing handler and changed two lines" for every new feature.
- The same constant, regex pattern, or formula typed by hand in multiple files.
- Parallel `if`/`switch` ladders in different files that all list the same set of cases.
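The last checklist item, parallel ladders, is often cured by listing the cases once in a table. The sketch below is only an illustration: the `Channel` type, the `CHANNELS` table, and the costs in it are hypothetical names and values, not taken from any real project.

```typescript
// Hypothetical example: invitation channels. All names and numbers here
// are assumptions made for this sketch.
type Channel = "card" | "whatsapp" | "email";

interface ChannelInfo {
  label: string;
  costInRupees: number;
}

// One table lists the cases once. Adding a value to Channel without
// adding a row here is a compile-time error.
const CHANNELS: Record<Channel, ChannelInfo> = {
  card: { label: "Printed card", costInRupees: 40 },
  whatsapp: { label: "WhatsApp message", costInRupees: 0 },
  email: { label: "Email", costInRupees: 0 },
};

// Each former switch ladder becomes a lookup in the same table.
function channelLabel(channel: Channel): string {
  return CHANNELS[channel].label;
}

function channelCost(channel: Channel): number {
  return CHANNELS[channel].costInRupees;
}
```

With separate `switch` ladders, a new channel must be added to every ladder by hand. With one table, the compiler points at the single place that needs the new row.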
Here is a quick table of duplication types, from easiest to hardest to see:
| Type | What it looks like | How visible? | Example |
|---|---|---|---|
| Exact copy | Same lines, character by character | Easy (tools catch it) | Copy-pasted validation block |
| Copy with renamed variables | Same logic, different names | Medium | total/sum, cust/customer |
| Same steps, different details | Same skeleton, one step differs | Hard | Domestic vs international billing |
| Same job, different algorithm | Two ways to compute one answer | Very hard | Loop in one file, formula in another |
| Knowledge duplication | One rule in code AND config AND docs | Hardest | GST rate in three layers |
The lower rows are the dangerous ones: no tool can fully catch them. Only a reader who understands the meaning can say, "Wait, these two blocks are the same rule."
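The "same job, different algorithm" row can be sketched as follows. The scenario is hypothetical (a hall where row k has k seats), and the function names are invented for illustration. The two functions share almost no text, so a text-matching clone detector has little to compare, yet they encode the same rule.

```typescript
// Copy A: adds up 1 + 2 + ... + rows with a loop.
function totalSeatsLoop(rows: number): number {
  let total = 0;
  for (let row = 1; row <= rows; row++) {
    total += row;
  }
  return total;
}

// Copy B: the same answer from the closed-form formula n(n + 1) / 2.
function totalSeatsFormula(rows: number): number {
  return (rows * (rows + 1)) / 2;
}
```

If the seating rule changes (say, every row gets two extra seats), both functions must change, and nothing in the text of the code tells you they are related.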
When teams audit where their duplication came from, the sources usually split like this:
Why it is a problem
Problem 1: Every change is multiplied. A rule in one place is changed once. A rule copied five times must be found five times and edited five times. Miss one, and your program now follows two different rules at the same time, like fifty cards showing two different venues.
Problem 2: Copies disagree silently. Nobody announces "copy three is now different!" The mismatch hides until a customer hits it. Remember Sharma uncle at Plot 41: he discovered the bug in production, in his wedding clothes.
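Silent disagreement can be sketched in a few lines. The two validators below are hypothetical (the function names are invented for illustration): one copy was tightened to require exactly six digits, while the pasted copy was never updated.

```typescript
// Copy 1, used by the card printer: requires exactly six digits.
function isValidPinForCard(pin: string): boolean {
  return /^\d{6}$/.test(pin);
}

// Copy 2, used by the SMS sender: pasted before the fix, so it still
// accepts any run of digits.
function isValidPinForSms(pin: string): boolean {
  return /^\d+$/.test(pin);
}

// The same input gets two different answers, and nothing raises an alarm.
const shortPin = "30201"; // five digits: one digit is missing
console.log(isValidPinForCard(shortPin)); // false
console.log(isValidPinForSms(shortPin)); // true
```

Neither function is wrong on its own terms. The bug only exists in the gap between them, which is why no single code review catches it.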
Problem 3: Bugs resurrect. You fix a bug in one copy and close the ticket. Months later the same bug walks in again through another copy. The team thinks the fix "did not work". Trust drops. Watch the resurrection happen:
Problem 4: The design becomes invisible. When one concept is named once and reused, the design announces itself: "this is the subtotal rule." When it is smeared across copies, every reader must rediscover that these blocks are the same thing. That is wasted brainpower, every day, for every reader.
Problem 5: The cost grows with every copy. The pain is not linear: with more copies you spend time finding them, editing them, testing them, and double-checking you did not miss one:
And here is the slow life story of a single pasted block. Notice that drift never knocks on the door; it just happens:
A real-life code example
Let us put the wedding card story into code. The family hires an event app to print and send invitations. A junior developer wrote it, by copy-paste, of course.

    // Smelly version: the "address rule" is hand-written in three places
    class InvitationService {
      printCard(guest: Guest): string {
        const name = guest.title + " " + guest.firstName + " " + guest.lastName;
        return (
          "Dear " + name + ",\n" +
          "You are invited!\n" +
          "Venue: Shubham Garden, Plot 14, Tonk Road, Jaipur - 302018"
        );
      }

      sendWhatsApp(guest: Guest): string {
        const name = guest.title + " " + guest.firstName + " " + guest.lastName;
        return (
          "Namaste " + name + "! Wedding invitation: " +
          "Venue: Shubham Garden, Plot 14, Tonk Rd, Jaipur 302018"
        );
      }

      sendEmail(guest: Guest): string {
        const name = guest.title + " " + guest.firstName + " " + guest.lastName;
        return (
          "Dear " + name + ", you are cordially invited. " +
          "Venue: Shubham Gardens, Plot 14, Tonk Road, Jaipur - 302018"
        );
      }
    }

Look closely. The smell is everywhere:
- The guest name formula (`title + first + last`) is written three times. If the family decides to add "ji" after every name, that is three edits.
- The venue address is written three times, and the copies already disagree! The WhatsApp version says "Tonk Rd", the email says "Shubham Gardens". The copies have drifted, exactly like Kabir's tired handwriting on card thirty-four.
- When the venue changes to Mangal Vatika, someone must find and fix all three, and any fourth copy hiding in some other file.
Cleaning it up, step by step
Step 1: Find the knowledge. Ask: what pieces of knowledge are repeated here? Two of them: "how to write a guest's full name" and "what the venue address is".
Step 2: Give each piece one home. Use Extract Method for the name formula, and a single constant for the address. This is the printing plate.
Step 3: Make every caller use the one home. Replace each handwritten copy with a call.

    // Clean version: one printing plate, many stamps
    const VENUE_ADDRESS = "Shubham Garden, Plot 14, Tonk Road, Jaipur - 302018";

    class InvitationService {
      printCard(guest: Guest): string {
        return `Dear ${this.fullName(guest)},\nYou are invited!\nVenue: ${VENUE_ADDRESS}`;
      }

      sendWhatsApp(guest: Guest): string {
        return `Namaste ${this.fullName(guest)}! Wedding invitation: Venue: ${VENUE_ADDRESS}`;
      }

      sendEmail(guest: Guest): string {
        return `Dear ${this.fullName(guest)}, you are cordially invited. Venue: ${VENUE_ADDRESS}`;
      }

      private fullName(guest: Guest): string {
        return `${guest.title} ${guest.firstName} ${guest.lastName}`;
      }
    }

Now the venue change is one edit. The name rule is one edit. The copies physically cannot drift apart, because there are no copies, only one plate and many stamps. The cleaned structure looks like this:
Step 4: Choose the right tool for harder duplication. Our example was inside one class, so Extract Method was enough. But duplication lives in other places too, and each location has its own cure:
- Identical methods sitting in sibling subclasses? Move the method up to the parent with Pull Up Method.
- Subclass methods with the same steps but different details? Keep the skeleton in the parent and let children fill in the differing steps with Form Template Method.
- Copies scattered across unrelated classes that secretly share a concept? Give that concept its own home with Extract Class.
- Two blocks doing the same job in different ways? Pick the clearer way and replace both using Substitute Algorithm.
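As one illustration of the list above, here is a minimal sketch of Form Template Method. The class names are hypothetical and the greetings only echo the invitation example; this is one possible shape, not the only way to apply the refactoring.

```typescript
// Hypothetical classes for this sketch. The parent owns the skeleton;
// the children supply only the steps that differ.
abstract class Invitation {
  // The shared sequence of steps lives once, in the parent.
  render(guestName: string, venue: string): string {
    return `${this.greeting(guestName)} ${this.body()} Venue: ${venue}`;
  }

  // Every child must say how it greets the guest.
  protected abstract greeting(guestName: string): string;

  // A default step that children may override.
  protected body(): string {
    return "You are invited!";
  }
}

class WhatsAppInvitation extends Invitation {
  protected greeting(guestName: string): string {
    return `Namaste ${guestName}!`;
  }
}

class EmailInvitation extends Invitation {
  protected greeting(guestName: string): string {
    return `Dear ${guestName},`;
  }

  protected body(): string {
    return "You are cordially invited.";
  }
}
```

The order "greeting, body, venue" now exists in one place. A new channel only has to answer the questions that are truly its own.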
The same smell in C#
Two billing methods that are twins, except for one number:

    // Before: the subtotal loop is duplicated; only the rate differs
    public decimal DomesticTotal(List<Item> items)
    {
        decimal subtotal = 0;
        foreach (var i in items) subtotal += i.Price * i.Quantity;
        return subtotal + subtotal * 0.05m; // domestic shipping
    }

    public decimal InternationalTotal(List<Item> items)
    {
        decimal subtotal = 0;
        foreach (var i in items) subtotal += i.Price * i.Quantity;
        return subtotal + subtotal * 0.18m; // international shipping
    }

Extract the shared shape; let the difference become a parameter:

    // After: one definition of subtotal, one definition of shipping
    // (items.Sum needs "using System.Linq;" at the top of the file)
    public decimal DomesticTotal(List<Item> items) => TotalWithShipping(items, 0.05m);
    public decimal InternationalTotal(List<Item> items) => TotalWithShipping(items, 0.18m);

    private static decimal TotalWithShipping(List<Item> items, decimal shippingRate)
    {
        var subtotal = items.Sum(i => i.Price * i.Quantity);
        return subtotal + subtotal * shippingRate;
    }

Now "how a subtotal is computed" exists exactly once. If tomorrow the business says "ignore items with zero quantity", that is one edit, and domestic and international can never disagree about it.
A Python taste of the same medicine, because copy-paste speaks every language:

    # Before: the same "clean phone number" rule, typed twice
    def save_guest(name, phone):
        phone = phone.replace(" ", "").replace("-", "")[-10:]
        db.guests.insert(name, phone)

    def send_invite_sms(phone, text):
        phone = phone.replace(" ", "").replace("-", "")[-10:]
        sms.send(phone, text)

    # After: one rule, one home
    def normalize_phone(phone: str) -> str:
        return phone.replace(" ", "").replace("-", "")[-10:]

    def save_guest(name, phone):
        db.guests.insert(name, normalize_phone(phone))

    def send_invite_sms(phone, text):
        sms.send(normalize_phone(phone), text)

Where this smell hides in real projects
- Validation rules copied between frontend and backend. The email regex lives in the React form and in the API controller, slightly different in each. Users get "valid" on screen and "invalid" from the server.
- Copy-paste-driven feature development. "Make the new report? Just copy the old report handler and adjust." After ten reports, a bug in the shared logic needs ten fixes.
- The same formula in code and in SQL. Discount computed in the application and re-computed inside a database view. They drift; finance notices at year-end.
- Test code duplication. Twenty tests each building the same five-line test order by hand. One constructor change breaks all twenty.
- Cross-team duplication. Two teams each write their own "retry helper" in the same month because neither knew the other existed. Code search and shared libraries are the medicine.
- AI-generated code. Code assistants happily generate a fresh copy of logic instead of finding the existing helper. Review generated code for duplication just like human code.
College corner: There is a deeper systems argument here, in the spirit of The Pragmatic Programmer: DRY violations break what is often called the "single source of truth" property, and the failure mode is representational drift, meaning two representations of one fact evolving independently. This is the same root problem as cache invalidation, denormalized databases, and documentation rot. Whenever you intentionally duplicate knowledge (for performance, for decoupling deployments, for offline copies), you must also build a synchronization mechanism: code generation from one schema, contract tests between frontend and backend, or a build step that derives one copy from the other. Duplication without synchronization is a time bomb; duplication with synchronization is an engineering decision.
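One way to picture "deriving one copy from the other" is sketched below. Everything in it is an assumption made for illustration: the `GUEST_RULES` object, the field limits, and the PostgreSQL-style `CHECK` text are hypothetical, and a real project would pick its own schema tooling.

```typescript
// Single source of truth: hypothetical field rules for a guest form.
const GUEST_RULES = {
  nameMaxLength: 60,
  phoneDigits: 10,
} as const;

// Derived copy 1: the validator used by application code.
function isValidGuest(name: string, phone: string): boolean {
  const phonePattern = new RegExp(`^\\d{${GUEST_RULES.phoneDigits}}$`);
  return (
    name.length > 0 &&
    name.length <= GUEST_RULES.nameMaxLength &&
    phonePattern.test(phone)
  );
}

// Derived copy 2: the text of a database constraint, generated from the
// same rules instead of being typed by hand (PostgreSQL-style syntax,
// assumed here only for illustration).
function guestTableConstraintSql(): string {
  return (
    `CHECK (char_length(name) <= ${GUEST_RULES.nameMaxLength} ` +
    `AND char_length(phone) = ${GUEST_RULES.phoneDigits})`
  );
}
```

The knowledge still appears in two layers, but only one of them is edited by a human. Changing `nameMaxLength` updates both the validator and the generated constraint text.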
When it is okay to ignore
Here is the honest part. Not every repetition should be merged, and merging too early causes a different disease: the wrong abstraction.
| Situation | Merge the copies? | Why |
|---|---|---|
| Same business rule, copied | Yes | One rule must have one home |
| Looks similar, but changes for different reasons | No | Incidental duplication; merging couples strangers |
| Second occurrence, shape still unclear | Wait | Rule of Three: refactor at the third copy |
| Two-line fragment used in one or two places | Usually no | A tiny helper adds indirection, removes little |
| Tests that repeat for readability | Often no | A test should be readable alone, on one screen |
| Same constant in many files | Yes | Constants are cheap to centralize, drift is costly |
Two famous guidelines help you judge:
- The Rule of Three (popularized in Fowler's Refactoring, credited to Don Roberts): write it once. Copy it once, and just wince. When you need it a third time, refactor. By then you have three real examples, so you can see the true shared shape instead of guessing it.
- Sandi Metz's warning: "Duplication is far cheaper than the wrong abstraction." If you merge two blocks that were only accidentally similar, you must later thread flags and parameters through the shared code to pull them apart again. That tangled "shared" code is worse than the honest copies were.
You can place any suspicious pair of code blocks on this chart and read off the decision:
Before merging two similar blocks, ask one question: "If the business changes one of these, must the other change too?" If yes, they hold the same knowledge, so merge them. If no, they are strangers who happen to dress alike today. Let them stay separate, and do not feel guilty about it.
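What a wrong abstraction tends to look like can be sketched as follows. The receipt and refund functions are hypothetical, invented for this illustration: two lookalike formatters were merged early, and each later difference arrived as another boolean flag.

```typescript
// Hypothetical example: merged too early, then patched with flags.
// Every caller must now know which combination of flags it needs.
function formatReceiptMerged(
  name: string,
  amount: number,
  isRefund: boolean,
  showTax: boolean,
): string {
  const heading = isRefund ? "Refund for" : "Receipt for";
  const sign = isRefund ? "-" : "";
  const tax = showTax ? " (incl. tax)" : "";
  return `${heading} ${name}: Rs ${sign}${amount}${tax}`;
}

// The honest alternative: two small functions, each free to change for
// its own reasons.
function formatFeeReceipt(name: string, amount: number): string {
  return `Receipt for ${name}: Rs ${amount} (incl. tax)`;
}

function formatRefundNote(name: string, amount: number): string {
  return `Refund for ${name}: Rs -${amount}`;
}
```

The merged version answers "no" to the question above: a change to refund wording has no reason to touch fee receipts, yet both now pass through the same flagged function.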
Which refactorings cure it
| Where the duplication lives | Curing refactoring |
|---|---|
| Inside one class | Extract Method |
| Identical methods in sibling subclasses | Pull Up Method |
| Same steps, different details, in subclasses | Form Template Method |
| Scattered across unrelated classes | Extract Class |
| Same job done two different ways | Substitute Algorithm |
| Long methods grown from pasted blocks | Extract Method + Consolidate Duplicate Conditional Fragments |
Quick revision box

    +--------------------------------------------------------------+
    | DUPLICATE CODE: QUICK REVISION                               |
    +--------------------------------------------------------------+
    | Story   : Hand-writing one address on 50 wedding cards       |
    |           instead of printing from one plate.                |
    | Smell   : The same KNOWLEDGE living in many places.          |
    | Danger  : Every change -> many edits; missed copy ->         |
    |           silent disagreement -> bug found by a customer.    |
    | DRY     : Every piece of knowledge has ONE home.             |
    |           (The Pragmatic Programmer)                         |
    | Rule of : 1st time write, 2nd time wince,                    |
    | Three     3rd time refactor.                                 |
    | Caution : Lookalikes that change for different reasons       |
    |           are NOT duplication. A wrong abstraction costs     |
    |           more than honest copies.                           |
    | Cures   : Extract Method, Pull Up Method, Form Template      |
    |           Method, Extract Class, Substitute Algorithm.       |
    +--------------------------------------------------------------+

Practice exercise
A school fee program has grown by copy-paste. Clean it up.

    function tuitionFeeReceipt(student: Student): string {
      let fee = 2000;
      if (student.hasSibling) fee = fee - fee * 0.1; // sibling discount
      if (student.isStaffChild) fee = fee - fee * 0.5; // staff discount
      return "Receipt for " + student.name + ": Rs " + fee + " (Tuition)";
    }

    function busFeeReceipt(student: Student): string {
      let fee = 800;
      if (student.hasSibling) fee = fee - fee * 0.1;
      if (student.isStaffChild) fee = fee - fee * 0.5;
      return "Receipt for " + student.name + ": Rs " + fee + " (Bus)";
    }

    function labFeeReceipt(student: Student): string {
      let fee = 500;
      if (student.hasSibling) fee = fee - fee * 0.1;
      if (student.isStaffChild) fee = fee - fee * 0.45; // <-- bug? or rule?
      return "Receipt for " + student.name + ": Rs " + fee + " (Lab)";
    }

Your tasks:
- List the pieces of knowledge that are duplicated. (Hint: there are at least two: the discount rules and the receipt format.)
- Extract an `applyDiscounts(fee, student)` function and a `formatReceipt(name, fee, feeType)` function. Rewrite all three receipt functions as one-liners using them.
- Investigate the `0.45` in `labFeeReceipt`. Is it a typo that drifted, or a real special rule? Write one sentence for each possibility, and explain what you would do in each case. (This is exactly the "copies disagree silently" problem from Figure 6.)
- The school adds a fourth fee: library fee, Rs 300, same discounts. Add it. Count how many lines you needed. Compare with how many lines the copy-paste style would have needed, then check your numbers against the cost curve in Figure 5.
- Bonus: a classmate suggests also merging `tuitionFeeReceipt` from another school's program because "it looks the same". Use the Rule of Three, the wrong-abstraction warning, and the quadrant chart in Figure 10 to explain whether that is true duplication or incidental lookalikes.
If your final version changes the sibling discount in exactly one place, you have earned your DRY badge.
Frequently asked questions
- What is duplicate code in simple words?
- Duplicate code is the same idea written in more than one place. It may be an exact copy-paste, or two blocks that look slightly different but do the same job. The danger is that every future change must now be made correctly in every copy, and missing even one copy creates a bug.
- What is the DRY principle?
- DRY stands for Don't Repeat Yourself. It comes from the book The Pragmatic Programmer by Andy Hunt and Dave Thomas. It says every piece of knowledge in a system should have exactly one authoritative home. If a rule lives in one place, you change it once and it can never disagree with itself.
- What is the Rule of Three?
- It is a practical guideline popularized in Martin Fowler's Refactoring book: tolerate the first copy, take note at the second, and refactor when the third appears. By the third occurrence you can clearly see the real shared shape, so the abstraction you extract is more likely to be correct.
- Is all repeated-looking code really duplication?
- No. Two pieces of code that look alike today but change for different business reasons are only accidentally similar; this is called incidental duplication. Merging them couples unrelated rules together, and later you must add flags and parameters to pull them apart. Duplication is cheaper than the wrong abstraction.
- Which refactorings remove duplicate code?
- Extract Method for copies inside one class, Pull Up Method for identical methods in sibling subclasses, Form Template Method when steps are the same but details differ, Extract Class when scattered copies hide a shared concept, and Substitute Algorithm when two different-looking blocks do the same job.