Data Class: The Register With No Rules — Anyone Can Scribble Anything
Learn the Data Class smell with a society register story. See why data without behavior breaks encapsulation, and when DTOs and records are perfectly fine.
📓 The register where anyone scribbles anything
Sagar Apartments in Mumbai keeps a visitor register at the gate. It is just a plain notebook with columns drawn in ballpoint pen: name, flat number, time in, time out. No rules. No watchman checking entries — the watchman, Ganpat bhai, is usually helping someone park. The pen hangs on a string, and whoever comes writes whatever they like.
The society secretary, Mr. Kulkarni, is proud of this register. "Full record of everyone who enters," he tells the committee. "Complete security."
After one month, look inside the register:
- One visitor wrote his name as "guest". Just "guest".
- Someone entered flat number 1403 — the building has only 8 floors.
- A delivery boy wrote his time-in as 25:70.
- One full row is blank except a doodle of a cat.
- A teenager named Rohan went back and quietly changed his friend's exit time from 11 pm to 9 pm so nobody would know they came back late from the cricket match.
Then, one Tuesday, a bicycle goes missing from the parking. Mr. Kulkarni marches to the gate, opens the register to check who visited that evening — and finds it useless. "guest" visited flat 1403 at 25:70. The data is garbage, because the register had no rules. It would happily accept anything, from anyone, at any time, and let anyone change anything afterwards. The notebook stored data; it never protected data.
Compare this with the bank just down the road. Try writing "guest" as your name on a bank form. Try depositing to account 1403 when accounts have 11 digits. The clerk stops you instantly, at the counter, before the wrong data ever enters the books. The bank's "register" has a guardian who enforces rules at the moment of writing — so the stored data can always be trusted. That is the whole difference between the two notebooks: not the paper, the guardian.
In code, a class that is only fields plus getters and setters — and no rules, no behavior, no guardian — is the visitor notebook. This is the Data Class smell.
🤔 What is this smell?
A Data Class is a class that holds fields and accessors but no behavior. It cannot validate itself, calculate anything about itself, or protect itself. All the thinking about its data is done by other classes, scattered across the codebase — forcing everyone else to know its rules and to repeat them.
Martin Fowler describes a system-wide version of this smell on his bliki: the Anemic Domain Model. The objects look like a real domain model (Order, Customer, Invoice), but they are bloodless: just bags of getters and setters. All the logic lives in big procedural "service" classes that pull data out of the bags, think for them, and stuff results back in. Fowler calls this an anti-pattern because it incurs all the costs of a domain model and yields none of the benefits.
The principle being broken has a memorable name: Tell, Don't Ask. Healthy object design says: tell the object what you want (order.total()), and let it use its own data. Anemic design instead asks for the raw data (order.lines, order.discountRate) and computes outside — in five different places, five slightly different ways.
A class is the natural guardian of its own data. The moment data and the rules about that data live in the same class, the rules are enforced in exactly one place and can never be bypassed. The moment they live apart, every caller becomes responsible for remembering the rules — and someone always forgets.
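The difference between asking and telling can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not code from a real system: AnemicOrder, RichOrder, and totalComputedByCaller are hypothetical names, and prices are assumed to be whole rupees.

```typescript
// Hypothetical sketch: the same order, asked versus told.
interface OrderLine {
  unitPrice: number; // assumed: whole rupees
  quantity: number;
}

// "Ask" style: a bag of fields; every caller computes the total itself.
class AnemicOrder {
  lines: OrderLine[] = [];
  discountRate = 0;
}

function totalComputedByCaller(order: AnemicOrder): number {
  let subtotal = 0;
  for (const line of order.lines) {
    subtotal += line.unitPrice * line.quantity;
  }
  return subtotal - subtotal * order.discountRate;
}

// "Tell" style: the order uses its own data and guards its own rule.
class RichOrder {
  private readonly lines: OrderLine[] = [];
  private discountRate = 0;

  addLine(line: OrderLine): void {
    this.lines.push(line);
  }

  applyDiscount(rate: number): void {
    // Written as a negated range check so NaN is rejected too.
    if (!(rate >= 0 && rate <= 1)) {
      throw new RangeError("Discount must be between 0 and 1");
    }
    this.discountRate = rate;
  }

  total(): number {
    let subtotal = 0;
    for (const line of this.lines) {
      subtotal += line.unitPrice * line.quantity;
    }
    return subtotal - subtotal * this.discountRate;
  }
}

const anemic = new AnemicOrder();
anemic.lines.push({ unitPrice: 100, quantity: 2 });
anemic.discountRate = 7; // nonsense, accepted silently

const rich = new RichOrder();
rich.addLine({ unitPrice: 100, quantity: 2 });
rich.applyDiscount(0.5); // rich.applyDiscount(7) would throw instead
```

With the anemic order, the caller's arithmetic happily produces a negative total from the nonsense discount. With the rich order, the bad value never gets in, so total() has nothing to defend against.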
One thing right away, because it matters: not every behavior-free class is a smell. DTOs, records, and view models are meant to be plain data, and we will give them their full honest defense later in this post. The smell is specifically a domain object that should own behavior but does not.
College corner: The deep idea here is the invariant: a condition that must hold for an object during its whole lifetime ("discount is between 0 and 1", "exit time is after entry time"). Encapsulation, in its serious meaning, is not "make fields private and add getters"; it is making invariants impossible to break from outside. A data class has private fields but zero invariants, so it is encapsulated only in syntax, not in substance. In Domain-Driven Design this becomes the aggregate pattern: an aggregate root (like Order) is the single entry point that guards the invariants of everything inside it. The bank's register, with a clerk at the counter, behaves like an aggregate; the notebook on a string does not.
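To make "encapsulated in syntax, not in substance" concrete, here is a small sketch built around the 25:70 entry from the story. Both class names (LooseTime and TimeOfDay) are hypothetical, and the 24-hour range check is an assumption about what a valid gate time looks like.

```typescript
// Encapsulated in syntax only: private fields, but the setters guard nothing.
class LooseTime {
  private hours = 0;
  private minutes = 0;

  setHours(hours: number): void {
    this.hours = hours;
  }

  setMinutes(minutes: number): void {
    this.minutes = minutes;
  }

  toString(): string {
    return `${this.hours}:${this.minutes}`;
  }
}

// Encapsulated in substance: once built, a TimeOfDay is always a real time.
class TimeOfDay {
  private constructor(private readonly minutesSinceMidnight: number) {}

  // The only way in. The invariant is checked here, once.
  static of(hours: number, minutes: number): TimeOfDay {
    const wholeNumbers =
      Math.floor(hours) === hours && Math.floor(minutes) === minutes;
    const inRange =
      hours >= 0 && hours <= 23 && minutes >= 0 && minutes <= 59;
    if (!wholeNumbers || !inRange) {
      throw new RangeError(`Not a real time: ${hours}:${minutes}`);
    }
    return new TimeOfDay(hours * 60 + minutes);
  }

  isBefore(other: TimeOfDay): boolean {
    return this.minutesSinceMidnight < other.minutesSinceMidnight;
  }

  toMinutes(): number {
    return this.minutesSinceMidnight;
  }
}

const scribble = new LooseTime();
scribble.setHours(25);
scribble.setMinutes(70); // the delivery boy's entry, stored without complaint

const ninePm = TimeOfDay.of(21, 0);
const elevenPm = TimeOfDay.of(23, 0);
// TimeOfDay.of(25, 70) throws a RangeError before any object exists.
```

The private constructor plus a static factory is one common way to make sure no code path can create an instance that skips the check; a public constructor with the same validation would work as well.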
Here is the territory in one map:
🔍 How to spot it
Checklist for your code review:
- A class with only public fields, or private fields wrapped in mechanical getters and setters that any IDE could generate.
- Logic that operates on the class's data lives everywhere except inside the class.
- The same validation or calculation on this class's fields is repeated in several callers.
- Collections handed out raw through getters, so outsiders can add or remove items behind the class's back.
- The class travels through the whole codebase, but only ever to have its fields read or poked.
| Question | Smelly answer | Healthy answer |
|---|---|---|
| Who computes the order total? | Every caller, separately | order.total(), once |
| Who checks the discount is between 0 and 1? | Hopefully someone, somewhere | The applyDiscount method, always |
| Can an outsider empty the lines list? | Yes — order.lines is the real list | No — read-only view + addLine() |
| Can the object ever hold nonsense values? | Yes, any field, any value | No — guarded at every entry point |
| Where do I read to learn the rules? | Every caller in the codebase | The class itself, one file |
A useful sorting tool: place any data-holding class on this chart. The danger zone is "has real rules to guard" plus "guards nothing".
⚠️ Why it is a problem
Problem 1: Encapsulation collapses. When anyone can set any field to any value, the class cannot protect its own correctness. A -500 rupee price, a flat 1403, a 25:70 time — the class accepts them all silently. Correctness now depends on every caller remembering every rule, forever.
Problem 2: Rules get duplicated. The "discount must be 0 to 1" rule gets written in the order screen, the admin screen, and the import job. Three copies. When the rule changes to "maximum 0.5", you must find all three — this is the Duplicate Code smell being born directly from the Data Class smell.
Problem 3: The data has no single explanation. To understand what discountRate means and how it may legally change, you must read every place that touches it. With a behavior-owning class, you read one file.
Problem 4: Leaked internals invite back-stabbing. A getter that returns the real internal list lets any caller do order.lines.clear() from anywhere. The order is corrupted, and the stack trace points nowhere near the culprit. Rohan editing his friend's exit time is exactly this: write access to internals, no audit, no guard.
Problem 5: It feeds Feature Envy. The behavior that should live on the data class must live somewhere — so it squats in services and helpers, enviously poking at the data class's fields all day. Data Class and Feature Envy are two sides of the same coin.
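Problem 4 above can be reproduced in a few lines. The sketch below is illustrative only: LeakyOrder and SealedOrder are hypothetical names, and returning a copy is one possible defence among several.

```typescript
// A mechanical getter that hands out the real internal array.
class LeakyOrder {
  private readonly lines: string[] = [];

  addLine(description: string): void {
    this.lines.push(description);
  }

  getLines(): string[] {
    return this.lines; // the caller now holds the order's own list
  }
}

// The same data, but outsiders only ever receive a read-only copy.
class SealedOrder {
  private readonly lines: string[] = [];

  addLine(description: string): void {
    if (description.trim() === "") {
      throw new Error("Line needs a description");
    }
    this.lines.push(description);
  }

  getLines(): readonly string[] {
    return this.lines.slice(); // a copy, typed read-only
  }

  lineCount(): number {
    return this.lines.length;
  }
}

const leaky = new LeakyOrder();
leaky.addLine("Cricket bat");
leaky.getLines().length = 0; // wipes the order's internal state from outside

const sealed = new SealedOrder();
sealed.addLine("Cricket bat");
// sealed.getLines().length = 0; // compile error: 'length' is read-only
const snapshot = sealed.getLines() as string[]; // even a deliberate cast...
snapshot.length = 0; // ...only empties the copy
```

A read-only type alone stops honest mistakes at compile time. Returning a copy also survives a cast, at the cost of one allocation per call. Which trade-off fits depends on how much the callers are trusted.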
Watch the moment garbage enters, in slow motion. Notice that the object never objects:
The damage also grows with the number of places that touch the data. Each new caller is one more place where a rule can be forgotten:
With a rich class, that line stays flat at 1 — forever, no matter how many callers arrive. That flat line is the entire argument for this refactoring.
💻 A real-life code example
The society finally digitizes its visitor register. The first version copies the notebook faithfully — including its lawlessness.
// Smelly version: the digital notebook with no rules
class VisitorEntry {
  name = "";
  flatNumber = 0;
  inTimeMinutes = 0; // minutes since midnight
  outTimeMinutes = 0;
}

class VisitorRegister {
  entries: VisitorEntry[] = [];
}

// gate screen, somewhere
// (register is the shared VisitorRegister; nowInMinutes() is assumed to exist elsewhere):
const e = new VisitorEntry();
e.name = prompt("Name?") ?? "";
e.flatNumber = Number(prompt("Flat?"));
e.inTimeMinutes = nowInMinutes();
register.entries.push(e);

// security report, in another file:
function visitDuration(e: VisitorEntry): number {
  return e.outTimeMinutes - e.inTimeMinutes; // negative if out < in!
}

// admin panel, in a third file:
function isStillInside(e: VisitorEntry): boolean {
  return e.outTimeMinutes === 0; // "0 means not exited"... says who?
}

// and a prank, from anywhere at all:
register.entries.length = 0; // entire register wiped, silently

Every notebook disaster is now possible in code:
- e.name = "" is the "guest"/blank-row problem. Nobody checks.
- e.flatNumber = 1403 is accepted, although flats go from 101 to 804.
- visitDuration can go negative; isStillInside invents a secret rule ("0 means not exited") that lives only in one caller's head.
- register.entries is the real array, so anyone can wipe it: Rohan editing exit times, now with one line of code.
- Each caller carries its own private understanding of the rules. They already disagree.
🧹 Cleaning it up, step by step
We hire a guardian. Step by step, the notebook becomes a bank register.
Step 1: Encapsulate Field. Make fields private and force all writing through methods that check the rules. Construction itself should refuse garbage.
Step 2: Move Method. visitDuration and isStillInside use only VisitorEntry's data — classic Feature Envy. Move them home, onto the class.
Step 3: Encapsulate Collection. The register exposes a read-only view and offers a checkIn method; checking out goes through the entry's own checkOut. The raw array becomes untouchable.
// Clean version: the register now has a guardian
// (isValidFlat is assumed to be defined elsewhere)
class VisitorEntry {
  private outTime: number | null = null;

  constructor(
    private readonly name: string,
    private readonly flatNumber: number,
    private readonly inTime: number,
  ) {
    if (name.trim().length < 2) throw new Error("Real name required");
    if (!isValidFlat(flatNumber)) throw new Error(`No such flat: ${flatNumber}`);
    if (inTime < 0 || inTime >= 1440) throw new Error("Invalid time");
  }

  checkOut(outTime: number): void {
    if (this.outTime !== null) throw new Error("Already checked out");
    if (outTime < this.inTime) throw new Error("Exit before entry? No.");
    this.outTime = outTime;
  }

  isStillInside(): boolean {
    return this.outTime === null; // the rule, stated once, clearly
  }

  visitDurationMinutes(): number | null {
    return this.outTime === null ? null : this.outTime - this.inTime;
  }
}

class VisitorRegister {
  private readonly entries: VisitorEntry[] = [];

  get allEntries(): ReadonlyArray<VisitorEntry> {
    return this.entries; // typed read-only: callers cannot push, clear, or resize it
  }

  checkIn(name: string, flatNumber: number, inTime: number): VisitorEntry {
    const entry = new VisitorEntry(name, flatNumber, inTime);
    this.entries.push(entry);
    return entry;
  }
}

Look what changed:
- A VisitorEntry with a blank name or flat 1403 cannot exist. The constructor is the clerk at the bank counter.
- "Still inside" has exactly one definition, outTime === null, written once, inside the class, instead of a secret 0 convention in some caller.
- A negative duration is impossible; exit-before-entry is rejected at the door.
- register.entries.length = 0 no longer compiles. The prankster is out of business.
- Callers now follow Tell, Don't Ask: they call entry.visitDurationMinutes() instead of pulling fields and computing.
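The clean version above calls isValidFlat without defining it. One possible definition is sketched here, assuming flats are numbered floor × 100 + position, with 8 floors and 4 flats per floor. That numbering is an assumption chosen to match "flats go 101 to 804"; the story does not confirm it. The snippet carries a lightly adapted copy of the entry class so that it runs on its own, and tryOrMessage is a hypothetical helper.

```typescript
// Assumption: flats are numbered floor * 100 + position, floors 1-8, positions 1-4.
const FLOORS = 8;
const FLATS_PER_FLOOR = 4;

function isValidFlat(flatNumber: number): boolean {
  if (Math.floor(flatNumber) !== flatNumber) return false; // whole numbers only
  const floor = Math.floor(flatNumber / 100);
  const position = flatNumber % 100;
  return (
    floor >= 1 && floor <= FLOORS && position >= 1 && position <= FLATS_PER_FLOOR
  );
}

// Adapted copy of the guarded entry, so this snippet is self-contained.
class VisitorEntry {
  private outTime: number | null = null;

  constructor(
    readonly name: string,
    readonly flatNumber: number,
    private readonly inTime: number,
  ) {
    if (name.trim().length < 2) throw new Error("Real name required");
    if (!isValidFlat(flatNumber)) throw new Error(`No such flat: ${flatNumber}`);
    if (inTime < 0 || inTime >= 1440) throw new Error("Invalid time");
  }

  checkOut(outTime: number): void {
    if (this.outTime !== null) throw new Error("Already checked out");
    if (outTime < this.inTime) throw new Error("Exit before entry? No.");
    this.outTime = outTime;
  }

  isStillInside(): boolean {
    return this.outTime === null;
  }
}

// Hypothetical helper: run an action and report the error message, if any.
function tryOrMessage(action: () => void): string {
  try {
    action();
    return "accepted";
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const blankName = tryOrMessage(() => new VisitorEntry("", 302, 600));
const ghostFlat = tryOrMessage(() => new VisitorEntry("Rohan", 1403, 600));

const visit = new VisitorEntry("Rohan", 302, 21 * 60); // in at 9 pm
const earlyExit = tryOrMessage(() => visit.checkOut(20 * 60)); // "out" at 8 pm
const realExit = tryOrMessage(() => visit.checkOut(23 * 60)); // out at 11 pm
const secondExit = tryOrMessage(() => visit.checkOut(21 * 60)); // Rohan's edit
```

Here blankName, ghostFlat, earlyExit, and secondExit each hold the matching error message, while realExit holds "accepted". Every notebook disaster from the story is now a refused request rather than a stored row.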
The structure after the operation:
There is also a nice way to see the entry itself as a tiny machine. The rich class makes illegal jumps impossible:
In the anemic version, every one of those "rejected" arrows was an open door.
🟦 The same smell in C#
The classic anemic order, exactly as it appears in a thousand real codebases:
// Before: anemic data holder; callers do its thinking
public class Order
{
    public List<OrderLine> Lines { get; set; }
    public decimal DiscountRate { get; set; }
}

// far away, in some service:
decimal total = 0;
foreach (var line in order.Lines)
    total += line.UnitPrice * line.Quantity;
total -= total * order.DiscountRate; // duplicated wherever a total is needed

Move the behavior home and lock the doors:
// After: the class owns the rules about its own data
public class Order
{
    private readonly List<OrderLine> _lines = new();

    public IReadOnlyList<OrderLine> Lines => _lines; // no outside mutation
    public decimal DiscountRate { get; private set; }

    public void AddLine(OrderLine line) => _lines.Add(line);

    public void ApplyDiscount(decimal rate)
    {
        if (rate is < 0 or > 1)
            throw new ArgumentOutOfRangeException(nameof(rate));
        DiscountRate = rate;
    }

    public decimal Total()
    {
        var subtotal = _lines.Sum(l => l.UnitPrice * l.Quantity);
        return subtotal - subtotal * DiscountRate;
    }
}

// every caller, everywhere:
decimal total = order.Total();

The total rule exists once. An illegal discount cannot be set. Nobody can clear the lines behind the order's back. This journey, from anemic to rich, is the heart of domain-driven design.
And in Python, where @dataclass makes plain data easy — which is wonderful at boundaries and risky in the domain:
# Fine as a boundary DTO: plain by design
from dataclasses import dataclass


@dataclass(frozen=True)
class VisitorSummaryDto:
    name: str
    flat_number: int


# Rich in the domain: the guardian pattern
class VisitorEntry:
    def __init__(self, name: str, flat_number: int, in_time: int):
        if len(name.strip()) < 2:
            raise ValueError("Real name required")
        if not is_valid_flat(flat_number):
            raise ValueError(f"No such flat: {flat_number}")
        self._name = name
        self._flat = flat_number
        self._in_time = in_time
        self._out_time: int | None = None

    def check_out(self, out_time: int) -> None:
        if self._out_time is not None:
            raise ValueError("Already checked out")
        if out_time < self._in_time:
            raise ValueError("Exit before entry? No.")
        self._out_time = out_time

College corner: Notice the architectural pattern hiding in that Python snippet: plain at the boundary, smart at the core. In hexagonal/clean architecture terms, DTOs live in the adapter layer (they mirror JSON, database rows, message formats), while invariant-guarding entities live in the domain layer. CQRS pushes this further: write-side models are rich (they must guard invariants during changes), while read-side projections are deliberately anemic (they only ever display). So "is a data class a smell?" has a precise architectural answer: it depends which layer you are standing in. The same shape that is correct in an adapter is a disease in the domain.
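The "plain at the boundary, smart at the core" split can be sketched in TypeScript as well. This is a minimal illustration, not a prescribed architecture: toSummaryDto is a hypothetical adapter-layer mapper, and the entity is trimmed (flat validation is left out) to keep the example short.

```typescript
// Domain core: guards its rules and exposes only what outsiders may know.
class VisitorEntry {
  private outTime: number | null = null;

  constructor(
    readonly visitorName: string,
    readonly flatNumber: number,
    private readonly inTime: number,
  ) {
    if (visitorName.trim().length < 2) throw new Error("Real name required");
    if (inTime < 0 || inTime >= 1440) throw new Error("Invalid time");
  }

  checkOut(outTime: number): void {
    if (this.outTime !== null) throw new Error("Already checked out");
    if (outTime < this.inTime) throw new Error("Exit before entry? No.");
    this.outTime = outTime;
  }

  isStillInside(): boolean {
    return this.outTime === null;
  }
}

// Boundary shape: plain by design, mirrors the JSON that leaves the system.
interface VisitorSummaryDto {
  readonly name: string;
  readonly flatNumber: number;
  readonly stillInside: boolean;
}

// Adapter-layer mapper (hypothetical): a one-way snapshot from core to edge.
function toSummaryDto(entry: VisitorEntry): VisitorSummaryDto {
  return {
    name: entry.visitorName,
    flatNumber: entry.flatNumber,
    stillInside: entry.isStillInside(),
  };
}

const asha = new VisitorEntry("Asha", 302, 18 * 60);
const payload = JSON.stringify(toSummaryDto(asha));
```

Keeping the mapper outside the entity means the wire format can change without touching the domain class, and the domain rules can change without reshaping the JSON.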
🏢 Where this smell hides in real projects
- Layered "enterprise" architectures taken too far. The culture of "entities are just data; all logic goes in the service layer" mass-produces anemic models. The service layer swells into thousand-line procedural scripts while entities stay bloodless.
- ORM entities used as the domain model. Database mapping tools historically wanted public getters and setters on everything, training a generation of developers to hollow out their entities and never look back.
- IDE-generated accessor reflex. Create fields, press the generate-getters-setters shortcut, done. The class is born anemic, and behavior gets written wherever the developer happens to be standing.
- Exposed collections. public List<Student> Students { get; set; } lets every consumer add, remove, clear, or replace the whole list. Invariants like "a section has at most 40 students" become unenforceable.
- Validation living only in the UI. The form checks the rules, the domain object accepts anything. Then an import job, a message consumer, or a second UI writes directly, and garbage enters through the side door, exactly like the import job in Figure 4.
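The "at most 40 students" invariant from the exposed-collections item above becomes enforceable as soon as the list has exactly one door. The sketch below is illustrative: Section, enroll, and roster are hypothetical names, and students are represented by plain name strings for brevity.

```typescript
const MAX_STUDENTS_PER_SECTION = 40;

class Section {
  private readonly students: string[] = [];

  // The single entry point for adding a student: the cap is checked here.
  enroll(studentName: string): void {
    if (this.students.length >= MAX_STUDENTS_PER_SECTION) {
      throw new Error("Section is full (40 students)");
    }
    this.students.push(studentName);
  }

  // Outsiders get a read-only copy, never the real list.
  get roster(): ReadonlyArray<string> {
    return this.students.slice();
  }

  get size(): number {
    return this.students.length;
  }
}

const sectionA = new Section();
for (let i = 1; i <= MAX_STUDENTS_PER_SECTION; i++) {
  sectionA.enroll(`Student ${i}`);
}

let rejection = "";
try {
  sectionA.enroll("Student 41");
} catch (err) {
  rejection = err instanceof Error ? err.message : String(err);
}
```

After this runs, sectionA.size is still 40 and rejection holds the "Section is full" message. With a public settable list, the forty-first student would simply have been appended.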
When teams audit where their anemic classes came from, the blame usually splits like this:
⚖️ When it is okay to ignore
This is the most important honesty section in this post. Plain data classes are sometimes exactly the right design. Calling every one of them a smell is a beginner's mistake.
| Kind of class | Smell? | Why it is fine (or not) |
|---|---|---|
| DTO crossing a boundary (API payload, queue message) | No | Its whole job is to be a transparent shape that maps to JSON or a wire format |
| C# record / Java record / Python @dataclass | No | Language-blessed value bundles (immutable by default in Java records and C# positional records, opt-in via frozen=True in Python); adding ceremony fights the language |
| Read model / view model / CQRS projection | No | Deliberately flat, query-shaped data for display or reporting |
| Configuration objects | No | Settings are data by nature |
| Functional-programming style records + pure functions | No | In FP, immutable data plus separate functions is the intended design |
| Domain entity with rules, hollowed into getters/setters | Yes | It should guard invariants and own calculations, but cannot |
| "Domain" object whose rules are copied across callers | Yes | The duplication and drift prove the behavior belongs inside |
How to tell a healthy DTO from a sick domain object? Ask: does this data have rules and invariants that must always hold?
- An OrderResponseDto going out as JSON has no rules to defend: it is a photograph of data, frozen and outbound. Plain is perfect.
- The Order inside your domain has rules: "discount between 0 and 1", "total is computed this way", "lines cannot be mutated from outside". If it cannot defend them, it is anemic.
Do not "fix" your DTOs by stuffing business logic into them. A DTO with behavior is its own mess — now your wire format and your business rules change together. The correct shape of many systems is: rich domain objects in the middle, thin DTOs at the edges, and mapping between them. Plain at the boundary, smart at the core.
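The functional-programming row in the table above deserves a quick illustration, because it can look like the smell at first glance. In the sketch below the data is a plain immutable record and the rules live in a small set of pure functions kept together in one module. The names Visit, startVisit, endVisit, and durationMinutes are hypothetical.

```typescript
// Plain immutable data: no methods at all.
type Visit = {
  readonly visitorName: string;
  readonly inTime: number; // minutes since midnight
  readonly outTime: number | null; // null while the visitor is inside
};

// Pure functions: each takes values in and returns a value or a NEW Visit.
function startVisit(visitorName: string, inTime: number): Visit {
  if (visitorName.trim().length < 2) throw new Error("Real name required");
  if (inTime < 0 || inTime >= 1440) throw new Error("Invalid time");
  return { visitorName, inTime, outTime: null };
}

function endVisit(visit: Visit, outTime: number): Visit {
  if (visit.outTime !== null) throw new Error("Already checked out");
  if (outTime < visit.inTime) throw new Error("Exit before entry? No.");
  return { visitorName: visit.visitorName, inTime: visit.inTime, outTime };
}

function durationMinutes(visit: Visit): number | null {
  return visit.outTime === null ? null : visit.outTime - visit.inTime;
}

const arrived = startVisit("Asha", 18 * 60);
const departed = endVisit(arrived, 19 * 60 + 30);
// arrived is untouched; departed is a new record with outTime filled in.
```

The rules are still stated once, next to the data they describe, which is what separates this style from an anemic model with logic scattered across callers. One caveat: a structural type like this can still be built by hand with bad values, so teams using this style usually agree that startVisit is the only sanctioned way to create a Visit.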
🛠️ Which refactorings cure it
| Symptom | Curing refactoring | Result |
|---|---|---|
| Behavior on this data lives in other classes | Move Method | Logic relocates to the class that owns the data |
| Naked public fields | Encapsulate Field | Writes pass through guarded methods |
| Raw collections handed out | Encapsulate Collection | Read-only view plus add/remove methods |
| Callers repeat the same getter-then-calculate dance | Extract Method + Move Method | The dance becomes one method on the class |
| Setters that should never exist | Remove Setting Method | Immutable after construction |
A practical hunting tactic from Fowler's catalog: look at the callers of each getter. If several callers take the value and perform the same calculation on it, that calculation is begging to move into the class as a method. Follow the getters; they lead you to the missing behavior.
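The "follow the getters" tactic can be sketched with a small before-and-after. Everything here is hypothetical and chosen only for illustration: a society maintenance bill, a late fee assumed to be 10 rupees per day, and two callers named receiptTotal and reminderTotal.

```typescript
// Before: two callers pull the same getters and repeat the same sum.
class AnemicBill {
  constructor(
    private readonly amount: number,
    private readonly daysLate: number,
  ) {}

  getAmount(): number {
    return this.amount;
  }

  getDaysLate(): number {
    return this.daysLate;
  }
}

function receiptTotal(bill: AnemicBill): number {
  return bill.getAmount() + bill.getDaysLate() * 10;
}

function reminderTotal(bill: AnemicBill): number {
  return bill.getAmount() + bill.getDaysLate() * 10; // same dance, second copy
}

// After: Extract Method + Move Method. The calculation lives on the class.
class Bill {
  private static readonly LATE_FEE_PER_DAY = 10; // assumed rate, in rupees

  constructor(
    private readonly amount: number,
    private readonly daysLate: number,
  ) {
    if (amount < 0) throw new RangeError("Amount cannot be negative");
    if (daysLate < 0) throw new RangeError("Days late cannot be negative");
  }

  totalDue(): number {
    return this.amount + this.daysLate * Bill.LATE_FEE_PER_DAY;
  }
}

const oldStyle = new AnemicBill(2000, 3);
const newStyle = new Bill(2000, 3);
```

Once totalDue exists, the two getters have no remaining callers and can be deleted. When the late fee changes, there is one constant to edit instead of a search through every screen.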
📦 Quick revision box
+--------------------------------------------------------------+
| DATA CLASS — QUICK REVISION |
+--------------------------------------------------------------+
| Story : A visitor register with no rules — anyone |
| scribbles anything, so the data can't be trusted. |
| Smell : A DOMAIN class with fields + getters/setters |
| but no behavior; others do its thinking. |
| Why bad : Cannot guard its invariants; rules duplicated |
| across callers; internals leak; data goes bad. |
| Principle: Tell, Don't Ask — ask order.total(), don't |
| pull fields and compute outside. |
| NOT smell: DTOs, records, view models, config objects — |
| plain-by-design data at boundaries is GOOD. |
| Cures : Move Method, Encapsulate Field, |
| Encapsulate Collection, Remove Setting Method. |
| Motto : Plain at the boundary, smart at the core. |
+--------------------------------------------------------------+
✏️ Practice exercise
A library management program has an anemic heart. Operate on it.
class LibraryBook {
  title = "";
  timesIssued = 0;
  isIssued = false;
  dueDateDay = 0; // day of month; 0 means "no due date"
}

class Library {
  books: LibraryBook[] = [];
}

// in the issue-desk screen:
function issueBook(b: LibraryBook, today: number): void {
  b.isIssued = true;
  b.timesIssued = b.timesIssued + 1;
  b.dueDateDay = today + 14; // can become 36 if today is 22!
}

// in the fine-counter screen:
function fineFor(b: LibraryBook, today: number): number {
  return (today - b.dueDateDay) * 2; // negative fine if returned early!
}

// in the reports screen:
function isOverdue(b: LibraryBook, today: number): boolean {
  return b.isIssued && today > b.dueDateDay && b.dueDateDay !== 0;
}

Your tasks:
- List every rule about a book's data that currently lives outside LibraryBook. (Hint: there are at least four, including the secret "0 means no due date" convention.)
- Find two real bugs already present in the callers. (Look at the due-date arithmetic and the fine calculation.)
- Refactor: move issueBook, fineFor, and isOverdue into LibraryBook using Move Method. Make the fields private. Fix both bugs while moving, because the class is now responsible for its own correctness.
- Replace the secret 0 convention with something honest (a dueDateDay: number | null). Note how the class can now hide this detail completely from callers.
- Protect Library.books with Encapsulate Collection: a read-only view plus an addBook method.
- Draw the state machine of a book (like Figure 8): available → issued → returned. Mark which illegal jumps your new class now rejects.
- Finally, design a BookSummaryDto with title and timesIssued for the library's public website API, completely behavior-free. Write one sentence explaining why this plain class is not the Data Class smell.
When your LibraryBook can no longer hold impossible data — and your DTO is proudly, correctly plain — you have mastered both halves of this lesson, and Mr. Kulkarni's bicycle thief would have been caught.
Frequently asked questions
- What is a Data Class smell in simple words?
- It is a class that only holds fields with getters and setters but has no behavior of its own. All the thinking about its data — validation, calculation, rules — is done by other classes far away. The data and its rules live apart, even though they always change together.
- Are DTOs and records also Data Class smells?
- No. DTOs carry data across a boundary like an API or a message queue, and their whole job is to be a plain, transparent shape. Records and dataclasses are language-blessed ways to model immutable value bundles. The smell is only a DOMAIN object that should own its rules but has been hollowed out into a bag of getters and setters.
- What is an anemic domain model?
- It is Martin Fowler's name for a design where domain objects look real but contain no behavior — just data — while all logic sits in procedural service classes. It looks object-oriented from far away, but it loses the main benefit of objects: keeping data and the operations on that data together.
- What does Tell, Don't Ask mean?
- Instead of asking an object for its data and doing the calculation yourself, tell the object what you want and let it do its own thinking. Ask order.total() instead of pulling out lines and discount and computing the total in five different callers.
- Which refactorings cure a Data Class?
- Move Method brings the behavior that operates on the data into the class that owns the data. Encapsulate Field replaces naked public fields with controlled access. Encapsulate Collection stops outsiders from mutating internal lists by exposing read-only views with add and remove methods.