Back-of-the-Envelope Calculations: Tips for System Design Interviews
When I first started preparing for system design interviews, I found BOFE (back-of-the-envelope) calculations to be one of the trickiest parts. It felt like I was constantly guessing which numbers mattered and how to work with them. After struggling for a while, I decided to dig into how to approach these calculations systematically. Here’s what I’ve learned, step by step.
In this post, I’ll break down how to figure out which BOFE calculations matter most for a given system design problem, and how to work through them without getting lost in the numbers. Let’s get started!
How to Pick the Right Calculations
Here’s how to figure out which back-of-the-envelope (BOFE) calculations matter most for any given system.
Step 1: Nail Down the Core Use Case
First things first—what’s the system actually doing? Get a grip on its primary function and workload:
- Is it a chat app, pushing real-time messages with low latency?
- A video streaming service, juggling massive storage and bandwidth?
- A metrics dashboard, crunching data for quick insights?
This sets the stage. A read-heavy system like a news feed points you toward throughput and caching, while a write-heavy one like a messaging app screams latency and queueing. Pinpoint the workload, and you’ll know what metrics to chase.
Step 2: Identify What Scales (and Where It Breaks)
Next, ask yourself: as users pile in, what’s going to grow—and what’s going to choke? Every system has its pressure points:
- Social media feed? Fan-out scales with followers.
- Search engine? Indexing balloons with data.
- File storage? Bandwidth and disk space take the hit.
This step helps you zero in on the calculations that’ll expose bottlenecks—whether it’s storage, queries per second (QPS), or network traffic.
Step 3: Match Requirements to Calculations
Once you’ve got the system’s pulse, map its needs to the right math. Here’s a quick guide to connect features to the numbers you’ll need:
| System Need | Calculation to Run |
|---|---|
| Huge datasets | Storage (daily, monthly, yearly) |
| Tons of users | QPS, peak load |
| User uploads | Bandwidth, object size, storage |
| Real-time features | Latency, throughput, queueing |
| API heavy | Payload size, batching, pagination |
| Caching | Hit rate, memory size |
| Database scaling | Row size, total rows, sharding |
| Uptime obsession | Availability vs. downtime |
Use this table as your compass.
Your System Design Cheat Sheet
Now, let’s get to the good stuff: the numbers and formulas you’ll lean on. These are rough, practical, and built for interviews.
Baseline Assumptions
Start with these constants to keep your math grounded:
- Users: Small (1M DAU), Medium (10M DAU), Large (100M DAU).
- Seconds in a Day: 86,400 (round to 100K for quick math).
- Data Sizes: Tweet (~300 bytes), URL (~100 bytes), Image (~1 MB), 1-min Video (~10 MB).
- Network: 10 ms latency, 1 Gbps = 125 MB/s.
- Server: One application server handles roughly 100–1,000 RPS, depending on how much work each request does.
- Storage Units (in bytes): KB = 10³, MB = 10⁶, GB = 10⁹, TB = 10¹².
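If you want to sanity-check these constants, here’s a minimal Python sketch. The names (`SECONDS_PER_DAY`, `gbps_to_mb_per_s`, `rounded_day_underestimate`) are my own, purely for illustration. It confirms the 1 Gbps conversion and shows how much the 100K shortcut skews a per-second rate.

```python
# Baseline constants from the list above (decimal units).
# All names here are illustrative, not from any library.
SECONDS_PER_DAY = 86_400
SECONDS_PER_DAY_ROUNDED = 100_000  # the "quick math" shortcut

KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # sizes in bytes


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (8 bits per byte)."""
    return gbps * 1_000 / 8


def rounded_day_underestimate() -> float:
    """Fraction by which a per-second rate is understated when you
    divide by 100,000 instead of 86,400."""
    return 1 - SECONDS_PER_DAY / SECONDS_PER_DAY_ROUNDED


if __name__ == "__main__":
    print(gbps_to_mb_per_s(1))                   # 125.0
    print(f"{rounded_day_underestimate():.1%}")  # 13.6%
```

So the shortcut runs about 14% low, which is well inside the noise for interview math, but worth knowing when you pad for peaks.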
Essential Calculations
1. Storage Capacity
- How: Total Size = # Items × Size per Item × Retention
- Example: 10M tweets/day × 300 bytes × 365 days = 1.1 TB/year
- Call: Single DB for <1 TB; shard or use S3 for >10 TB (especially images/videos).
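The storage formula above can be sketched in a few lines of Python. The function name `storage_bytes` is hypothetical, and the sketch counts raw payload only; replication, indexes, and metadata would sit on top of this number.

```python
TB = 10**12  # bytes, decimal units


def storage_bytes(items_per_day: int, bytes_per_item: int, retention_days: int) -> int:
    """Total Size = # Items x Size per Item x Retention (raw payload only)."""
    return items_per_day * bytes_per_item * retention_days


# The tweet example: 10M tweets/day, 300 bytes each, kept for a year.
total = storage_bytes(10_000_000, 300, 365)
print(f"{total / TB:.1f} TB/year")  # 1.1 TB/year
```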
2. Throughput (RPS)
- How: RPS = Total Requests ÷ Seconds
- Example: 10M messages/day ÷ 86,400 s ≈ 116 RPS (call it ~120)
- Call: One server for 100 RPS; load balancer for 1K+.
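Here’s the same throughput math as a small Python sketch, comparing the exact divisor with the 100K shortcut. `average_rps` is a name I made up for illustration, and the result is a daily average, not a peak.

```python
def average_rps(requests_per_day: int, seconds_per_day: int = 86_400) -> float:
    """RPS = Total Requests / Seconds, averaged over a full day."""
    return requests_per_day / seconds_per_day


exact = average_rps(10_000_000)           # about 115.7
quick = average_rps(10_000_000, 100_000)  # 100.0 with the rounded day

print(f"exact: {exact:.0f} RPS, quick: {quick:.0f} RPS")
```

Either answer leads to the same call in an interview: this workload sits around the 100 RPS mark on average.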
3. Latency
- How: Total = DB Query + Network + Processing
- Example: 1 ms (DB) + 10 ms (network) + 5 ms (compute) = 16 ms
- Call: Add a cache if the target is under 10 ms; optimize the DB if the total exceeds 100 ms.
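A latency budget is just a sum, but writing it out makes it easy to see which component dominates. This is a rough sketch with hypothetical names (`total_latency_ms`, `slowest_component`); the component values are the ones from the example above.

```python
def total_latency_ms(components_ms: dict[str, float]) -> float:
    """Total = DB Query + Network + Processing (all in milliseconds)."""
    return sum(components_ms.values())


def slowest_component(components_ms: dict[str, float]) -> str:
    """Name of the component that contributes the most latency."""
    return max(components_ms, key=components_ms.get)


budget = {"db_query": 1, "network": 10, "compute": 5}
print(total_latency_ms(budget))   # 16
print(slowest_component(budget))  # network
```

In this example the network hop is the biggest slice, so that’s where I’d look first before touching the database.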
4. Bandwidth
- How: Bandwidth = # Users × Data per User ÷ Time
- Example: 1M users × one 1 MB image per day ÷ 86,400 s ≈ 12 MB/s (~100 Mbps)
- Call: CDN if >1 Gbps.
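The bandwidth example mixes megabytes and megabits, which is an easy place to slip. Here’s a minimal sketch of both steps; the function names are mine, and it assumes traffic is spread evenly across the day.

```python
def bandwidth_mb_per_s(users: int, mb_per_user: float, seconds: int = 86_400) -> float:
    """Bandwidth = # Users x Data per User / Time, in megabytes per second."""
    return users * mb_per_user / seconds


def mb_per_s_to_mbps(mb_per_s: float) -> float:
    """Megabytes per second to megabits per second (8 bits per byte)."""
    return mb_per_s * 8


bw = bandwidth_mb_per_s(1_000_000, 1)  # 1M users, one 1 MB image each per day
print(f"{bw:.1f} MB/s, {mb_per_s_to_mbps(bw):.0f} Mbps")  # 11.6 MB/s, 93 Mbps
```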
5. Server Count
- How: # Servers = Total RPS ÷ RPS per Server
- Example: 1,200 RPS ÷ 100 RPS/server = 12 servers
- Call: Scale out or beef up instances.
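One detail the division hides: you can’t deploy a fraction of a server, so round up. A quick sketch, with `servers_needed` as an illustrative name:

```python
import math


def servers_needed(total_rps: float, rps_per_server: float) -> int:
    """# Servers = Total RPS / RPS per Server, rounded up to a whole machine."""
    return math.ceil(total_rps / rps_per_server)


print(servers_needed(1_200, 100))  # 12
print(servers_needed(1_250, 100))  # 13, since 12.5 servers means 13 in practice
```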
6. Queue Size (Backlog)
- How: Backlog = (Arrival Rate - Processing Rate) × Time
- Example: (1,000 uploads/s - 100 processed/s) × 10 s = 9,000 items
- Call: Add workers or throttle input.
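The backlog formula only makes sense when arrivals outpace processing; otherwise the queue simply stays empty. Here’s a small sketch that handles both cases. The name `backlog_items` is hypothetical, and it assumes constant rates and an empty queue at the start.

```python
def backlog_items(arrival_per_s: float, processing_per_s: float, seconds: float) -> float:
    """Backlog = (Arrival Rate - Processing Rate) x Time, floored at zero."""
    return max(0, (arrival_per_s - processing_per_s) * seconds)


print(backlog_items(1_000, 100, 10))  # 9000
print(backlog_items(100, 1_000, 10))  # 0, workers keep up so nothing queues
```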
7. Availability
- How: Downtime = (1 - Uptime) × Time
- Example: 99.9% uptime → 0.001 × 8,760 hrs/year = 8.76 hrs of downtime per year
- Call: Replicate for 99.99% (52 min/year).
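To compare "nines" side by side, the downtime formula fits in one function. This is a sketch with a made-up name (`downtime_hours_per_year`), using the same 8,760-hour year as the example.

```python
HOURS_PER_YEAR = 8_760  # 365 days x 24 hours


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Downtime = (1 - Uptime) x Time, for a one-year window."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


for nines in (99.9, 99.99):
    hours = downtime_hours_per_year(nines)
    print(f"{nines}% uptime: {hours:.2f} hrs ({hours * 60:.1f} min) per year")
```

Each extra nine cuts the downtime budget by a factor of ten, which is why going from 99.9% to 99.99% usually forces replication.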
Pro Tips for the Interview Hot Seat
- Handle Peaks: Multiply averages by 2–5x for bursts.
- Rough Costs: Assume about $0.10 per server-hour and $0.01 per GB of bandwidth.
- Trade-offs: “Caching slashes latency but ups cost—worth it here?”
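The peak multiplier and the rough cost figure can be combined into a quick estimate. Treat this as a sketch under stated assumptions: the function names are mine, and the 730 hours per month (8,760 ÷ 12) is my own assumption, not a number from the tips above.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def peak_rps_range(average_rps: float, low: float = 2, high: float = 5) -> tuple[float, float]:
    """Apply the 2-5x burst multiplier to an average request rate."""
    return average_rps * low, average_rps * high


def monthly_server_cost(servers: int, usd_per_server_hour: float = 0.10) -> float:
    """Rough monthly bill for always-on servers at a flat hourly rate."""
    return servers * usd_per_server_hour * HOURS_PER_MONTH


low, high = peak_rps_range(120)
print(f"peak: {low:.0f}-{high:.0f} RPS")            # peak: 240-600 RPS
print(f"12 servers: ${monthly_server_cost(12):.0f}/month")  # 12 servers: $876/month
```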
Back-of-the-envelope calculations aren’t about precision—they’re about reasoning fast and showing you can size a system under pressure. Next time you’re sketching out a distributed system, you’ll have the math to justify your design. Good luck with your interviews!