Day 6: Scaling, Cost Optimization & Load Balancing!

6 min read · Feb 26, 2025

🔥 Welcome Back, Future Cloud Architects!

Buckle up because today’s blog is going to be a rollercoaster ride! 🎢

Your startup has gone viral! 🚀 You see tweets about your app, news articles covering your success, and a flood of users signing up every day.

Then, one day, your founder calls you in a panic…

“AWS is charging me a fortune! 💰 How do we reduce hosting costs without breaking our app?”

Uh-oh. 😬 It’s time to put on your Cloud Engineer hat 🎩 and start optimizing!

💰 Why Are AWS Costs So High?

🔹 We’re using serverless technology (AWS Lambda).
🔹 AWS manages our servers for us, spinning them up and down as needed.
🔹 Serverless = convenience, but at a premium: Lambda bills per request and per unit of compute time, so the bill grows with every new user.

💡 Solution? Let’s explore cheaper options!

🧐 Cost Optimization: Where Do We Start?

1️⃣ First, Can We Get AWS Credits?

✅ Our founder already tried this and exhausted all credits. ❌ No free money left!

2️⃣ Next, Can We Optimize Our Code?

✅ We already implemented caching and optimized database queries.
✅ There’s no more room for improvement at the code level.

🔎 So what’s left?
We need an infrastructure-level change.

🛠️ Moving Away from Serverless: Reserving an AWS Instance

One reason serverless costs more is that AWS handles everything for us.
But what if we rent our own server for a fixed term instead?

💡 The “Reserved Instance” Strategy

Instead of paying AWS on demand, we commit to an EC2 instance for 1 year.
AWS discounts the hourly rate when you commit to a full term, and discounts it further if you pay upfront.
We pay a flat rate instead of paying per request, but we have to manage the server ourselves.

📌 Final Decision? ✅ We reserve an AWS EC2 instance to reduce costs.

🔄 How Does Our API Work Now?

Our EC2 instance is up and running on AWS, and our server is reachable at www.api.ourcoolapp.com. Let’s assume this is one of our API URLs:

www.api.ourcoolapp.com/purchase?id=23213&userId=3222

www.api.ourcoolapp.com : The hostname. DNS resolves it to our server.
purchase : The route. It determines which handler takes care of all the purchase-related functions.
?id=23213&userId=3222 : The GET request parameters, which the route reads to get the values it needs.
You can also send authentication headers along with your API request. These are the basics of how API routing works.
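Here is a small sketch of how that URL breaks apart, using Python's standard library. One assumption: the URL above has no scheme, so `https://` is added here, because `urlparse` only picks out the hostname when a scheme is present.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: "https://" is prepended; the article's URL is written without a scheme.
url = "https://www.api.ourcoolapp.com/purchase?id=23213&userId=3222"

parts = urlparse(url)
host = parts.hostname            # resolved by DNS to find the server
route = parts.path               # tells the server which handler to call
params = parse_qs(parts.query)   # query parameters; each value is a list

print(host)    # www.api.ourcoolapp.com
print(route)   # /purchase
print(params)  # {'id': ['23213'], 'userId': ['3222']}
```

Note that `parse_qs` returns every value as a list of strings, since a query string may repeat a key.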

📌 Result? 💰 50% reduction in AWS costs!

AWS bill before: $1000/month
AWS bill after switching to EC2: $500/month
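To see why a flat-rate server can win at high traffic, here is a rough break-even sketch. The two bills come from the numbers above. The monthly request count is a made-up assumption, and real Lambda pricing also depends on compute duration, which this ignores.

```python
# Bills taken from the article.
SERVERLESS_BILL = 1000.0   # dollars per month before the switch
FLAT_SERVER_BILL = 500.0   # dollars per month after the switch

# Hypothetical traffic figure, only for illustration.
monthly_requests = 50_000_000

# Implied average cost of one request under pay-per-request billing.
price_per_request = SERVERLESS_BILL / monthly_requests

# Below this many requests per month, pay-per-request is cheaper;
# above it, the flat-rate server is cheaper.
break_even_requests = FLAT_SERVER_BILL / price_per_request

print(round(break_even_requests))
```

Under these assumptions the break-even point is half of our current traffic, which is another way of saying the flat-rate server halves the bill.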

🎉 Our founder is super happy! … But then… 🚨

💀 The New Problem: “My App is Slow & Keeps Crashing!”

Customers start complaining:

“The app is too slow!”
“Sometimes, I can’t even access it!”

As a developer, you log into AWS and see…

📉 Our server is overloaded!
📈 We are using too much memory!

💡 Solution? We need bigger servers!

🚀 Scaling: But How?

We upgrade our instance from Large → XLarge → 4XLarge…
📌 But the overload comes back every few weeks!

At some point, we realize two major problems with this approach:

❌ 1. We’re Not Reserving Capacity Efficiently

We reserve for a year, but our app is growing too fast.
By the time we scale, our reserved instance is outdated!

❌ 2. We’re Overpaying During Low Traffic

T-shirt sales spike during festivals → High CPU usage 🔥
At night, barely any users → We still pay full price for an idle server!

🔮 Solution? We need Auto-Scaling!

⚖️ The Big Decision: Vertical Scaling vs. Horizontal Scaling

🛠️ 1. Vertical Scaling (What We Were Doing)

📌 “Just buy a bigger server!”

Example: small → medium → large → extra large, and so on.

Pros: Simple, keeps all data in one place.
Cons: Expensive, inefficient, single point of failure.

🛠️ 2. Horizontal Scaling (What We Should Do!)

📌 “Use multiple smaller servers instead of one big one.”

Pros: More efficient, cheaper, handles failure better.
Cons: More complex to manage.

📌 Final Decision? ✅ Move to Horizontal Scaling!

🌍 Enter Load Balancing: Distributing Traffic Efficiently

With multiple smaller servers, we need to ensure that:
✅ Each server gets an equal number of requests.
✅ If one server crashes, requests are sent to another.

🔄 How Load Balancing Works

1️⃣ User makes a request → Hits our Load Balancer (AWS Elastic Load Balancer — ELB)
2️⃣ ELB distributes the request to the least busy server.
3️⃣ If a server crashes, ELB automatically redirects traffic.

📌 Result? 🚀 Faster response times, better uptime, and lower costs!
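The three steps above can be sketched in a few lines. This is a toy model, not how ELB is implemented: the `Server` class, its fields, and the server names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Server:
    # Hypothetical model of a backend server as the load balancer sees it.
    name: str
    healthy: bool = True
    active_requests: int = 0

def pick_server(servers):
    """Return the least busy healthy server, skipping crashed ones."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy servers available")
    return min(candidates, key=lambda s: s.active_requests)

pool = [Server("server-a", active_requests=5),
        Server("server-b", active_requests=2)]

chosen = pick_server(pool)     # server-b: it has fewer active requests
pool[1].healthy = False        # simulate server-b crashing
fallback = pick_server(pool)   # server-a: the only healthy server left
```

A real load balancer learns the `healthy` flag from periodic health checks instead of being told directly.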

🧠 Load Balancing Algorithms: How Do We Distribute Traffic?

Load balancing algorithms fall into two broad groups: stateless algorithms and stateful algorithms.

State here means memory. If the load balancer keeps a mapping from every request ID to the server that handled it, that is a stateful algorithm.

In other words, it needs memory to route requests. If it routes with just a pure function and stores no request-to-server mapping, that is a stateless algorithm.
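To make the difference concrete, here is a minimal sketch of both styles. The server names and the hashing choice are assumptions for illustration only.

```python
import zlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical server names

def route_stateless(request_id: str) -> str:
    """Pure function: the same request ID always maps to the same server."""
    # crc32 is used instead of hash() because its result is stable across runs.
    return SERVERS[zlib.crc32(request_id.encode()) % len(SERVERS)]

class StatefulRouter:
    """Remembers which server every request was assigned to."""

    def __init__(self, servers):
        self.servers = servers
        self.assignments = {}  # request_id -> server: this dict is the "state"

    def route(self, request_id: str) -> str:
        if request_id not in self.assignments:
            # Count remembered assignments and pick the least used server.
            load = {s: 0 for s in self.servers}
            for server in self.assignments.values():
                load[server] += 1
            self.assignments[request_id] = min(self.servers, key=lambda s: load[s])
        return self.assignments[request_id]

router = StatefulRouter(SERVERS)
print(router.route("req-1"))  # server-a
print(router.route("req-2"))  # server-b
print(router.route("req-1"))  # server-a again, looked up from memory
```

The stateless version needs no storage at all, while the stateful version grows its mapping with every new request it sees.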

1️⃣ Round Robin (Stateless)

✔️ Distributes requests evenly across all servers.
✔️ Simple & efficient.
✔️ Used when all servers have equal capacity.
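A minimal round robin sketch, with hypothetical server names. It only remembers its position in the rotation and keeps no record of which request went where.

```python
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]  # hypothetical server names
rotation = cycle(servers)  # endlessly repeats the list in order

def next_server() -> str:
    """Hand out servers one after another, wrapping around at the end."""
    return next(rotation)

assigned = [next_server() for _ in range(6)]
print(assigned)
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b', 'server-c']
```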

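2️⃣ Least Connections (Stateful)

✔️ Tracks how many active connections each server has.
✔️ Sends each new request to the server with the fewest.
✔️ Useful when some requests take much longer than others.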
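A minimal sketch of least connections: the balancer counts active connections per server and sends each new request to the server with the fewest. The class and server names are hypothetical.

```python
class LeastConnectionsBalancer:
    def __init__(self, servers):
        # The state: number of in-flight connections per server.
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        """Pick the server with the fewest active connections."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a request finishes so the count stays accurate."""
        if self.active[server] > 0:
            self.active[server] -= 1

lb = LeastConnectionsBalancer(["server-a", "server-b"])
first = lb.acquire()    # server-a (ties go to the first server listed)
second = lb.acquire()   # server-b
lb.release(second)      # the request on server-b finishes
third = lb.acquire()    # server-b again, since it now has 0 active
```

The `release` call is what makes this stateful: the balancer has to be told when work finishes, not only when it starts.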

3️⃣ Geo-Based Load Balancing

✔️ Routes users to the closest server region (e.g., India users → India server, US users → US server).
✔️ Reduces latency & improves performance.
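At its simplest, geo-based routing is a lookup from the user's location to a region. The sketch below assumes we already know the user's country code. The mapping and the fallback region are illustrative choices, and in practice this decision is usually made at the DNS layer, not in application code.

```python
# Hypothetical mapping from country code to an AWS region.
REGION_BY_COUNTRY = {
    "IN": "ap-south-1",   # India users -> Mumbai region
    "US": "us-east-1",    # US users -> N. Virginia region
}
DEFAULT_REGION = "us-east-1"  # assumed fallback for unmapped countries

def region_for(country_code: str) -> str:
    """Return the closest configured region for a user's country."""
    return REGION_BY_COUNTRY.get(country_code.upper(), DEFAULT_REGION)

print(region_for("in"))  # ap-south-1
print(region_for("FR"))  # us-east-1 (no mapping, so the fallback is used)
```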

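📌 Final Decision? ✅ Use AWS Elastic Load Balancer (ELB) with Least Connections (on an Application Load Balancer, the matching option is the “least outstanding requests” routing algorithm).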

🛠️ Hybrid Scaling Strategy: The Best of Both Worlds!

Quick recap:

Buying bigger machines (more compute, more bandwidth, more power in a single computer) is vertical scaling.
Buying more machines is horizontal scaling.

We decide to use both Vertical & Horizontal Scaling:

✅ Start with a Large Instance.
✅ When traffic increases, add more servers (horizontal scaling).
✅ When traffic is low, scale down to fewer servers.

📌 Final Result? Lower costs, higher availability, and better performance!
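The scale-up and scale-down rule above can be sketched as one function. This is a simplified version of a target-tracking rule, not AWS's actual implementation. The 50% CPU target and the server limits are assumed values.

```python
import math

def desired_servers(current: int, avg_cpu: float, target_cpu: float = 50.0,
                    min_servers: int = 1, max_servers: int = 10) -> int:
    """Choose a server count that moves average CPU toward target_cpu."""
    # If CPU is twice the target, we want roughly twice as many servers.
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Clamp so we never drop below the minimum or exceed the budget cap.
    return max(min_servers, min(max_servers, wanted))

print(desired_servers(current=2, avg_cpu=90.0))  # 4 (festival spike: scale out)
print(desired_servers(current=4, avg_cpu=10.0))  # 1 (quiet night: scale in)
```

The `max_servers` cap matters for cost: without it, a traffic spike could add servers without limit.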

🏆 Final Takeaways from Day 6

✅ AWS Reserved Instances cut costs by 50%
✅ Moving from vertical to horizontal scaling improved efficiency
✅ Load balancing ensures smooth traffic distribution
✅ Auto-scaling optimizes costs & performance

See you in the next blog! Let’s scale our system even further!