7. Scaling & Performance
One of the main advantages of running on a platform like Build.io is that scaling doesn't mean provisioning servers. You scale by adjusting dyno counts and sizes, and the platform handles the rest—load balancing, container placement, and traffic distribution all happen automatically.
This section covers how to scale your application and how to get the most out of the resources you're paying for.
7.1 Horizontal scaling (adding instances)
Horizontal scaling means running more dynos of the same type. Each additional dyno is an independent container running your code, and Build's router distributes incoming requests across all of them.
Scaling Dynos
From the dashboard, navigate to your app's Resources tab. Adjust the dyno count for each process type (web, worker, etc.) using the slider or input field.
From the CLI:
$ bld ps:scale web=3 -a my-app
Scaling dynos... done, now running web at 3
To check your current formation:
$ bld ps:list -a my-app
When to Scale Horizontally
Horizontal scaling is the right move when your dynos are handling too many concurrent requests and response times are climbing. Common signals include rising request queue times, increasing latency under load, and timeout errors.
Adding dynos helps when your application is I/O-bound—waiting on database queries, external API calls, or file operations. Each dyno can handle requests independently while others are blocked on I/O.
Horizontal scaling is less effective if every individual request is CPU-bound and slow. In that case, either optimize the work being done per request or consider vertical scaling.
Scaling to Zero
You can scale a process type to zero dynos if you want to stop it temporarily without removing the process from your formation. This is useful during maintenance or when you want to pause background workers without destroying their configuration.
$ bld ps:scale web=0 -a my-app
Scale it back up whenever you're ready. Config vars, add-ons, and the rest of your app remain untouched.
7.2 Vertical scaling (instance sizes)
Vertical scaling means moving to larger dyno sizes with more memory and CPU. Rather than adding more instances, you give each instance more resources.
Dyno Sizes
Build offers several dyno sizes to match your application's resource requirements. Smaller sizes are suitable for development, staging, and low-traffic production apps. Larger sizes provide more memory and compute for resource-intensive workloads.
You can adjust the dyno size from your app's Resources tab.
When to Scale Vertically
Vertical scaling is the right approach when your dynos are running out of memory or when individual requests need more CPU time to complete.
Memory pressure — If your dynos are being restarted due to memory limits, you need a larger dyno size. Common causes include memory leaks, large in-memory data structures, or simply running a memory-hungry runtime.
CPU-bound work — If individual requests are slow because they involve heavy computation (image processing, report generation, data transformation), a larger dyno with more CPU capacity will help. Keep in mind that this is usually better handled by offloading the work to a background worker dyno, which can be sized independently of your web dynos.
Combining Horizontal and Vertical Scaling
Most production applications benefit from a combination of both approaches. A common pattern is to run several medium-sized web dynos and one or two larger worker dynos for background processing. This keeps web response times fast while giving heavy background jobs the resources they need.
7.3 Performance tuning
Scaling is one lever for improving performance. Before (or alongside) scaling, there's usually low-hanging fruit in your application itself. This section covers the most common areas to look at.
Optimize Slow Requests
Most performance issues come from a small number of slow endpoints. Use your logging and monitoring tools to identify which requests take the longest and address them first.
Common culprits include unindexed database queries, N+1 query patterns (fetching associated records in a loop instead of a single query), and calls to slow external APIs made synchronously in the request cycle.
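The N+1 pattern is easiest to see as a query trace. The sketch below uses Python's built-in sqlite3 and a hypothetical authors/posts schema: the first version issues one query per author, the second fetches the same data with a single JOIN.

```python
import sqlite3

# In-memory database with a hypothetical authors/posts schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Intro'), (2, 1, 'Scaling'), (3, 2, 'Caching');
""")

def posts_n_plus_one():
    # N+1: one query for the authors, then one extra query per author.
    result = {}
    for author_id, name in db.execute("SELECT id, name FROM authors"):
        titles = [t for (t,) in db.execute(
            "SELECT title FROM posts WHERE author_id = ?", (author_id,))]
        result[name] = titles
    return result

def posts_joined():
    # Single query: one JOIN fetches everything in one round trip.
    result = {}
    rows = db.execute("""
        SELECT a.name, p.title FROM authors a
        JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result

print(posts_n_plus_one() == posts_joined())  # → True (same data, fewer queries)
```

With 2 authors the difference is 3 queries versus 1; with 500 it is 501 versus 1, which is why the pattern dominates slow-endpoint traces.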
Use Background Jobs for Heavy Work
Any work that doesn't need to complete before the user gets a response should run in a background job. Email delivery, report generation, image processing, webhook dispatch, and third-party API calls are all candidates.
Moving heavy work to background workers keeps your web dynos responsive and prevents long-running requests from tying up capacity. Background workers can be scaled independently, and if a job fails it can be retried without affecting the user's experience.
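The enqueue-and-return shape can be sketched with Python's standard-library queue and a worker thread. In production the queue would be backed by Redis or a queue add-on running on a worker dyno, but the structure is the same: the request handler enqueues the job and responds immediately, and the worker does the heavy lifting (the naive retry here stands in for a real queue's attempt tracking and backoff).

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    # Worker loop: pull jobs off the queue and run them, retrying on failure.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut down
            break
        try:
            results.append(job())
        except Exception:
            jobs.put(job)  # naive retry; real queues track attempts and backoff
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(user_email):
    # The web handler enqueues the slow work and responds immediately.
    jobs.put(lambda: f"sent welcome email to {user_email}")
    return "202 Accepted"

print(handle_request("ada@example.com"))  # → 202 Accepted
jobs.join()  # wait for the worker to drain (only needed in this demo)
```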
Cache Aggressively
Caching is often the single biggest performance improvement you can make. If your application repeatedly computes or fetches the same data, cache the result.
Application-level caching — Use Redis (see Section 5.2 Database & Add-ons) to cache database query results, rendered page fragments, serialized API responses, or any expensive computation. Even short TTLs (30–60 seconds) can dramatically reduce database load during traffic spikes.
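The read-through (cache-aside) pattern with a short TTL looks roughly like this. The sketch uses an in-process dict to stay self-contained, but the same logic maps directly onto a Redis client: GET on the way in, SETEX with the TTL on a miss.

```python
import time

_cache = {}  # key -> (expires_at, value); in-process stand-in for Redis

def cached(key, ttl_seconds, compute):
    """Return the cached value for key, recomputing it once the TTL lapses."""
    entry = _cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]  # cache hit: skip the expensive work
    value = compute()    # cache miss: run the expensive query or render
    _cache[key] = (time.monotonic() + ttl_seconds, value)
    return value

calls = 0
def expensive_report():
    global calls
    calls += 1
    return f"report v{calls}"

# Two requests inside the TTL share a single computation.
a = cached("report", ttl_seconds=30, compute=expensive_report)
b = cached("report", ttl_seconds=30, compute=expensive_report)
print(a == b, calls)  # → True 1
```

During a spike, every request inside the 30-second window is served from the cache, so the database sees one query instead of thousands.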
HTTP caching — Set appropriate Cache-Control headers on responses that don't change frequently. Static assets should be cached aggressively with long expiry times and cache-busted filenames.
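As a sketch, here is one way to choose Cache-Control values by response type. The header values are standard HTTP; the three-way split between assets, pages, and per-user data is an assumption about a typical app, not a prescription.

```python
def cache_headers(kind):
    """Return a Cache-Control header appropriate for the response type."""
    if kind == "static_asset":
        # Fingerprinted filenames (e.g. app.3f9a2c.js) never change content,
        # so cache for a year; a new deploy ships a new filename.
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    if kind == "page":
        # Let shared caches (CDN/proxy) hold a rendered page briefly;
        # a 60-second s-maxage absorbs traffic spikes.
        return {"Cache-Control": "public, max-age=0, s-maxage=60"}
    # Per-user or sensitive responses must not be cached by intermediaries.
    return {"Cache-Control": "private, no-store"}

print(cache_headers("static_asset")["Cache-Control"])
```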
Tune Your Web Server Concurrency
Most web frameworks default to conservative concurrency settings. Tuning these for your dyno size and workload can significantly increase throughput without adding dynos.
Ruby (Puma) — Increase the number of workers and threads. A common starting point is 2–4 workers with 5 threads each on a standard dyno. Monitor memory usage to find the right balance—each worker is a separate process.
Python (Gunicorn) — Increase the worker count. A common formula is (2 × CPU cores) + 1. Use an async worker class (such as gevent, or Uvicorn's Gunicorn worker for ASGI apps) if your app is I/O-bound.
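The worker formula is easy to compute at startup; a typical setup puts something like this in gunicorn.conf.py (the fallback guards against os.cpu_count() returning None in unusual environments):

```python
import os

def gunicorn_workers(cpu_cores=None):
    """Apply the (2 × cores) + 1 heuristic for sync Gunicorn workers."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return 2 * cores + 1

# e.g. in gunicorn.conf.py:  workers = gunicorn_workers()
print(gunicorn_workers(cpu_cores=4))  # → 9
```

Treat the result as a starting point: watch memory per worker on your dyno size and tune down if the processes don't all fit.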
Node.js — Node is single-threaded by default. Use the cluster module to fork multiple worker processes, or run one Node process per dyno and scale horizontally.
