Database Optimization Techniques for Backend Engineers
Database performance is critical for backend applications. This comprehensive guide covers essential optimization techniques to improve query performance, reduce latency, and scale your database effectively.
Indexing Strategies
Primary Indexes
```sql
-- Primary key (automatically indexed)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- The UNIQUE constraint above already backs email lookups with an index;
-- an explicit unique index is only needed when no such constraint exists:
CREATE UNIQUE INDEX idx_users_email ON users(email);
```
Composite Indexes
```sql
-- Multi-column index for complex queries
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

-- Query that benefits from this index
SELECT * FROM orders
WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 10;
```
Partial Indexes
```sql
-- Index only active users
CREATE INDEX idx_users_active ON users(email) WHERE status = 'active';

-- Index only recent orders
CREATE INDEX idx_orders_recent ON orders(user_id, created_at)
WHERE created_at > '2024-01-01';
```
Covering Indexes
```sql
-- Include all needed columns in the index
CREATE INDEX idx_users_covering ON users(email, name, status)
INCLUDE (created_at, updated_at);

-- Query can be satisfied entirely from the index
SELECT email, name, status, created_at
FROM users
WHERE email = 'user@example.com';
```
Java/Spring Boot with JPA:
```java
@Entity
@Table(name = "users")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(unique = true, nullable = false)
    private String email;

    @Column(nullable = false)
    private String name;

    @CreationTimestamp
    @Column(name = "created_at")
    private LocalDateTime createdAt;

    // Getters and setters
}

// Repository with custom queries
@Repository
public interface UserRepository extends JpaRepository<User, Long> {

    // Derived query method (Spring Data generates the query from the name)
    Optional<User> findByEmail(String email);

    // Custom JPQL query with @Query
    @Query("SELECT u FROM User u WHERE u.email = :email AND u.createdAt > :date")
    List<User> findActiveUsersByEmail(@Param("email") String email,
                                      @Param("date") LocalDateTime date);

    // Native query for complex operations
    @Query(value = "SELECT * FROM users WHERE email = ?1 AND created_at > ?2",
           nativeQuery = true)
    List<User> findUsersByEmailAndDate(String email, LocalDateTime date);

    // Pagination with sorting
    Page<User> findByEmailContaining(String email, Pageable pageable);

    // Interface-based projection so only the selected columns are fetched
    List<UserProjection> findProjectedByEmailContaining(String email, Pageable pageable);
}

// Service layer with optimization
@Service
@Transactional
public class UserService {

    @Autowired
    private UserRepository userRepository;

    @Autowired
    private EntityManager entityManager;

    // Batch writes: flush and clear periodically so the persistence
    // context does not grow without bound
    @Transactional
    public void saveUsersInBatch(List<User> users) {
        int batchSize = 50;
        for (int i = 0; i < users.size(); i += batchSize) {
            List<User> batch = users.subList(i, Math.min(i + batchSize, users.size()));
            userRepository.saveAll(batch);
            entityManager.flush();
            entityManager.clear();
        }
    }

    // Optimized read using a projection instead of full entities
    public List<UserProjection> findUserProjections(String email) {
        return userRepository.findProjectedByEmailContaining(
                email, PageRequest.of(0, 100, Sort.by("createdAt").descending()));
    }
}

// Projection interface for selective data
public interface UserProjection {
    Long getId();
    String getEmail();
    String getName();
    LocalDateTime getCreatedAt();
}
```
Query Optimization
Use EXPLAIN to Analyze Queries
```sql
-- Analyze the query execution plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active'
  AND o.created_at > '2024-01-01';
```
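During development it can also help to pull plans from application code. A minimal sketch, assuming the pg connection pool set up in the pooling section below; note that `ANALYZE` actually executes the statement, so keep this away from production writes:

```javascript
// Dev-only helper: print a query's execution plan.
// EXPLAIN ANALYZE runs the statement for real, so never pass mutations.
async function explainQuery(sql) {
  const result = await pool.query(`EXPLAIN (ANALYZE, BUFFERS) ${sql}`);
  // Each result row holds one line of the plan under the "QUERY PLAN" column
  result.rows.forEach(row => console.log(row['QUERY PLAN']));
}

// Usage
explainQuery('SELECT * FROM orders WHERE user_id = 123').catch(console.error);
```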
Optimize JOIN Operations
```sql
-- Use appropriate JOIN types

-- INNER JOIN for matching records only
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';

-- LEFT JOIN to include users with no orders
SELECT u.name, COALESCE(SUM(o.total), 0) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
```
Avoid N+1 Queries
```javascript
// Bad: N+1 query problem (one query per user)
const users = await User.findAll();
for (const user of users) {
  const orders = await Order.findAll({ where: { userId: user.id } });
  user.orders = orders;
}

// Good: single query with a JOIN (eager loading)
const users = await User.findAll({
  include: [{ model: Order, as: 'orders' }]
});

// Good: batch loading
const userIds = users.map(u => u.id);
const allOrders = await Order.findAll({
  where: { userId: { [Op.in]: userIds } }
});
// Group orders by userId in application code
```
Use LIMIT and OFFSET Efficiently
```sql
-- Prefer cursor-based (keyset) pagination for large datasets
-- instead of OFFSET, which is slow for large offsets:
SELECT * FROM orders
WHERE id > 1000
ORDER BY id
LIMIT 20;

-- Or use timestamp-based pagination:
SELECT * FROM orders
WHERE created_at < '2024-01-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;
```
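To make the keyset approach concrete, here is a minimal paginator sketch in application code, assuming the pg pool from the next section; `fetchOrdersPage` and the cursor shape are illustrative, not a library API:

```javascript
// Keyset pagination: the cursor is the last id seen, so the database can
// seek directly via the primary key index instead of scanning OFFSET rows.
async function fetchOrdersPage(cursor = 0, pageSize = 20) {
  const result = await pool.query(
    'SELECT * FROM orders WHERE id > $1 ORDER BY id LIMIT $2',
    [cursor, pageSize]
  );
  const rows = result.rows;
  const nextCursor = rows.length > 0 ? rows[rows.length - 1].id : null;
  return { rows, nextCursor }; // pass nextCursor back in for the next page
}
```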
Connection Pooling
Node.js with pg-pool
```javascript
const { Pool } = require('pg');

const pool = new Pool({
  user: process.env.DB_USER,
  host: process.env.DB_HOST,
  database: process.env.DB_NAME,
  password: process.env.DB_PASSWORD,
  port: 5432,
  max: 20,                       // maximum number of clients in the pool
  idleTimeoutMillis: 30000,      // close idle clients after 30 seconds
  connectionTimeoutMillis: 2000, // fail after 2 seconds if no connection is available
  // Prefer verifying against your provider's CA certificate over
  // rejectUnauthorized: false, which disables certificate checks
  ssl: process.env.NODE_ENV === 'production'
    ? { rejectUnauthorized: false }
    : false
});

// Use the pool for queries
async function getUserById(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release();
  }
}
```
Connection Pool Configuration
```javascript
// Optimal pool settings based on application needs
const poolConfig = {
  // For read-heavy applications
  readPool: {
    max: 50,
    min: 5,
    acquireTimeoutMillis: 30000,
    idleTimeoutMillis: 60000
  },

  // For write-heavy applications
  writePool: {
    max: 20,
    min: 2,
    acquireTimeoutMillis: 30000,
    idleTimeoutMillis: 30000
  }
};
```
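These keys are generic pool parameters rather than one specific library's API; node-postgres, for instance, uses `connectionTimeoutMillis` rather than `acquireTimeoutMillis`. A minimal sketch of wiring the read-heavy profile onto a pg `Pool`, under that assumption:

```javascript
const { Pool } = require('pg');

// Illustrative mapping of the generic settings onto pg's option names
const readHeavyPool = new Pool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  database: process.env.DB_NAME,
  password: process.env.DB_PASSWORD,
  max: poolConfig.readPool.max,
  min: poolConfig.readPool.min, // `min` requires a recent pg-pool release
  idleTimeoutMillis: poolConfig.readPool.idleTimeoutMillis,
  connectionTimeoutMillis: poolConfig.readPool.acquireTimeoutMillis
});
```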
Caching Strategies
Application-Level Caching
```javascript
const NodeCache = require('node-cache');
const cache = new NodeCache({ stdTTL: 600 }); // 10-minute default TTL

async function getCachedUser(userId) {
  const cacheKey = `user:${userId}`;

  // Try the cache first
  let user = cache.get(cacheKey);
  if (user) {
    return user;
  }

  // Fetch from the database
  user = await getUserFromDB(userId);

  // Store in the cache
  cache.set(cacheKey, user);
  return user;
}
```
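One caveat with this pattern: when a hot key expires, many concurrent requests can miss at once and hammer the database (a cache stampede). A small sketch building on the snippet above that dedupes concurrent misses per key; the `inFlight` map is an illustrative in-memory structure, not part of node-cache:

```javascript
// Dedupe concurrent cache misses so only one DB call runs per key
const inFlight = new Map();

async function getCachedUserDeduped(userId) {
  const cacheKey = `user:${userId}`;
  const cached = cache.get(cacheKey);
  if (cached) return cached;

  // If another request is already loading this key, share its promise
  if (inFlight.has(cacheKey)) return inFlight.get(cacheKey);

  const promise = getUserFromDB(userId)
    .then(user => {
      cache.set(cacheKey, user);
      return user;
    })
    .finally(() => inFlight.delete(cacheKey));

  inFlight.set(cacheKey, promise);
  return promise;
}
```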
Redis Caching
```javascript
const redis = require('redis');

// node-redis v4: connection settings live under `socket`, and the
// client must be connected before use
const client = redis.createClient({
  socket: {
    host: process.env.REDIS_HOST,
    port: process.env.REDIS_PORT
  },
  password: process.env.REDIS_PASSWORD
});
client.connect().catch(err => console.error('Redis connection error:', err));

async function getCachedData(key) {
  try {
    const cached = await client.get(key);
    return cached ? JSON.parse(cached) : null;
  } catch (error) {
    console.error('Cache error:', error);
    return null;
  }
}

async function setCachedData(key, data, ttl = 3600) {
  try {
    // v4 sets the TTL via options (replaces the old setex call)
    await client.set(key, JSON.stringify(data), { EX: ttl });
  } catch (error) {
    console.error('Cache set error:', error);
  }
}

// Cache invalidation
async function invalidateUserCache(userId) {
  const keys = [
    `user:${userId}`,
    `user:${userId}:orders`,
    `user:${userId}:profile`
  ];
  for (const key of keys) {
    await client.del(key);
  }
}
```
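Putting the helpers together, a cache-aside read path might look like the sketch below; `getUserFromDB` stands in for the same database accessor used earlier:

```javascript
// Cache-aside: read through the cache, fall back to the database on a miss
async function getUser(userId) {
  const key = `user:${userId}`;

  const cached = await getCachedData(key);
  if (cached) return cached;

  const user = await getUserFromDB(userId);
  if (user) await setCachedData(key, user, 600); // 10-minute TTL
  return user;
}
```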
Database Schema Optimization
Normalization vs Denormalization
```sql
-- Normalized schema (3NF)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY REFERENCES users(id),
    bio TEXT,
    avatar_url VARCHAR(500),
    location VARCHAR(255)
);

-- Denormalized for read performance
CREATE TABLE user_summary (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    bio TEXT,
    avatar_url VARCHAR(500),
    location VARCHAR(255),
    total_orders INTEGER DEFAULT 0,
    last_order_date TIMESTAMP
);
```
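The denormalized table buys read speed at the cost of write complexity: every write path must keep the summary columns consistent. One way to do that is an explicit transaction, sketched below with the pg pool from earlier; `recordOrder` is illustrative, and it assumes `user_summary.id` mirrors `users.id`:

```javascript
// Keep the denormalized summary in sync with each new order
async function recordOrder(userId, total) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      'INSERT INTO orders (user_id, total, created_at) VALUES ($1, $2, NOW())',
      [userId, total]
    );
    // Update the summary row in the same transaction so readers
    // never see an order without its summary counters
    await client.query(
      `UPDATE user_summary
          SET total_orders = total_orders + 1,
              last_order_date = NOW()
        WHERE id = $1`,
      [userId]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```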
Partitioning Large Tables
```sql
-- Partition by date range
CREATE TABLE orders (
    id SERIAL,
    user_id INTEGER NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
```
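Partitions do not create themselves, so teams typically schedule a job that adds the next range before it is needed. A hedged sketch using the pool from earlier; the naming follows the `orders_YYYY_MM` pattern above:

```javascript
// Maintenance job: create next month's partition ahead of time
async function createNextMonthPartition() {
  const start = new Date();
  start.setUTCDate(1);                          // first day of the current month
  start.setUTCMonth(start.getUTCMonth() + 1);   // ...of the next month
  const end = new Date(start);
  end.setUTCMonth(end.getUTCMonth() + 1);

  const fmt = d => d.toISOString().slice(0, 10); // YYYY-MM-DD
  const month = String(start.getUTCMonth() + 1).padStart(2, '0');
  const name = `orders_${start.getUTCFullYear()}_${month}`;

  // Identifiers cannot be bound as parameters, so they are interpolated;
  // all values here come from Date, never from user input
  await pool.query(
    `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF orders
     FOR VALUES FROM ('${fmt(start)}') TO ('${fmt(end)}')`
  );
}
```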
Monitoring and Profiling
Query Performance Monitoring
```javascript
// Log slow queries
const slowQueryThreshold = 1000; // 1 second

function logSlowQuery(query, duration) {
  if (duration > slowQueryThreshold) {
    console.warn(`Slow query detected: ${duration}ms`);
    console.log('Query:', query);
  }
}

// Wrap database calls
async function executeQuery(query, params) {
  const start = Date.now();
  try {
    const result = await pool.query(query, params);
    const duration = Date.now() - start;
    logSlowQuery(query, duration);
    return result;
  } catch (error) {
    const duration = Date.now() - start;
    console.error(`Query failed after ${duration}ms:`, error);
    throw error;
  }
}
```
Database Metrics
```javascript
// Monitor connection pool metrics
setInterval(() => {
  const poolStats = {
    totalCount: pool.totalCount,
    idleCount: pool.idleCount,
    waitingCount: pool.waitingCount
  };

  console.log('Pool stats:', poolStats);

  // Alert if the pool is exhausted
  if (poolStats.waitingCount > 10) {
    console.warn('High connection pool wait time!');
  }
}, 30000); // every 30 seconds
```
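Client-side pool counters only tell half the story; PostgreSQL's own `pg_stat_activity` view shows what the server sees. A small sketch, assuming the same pool:

```javascript
// Sample active vs. idle server-side connections from pg_stat_activity
async function getServerConnectionStats() {
  const result = await pool.query(
    `SELECT state, count(*) AS connections
       FROM pg_stat_activity
      WHERE datname = current_database()
      GROUP BY state`
  );
  // pg returns count(*) as a string, e.g. [{ state: 'active', connections: '3' }]
  return result.rows;
}
```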
Read Replicas
```javascript
// Separate read and write connections
const writePool = new Pool({
  host: process.env.DB_WRITE_HOST,
  // ... other config
});

const readPool = new Pool({
  host: process.env.DB_READ_HOST,
  // ... other config
});

// Route queries appropriately
async function getUserById(id, useReadReplica = true) {
  const pool = useReadReplica ? readPool : writePool;
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0];
}

async function createUser(userData) {
  // Always use the write pool for mutations
  const result = await writePool.query(
    'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *',
    [userData.name, userData.email]
  );
  return result.rows[0];
}
```
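One caveat worth encoding in application logic: replicas lag the primary, so a read that must observe a just-committed write should be pinned to the write pool. A sketch using the functions above:

```javascript
async function signUp(userData) {
  const user = await createUser(userData); // goes to the write pool

  // The replica may not have replayed the INSERT yet, so this
  // read-your-own-write lookup is routed to the primary
  return getUserById(user.id, false);
}
```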
Best Practices Summary
- Index strategically - Create indexes for frequently queried columns
- Monitor performance - Use EXPLAIN and profiling tools
- Optimize queries - Avoid N+1 problems and use appropriate JOINs
- Implement caching - Cache frequently accessed data
- Use connection pooling - Manage database connections efficiently
- Consider partitioning - For very large tables
- Use read replicas - Scale read operations
- Monitor and alert - Set up proper monitoring
- Regular maintenance - Update statistics and rebuild indexes (see the sketch after this list)
- Test with realistic data - Use production-like data for testing
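As a concrete example for the maintenance item above, here is a minimal sketch of a scheduled job that refreshes PostgreSQL planner statistics for the tables used in this guide, assuming the pg pool from the connection-pooling section (run it off-peak; rebuilding indexes with REINDEX is more disruptive and usually warrants a dedicated window):

```javascript
const MAINTENANCE_TABLES = ['users', 'orders'];

async function refreshStatistics() {
  for (const table of MAINTENANCE_TABLES) {
    // ANALYZE updates the planner's statistics for the table
    await pool.query(`ANALYZE ${table}`);
  }
}

// Illustrative schedule: once a day
setInterval(() => refreshStatistics().catch(console.error), 24 * 60 * 60 * 1000);
```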
Conclusion
Database optimization is an ongoing process that requires monitoring, testing, and continuous improvement. Start with the basics like proper indexing and query optimization, then move to more advanced techniques like caching and read replicas as your application scales.
Remember: Premature optimization is the root of all evil, but ignoring performance until it becomes a problem is equally dangerous. Find the right balance for your specific use case.