Database Optimization Techniques for Backend Engineers

Satyam Parmar
January 16, 2025
8 min read

Database performance is critical for backend applications. This comprehensive guide covers essential optimization techniques to improve query performance, reduce latency, and scale your database effectively.

Indexing Strategies

Primary Indexes

-- Primary key (automatically indexed)
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  name VARCHAR(255) NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Unique index for email lookups (a UNIQUE column constraint
-- would create an equivalent index implicitly)
CREATE UNIQUE INDEX idx_users_email ON users(email);

Composite Indexes

-- Multi-column index for complex queries
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

-- Query that benefits from this index
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY created_at DESC 
LIMIT 10;
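
Column order matters in a composite index: the index above can serve any query that constrains the leading user_id column, but not a query that filters on created_at alone.

-- Served by idx_orders_user_date (the leading column is constrained)
SELECT * FROM orders
WHERE user_id = 123 AND created_at > '2024-06-01'
ORDER BY created_at DESC;

-- Not served by it: with user_id unconstrained, the index cannot
-- be used for a range seek on created_at alone
SELECT * FROM orders WHERE created_at > '2024-06-01';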

Partial Indexes

-- Index only active users
CREATE INDEX idx_users_active ON users(email) 
WHERE status = 'active';

-- Index only recent orders
CREATE INDEX idx_orders_recent ON orders(user_id, created_at) 
WHERE created_at > '2024-01-01';

Covering Indexes

-- Key on the filtered column; carry the other needed columns in INCLUDE (PostgreSQL 11+)
CREATE INDEX idx_users_covering ON users(email)
INCLUDE (name, status, created_at);

-- Query can be satisfied entirely from the index (an index-only scan)
SELECT email, name, status, created_at 
FROM users 
WHERE email = 'user@example.com';
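
To confirm the index is actually covering, check the plan for an Index Only Scan (PostgreSQL may still fetch heap pages for recently modified rows until VACUUM updates the visibility map):

EXPLAIN SELECT email, name, status, created_at
FROM users
WHERE email = 'user@example.com';
-- Expect: Index Only Scan using idx_users_covering on users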

Java/Spring Boot with JPA:

@Entity
@Table(name = "users")
public class User {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Column(unique = true, nullable = false)
    private String email;
    
    @Column(nullable = false)
    private String name;
    
    @CreationTimestamp
    @Column(name = "created_at")
    private LocalDateTime createdAt;
    
    // Getters and setters
}

// Repository with custom queries
@Repository
public interface UserRepository extends JpaRepository<User, Long> {
    
    // Derived query method (Spring Data generates the query)
    Optional<User> findByEmail(String email);
    
    // Custom JPQL query with @Query
    @Query("SELECT u FROM User u WHERE u.email = :email AND u.createdAt > :date")
    List<User> findRecentUsersByEmail(@Param("email") String email,
                                      @Param("date") LocalDateTime date);
    
    // Native query for complex operations
    @Query(value = "SELECT * FROM users WHERE email = ?1 AND created_at > ?2", 
           nativeQuery = true)
    List<User> findUsersByEmailAndDate(String email, LocalDateTime date);
    
    // Pagination with sorting
    Page<User> findByEmailContaining(String email, Pageable pageable);
    
    // Interface-based projection: only the projected columns are fetched
    List<UserProjection> findProjectedByEmailContaining(String email, Pageable pageable);
}

// Service layer with optimization
@Service
@Transactional
public class UserService {
    
    @Autowired
    private UserRepository userRepository;
    
    @Autowired
    private EntityManager entityManager;
    
    // Batch operations for better performance.
    // Note: real JDBC batching also needs hibernate.jdbc.batch_size set, and
    // GenerationType.IDENTITY prevents Hibernate from batching INSERTs
    // (prefer SEQUENCE generation if insert batching matters).
    @Transactional
    public void saveUsersInBatch(List<User> users) {
        int batchSize = 50;
        for (int i = 0; i < users.size(); i += batchSize) {
            List<User> batch = users.subList(i, 
                Math.min(i + batchSize, users.size()));
            userRepository.saveAll(batch);
            entityManager.flush(); // push pending statements to the database
            entityManager.clear(); // detach entities so the persistence context stays small
        }
    }
    
    // Optimized read: fetch only the projected columns
    public List<UserProjection> findUserProjections(String email) {
        return userRepository.findProjectedByEmailContaining(email, 
            PageRequest.of(0, 100, Sort.by("createdAt").descending()));
    }
}

// Projection interface for selective data
public interface UserProjection {
    Long getId();
    String getEmail();
    String getName();
    LocalDateTime getCreatedAt();
}

Query Optimization

Use EXPLAIN to Analyze Queries

-- Analyze query execution plan
EXPLAIN (ANALYZE, BUFFERS) 
SELECT u.name, o.total 
FROM users u 
JOIN orders o ON u.id = o.user_id 
WHERE u.status = 'active' 
AND o.created_at > '2024-01-01';
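
In the output, focus on the scan type chosen for each table (Seq Scan vs. Index Scan), the gap between estimated and actual row counts, and buffer hits versus reads. The exact plan depends on your data; the shape looks roughly like this:

Hash Join  (cost=... rows=...) (actual time=... rows=... loops=1)
  ->  Seq Scan on orders o  (...)
  ->  Hash  (...)
        ->  Seq Scan on users u  (...)
Buffers: shared hit=... read=...

A Seq Scan over a large table in a hot query is usually the first thing to fix with an index.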

Optimize JOIN Operations

-- Use appropriate JOIN types
-- INNER JOIN for matching records only
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active';

-- LEFT JOIN to include all users
SELECT u.name, COALESCE(SUM(o.total), 0) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

Avoid N+1 Queries

// Bad: N+1 query problem (1 query for users + N queries for orders)
const users = await User.findAll();
for (const user of users) {
  const orders = await Order.findAll({ where: { userId: user.id } });
  user.orders = orders;
}

// Good: Single query with JOIN
const users = await User.findAll({
  include: [{
    model: Order,
    as: 'orders'
  }]
});

// Good: Batch loading
const { Op } = require('sequelize');

const userIds = users.map(u => u.id);
const allOrders = await Order.findAll({
  where: { userId: { [Op.in]: userIds } }
});

// Group orders by userId and attach them to each user
const ordersByUser = allOrders.reduce((map, order) => {
  (map[order.userId] ??= []).push(order);
  return map;
}, {});
users.forEach(user => { user.orders = ordersByUser[user.id] ?? []; });

Use Keyset Pagination Instead of Large OFFSETs

-- Keyset (cursor-based) pagination: OFFSET scans and discards
-- every row it skips, so it degrades as the offset grows
SELECT * FROM orders 
WHERE id > 1000 
ORDER BY id 
LIMIT 20;

-- Or use timestamp-based pagination
SELECT * FROM orders 
WHERE created_at < '2024-01-15 10:30:00' 
ORDER BY created_at DESC 
LIMIT 20;
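
Keyset pagination carries the last row's key from the previous page into the next query. With a timestamp cursor, add a unique tiebreaker column so rows sharing a timestamp are neither skipped nor repeated (cursor values below are illustrative):

-- Next page: strictly "before" the last row of the previous page
SELECT * FROM orders
WHERE (created_at, id) < ('2024-01-15 10:30:00', 98765)
ORDER BY created_at DESC, id DESC
LIMIT 20;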

Connection Pooling

Node.js with pg-pool

const { Pool } = require('pg');

const pool = new Pool({
  user: process.env.DB_USER,
  host: process.env.DB_HOST,
  database: process.env.DB_NAME,
  password: process.env.DB_PASSWORD,
  port: 5432,
  max: 20, // Maximum number of clients in the pool
  idleTimeoutMillis: 30000, // Close idle clients after 30 seconds
  connectionTimeoutMillis: 2000, // Return an error after 2 seconds if connection could not be established
  // Note: rejectUnauthorized: false skips certificate verification;
  // prefer providing your CA certificate in production
  ssl: process.env.NODE_ENV === 'production' ? { rejectUnauthorized: false } : false
});

// Check out a client, query, and always release it back to the pool
async function getUserById(id) {
  const client = await pool.connect();
  try {
    const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
    return result.rows[0];
  } finally {
    client.release();
  }
}

// For one-off queries, pool.query() handles checkout and release for you:
// const { rows } = await pool.query('SELECT * FROM users WHERE id = $1', [id]);

Connection Pool Configuration

// Example pool settings (generic-pool/knex-style option names); tune to your workload
const poolConfig = {
  // For read-heavy applications
  readPool: {
    max: 50,
    min: 5,
    acquireTimeoutMillis: 30000,
    idleTimeoutMillis: 60000
  },
  
  // For write-heavy applications
  writePool: {
    max: 20,
    min: 2,
    acquireTimeoutMillis: 30000,
    idleTimeoutMillis: 30000
  }
};

Caching Strategies

Application-Level Caching

const NodeCache = require('node-cache');
const cache = new NodeCache({ stdTTL: 600 }); // 10 minutes

async function getCachedUser(userId) {
  const cacheKey = `user:${userId}`;
  
  // Try cache first (check for undefined so cached falsy values still count as hits)
  let user = cache.get(cacheKey);
  if (user !== undefined) {
    return user;
  }
  
  // Fetch from database
  user = await getUserFromDB(userId);
  
  // Store in cache
  cache.set(cacheKey, user);
  
  return user;
}

Redis Caching

const redis = require('redis');

// node-redis v4+: socket options plus an explicit connect()
const client = redis.createClient({
  socket: {
    host: process.env.REDIS_HOST,
    port: Number(process.env.REDIS_PORT)
  },
  password: process.env.REDIS_PASSWORD
});

client.on('error', (err) => console.error('Redis client error:', err));
client.connect();

async function getCachedData(key) {
  try {
    const cached = await client.get(key);
    return cached ? JSON.parse(cached) : null;
  } catch (error) {
    console.error('Cache error:', error);
    return null;
  }
}

async function setCachedData(key, data, ttl = 3600) {
  try {
    await client.setEx(key, ttl, JSON.stringify(data));
  } catch (error) {
    console.error('Cache set error:', error);
  }
}

// Cache invalidation: delete every key related to a user
async function invalidateUserCache(userId) {
  const keys = [
    `user:${userId}`,
    `user:${userId}:orders`,
    `user:${userId}:profile`
  ];
  
  for (const key of keys) {
    await client.del(key);
  }
}

Database Schema Optimization

Normalization vs Denormalization

-- Normalized schema (3NF)
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE user_profiles (
  user_id INTEGER PRIMARY KEY REFERENCES users(id),
  bio TEXT,
  avatar_url VARCHAR(500),
  location VARCHAR(255)
);

-- Denormalized for read performance
CREATE TABLE user_summary (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  email VARCHAR(255) UNIQUE NOT NULL,
  bio TEXT,
  avatar_url VARCHAR(500),
  location VARCHAR(255),
  total_orders INTEGER DEFAULT 0,
  last_order_date TIMESTAMP
);

Partitioning Large Tables

-- Partition by date range
CREATE TABLE orders (
  id SERIAL,
  user_id INTEGER NOT NULL,
  total DECIMAL(10,2) NOT NULL,
  created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
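
A DEFAULT partition (PostgreSQL 11+) catches rows that fall outside every defined range, and queries that filter on the partition key are pruned down to the matching partitions:

-- Catch-all for out-of-range rows
CREATE TABLE orders_default PARTITION OF orders DEFAULT;

-- Only orders_2024_01 is scanned here (verify with EXPLAIN)
SELECT SUM(total) FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';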

Monitoring and Profiling

Query Performance Monitoring

// Log slow queries
const slowQueryThreshold = 1000; // 1 second

function logSlowQuery(query, duration) {
  if (duration > slowQueryThreshold) {
    console.warn(`Slow query detected: ${duration}ms`);
    console.log('Query:', query);
  }
}

// Wrap database calls
async function executeQuery(query, params) {
  const start = Date.now();
  try {
    const result = await pool.query(query, params);
    const duration = Date.now() - start;
    logSlowQuery(query, duration);
    return result;
  } catch (error) {
    const duration = Date.now() - start;
    console.error(`Query failed after ${duration}ms:`, error);
    throw error;
  }
}

Database Metrics

// Monitor connection pool metrics
setInterval(() => {
  const poolStats = {
    totalCount: pool.totalCount,
    idleCount: pool.idleCount,
    waitingCount: pool.waitingCount
  };
  
  console.log('Pool stats:', poolStats);
  
  // Alert if clients are queueing for a connection
  if (poolStats.waitingCount > 10) {
    console.warn('Connection pool saturated: clients are waiting for a connection');
  }
}, 30000); // Every 30 seconds
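
It also helps to cross-check pool metrics against what the database itself reports; in PostgreSQL, for example:

-- Connections to the current database, grouped by state
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state;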

Read Replicas

// Separate read and write connections
const writePool = new Pool({
  host: process.env.DB_WRITE_HOST,
  // ... other config
});

const readPool = new Pool({
  host: process.env.DB_READ_HOST,
  // ... other config
});

// Route queries appropriately. Caveat: replicas lag the primary, so reads
// that must observe a just-committed write should use the write pool.
async function getUserById(id, useReadReplica = true) {
  const pool = useReadReplica ? readPool : writePool;
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0];
}

async function createUser(userData) {
  // Always use write pool for mutations
  const result = await writePool.query(
    'INSERT INTO users (name, email) VALUES ($1, $2) RETURNING *',
    [userData.name, userData.email]
  );
  return result.rows[0];
}

Best Practices Summary

  1. Index strategically - Create indexes for frequently queried columns
  2. Monitor performance - Use EXPLAIN and profiling tools
  3. Optimize queries - Avoid N+1 problems and use appropriate JOINs
  4. Implement caching - Cache frequently accessed data
  5. Use connection pooling - Manage database connections efficiently
  6. Consider partitioning - For very large tables
  7. Use read replicas - Scale read operations
  8. Monitor and alert - Set up proper monitoring
  9. Regular maintenance - Update statistics and rebuild indexes (see the commands below)
  10. Test with realistic data - Use production-like data for testing
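
For item 9, routine maintenance in PostgreSQL looks like this (command names vary by database):

-- Refresh planner statistics for a table
ANALYZE users;

-- Reclaim dead space and refresh statistics in one pass
VACUUM (ANALYZE) orders;

-- Rebuild a bloated index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;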

Conclusion

Database optimization is an ongoing process that requires monitoring, testing, and continuous improvement. Start with the basics like proper indexing and query optimization, then move to more advanced techniques like caching and read replicas as your application scales.

Remember: Premature optimization is the root of all evil, but ignoring performance until it becomes a problem is equally dangerous. Find the right balance for your specific use case.
