
Have you ever wondered how Spotify delivers millions of songs to hundreds of millions of users worldwide with near-instant playback? Behind that simple “press play” experience lies a complex distributed system that must handle massive scale, low latency, and high availability.
In this post, we’ll break down how to architect a music streaming service that can handle 100+ million users, serve a 30+ million song catalog, and deliver low-latency playback globally—all while maintaining cost efficiency and scalability.
Let’s dive in!
What We’re Building
Before we jump into the architecture, let’s define what our music streaming platform needs to do:
For Users:
- Stream music with low latency
- Search and discover songs, artists, and playlists
- Create and manage personal playlists
- Like songs and follow artists
- View trending content
For Artists:
- Upload audio files and metadata
- Manage their catalog and profile
For the System:
- Handle millions of concurrent streams
- Serve content globally with low latency
- Scale horizontally as user base grows
- Maintain high availability (99.9%+ uptime)
Simple enough, right? But here’s where it gets interesting—we need to do all of this at massive scale.
System Requirements
A clear definition of requirements is the foundation upon which the entire system is designed. These requirements dictate architectural choices, technology selection, and scalability strategies.
Functional Requirements
- Artist Content Upload: Mechanism for artists to upload audio files and associated metadata
- User Music Playback: Allow authenticated users to stream songs with low latency
- Search and Discovery: Users can search for music by title, artist, genre, etc.
- Playlist Management: Create and manage personal playlists (add, remove, reorder songs)
- User Profiles: Manage playlists, liked songs, followed artists, and preferences
- Trending Content: Feature trending songs, optionally filtered by region or genre
Non-Functional Requirements
- High Availability: System must be resilient to failures with minimal downtime
- Scalability: Architecture must scale horizontally from 50 million to 1+ billion users
- Low Latency: Music playback and search must be highly responsive
- Eventual Consistency: Acceptable for non-critical operations (e.g., newly uploaded songs appearing in search)
- Monitoring & Observability: Basic monitoring including server health, error tracking, and performance metrics
Capacity Planning and Estimation
Strategic capacity planning is critical for architecting a system that can handle the projected load while remaining cost-effective. The following estimations are based on a target scale of 100 million users and a 30 million song catalog.
| Metric | Estimation | Rationale / Calculation |
|---|---|---|
| User Base | 100 Million Users | Baseline assumption for a large-scale service |
| Song Catalog | 30 Million Songs | Initial catalog size assumption |
| Raw Audio Storage | ~90 TB | Based on 30M songs × ~3MB/song average |
| Replicated Audio Storage | ~270 TB | Assuming 3× replication for durability |
| Song Metadata Storage | ~3 GB | Based on 30M songs × ~100 bytes/song |
| User Metadata Storage | ~100 GB | Based on 100M users × ~1 KB/user |
Key Insight: Raw audio files dominate both storage and bandwidth costs. User and song metadata represent a comparatively small fraction of the total data footprint. This justifies a decoupled storage architecture that uses separate, specialized solutions for each data type.
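The arithmetic behind the table is easy to sanity-check. A quick back-of-envelope script (using decimal units, matching the assumptions in the table):

```python
# Back-of-envelope check of the capacity estimates above (decimal units).
SONGS = 30_000_000
USERS = 100_000_000
MB, GB, TB = 10**6, 10**9, 10**12

raw_audio = SONGS * 3 * MB            # ~3 MB per song
replicated_audio = raw_audio * 3      # 3x replication for durability
song_metadata = SONGS * 100           # ~100 bytes of metadata per song
user_metadata = USERS * 1_000         # ~1 KB per user

print(raw_audio / TB, replicated_audio / TB,   # 90.0 TB, 270.0 TB
      song_metadata / GB, user_metadata / GB)  # 3.0 GB, 100.0 GB
```

Audio is roughly a thousand times larger than all metadata combined, which is exactly the asymmetry that motivates the decoupled storage design below.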
High-Level System Architecture
The proposed architecture is a decoupled, service-oriented design built for scalability, maintainability, and fault tolerance. Each component performs a specific function and can be scaled independently.
graph TB
subgraph Clients
WA[Web App]
MA[Mobile App]
end
subgraph Entry Point
LB[Load Balancer<br/>Health Checks, SSL]
end
subgraph Application Layer
API1[API Server 1<br/>Stateless]
API2[API Server 2<br/>Stateless]
API3[API Server N<br/>Stateless]
end
subgraph Data Layer
DB[(PostgreSQL<br/>Metadata Database)]
S3[(S3 Bucket<br/>Audio Files)]
CACHE[(Redis<br/>Cache Layer)]
end
subgraph CDN Layer
CDN[CloudFront CDN<br/>Global Edge Network]
end
WA --> LB
MA --> LB
LB --> API1
LB --> API2
LB --> API3
API1 --> DB
API2 --> DB
API3 --> DB
API1 --> CACHE
API2 --> CACHE
API3 --> CACHE
WA --> CDN
MA --> CDN
CDN --> S3
style CDN fill:#FFD700,stroke:#333,stroke-width:3px
style S3 fill:#87CEEB,stroke:#333,stroke-width:2px
style DB fill:#90EE90,stroke:#333,stroke-width:2px
Component Breakdown
- Client Applications (Web/Mobile): User-facing interface for UI, playback controls, search, and playlist management. Communicates via HTTP REST APIs.
- Load Balancer: Entry point that distributes API requests across application servers using algorithms like Round Robin or Least Connections. Performs health checks to route traffic only to healthy instances.
- Stateless API Servers: Horizontally scalable layer containing core business logic. Processes user requests, validates authentication tokens (JWT), and interacts with data stores. Stateless nature means any server can handle any request.
- Metadata Database (Relational): SQL database (PostgreSQL, MySQL, or AWS RDS) for structured data including user profiles, song metadata (title, artist, duration), and playlist information. Chosen for the relational nature of the data, complex queries, joins, and transactional integrity.
- Object Storage (Blob Store): Highly scalable and durable object storage (AWS S3) for large, immutable audio files. Cost-effective for petabyte-scale data with virtually unlimited scalability and high durability.
- Content Delivery Network (CDN): Global network of edge servers (AWS CloudFront) that cache copies of popular songs geographically closer to users. Reduces playback latency, improves user experience, and significantly offloads traffic from origin storage.
Audio Streaming: HLS and CDN Overview
Before diving into detailed workflows, let’s briefly understand how audio streaming works in our architecture.
HLS (HTTP Live Streaming)
HLS is a streaming protocol that breaks large audio files into small segments (typically 2-6 seconds each). Instead of downloading an entire 3-5 MB file, the client downloads small chunks sequentially.
Pre-processing workflow:
- Original audio file (e.g., song.mp3) is uploaded to S3
- File is segmented into small chunks (.ts or .aac files)
- Playlist file (.m3u8) is generated with references to all segments
- All files stored in S3, with CDN in front for global delivery
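To make the playlist-generation step concrete, here is a minimal sketch that builds an HLS media playlist for pre-segmented audio. The `generate_media_playlist` helper and the segment file names are illustrative; real pipelines typically let a tool like ffmpeg emit both the segments and the playlist in one pass.

```python
# Sketch: build an HLS media playlist (.m3u8) referencing one .ts file
# per segment. Illustrative only; real pipelines usually use ffmpeg.

def generate_media_playlist(segment_durations, segment_prefix="segment"):
    """Return .m3u8 text for a VOD stream, given segment durations in seconds."""
    target = max(int(round(d)) for d in segment_durations)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i, duration in enumerate(segment_durations, start=1):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(f"{segment_prefix}_{i:03d}.ts")
    lines.append("#EXT-X-ENDLIST")  # VOD: no further segments will appear
    return "\n".join(lines)

playlist = generate_media_playlist([6.0, 6.0, 4.2])
print(playlist)
```

The client's media player reads this file, then fetches `segment_001.ts`, `segment_002.ts`, and so on, sequentially.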
CDN Caching Strategy
| File Type | Cache TTL | Reason |
|---|---|---|
| .m3u8 playlists | 10-30 seconds | Playlists may change (adaptive bitrate) |
| .ts segments | 24 hours | Segments are immutable once created |
Key insight: CDN is pull-based. You only upload to S3; CDN automatically fetches and caches content on cache misses. Popular songs get cached at edge locations globally, reducing latency and origin load.
Secure Streaming with Signed Cookies
For authenticated users, we use CloudFront signed cookies instead of signing every segment URL:
- User logs in and backend validates subscription
- Backend sets CloudFront signed cookies (grants access to a path prefix like /hls/song123/*)
- User can now fetch playlists and segments freely
- CDN validates cookies and serves content
Why this approach?
- ✅ One-time authentication (set cookies once, stream freely)
- ✅ CDN caching works perfectly (segments aren’t signed, so they cache normally)
- ✅ No URL regeneration needed
- ✅ Smooth playback without interruptions
This is much more efficient than generating signed URLs for hundreds of segments per song!
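The heart of the signed-cookie approach is a CloudFront "custom policy" that grants wildcard access to one song's path until an expiry time. The sketch below builds that policy; the actual signing step (RSA-SHA1 with your CloudFront key pair, producing the CloudFront-Policy, CloudFront-Signature, and CloudFront-Key-Pair-Id cookies) is omitted, and the cdn.example.com domain is a placeholder.

```python
import base64
import json
import time

# Sketch: build the CloudFront custom policy a backend would sign and
# set as cookies. Signing (RSA-SHA1 with your key pair) is omitted.

def build_custom_policy(resource, ttl_seconds=3600, now=None):
    """Return (policy_json, cookie_safe_b64) granting access to `resource`
    (e.g. https://cdn.example.com/hls/song123/*) until now + ttl."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    policy = {
        "Statement": [{
            "Resource": resource,
            "Condition": {"DateLessThan": {"AWS:EpochTime": expires}},
        }]
    }
    policy_json = json.dumps(policy, separators=(",", ":"))
    # CloudFront uses a modified base64: '+' -> '-', '=' -> '_', '/' -> '~'
    b64 = base64.b64encode(policy_json.encode()).decode()
    cookie_value = b64.replace("+", "-").replace("=", "_").replace("/", "~")
    return policy_json, cookie_value

policy_json, cookie_value = build_custom_policy(
    "https://cdn.example.com/hls/song123/*", ttl_seconds=1800, now=1_700_000_000
)
print(policy_json)
```

Because the Resource is a wildcard over the whole song prefix, one set of cookies covers the master playlist, every variant playlist, and every segment.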
Detailed System Workflows
This section details the step-by-step processes for the system’s primary functions: uploading new music and playing existing tracks.
Write Path: Artist Song Upload
sequenceDiagram
participant Artist
participant Client
participant API
participant S3
participant DB
participant Queue as Message Queue
participant Worker as Worker Service
Artist->>Client: Upload song + metadata
Client->>API: POST /songs (with auth token)
API->>API: Validate artist authentication
API->>API: Validate file format & size
alt Direct Upload
Client->>S3: Upload audio file directly
S3-->>Client: Upload complete
else Server Upload
Client->>API: Upload file to API
API->>S3: Store audio file
end
API->>DB: Insert song metadata record
API->>Queue: Publish transcoding job (optional)
Note over Queue,Worker: Asynchronous Processing
Queue->>Worker: Consume transcoding job
Worker->>S3: Fetch original file
Worker->>Worker: Transcode to multiple bitrates<br/>(64kbps, 128kbps, 320kbps)
Worker->>Worker: Generate HLS segments
Worker->>S3: Store segments & playlists
API-->>Client: Success response
Note over Client: Song available for streaming
Steps:
- Artist initiates upload via client, sending POST request with audio file and metadata
- API server validates request (file format, size limits, authentication)
- Raw audio file uploaded to S3 (directly from client or via server)
- API server extracts metadata (duration, bitrate) and inserts record into SQL database
- (Optional) Upload triggers message in queue (RabbitMQ, SQS) for asynchronous transcoding
- Worker services consume messages to perform offline tasks:
- Transcode audio into multiple bitrates (64kbps, 128kbps, 320kbps) for adaptive streaming
- Chunk files into HLS segments
- Generate playlist files (.m3u8)
- Success response returned to artist
Read Path: User Song Playback
sequenceDiagram
participant User
participant Client
participant API
participant CDN
participant S3
participant DB
User->>Client: Press Play
Client->>API: GET /songs/{id} (with JWT)
API->>API: Validate JWT token
API->>API: Check subscription level
API->>DB: Query song metadata
DB-->>API: Return song info
API->>API: Set CloudFront signed cookies
API-->>Client: Return metadata + cookies
Client->>CDN: Request master.m3u8 (with cookies)
CDN->>CDN: Validate cookies
alt Cache Hit
CDN-->>Client: Return cached playlist
else Cache Miss
CDN->>S3: Fetch playlist
S3-->>CDN: Return playlist
CDN->>CDN: Cache playlist
CDN-->>Client: Return playlist
end
Client->>CDN: Request variant playlist (e.g., audio_128k.m3u8)
CDN-->>Client: Return variant playlist
loop Streaming Segments
Client->>CDN: Request segment_001.ts
alt Cache Hit
CDN-->>Client: Return cached segment
else Cache Miss
CDN->>S3: Fetch segment
S3-->>CDN: Return segment
CDN->>CDN: Cache segment (24h TTL)
CDN-->>Client: Return segment
end
Note over Client: Player buffers and plays
end
Steps:
- User presses play button in client application
- Client sends GET request for song metadata (/songs/{id}) to API server via load balancer
- API server validates JWT token to verify identity and subscription level (free vs. premium)
- Server queries SQL database for song metadata (title, artist, duration, file path)
- Server sets CloudFront signed cookies (grants access to song’s HLS path)
- Server returns song metadata and cookie information to client
- Client’s media player requests HLS playlist from CDN (with cookies)
- CDN validates cookies and serves playlist (from cache or fetches from S3)
- Client requests segments sequentially as needed
- CDN serves segments (cached or fetches from S3, then caches for future requests)
Data Storage Architecture
This proposal advocates for a dual-database strategy, a critical design decision that optimizes performance, cost, and scalability. By separating transactional metadata from large, immutable media files, we use the best storage technology for each data type.
Metadata Storage: Relational Database (SQL)
A relational database is ideal for managing metadata due to strong support for transactional integrity (ACID properties), complex queries, and native ability to handle joins between entities.
Core Schema
users table

| Column | Type | Description |
|---|---|---|
| user_id | PK | Unique user identifier |
| email | VARCHAR | User’s email address |
| password_hash | VARCHAR | Hashed user password |
| subscription_type | ENUM | e.g., ‘free’, ‘premium’ |
| created_at | TIMESTAMP | Account creation timestamp |

artists table

| Column | Type | Description |
|---|---|---|
| artist_id | PK | Unique artist identifier |
| name | VARCHAR | Artist’s name |
| country | VARCHAR | Artist’s country of origin |

songs table

| Column | Type | Description |
|---|---|---|
| song_id | PK | Unique song identifier |
| title | VARCHAR | Song title |
| duration | INT | Duration in seconds |
| file_url | VARCHAR | Path to HLS playlist in object storage |
| genre | VARCHAR | Song genre |

artist_songs (Join Table)

| Column | Type | Description |
|---|---|---|
| artist_id | FK (artists) | Composite primary key |
| song_id | FK (songs) | Composite primary key |

playlists table

| Column | Type | Description |
|---|---|---|
| playlist_id | PK | Unique playlist identifier |
| owner_id | FK (users) | Foreign key to playlist owner |
| name | VARCHAR | Playlist name |
| created_at | TIMESTAMP | Creation timestamp |

playlist_items (Join Table)

| Column | Type | Description |
|---|---|---|
| playlist_id | FK (playlists) | Composite primary key |
| song_id | FK (songs) | Composite primary key |
| position | INT | Order of song in playlist |

user_likes table

| Column | Type | Description |
|---|---|---|
| user_id | FK (users) | Composite primary key |
| song_id | FK (songs) | Composite primary key |
| liked_at | TIMESTAMP | When song was liked |
Audio File Storage: Object Storage
An object storage solution like AWS S3 is recommended for audio files. Its suitability stems from:
- Virtually unlimited scalability
- High durability (99.999999999% durability by design)
- Cost-effectiveness for storing large, immutable files
File organization structure:
s3://music-streaming-bucket/
artists/
{artist_id}/
albums/
{album_id}/
{song_id}/
master.m3u8
audio_64k.m3u8
audio_128k.m3u8
audio_320k.m3u8
segments/
segment_001.ts
segment_002.ts
...
This hierarchical structure provides clarity and easy management while keeping related files together.
API Endpoint Design
The RESTful API serves as the formal contract between client applications and backend services. The following endpoints are designed to be intuitive, resource-oriented, and follow standard web conventions.
Search and Discovery
GET /search?q={query}&type={song|artist}&limit={20}&offset={0}
- Searches for content across different types with pagination support
GET /songs/trending?genre={genre}
- Retrieves a list of trending songs, with optional genre filter
GET /artists/{id}/songs
- Fetches a paginated list of all songs by a specific artist
Content Access & Playback
GET /songs/{id}
- Retrieves specific song’s metadata
- Returns CloudFront signed cookies for secure streaming
- Response includes: title, artist, duration, album art, streaming URL
Playlist Management
POST /playlists
- Creates a new playlist for the authenticated user
- Body:
{ name: "My Playlist" }
PUT /playlists/{id}/songs
- Adds one or more songs to a specified playlist
- Body:
{ song_ids: [1, 2, 3] }
DELETE /playlists/{id}/songs/{song_id}
- Removes a specific song from a user’s playlist
GET /playlists/{id}
- Retrieves playlist details and all songs in the playlist
User Management
GET /users/me/playlists
- Fetches all playlists created by the currently authenticated user
POST /songs/{id}/like
- Adds a song to the current user’s list of liked songs
DELETE /songs/{id}/like
- Removes a song from the user’s liked songs
POST /artists/{id}/follow
- Allows the current user to follow an artist
DELETE /artists/{id}/follow
- Unfollows an artist
Artist Endpoints
POST /songs
- Uploads a new song (artist only)
- Body: multipart form data with audio file and metadata
GET /artists/me/songs
- Retrieves all songs uploaded by the authenticated artist
Scalability and Availability Strategy
To meet critical non-functional requirements of scalability and high availability, the architecture incorporates several strategies designed to ensure reliable performance as the user base grows.
Stateless API Servers
API servers do not store any user session state locally. This fundamental design choice allows for seamless horizontal scaling. New server instances can be added behind the load balancer during traffic spikes and removed during quiet periods, with no impact on users.
Intelligent Load Balancing
While simple Round Robin load balancing is a valid start, a production-grade streaming service requires a more sophisticated approach. The load balancer should monitor:
- CPU utilization
- Network bandwidth
- Number of active streams per API server
This prevents routing new stream requests to a server that has available CPU but has saturated its network interface—a common bottleneck in media-heavy applications.
Database Read Replicas
The architecture employs a leader-follower (master-slave) database configuration:
graph TB
subgraph Write Operations
API1[API Server] --> PRIMARY[(PostgreSQL<br/>Primary/Leader)]
end
subgraph Read Operations
API2[API Server] --> REPLICA1[(Read Replica 1)]
API3[API Server] --> REPLICA2[(Read Replica 2)]
API4[API Server] --> REPLICA3[(Read Replica 3)]
end
PRIMARY -.replication.-> REPLICA1
PRIMARY -.replication.-> REPLICA2
PRIMARY -.replication.-> REPLICA3
style PRIMARY fill:#FF6B6B
style REPLICA1 fill:#90EE90
style REPLICA2 fill:#90EE90
style REPLICA3 fill:#90EE90
- Write operations (creating playlists, uploading songs) → directed to leader
- Read operations (fetching song metadata, user profiles) → distributed across read-only follower replicas
This pattern dramatically increases the system’s overall read capacity since reads are far more frequent than writes.
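The routing logic itself is small. Here is a minimal sketch: writes go to the leader, reads round-robin over followers. The "connections" are just labels for illustration; a real router would hold connection pools (e.g. SQLAlchemy engines), and replication lag handling is out of scope here.

```python
import itertools

# Sketch: route writes to the leader, spread reads over followers.
# Names are placeholders; a real router would hold connection pools.

class ReplicaRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # round-robin reads

    def for_write(self):
        return self.primary

    def for_read(self):
        return next(self._replicas)

router = ReplicaRouter("pg-primary",
                       ["pg-replica-1", "pg-replica-2", "pg-replica-3"])
reads = [router.for_read() for _ in range(4)]
print(router.for_write(), reads)
```

One caveat worth knowing: a read that must observe a just-completed write (read-your-writes) should be sent to the leader, since followers lag slightly behind.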
Database Sharding
To achieve near-limitless write scalability for a massive global user base, the metadata database can be sharded. This involves partitioning data horizontally across multiple independent database instances.
Sharding strategies:
- By user geography: North American users → Shard 1, European users → Shard 2, etc.
- By user_id hash: Consistent hashing distributes users evenly across shards
- By artist_id ranges: For artist-related queries
Global Index: A separate sharded table maps secondary keys (like user_id) to the primary shard key, enabling efficient queries like “find all playlists for user X” across shards.
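A hash-based shard lookup can be sketched in a few lines. The shard count and hash choice below are illustrative (MD5 is used only for a stable key distribution, not for security); note that a plain modulo scheme reshuffles most keys when the shard count changes, which is the problem consistent hashing solves.

```python
import hashlib

# Sketch: deterministic shard selection from a user_id hash.
# NUM_SHARDS and the MD5 choice are illustrative. Caveat: modulo
# hashing remaps most keys when NUM_SHARDS changes; consistent
# hashing limits that movement.

NUM_SHARDS = 4

def shard_for_user(user_id: str) -> int:
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

assignments = {uid: shard_for_user(uid)
               for uid in ["user-1", "user-2", "user-3"]}
print(assignments)
```

Every API server computes the same shard for the same user, so no central lookup is needed on the hot path; the global index is only consulted for cross-shard queries.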
Multi-Layer Caching
An aggressive, multi-tiered caching strategy is essential for both performance and cost-efficiency:
graph TB
A[User Request] --> B{Client Cache?}
B -->|Hit| C[Return from Device]
B -->|Miss| D{CDN Cache?}
D -->|Hit| E[Return from Edge]
D -->|Miss| F{Redis Cache?}
F -->|Hit| G[Return from Redis]
F -->|Miss| H[Query Database]
H --> I[Cache in Redis]
I --> J[Return to User]
style C fill:#FFD700
style E fill:#87CEEB
style G fill:#90EE90
- Client-Side Cache: Mobile and web applications cache recently and frequently played songs locally on the user’s device. Provides instantaneous playback for repeat plays and works offline.
- Content Delivery Network (CDN): Global CDN (AWS CloudFront) serves as the primary edge cache. Stores copies of popular songs in locations geographically close to users. Uses LRU (Least Recently Used) eviction policy to keep cache relevant.
- Service-Level Cache: In-memory cache (Redis, Memcached) at the service layer stores “hot” data:
  - Frequently accessed song metadata
  - User session data
  - Trending song lists
  - Search query results

This reduces redundant database queries and protects the backend from overload.
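The service-level cache typically follows the cache-aside (lazy loading) pattern: check the cache, fall back to the database on a miss, then populate the cache for subsequent requests. A minimal sketch, with a TTL'd dict standing in for Redis and a fake in-memory "database" (all names here are illustrative):

```python
import time

# Sketch: cache-aside caching at the service layer. A dict with TTLs
# stands in for Redis; _db and get_song_metadata are illustrative.

CACHE_TTL_SECONDS = 60
_cache = {}  # key -> (expires_at, value)

_db = {42: {"title": "Example Song", "duration": 187}}  # fake metadata DB
db_queries = 0

def _db_fetch(song_id):
    global db_queries
    db_queries += 1
    return _db.get(song_id)

def get_song_metadata(song_id):
    key = f"song:{song_id}"
    hit = _cache.get(key)
    if hit and hit[0] > time.time():        # fresh cache entry: serve it
        return hit[1]
    value = _db_fetch(song_id)              # miss or expired: hit the DB
    _cache[key] = (time.time() + CACHE_TTL_SECONDS, value)
    return value

first = get_song_metadata(42)   # miss -> database query
second = get_song_metadata(42)  # hit  -> served from cache
print(first, db_queries)
```

With real Redis this maps to GET, then SET with an EX (expiry) option on a miss; the TTL bounds how stale cached metadata can get.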
Geo-Replicated Object Storage
For a global service, object storage buckets containing audio files are replicated across different geographic regions. This allows:
- Users to upload to and stream from a data center physically closer to them
- Reduced latency
- Improved data durability and availability in the event of regional outages
Meeting Non-Functional Requirements
Let’s see how our architecture addresses the critical non-functional requirements:
Scalability ✅
- Horizontal Scaling: All microservices are stateless—just add more instances behind a load balancer
- Database Sharding: Partition data across multiple nodes to distribute load
- CDN Edge Caching: Absorbs massive traffic at edge locations globally
- Asynchronous Processing: Transcoding and heavy processing happen offline via worker queues
High Availability ✅
- Service Redundancy: Multiple API server instances across different availability zones
- Data Replication: PostgreSQL read replicas (a replica can be promoted if the primary fails), S3 with 3× replication
- CDN Redundancy: Multiple edge locations ensure content is always available
- Health Checks: Load balancer routes traffic only to healthy instances
Low Latency ✅
- CDN Edge Delivery: Content served from locations close to users (typically <50ms)
- Client-Side Caching: Instant playback for cached songs
- Database Read Replicas: Distribute read load for faster query response
- Redis Caching: Sub-millisecond access to hot data
Eventual Consistency ✅
- Search Index Updates: Newly uploaded songs appear in search results within minutes (acceptable trade-off)
- Trending Lists: Updated periodically (e.g., every hour) rather than real-time
- Playlist Updates: Immediate consistency for user’s own playlists, eventual for shared playlists
Complete System Architecture Diagram
Here’s the complete end-to-end architecture showing all components working together:
graph TB
subgraph Clients
WA[Web App]
MA[Mobile App]
end
subgraph Entry Point
LB[Load Balancer<br/>Health Checks, SSL Termination]
end
subgraph Application Layer
API1[API Server 1]
API2[API Server 2]
API3[API Server N]
end
subgraph Caching Layer
REDIS[(Redis<br/>Hot Data Cache)]
end
subgraph Data Layer
PG_PRIMARY[(PostgreSQL<br/>Primary)]
PG_REPLICA1[(Read Replica 1)]
PG_REPLICA2[(Read Replica 2)]
S3[(S3 Bucket<br/>Audio Files & HLS)]
end
subgraph CDN Layer
CDN[CloudFront CDN<br/>Global Edge Network]
end
subgraph Processing
QUEUE[Message Queue<br/>SQS/RabbitMQ]
WORKER[Worker Services<br/>Transcoding, HLS Generation]
end
WA --> LB
MA --> LB
LB --> API1
LB --> API2
LB --> API3
API1 --> PG_PRIMARY
API2 --> PG_REPLICA1
API3 --> PG_REPLICA2
API1 --> REDIS
API2 --> REDIS
API3 --> REDIS
PG_PRIMARY -.replication.-> PG_REPLICA1
PG_PRIMARY -.replication.-> PG_REPLICA2
WA --> CDN
MA --> CDN
CDN --> S3
API1 --> QUEUE
QUEUE --> WORKER
WORKER --> S3
style CDN fill:#FFD700,stroke:#333,stroke-width:3px
style S3 fill:#87CEEB,stroke:#333,stroke-width:2px
style PG_PRIMARY fill:#FF6B6B,stroke:#333,stroke-width:2px
style REDIS fill:#90EE90,stroke:#333,stroke-width:2px
Wrapping Up
Building a music streaming service at scale is a fascinating challenge that requires careful architectural decisions. By leveraging:
- Stateless API servers for independent horizontal scaling
- Dual-database strategy (SQL for metadata, object storage for media)
- CDN with HLS streaming for low-latency global delivery
- Multi-layer caching (client, CDN, Redis) for performance
- Database read replicas and sharding for scalability
- Asynchronous processing for heavy workloads
We can build a system that handles millions of users, serves millions of songs, and delivers low-latency playback globally—all while maintaining cost efficiency and high availability.
The architecture we’ve outlined provides a robust foundation not just for the immediate requirements, but also for future feature expansion and long-term business growth.
If you’re preparing for system design interviews or working on building scalable systems, I hope this deep dive has been helpful. The principles here—stateless services, polyglot persistence, CDN caching, and strategic technology choices—apply to many large-scale distributed systems beyond just music streaming.
Feel free to share your thoughts or questions in the comments!