Building Spotify-Style Search with Algolia in Go

Eight years ago, I built Lavafoshi - a local music streaming app for the Maldives. Fresh out of college with just one year of professional experience, I was eager to create something meaningful. Looking back now, as I prepare to release a major update, I realize how much I’ve learned about building software.

The original search was built with what every junior developer reaches for: SQL LIKE queries with manual relevance scoring. It worked for a few hundred songs, but as our catalog grew to tens of thousands of tracks, the cracks began to show. Response times crept up, relevance suffered, and users started complaining. It was time for a complete overhaul.

This is the story of how I migrated from SQL-based search to Algolia, achieving sub-50ms response times while building a Spotify-quality search experience. I’ll share the architectural decisions, technical challenges, and hard-won lessons that might save you from the pitfalls I encountered.

The breaking point: When SQL isn’t enough

Before diving into the solution, let me paint a picture of what I was dealing with. Our original search looked something like this:

SELECT * FROM songs 
WHERE name LIKE '%user_query%' 
ORDER BY monthly_listeners DESC
LIMIT 50;

Simple, right? And it worked—until it didn’t. With over 10,000 songs, basic LIKE queries routinely took 200-500ms. Users expect search to be instant, especially when they’re used to Spotify’s lightning-fast responses.

The real challenge wasn’t just speed, though. We needed a search experience that could:

  • Handle typos gracefully (“Naashid” → “Nashid”)
  • Search across multiple content types simultaneously (songs, artists, albums, playlists)
  • Provide intelligent ranking based on popularity and relevance
  • Support real-time suggestions and faceted filtering
  • Scale to handle concurrent users without breaking a sweat

SQL simply wasn’t designed for this kind of search complexity.

Choosing the right tool: Meilisearch vs. Algolia

I evaluated two main contenders: Meilisearch and Algolia. Having worked with Meilisearch before, I was comfortable with its capabilities and had experience self-hosting it. But here’s the thing about side projects: you want to focus on building, not managing infrastructure.

Algolia’s cloud pricing caught my attention. Their generous free tier and pay-as-you-go model aligned perfectly with Lavafoshi’s usage patterns. More importantly, their AI features like personalized recommendations opened doors for future enhancements I was excited to explore.

Architecture deep dive: The unified index approach

The first major architectural decision was how to structure our indices. Algolia gives you two main options:

Option 1: Separate indices for each content type

songs_index, albums_index, artists_index, playlists_index

Option 2: Unified index with type discrimination

music_search_index (with type: song|album|artist|playlist)

I chose the unified approach, and here’s why this decision proved crucial:

User experience comes first

When users search for “Ali Rameez”, they don’t want to see just songs or just albums; they want everything. A unified index delivers mixed results in a single, fast query, exactly like Spotify does.

Simplified ranking logic

With separate indices, you’d need to maintain different ranking rules for each content type, then somehow merge and re-rank results on your backend. With a unified index, one set of ranking rules handles everything consistently.

Resource efficiency

One index means one set of settings to configure, one batch of data to sync, and one endpoint to optimize. As your search requirements evolve, you’re making changes in one place.

Here’s the foundational structure I built:

type SearchableItem struct {
    ObjectID         string `json:"objectID"`
    Type             string `json:"type"`  // "song", "album", "artist", "playlist"
    Name             string `json:"name"`
    ArtistName       string `json:"artist_name,omitempty"`
    AlbumName        string `json:"album_name,omitempty"`
    Duration         int32  `json:"duration,omitempty"`
    MonthlyListeners int32  `json:"monthly_listeners"`
    FollowersCount   int32  `json:"followers_count"`
    HasLyrics        bool   `json:"has_lyrics,omitempty"`
    ArtworkURL       string `json:"artwork_url"`
    Biography        string `json:"biography,omitempty"`
    ReleasedDate     string `json:"released_date,omitempty"`
}

The Power of Denormalization

Notice how I’m storing artist_name and album_name directly in each song record, rather than just foreign key references. This is denormalization—trading storage space for query performance.

In a traditional database, this would be heresy. But search indices aren’t databases. They’re optimized for read performance, and the cost of a few extra megabytes is negligible compared to the performance gains.
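To make the trade-off concrete, here is a sketch of what denormalization looks like in code. The relational Song, Artist, and Album models are hypothetical stand-ins, not Lavafoshi’s actual types; the point is that the joined names get copied into the flat search record:

```go
package main

import "fmt"

// Hypothetical relational models (illustrative, not the real schema).
type Artist struct {
	ID   uint64
	Name string
}

type Album struct {
	ID   uint64
	Name string
}

type Song struct {
	ID               uint64
	Name             string
	Duration         int32
	MonthlyListeners int32
}

// SearchableItem mirrors the unified index structure shown above.
type SearchableItem struct {
	ObjectID         string `json:"objectID"`
	Type             string `json:"type"`
	Name             string `json:"name"`
	ArtistName       string `json:"artist_name,omitempty"`
	AlbumName        string `json:"album_name,omitempty"`
	Duration         int32  `json:"duration,omitempty"`
	MonthlyListeners int32  `json:"monthly_listeners"`
}

// denormalizeSong copies the joined artist and album names directly into
// the search record, so one query can match on any of them without joins.
func denormalizeSong(s Song, artist Artist, album Album) SearchableItem {
	return SearchableItem{
		ObjectID:         fmt.Sprintf("song_%d", s.ID),
		Type:             "song",
		Name:             s.Name,
		ArtistName:       artist.Name,
		AlbumName:        album.Name,
		Duration:         s.Duration,
		MonthlyListeners: s.MonthlyListeners,
	}
}

func main() {
	item := denormalizeSong(
		Song{ID: 42, Name: "Loabi", Duration: 215, MonthlyListeners: 9000},
		Artist{ID: 7, Name: "Ali Rameez"},
		Album{ID: 3, Name: "Greatest Hits"},
	)
	fmt.Println(item.ObjectID, item.ArtistName) // song_42 Ali Rameez
}
```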

Service Architecture: Keeping It Clean

Since Lavafoshi already had a solid service-oriented architecture, integrating Algolia needed to feel natural. I organized the code like this:

├── internal/services/algolia.go    # Core Algolia service
├── internal/handlers/search.go     # HTTP API handlers  
├── internal/repository/            # Existing data access layer
└── cmd/indexer/                   # CLI indexing tool

The key insight here is separation of concerns. The Algolia service handles search index operations, the handlers manage HTTP requests, and the CLI tools handle bulk operations. Each component has a single responsibility, making the codebase easier to maintain and test.

Content-Specific Modeling

Each content type gets its own optimized structure. Here’s how I modeled songs:

type SearchableSong struct {
    ObjectID         string `json:"objectID"`
    ID               uint64 `json:"id"`
    Name             string `json:"name"`
    ArtistName       string `json:"artist_name"`
    AlbumName        string `json:"album_name"`
    Duration         int32  `json:"duration"`
    MonthlyListeners int32  `json:"monthly_listeners"`
    HasLyrics        bool   `json:"has_lyrics"`
    ArtworkURL       string `json:"artwork_url"`
    Type             string `json:"type"`  // Always "song"
}

The Type field is crucial—it enables faceted search, allowing users to filter results by content type. The ObjectID follows Algolia’s convention of being unique across all records in the index.
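One simple way to keep objectIDs unique across content types in a shared index—a sketch of the convention, not necessarily Lavafoshi’s exact scheme—is to prefix the numeric database ID with the type:

```go
package main

import "fmt"

// buildObjectID namespaces a numeric database ID by content type, so a
// song and an album that both have ID 42 still get distinct objectIDs.
func buildObjectID(contentType string, id uint64) string {
	return fmt.Sprintf("%s_%d", contentType, id)
}

func main() {
	fmt.Println(buildObjectID("song", 42))  // song_42
	fmt.Println(buildObjectID("album", 42)) // album_42
}
```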

Search Configuration: The Art of Relevance

Here’s where things get interesting. Algolia’s ranking algorithm is highly configurable, but with great power comes the need for thoughtful decisions. After analyzing how users interact with music search, I developed this ranking strategy:

settings := search.Settings{
    SearchableAttributes: opt.SearchableAttributes(
        "name", "artist_name", "album_name", "biography",
    ),
    AttributesForFaceting: opt.AttributesForFaceting(
        "type", "has_lyrics", "released_date",
    ),
    Ranking: opt.Ranking(
        "desc(monthly_listeners)",  // Popularity first
        "desc(followers_count)",    // Then social proof
        "typo",                     // Typo tolerance
        "geo", "words", "filters",  // Algolia defaults
        "proximity", "attribute", "exact", "custom",
    ),
    CustomRanking: opt.CustomRanking(
        "desc(monthly_listeners)", "desc(followers_count)",
    ),
}

Why Popularity First?

In music search, popularity often correlates with relevance. When someone searches for “Shape”, they probably want Ed Sheeran’s “Shape of You”, not an obscure jazz track with “shape” in the lyrics. By putting monthly_listeners first in the ranking, popular tracks naturally bubble to the top.

This doesn’t mean unpopular music gets buried—Algolia’s algorithm still weighs all the ranking factors. It just means that when textual relevance is comparable, popularity decides the order.

The importance of faceting

Faceting allows users to filter results dynamically. By making type, has_lyrics, and released_date facetable, users can refine their search to show only songs with lyrics or only albums from the 2000s. This dramatically improves the search experience for power users while keeping it simple for casual browsers.
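On the backend, facet refinements translate into Algolia filter strings such as `type:song AND has_lyrics:true`. A minimal helper for composing them might look like this (the helper itself is illustrative; only the `AND`-joined filter syntax comes from Algolia):

```go
package main

import (
	"fmt"
	"strings"
)

// buildFilters joins facet constraints with AND, producing a string in
// Algolia's filter syntax, e.g. "type:song AND has_lyrics:true".
func buildFilters(constraints ...string) string {
	return strings.Join(constraints, " AND ")
}

func main() {
	// Only songs that have lyrics:
	fmt.Println(buildFilters("type:song", "has_lyrics:true"))
}
```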

Building the CLI Indexer: Batch Processing Done Right

Initial data indexing and ongoing maintenance required a robust CLI tool. Here’s how I approached batch processing at scale:

go run ./cmd/indexer -type=all -batch-size=1000

The Batch Processing Strategy

func syncSongs(ctx context.Context, repo repository.SongRepository, 
               algoliaService *services.AlgoliaService, batchSize int) error {
    offset := 0
    totalSynced := 0

    for {
        songs, err := repo.GetAll(offset, batchSize)
        if err != nil {
            return fmt.Errorf("failed to get songs at offset %d: %w", offset, err)
        }

        if len(songs) == 0 {
            break // End of data
        }

        // Transform and index the batch
        if err := algoliaService.IndexSongs(ctx, songs); err != nil {
            return fmt.Errorf("failed to index batch at offset %d: %w", offset, err)
        }

        totalSynced += len(songs)
        log.Printf("Synced %d songs (total: %d)", len(songs), totalSynced)
        offset += batchSize
        
        // Early termination for smaller final batch
        if len(songs) < batchSize {
            break
        }
    }
    
    log.Printf("Successfully synced %d songs total", totalSynced)
    return nil
}

Lessons Learned About Batch Sizes

Getting batch sizes right took some experimentation:

  • Too small (< 100 records): Network overhead dominates, indexing becomes glacially slow
  • Too large (> 5000 records): Memory usage spikes, potential timeout issues
  • Sweet spot (1000-2000 records): Good balance of throughput and resource usage

The key insight is that batch size isn’t just about performance; it’s about reliability. Smaller batches mean that if something goes wrong, you lose less work and can resume more easily.

API Handler: The Great Migration

The biggest architectural shift happened in our search handler. The transformation from multiple SQL queries to a single Algolia call was dramatic:

Before: The SQL Approach

func (h *SearchHandler) Search(c *gin.Context) {
    // Multiple database hits with manual result merging
    songs := h.searchSongs(keyword)      // ~150ms
    artists := h.searchArtists(keyword)  // ~100ms
    albums := h.searchAlbums(keyword)    // ~120ms
    playlists := h.searchPlaylists(keyword) // ~80ms
    
    // Manual relevance scoring and merging
    results := h.combineAndRankResults(songs, artists, albums, playlists)
    
    // Total: ~450ms + processing time
}

After: The Algolia Approach

func (h *SearchHandler) Search(c *gin.Context) {
    searchOptions := services.SearchOptions{
        Query:       keyword,
        Page:        page,
        HitsPerPage: hitsPerPage,
        Facets:      []string{"type"},
        Filters:     typeFilter, // Optional: "type:song"
    }
    
    results, err := h.algoliaService.Search(ctx, searchOptions)
    // Total: ~25ms
    
    if err != nil {
        c.JSON(500, gin.H{"error": "Search failed"})
        return
    }
    
    c.JSON(200, results)
}

The difference is stark: from four database queries plus processing time to a single API call. But the real magic is in what Algolia handles automatically—typo tolerance, relevance scoring, faceted search, and pagination all work out of the box.

Response Format: Unified Yet Flexible

I designed a response format that preserves type information while enabling unified results:

type SearchItem struct {
    Type string      `json:"type"`
    Data interface{} `json:"data"`
}

type SearchResponse struct {
    Results     []SearchItem              `json:"results"`
    Page        int                       `json:"page"`
    HitsPerPage int                       `json:"hits_per_page"`
    TotalHits   int                       `json:"total_hits"`
    TotalPages  int                       `json:"total_pages"`
    Facets      map[string]map[string]int `json:"facets,omitempty"`
}

This structure gives frontend developers the flexibility they need. They can render all results in a unified list while applying type-specific styling and interactions based on the type field.
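A minimal sketch of how raw Algolia hits could be wrapped into this response shape—assuming each record carries the `type` discriminator shown earlier:

```go
package main

import "fmt"

type SearchItem struct {
	Type string      `json:"type"`
	Data interface{} `json:"data"`
}

// wrapHits turns raw Algolia hits into SearchItems using the "type"
// field stored on every record; hits missing a type are skipped.
func wrapHits(hits []map[string]interface{}) []SearchItem {
	items := make([]SearchItem, 0, len(hits))
	for _, hit := range hits {
		t, ok := hit["type"].(string)
		if !ok {
			continue // defensive: record missing its type discriminator
		}
		items = append(items, SearchItem{Type: t, Data: hit})
	}
	return items
}

func main() {
	hits := []map[string]interface{}{
		{"type": "song", "name": "Shape of You"},
		{"type": "artist", "name": "Ed Sheeran"},
		{"name": "no type field"},
	}
	fmt.Println(len(wrapHits(hits))) // 2
}
```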

Technical Challenges: The Devil in the Details

Challenge 1: Go’s Type System vs. JSON Flexibility

Algolia returns search results as map[string]interface{}, which doesn’t play nicely with Go’s static type system. Here’s how I handled the conversion:

func (h *SearchHandler) convertHitToSong(hit map[string]interface{}) models.Song {
    // JSON numbers come back as float64, need safe conversion
    id, ok := hit["id"].(float64)
    if !ok {
        log.Printf("Warning: invalid ID for song hit: %v", hit["id"])
        id = 0
    }
    
    duration, _ := hit["duration"].(float64)
    monthlyListeners, _ := hit["monthly_listeners"].(float64)
    
    name, _ := hit["name"].(string)
    artistName, _ := hit["artist_name"].(string)
    artworkURL, _ := hit["artwork_url"].(string)

    return models.Song{
        ID:               uint64(id),
        Name:             name,
        ArtistName:       artistName,
        Duration:         int32(duration),
        MonthlyListeners: int32(monthlyListeners),
        ArtworkURL:       artworkURL,
    }
}

The key lesson here: always use type assertions defensively. JSON unmarshaling in Go can be unpredictable, especially when dealing with numbers and optional fields.
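An alternative to hand-written assertions—a sketch that trades a little performance for safety—is to round-trip each hit through encoding/json and let the standard library handle the float64-to-integer conversions and missing optional fields:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Song is a trimmed-down stand-in for the real model.
type Song struct {
	ID               uint64 `json:"id"`
	Name             string `json:"name"`
	ArtistName       string `json:"artist_name"`
	Duration         int32  `json:"duration"`
	MonthlyListeners int32  `json:"monthly_listeners"`
}

// hitToSong re-marshals the generic hit and unmarshals it into the
// typed struct, so the numeric conversions happen in one place.
func hitToSong(hit map[string]interface{}) (Song, error) {
	var s Song
	raw, err := json.Marshal(hit)
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(raw, &s)
	return s, err
}

func main() {
	hit := map[string]interface{}{
		"id": float64(7), "name": "Loabi", "duration": float64(215),
	}
	s, err := hitToSong(hit)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.ID, s.Duration) // 7 215
}
```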

Challenge 2: Repository Pattern Integration

Our existing repositories used different patterns for data access. For indexing to work smoothly, I needed consistency:

type SongRepository interface {
    GetByID(id uint64) (*models.Song, error)
    GetAll(offset, limit int) ([]models.Song, error)  // Added for indexing
    GetRecentlyUpdated(since time.Time) ([]models.Song, error)  // For incremental sync
    // ... existing methods
}

Adding these methods to all repository interfaces created a consistent pattern for bulk operations, making the indexer implementation much cleaner.
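To illustrate the contract these methods create—with an in-memory stand-in for the real database layer—a minimal GetAll with offset/limit pagination might look like this:

```go
package main

import "fmt"

type Song struct {
	ID   uint64
	Name string
}

// memorySongRepo is an in-memory stand-in for the real repository,
// handy for testing the indexer loop without a database.
type memorySongRepo struct {
	songs []Song
}

// GetAll returns one page of songs; an offset past the end yields an
// empty slice, which the indexer loop treats as "end of data".
func (r *memorySongRepo) GetAll(offset, limit int) ([]Song, error) {
	if offset >= len(r.songs) {
		return nil, nil
	}
	end := offset + limit
	if end > len(r.songs) {
		end = len(r.songs)
	}
	return r.songs[offset:end], nil
}

func main() {
	repo := &memorySongRepo{}
	for i := 1; i <= 5; i++ {
		repo.songs = append(repo.songs, Song{ID: uint64(i), Name: fmt.Sprintf("Track %d", i)})
	}

	page, _ := repo.GetAll(0, 2)
	fmt.Println(len(page)) // 2
	page, _ = repo.GetAll(4, 2)
	fmt.Println(len(page)) // 1
}
```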

Performance Results: The Numbers Don’t Lie

The migration to Algolia delivered transformational performance improvements:

| Metric | Before (SQL) | After (Algolia) | Improvement |
| --- | --- | --- | --- |
| Average Response Time | 200-500ms | 15-50ms | 10x faster |
| P95 Response Time | 1200ms | 80ms | 15x faster |
| Search Features | Basic LIKE matching | Typo tolerance, synonyms, faceting | Immeasurable |

Operational Considerations: Keeping It Running

The Indexing Strategy Evolution

I started with a simple approach but learned to be more sophisticated:

Phase 1: Full reindex everything

# Daily full reindex - simple but wasteful
0 2 * * * cd /app && ./indexer -type=all

Phase 2: Smart incremental updates

# Hourly incremental sync for new/updated content
0 * * * * cd /app && ./indexer -type=all -since=1h

Phase 3: Near real-time (planned)

Event-driven updates via webhooks when content changes.

The lesson here is to start simple and evolve. A daily full reindex might seem inefficient, but it’s reliable and easy to reason about. Once you understand your data patterns and growth rate, you can optimize for efficiency.

Monitoring and Alerting

Search infrastructure is critical infrastructure. I monitor for:

  • Index freshness (when was the last successful sync?)
  • Search response times (are queries getting slower?)
  • Error rates (are searches failing?)
  • Index size growth (am I approaching plan limits?)

// Simple health check endpoint
func (h *SearchHandler) HealthCheck(c *gin.Context) {
    stats, err := h.algoliaService.GetIndexStats()
    if err != nil {
        c.JSON(500, gin.H{"status": "unhealthy", "error": err.Error()})
        return
    }
    
    c.JSON(200, gin.H{
        "status": "healthy",
        "index_size": stats.RecordCount,
        "last_updated": stats.LastUpdated,
    })
}

Lessons learned: What I wish I’d known

The Do’s

✅ Start with a unified index: The complexity of managing multiple indices isn’t worth the theoretical benefits for most use cases.

✅ Invest heavily in data modeling: Spend time thinking about what fields to index, how to structure your data, and what attributes to make searchable. This is where 80% of your search quality comes from.

✅ Build comprehensive CLI tools early: You’ll need them more than you think for debugging, maintenance, and data migrations.

✅ Use typed converters religiously: Go’s type system is your friend, even when dealing with interface{} from JSON APIs.

✅ Plan for growth from day one: Design your indexing strategy to handle 10x your current data volume.

The Don’ts

❌ Don’t guess at batch sizes: Profile your indexing process and find the sweet spot through measurement, not intuition.

❌ Don’t index sensitive data: Algolia isn’t encrypted at rest by default. Keep user passwords, personal information, and sensitive metadata out of your search index.

❌ Don’t ignore operational costs: Search queries cost money. Implement reasonable rate limiting and monitoring to avoid surprise bills.

❌ Don’t forget about data freshness: Users notice when search results are stale. Plan your sync strategy carefully and monitor it religiously.

❌ Don’t optimize prematurely: Start with simple approaches (daily full reindex) and optimize based on real usage patterns, not theoretical concerns.

Looking forward: The next chapter

With the core search infrastructure solid, I’m excited about what comes next for Lavafoshi:

Real-time indexing

Moving from batch updates to event-driven, real-time indexing. When an artist releases a new song, it should be searchable within seconds, not hours.

Personalization at Scale

Algolia’s AI features open up possibilities for personalized search results based on listening history, geographical trends, and collaborative filtering.

Advanced Analytics

Understanding how users search provides insights into content gaps, trending artists, and feature opportunities. Search analytics can drive both product and content strategy.

Final thoughts: Beyond just speed

Migrating from SQL to Algolia was about more than just performance. It was about building a foundation for the future. The old search was a bottleneck that limited how users could discover content. The new search is an enabler that opens up new interaction patterns and user behaviors.

The technical implementation matters, but what matters more is understanding your users’ mental model of search. They expect it to “just work”—to find what they’re looking for regardless of typos, to surface popular content naturally, and to respond instantly.

Building that experience requires the right combination of technology, architecture, and operational discipline. Algolia provided the technology foundation, clean service architecture made integration seamless, and careful attention to operational details ensures it keeps working reliably.

If you’re building any kind of content discovery experience, don’t make the same mistake I did eight years ago. SQL LIKE queries might seem simple, but they’re technical debt that compounds over time. Invest in proper search infrastructure early—your users (and your future self) will thank you.

The complete implementation, from CLI indexing tools to API handlers and search configuration, provides a solid blueprint for any developer looking to build world-class search functionality. Sometimes the best technical decisions are the ones that let you focus on building features instead of fighting infrastructure.

And that’s exactly what good search should be—invisible infrastructure that enables great user experiences.