spirent.com

blitz.io: Using Redis Transactions with CouchDB

By Spirent On January 17, 2012
Networks
Blitz, Cloud, CouchDB, DevOps, Redis, Scalability

At blitz.io, for a while there, we were only relying on CouchDB clusters as the primary NoSQL database with some in-memory caching. As we grow (rapidly) and scale out, there are aspects of what we collect and store that are transient and real-time. While CouchDB is awesome for the map/reduce, replication and incremental view indexes, the real-time queues (emails, counters, stats, etc) natural lend themselves to, yup, redis. We are in the process of rolling out geo-located redis instances as part of our global infrastructure.

Geo-located Infrastructure

I’ve mentioned this in the past that we use CouchDB multi-master replication between the US east/west. In addition there are numerous scale engines attached to these CouchDB clusters from all around the world (California, Virginia, Oregon, Japan, Singapore and Ireland).

Image

When you run a load/performance test from a given location, say, Singapore, we find the closest CouchDB cluster (California) and route the job there. Then the scale engines in Singapore try and acquire the job and then execute it. Very similar to resque, only global with replication. We’ve now added two sets of redis instances one on us-east and one on us-west on AWS. These two sets of redis instances also happen to be next to our CouchDB clusters.

Redis Transactions and Lazy CouchDB saves

Each time someone runs a load test, the overall queue size per-engine (and hence region) is updated. This gives us a sense for the overall pressure on the system so we can automatically spin up additional instances as necessary. In addition we also update site stats (scoreboard, number of hits etc) at the end of each test. We’ve now effectively moved the engine status information completely into redis with periodic RDB snapshots.

The scoreboard was interesting since we snapshot daily metrics into CouchDB to analyze overall trends and who did what how many times each day. So this information had to move into CouchDB, but we decided to do this lazily, thanks to redis transactions. Here’s the pseudo code:

def incr_site_stats engine, stats
   # Increment the site:stats counters (in redis). We use 'created_at'
   # to checkpoint the time we created these stats
   redis.multi do
      redis.hsetnx 'site:stats', 'created_at', Time.now.utc.to_string
      stats.each_pair do |k, v|
         redis.hincrby 'site:stats', k, v.to_i
     end
  end

     # Start the transaction by doing optimistic locking with watch
         redis.watch 'site:stats'
         site_stats = redis.hgetall 'site:stats'
         created_at = Time.parse_utc site_stats['created_at'] 
         elapsed = Time.now.utc.to_i - created_at.utc.to_i

    # Save the stats in CouchDB every 5 minutes
        if elapsed > 300 
        ok = redis.multi { redis.del 'site:stats' }

        # If the del operation fails, then there was a conflict and someone else
        # updated the site:stats hash. Otherwise, increment the counters in CouchDB
               if ok
                   db.update_doc 'stats' do |_doc|
                         site_stats.each_pair do |k, v|
                               next if k == 'created_at'
                               _doc[k] ||= 0
                              _doc[k] += v.to_i 
                     end
                _doc
             end 
          end 
       else 
    redis.unwatch 
   end
end

view raw gistfile1.rb This Gist brought to you by GitHub.

The watch/exec combination effectively allows us to do atomic operations. More importantly this allows us to regulate writes into CouchDB in a lazy way as this information is really not required instantly.

CouchDB Document Caching

Finally, we’ve also completely switched over to redis as our caching layer by simply monkey patching CouchRest like below:

class CouchRest::Database
      alias :old_save_doc :save_doc
      alias :old_delete_doc :delete_doc
      attr_writer :rcache

     # Automatically add the created_at and updated_at time fields to the
     # document and also save it back to redis with a ttl.
  def save_doc doc, bulk = false, batch = false
           doc['created_at'] ||= Time.now
           doc['updated_at'] = Time.now
           res = old_save_doc doc, bulk, batch
           rcache.set doc['_id'], doc
         return res
  end

        # Purge the key from redis after a successful delete
     def delete_doc doc
         res = old_delete_doc doc
         rcache.del doc['_id']
         return res
   end

       # Fetch from redis first and if not, go to CouchDB. 
    def cache_get id
        rcache.get id do
        self.get id rescue nil
     end
   end
end

view raw gistfile1.rb This Gist brought to you by GitHub.

This localizes the cache-aware code and was a pretty simple way to add caching across all our apps. This also means that all writes go to both redis and CouchDB and redis always has the recent document for a certain period of time. This almost completely eliminates reads directly from CouchDB unless there’s a cache miss. We are actively tracking cache misses using the INFO command to see how much benefit we are really getting.

We got some awesome things cooking in blitz.io for this year as we start tackling problems beyond just load testing and the Redis/CouchDB combination is going to take us far!

 
comments powered by Disqus
× Spirent.com uses cookies to enhance and streamline your experience. By continuing to browse our site, you are agreeing to the use of cookies.