After reading Eric Florenzano's pair of posts "Why CouchDB Sucks" and "Why CouchDB Rocks" and being generally sick of not understanding why folks were excited about this system, I put my spelunkin boots on to find out more about CouchDB.
Before I set about trying to understand it, I didn't understand why folks were excited about CouchDB given that a good number of its features (append-only storage and "schemaless design" in particular) have been present in ZODB for a little under ten years now. Even more in particular, I was really baffled as to why Python developers were excited about such a system given the availability of ZODB.
I think I understand a bit better now. ZODB and CouchDB are quite similar in a lot of respects, but CouchDB beats ZODB on a narrow set of goals that seem to be becoming more important these days.
First of all, availability is everything. Since the documented way to talk to CouchDB is over HTTP, and because you send it JSON primitives containing data structures, it can be used from just about any program written in just about any language. And given that most folks are not accustomed to being able to access any database without dark-magic C bindings that speak strange connection-oriented TCP protocols (or the equivalent embedded usually-crashy C bindings and awkward APIs), this is probably a novelty for many users. For Zope users, however, it's not really that nifty, as we've been writing applications that use HTTP to modify ZODB structures for quite a long time now.
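To make the "JSON over HTTP" point concrete, here is a minimal sketch using only the Python standard library. It builds, but does not send, the request that would store a document. The server address, the database name "blog" and the document id "first-post" are all made up for illustration.

```python
import json
from urllib.request import Request

# Assumption: a CouchDB server listening on localhost:5984.
BASE_URL = "http://localhost:5984"


def build_put_document(db, doc_id, doc):
    """Build (but do not send) the HTTP request that stores a JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical database name, document id and document body.
request = build_put_document(
    "blog", "first-post", {"title": "Hello", "tags": ["couchdb"]}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen() would actually send it. Nothing here is Python-specific, which is the point: any language with an HTTP client and a JSON encoder can do the same.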
Second of all, Damien Katz, CouchDB's author, used to work on Lotus Notes. Though I've never developed under (or even used) Lotus Notes, I do know that it is widely respected for its replication facilities. CouchDB has offline replay replication that I imagine smells a lot like Notes' built-in facilities for the same. This is a big deal if you're creating offline applications that need to synchronize to one or more other databases. ZODB itself has no such facility.
Third of all (and probably most importantly), CouchDB has a built-in indexing and querying facility, in the form of Views. This is something that ZODB does not share. Instead, ZODB relies on applications that are built on top of it to provide indexing capabilities. Moreover, creating indexes in applications that use ZODB is historically a static kind of thing that the application developer does "up front", or at least as a "software release" sort of thing. In CouchDB, creating an index is not really an exceptional sort of event. You tell the server, over HTTP, to create the index by PUT-ing a view. The first time any view is queried, CouchDB does the indexing. The following times the view is queried, it uses the index you've created via the view to return the results faster. No application that I know of written on top of ZODB allows you to do such a thing so casually.
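As a rough sketch of what "PUT-ing a view" can look like: a view is a JavaScript map function stored as a string inside a "design document", and querying it is a plain GET. The URL layout below follows the convention documented for current CouchDB releases and may differ from the release discussed here. The server address and the names "blog", "posts" and "by_author" are assumptions, and the requests are only built, not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumed server address

# The map function is JavaScript source, shipped to the server as a string.
MAP_BY_AUTHOR = "function(doc) { if (doc.author) { emit(doc.author, null); } }"


def build_put_view(db, design_name, view_name, map_source):
    """Build the PUT request that stores a design document holding one view."""
    design_doc = {
        "language": "javascript",
        "views": {view_name: {"map": map_source}},
    }
    return Request(
        url=f"{BASE_URL}/{db}/_design/{design_name}",
        data=json.dumps(design_doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def view_query_url(db, design_name, view_name, key):
    """Build the GET URL for a view lookup; the key is JSON-encoded."""
    encoded_key = quote(json.dumps(key))
    return f"{BASE_URL}/{db}/_design/{design_name}/_view/{view_name}?key={encoded_key}"


# Hypothetical database, design document and view names.
put_request = build_put_view("blog", "posts", "by_author", MAP_BY_AUTHOR)
query_url = view_query_url("blog", "posts", "by_author", "chris")
print(put_request.get_method(), put_request.full_url)
print(query_url)
```

The first GET against the query URL is what triggers the indexing; later GETs reuse the index.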
There is a lot of Interweb talk about the stuff that Erlang provides to CouchDB "for free", usually discussed in terms of stability and "crash-only" design. I don't know anything about this topic, so I can't comment. But CouchDB views are implemented in terms of "map/reduce", which seems to imply that it will be possible to create a farm of servers, each of which can operate on only a slice of the entire data structure in order to return a result. However, I'm not sure that this feature is actually implemented in any release of CouchDB (I think it's just implied by its design).
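The reason map/reduce suggests a farm of servers can be shown in a few lines of plain Python. This is a toy illustration of the general technique, not CouchDB's implementation: each "server" reduces only its own slice of the documents, and the partial results are then combined. The documents and the tag-counting view are invented for the example.

```python
from collections import Counter
from functools import reduce


def map_doc(doc):
    """Map step: emit one (key, value) pair per tag on a document."""
    return [(tag, 1) for tag in doc.get("tags", [])]


def reduce_slice(docs):
    """Reduce step for one slice: sum the emitted values per key."""
    counts = Counter()
    for doc in docs:
        for key, value in map_doc(doc):
            counts[key] += value
    return counts


def combine(partials):
    """Merge the per-slice results into a single result."""
    return reduce(lambda left, right: left + right, partials, Counter())


# Made-up documents, split into two slices as if held by two servers.
docs = [
    {"title": "a", "tags": ["couchdb", "erlang"]},
    {"title": "b", "tags": ["zodb", "python"]},
    {"title": "c", "tags": ["couchdb", "python"]},
    {"title": "d", "tags": []},
]
slices = [docs[:2], docs[2:]]

partials = [reduce_slice(s) for s in slices]
total = combine(partials)

# Combining per-slice results matches reducing everything in one place.
assert total == reduce_slice(docs)
print(dict(total))
```

Because no slice needs to see another slice's documents, the per-slice work could in principle happen on separate machines.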
All in all, CouchDB is a neat piece of software. If I ever have to build an application that needs to store data that needs to be accessible from programs written in languages other than Python across HTTP and I don't need to use a relational database, it seems like a great solution.
ZODB is probably still a better choice if your application is 100% Python and you want arbitrary application database write logic to be bounded within a single transaction. CouchDB's transaction semantics default to one-request-one-transaction, although they do have some batching facilities. Likewise, if you need to store large amounts of data (like arbitrary numbers of multimegabyte or multigigabyte files), ZODB is also probably a better choice, as it has blobs built in, and the blobs don't need to be base64 encoded in memory as "attachments" in order to transmit over a wire protocol. Likewise, if you need to store data structures that cannot be represented as JSON (like complex object instances), ZODB really can't be beat.
REVISION: Jan Lehnardt from the Apache projects sent this via email with regard to blobs:
Thanks for the excellent overview. You certainly understand CouchDB and this helps a lot removing FUD. Good work, thanks for taking the time. There's one addition I'd like to make: CouchDB no longer requires encoding binary data in base64. Attachments for documents (that work just like email attachments) have their own REST API since this summer where you can send the raw binary data to create and retrieve documents. So storing loads of data is no longer a bad idea.
It's sort of obvious that most of what makes up CouchDB (besides the currently unquantifiable-by-me benefits of it being written in Erlang, anyway) could really be done in a reasonably straightforward ZODB web application. I actually started such an animal (just to learn, not to use; I doubt I will play with it much more) called loveseat. The hardest thing to get right would be the indexing and the replication, of course. It already does database creation and document creation and retrieval, but not views or replication. That would probably take several months to get right, and I only had today. ;-)
I'll also note that ZODB has pretty terrible marketing compared to CouchDB. So I suppose I should try to do some. For you Python developers who don't know about ZODB, and who are excited about CouchDB, you might check out ZODB. You can think of a ZODB database as a place to hang a graph of arbitrary Python objects that becomes persistent. It's sort of an "uber-pickle"; it actually makes heavy use of the pickle module under the hood, but it breaks the object graph up into separate pickles, and so can be used for high volume applications. Changes take place transactionally. Multiple clients can make use of the same ZODB database over a protocol called "ZEO" (Zope Enterprise Objects), which isn't nearly as scary as it sounds; basically, you just set up a ZEO server and point the clients at it. You can use packages like repoze.catalog to do indexing and querying of object data that is inserted into the graph. You can use packages like repoze.folder to hold large collections of objects. It's fast, and other than a few C extensions, completely written in Python. It's been around, like I said, for about ten years, and it's in production in tens (hundreds?) of thousands of Zope deployments today. The "Zope" in ZODB is a "brand-only" name; it does not require Zope; it can be used in any Python application. It has limited deployment outside a Zope context, but I think this is mostly cultural: I personally use it in non-Zope applications frequently. It works on all major platforms. A good place to start would be to do:
easy_install ZODB3
Then maybe take a gander at this tutorial. And for a more verbose look, try this howto.