CouchDB

Thursday Apr 24, 2014

CouchDB Weekly News, April 24

Weekly CouchDB meeting – summary

  • 1.6.0 release: voting for 1.6 rc3 is open; Erlang R15 and R16 are fine, but several unexpected issues for testing with Erlang R14BX were reported; any help with testing and diagnosing the issue is welcome
  • rcouch merge status: the recent call for testing led to progress on COUCHDB-1994.The Windows build and some other little bugs could be fixed. Still, help with testing is very welcome!
  • BigCouch merge status: large progress was made last week, next steps will be code review and testing
  • BigCouch branch and the new multirepo design: how to edit individual repos and split big CouchDB.git into single and finally run the merge

Major Discussions

Error when installing CouchDB on Windows 7 (see thread)

Installation of CouchDB 1.5.1_R16B02 binary on Windows 7 lead to an issue caused by a problem with the Windows R16B02 installation file. This file has been replaced quickly, everything works now. The installation file can be downloaded here.

Compaction of a database with the same number of documents is getting slower over time (see thread)

A database containing a relatively stable number of documents (in CouchDB: "doc_count"), the documents themselves change frequently (including many insertions and deletions (CouchDB: "doc_del_count")). The person asking expected the compaction time to stay the same, but numbers showed that it took longer to compact the database over time, thus they asked why this was the case, and how it could overcome.

The reply is that compaction time in CouchDB scales with the sum of "doc_count" and "doc_del_count". As explained by Adam Kocoloski, this can currently be controlled by "1) [Purging] the deleted docs (lots of caveats about replication, potential for view index resets, etc.); 2) [Rotating] over to a new database, either by querying both or by replicating non-deleted docs to the new one. Neither one is particularly palatable. CouchDB currently keeps the tombstones around forever so that replication can always work. Making changes on that front is a pretty subtle thing but maybe not completely impossible. Also, there's a new compactor in the works that is faster and generates smaller files."

Data replication: replicating only non-deleted documents (same discussion as above; see thread)

The approach could be to either run a filtered replication or block deleted documents with a validate_doc_update function on the target database (see example function here).

How to handle terabyte databases (see thread)

This was a request about improving performance, insert time, compaction and replication for terabyte databases with billions of documents (setup: CouchDB 1.2 and CouchDB 1.4). Some approaches to handle a write-heavy workload:

  • Replication: Supply parameters to allocate more resources to a given job (code example here).
  • Insert time: Ensure that e.g. compaction only runs in the background and does not impact the throughput of interactive operations; this is work in process at the moment.
  • Compaction: A new, significantly faster compactor that also generates smaller post-compaction files that also eliminates the exponential falloff in throughput observed by the inquirer is in the works. – This problem can be solved by BigCouch as it tries to partition databases.
  • Ensure to not exceed the write capacity of the RAID (this effect is amplified for partitioned databases).
Vote: Release Apache CouchDB 1.6.0 rc3 (ongoing testing and discussion; see thread)

We encourage the whole community to download and test these release artefacts so that any critical issues can be resolved before the release is made. Everyone is free to vote on this release, so get stuck in on our dev@couchdb.apache.org mailing list! Find all release artefacts we are voting on in this list. If you want to test, please follow this test procedure. The changes since last vote round can be found here.

Releases in the CouchDB Universe

Opinions

Use Cases, Questions and Answers

Get involved!

If you want to get into working on CouchDB:
  • rcouch Merge: Erlang hackers and CouchDB users, e need your help with testing and review of the rcouch merge. It's easy! Find the how-to in this post.
  • Here's a list of beginner tickets around our currently ongoing Fauxton-implementation. If you have any questions or need help, don't hesitate to contact us in the couchdb-dev IRC room (#couchdb-dev) – Garren (garren) and Sue (deathbear) are happy to help.
  • You want join us for the updates of CouchDB-Python for Python 3? Take a look at issue 231.
We'd be happy to have you!

Events

  • April 26, Delhi, India: CouchDB meetup - Understanding how design documents work
  • June 16, 17, San Francisco, CA: CloudantCON

Job opportunities for people with CouchDB skills

… and also in the news

Posted on behalf of Lena Reinhard.

Thursday Apr 17, 2014

CouchDB Weekly News, April 17

Blog Posts

Weekly CouchDB meeting – summary

  • Bigcouch Merge: work on branches has started again and will make good progress
  • rcouch Merge: code reviews have been restarted; more help with testing is needed; discussion around Windows support, will be continued on Dev-Mailing list
  • Advocate Hub: onboarding at the moment, next steps will be sent to marketing mailing list soon
  • 1.6.0 release: release candidate 3 (rc3) is being prepared now
  • COUCHDB-1986: the issue is being investigated, will be updated soon if its status changes

Major Discussions

1.6.0 changelog review (see thread)

A discussion around what exactly should be included in 1.6.0 (see changelog and documentation for details).

Handling the Welcome Request (see thread)

How to handle the welcome message that appears from a root request and direct the "/" request to another URL (setup: version 1.5 on a Windows machine).

authentication_redirect is not working (see thread)

How to handle the redirection of users after login to other databases that service page requests (and avoid errors; setup: public application with CouchDB-based authentication).

Vote: Release Apache CouchDB 1.6.0 rc2 (see thread)

The vote was aborted because of an issue brought up by the test suite.

Modeling Relationships and providing Transactional Integrity (see thread; ongoing discussion)

… and the question if _bulk_docs is transactional (hint: it's not)

Releases in the CouchDB Universe

Opinions

Use Cases, Questions and Answers

Get involved!

If you want to get into working on CouchDB:
  • rcouch Merge: Erlang hackers and CouchDB users, we need your help with testing and review of the rcouch merge. It's easy! Find the how-to in this post.
  • Here's a list of beginner tickets around our currently ongoing Fauxton-implementation. If you have any questions or need help, don't hesitate to contact us in the couchdb-dev IRC room (#couchdb-dev) – Garren (garren) and Sue (deathbear) are happy to help.
  • You want join us for the updates of CouchDB-Python for Python 3? Take a look at issue 231.
We'd be happy to have you!

Events

Job opportunities for people with CouchDB skills

… and also in the news

Posted on behalf of Lena Reinhard.

Wednesday Apr 16, 2014

Merging rcouch

rcouch is a variant of CouchDB that is derived from the main Apache CouchDB code base. One way to look at it is as a “future port” of CouchDB, that is for two reasons:

  1. The internals are greatly improved. In simple terms, if CouchDB would be started to day, it would look a lot like rcouch.
  2. It comes with a number of nice new features.

So whether you are a CouchDB user or a CouchDB Erlang developer, rcouch is good news for you!

Since CouchDB has been around for a while, the internal codebase is a little crufty. We are constantly working on improving that, but one of the larger efforts have been done within the rcouch project by Apache CouchDB committer Benoit Chesneau.

At the same time, Benoit added a number of nice little features that benefit CouchDB users, just to name a few:

  • _changes for views.
  • Replication from views.
  • Write-only “Drop-Box” databases.
  • Request validation on-read.
  • _db_changes (actually, we nicked that one for Apache CouchDB 1.4 already :).

For more information, see the rcouch website.

Benoit has generously donated the rcouch project back to Apache CouchDB and has worked on merging the rcouch and the CouchDB codebases. Now it is time to review the merged code and get it into Apache CouchDB proper and put it into a new release.

With that, we need your help.

How you can help, you ask? Easy!

If you are a CouchDB user, and are not afraid to build CouchDB yourself, give the rcouch merge branch a spin:

git clone https://github.com/apache/couchdb.git
cd couchdb
git checkout -b 1994-merge-rcouch origin/1994-merge-rcouch

# Read the DEVELOPERS file to install all dependencies

# test regular build:
make

# test build with static icu library
make icu=static

# test build with shared libraries
make libs=shared

# run test suite
make check

# make a release
make rel

# run:
./rel/apache-couchdb/bin/couchdb start

# now CouchDB is running at http://127.0.0.1:5984/_utils/, as usual, you just won’t see any output.


If you are an Erlang developer, we could use your help with code reviews. Any small chunks will help!

Please see the associated JIRA ticket for more information.

Thanks so much!


FaQ: how does this relate to BigCouch?

It doesn’t really, the BigCouch merge is ongoing and we will announce updates and requests for review when the time is right. Well, actually, if we can convince you to help with the review of the rcouch merge, we can put even more resources on the BigCouch merge :)

Thursday Apr 10, 2014

CouchDB Weekly News, 10 April

Releases

Apache CouchDB 1.5.1 released
This is a security release. Download link | Release notes

Blog Posts

Major Discussions

Writing many docs at once: bench marking a 10k write to a CouchDB server (see thread here)
A discussion around handling and measurements of bulk input in CouchDB.
Question about "complex" range queries (see thread here)
Requirements for sort orders of data views.
Is the revision field deterministic? (see thread here)
Quoting Robert Samuel Newson's reply: "Yes, it’s deterministic. The same document with the same history will have the same _rev value. This is an optimization over the previous algorithm where _rev was a random UUID. […] The advantage is that two servers receiving the same update can more optimally replicate. They still have to check that the target has all _id/_rev pairs but will usually be able to skip actually transmitting document and attachment content."

CouchDB Universe

  • Apache Con has taken place from April 7-9. You can find the slides of the talks given here

Releases in the CouchDB Universe

  • New Erlang Version is out
  • mbtiles2couchdb, for using CouchDB as a simple tiles server
  • RCouch has a new home: announcement | the new home
  • PPnet, a minimal social network to drop into websites based on PouchDB
  • sabisu, a sensu web UI, got open sourced by Cloudant last week
  • Janus, an online storage for offline Web Apps built using PouchDB
  • couche 0.0.2, a couchdb client for node, with specific apis
  • mock-couch, an http server pretending to be couchdb, for unit testing
  • couchdb-sync 0.1.1 - a generic couchdb replicator which is basically an eventemitter
  • hackoregon-couch 0.0.2 which exports database information from PostgreSQL and imports into CouchDB
  • overwatch 0.2.7, a deterministic couchdb replication watcher

Opinions

Use Cases, Questions and Answers

Get Involved!

If you want to get into working on CouchDB:
  • here's a list of beginner tickets around our currently ongoing Fauxton-implementation. If you have any questions or need help, don't hesitate to contact us in the couchdb-dev IRC room (#couchdb-dev) – Garren (garren) and Sue (deathbear) are happy to help.
  • Your help on updating CouchDB-Python for Python 3. Wanna join? Please participate here.
We'd be happy to have you!

New PMC Member

  • Joan Touzet joins the Apache CouchDB Project Management Committee today. Joan has made outstanding, sustained contributions to the project. Welcome to the Couch, Joan!

Events

Job opportunities

… and also in the news

Posted on behalf of Lena Reinhard.

Wednesday Apr 09, 2014

Apache CouchDB 1.5.1 Released

Apache CouchDB 1.5.1 has been released and is available for download.

CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents with your web browser, via HTTP. Query, combine, and transform your documents with JavaScript. CouchDB works well with modern web and mobile apps. You can even serve web apps directly out of CouchDB. And you can distribute your data, or your apps, efficiently using CouchDB’s incremental replication. CouchDB supports master-master setups with automatic conflict detection.

Grab your copy here:

http://couchdb.apache.org/

Pre-built packages for Windows and OS X are available.

CouchDB 1.5.1 is a security release, and was originally published on 2014-04-09.

These release notes are based on the changelog.

Changes

  • Add the max_count option (UUIDs Configuration) to allow rate-limiting the amount of UUIDs that can be requested from the /_uuids handler in a single request.

Tuesday Apr 08, 2014

CouchDB and the Heartbleed SSL/TLS Vulnerability

You may or may not have heard about the Heartbleed SSL/TLS vulnerability yet. Without much exaggeration, this is a big one.

What does this mean for CouchDB?

1. If you are using CouchDB with the built-in SSL support, you are at the whim of Erlang/OTP’s handling of SSL. Lucky for you, while they do use OpenSSL for the heavy lifting, they do the TLS/SSL handshake logic in Erlang (Source). That means you are not affected by this issue.

2. If you are using CouchDB behind a third-party proxy server you are at the whim of the SSL library it uses. For the big three Apache, nginx and HAProxy it’s all OpenSSL. So if they are using OpenSSL 1.0.1-1.0.1f with heartbeat support (RFC6520) enabled (the default), you need to take action. As far as I can tell now:

  • 0. Check if you are vulnerable
  • 1. Stop your service.
  • 2. Upgrade to OpenSSL 1.0.1g or recompile OpenSSL without heartbeat support.
  • 3. Request new cert from your SSL cert vendor.
  • 4. Revoke your old cert.
  • 5. Invalidate all existing sessions by changing the CouchDB couchdb_httpd_auth/secret configuration value to a new UUID.
  • 6. Restart your service.
  • 7. Invalidate all your user’s passwords and/or OAuth tokens.
  • 8. Notify your users that any of their data and passwords are potentially compromised.

The Little Things(1): Do Not Delete

CouchDB takes data storage extremely seriously. This usually means we work hard to make sure that the CouchDB storage modules are as robust as we can make them. Sometimes though, we go all the way to the HTTP API to secure against accidental data loss, saving users from their mistakes, rather than dealing with hard drives and kernel caches that usually stand in the way of safe data storage.

The scenario:

To delete a document in CouchDB, you issue the following HTTP request:

DELETE /database/docid?rev=12345 HTTP/1.1

A common way to program this looks like this:

http.request('DELETE', db + '/' + docId + '?rev=' + docRev);

So far so innocent. Sometimes though, users came to us and complained that their whole database was deleted by that code.

Turns out the above code creates a request that deletes the whole database, if the docId variable isn’t set correctly. The request then looks like:

DELETE /database/?rev=12345 HTTP/1.1

It looks like an honest mistake, once you check the CouchDB log file, but good old CouchDB would just go ahead and delete the database, ignoring the ?rev= value.

We thought this is a good opportunity to help users not accidentally losing their data. So since late 2009 (yes, this is an oldie, but it came up in a recent discussion and we thought it is worth writing about :), CouchDB will not delete a database, if it sees that a ?rev= parameter is present and it looks like that this is just a malformed request, as database deletions have no business requiring a ?rev=.

One can make an easy argument that the code sample is fairly shoddy and we’d agree. But we are not here to argue how our users use our database beyond complying with the API and recommended use-cases. And if we can help them keep their data, that’s a win in our book

Continuing down this thought, we thought we could do one better. You know that to delete a document, you must pass the current rev value, like you see above. This is to ensure that we don’t delete the document accidentally without knowing that someone else may have added an update to it that we don’t actually want to delete. It’s CouchDB’s standard multi version currency control (MVCC) mechanism at work.

Databases don’t have revisions like documents, and deleting a database is a simple HTTP DELETE /database away. Databases, however, do have a sequence id, it’s the ID you get from the changes feed, it’s an number that starts at 0 when the database is created and increments by 1 each time a document is added, updated or deleted. Each state of the database has a single sequence ID associated with it.

Similar to a rev, we could require the latest sequence ID to delete a database, as in:

DELETE /database?seq_id=6789

And deny database deletes that don’t carry the latest seq_id. We think this is a decent idea, but unfortunately, this would break backwards compatibility with older versions of CouchDB and it would break a good amount of code in the field, so we are hesitant to add this feature. In addition, sequence IDs change a little when BigCouch finally gets merged, so we’d have to look at this again then.

In the meantime, we have the protection against simple coding errors and we are happy that our users keep their hard earned data more often now.

Thursday Apr 03, 2014

CouchDB Weekly News, April 3

Major Discussions

Vote on release of Apache CouchDB 1.5.1-rc.1 (will be released as Apache CouchDB 1.5.1 — see thread)

The vote passed.

Importing CSV data into a CouchDB document using Python-Cloudant module (discussion still open — see thread)

Approaches brought up: (1) Transforming a CSV file into a JSON file with Python; (2) rc_scv, a direct rcouch extension; (3) using Google Refine and Max Ogden's refine uploader (see visual example here); (4) CSV2Couch. Further information can also be found in this blog post on "Using Python with Cloudant" and the newer Cloudant-Python interface

CouchDB 1.6.0 proposals (see thread)

Discussion around the open blocker plus how to deal with it and around re-cutting 1.6.x from master. Releasing 1.5.1 will take precedence.

Poll around Erlang whitespace standards (see thread; the poll is still open)

Joan Touzet: "I know many of you are fed up with not being able to auto format in your favourite editor and match the CouchDB Erlang coding standards, or receiving pull requests that are formatted poorly. I'd like to fix that with an appropriate whitespace standard, and supplementary plugins for vi and Emacs that will just Do The Right Thing and let us all stop worrying about whitespace corrections in pull requests."

There's currently a poll around this topic which is still open.

Multiple Concurrent Instances of CouchDB on Mac OS X 10.7.5 (see thread)

Approaches: (1) for a powerful enough machine Vagrant could be used to spin up a few CouchDB VMs, e.g. with CouchDB-Vagrant. Other options could be: (2) using Docker, (3) Node-Multicouch (used in Hoodie) or (4) this script to configure isolated instances, it should be possible to point couchdb to the CouchDB commands inside the .app.

CouchDB Universe

Releases in the CouchDB Universe

  • Wilt 3.0.0 – a browser/server based CouchDB API library based on the SAG CouchDB library for PHP
  • contentful-2-couchdb – a proof of concept for easy data export to CouchDB and replication use
  • Availability of MariaDB 10 announced
  • PouchDB released a new website
  • PouchDB 2.1.0 release includes e.g. (all release notes here):
    • Support optional leveldown builds
    • Replication performance improvements
    • Fix for localStorage detection in Chrome Apps
    • Improved error reporting from replicator et al.

Opinions

Use Cases, Questions and Answers

Getting involved into CouchDB

If you want to get into working on CouchDB: here's a list of beginner tickets you can get started with. These are issues around our currently ongoing Fauxton-implementation. If you have any questions or need help, don't hesitate to contact us in the couchdb-dev IRC room (#couchdb-dev) – Garren (garren) and Sue (deathbear) are happy to help. We'd appreciate having you!

New Committers and PMC Members

… and also in the news

Posted on behalf of Lena Reinhard.

Calendar

Search

Hot Blogs (today's hits)

Tag Cloud

Categories

Feeds

Links

Navigation