Write-behind Doc updates

Type: Documentation
Status: Closed
Priority: 1 Critical
Resolution: Fixed
Component/s:
Labels:

Assignee:
Reporter: alexsnaps
Created: January 12, 2011
Votes: 0
Watchers: 0
Updated: July 27, 2012
Resolved: February 17, 2011

Description

We should clarify in the docs how write-behind shutdown happens depending on whether the cache is clustered or not… We also need to add documentation about the new concurrency and bounded-queue settings.

Comments

Gautam Jayaprakash 2011-01-12

Checked the “Documentation Required” field.

Fiona OShea 2011-01-17

Please check in with Igal to make sure he has the information needed.

Alexander Snaps 2011-01-31

I mailed you the details below. Assigning to you if you don’t mind; assign it back or mail me if you are missing information. We can have a quick chat at some point as well if that is easier.

Queue-size monitoring

Users can now monitor the number of elements on the write-behind queue. The size is available through the net.sf.ehcache.statistics.LiveCacheStatistics#getWriterQueueLength method on all statistics (which returns -1 if no write-behind is registered). The method exposes the size of the local queue, summed across all buckets (see “Concurrency” below). A batch (or an Element) that is currently being processed is not reflected in the exposed figure. The figure also does not reflect any coalescing (if enabled), because coalescing is only done at the level of the batch being processed.
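To make the counting rules above concrete, here is a minimal sketch in plain Java. It is a hypothetical model, not Ehcache code: the class QueueLengthModel and its enqueue and takeBatch methods are invented for illustration, and only the name getWriterQueueLength mirrors the real statistics method. It assumes the reported figure is the sum over all local buckets.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of the reported write-behind queue length; not Ehcache code. */
final class QueueLengthModel {
    // One pending-operation queue per bucket; zero buckets models "no write-behind registered".
    private final List<Deque<String>> buckets = new ArrayList<>();

    QueueLengthModel(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayDeque<>());
        }
    }

    /** Adds one pending operation to the given bucket (no coalescing happens here). */
    void enqueue(int bucket, String operation) {
        buckets.get(bucket).addLast(operation);
    }

    /** Removes up to batchSize operations for processing; they stop being counted at once. */
    List<String> takeBatch(int bucket, int batchSize) {
        List<String> batch = new ArrayList<>();
        Deque<String> queue = buckets.get(bucket);
        while (batch.size() < batchSize && !queue.isEmpty()) {
            batch.add(queue.pollFirst());
        }
        return batch;
    }

    /** Sum of all local bucket queues, or -1 when no write-behind is registered. */
    long getWriterQueueLength() {
        if (buckets.isEmpty()) {
            return -1;
        }
        long total = 0;
        for (Deque<String> queue : buckets) {
            total += queue.size();
        }
        return total;
    }
}
```

In this model two queued writes for the same key both count toward the length, which matches the statement that coalescing is not reflected in the figure.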

Max queue size

A writer’s queue size can now be bounded. Using the new net.sf.ehcache.config.CacheWriterConfiguration#setWriteBehindMaxQueueSize method or the writeBehindMaxQueueSize attribute on the cacheWriter element, a user can limit the size of the write-behind queue. This limit applies per bucket (see “Concurrency” below), as do the other settings for a writer. The default value is 0, which means unbounded (same behavior as before). The limit is measured against the same notion of queue size as in “Queue-size monitoring” above. When a new element is added to the queue (via the putWithWriter or removeWithWriter methods on Cache), the size is checked. When the limit is reached, the corresponding Cache operation blocks until the queue size decreases by one…
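The blocking behavior described above can be modeled with a standard java.util.concurrent queue. This is only a sketch under assumptions: BoundedBucketModel and its methods are hypothetical names, and nothing here claims to be how Ehcache implements the bound internally. It shows one bucket, where 0 means unbounded and a full queue makes the producer wait until the writer removes an entry.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical model of one bounded write-behind bucket; not Ehcache code. */
final class BoundedBucketModel {
    private final BlockingQueue<String> queue;

    BoundedBucketModel(int maxQueueSize) {
        // 0 (or less) is treated as "unbounded", mirroring the documented default of 0.
        this.queue = maxQueueSize > 0
                ? new LinkedBlockingQueue<String>(maxQueueSize)
                : new LinkedBlockingQueue<String>();
    }

    /** Models the cache operation: waits until the bucket has room for the operation. */
    void enqueue(String operation) {
        try {
            queue.put(operation);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for queue space", e);
        }
    }

    /** Non-blocking probe: returns false exactly when enqueue would have had to wait. */
    boolean tryEnqueue(String operation) {
        return queue.offer(operation);
    }

    /** Models the writer thread removing one operation; returns null when the queue is empty. */
    String writeOne() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }
}
```

With a limit of 2, a third tryEnqueue fails until writeOne has removed an entry, which is the point at which a blocked cache operation would resume.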

Concurrency

Using net.sf.ehcache.config.CacheWriterConfiguration#setWriteBehindConcurrency or the writeBehindConcurrency attribute on the cacheWriter element, users can configure the number of thread/bucket pairs that write-behind will use to write. The default value is 1 (same behavior as before). When a cache operation occurs, the writer dispatches the element to a bucket based on the key’s hash code, so a given key always goes to the same bucket/thread pair. Each bucket has its own writer thread, and all of them use the same settings: minWriteDelay, maxWriteDelay, rateLimitPerSecond, writeCoalescing, writeBatching, writeBatchSize, retryAttempts, retryAttemptDelaySeconds… It is important to understand that any configuration involving size or rate applies to each bucket/thread pair individually. A rate limit of 100 operations per second with a concurrency setting of 4 therefore means that each of the 4 threads writes at a maximum rate of 100 operations per second (400 operations per second in total for the node). These settings were already per node, so in a clustered setup with 10 nodes you effectively hit the store at a maximum rate of 4,000 operations per second.
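The dispatch rule and the rate arithmetic above can be sketched as follows. The class WriteBehindMath and its method names are hypothetical, and the hash-modulo mapping is an assumption made for illustration; the text only states that the bucket is derived from the key’s hash code, not the exact formula.

```java
/** Hypothetical helper illustrating bucket dispatch and effective rate limits; not Ehcache code. */
final class WriteBehindMath {

    /** Assumed mapping: hash code modulo bucket count, so equal keys share a bucket. */
    static int bucketFor(Object key, int concurrency) {
        return Math.floorMod(key.hashCode(), concurrency);
    }

    /** rateLimitPerSecond applies per bucket/thread pair, so a node can reach limit x concurrency. */
    static long maxOpsPerSecondPerNode(int rateLimitPerSecond, int concurrency) {
        return (long) rateLimitPerSecond * concurrency;
    }

    /** Settings are per node, so the store can see the per-node maximum times the node count. */
    static long maxOpsPerSecondCluster(int rateLimitPerSecond, int concurrency, int nodes) {
        return maxOpsPerSecondPerNode(rateLimitPerSecond, concurrency) * nodes;
    }

    public static void main(String[] args) {
        System.out.println(maxOpsPerSecondPerNode(100, 4));      // prints 400
        System.out.println(maxOpsPerSecondCluster(100, 4, 10));  // prints 4000
    }
}
```

When sizing the backing store, the cluster-wide figure is the one to plan for, since each node applies the limit independently.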

No more stealing (Clustered only)

Stealing from one queue to another (whether local or across nodes) is now disabled. While not configurable from ehcache, tim-async had the feature turned on by default. Because of the bucket/thread-to-key affinity, stealing doesn’t really make sense (and was not what the customer wanted). In our testing, it also turned out that stealing was doing more harm than good performance-wise…

ilevy 2011-02-11

In write_through_caching.apt there’s a section that says:

“ *** Cluster-wide queue processing

In a cluster each node will have a <<<CacheWriter>>> configured. These will process their local work, but if there is no local
work, they will poll the other nodes to see if there is outstanding work and process part of that work.

This means that workload is balanced across the cluster, and that the write-behind queue will be serviced as long as there is one
Ehcache L1 instance in the cluster. ”

I assume this is referring to “work stealing”, so I’ve removed it (per “No more stealing” above). If this is referring to something else, let me know.

ilevy 2011-02-11

Found this in the doc: “rateLimitPerSecond The maximum number of store operations to allow per second. If writeBatching is enabled,” Any idea what was supposed to come after the comma?

ilevy 2011-02-15

Please answer the questions in the comments and reassign to me.

Alexander Snaps 2011-02-16

On your first comment: yes, work stealing is now disabled, so this paragraph can go. If we ever want to support this, we need to make it configurable (as it is in the toolkit). But under load this makes your app choke.

On the batching: I guess what it means is that, given a certain batchSize, the writer can “skip” a write loop to keep tps down. If you have a rateLimit of 10 and the batch to be written is of 11, then the writer will skip this batch and retry later until tps is below the boundary.
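One possible reading of the batching explanation in the comment above, sketched in plain Java. Everything here is an assumption for illustration: RateLimitModel and shouldSkip are hypothetical names, and the rule "allowance = rate limit x whole seconds since the last write" is a guess at the mechanism, not something confirmed against the Ehcache source. It also assumes that a rate limit of 0 means no limit.

```java
/** Hypothetical model of rate-limited batch writing; not Ehcache code. */
final class RateLimitModel {

    /**
     * Returns true when writing the batch now would exceed the allowance that has
     * accrued since the last write, in which case the writer skips this loop.
     */
    static boolean shouldSkip(int batchSize, int rateLimitPerSecond, long secondsSinceLastWrite) {
        if (rateLimitPerSecond <= 0) {
            return false; // assumption: 0 disables rate limiting
        }
        long allowance = (long) rateLimitPerSecond * secondsSinceLastWrite;
        return batchSize > allowance;
    }

    public static void main(String[] args) {
        // The example from the comment: rateLimit of 10, batch of 11.
        System.out.println(shouldSkip(11, 10, 1)); // prints true: 11 > 10, skip and retry later
        System.out.println(shouldSkip(11, 10, 2)); // prints false: 11 <= 20, the batch can go
    }
}
```

Under this reading, a batch size larger than the rate limit does not stall the writer forever; it only delays the batch until enough time has passed for the average rate to fall under the limit.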

ilevy 2011-02-17

I think we need to better clarify the relationship between writeBatchSize and rateLimitPerSecond, as it seems to be a somewhat subtle relationship with perhaps an important influence on performance.