Object cache - shared vs local

Object cache - shared vs local

John Huss
The old docs describe the Object cache(s) like so:

A cache shared between ObjectContexts has a fixed upper limit. 10000 is the
default maximum number of entries, which can be changed in the Modeler. A
cache attached to each ObjectContext (also referred to as "local cache"
elsewhere in this chapter), which only stores the objects that were
accessed via this context, has no upper limit.

https://cayenne.apache.org/docs/3.0/individual-object-caching.html


So there is a Shared cache and a Local cache.  The default behavior for
relationship faulting (lazy loading) is to place these objects into the
*Shared* cache. Same with Cayenne.objectForPK.
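For concreteness, both of those default paths look roughly like this sketch, assuming a hypothetical Artist/Painting model and a ServerRuntime named `runtime`:

```java
import java.util.List;

import org.apache.cayenne.Cayenne;
import org.apache.cayenne.ObjectContext;

// Sketch only: Artist and Painting are hypothetical generated model classes.
ObjectContext context = runtime.newContext();

// Lookup by primary key consults (and populates) the Shared object cache:
Artist artist = Cayenne.objectForPK(context, Artist.class, 5);

// So does firing a to-many relationship fault on first access:
List<Painting> paintings = artist.getPaintings();
```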

This can cause the shared cache to get large, and in some cases I've had
objects that I really wanted cached forever get pushed out of the
Shared cache by poorly written code that fires a ton of lazy relationships.

It can also cause stale data to be returned unless you carefully guard
against it with prefetches that refresh any needed relationships.

I'd prefer that objects from lazily loaded relationships be placed in the
*Local* cache to eliminate both of these problems.  I really only want to
ever use the Shared cache *explicitly*, never implicitly.  So this default
behavior seems backwards to me.

Is there a way to change this?  It seems like a DI property could switch
between these two modes.

Thanks,
John

Re: Object cache - shared vs local

Andrus Adamchik

> On Jun 21, 2017, at 4:10 PM, John Huss <[hidden email]> wrote:

> A cache shared between ObjectContexts has a fixed upper limit. 10000 is the
> default maximum number of entries, which can be changed in the Modeler. A
> cache attached to each ObjectContext (also referred to as "local cache"
> elsewhere in this chapter), which only stores the objects that were
> accessed via this context, has no upper limit.

This sounds about right, even in 4.0.

> So there is a Shared cache and a Local cache.  The default behavior for
> relationship faulting (lazy loading) is to place these objects into the
> *Shared* cache. Same with Cayenne.objectForPK.

More generally, every query, implicit or explicit, would result in selected objects placed in both shared cache and local cache of a given context.

> This can cause the shared cache to get large, and in some cases I've had
> objects that I really wanted to be cached forever to get pushed out of the
> Shared cache by poorly written code that fires a ton of lazy relationships.
>
> Also this can cause stale data to be returned when not carefully guarding
> against it with prefetches that refresh any needed relationships.
>
> I'd prefer that objects from lazily loaded relationships be placed in the
> *Local* cache to eliminate both of these problems.  I really only want to
> ever use the Shared cache *explicitly*, never implicitly.  So this default
> behavior seems backwards to me.
>
> Is there a way to change this?  

Not easily. It is unmanaged (you can't set per-entity caching policies, expiration times, etc.), and this is certainly a big limitation. Some people turn it off completely by unchecking "Use Shared Cache" in the Modeler, but that's another extreme.

Having said that, I never bothered tweaking the shared *object* cache, because I base all my refresh policies on the *query* cache instead. Query cache is of course fully configurable, and local vs shared can be specified explicitly. So my typical approach is to leave the object cache alone to do what it does behind the scenes, but manage the query cache, which updates objects in the shared cache as a side effect.
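A minimal sketch of that approach, assuming a hypothetical Artist entity, a hypothetical "artists" cache group, and the 4.0 ObjectSelect API:

```java
import java.util.List;

import org.apache.cayenne.query.ObjectSelect;

// The refresh policy lives on the query cache; the shared object cache is
// simply updated as a side effect whenever the query actually runs.
List<Artist> artists = ObjectSelect.query(Artist.class)
        .localCache("artists")   // per-context result cache, "artists" group
        .select(context);

// Invalidating the "artists" cache group later forces a refetch, which in
// turn refreshes those rows' snapshots in the shared object cache.
```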

Andrus

Re: Object cache - shared vs local

John Huss
I would be ok with disabling the shared cache except for a few entities. Is
there a way with listeners to intercept queries for specific entities and
return something manually and skip the query?

Using the query cache is great, except if you are firing relationships that
weren't prefetched -- in that case you can't avoid using the shared cache
and getting stale data.


Re: Object cache - shared vs local

Andrus Adamchik

> On Jun 27, 2017, at 10:14 AM, John Huss <[hidden email]> wrote:
>
> I would be ok with disabling the shared cache except for a few entities. Is
> there a way with listeners to intercept queries for specific entities and
> return something manually and skip the query?

DataChannelFilter.onQuery(..) theoretically can do that.
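A hedged sketch of what such a filter could look like. The DataChannelFilter contract itself is real Cayenne 4.0 API, but the canned-results map and the entity-name matching below are hypothetical illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.cayenne.DataChannel;
import org.apache.cayenne.DataChannelFilter;
import org.apache.cayenne.DataChannelFilterChain;
import org.apache.cayenne.ObjectContext;
import org.apache.cayenne.QueryResponse;
import org.apache.cayenne.graph.GraphDiff;
import org.apache.cayenne.map.EntityResolver;
import org.apache.cayenne.map.ObjEntity;
import org.apache.cayenne.query.Query;

public class LookupCacheFilter implements DataChannelFilter {

    // Hypothetical in-memory store of canned responses keyed by entity name.
    private final Map<String, QueryResponse> cannedResults = new ConcurrentHashMap<>();

    private EntityResolver resolver;

    @Override
    public void init(DataChannel channel) {
        this.resolver = channel.getEntityResolver();
    }

    @Override
    public QueryResponse onQuery(ObjectContext originatingContext, Query query,
                                 DataChannelFilterChain filterChain) {

        // Answer lookup-entity queries from memory, skipping the rest of
        // the chain (and therefore the database).
        QueryResponse canned = cannedResults.get(entityNameOf(query));
        return canned != null ? canned : filterChain.onQuery(originatingContext, query);
    }

    @Override
    public GraphDiff onSync(ObjectContext originatingContext, GraphDiff changes,
                            int syncType, DataChannelFilterChain filterChain) {
        // Pass writes through untouched.
        return filterChain.onSync(originatingContext, changes, syncType);
    }

    private String entityNameOf(Query query) {
        ObjEntity entity = query.getMetaData(resolver).getObjEntity();
        return entity != null ? entity.getName() : "";
    }
}
```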

> Using the query cache is great, except if you are firing relationships that
> weren't prefetched -- in that case you can't avoid using the shared cache
> and getting stale data.

True. You will have to guess upfront which relationships are needed if cache control is a concern.

Andrus


Re: Object cache - shared vs local

John Huss
Thanks Andrus, that was very helpful.

I've never seen much documentation on how caching works, particularly for
the Object caches.  So I'm going to write what I've learned here and maybe
it can help someone else. If there is anything amiss please correct me.


OBJECT CACHE

The "Shared Object Cache" is a DataRowStore (snapshot cache) that can be
consulted by EVERY ObjectContext to fulfill requests primarily for
relationship faults (lazy loading). If the shared Object cache is enabled
then any fired fault will end up in the cache and will be available to
every ObjectContext in the app that fires that fault in the future. The
row won't be refetched from the database unless you have explicitly
requested a refresh (or used an explicit prefetch). However, the size of the
cache is limited by the Domain configuration (in Cayenne Modeler) property
"size of object cache". So rows will be purged from the cache if it gets
full.  This requires caution since you can't count on any shared data
staying put.  Making the cache extremely large may avoid having your data
evicted, but will waste memory as every object you fetch without an
explicit Local cache strategy ends up in this cache.

The "Local Object Cache" is an ObjectStore that each ObjectContext has a
separate instance of. It is tied directly to an individual ObjectContext.
This allows you to hold on to data that you've prefetched into the context
(or explicitly fetched and lost reference to) but haven't accessed
otherwise yet. It disappears when the ObjectContext disappears. It is
actually backed by its own DataRowStore, which is inconsequential EXCEPT
for the fact that it is also affected by the "size of object
cache" property mentioned above. If this size is smaller than the number of
rows returned by a single query plus its prefetches, Cayenne will ignore your
prefetched data and fault in those relationships one at a time.

My recommendations for users are to:
1) Disable the Shared Object Cache. Otherwise you'll have to be vigilant to
avoid stale data in cases where you forgot a prefetch to refresh related
data. It's better to over-fetch than to return invalid (stale) data. That
makes it a performance problem instead of a correctness problem, which is a
better default behavior.

If you ARE going to use the Shared Object Cache, then you should use a
Local cache strategy on all your queries that you don't want to end up in
the Shared cache, and you should be very careful to prefetch every
relationship that you need to be fresh.
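In 4.0 ObjectSelect terms, that combination might look like this sketch (Artist and its PAINTINGS property are hypothetical generated model classes):

```java
import java.util.List;

import org.apache.cayenne.query.ObjectSelect;

// Results are cached in this ObjectContext only, and the to-many
// relationship is prefetched so no fault will hit the Shared cache later.
List<Artist> artists = ObjectSelect.query(Artist.class)
        .prefetch(Artist.PAINTINGS.joint())
        .localCache()
        .select(context);
```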

2) After you've disabled the Shared Object cache, set the size of the
Object cache (this is just for the *Local* Object Cache now) to MAX_INT
(2147483647). Otherwise you risk having Cayenne ignore data that you have
explicitly prefetched and having it fall back to horrendous
one-row-at-a-time fetches. A separate cache is created for each
ObjectContext and only lives as long as the context does, so you don't
really have to worry about the potentially large size of the cache as long
as your contexts are all short-lived.
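Raising the limit can likely be done via a DI property as well as in the Modeler. This sketch uses the Constants.SNAPSHOT_CACHE_SIZE_PROPERTY key from 4.0; verify the key against your Cayenne version:

```java
import org.apache.cayenne.configuration.Constants;
import org.apache.cayenne.configuration.server.ServerModule;
import org.apache.cayenne.configuration.server.ServerRuntime;

// Override the snapshot (object) cache size at runtime-building time so
// prefetched rows are never evicted mid-query.
ServerRuntime runtime = ServerRuntime.builder()
        .addConfig("cayenne-project.xml")
        .addModule(binder -> ServerModule.contributeProperties(binder)
                .put(Constants.SNAPSHOT_CACHE_SIZE_PROPERTY,
                     String.valueOf(Integer.MAX_VALUE)))
        .build();
```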

3) In the small number of cases where you actually WANT to have a shared
cache (like for read-only lookup tables) you can implement a
DataChannelFilter to act as a cache for specific entities. This will ensure
that reads of relationships to these lookup tables will always hit the
cache. This takes a bit of effort, but it works.
https://cayenne.apache.org/docs/4.0/cayenne-guide/lifecycle-events.html#comining-listeners-with-datachannelfilters


QUERY CACHE

The Query cache
<https://cayenne.apache.org/docs/4.0/cayenne-guide/performance-tuning.html#caching-and-fresh-data>
is completely separate from both the Shared Object Cache and the Local
Object Cache. However it also has Shared and Local versions, where Shared
query results are available to EVERY ObjectContext and Local query results
are only available to a single ObjectContext. If you are using the query
cache you should set it up explicitly with a custom cache provider like
EhCache. While there may be a learning curve for your cache provider,
Cayenne's behavior with the cache won't surprise you - the configuration is
up to you.

A "Local" query cache with no expiration (no cache group) is useful as a
way to treat an explicit query like a relationship fault since the query
will only be executed once during an ObjectContext's lifetime.

A "Shared" query cache is useful for data where you want to avoid having to
fetch every single time and where some amount of staleness is ok. The
cached query result can be configured to expire after a fixed time period or
when a triggering event occurs.
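Both flavors, sketched with a hypothetical Country lookup entity and a hypothetical "lookup" cache group:

```java
import java.util.List;

import org.apache.cayenne.query.ObjectSelect;

// Local: runs at most once per ObjectContext, much like a relationship fault.
List<Country> countries = ObjectSelect.query(Country.class)
        .localCache()
        .select(context);

// Shared: the result list is visible to every ObjectContext.
List<Country> shared = ObjectSelect.query(Country.class)
        .sharedCache("lookup")
        .select(context);

// When lookup data changes, expire the group explicitly:
runtime.getDataDomain().getQueryCache().removeGroup("lookup");
```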



Re: Object cache - shared vs local

Andrus Adamchik
My comment to these recommendations is that "it depends". So please consider all the pros and cons before disabling shared object cache.

If you read everything via a query with local cache and proper prefetches, then shared object cache is arguably as fresh as your query data.

Also, in addition to relationship resolution, it serves for resolving all kinds of faults, the most common being ObjectContext.localObject(..). So in combination with local query cache it is quite useful for seeding short-lived throwaway ObjectContexts. And I am using those a *lot*.
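A sketch of that seeding pattern (hypothetical Artist entity, and an `artistFromAnotherContext` object fetched elsewhere):

```java
import org.apache.cayenne.ObjectContext;

// A short-lived, throwaway context seeded from an object fetched elsewhere.
// localObject(..) resolves through the shared object cache, so no database
// trip is needed when a snapshot is already present.
ObjectContext throwaway = runtime.newContext();
Artist local = throwaway.localObject(artistFromAnotherContext);
```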

Andrus

