Cayenne object storage / memory usage


John Huss
I did some experimenting recently to see if changes to the way data is
stored in Cayenne objects could reduce the amount of memory they consume.

I chose to use separate fields for each property instead of a HashMap
(which is what CayenneDataObject uses).  The results were very affirming.
For my test of loading 10,000 objects from every table in my database I got
it to use about *half the memory* of the default class (from 921 MB
down to 431 MB).

I know there has been some discussion already about addressing this topic
for the next major release, so I thought I'd throw in some observations /
questions here.

For my implementation I subclassed CayenneDataObject because in previous
experience I found implementing a replacement to be much more difficult and
subject to more bugs due to the less frequently used code path that
PersistentObject and its descriptors take you down.  My apps rely on
things that are sort of specific to CayenneDataObject like Validating.

So one question is how we should be addressing the need that people may
have to create their own data classes. Right now I believe the recommended
path is to subclass PersistentObject, but I'm not convinced that that is a
viable solution without wholesale copying most of CayenneDataObject into
your subclass.  I'd rather see a fuller base class (in addition to keeping
PersistentObject around) that includes all of CayenneDataObject except the
property storage (HashMap).

For my implementation I had to modify CayenneDataObject, but only slightly
to avoid creating the HashMap which I wasn't using. However, because the class
isn't really intended for customization this map is referenced in multiple
methods that can't easily be overridden to change the way things are stored.

Another approach might be to ask why anyone should need to customize the
way data is stored in the objects if we can just use the best solution
possible in the first place?  I can't imagine a more efficient
representation than fields.  However, fields present difficulties for the
use case where you aren't generating unique classes for your model but just
rely on the generic class.  In theory this could be addressed via runtime
code generation or something else, but that would be quite a change.

So I'm looking forward to discussing this and toward the future.

John

Re: Cayenne object storage / memory usage

Andrus Adamchik
Ah, an exciting topic!

> On Jun 9, 2017, at 5:55 PM, John Huss <[hidden email]> wrote:
> For my test of loading 10,000 objects from every table in my database I got
> it to use about *half the memory* of the default class (from 921 MB
> down to 431 MB).

Yeah, that's certainly expected.

> So one question is how we should be addressing the need that people may
> have to create their own data classes. Right now I believe the recommended
> path is to subclass PersistentObject,

Formally - yes. But we don't have a "recommended" way from the practical standpoint. Simply because this is too complicated for the majority of users. The most prominent example of an alt Persistent implementation is ROP, but it required a bunch of other things to work beyond the object itself and ClassDescriptor. So I commend you for trying! :)

> Another approach might be to ask why anyone should need to customize the
> way data is stored in the objects if we can just use the best solution
> possible in the first place?

+1. While making it pluggable is nice (even for our own experimentation), the goal is to settle on the most efficient design and make it the default.

>  I can't imagine a more efficient representation than fields.

True, but fields also require reflection to be accessed "from below" (by Cayenne). Other ideas include:

* use Object[] for data storage (with array positions alphabetically mapped to attribute names).
* use fields, but also use class generation (that you also mentioned) to create some kind of "companion" objects (adapters) for each entity, that can operate on the fields of a given Persistent object, but present a generic API to the rest of the framework.
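
For illustration, the first idea (Object[] storage with positions assigned by the alphabetical order of attribute names) might look roughly like the sketch below. `ArrayBackedObject` and its method names are hypothetical, not Cayenne API, and in a real design the sorted name array would presumably be shared per entity rather than built for each instance.

```java
import java.util.Arrays;

// Hypothetical sketch, not Cayenne API: property values live in an Object[]
// whose positions follow the alphabetical order of the attribute names.
class ArrayBackedObject {
    private final String[] names;  // sorted attribute names
    private final Object[] values; // values[i] belongs to names[i]

    ArrayBackedObject(String... attributeNames) {
        this.names = attributeNames.clone();
        Arrays.sort(this.names);
        this.values = new Object[this.names.length];
    }

    Object readProperty(String name) {
        int i = Arrays.binarySearch(names, name);
        return i >= 0 ? values[i] : null; // unknown property reads as null
    }

    void writeProperty(String name, Object value) {
        int i = Arrays.binarySearch(names, name);
        if (i < 0) {
            throw new IllegalArgumentException("Unknown property: " + name);
        }
        values[i] = value;
    }

    public static void main(String[] args) {
        ArrayBackedObject artist = new ArrayBackedObject("dateOfBirth", "artistName");
        artist.writeProperty("artistName", "Dali");
        System.out.println(artist.readProperty("artistName")); // prints Dali
    }
}
```

Because the whole state sits in one array, it could in principle be swapped for another array in a single assignment, which a set of separate fields cannot offer.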

> However, fields present difficulties for the
> use case where you aren't generating unique classes for your model but just
> rely on the generic class.  

I'd keep Map-based CayenneDataObject for the 5% of apps that use generic classes. (Unless we can make Object[] based structures to work. Those are also generic).

Andrus

Re: Cayenne object storage / memory usage

John Huss
Thanks for commenting! See below.

On Fri, Jun 9, 2017 at 12:48 PM Andrus Adamchik <[hidden email]>
wrote:

> True, but fields also require reflection to be accessed "from below" (by
> Cayenne).


It doesn't have to use reflection. With Java 7 and newer you can use
strings in switch statements with a single jump, which is what I'm doing.
The class template generates the readPropertyDirectly method with the
switch statement in it.  This is both fast and simple.  Also, using fields
makes it much easier to inspect your objects in the debugger, which is
handy.
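
A minimal sketch of the switch-based accessors described here, assuming a hypothetical `FieldBackedArtist` entity with two properties. The method names mirror readPropertyDirectly/writePropertyDirectly, but the class extends no Cayenne type and is not the actual template output. (For reference, javac compiles a switch on a String into a switch on hashCode() followed by an equals() check.)

```java
import java.util.Date;

// Hypothetical stand-in for a generated entity class; not real template output.
class FieldBackedArtist {
    // One plain field per mapped property instead of a shared HashMap.
    protected String artistName;
    protected Date dateOfBirth;

    public String getArtistName() { return artistName; }
    public void setArtistName(String artistName) { this.artistName = artistName; }

    // Generic read access by property name: a string switch, no reflection.
    public Object readPropertyDirectly(String propName) {
        if (propName == null) {
            throw new IllegalArgumentException("propName is null");
        }
        switch (propName) {
            case "artistName":
                return artistName;
            case "dateOfBirth":
                return dateOfBirth;
            default:
                return null; // a real subclass would probably delegate to super
        }
    }

    // Generic write access by property name.
    public void writePropertyDirectly(String propName, Object value) {
        if (propName == null) {
            throw new IllegalArgumentException("propName is null");
        }
        switch (propName) {
            case "artistName":
                artistName = (String) value;
                break;
            case "dateOfBirth":
                dateOfBirth = (Date) value;
                break;
            default:
                throw new IllegalArgumentException("Unknown property: " + propName);
        }
    }

    public static void main(String[] args) {
        FieldBackedArtist artist = new FieldBackedArtist();
        artist.writePropertyDirectly("artistName", "Dali");
        System.out.println(artist.readPropertyDirectly("artistName")); // prints Dali
    }
}
```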



Re: Cayenne object storage / memory usage

Andrus Adamchik

> On Jun 9, 2017, at 9:29 PM, John Huss <[hidden email]> wrote:
>
>>> I can't imagine a more efficient representation than fields.
>>
>> True, but fields also require reflection to be accessed "from below" (by
>> Cayenne).
>
>
> It doesn't have to use reflection. With Java 7 and newer you can use
> strings in switch statements with a single jump, which is what I'm doing.
> The class template generates the readPropertyDirectly method with the
> switch statement in it.  This is both fast and simple.  Also, using fields
> makes it much easier to inspect your objects in the debugger, which is
> handy.

Yeah, using fields would be great. It is certainly much more developer-friendly. I'd like to run some benchmarks on "switch", but this may be a quick way for us to improve the framework without rewriting the stack.

The only potential advantage of an Object[] approach would be the ability to replace object state atomically. Maybe someday we'll be able to take advantage of that. But at the moment what you are suggesting looks very promising and also "easy".

Andrus


Re: Cayenne object storage / memory usage

Michael Gentry
In reply to this post by John Huss
Hi John,

I'm a little surprised that map-based storage is over 2x worse in memory
consumption.  I'm wondering if there is more going on here than storage of
the property values.  Would it be simple enough to adapt your test case to
compare a list of POJOs vs a list of maps and see what the memory footprint
and difference is that way?

I personally was thinking the big improvement for using fields directly is
the speed improvement.  I didn't think the memory consumption difference
would be that dramatic.

Thanks,

mrg



Re: Cayenne object storage / memory usage

Robert Zeigler-6
I’m also a little surprised at the 1/2-ing… what were the values being stored? I suppose in theory, many values are relatively “small”, memory-wise, so having the overhead of also storing the key could ~double the memory use, but if you’re storing large values, I wouldn’t expect the utilization to drop as dramatically. What were your data values (type and length distribution for strings)?

Thanks!

Robert



Re: Cayenne object storage / memory usage

John Huss
I was surprised by the difference in memory too, but this is a small diff
(apart from the newly generated readPropertyDirectly/writePropertyDirectly
methods) so there isn't anything else going on.  My unverified assumption
about HashMap is that it doubles in size each time it resizes, so entities
with more fields could cause more waste. For example, an entity with 65
fields would have 63 empty array slots (ignoring fill factor).  So the
exact savings may vary.
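
To make the arithmetic concrete, here is a small sketch that models how the table of a java.util.HashMap grows. It assumes the default constructor (initial capacity 16, load factor 0.75) and ignores the rare resize triggered by heavy bucket collisions. Whether CayenneDataObject creates its map with these defaults is an assumption, and `HashMapSlots` is a hypothetical helper.

```java
// Hypothetical helper that models HashMap table growth; it does not inspect a real map.
class HashMapSlots {
    // Table length after inserting `entries` distinct keys one by one into
    // a HashMap created with the default constructor (16 slots, load factor 0.75).
    static int tableLength(int entries) {
        if (entries <= 0) {
            return 0; // the table is allocated lazily on the first put
        }
        int capacity = 16;
        // The map doubles its table once the size exceeds capacity * loadFactor.
        while (entries > capacity * 0.75) {
            capacity *= 2;
        }
        return capacity;
    }

    // Lower bound on unused slots (collisions leave even more slots empty).
    static int unusedSlots(int entries) {
        return tableLength(entries) - entries;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 13, 65}) {
            System.out.println(n + " properties -> table of " + tableLength(n)
                    + ", at least " + unusedSlots(n) + " unused slots");
        }
    }
}
```

For 65 properties this model gives a 128-slot table and so at least 63 unused slots, which matches the estimate above. Each entry also needs its own node object holding the hash, key, value and next reference, which plain fields avoid.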


Re: Cayenne object storage / memory usage

Nikita Timofeev
Hi all,

I've run some additional benchmarks for field-based classes inspired
by John, and they were so promising that I've moved on
to the implementation.

So here is a pull request for you to review [1].
Here [2] you can see what the new generated classes will look like.

For me there are no visible downsides to this solution: both
memory usage and speed are improved.
All tests are clean, and the only minor incompatibility
is that the HOLLOW state no longer resets an object's values [3]
(though this can be implemented as well, I'm just
not sure it is really needed).

P.S. Here are some raw numbers from my benchmarks.
I'm giving absolute numbers, but really only their ratio is important.
Results for the old version are on the left, for the new version on the right.

Memory usage:
==============
1. 10,000 small objects
(int, Date and String ~ 20 chars)
>>> 6 MB vs 2.5 MB <<<

2. 10,000 objects with big values
(int, Date and String ~ 1K chars)
For the same class (same number of fields) the difference
is just a constant, so this is only to give an idea of
what to expect in different cases.
>>> 24.5 MB vs 21 MB <<<

Performance:
==============
(numbers are in millions of ops per second, measured with a JMH benchmark)
1. Getter:
>>> 107 vs 177 <<<

2. Setter:
Not so impressive, as the Cayenne stack takes most of the
time here to process the graph diff, but the new methods are still better.
>>> 12.5 vs 14.5 <<<

3. readPropertyDirectly:
>>> 152 vs 248 <<<

4. writePropertyDirectly:
This is a map.put() vs switch(String) battle,
and the map is definitely losing it :)
>>> 126 vs 582 <<<

[1] https://github.com/apache/cayenne/pull/235
[2] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
[3] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144




--
Best regards,
Nikita Timofeev

Re: Cayenne object storage / memory usage

Michael Gentry
Hi Nikita,

I saw the pull request and was taking a glance at it, so thanks for
following up with an e-mail.

The memory improvement looks quite nice, but I'm wondering if you
inadvertently switched old vs new in the performance section?  (Since the
new, on the right, is always slower.)

Thanks,

mrg



Re: Cayenne object storage / memory usage

Andrus Adamchik
>  I'm wondering if you
> inadvertently switched old vs new in the performance section?  (Since the
> new, on the right, is always slower.)

The benchmark numbers are in millions of operations per second, so a bigger value is better/faster (kind of like RPM in a car).

Andrus
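
To make the unit concrete, here is a small sketch that turns the throughput figures from Nikita's benchmark post into speedup ratios and approximate per-operation times. The class and method names are made up for this illustration; only the eight numbers come from the thread.

```java
public class Throughput {

    /** Speedup of the new version: ratio of throughputs (higher throughput means faster). */
    public static double speedup(double oldMops, double newMops) {
        return newMops / oldMops;
    }

    /** Average nanoseconds per operation for a throughput given in millions of ops per second. */
    public static double nanosPerOp(double mops) {
        // 1e9 ns per second divided by (mops * 1e6) ops per second
        return 1_000.0 / mops;
    }

    public static void main(String[] args) {
        String[] names = {"getter", "setter", "readPropertyDirectly", "writePropertyDirectly"};
        double[] oldMops = {107, 12.5, 152, 126};   // map-based, left column in the post
        double[] newMops = {177, 14.5, 248, 582};   // field-based, right column in the post
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-22s %.2fx faster, %.1f ns/op -> %.1f ns/op%n",
                    names[i],
                    speedup(oldMops[i], newMops[i]),
                    nanosPerOp(oldMops[i]),
                    nanosPerOp(newMops[i]));
        }
    }
}
```

Read this way, the right-hand column is the faster one in every row, with the largest gain on writePropertyDirectly.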

> On Jul 5, 2017, at 7:31 PM, Michael Gentry <[hidden email]> wrote:
>
> Hi Nikita,
>
> I saw the pull request and was taking a glance at it, so thanks for
> following up with an e-mail.
>
> The memory improvement looks quite nice, but I'm wondering if you
> inadvertently switched old vs new in the performance section?  (Since the
> new, on the right, is always slower.)
>
> Thanks,
>
> mrg
>
>
> On Wed, Jul 5, 2017 at 10:19 AM, Nikita Timofeev <[hidden email]>
> wrote:
>
>> Hi all,
>>
>> I've run some additional benchmarks for field-based classes, inspired
>> by John, and they were so promising that I've moved on
>> to the implementation.
>>
>> So here is a pull request for you to review [1].
>> Here [2] you can see what the new generated classes will look like.
>>
>> For me there are no visible downsides to this solution: both
>> memory usage and speed are improved.
>> All tests are clean, and the only minor incompatibility
>> is that the HOLLOW state no longer resets an object's values [3]
>> (though this can be implemented as well; I'm just
>> not sure it is really needed).
>>
>> P.S. Here are some raw numbers from my benchmarks.
>> I'm giving absolute numbers, but really only their ratio is important.
>> Results for the old version are on the left, for the new version on the right.
>>
>> Memory usage:
>> ==============
>> 1. 10,000 small objects
>> (int, Date and String ~ 20 chars)
>> >>> 6 MB vs 2.5 MB <<<
>>
>> 2. 10,000 objects with big values
>> (int, Date and String ~ 1K chars)
>> For objects of the same class (same number of fields)
>> the difference is just a constant,
>> so this is only to give an idea of what to expect in different cases.
>> >>> 24.5 MB vs 21 MB <<<
>>
>> Performance:
>> ==============
>> (numbers are in millions of ops per second, measured with a JMH benchmark)
>> 1. Getter:
>> >>> 107 vs 177 <<<
>>
>> 2. Setter:
>> Not so impressive, as the Cayenne stack takes most of the
>> time here to process the graph diff, but the new methods are still better.
>> >>> 12.5 vs 14.5 <<<
>>
>> 3. readPropertyDirectly:
>> >>> 152 vs 248 <<<
>>
>> 4. writePropertyDirectly:
>> This is a map.put() vs switch(String) battle,
>> and the map is definitely losing it :)
>> >>> 126 vs 582 <<<
>>
>> [1] https://github.com/apache/cayenne/pull/235
>> [2] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
>> [3] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144
>>
>> On Wed, Jun 21, 2017 at 10:20 PM, John Huss <[hidden email]> wrote:
>>> I was surprised by the difference in memory too, but this is a small diff
>>> (apart from the newly generated readPropertyDirectly/writePropertyDirectly
>>> methods), so there isn't anything else going on.  My unverified assumption
>>> about HashMap is that it doubles in size each time it resizes, so entities
>>> with more fields could cause more waste. For example, an entity with 65
>>> fields would have 63 empty array slots (ignoring fill factor).  So the
>>> exact savings may vary.
>>>
>>> On Sat, Jun 17, 2017 at 1:01 AM Robert Zeigler <[hidden email]>
>>> wrote:
>>>
>>>> I’m also a little surprised at the 1/2-ing… what were the values being
>>>> stored? I suppose in theory, many values are relatively “small”,
>>>> memory-wise, so having the overhead of also storing the key could ~double
>>>> the memory use, but if you’re storing large values, I wouldn’t expect the
>>>> utilization to drop as dramatically. What were your data values (type and
>>>> length distribution for strings)?
>>>>
>>>> Thanks!
>>>>
>>>> Robert
>>>>
>>>>> On Jun 10, 2017, at 6:49 AM, Michael Gentry <[hidden email]> wrote:
>>>>>
>>>>> Hi John,
>>>>>
>>>>> I'm a little surprised that map-based storage is over 2x worse in memory
>>>>> consumption.  I'm wondering if there is more going on here than storage of
>>>>> the property values.  Would it be simple enough to adapt your test case to
>>>>> compare a list of POJOs vs a list of maps and see what the memory footprint
>>>>> and difference is that way?
>>>>>
>>>>> I personally was thinking the big improvement for using fields directly is
>>>>> the speed improvement.  I didn't think the memory consumption difference
>>>>> would be that dramatic.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> mrg
>>>>>
>>>>>
>>>>> On Fri, Jun 9, 2017 at 10:55 AM, John Huss <[hidden email]> wrote:
>>>>>
>>>>>> I did some experimenting recently to see if changes to the way data is
>>>>>> stored in Cayenne objects could reduce the amount of memory they consume.
>>>>>>
>>>>>> I chose to use separate fields for each property instead of a HashMap
>>>>>> (which is what CayenneDataObject uses).  The results were very affirming.
>>>>>> For my test of loading 10,000 objects from every table in my database I got
>>>>>> it to use about *half the memory* of the default class (from 921 MB
>>>>>> down to 431 MB).
>>>>>>
>>>>>> I know there has been some discussion already about addressing this topic
>>>>>> for the next major release, so I thought I'd throw in some observations /
>>>>>> questions here.
>>>>>>
>>>>>> For my implementation I subclassed CayenneDataObject because in previous
>>>>>> experience I found implementing a replacement to be much more difficult and
>>>>>> subject to more bugs due to the less frequently used code path that
>>>>>> PersistentObject and its descriptors take you down.  My apps rely on
>>>>>> things that are sort of specific to CayenneDataObject, like Validating.
>>>>>>
>>>>>> So one question is how we should be addressing the need that people may
>>>>>> have to create their own data classes. Right now I believe the recommended
>>>>>> path is to subclass PersistentObject, but I'm not convinced that that is a
>>>>>> viable solution without wholesale copying most of CayenneDataObject into
>>>>>> your subclass.  I'd rather see a fuller base class (in addition to keeping
>>>>>> PersistentObject around) that includes all of CayenneDataObject except the
>>>>>> property storage (HashMap).
>>>>>>
>>>>>> For my implementation I had to modify CayenneDataObject, but only slightly,
>>>>>> to avoid creating the HashMap which I wasn't using. However, because the class
>>>>>> isn't really intended for customization, this map is referenced in multiple
>>>>>> methods that can't easily be overridden to change the way things are
>>>>>> stored.
>>>>>>
>>>>>> Another approach might be to ask why anyone should need to customize the
>>>>>> way data is stored in the objects if we can just use the best solution
>>>>>> possible in the first place?  I can't imagine a more efficient
>>>>>> representation than fields.  However, fields present difficulties for the
>>>>>> use case where you aren't generating unique classes for your model but just
>>>>>> rely on the generic class.  In theory this could be addressed via runtime
>>>>>> code generation or something else, but that would be quite a change.
>>>>>>
>>>>>> So I'm looking forward to discussing this and toward the future.
>>>>>>
>>>>>> John
>>>>>>
>>>>
>>>>
>>
>>
>>
>> --
>> Best regards,
>> Nikita Timofeev
>>
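
John's quoted estimate (an entity with 65 fields leaving 63 empty array slots) can be checked with a rough simulation of how a default java.util.HashMap grows. This is only a sketch: it assumes the documented defaults (initial capacity 16, load factor 0.75, table doubling on resize) and entries added one at a time, and the class and method names are invented for the example. It counts table slots only and ignores the per-entry node objects, which add further overhead.

```java
public class HashMapSlots {

    /**
     * Table length of a default HashMap after inserting {@code entries}
     * distinct keys one by one (assumed: capacity 16, load factor 0.75,
     * doubling whenever the size exceeds capacity * loadFactor).
     */
    public static int tableLength(int entries) {
        int capacity = 16;
        final double loadFactor = 0.75;
        while (entries > capacity * loadFactor) {
            capacity *= 2;
        }
        return capacity;
    }

    /** Lower bound on unused slots; hash collisions can only leave more slots empty. */
    public static int minEmptySlots(int entries) {
        return tableLength(entries) - entries;
    }

    public static void main(String[] args) {
        for (int fields : new int[] {3, 12, 13, 48, 49, 65}) {
            System.out.printf("%d fields -> table of %d, at least %d empty slots%n",
                    fields, tableLength(fields), minEmptySlots(fields));
        }
    }
}
```

Under these assumptions 65 entries end up in a 128-slot table, which matches the 63 empty slots in John's example, and even a three-property entity carries a 16-slot table.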


Re: Cayenne object storage / memory usage

Michael Gentry
That makes much more sense!  That'll teach me to sleep-read.  Well,
probably not.  :-)

These are pretty nice improvements overall.  When is 4.1 coming out?  :-)

Thanks,

mrg


On Wed, Jul 5, 2017 at 1:21 PM, Andrus Adamchik <[hidden email]>
wrote:

> >  I'm wondering if you
> > inadvertently switched old vs new in the performance section?  (Since the
> > new, on the right, is always slower.)
>
> The benchmark numbers are in millions of operations per second, so a bigger
> value is better/faster (kind of like RPM in a car).
>
> Andrus

Re: Cayenne object storage / memory usage

Andrus Adamchik
In reply to this post by Nikita Timofeev
The fact that we can switch to field-based DataObjects with minimal effort and without sacrificing a single thing in the Cayenne design is a *very* big deal! Thanks John for bringing the possibility to everyone's attention, and Nikita - for the working code and benchmarks.

I am going to try this out on a real app some time next week. Very exciting! :)

Andrus
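
For readers who have not opened the pull request, the difference between the two storage styles can be sketched roughly as follows. This is not the generated Cayenne code: the class names and the two properties are hypothetical, and only the method names readPropertyDirectly / writePropertyDirectly and the map.put() vs switch(String) contrast are taken from the thread.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class StorageSketch {

    /** Map-backed storage: each object carries its own HashMap of property values. */
    public static class MapBackedArtist {
        private final Map<String, Object> values = new HashMap<>();

        public Object readPropertyDirectly(String name) {
            return values.get(name);
        }

        public void writePropertyDirectly(String name, Object value) {
            values.put(name, value);
        }
    }

    /** Field-backed storage: one Java field per property, selected by name with switch(String). */
    public static class FieldBackedArtist {
        private String artistName;   // hypothetical property
        private Date dateOfBirth;    // hypothetical property

        public Object readPropertyDirectly(String name) {
            switch (name) {
                case "artistName":  return artistName;
                case "dateOfBirth": return dateOfBirth;
                default:            return null; // unknown property in this sketch
            }
        }

        public void writePropertyDirectly(String name, Object value) {
            switch (name) {
                case "artistName":  artistName = (String) value; break;
                case "dateOfBirth": dateOfBirth = (Date) value;  break;
                default: throw new IllegalArgumentException("Unknown property: " + name);
            }
        }

        /** A typed getter reads the field directly, with no map lookup. */
        public String getArtistName() {
            return artistName;
        }
    }

    public static void main(String[] args) {
        FieldBackedArtist artist = new FieldBackedArtist();
        artist.writePropertyDirectly("artistName", "Dali");
        System.out.println(artist.readPropertyDirectly("artistName"));
    }
}
```

The field-backed variant needs no per-object map or entry objects, which is where the memory saving discussed in the thread would come from; the price is that the accessors have to be generated for each class.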




Re: Cayenne object storage / memory usage

Robert Zeigler
Kudos on the improvements, and to the original developers (Andrus, et al) for a fantastic design. These days, I’ve been doing a lot more Python coding than Java and I use SQLAlchemy pretty extensively. It’s nice… but I still miss Cayenne’s simplicity/ease of use (SQLAlchemy uses a transaction model more akin to Hibernate, though not as egregious).

Best,

Robert

> On Jul 6, 2017, at 7:27 AM, Andrus Adamchik <[hidden email]> wrote:
>
> The fact that we can switch to field-based DataObjects with minimal effort and without sacrificing a single thing in the Cayenne design is a *very* big deal! Thanks John for bringing the possibility to everyone's attention, and Nikita - for the working code and benchmarks.
>
> I am going to try this out on a real app some time next week. Very exciting! :)
>
> Andrus
>


Re: Cayenne object storage / memory usage

John Huss
I'm very glad to see this moving forward! Very exciting! Thanks for your
work on this.
On Thu, Jul 6, 2017 at 8:32 AM Robert Zeigler <[hidden email]>
wrote:

> Kudos on the improvements, and to the original developers (Andrus, et al)
> for a fantastic design. These days, I’ve been doing a lot more Python
> coding than Java and I use SQLAlchemy pretty extensively. It’s nice… but I
> still miss Cayenne’s simplicity/ease of use (SQLAlchemy uses a transaction
> model more akin to Hibernate, though not as egregious).
>
> Best,
>
> Robert

Re: Cayenne object storage / memory usage

Aristedes Maniatis-2
In reply to this post by Nikita Timofeev
On 7/6/17 12:19 AM, Nikita Timofeev wrote:
> I've run some additional benchmarks for field-based classes inspired
> by John and they were so promising, that I've moved on
> to the implementation.
>
> So here is pull request for you to review [1].
> Here [2] you can see what new generated classes will look like.

Nice work. Seems so obvious in hindsight. But all good ideas do.

* Does this make debugging memory with a profiler easier since it makes it simpler to identify field usage?

* What is the impact on ROP?

* Would we want this to be an option for users or is there just no upside to the Map implementation?


Ari


--
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: Cayenne object storage / memory usage

Andrus Adamchik

> On Jul 7, 2017, at 3:03 AM, Aristedes Maniatis <[hidden email]> wrote:
>
> * Does this make debugging memory with a profiler easier since it makes it simpler to identify field usage?

It makes regular debugging easier, as you no longer need to poke inside the HashMap. Probably also memory profiling (fewer levels of nesting inside the object make tracing retain paths simpler).

> * What is the impact on ROP?

None at the moment.

> * Would we want this to be an option for users or is there just no upside to the Map implementation?

In the past I thought readPropertyDirectly/writePropertyDirectly should be faster with a Map. But as John and Nikita have demonstrated, this is no longer true with the Java 7 switch-based implementation. The remaining use cases for Map-based objects, in my mind, are these:

1. Dynamic creation of the OR mapping at runtime, i.e. generic objects.
2. User code that relies on the Map structure to store unmapped properties ("CayenneDataObject.writePropertyDirectly" does not check key validity).

Since both the old and the new objects will follow the same unchanged framework/DataObject contract, we will preserve full backwards compatibility, allowing users to stay with Map-based objects if they need to. Really it comes down to cgen template selection. So I think in 4.1 we might do the following:

* Make the new cgen template the default (after rigorous testing).
* CayenneDataObject will stay around and will serve as a base for generic objects and Map-based objects.
* The old cgen template will be available as an option for those who are concerned with use case #2.
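
To make the Map-versus-switch comparison above concrete, here is a minimal sketch of the two storage styles. It does not use the real Cayenne classes: PropertyAccess, MapBackedArtist and FieldBackedArtist are hypothetical names invented for illustration, and rejecting unknown keys in the default branch is a choice made for this sketch only, not necessarily what the generated code does.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the generic property API; not the real Cayenne interface.
interface PropertyAccess {
    Object readPropertyDirectly(String propName);
    void writePropertyDirectly(String propName, Object val);
}

// Map-based storage: one HashMap per object, and any key is accepted.
class MapBackedArtist implements PropertyAccess {
    private final Map<String, Object> values = new HashMap<>();

    @Override
    public Object readPropertyDirectly(String propName) {
        return values.get(propName);
    }

    @Override
    public void writePropertyDirectly(String propName, Object val) {
        values.put(propName, val);
    }
}

// Field-based storage: one field per mapped property, dispatched with a
// switch on the property name (switch on String needs Java 7 or later).
class FieldBackedArtist implements PropertyAccess {
    private String artistName;
    private Date dateOfBirth;

    @Override
    public Object readPropertyDirectly(String propName) {
        switch (propName) {
            case "artistName":
                return artistName;
            case "dateOfBirth":
                return dateOfBirth;
            default:
                // Sketch-only choice: unknown keys are rejected.
                throw new IllegalArgumentException("Unmapped property: " + propName);
        }
    }

    @Override
    public void writePropertyDirectly(String propName, Object val) {
        switch (propName) {
            case "artistName":
                artistName = (String) val;
                break;
            case "dateOfBirth":
                dateOfBirth = (Date) val;
                break;
            default:
                throw new IllegalArgumentException("Unmapped property: " + propName);
        }
    }
}
```

The map-backed variant is what makes use case #2 possible: it will store a key such as "nickname" without complaint, whereas the field-backed variant has nowhere to put it.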

Andrus

Re: Cayenne object storage / memory usage

Michael Gentry
On Fri, Jul 7, 2017 at 1:54 AM, Andrus Adamchik <[hidden email]>
wrote:

> 2. A user code that relies on the Map structure to store unmapped
> properties ("CayenneDataObject.writePropertyDirectly" will not check the
> key validity).
>

I'd suggest people who need to store unmapped properties create their own
separate map/attributes to store them in.  If they choose to store unmapped
properties in a map, they can easily have their own custom superclass that
makes a map available to their objects.
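
A minimal sketch of that suggestion, with the unmapped values kept in a separate, lazily created map behind their own accessors instead of going through the generic property API. AttributeHolder, getAttribute, setAttribute and Invoice are hypothetical names; in a real app the superclass would extend whichever Cayenne base class the generated classes use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical custom superclass that offers a side map for unmapped values.
abstract class AttributeHolder {

    // Created on first write, so objects that never use it pay no extra memory.
    private Map<String, Object> attributes;

    public Object getAttribute(String key) {
        return attributes != null ? attributes.get(key) : null;
    }

    public void setAttribute(String key, Object value) {
        if (attributes == null) {
            attributes = new HashMap<>();
        }
        attributes.put(key, value);
    }
}

// Hypothetical persistent class: mapped properties live in plain fields,
// anything else goes through the inherited attribute accessors.
class Invoice extends AttributeHolder {
    private String number;

    public String getNumber() {
        return number;
    }

    public void setNumber(String number) {
        this.number = number;
    }
}
```

With this layout the mapped property and the ad hoc value never share storage: setNumber writes the field, while setAttribute("exportBatch", ...) only touches the side map.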

mrg

Re: Cayenne object storage / memory usage

Andrus Adamchik
OK, here is some data on the new DataObject structure, based on testing with a real application.

1. UPGRADE:

TL;DR: If you are not doing anything fancy, the upgrade is just regenerating Java classes with a new template. For special cases read on...

I did an upgrade of a large old monolithic system, which is a bit too cosy with the old CayenneDataObject structure. It uses every single utility from cayenne-lifecycle, calls the generic property API a lot, and otherwise takes advantage of the underlying Map structure. It was a good system to test this upgrade. Here are the instructions based on that experience, beyond rerunning cgen:

* Any variables declared as CayenneDataObject need to be redeclared as plain DataObject. The new object is still a DataObject, but inherits from the new BaseDataObject.
* Any custom superclass of the app's persistent objects needs to extend BaseDataObject instead of CayenneDataObject.
* Check all direct invocations of 'read|writeProperty[Directly]'. If all of them use ORM-mapped property names, you are good. Otherwise you will need to redefine these methods to fall back to a Map for unknown properties. E.g. put [1] in the custom superclass. Going forward I think we may fold this code into a Cayenne superclass (HybridDataObject? :))
* One particularly nasty extension in cayenne-lifecycle was the one handling "UUID relationships" (ObjectIdRelationshipHandler and friends ... hopefully not many people are using this). For each such relationship I had to create an ugly hack [2]. But it seems to work.
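
The Map fallback from the third bullet can be sketched end to end as follows. This is an illustration under assumptions, not the generated code: MapFallbackObject and Painting are hypothetical names, the real superclass would extend BaseDataObject, and the sketch assumes the subclass delegates unknown property names to super in its default branch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical custom superclass: stores anything the subclass does not recognize.
abstract class MapFallbackObject {

    // Created on first write; holds only properties that are not ORM-mapped.
    private Map<String, Object> values;

    public Object readPropertyDirectly(String propName) {
        return values != null ? values.get(propName) : null;
    }

    public void writePropertyDirectly(String propName, Object val) {
        if (values == null) {
            values = new HashMap<>();
        }
        values.put(propName, val);
    }
}

// Hypothetical field-based class: mapped names hit fields, the rest go to super.
class Painting extends MapFallbackObject {
    private String title;

    public String getTitle() {
        return title;
    }

    @Override
    public Object readPropertyDirectly(String propName) {
        switch (propName) {
            case "title":
                return title;
            default:
                // Assumption of this sketch: unknown names are delegated upwards.
                return super.readPropertyDirectly(propName);
        }
    }

    @Override
    public void writePropertyDirectly(String propName, Object val) {
        switch (propName) {
            case "title":
                title = (String) val;
                break;
            default:
                super.writePropertyDirectly(propName, val);
        }
    }
}
```

Writing "title" lands in the field and never allocates the map; writing an unmapped name such as "reviewNote" allocates the map on first use, so only the objects that need the old behavior pay for it.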

2. PERFORMANCE:

Now the exciting part. For performance testing I picked a monolithic read-only web service app with dozens (hundreds?) of endpoints. Essentially it is a huge query cache that is refreshed non-stop via Cayenne queries. Lots of object churn and GC. An ideal app to test memory improvements, and the new structures did not disappoint. My benchmark compared the same app running under Cayenne 4.0.B1 (old) and 4.1 with the field-based objects patch (new) on Java 7 and Jetty. The app was warmed up to account for class loading and cache initialization, and was then bombarded with HTTP requests for some time. The results:

* Memory use: new is 49% less than old.
* Time spent in GC (per jstat tool): new is 43% less than old.
* Throughput: new is 27% higher (and climbing as the load rises).

Looks impressive! Mind that these numbers are for the entire web app. The query cache probably takes 90% of the app memory, though, which is why the Cayenne optimization has such a large overall impact. The memory use drop helped in more than one way (can run on a smaller server; less GC means faster average response times and higher throughput). Just think how much money you can save on AWS costs! :)

So here is my +1 on making field-based DataObject the default in 4.1.

Andrus

-------
[1]

// Needs java.util.HashMap and java.util.Map.
// Fallback storage for property names that are not ORM-mapped; created on first write.
private Map<String, Object> values;

@Override
public Object readPropertyDirectly(String propName) {
    return values != null ? values.get(propName) : null;
}

@Override
public void writePropertyDirectly(String propName, Object val) {

    // no synchronization .. this is used for special cases and is hopefully single-threaded
    if(values == null) {
        values = new HashMap<>();
    }

    values.put(propName, val);
}

[2]

private Factory _uuidFactory;

@Override
public void writePropertyDirectly(String propName, Object val) {
    if(UUID_PROPERTY.equals(propName)) {
        if(val instanceof Factory) {
            _uuidFactory = (Factory) val;
            uuid = null;
            return;
        }
        else {
            _uuidFactory = null;
            uuid = (String) val;
        }
    }

    super.writePropertyDirectly(propName, val);
}

@Override
public Object readPropertyDirectly(String propName) {

    if(UUID_PROPERTY.equals(propName)) {
        if(_uuidFactory != null) {
            return _uuidFactory;
        }
    }

    return super.readPropertyDirectly(propName);
}


Re: Cayenne object storage / memory usage

Michael Gentry
Sounds sweet.  Would be a great blog post on our new website design once
that happens, too.  Then tweet to the blog link...

I'm jealous.  Still getting 3.1 polished off here...



Re: Cayenne object storage / memory usage

Andrus Adamchik
I did mention that it is coming: https://twitter.com/andrus_a/status/887931800705798144

> On Jul 20, 2017, at 1:59 PM, Michael Gentry <[hidden email]> wrote:
>
> Sounds sweet.  Would be a great blog post on our new website design once
> that happens, too.  Then tweet to the blog link...
>
> I'm jealous.  Still getting 3.1 polished off here...
