huge to-many relationships

Andrus Adamchik
Some to-many relationships have reasonably small size. Others do not. Tens, hundreds, even thousands of objects are all "reasonable". Millions are not. Often relationships from "lookup" / configuration tables are in the latter category (e.g. "user_type" -> "user", where "user" has millions of records). My "normal" approach to such relationships is to avoid them altogether. I map them on the Db* side, but skip them on the Obj* side. Cayenne supports one-way relationships and it works pretty well. The assumption is that there are no realistic scenarios for traversing such relationships in memory all at once. If I ever need all users for a given type, I'd write a query and run it as an iterator, or add an extra qualifier to work with a smaller subset of users.

Today I ran into an issue where I could not easily bypass these relationships (and as expected all these objects are faulted into memory). The scenario is the "Deny" delete rule (e.g. when deleting a user_type, check that there are no users for this type, and throw DeleteDenyException otherwise). If I don't map the "user_type" -> "user" relationship, I can't set up the "Deny" rule. The delete is still denied due to the DB-side FK constraint, but I can't build a user-friendly error message.

I see a few ways to solve this:

1. We can move delete rules from the ObjRelationship to the DbRelationship level, and then implement the delete rule to check the in-memory relationship first, and if it is not faulted, run some form of EXISTS query that checks for the presence of related records without faulting them into memory. That's a model change. And it sidesteps dealing with huge relationships in general, focusing on this single case. Feels like one of those minor edge cases that will require a massive refactoring effort to fix :)

2. We can treat these huge ObjRelationships as a special type of relationship (marked as such in the Modeler), and apply special strategies to them. E.g. pagination on faulting, List.isEmpty() and List.size() resolved without faulting (thus allowing us to deal with the Deny rule).

3. And the simplest of them - don't map such relationships, don't map delete rules, and handle "Deny" in a 'validateForDelete'.

Both 1 and 2 have benefits and downsides. I am still undecided on how best to handle it (and whether I should bother at all, and instead just use #3). In any event I figured I'd mention it here. Perhaps somebody has some thoughts on it.
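To make the Deny check from options 1 and 3 concrete, here is a minimal sketch in plain Java. It is not Cayenne API: DenyDeleteCheck, DeleteDeniedException and the existsInDb callback are hypothetical names, and the callback is assumed to wrap whatever EXISTS-style query the application would run. The idea is only to show the order of checks: use the in-memory collection if it was already resolved, otherwise ask the database without fetching any rows.

```java
import java.util.Collection;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these types come from Cayenne.
final class DenyDeleteCheck {

    /** Stand-in for an exception that carries a user-friendly message. */
    static final class DeleteDeniedException extends RuntimeException {
        DeleteDeniedException(String message) {
            super(message);
        }
    }

    /**
     * @param entityLabel label used in the error message, e.g. "user_type 'admin'"
     * @param resolved    the to-many collection if it is already in memory, or null if never faulted
     * @param existsInDb  assumed to run an EXISTS-style query and return true if any related row exists
     */
    static void validateForDelete(String entityLabel,
                                  Collection<?> resolved,
                                  BooleanSupplier existsInDb) {
        boolean hasRelated = (resolved != null)
                ? !resolved.isEmpty()          // already in memory: no query needed
                : existsInDb.getAsBoolean();   // not faulted: probe the DB, fetch nothing

        if (hasRelated) {
            throw new DeleteDeniedException(
                    "Cannot delete " + entityLabel + ": related records still exist");
        }
    }
}
```

With option 3, a method like this could be called from the entity's own delete validation, so no ObjRelationship or delete rule needs to be mapped. With option 1, the same order of checks would presumably live inside the framework's delete rule handling.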

Cheers,
Andrus

Re: huge to-many relationships

Robert Zeigler-6
I’m in favor of #2. I implemented a solution similar to this a long time ago for an app that had a few expensive relationships.  I wouldn’t recommend my implementation (I wrote it about 12 years ago at this point), but the approach worked well.

Robert



Re: huge to-many relationships

Aristedes Maniatis-2
In reply to this post by Andrus Adamchik
On 9/10/17 9:54PM, Andrus Adamchik wrote:
> 2. We can treat these huge ObjRelationships as a special type of relationship (marked as such in the Modeler), and apply special strategies to them. E.g. pagination on faulting, List.isEmpty() and List.size() resolved without faulting (thus allowing us to deal with the Deny rule).

Pagination would be such a great addition, and you could set the page size effectively to 0 to bring in just a big list of hollow objects.

Today, queries have pagination and a query cache to improve performance on very large data sets. Is there any equivalent of a query cache when faulting a to-many relation?

Ari


--
-------------------------->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A

Re: huge to-many relationships

Andrus Adamchik

> On Oct 10, 2017, at 4:47 AM, Aristedes Maniatis <[hidden email]> wrote:
>
> On 9/10/17 9:54PM, Andrus Adamchik wrote:
>> 2. We can treat these huge ObjRelationships as a special type of relationship (marked as such in the Modeler), and apply special strategies to them. E.g. pagination on faulting, List.isEmpty() and List.size() resolved without faulting (thus allowing us to deal with the Deny rule).
>
> Pagination would be such a great addition, and you could set the page size effectively to 0 to bring in just a big list of hollow objects.
>
> Today, queries have pagination and a query cache to improve performance on very large data sets. Is there any equivalent of a query cache when faulting a to-many relation?

We don't support this now, but I've been thinking for some time about resolving certain to-many relationships from the query cache as if they were query results. So maybe we are onto something here. Perhaps we should pursue a "relationship strategies" feature (caching, pagination, lazy size evaluation).
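As a rough illustration of what such a strategy could look like from the List side, here is a sketch in plain Java. Nothing in it is Cayenne API: PagedRelationshipList and its two callbacks are hypothetical, with countQuery standing in for a COUNT query and pageQuery for an offset/limit fetch. size() and isEmpty() are answered from the count alone, and rows are fetched one page at a time on first access and kept in a simple per-list map.

```java
import java.util.AbstractList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.IntSupplier;

// Hypothetical sketch; not Cayenne API. The callbacks are assumed to wrap
// a COUNT query and an (offset, limit) query against the related table.
final class PagedRelationshipList<T> extends AbstractList<T> {

    private final int pageSize;
    private final IntSupplier countQuery;
    private final BiFunction<Integer, Integer, List<T>> pageQuery;
    private final Map<Integer, List<T>> pages = new HashMap<>(); // page number -> rows
    private int size = -1;                                       // -1 means "not evaluated yet"

    PagedRelationshipList(int pageSize,
                          IntSupplier countQuery,
                          BiFunction<Integer, Integer, List<T>> pageQuery) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.pageSize = pageSize;
        this.countQuery = countQuery;
        this.pageQuery = pageQuery;
    }

    /** Lazy size evaluation: one count query, no rows fetched. */
    @Override
    public int size() {
        if (size < 0) {
            size = countQuery.getAsInt();
        }
        return size;
    }

    /** Fetches only the page that contains the requested index, once. */
    @Override
    public T get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        int page = index / pageSize;
        List<T> rows = pages.computeIfAbsent(
                page, p -> pageQuery.apply(p * pageSize, pageSize));
        return rows.get(index % pageSize);
    }

    /** Exposed only so the sketch can be inspected. */
    int loadedPageCount() {
        return pages.size();
    }
}
```

isEmpty() is inherited from AbstractList and delegates to size(), so a Deny check against such a list would cost a single count query. The sketch ignores cache invalidation, concurrent modification and the "page size 0 / hollow objects" variant mentioned above; those would need real design work.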

Andrus