
GATE Village Performance

Dave Kinchlea
Joined: 2009-04-22

Next to security, performance is arguably the least understood characteristic of web applications. One old colleague put it best when he talked about there being really only two states: satisfactory and not satisfactory ... there is really no middle ground; things are either fast enough or they are not. Unfortunately, it is also nearly impossible to know ahead of time exactly what "fast enough" is ... with "Google-speed" setting the bar for public web sites, and CMS systems using authenticated access almost exclusively, there is a serious performance problem waiting in the wings at all times. This was true for Livelink and is no less true for Drupal.

Content Management Solutions are usually dynamic in nature: they build a new page every time a page is visited. This is typically because the same page is likely to be rendered differently for different users. CMS almost always implies security ... that there is content within the system that should not be accessible to some users. With a CMS like Livelink that uses object-level access control (where every single object has its own modifiable ACL) this is a particular burden, because end users didn't understand (or care about) the model. They would do things that make sense under a security model tuned to containers (file folders, mail boxes etc.) and then wonder why their fancy million-dollar ECM software sucked mud when, for instance, they put 1000 files in a folder.

With its notion of Roles built in, Drupal provides the ability to secure access without too badly affecting performance. Add to that the content-centric (node) point of view and it was fairly clear that Drupal would scale to large volumes of content without much concern for transactional performance -- for most transactions at least.

Volume Scaling

There are two volumes to be concerned with: 1) People, 2) Content. With a stable site (one that doesn't see functional or semantic changes), growth of both users and content really amounts to scaling of the database. The good part of a relational database is that, when there are enough physical resources available, the size of a table has little or no bearing on the cost of accessing a record, provided an indexed key is known. This is, of course, not true for all DB calls, especially those designed to find a key (searching) -- and this is one of the key problems of Drupal: it is built by a very large group of independent developers of varying experience. There is, without any doubt at all, a large percentage of modules using search and other DB algorithms that are only functional at relatively low volume. A 2N search algorithm is always inefficient, but it is not even a noticeable problem until N gets large, at which point a log(N) algorithm is the obvious choice ... my point is that it is NOT obvious until the problem surfaces. (And just to be crystal clear here, searching using an inefficient algorithm is only one possible problem; I do not know what the problems will be, but my 25 years of experience tells me there will be issues!)
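To make the gap between a scan and a log(N) lookup concrete, here is a minimal sketch that counts comparisons for each approach. It is not taken from any Drupal module; the function names are hypothetical, and a sorted Python list stands in for an indexed key.

```python
def linear_search(items, target):
    """Scan front to back; return (index, comparisons made)."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons


def binary_search(sorted_items, target):
    """Halve the search range each step; return (index, comparisons made)."""
    lo, hi = 0, len(sorted_items) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_items[mid] == target:
            return mid, comparisons
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons


def worst_case_steps(n):
    """Look up the last of n sorted keys with both approaches."""
    items = list(range(n))
    target = n - 1  # the last element is the worst case for the scan
    _, linear = linear_search(items, target)
    _, binary = binary_search(items, target)
    return linear, binary


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        linear, binary = worst_case_steps(n)
        print(f"N={n}: scan={linear} comparisons, binary={binary} comparisons")
```

At N = 1,000 the scan needs 1,000 comparisons and the binary search needs 10. Neither is noticeable at that size, which is exactly why the weaker algorithm tends to survive until the volume arrives.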

Another concern I have is modules that create lists of content without much thought about how big the lists might be. Like searching, there are architectural limits that prevent certain approaches from being used when large amounts of content are involved -- but unlike searching, which at worst will use too much CPU, these sorts of problems can wreak havoc as the system attempts to provide the required resources: a table join that allocates so much RAM that other parts of the DB and/or other processes are swapped out to disk, causing excessive paging and sometimes even swap-thrashing; a temporary file that is suddenly gigabytes in size; a slew of file descriptors ... there are many possibilities. Again, if you've never seen this behaviour before and haven't been taught to avoid the issue, then it is likely your code is vulnerable to performance problems with every order of magnitude of growth. You can't know what you don't know!
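One common way to keep a list from growing with the content is to fetch it a bounded page at a time. The sketch below assumes a simplified stand-in table called node with nid and title columns, held in an in-memory SQLite database; the helper names and the page size of 50 are hypothetical, and this is not how any particular module does it.

```python
import sqlite3


def build_demo_db(n):
    """Create an in-memory stand-in 'node' table with n rows (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO node (nid, title) VALUES (?, ?)",
        ((i, f"node {i}") for i in range(1, n + 1)),
    )
    return conn


def fetch_page(conn, after_nid=0, page_size=50):
    """Return at most page_size (nid, title) rows with nid > after_nid."""
    cur = conn.execute(
        "SELECT nid, title FROM node WHERE nid > ? ORDER BY nid LIMIT ?",
        (after_nid, page_size),
    )
    return cur.fetchall()


def iter_nodes(conn, page_size=50):
    """Walk the whole table while holding only one page in memory."""
    after_nid = 0
    while True:
        page = fetch_page(conn, after_nid, page_size)
        if not page:
            return
        yield from page
        after_nid = page[-1][0]  # resume after the last nid seen


if __name__ == "__main__":
    conn = build_demo_db(120)
    print(len(fetch_page(conn)))            # size of the first page
    print(sum(1 for _ in iter_nodes(conn)))  # rows seen across all pages
```

Memory use here depends on the page size rather than on the size of the table, so another order of magnitude of content costs more round trips but not more RAM per request.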

Complexity Scaling

Only in the rare cases in which resource starvation occurs does volume scaling cause an immediate and recognizable drop in performance. Otherwise the decline is almost always gradual, and it is easily detected through trend analysis of historical monitoring. There are almost always ways to alleviate the symptoms and there are very likely existing approaches to rectify the situation. But performance issues more often come about through the addition of functionality rather than volume, and these are sometimes much more difficult to address.
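As a rough illustration of what trend analysis of historical monitoring can look like, the sketch below fits a least-squares line through a series of response times and estimates how long until it crosses a limit. The sample values, the one-sample-per-day reading, and the function names are all assumptions made for the example.

```python
def trend_slope(samples):
    """Least-squares slope of the samples against their position (ms per sample)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def samples_until_limit(samples, limit_ms):
    """Estimate how many more samples until the fitted line reaches limit_ms.

    Returns None when the trend is flat or improving.
    """
    slope = trend_slope(samples)
    if slope <= 0:
        return None
    n = len(samples)
    intercept = sum(samples) / n - slope * (n - 1) / 2
    current = intercept + slope * (n - 1)  # fitted value at the latest sample
    return max(0.0, (limit_ms - current) / slope)


if __name__ == "__main__":
    # Hypothetical daily median response times in milliseconds.
    history = [100, 102, 104, 106]
    print(trend_slope(history))
    print(samples_until_limit(history, 126))
```

A straight line is the crudest possible model, but it is usually enough to turn a gradual decline into a date on the calendar rather than a surprise.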

This is my biggest concern for the approach I've taken. Because of the HUGE amount of functionality GATE Village provides its community builders, and the lack of explicit control we will extend towards our Trusted Neighbourhoods (the desired functionality will be provided whenever feasible), I know there will be some communities that grow beyond the abilities of some important piece of functionality and are faced with some difficult choices. That is probably unavoidable under any circumstance, but particularly so with our model.

There are really no technical solutions to this problem, only business ones. Our most powerful weapon to combat this is our Managed Service approach ... we have a rigid process in place that must be followed before a new module (or new feature of an existing module) is enabled, and we are unwilling to deviate from that process. The process is, more or less:

  1. Determine whether the code plays nicely with existing code
  2. Determine whether the code affects everybody or only specific people
  3. Try to determine the transactional performance impact on the existing site (anything over 100ms is a red flag)
  4. Ensure no significant difference with a large user database (1,000, 10,000, and 100,000 users) -- small incremental differences are expected for each, but if there is any indication of performance being tied to volume of users then we have a concern and perhaps a red flag. Note that sites such as Facebook show there are still a few other levels a very successful site might achieve, but we are in business too and that sort of scale is outside our current abilities
  5. Ensure no significant difference in the face of content volume -- this is more difficult to simulate, particularly for GATE Village, which has dozens of distinct content types and fully expects that to grow to hundreds if not thousands. Because not all content is equal in Drupal, prudent business keeps the testing limited to an increase in the node table and in any content types that the module in question explicitly creates or uses. Again in increments of 1,000, 10,000, and 100,000+, to try to determine whether there is a relationship between volume and performance.

This process doesn't guarantee anything of course, but it does provide a level of assurance that problems won't show up at the first sign of a successful community!
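The volume checks in steps 3 to 5 could be sketched roughly as follows. This is not our actual tooling: the function names are hypothetical, the 100 ms threshold comes from step 3, and the growth tolerance of 1.5x between volume steps is an assumed value chosen only for illustration.

```python
import statistics
import time

RED_FLAG_MS = 100.0                # threshold from step 3 of the process
VOLUMES = (1_000, 10_000, 100_000)  # volume steps from steps 4 and 5


def median_ms(operation, volume, repeats=5):
    """Time operation(volume) several times; return the median in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation(volume)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def assess(timings_ms, growth_tolerance=1.5):
    """Return a list of concerns for a {volume: median_ms} mapping.

    growth_tolerance is an assumed cut-off: a larger ratio between
    neighbouring volume steps is treated as a sign that performance
    is tied to volume.
    """
    flags = []
    volumes = sorted(timings_ms)
    for volume in volumes:
        if timings_ms[volume] > RED_FLAG_MS:
            flags.append(f"{volume}: over {RED_FLAG_MS:.0f} ms")
    for small, large in zip(volumes, volumes[1:]):
        if timings_ms[small] > 0:
            ratio = timings_ms[large] / timings_ms[small]
            if ratio > growth_tolerance:
                flags.append(f"{small}->{large}: time grows with volume")
    return flags


if __name__ == "__main__":
    # Stand-in for the transaction under test: a deliberately linear scan.
    def scan(volume):
        return sum(1 for _ in range(volume))

    results = {v: median_ms(scan, v) for v in VOLUMES}
    for concern in assess(results):
        print(concern)
```

Keeping the measurement (`median_ms`) separate from the judgement (`assess`) means the same thresholds can be applied to numbers gathered some other way, such as from a load-testing tool.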
