How NOT to Overlook Large Datasets in Drupal

How to handle big datasets in Drupal is an often-overlooked dimension of scaling Drupal. People typically worry about how Drupal handles web traffic, but largely ignore what happens when you have to load large amounts of data all at once.

Drupal core has a built-in assumption that menus and vocabularies are small. Pushing past those built-in limits causes problems, and so does loading all of a module's data at once, particularly with taxonomy and menus. For example, consider these admin pages:

The time to load /admin/structure/menu/manage/%menu grows as O(N^2), with a practical limit of approximately 500 menu items.

Also, /admin/structure/taxonomy/tags consumes PHP memory linearly with the number of terms. You'll typically start to see problems at around 30,000 terms or more.
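To see why memory grows with vocabulary size, consider what happens when code pulls an entire term tree into memory at once. Here is a minimal sketch, assuming Drupal 7 (which these admin paths suggest) and a hypothetical vocabulary ID of 5: taxonomy_get_tree() builds every term into a single PHP array, so memory use scales with the number of terms, while a ranged query only touches the slice you need.

<?php
// Loads EVERY term in the vocabulary into one array. With tens of
// thousands of terms this can exhaust PHP's memory limit.
$all_terms = taxonomy_get_tree(5);

// A lighter alternative: fetch only the slice you actually need.
// db_query_range() keeps memory use roughly constant regardless of
// how large the vocabulary grows. (Vocabulary ID 5 is hypothetical.)
$some_terms = db_query_range(
  'SELECT tid, name FROM {taxonomy_term_data} WHERE vid = :vid ORDER BY name',
  0, 100,
  array(':vid' => 5)
)->fetchAllKeyed();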

How do I prevent these problems?

There isn't much you can do to prevent these types of big dataset issues in Drupal. You just have to plan and benchmark thoroughly!

If you're writing a module, try to keep the following in mind when coding for big datasets:

  • Use pagination where appropriate (see the sketch after this list).
  • Don't write API functions that load too much data.
  • Avoid multiplying datasets together.
  • Load data in the background with AJAX.
  • Benchmark and document your potential scalability limits. Can't stress this enough!
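As a concrete illustration of the pagination point above, here is a minimal sketch of a paged listing, again assuming Drupal 7, with a hypothetical custom table mymodule_items and a hypothetical page callback name. The PagerDefault extender fetches one page of rows per request instead of loading the whole table.

<?php
/**
 * Hypothetical page callback that lists rows from a custom table
 * (mymodule_items) one page at a time instead of all at once.
 */
function mymodule_item_list_page() {
  $query = db_select('mymodule_items', 'i')
    ->extend('PagerDefault')   // Adds LIMIT/OFFSET tied to the pager.
    ->fields('i', array('id', 'title'))
    ->limit(50);               // 50 rows per page, not the whole table.

  $rows = array();
  foreach ($query->execute() as $record) {
    $rows[] = array(check_plain($record->title));
  }

  $build['table'] = array(
    '#theme' => 'table',
    '#header' => array(t('Title')),
    '#rows' => $rows,
  );
  // Renders the pager links (first, previous, 1, 2, ..., next, last).
  $build['pager'] = array('#theme' => 'pager');
  return $build;
}

The same idea applies to API functions: accept offset/limit parameters (or return a query object) rather than returning every row, so callers never have to hold the full dataset in memory.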