>> BENJI FISHER: Hello, thank you for coming to this talk on caching large navigation menus in Drupal. I'm Benji Fisher. I'm no longer sharing my video so you can't see my funny hat anymore. But there's a reason for that. A few things before we get started. I left this slide in from previous versions of the talk, and I guess we've done the equivalent of that. You can follow along. Just click on this link: benjifisher.gitlab.io/slide-decks/index.html. Today's talk is at the top of the list. For the sake of the recording, here's my lovely unstyled HTML list, and if you click on the first link, you'll get the version of the slides on my GitLab Pages, which should be identical to the one I'm paging through now.
And there's a recent development: just last week, a certain issue on Drupal.org was fixed, an issue I created about the time that I was working on the stuff I'm talking about in this presentation. The issue was "BookManager::buildItems() is slow because it loads nodes," and one of my comments on that issue is, I'm pretty sure that I need both the optimization in this issue and some additional caching in order to improve my page load times. I tried using each by itself and did not find any improvement.
So that issue is now fixed. It should be in the next release, I guess the first release of Drupal 8.9; we haven't gotten out of alpha yet. And when the 9.0 beta comes out, which I believe is still scheduled for this week, that little optimization should be in there. It's a really tiny change, minus 2 lines, plus 1, to make that work.
Yeah, we already went through this, but I encourage you to interrupt me, ask questions, don't be shy. I would rather have questions as they come up than have you save them till the end of the presentation. If it is something that I have already prepared a slide for, I will ask you to be patient, but don't let that stop you.
So the introduction, about me, remember I mentioned the funny hat, that's supposed to give people a clue that I am the person who owns this avatar, the yellow pig. I'm Benji Fisher on most places, Drupal.org, GitHub and GitLab. I was kind of slow to join Twitter so by the time I did somebody already had that user name and I inserted my favorite number 17 to get a unique Twitter handle and the 17 is connected to the yellow pig. If you count carefully there are 17 eyelashes, 7 on the left, 8 on the right. If you Google around for yellow pigs you can probably figure out what that connection is or you can ask me although maybe that's a question to save till the end.
Next slide: I usually put in something about where I'm working, but I'm currently in between jobs, so I don't have a company to promote at this point. And the outline of the talk: we've already been through the getting-started section, and this is the last slide of the introduction section. If you are following along at home, this presentation is in reveal.js, so if you just press Escape you can get the outline view, and across the top you see the various sections that are in this outline. The coming sections are an explanation of why we need to do caching for book navigation. I'll talk a little bit about internal versus external cache.
And then I'll go through sort of the soup to nuts of how we get the stuff I'm talking about onto the Drupal page. There's a Twig template. There's some Drupal code associated with that Twig template, and then we'll start focusing in on hook_node_view, which is the Drupal hook where we implement the caching. Then we'll have a look at what's going on in the database to implement the caching, and then a few closing slides. As I said, this is the end of the introduction, so next we go to why book navigation needs to be cached.
So this is a project I worked on a few years ago for Pegasystems. They have a lot of documentation, and they are using Drupal's Book module to present this documentation. Here's just a screenshot of one page from the documentation, settings Tab, section form, and for this talk what I want to focus on is this navigation area on the left.
Can you see, by the way, as I move my mouse around the page?
>> Looking good, Benji, looking fantastic.
>> BENJI FISHER: Okay. So as you can see, there's this huge menu of pages. There are little "down" arrows indicating that a section has been opened up, and a right arrow means a collapsed section. Enough of the navigation is opened up to show the link for the current page, which is also in bold here near the bottom. So I guess that's the point of this slide. So here are some numbers. There were 4,472 pages in that book the last time I checked.
The rendered navigation is 2.7 megabytes. Half of that is because we were rendering it twice, once for mobile and once for desktop. That was later fixed.
The initial page load took 40 to 50 seconds. And even after implementing some caching, it took 6 to 9 seconds to load, and I was able to reduce that cached page load to 2 to 3 seconds, which is still bad. I don't normally jump up and down saying hey, I reduced page load times to 2 to 3 seconds! Because I would be a laughingstock if I did that.
But it's a big improvement over what we had before. So I'm not going from bad to great here. I'm going from horrendous to just bad, or possibly even fair. All of these times are not counting any network lag. I was just testing this on my local, and I wasn't using anything fancy to judge the times. I was literally just starting a timer, and starting the page reload, and seeing how long it took.
And I guess actually let me back up a little bit.
And you tell me, just looking at this page, and looking at that navigation section, do you know what the HTML for that looks like?
>> Should be an unordered list?
>> BENJI FISHER: Not just one unordered list.
>> Many, many nested unordered lists.
>> BENJI FISHER: Exactly. Many nested unordered lists with a few classes, especially the class to identify the active trail. The lists that have to be open to expose the link to the current page.
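If we cached the navigation with the active trail baked in, the cached HTML would be different on every page, and we'd have to make sure to rebuild the cache for every single page in the book. So I want to have that thing cached just once for the entire book, and then no matter what page you're looking at, you can get the cached version.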
So that's my strategy. So how do we go about implementing that?
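To put rough numbers on that strategy, here is a back-of-the-envelope sketch in Python. It assumes, hypothetically, that a per-page cache would store one full copy of the navigation for each page; the page count and size are the figures quoted earlier, and the function name is made up for illustration.

```python
# Figures quoted earlier in the talk.
PAGES_IN_BOOK = 4472
NAV_SIZE_MB = 2.7  # rendered navigation, including the duplicate mobile copy


def cache_footprint_mb(entries: int, size_mb: float = NAV_SIZE_MB) -> float:
    """Total storage if each cache entry holds one full copy of the navigation."""
    return entries * size_mb


per_page = cache_footprint_mb(PAGES_IN_BOOK)  # one entry per page of the book
per_book = cache_footprint_mb(1)              # one entry for the whole book

print(f"per page: {per_page / 1000:.1f} GB, per book: {per_book:.1f} MB")
```

Under that assumption, caching per page would mean thousands of multi-megabyte entries for a single book, while caching per book means one.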
So first let me talk a little about internal versus external cache. This is one of the strengths of Drupal 8. It is really smart about caching. A lot of work went into making what I do for this project very simple. But it's still kind of complicated so what I'm working with here is internal cache, so just the page navigation is one part of the page, and Drupal stores things like that, bits and pieces that go to make up the page. By default it stores it in the database, and that's why I'll be looking at the database in the last section.
In a production environment you're more likely to store those in memcache or redis. But as far as I'm concerned those are equivalent systems. We're storing bits and pieces of the page in some internal or back end cache, which is different from what's maybe more familiar, an external cache, something that sits in between Drupal and the browser window, and that's typically Varnish or CDN. Or if you don't have access to any of those, Drupal provides default whole page caching in the database.
So that's what's more familiar, but what I'm talking about mostly in this presentation is what was on the previous slide, the internal cache, where bits and pieces of pages are stored. For me, the hard part of caching, whether it's internal or external, is knowing when to clear it. It's a familiar maxim that there are only two hard things in programming: naming things, off-by-one errors, and cache invalidation. So what we're talking about here is one of the famous few hard things in programming.
So now let me start talking about how we get this page navigation into Drupal, and then we'll start talking about how we get it cached. So there's a Twig template. Here's a simplified version of the Twig template. There's a conditional, and there's this variable called "tree," and if it's defined we have some wrapping classes. There's a nav class, there's an anchor, some other stuff, but the rendering is all done here, where we print out this tree variable in the Twig template. There's a book ID, a URL, and a title that are also printed.
So this anchor element above the tree is sort of a skip to the top of the book, and basically there's nothing to this Twig template. Well, this is a simplified version. If you're curious, the entire Twig template is just a little more complicated. There's some stuff, but really all we're doing is taking some variable that we're building in the preprocess function and printing it to the page. So not much at the Twig level as far as this case is doing.
So where's the Drupal code that supports that Twig template? It's all in hook_node_view, which unfortunately is a little too long to put on one slide, so it scrolls. But this is also a pretty simple function. There are a couple of lines at the top to make sure that we have what we need to build that and if we don't, then we exit early. That's the first three lines. It's also the next three lines.
We collect the variables that we need for the Twig template. Those are the ones I gave short shrift to earlier. There's yet another quick-return clause, and then finally, there's the render array that we're going to build to create the book navigation. It's built by a custom theme function, or custom template, book nav. We pass to that theme function the book ID. The weight just says where it's going to appear on the page, and then we're going to come back and look at this caching section a lot more closely, but I just wanted to point out where it is in context. There are cache keys, cache contexts, cache tags, and a max age, and all of this goes into the render element which is returned by the implementation of hook_node_view. I guess actually I lied. It's not actually returned by that, but the build array is passed in by reference, so we don't actually have to return anything. We just assign it here.
So here's the implementation of hook_theme. As I said, that render array uses a theme, so we just have to define in hook_theme the book nav theme element, and the only variable it takes is book ID, which, going back to the previous slide, is declared as the variable here. It's whatever we figured out earlier on in that hook_node_view implementation.
So far, we haven't done any real work. The real work is done in the preprocess function for this book nav theme element. I'm not going to look too closely at this function, largely because this was already written. It wasn't anything that I had to change to implement the caching, so this was here before I started working on the project. Also, I think a lot of this code was more or less copied from the Book module itself. But we just get various information out of the Book module.
So now let's look a lot more closely at the caching, which is all in hook_node_view. Here's the bottom of the hook_node_view function so you don't have to scroll anymore. Again this is the render array. It's using my theme function, it's using the variable that gets used in the preprocess function to build that element, and there are four separate things that go into the caching. There are cache keys, cache contexts, cache tags, and a max age, and we'll look at each one of those individually.
The cache keys: there's a unique string that identifies my cache elements. Presumably no other Drupal module is going to want to use the name of my module as their prefix. So it's pdn_book_nav: pdn_book, I guess, is the name of the module, and then other parts of the same module might want their own caching, so I add _nav.
And then the second part of the cache key is the book ID, so again, I want to cache the navigation once for the entire book. I don't want to cache it for each page of the book. I want one cache entry for each book, and there might be several different books in the documentation, and each book will have its own navigation, but this particular book is going to have one navigation block that's used on every page.
So those are the two cache keys. This is how we cache once per book.
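As a rough illustration of why these two keys give one entry per book, here is a minimal Python sketch. It is not Drupal code: the function name and the page IDs are hypothetical, and only the key prefix and the book ID 704369 (which shows up later in the database output) come from the talk.

```python
def nav_cache_keys(book_id: int) -> list[str]:
    """Cache keys for the book navigation: module-specific prefix plus book ID."""
    return ["pdn_book_nav", str(book_id)]


# Hypothetical mapping of page node IDs to the book each page belongs to.
page_to_book = {704370: 704369, 704371: 704369, 801001: 801000}

keys_by_page = {page: nav_cache_keys(book) for page, book in page_to_book.items()}

# Two pages of the same book share one set of keys, so they share one cache entry.
print(keys_by_page[704370] == keys_by_page[704371])  # True
# A page from another book gets different keys.
print(keys_by_page[704370] == keys_by_page[801001])  # False
```

The page's own node ID never enters the keys, which is the whole point: the key depends only on the book.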
One point that's not discussed too frequently is that if you don't specify cache keys, then any other caching information you give will bubble up to whatever is enclosing your render element.
Frequently, your render element is inside a block, and then the cache data will bubble up to the block, and the block will be individually cached. Again, I was not building a site from scratch. I was adding caching to a module that was already there. For some reason, they chose not to use a block. They just had a raw render element and I needed to cache that render element, so it was really important that I put in cache keys. Often you don't have to worry about that, if you're going to be caching at the block level.
Second are the cache contexts. If the page is viewed in a different language, then the link text will change. Maybe the URLs will also change if they're controlled by Pathauto. In fact, at the time I was doing this work, the site was not multilingual. I think there were plans at the time that eventually it would be multilingual, so I was trying to be a little proactive here. Putting in the contexts says that when I said cache once per book, I was leaving something out. Really what I want is to cache one copy per book per language.
So for this book, there will be a separate cache entry for the English version and the Spanish version and the French version, whatever other languages the site is translated into.
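A small Python sketch of "one copy per book per language" follows. The language codes match the examples just mentioned; the function name is hypothetical and this only enumerates the variations, it does not model Drupal's render cache itself.

```python
from itertools import product


def nav_variations(book_ids, languages):
    """Enumerate (book, language) pairs; each pair would be its own cache entry."""
    return [(book, lang) for book, lang in product(book_ids, languages)]


variations = nav_variations([704369], ["en", "es", "fr"])
for book, lang in variations:
    print(f"book {book}, language {lang}")
print(len(variations), "entries for one book in three languages")
```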
The third element is the cache tags, and these are saved in the database. What happens is that when the book I'm looking at is updated, Drupal will go through and look for this tag, node/, and then the number of the book, the book ID, the node ID of the book. It will look through the cache table for this particular cache tag. I guess I said node/. I should have said node: followed by the book ID. So it will look through the cache table for what I've updated and delete the corresponding cache entry from the table. That's assuming we're using the database for caching. It will work the same way if we're using memcache or redis or some other back end.
Remember, I'm mostly talking about the internal cache here, but there's also the page-level cache. At the page level, cache tags are sent in the HTTP headers, so if you're using Varnish or a CDN to cache your entire pages, the cache tags are used there to tell the CDN or Varnish when to invalidate an entire page. But for my purposes I'm looking at the internal cache and I want to know when to invalidate that.
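The tag mechanism described above can be sketched with a toy in-memory cache in Python. This is only a model of the behavior as described in the talk (find entries carrying a tag, drop them); the class and method names are invented for the example, and a real back end may implement invalidation differently.

```python
class TaggedCache:
    """Toy in-memory cache whose entries carry tags."""

    def __init__(self):
        self._entries = {}  # cid -> (data, set of tags)

    def set(self, cid, data, tags):
        self._entries[cid] = (data, set(tags))

    def get(self, cid):
        entry = self._entries.get(cid)
        return entry[0] if entry else None

    def invalidate_tags(self, tags):
        """Drop every entry that carries at least one of the given tags."""
        wanted = set(tags)
        doomed = [cid for cid, (_, entry_tags) in self._entries.items()
                  if entry_tags & wanted]
        for cid in doomed:
            del self._entries[cid]


cache = TaggedCache()
cache.set("pdn_book_nav:704369", "<ul>...</ul>", tags=["node:704369"])

cache.invalidate_tags(["node:12345"])     # some unrelated node was saved
print(cache.get("pdn_book_nav:704369"))   # still cached

cache.invalidate_tags(["node:704369"])    # the book's root node was saved
print(cache.get("pdn_book_nav:704369"))   # None: the entry is gone
```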
The final entry, cache entry, is the max age, and I set it to "permanent," and that just means, don't let this expire based on time. Just keep it around until I say to clear it.
So when I update the book ID, or rather when I update the root node of the book, that is when we should clear the cache, and don't invalidate it until then.
Thank you, please ask your question. Can I unmute you? Or can you do that yourself.
>> I think I can do that, can you hear me?
>> BENJI FISHER: Yes.
>> Perfect. In regards to the cache tags in the previous slide, what is the default if you do not specify cache tags? Like, what happens? Is it just cached in general cache? Or what happens when you want to clear the cache and there's no tag for what you're building? Does that make sense?
>> BENJI FISHER: That's a good question. I'm not sure I know the answer. I don't think it's going to go to any sort of general cache. By specifying the cache keys, I'm sort of saying which database table we're going to use, and we'll see that explicitly on a later slide.
If you don't specify cache tags, then it might just throw an error. It might put a null or an empty string in for the cache tag. But basically it doesn't work and I don't know what the failure mode would be.
>> Okay, no worries. Thank you.
>> BENJI FISHER: That is the most I can say. Yep.
As long as we're paused, any other questions before I go on?
And then let's see. I see you muted yourself. I can lower your hand.
So the cache max age, I just went through that.
So, yeah, let's see very explicitly what the effect of these various settings is in the database. And it would look a little different if we were using memcache or redis but the basic idea, the basic structure should be the same.
And it also goes into the cache_render table. So if I just go into MySQL and ask it to describe that table, there is a cid, the cache ID, which is a unique identifier within this table. There's data, which is a longblob; that's what is being cached, in our case the entire HTML for the navigation.
There's an expire, which is a time stamp. Created, when the cache entry was entered.
There's a Boolean for serialized, is this serialized PHP data or not? There's an entry for the cache tags, and there's a checksum, and I think I'll look at all of these except for the checksum. Maybe not all of them.
I never figured out what that is. I was never curious about it.
So here's the query I run. I guess I'm only looking at these five columns, so there must be something else I'm leaving out, but I'm going to select the cid, expire, created, tags, and checksum from the cache_render table. I'm going to look at entries where the cache ID starts off with pdn_book. I guess I probably should have said pdn_book_nav, but at the moment, it gives the same result either way.
And I'm going to limit myself to one result, and who knows what the \G means in the MySQL query? I have a little story about the \G. Several years ago I went to a training session at DrupalCon on database and other optimizations for Drupal, and it had some really good presenters, including David Strauss from Pantheon, and my God, I can always learn something from him.
But I was able to teach him what the \G means. Normally you end the query with a semicolon. If you use a backslash G instead of a semicolon, you get results one field per line instead of in a table, and since we have some rather long entries here, namely the cache ID is kind of long, it's nice to have it displayed this way rather than having cid, expire, created, tags, and checksum going across the top with the data entries below. So that's my little database story.
I guess the other thing I left out was the blob. I don't want to see in this little investigation, I don't want to see the full megabyte or so of HTML that I'm actually caching so I'm leaving out the data and I guess I'm also leaving out the Boolean for whether it's serialized or not.
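For readers without the site's database at hand, here is a small Python sketch that mimics the query using an in-memory SQLite table as a stand-in. The column names follow the table description above; the row values (theme name, hash, timestamp, checksum) are placeholders, not real data, and the loop at the end imitates the one-field-per-line layout rather than using MySQL's own \G.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name

# Stand-in for the cache_render table, using the columns described in the talk.
conn.execute(
    """CREATE TABLE cache_render (
           cid TEXT PRIMARY KEY, data BLOB, expire INTEGER, created REAL,
           serialized INTEGER, tags TEXT, checksum TEXT)"""
)
conn.execute(
    "INSERT INTO cache_render VALUES (?, ?, ?, ?, ?, ?, ?)",
    (
        # Placeholder values for illustration only.
        "pdn_book_nav:704369:[languages]=en:[theme]=example_theme"
        ":[user.permissions]=0123abcd",
        b"<ul>...</ul>", -1, 1580000000.0, 1, "node:704369 rendered", "1",
    ),
)

# Same shape as the query on the slide: skip the big data blob, one row only.
row = conn.execute(
    "SELECT cid, expire, created, tags, checksum FROM cache_render "
    "WHERE cid LIKE 'pdn_book%' LIMIT 1"
).fetchone()

# Print one field per line, similar in spirit to ending a MySQL query with \G.
for column in row.keys():
    print(f"{column:>8}: {row[column]}")
```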
But these are the database entries that correspond to the cache entries that I gave before. So I'll go through them now one at a time. And I've introduced some whitespace to make it a little easier to read. So here's the CID, the cache ID.
It starts off with the unique string I gave it, my cache key. Then there's a colon. Then there's the node ID for the book. Again not for the page in the book, just one for the book.
And there are these curious things in brackets: languages, that's English, as I said. At the time I did this, that was the only language.
Then there's another one in brackets, the theme. And there's the name of the current theme.
And then finally there's the user permissions, which is some sort of hash. So I guess I've already said this. The first two things are the things that I sort of expected there, because I explicitly mentioned them when I specified the cache keys.
The languages come from the cache contexts. Remember I said I was being a little bit proactive and saying that we should cache once per language. But at the moment, it's a mystery where the theme and user permissions come from, but I'll explain that a little bit later.
So this entire string, I've added whitespace for the purpose of the slide, but it's really just one long string where things are separated by colons, not by whitespace.
This is the cache ID, and this identifies the thing that will be invalidated any time I update this particular node. And there will be, if I do have different languages or if I do have different themes, there will be different entries, and also, if different users with different roles are looking at the page, I'll have one copy for anonymous users. I'll have one copy for Admin users, because maybe it doesn't make any difference for this particular application, but in general, it's a really bad idea to take some HTML that you've generated for an Admin user to view and cache it, and then show it to an anonymous user. That could cause serious problems.
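The shape of that cache ID can be sketched in a few lines of Python. This is a guess at the general pattern from what is visible on the slide (keys first, then bracketed context tokens, all joined by colons), not Drupal's actual implementation; the function name, the theme name, and the hash value are placeholders.

```python
def build_cid(keys, context_values):
    """Join cache keys and '[context]=value' tokens with colons."""
    # Sorting the context names is an assumption, made so the result is deterministic.
    tokens = [f"[{name}]={value}" for name, value in sorted(context_values.items())]
    return ":".join(list(keys) + tokens)


cid = build_cid(
    keys=["pdn_book_nav", "704369"],
    context_values={
        "languages": "en",
        "theme": "example_theme",        # placeholder theme name
        "user.permissions": "0123abcd",  # placeholder permissions hash
    },
)
print(cid)
# pdn_book_nav:704369:[languages]=en:[theme]=example_theme:[user.permissions]=0123abcd
```

Changing any one of the context values (another language, another theme, another permissions hash) yields a different string, and therefore a separate cache entry.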
The expire value is minus 1. That corresponds to the cache setting, the max age of "permanent," that I specified. So I'm showing here both the expire and the created. I assume, I haven't tested, but if I said to cache it for a day, what's a day, 86,400 seconds? I believe that if I had done that, then the expire setting would be 86,400 more than the created time stamp.
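That relationship can be written down as a tiny Python sketch. It encodes the assumption just stated (expire equals created plus max age, with -1 meaning "never expires on time alone"), which the talk itself flags as untested; the function names are made up for the example.

```python
PERMANENT = -1          # the "permanent" max age, stored as an expire value of -1
ONE_DAY = 24 * 60 * 60  # 86,400 seconds


def expire_timestamp(created: int, max_age: int) -> int:
    """Expire value for an entry created at `created` with the given max age."""
    return PERMANENT if max_age == PERMANENT else created + max_age


def is_expired(expire: int, now: int) -> bool:
    """A permanent entry never expires by time; others expire once `now` reaches `expire`."""
    return expire != PERMANENT and now >= expire


created = 1_580_000_000  # arbitrary example timestamp
print(expire_timestamp(created, PERMANENT))          # -1
print(expire_timestamp(created, ONE_DAY) - created)  # 86400
print(is_expired(PERMANENT, now=created + 10 * ONE_DAY))  # False
```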
And the cache tags, so node:704369, that's the one that I specified, and then there are some additional cache tags that are created by the system. If something in the configuration system changes, then it's also going to be invalidated. And I'm not sure what the rendered cache tag means. Maybe that's something that's used if you're clearing all of your render cache. Probably that is what it is but I haven't actually checked. So those are my cache tags.
So going back to the cache ID, it had these permissions in it. Where did that come from? The answer is, if you look at sites/default/services.yml, there's this very helpful comment here under renderer.config, and it has these required cache contexts. Even if you don't specify any cache contexts, Drupal will automatically put in anything that's in this yml file, and the renderer will automatically associate these cache contexts with every render array. The first default is languages:language_interface, so in fact, although I thought I was being proactive by saying we should cache based on language, I didn't have to do that, because it was already specified in this default. The theme is the second, and you may remember that in the cache ID there was a theme entry, theme in brackets. The third was user.permissions in brackets. The comment shows that those are the Drupal defaults, and no one has changed the defaults for this.
So those are the cache contexts you get by default, and if you don't change this you'll always get an Admin version and an anonymous version and whatever other roles you have. Each user has some set of permissions. That set of permissions gets hashed and you'll get a different version of your data for each permission hash. You'll get a different version for each theme you've enabled on your site. It's not too common that we have different people looking at the same page in different themes, but it can happen, and if it does, it will be cached separately, and it will be cached per language.
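Here is a hedged Python sketch of those two ideas: required contexts being merged into whatever an element declares, and a set of permissions being reduced to a hash. The three context names are the defaults mentioned above; the hashing scheme, function names, and permission strings are invented for illustration and are not how Drupal computes its hash.

```python
import hashlib

# The three required cache contexts named in the talk.
REQUIRED_CONTEXTS = ["languages:language_interface", "theme", "user.permissions"]


def effective_contexts(element_contexts):
    """Contexts declared on the element plus the always-required ones."""
    return sorted(set(element_contexts) | set(REQUIRED_CONTEXTS))


def permissions_hash(permissions):
    """Illustrative hash of a permission set; the order of the input does not matter."""
    joined = ",".join(sorted(set(permissions)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


print(effective_contexts([]))  # the required contexts apply even with none declared

anonymous = ["access content"]
editor_a = ["access content", "edit any book content"]
editor_b = ["edit any book content", "access content"]  # same set, different order

print(permissions_hash(editor_a) == permissions_hash(editor_b))   # True: shared entry
print(permissions_hash(editor_a) == permissions_hash(anonymous))  # False: separate entry
```

Two users with the same permissions land on the same cached copy, while an anonymous visitor and an editor never share one.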
All of which makes a lot of sense.
So Drupal has great defaults. A lot of work has gone into this. So conclusion, let me just sort of show you the outline again, where we've been.
I explained why book navigation had to be cached. Talked a little bit about the internal versus external cache. Showed all of the code that's needed to get this menu onto the page. There's the Twig template. There's Drupal code, importantly hook_node_view, and then also the theme functions and the preprocess function.
And then we looked specifically at the cache entry in the render array, which was in hook_node_view, and we looked at the effect of those settings on the database. And now we're at the end, the conclusion.
So thank you for listening. We have about 10 minutes left. Are there any questions that you have been saving?
>> Doug Vann has a question. Thank you so much Benji that was awesome. Very formulaic, very procedural, very detailed. So whenever I'm seeing anything with hooks in 2020, my immediate thought goes to obviously you're using some D9 compatible things here but at what point are you concerned that the rug is going to be pulled out from underneath you?
>> BENJI FISHER: I think the answer to that is answered in other sessions where we talk about D9 readiness. When and if these hooks are deprecated, the automatic testing for deprecated code will let us know about it.
I don't believe hook_node_view is being deprecated for Drupal 9. I haven't checked, but I don't think the answer is any different here than it is anywhere else. As far as I know, many hooks are still with us for Drupal 9, and if I'm right about that for this particular hook, then it will still be with us until Drupal 10. It might be deprecated in Drupal 9 and then removed in Drupal 10, but I think we're okay with this for all of Drupal 9.
>> Excellent. Thank you.
>> BENJI FISHER: If anyone has any reason to disagree with that, speak up, and if not, Aaron, go ahead.
>> Aaron: So I've personally used some of these, like, caching tags, contexts, all that good stuff but one thing that I've never been quite sure of when I'm building something custom no matter what it be, is kind of what are the use cases? I know you kind of went over some of those but are there instances where you, say caching is not an issue. Say I didn't have this huge menu that I had to cache but maybe I have something that could be cached, it could be better performance, I guess. What are some of the main use cases maybe you've seen when you should be like: Yes, definitely use context here. Definitely use tags here, et cetera.
>> BENJI FISHER: Huh. I haven't really thought about that. So here I guess one thing I talked about how big this navigation was. Frequently, the reason for wanting to cache something is how computation intensive it is. So I guess the questions to ask are sort of identify the things that are computationally expensive. Figure out whether they're candidates for caching. Do you actually have to regenerate it fresh for every view? Or can you cache it for 15 seconds? Or can you cache it per user? Or can you cache it per role or per language?
Just have to figure out how many different variations of it there are. Those variations will be your cache contexts. You have to figure out how to identify this one thing and make it different from another thing. So in my case, one book versus another book, that's often a node ID or something. Did you want to say something else?
>> Maybe it would help if I kind of flipped the question. Maybe just in the context of this project that you saw with the books and everything. Was there anything regarding caching that maybe was cached too much? Or it was kind of detrimental to the performance of the site? Maybe from a visual standpoint. Or was there anything where it's like yeah, if you cache it this hard or do this many things, it might, you know, kind of hurt performance rather than help it?
>> BENJI FISHER: So I don't think it was a problem here for performance. If you are careless about your caching strategy, then you can spend effort caching something, whether that be in the database or memcache or redis and if you end up generating a new version before you ever get to use the cached version then it's a total waste and that's a bad thing. These are the questions that make caching or cache invalidation one of the two hard things in computing.
So I was pretty aggressive about the rules for caching so I'm pretty sure that I didn't cause any performance problems by caching too much, by caching the things that wouldn't be used. My problem is that when I specified the cache tag is based just on the node ID of the book, that potentially causes problems. Someone might update a particular page of the book and the title of that page, or the link to that page, might change. But updating an individual page of the book would not trigger a cache invalidation, so that is a potential problem.
And we can work around that through process. We can tell the people managing the documentation that any time they update a page of a book, they have to update the root of the book as well, in order to clear the cache. Or in fact, the way this particular project works, all of this documentation is actually generated in an external system and then imported into Drupal with the Migrate API (hey, that's another presentation), so I think that for this particular project, for these huge books whose navigation I was worrying about, it was actually the right choice.
But it's certainly something to worry about that your caching is too aggressive and that it doesn't get invalidated, it doesn't get refreshed, when you need it to.
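The trade-off in this answer can be made concrete with a short Python sketch. It assumes, as described, that the navigation entry carries only the root node's tag; the child page ID and the function name are hypothetical.

```python
def is_invalidated(entry_tags: set, saved_node_id: int) -> bool:
    """An entry is invalidated only if it carries the saved node's tag."""
    return f"node:{saved_node_id}" in entry_tags


BOOK_ROOT = 704369
CHILD_PAGE = 704400  # hypothetical page inside the book

nav_entry_tags = {f"node:{BOOK_ROOT}"}  # the navigation is tagged with the root only

print(is_invalidated(nav_entry_tags, CHILD_PAGE))  # False: the cached menu may now be stale
print(is_invalidated(nav_entry_tags, BOOK_ROOT))   # True: saving the root refreshes it
```

This is why the workaround is a matter of process: an edit to a child page only reaches the cached navigation if the root node is saved as well.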
So is that a satisfactory answer?
>> Aaron: Yes, it is. Sorry, I know the question I asked is very contextual and large, so any insight was great. So thank you.
>> BENJI FISHER: So again, there are some good documentation pages. I'm afraid I don't have them open in tabs right now but they sort of walk you through the thought process. Actually, I can probably find it pretty quickly with a little help from Google, or whatever search engine I'm using but if you look for caching render arrays, Drupal, I bet one of the topics will be the page I'm thinking of. Yeah, here it is. Drupal.org/docs/API with lots of hyphens.
And it leads you through the sorts of things you have to think about. And it talks about how those thoughts affect the contexts, the tags and the max age. The one thing you don't see in this documentation, one thing that's not talked about a lot, are the cache keys, and that's largely because a lot of caching is done at the block level, and as I said earlier on, it's sort of unusual that on this project I was caching the render array itself, and not letting the cache data bubble up to something larger like a block.
So we're actually really close to time, so let me just run through my last few slides. For feedback, if you go back to the page for this session, there will be a feedback form. I'll reload it. I doubt the feedback form is there yet. Oh, I'm wrong. The feedback form is there.
So please do go back. Say what you think about it. I probably will be giving this talk again in the future. This is not the first time I've presented it, so it's always nice to be able to make some improvements along the way.
And please do come to Contribution Day on Saturday. It will go from 10:00 a.m. to 4:00 p.m. I should have said that's Central Time, Chicago time. You don't have to know how to code to give back.
For new contributors, Amyjune from Kanopi Studios will be giving the training from 10:00 a.m. to noon. I'll be running a session on improving the documentation for the Help Topics module, which we hope is eventually going to replace the current Help module or somehow augment it. Basically, I'll be working with documentation in Drupal core, and there will be a bunch of other tracks, so please join us then. I don't have links for other events going on, but there are social events tonight and tomorrow night. Check out the website for links there. And thank you, Doug, for hosting, and Heidi for doing the captioning, and thank you to everyone who came and listened. I appreciate it.
>> Thanks, Benji.
>> Thank you, Benji.