For a site I’ve been working on, I’m developing a PHP module that displays page suggestions based not on the page content but on which pages past visitors requested.
The intuition behind this is that the pages most past visitors viewed are probably the pages most future visitors will be interested in. It may sound more complicated than it really is: when a user requests a certain page, the module extracts from the server log which other pages were visited most often by past visitors of that page.
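To make the idea concrete, here is a minimal sketch of this kind of co-visitation counting. It is written in Python rather than PHP and is not the module's actual code. It assumes the log is in the Apache/NCSA Common Log Format and, as a rough simplification, treats each client IP address as one visitor; the function names `visits_by_visitor` and `suggest` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: Common Log Format lines. Only the client IP, the requested
# path and the HTTP status code are captured.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3})')

def visits_by_visitor(lines):
    """Group requested paths by visitor, using the client IP as a (rough) visitor id."""
    visits = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(3) == "200":  # count only successful requests
            ip, path = m.group(1), m.group(2)
            visits[ip].add(path)
    return visits

def suggest(current_page, visits, k=5):
    """Rank other pages by how many past visitors of current_page also requested them."""
    counts = Counter()
    for pages in visits.values():
        if current_page in pages:
            counts.update(p for p in pages if p != current_page)
    return counts.most_common(k)
```

Counting each page at most once per visitor (the `set`) keeps a single visitor who reloads a page many times from dominating the scores.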
Starting from this simple idea, I’ve been adding a number of refinements aimed at improving the quality of the suggestions, such as filtering out pages that the current page already links to. As a bonus, I implemented a (rather basic) visualization of the scores computed by the module. This is a handy way to spot at a glance which pages are not performing well. This is an early screenshot of the visualization (the column on the left lists all the pages on the site and the column on the right contains the corresponding number of page views). It already highlights some problems, namely that the two most visited pages are poorly linked to the rest of the site (their rows are red almost everywhere).
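The "already linked" filter could look roughly like the following Python sketch (again, not the module's PHP code). It assumes the current page's HTML is available as a string and that suggestions are `(path, score)` pairs; `LinkCollector`, `linked_paths` and `filter_already_linked` are hypothetical names. Only links to the same host are considered, so an external link that happens to share a path is not mistaken for an internal one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects the paths of all same-host <a href="..."> targets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the current page's URL.
                url = urlparse(urljoin(self.base_url, href))
                if url.netloc == self.base_host:  # ignore external links
                    self.paths.add(url.path)

def linked_paths(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.paths

def filter_already_linked(suggestions, html, base_url):
    """Drop suggested pages that the current page already links to."""
    linked = linked_paths(html, base_url)
    return [(page, score) for page, score in suggestions if page not in linked]
```

Filtering after scoring keeps the ranking logic independent of the page markup, at the cost of parsing the page on each request (which could be cached).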
Obviously this approach is far from perfect, but I think it is an interesting concept nonetheless. I already have some ideas for improving the method further, for example by taking into account not only the page the user is currently viewing but all the pages the user has visited so far. Also, since the website has (for now) very low traffic, scalability is not yet a problem, but for this to be really useful it would have to be made as scalable as possible.
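One possible way to use the whole browsing history is to sum, over every page the user has already seen, the co-visitation counts of each candidate page. The Python sketch below shows that idea under the same assumptions as before (past visitors represented as sets of requested paths); `suggest_from_history` is a hypothetical name, and plain summation is just one choice among many (recent pages could, for instance, be weighted more heavily).

```python
from collections import Counter

def suggest_from_history(history, visits, k=5):
    """Score candidate pages by summing, over every page in the user's history,
    how many past visitors of that page also requested the candidate.

    history: pages the current user has visited so far.
    visits:  mapping of past visitor id -> set of pages that visitor requested.
    """
    seen = set(history)
    scores = Counter()
    for pages in visits.values():
        # A past visitor who shares n pages with the current user contributes
        # n to each of their other pages, which equals summing the per-page
        # co-visitation counts over the whole history.
        overlap = len(pages & seen)
        if overlap:
            for p in pages - seen:  # never suggest pages the user already saw
                scores[p] += overlap
    return scores.most_common(k)
```

Each log entry is still scanned once per request, so for a busier site the per-visitor sets (or the pairwise counts) would need to be precomputed rather than rebuilt from the raw log.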