Design of JUpdate
From Joomla! Documentation
Revision as of 18:00, 10 November 2012 by Pasamio
This is an attempt to document many of the design decisions behind the JUpdate class in Joomla!. Since this package seems to be contentious, I figured I'd put a whole heap of documentation here and compile a significant chunk of what has been written into a single place.
For most of this document, "I" means Sam Moffatt. This is perhaps written more like an email than a wiki page, but this isn't Wikipedia.
JUpdate and the update package in general were designed to provide a generic, reusable update system for extensions within Joomla!. The main aim was to support a directory the size of the JED in a scalable manner and, where possible, to shift workload away from the central Joomla servers by giving each Joomla site the means to handle some of the work itself. It was also designed to support a decentralised model so that we weren't 100% dependent upon the JED having the supporting infrastructure (which it hasn't got yet for the usual reason: nobody has the time or interest to develop it). Initially people suggested it should be tied entirely to the JED as it already has the information, but that wouldn't be as inclusive. Building a distributed model fits better with the way the Joomla project works with third party extensions.
Much of the design of the update system is based on Debian's DEB and APT system. APT provides a mechanism for finding and installing packages, including their dependencies. The main source of this information is the "control files" for a package. Many of the fields in the extension.xml file can be related to the fields specified in the control files. These include tags, maintainer and section. Relationships were also intended to be included, matching the style and format of the Debian ones though with an XML inspired interface.
I personally believe that expanding on a Debian inspired design for Joomla is a solid mechanism for moving forward in the long term. Debian have been working on their system for a long time and it has proven to be a solid and reliable base. It contains many pieces of information that we don't presently include and could. Additionally it provides much desired features with standard, defined structures for how they should work (e.g. how dependencies work and a complete dependency model).
The decentralisation of the system is also a side effect of looking at how APT works. APT permits multiple "repositories" to be defined, which you can use to search for packages to install and update. The JUpdate implementation follows part of this concept, in particular the general ability to configure locations in which to look for new files.
Since 2005, I've been trying to build an update system for Joomla! (at the time Mambo). It was kicked off when Google announced their first "Summer of Code" program in mid 2005. I'd personally been seeing a major problem with Mambo extensions: it was hard to know when they were out of date and hard to update them. It was also hard to manage dependent packages and have them install properly. I put in my proposal for SoC but started building out the system before I was accepted. That update system was originally based on XML-RPC, which obviously requires a dedicated server and has a communication overhead. The project did push some things forward, like the development of a libraries folder with jimport's predecessor, mosFS::load, to facilitate shifting the installer libraries out of the installer component and putting them somewhere more generic. Ultimately the fork from Mambo to Joomla halfway through SoC caused a terrible amount of disruption and none of the projects really succeeded past the fork in their original form.
For those curious, here's some links:
You'll find web install was added here as well, along with an early onBeforeInstall similar to our current onExtensionBeforeInstall and an early library installer. While the web install was added with 1.5 in 2008 (three years later), it wouldn't be until Joomla 1.6 that most of the other items were added (2009).
As noted above, the aim was to deliver a generic, reusable update system. The primary client for this work was the JED, which already holds a large repository of the base information we would need. However the Joomla! community is also less centralised as a rule, so the system was designed to be decentralised by its nature. This reflects the fact that Joomla is less centrally controlled than similar projects like WordPress and Drupal, both of which have a heavier, more centralised approach towards development (particularly Drupal). Throughout the period there were also issues with what the JED did and didn't include, which meant there were some extensions that would never have benefited from the update functionality. Providing a decentralised model still permitted a major central repository of extensions while also permitting flexibility for third party extensions.
As the JED was the primary target, the design of the system followed the perceived needs of a JED level system. It was expected that the system would receive a large number of hits from Joomla sites around the world and would need to handle the load. For this reason the earlier idea of using XML-RPC backed onto a database layer to handle the updates was scrapped. Much of the information in the system didn't update that often, which meant that for the most part it was static. While some extensions would update each day, this was unlikely to be all of them, so such a high level of responsiveness was an unnecessary burden on the server.
Instead of relying on XML-RPC, a simpler mechanism built around plain XML files was used. Plain XML files can be hosted without a PHP interpreter being invoked or a database connection being made: a simple Apache (or an even more lightweight HTTP daemon) could be used to serve the files. The operating system would be able to cache the files in memory efficiently, and other layers of caching in front of the service could also be added easily and effectively. XML files also added an extra feature: as they didn't require any significant server infrastructure, they could be authored and maintained by anyone and hosted online.
Utilising XML files didn't preclude tying into a database system. Instead, either the static XML files could be regenerated by a third party application as required, or a web service layer could emulate XML file paths while really using mod_rewrite to map them onto a Joomla component, not dissimilar to how the SEF layer works. This means that commercial developers could create a process where they added update site paths that mapped to their customers and used tools like IP address verification of the server, in addition to a token in the URL, to show or hide updates. Sadly nobody seems to have taken the initiative to build such a system.
Why use PHP's HTTP stream wrapper API and XML Parser?
The decision to use the HTTP stream wrapper API was made to support streaming larger files. When this work was started, the PHP memory limit was set to 16MB. Roughly half of this could be consumed at any given time by the core Joomla install, which leaves 8MB of memory to handle processing an update file. Given that using SimpleXML will trigger a 10 to 15 times increase in memory usage against the base data, a half a megabyte file may be sufficient to trigger memory exhaustion. For this reason both streaming the remote file and stream processing the XML were chosen to minimise the amount of memory that is used. At the moment the JED has nearly 9000 listings; to fit all listings at one version, a listing would require around half a kilobyte each. Fortunately memory limits have increased, which mitigates these concerns a little. Since the code was originally written, the HTTP stream wrapper API has been replaced by JHttp, meaning that there will be a few copies of the response body in memory.
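To make the difference between loading a whole document and stream processing it concrete, here is a minimal sketch in Python rather than PHP. It is not the JUpdate code. The element names in the sample XML and the function name iter_updates are assumptions made up for illustration, and the in-memory byte stream stands in for a remote HTTP stream. The point is only that each record is handled and then discarded, so memory use follows the size of one record rather than the size of the whole file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical update list; the tag names are illustrative, not a Joomla schema.
SAMPLE = b"""<?xml version="1.0"?>
<updates>
  <update><element>com_alpha</element><client>administrator</client><version>1.0.1</version></update>
  <update><element>mod_beta</element><client>site</client><version>2.3.0</version></update>
</updates>"""


def iter_updates(stream):
    """Yield one dict per <update> element without keeping the whole tree."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "update":
            # Copy out the child values before the element is discarded.
            yield {child.tag: (child.text or "").strip() for child in elem}
            root.clear()  # drop already-processed children from memory


# io.BytesIO stands in for a streamed HTTP response body.
updates = list(iter_updates(io.BytesIO(SAMPLE)))
for update in updates:
    print(update["element"], update["version"])
```

Collecting the results into a list, as done here for the demonstration, gives the memory saving back. A real consumer would act on each record as it arrives.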
Client versus Client ID
Client is used in the update system as the name of the client being targeted for templates, modules and plugins. The name is used instead of the internal client ID to ensure that the system stays compatible should the internal ID change or new items be added, in keeping with the general aim to be generic and not specific. Originally client wasn't required for components, but a change was made so that components default to administrator instead of site. This triggered a response which, instead of fixing the root issue, just made things worse.
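The reasoning for using names can be sketched as follows, again in Python and not taken from the Joomla code. The table CLIENT_IDS and the function resolve_client_id are hypothetical. The values 0 for site and 1 for administrator are the ones Joomla uses to my knowledge, but treat them as illustrative. Because the update file carries only the name, each site resolves it against its own table, and a site with different internal numbering still reads the same file correctly.

```python
# Hypothetical per-site lookup table; the update XML never contains these numbers.
CLIENT_IDS = {"site": 0, "administrator": 1}


def resolve_client_id(client_name, client_ids=CLIENT_IDS):
    """Translate a client name from an update file into this site's internal ID."""
    key = client_name.strip().lower()
    if key not in client_ids:
        raise ValueError(f"unknown client: {client_name!r}")
    return client_ids[key]


print(resolve_client_id("administrator"))

# A site whose internal numbering differs can still use the same update file.
renumbered = {"site": 10, "administrator": 11}
print(resolve_client_id("administrator", renumbered))
```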
- Debian Policy Manual - Control files and their fields
- Debian Package Required Files - control
- Sample Extension com_alpha/alpha.xml Line 100
- Debian Policy Manual - Declaring Relationships Between Packages
- OSM SoC 2005 Repository - mosFS class
- #24305 3rd party components install in location "site" instead of "administrator"
- #24338 Autoupdate fails for components after applying a patch from issue #24305
- Platform Pull Request 676