(3600 words; updated with some additional thoughts and a few reworded sentences on 2017-01-09)
While turning various ideas for future side projects around in my head, I keep thinking about how to tackle the problem of writing independent, platform-agnostic applications with a graphical user interface (GUI).
Existing cross-platform GUI frameworks
There are already many ways to do this:
- Use a cross-platform language/framework/environment that has some notion of GUI built in from the start. Java immediately comes to mind. (Are there any others?)
- Use a toolkit/framework that is deliberately designed to create GUI applications that run across platforms. The two most abundant and popular ones appear to be:
- Qt (on which the Linux desktop environment KDE is based, as well as a large number of sizeable cross-platform GUI apps)
- GTK+ (on which the Linux desktop environment GNOME is based, as well as many other apps)
- Alternatives (likely incomplete): List of widget toolkits. I haven’t seen most of these in action, and those I have at least glimpsed seem to be relatively niche solutions (which is not meant as a judgement; they could still be great, but I know very little about them).
I have looked at all of these, and they all have their advantages and drawbacks.
Java GUIs, for instance, have a bad reputation for being »ugly«, cluttered, and often sluggish. Having followed their progress over the last 10 to 15 years, I’d say that was mostly true, though it seems less so today. Still, even today I don’t know of any Java GUI application that feels particularly elegant to use, or particularly »at home« on any host platform or operating system. (I should add that, coming from macOS, I’m probably quite prejudiced, or spoiled, if you want to call it that.)
And, of course, that isn’t to say that they aren’t useful. To give an example: I’ve spent some time working with JetBrains’ line of IDEs, which are (to my knowledge) written entirely in Java without using native UI components, and they are really impressive and powerful tools. (While they clearly put a great deal of work into building good user interfaces, the IDEs are still so cluttered that I don’t enjoy using them as much as, say, Sublime Text or Atom, even though I’m aware that IntelliJ IDEA eats both of these for breakfast when it comes to features.)
Qt and GTK+
Qt and GTK+ have similar issues, while Qt additionally used to have some problems with licensing and the question of who exactly »owns« or controls that technology. It looks like it’s fairly liberal to use now: as far as I understand, Qt is dual-licensed, available both under open-source licenses (GPL and LGPL) and under a commercial license.
All three (Java, Qt and GTK+) also seem to have another thing in common: their relative bloat, to put it disrespectfully. This may be an incorrect impression on my side, but from looking at real-world apps and code samples using these frameworks, they all seem somewhat heavy. They want to be as flexible and support as many features as possible (which makes complete sense), but this naturally comes at a price, namely that they are all quite large and complex. Too large, and too complex, for my taste.
At a glance, the APIs are huge, and it appears you’ll have to invest a very substantial amount of time before you can write anything beyond the most trivial »toy« or »demo« apps. As a consequence, you become somewhat locked into one framework or another; given the effort required, you’re unlikely to ever become equally proficient in several of them. It’s very possible that I’m just lazy (or even misguided or misled), but this prospect has made me very reluctant to seriously learn Qt, GTK+, Java GUIs, or anything similar. I guess I’m just afraid that once I do decide on one and dive into it, I may find out it’s not what I was looking for after all, but by then I’ll have spent months or maybe years, and I may not be able to turn back. I’ll be sitting in a truck so huge and heavy that I won’t know how to stop it anymore. (Is that a reasonable fear? I’d love to hear your feedback on this.)
The web browser as a GUI framework
The relatively recent possibility of using a browser engine as the foundation of a desktop GUI is very exciting. For one, it’s extremely approachable (particularly if you have experience building web applications) and flexible: basically, you can do anything that you can do on a web page, which I think no native GUI framework can directly compete with. It’s also probably the easiest way today to build a GUI app that runs as a standalone desktop application. But some significant drawbacks of this approach are immediately obvious:
- Size and memory requirements. Even a »Hello World« app in Electron requires running the browser shell and node.js, which takes up significant disk space and uses significant memory. Let’s look at some numbers, taking Atom on macOS as a popular example:
- The entire Atom app bundle takes about 260 MB (current release, 1.13.1).
- The Electron framework alone (which contains a complete Chromium browser engine) is just over 100 MB.
- Almost all of the remaining weight, namely about 150 MB (across ~5,600 files), resides in the Resources directory of the app bundle.
- Of this 150 MB, the node.js executable is about 23 MB in size, and the node_modules directory (which is where practically all of Atom’s functionality resides) contains the majority of the files (~5,300 in total; about 23 MB as well).
- For comparison, another popular text editor, Sublime Text 2 (release 2.0.2 on macOS), uses about 27 MB on disk, which is roughly one-tenth of Atom, or about the size of just node.js on its own.
- You may not care about any of these numbers, and on most of today’s hardware they probably don’t matter. I’m less sure about memory usage, but I haven’t measured it.
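A quick back-of-the-envelope check of the figures quoted above, using only those rounded numbers (nothing new was measured here):

```python
# Sizes in MB, as quoted above (Atom 1.13.1 and Sublime Text 2.0.2 on macOS).
atom_total = 260
electron_framework = 100
resources = 150
node_binary = 23
sublime_text = 27

# Share of the Atom bundle taken up by the Electron framework plus Resources.
bundled_share = (electron_framework + resources) / atom_total
# Sublime Text compared with the whole Atom bundle and with node.js alone.
sublime_vs_atom = sublime_text / atom_total
sublime_vs_node = sublime_text / node_binary

print(f"Electron + Resources: {bundled_share:.0%} of the bundle")  # 96%
print(f"Sublime Text / Atom: {sublime_vs_atom:.2f}")               # 0.10
print(f"Sublime Text / node.js: {sublime_vs_node:.2f}")            # 1.17
```

So »roughly one-tenth« and »about the size of node.js« both hold up.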
Is there another way?
Having considered all of this, I’ve been thinking about how else to approach writing GUI apps. To set the stage, this is what my ideal solution would look like:
- It is truly platform-agnostic, i.e. it makes no assumptions about what hardware, operating system, etc. it runs on. Getting it to work on a new platform should be relatively little work.
- It is independent, i.e. it’s not controlled by a corporation, or by any other body, in a way that lets it coerce you into doing things that serve its own interests more than those of its users or developers. This probably implies that it would have to be open-source.
- It hits some kind of sweet spot in the trade-off between performance, flexibility, feature-richness, complexity and approachability (i.e. ease of use). It seems you can’t have a solution that is maximally performant, flexible and approachable, has all the features you could think of, and is minimally complex, all at the same time. Acknowledging this, I’d immediately sacrifice feature-richness. Reducing features to a minimum would get rid of complexity and, consequently, probably improve both approachability and performance. It would not necessarily affect flexibility.
- It’s compact and fast. Did I mention that I don’t like bloat? I’d like my GUI layer to be as lightweight (in terms of memory usage and executable size) and performant as possible, while still being pragmatic. I do actually want to build some useful real-world apps with it, so there is a lower bound to the minimalism, and one aspect of this experiment would be to find out exactly where that bound lies. (All of this is already implied in the previous point, but I’d like to stress it.)
- It’s easy to use. You should be able to set it up and produce a working app in a few lines of code. Again, this is implied in the point above, but needs emphasis.
- It allows me to keep my code closed. While I’m all for the ideals of free and open software, I don’t want to be forced to open-source my app, or to give it away for free. I think it’s important to be able to offer a product and get paid for it, in whatever appropriate way. People have to pay bills, and it’s naive to expect that someone will happily hop on and sponsor your work. However, this point refers to applications written with the GUI framework in question, not the framework itself. The framework itself should be free and open-source according to generally accepted standards.
An approach to a minimal GUI layer …
I believe that usable real-world GUIs can be built from a very small set of very simple building blocks, particularly if they can be combined or nested in suitable ways.
For instance, imagine a GUI framework that offers only four primitives:
- some notion of a container (for positioning, i.e. layout),
- a static text label,
- and two types of controls, namely
- a text-input field,
- and some notion of a clickable surface, i.e. a button or switch.
You could probably implement the large majority of existing GUIs using just these four things, and they would not look or behave drastically differently. Most of the controls that are more complex than this could simply be implemented as nested combinations of the above. For example, a list of clickable items (such as a menu, select or similar controls) could be made from a container full of clickable surfaces.
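To make this a bit more concrete, here is a minimal sketch of the four primitives, with a menu built purely from a container of clickable surfaces. All class and function names are hypothetical, positions are plain integers in the parent’s coordinate space, and nothing is drawn; it only shows how little structure the composition needs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Widget:
    """Shared geometry: a rectangle in the parent's coordinate space."""
    x: int = 0
    y: int = 0
    w: int = 0
    h: int = 0


@dataclass
class Container(Widget):
    """Primitive 1: positions (lays out) its children."""
    children: List[Widget] = field(default_factory=list)

    def add(self, child: Widget) -> Widget:
        self.children.append(child)
        return child


@dataclass
class Label(Widget):
    """Primitive 2: static text."""
    text: str = ""


@dataclass
class TextInput(Widget):
    """Primitive 3: editable text."""
    value: str = ""

    def type_text(self, s: str) -> None:
        self.value += s


@dataclass
class Clickable(Widget):
    """Primitive 4: a surface that reacts to clicks (button, switch, menu item...)."""
    caption: str = ""
    on_click: Optional[Callable[[], None]] = None

    def click(self) -> None:
        if self.on_click is not None:
            self.on_click()


def make_menu(items: List[str], on_select: Callable[[str], None],
              item_height: int = 20, width: int = 120) -> Container:
    """Not a new primitive: a menu is a container of stacked clickable surfaces."""
    menu = Container(w=width, h=item_height * len(items))
    for i, caption in enumerate(items):
        # The default argument binds each caption at creation time (avoids late binding).
        menu.add(Clickable(x=0, y=i * item_height, w=width, h=item_height,
                           caption=caption, on_click=lambda c=caption: on_select(c)))
    return menu


chosen: List[str] = []
menu = make_menu(["Open", "Save", "Quit"], on_select=chosen.append)
menu.children[1].click()  # simulate clicking the second item
print(chosen)  # ['Save']
```

A select box, a tab bar or a toolbar would be the same pattern with a different arrangement of children.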
(You could even get rid of the static text label as a separate primitive, and just define a surface that has some kind of displayable content and can optionally react to certain kinds of events. That is essentially the concept of a UI view in object-oriented UI frameworks such as macOS’s AppKit.)
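A sketch of that single-primitive variant, again hypothetical and without any drawing: one View class with optional content and optional event handlers, plus the two operations such a framework needs, namely finding the view under the pointer (hit-testing) and passing an unhandled event on to the parent (bubbling). Treating the last-added child as the topmost one is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional


class View:
    """One primitive for everything: a rectangle with optional content and optional
    event handlers. Labels, buttons and containers are just configurations of it."""

    def __init__(self, x, y, w, h, content=None, handlers=None):
        self.x, self.y, self.w, self.h = x, y, w, h
        self.content = content                      # e.g. text to draw; None = nothing
        self.handlers: Dict[str, Callable] = handlers or {}
        self.children: List["View"] = []
        self.parent: Optional["View"] = None

    def add(self, child: "View") -> "View":
        child.parent = self
        self.children.append(child)
        return child

    def contains(self, px, py) -> bool:
        # (px, py) are given in the parent's coordinate space.
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def hit_test(self, px, py) -> Optional["View"]:
        """Return the deepest view under the point, or None."""
        if not self.contains(px, py):
            return None
        lx, ly = px - self.x, py - self.y           # convert to local coordinates
        for child in reversed(self.children):       # last-added child is on top
            hit = child.hit_test(lx, ly)
            if hit is not None:
                return hit
        return self

    def dispatch(self, event: str) -> Optional["View"]:
        """Bubble the event up from this view until some view handles it."""
        view = self
        while view is not None:
            handler = view.handlers.get(event)
            if handler is not None:
                handler()
                return view
            view = view.parent
        return None


log = []
root = View(0, 0, 200, 100)
label = root.add(View(10, 10, 80, 20, content="Name:"))
button = root.add(View(100, 10, 60, 20, content="OK",
                       handlers={"click": lambda: log.append("OK clicked")}))
root.hit_test(110, 15).dispatch("click")   # lands on the button
print(log)  # ['OK clicked']
```

A »label« here is simply a view that has content but no handlers; a click on it bubbles up and, in this example, goes unhandled.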
… and implementing it
How would I go about implementing such an idea? I still need some kind of more fundamental framework or library that will handle the most low-level tasks for me, while abstracting them from the underlying platform.
At the moment, I am considering the following setup:
- The SDL library (Simple DirectMedia Layer) as a foundation that supplies me with events and gives me screen surfaces to draw on. This library is mainly used in cross-platform games, and not just »toy« games, but big ones! I’m not aware of it being used regularly to build GUI apps, but I don’t see a reason why it shouldn’t be. It is quite low-level, which I see as an advantage, because, from what I can tell, the low overhead makes it extremely efficient (i.e. fast). (Also, while it’s most often used with C or C++, there are bindings for many languages, including languages such as Go that free you from manual memory management while still compiling to very fast code.)
- The Cairo library for 2D drawing. SDL only has very primitive drawing routines, which is deliberate, as actual drawing is mostly outside of its scope. Cairo offers all the drawing calls you’d realistically ever need, but it’s still relatively compact, approachable, and, from what I gather, also very fast. It’s quite popular as well; many applications and frameworks use it for 2D drawing under the hood. There is explicit support for using SDL together with Cairo. In addition, there is a lot of overlap between Cairo and SVG (Scalable Vector Graphics), which is extremely welcome, as SVG is an excellent open, cross-platform, text-based format for 2D vector graphics. Having the option to easily translate Cairo drawing calls to SVG and back is sure to become useful.
- The Pango library for rendering text. Pango in turn uses HarfBuzz, a text-shaping engine that turns a sequence of characters into correctly positioned glyphs. Rendering text is extremely demanding if you want to support even just a fraction of the writing systems used in the world today. I don’t even want to think about getting into all the intricacies involved, so it’s wonderful that there is a library that abstracts this away. Cairo explicitly encourages using Pango for text rendering, as its own text capabilities are limited. (So SDL plays nice with Cairo, and Cairo plays nice with Pango; it would seem that all the really painful low-level stuff is taken care of.)
- For apps that need it, 3D rendering would be handled using OpenGL. SDL is designed to work hand in hand with OpenGL (it can create OpenGL contexts for its windows), as it’s mainly used for games, and from what I gather the two work together very well.
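As an illustration of the Cairo/SVG overlap: Cairo’s path operations (cairo_move_to, cairo_line_to, cairo_curve_to and cairo_close_path, exposed as methods of the same names in the pycairo bindings) correspond directly to the SVG path commands M, L, C and Z. Cairo can already write whole SVG documents through its SVG surface; the hypothetical PathRecorder below only mimics those method names in plain Python to show the mapping, without depending on Cairo.

```python
class PathRecorder:
    """Hypothetical stand-in for a Cairo context: records path operations
    (named after Cairo's move_to/line_to/curve_to/close_path) as SVG path data."""

    def __init__(self):
        self._cmds = []

    def move_to(self, x, y):
        self._cmds.append(f"M {x:g} {y:g}")

    def line_to(self, x, y):
        self._cmds.append(f"L {x:g} {y:g}")

    def curve_to(self, x1, y1, x2, y2, x3, y3):
        # Cairo's curve_to is a cubic Bezier curve, just like SVG's "C" command.
        self._cmds.append(f"C {x1:g} {y1:g} {x2:g} {y2:g} {x3:g} {y3:g}")

    def close_path(self):
        self._cmds.append("Z")

    def svg_path(self) -> str:
        return " ".join(self._cmds)


def button_outline(ctx, x, y, w, h):
    """Outline of a button; the same calls would also work on a pycairo Context."""
    ctx.move_to(x, y)
    ctx.line_to(x + w, y)
    ctx.line_to(x + w, y + h)
    ctx.line_to(x, y + h)
    ctx.close_path()


rec = PathRecorder()
button_outline(rec, 10, 10, 80, 24)
print(f'<path d="{rec.svg_path()}" fill="none" stroke="black"/>')
# <path d="M 10 10 L 90 10 L 90 34 L 10 34 Z" fill="none" stroke="black"/>
```

Because the drawing function only relies on the method names, it doesn’t care whether it is drawing to the screen or producing SVG.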
All of these libraries are open, free and independent in almost every sense, and while they offer everything within their scope that you might need, they are manageable enough that I believe it’s possible to learn and understand their respective APIs completely within a reasonable time. (This is the advantage of using libraries that each do one thing, instead of one that tries to do everything at once.)
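One way the layering could look in code: the GUI layer talks only to a small backend interface, and SDL (or anything else) sits behind it. This is a sketch under assumptions; the Backend protocol, the event names and HeadlessBackend are all hypothetical. An SDL-based backend would presumably wrap calls such as SDL_PollEvent, SDL_FillRect and SDL_UpdateWindowSurface, while the headless one below replays scripted events, which also makes the UI logic testable without a screen.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple

Rect = Tuple[int, int, int, int]   # x, y, width, height
Color = Tuple[int, int, int]       # r, g, b


@dataclass
class Event:
    kind: str          # hypothetical event names: "click", "quit", ...
    x: int = 0
    y: int = 0


class Backend(Protocol):
    """Everything the GUI layer is allowed to know about the platform."""

    def poll_events(self) -> List[Event]: ...
    def fill_rect(self, rect: Rect, color: Color) -> None: ...
    def present(self) -> None: ...


class HeadlessBackend:
    """Platform-free backend: replays scripted frames of events, records drawing."""

    def __init__(self, scripted: List[List[Event]]):
        self._frames = list(scripted)
        self.drawn: List[Tuple[Rect, Color]] = []
        self.frames_presented = 0

    def poll_events(self) -> List[Event]:
        # Once the script runs out, ask the app to quit.
        return self._frames.pop(0) if self._frames else [Event("quit")]

    def fill_rect(self, rect: Rect, color: Color) -> None:
        self.drawn.append((rect, color))

    def present(self) -> None:
        self.frames_presented += 1


def run(backend: Backend, button: Rect) -> int:
    """Minimal platform-agnostic main loop: counts clicks inside `button`."""
    clicks = 0
    bx, by, bw, bh = button
    while True:
        for ev in backend.poll_events():
            if ev.kind == "quit":
                return clicks
            if ev.kind == "click" and bx <= ev.x < bx + bw and by <= ev.y < by + bh:
                clicks += 1
        # Redraw: the button turns blue once it has been clicked at least once.
        color = (0, 120, 255) if clicks else (200, 200, 200)
        backend.fill_rect(button, color)
        backend.present()


backend = HeadlessBackend([[Event("click", 15, 15)], [Event("click", 500, 500)]])
clicks = run(backend, (10, 10, 80, 24))
print(clicks, backend.frames_presented)  # 1 2
```

Porting to another platform would then, in principle, mean writing one more class with these three methods.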
Should we go deeper?
I don’t think it would make sense to go more low-level than this. You could write your own 2D drawing code instead of using something like Cairo, but SDL gives you very little to start from: its surfaces only offer raw pixel access and rectangle fills, and its 2D renderer adds little more than plain points, lines and rectangles. Everything beyond that (anti-aliasing, curves, line widths, filling arbitrary shapes) you’d have to implement yourself, which seems a bit crazy. Also, if you only wanted to support English text in the UI and were fine with very limited typographical control, you could do without Pango. You could even create your own simple bitmap (or even vector) font and draw text »manually«, forgoing Cairo as well, but then we’re seriously heading into crazyland.
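To give a sense of what drawing lines yourself on top of a bare set-pixel operation would involve, here is Bresenham’s classic integer line algorithm, drawing into a plain Python pixel grid that stands in for a raw SDL surface (an assumption for illustration; no SDL calls are involved). And these are only hard-edged, one-pixel lines: anti-aliasing, curves, widths and fills would all be extra work, which is exactly what Cairo already provides.

```python
def draw_line(set_pixel, x0, y0, x1, y1):
    """Bresenham's line algorithm: integer-only, works in all eight octants."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                  # combined error term for both axes
    while True:
        set_pixel(x0, y0)
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:               # step in x
            err += dy
            x0 += sx
        if e2 <= dx:               # step in y
            err += dx
            y0 += sy


W, H = 8, 4
pixels = [[0] * W for _ in range(H)]   # stand-in for a surface's pixel buffer


def set_pixel(x, y):
    if 0 <= x < W and 0 <= y < H:      # clip to the surface
        pixels[y][x] = 1


draw_line(set_pixel, 0, 0, 7, 3)
for row in pixels:
    print("".join("#" if p else "." for p in row))
# ##......
# ..##....
# ....##..
# ......##
```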
Additional limitations and things to consider
(This section was added after a reader pointed out some of these points to me on Twitter; thank you!)
First of all, though I think it follows from the above, I’d like to make it clear that I don’t intend to replicate the breadth of something like GTK+ or Qt. That would be outright silly and of course completely unrealistic.
On the contrary — the original thought that got me started on this was »What would be the minimal scope of a GUI layer, written from scratch, so that you can build real-world-useful applications with it?«.
The kinds of applications I have in mind would probably be single-window apps, by which I mean that they would be contained in a single window of the host system. It would still be possible to have a notion of »window« (such as tool palettes) within that window surface, using the aforementioned container primitive, which may overlap other containers.
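A rough sketch of such in-window »windows«: panels are kept in a list from bottom to top, a click goes to the topmost panel under the pointer, and clicking a panel raises it, much as a host window manager would. Panel and Desktop are hypothetical names, and the coordinates are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Panel:
    """An in-window 'window' (e.g. a tool palette): a container with a z-position."""
    name: str
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


class Desktop:
    """Keeps panels ordered bottom to top; the last panel is drawn on top."""

    def __init__(self) -> None:
        self.panels: List[Panel] = []

    def add(self, panel: Panel) -> None:
        self.panels.append(panel)

    def topmost_at(self, px: int, py: int) -> Optional[Panel]:
        for panel in reversed(self.panels):
            if panel.contains(px, py):
                return panel
        return None

    def click(self, px: int, py: int) -> Optional[Panel]:
        """A click goes to the topmost panel under the pointer and raises it."""
        panel = self.topmost_at(px, py)
        if panel is not None:
            self.panels.remove(panel)
            self.panels.append(panel)
        return panel


desk = Desktop()
desk.add(Panel("canvas", 0, 0, 400, 300))
desk.add(Panel("tools", 20, 20, 80, 200))
desk.add(Panel("colors", 60, 150, 120, 100))
desk.click(70, 160)   # tools and colors overlap here; colors is already on top
desk.click(30, 30)    # only tools (over the canvas) is here: tools comes to the front
print([p.name for p in desk.panels])  # ['canvas', 'colors', 'tools']
```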
Handling multiple windows entails dealing with the windowing system of the host platform, which would take me away from my goal of platform-agnosticism. SDL2 does support creating multiple windows at the host-system level, so it wouldn’t necessarily be a technical problem, but even then this would be a very low-priority goal, and I may even decide not to support it by design. It would make things needlessly complicated, and encourage clutter by making it possible for an app to have lots of separate windows (which I personally think is a very bad UI design choice*). I don’t have any application concepts in mind that could not be built within a single window surface of the host platform.
* The only situation I can think of where having multiple windows actually makes sense is in a document-based application (say, a text editor) where you want to look at several documents side-by-side. However, this can be emulated by allowing content areas to be split and arranged next to each other within the same window.
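Content areas split like this can be described as a tree of splits, where each split divides its rectangle between two children by a ratio. The following sketch (hypothetical names; "h" for side by side, "v" for stacked) computes the rectangle of every pane:

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Rect = Tuple[int, int, int, int]  # x, y, width, height


@dataclass
class Pane:
    name: str          # e.g. the document shown in this area


@dataclass
class Split:
    direction: str     # "h" = side by side, "v" = stacked
    ratio: float       # fraction of the space given to `first`
    first: "Node"
    second: "Node"


Node = Union[Pane, Split]


def layout(node: Node, rect: Rect) -> List[Tuple[str, Rect]]:
    """Recursively assign each pane a rectangle inside `rect`."""
    if isinstance(node, Pane):
        return [(node.name, rect)]
    x, y, w, h = rect
    if node.direction == "h":
        w1 = round(w * node.ratio)
        a, b = (x, y, w1, h), (x + w1, y, w - w1, h)
    else:
        h1 = round(h * node.ratio)
        a, b = (x, y, w, h1), (x, y + h1, w, h - h1)
    return layout(node.first, a) + layout(node.second, b)


tree = Split("h", 0.5, Pane("notes.txt"),
             Split("v", 0.25, Pane("todo.txt"), Pane("log.txt")))
for name, rect in layout(tree, (0, 0, 800, 600)):
    print(name, rect)
# notes.txt (0, 0, 400, 600)
# todo.txt (400, 0, 400, 150)
# log.txt (400, 150, 400, 450)
```

Giving the second child whatever remains (rather than rounding both halves) guarantees that the panes always fill the window exactly.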
Besides, complex document-based applications are not my design goal. I explicitly intend to keep things very simple. If you wanted to build the next OpenOffice or something like that, you’d go for a full-featured UI framework anyway. You’d need a wealth of other functionality that a simple system like the one I’m thinking of can’t, or doesn’t want to, offer. To reiterate, there’s no point in trying to imitate what’s already available in that area.
That said, a reader pointed out that my concept would be lacking in three important areas:
- localisation and internationalisation,
- scriptability, and
- accessibility.
I thought about these points (which are very valid), and I believe basic localisation and internationalisation would be relatively straightforward to achieve. Static text (labels) can be run through a translation map, and we can have locale-aware formatters for variable values such as dates and times, currencies, or numbers in general. As I’ll be using Pango to render text, I won’t have to deal with the complexities of displaying international text. At least I hope so.
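A minimal sketch of both ideas, assuming hand-written tables: a translation map with an English fallback, and a number formatter that swaps decimal and thousands separators per language. The language codes, strings and separator choices are illustrative assumptions; a real app would more likely use GNU gettext catalogues and the Unicode CLDR locale data than tables like these.

```python
# Hypothetical translation map: keys are the English source strings used in the UI.
TRANSLATIONS = {
    "de": {"Save": "Speichern", "Cancel": "Abbrechen", "{n} files": "{n} Dateien"},
    "fr": {"Save": "Enregistrer", "Cancel": "Annuler", "{n} files": "{n} fichiers"},
}

# Hypothetical per-language number conventions: (decimal sep., thousands sep.).
NUMBER_FORMATS = {"en": (".", ","), "de": (",", "."), "fr": (",", "\u202f")}


def tr(text: str, lang: str, **params) -> str:
    """Look up a label; fall back to the English source string if untranslated."""
    template = TRANSLATIONS.get(lang, {}).get(text, text)
    return template.format(**params)


def format_number(value: float, lang: str, decimals: int = 2) -> str:
    """Format a number with the language's decimal and thousands separators."""
    dec, thou = NUMBER_FORMATS.get(lang, NUMBER_FORMATS["en"])
    s = f"{value:,.{decimals}f}"          # English style first, e.g. "1,234,567.89"
    # Swap via a placeholder so "," and "." don't overwrite each other.
    return s.replace(",", "\0").replace(".", dec).replace("\0", thou)


print(tr("Save", "de"), tr("{n} files", "fr", n=3))  # Speichern 3 fichiers
print(format_number(1234567.891, "de"))              # 1.234.567,89
```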
On to scriptability. One way to achieve this would be to design the application so that the UI calls into some kind of internal API or library that implements the actual functionality. This would be advantageous in other ways, too, as we could then decouple this API from the application and expose it, so that it can be used via any other (non-graphical) interface, such as a command line. I don’t think it makes a lot of sense to make the GUI itself scriptable. The point of scriptability is to automate things, and »remote-controlling« parts of the GUI would be a needlessly roundabout and inefficient way to do that.
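Here is roughly what that separation might look like, using a hypothetical to-do app: all functionality lives in a core class, the GUI’s button handler merely forwards to it, and a command-line front end built with Python’s standard argparse module drives the very same core. TodoCore, on_add_button_clicked and cli are illustrative names only.

```python
import argparse
from typing import List, Optional


class TodoCore:
    """Hypothetical application core: all functionality, no UI code at all."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def add(self, text: str) -> int:
        self.items.append(text)
        return len(self.items)

    def remove(self, index: int) -> str:
        return self.items.pop(index)


def on_add_button_clicked(core: TodoCore, text_field_value: str) -> None:
    """What a GUI button handler would do: forward to the core, nothing more."""
    core.add(text_field_value)


def cli(core: TodoCore, argv: Optional[List[str]] = None) -> None:
    """The same core, driven from a shell or a script instead of the GUI."""
    parser = argparse.ArgumentParser(prog="todo")
    sub = parser.add_subparsers(dest="command", required=True)
    add_cmd = sub.add_parser("add")
    add_cmd.add_argument("text")
    rm_cmd = sub.add_parser("remove")
    rm_cmd.add_argument("index", type=int)
    args = parser.parse_args(argv)
    if args.command == "add":
        core.add(args.text)
    else:
        core.remove(args.index)


core = TodoCore()
on_add_button_clicked(core, "Buy milk")     # via the GUI
cli(core, ["add", "Write blog post"])       # via the command line or a script
cli(core, ["remove", "0"])
print(core.items)  # ['Write blog post']
```

Because the GUI holds no logic of its own, anything it can do, a script can do too, without simulating clicks.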
Accessibility is a tough one, and I don’t have an idea how to support it directly. On the other hand, if we have scriptability, then we can get accessibility too: a user with accessibility requirements could use the application via its scripting API, which could be wrapped in something that makes use of the host platform’s accessibility support. However, if accessibility is a high priority, it makes more sense to use a full-featured UI framework that already supports it. (Again, I won’t be competing with what these frameworks offer. Even if that were realistically possible, I’d end up recreating something that resembles GTK+ or Qt, including all of their complexity. If that were where I was heading, I would be using GTK+ or Qt in the first place. You get the point.)
So far, so good
The above setup, I believe, would be a solid foundation to build a simple UI layer on, using just the primitives I mentioned. I imagine this would already take you quite a long way. Of course you’d be very far removed from the sophistication of GTK+, Qt, or even the macOS AppKit, but it would be a very interesting starting point.
If it turns out to be workable, then I can see lots of incentive to turn this into a framework to build apps with, always keeping in mind that the learning curve should be low.
As for actually building a GUI app using this setup: no, I haven’t written a single line of code yet. Right now, it’s just an idea that I’m exploring.
I’d love to hear your thoughts, and thanks for reading!