Intro

Hey everybody, I’ve been having a bunch of private conversations about what comes next after the most recent server move, so I wanted to take some time and put my engineering plan to paper. I’m a big fan of doing instead of saying, so I don’t typically like to make promises; my personal development time is pretty limited due to the real world and parenting, but E2 continues to be a priority for me. That said, it has been helpful to lay out the order of operations for where to go next and which issues need to be overcome. The good news is that none of this is technically hard to do; it is just time-consuming and a little bit finicky as I perform surgery on the codebase to begin to build it up.

This writeup explains where we are at this point in time, lays out my current thoughts on necessary improvements, and gives some of my technical opinions. I’m totally willing to be wrong, so if you come from a tech background, please feel free to disagree or make a pull request. We are open source and will continue to be for the foreseeable future. This is also part of our disaster recovery plan, so that the site will continue to evolve and be in good hands should something happen to me. Don’t worry, I’m not sick and don’t have any major problems other than potty training a 2.5 year old, but it is a strong consideration in any mature business or community.

The Web Then

As a bit of a history lesson, E2 has a long and unique history as a test bed for a novel product. It was originally nate and co’s brainchild, and while the engine itself never took off as a product, a few sites that adopted it are still alive, namely us and PerlMonks. The code has diverged pretty heavily over the years, and while a few small reference touchpoints remain, there isn’t much shared code left. The Everything Engine on which this was based is a combination of a CMS (Content Management System) and a coding framework.

Since then, better versions of both have emerged. In the CMS world there are options such as WordPress, and web frameworks are everywhere, including Ruby on Rails and even perl-based ones such as Dancer, Poet, or Catalyst. These are interesting projects and we may join a more standard framework one day, but any dreams of this engine doing anything other than serving the needs of this community are thoroughly dead.

The engine itself relies on some of the behavior inside of mod_perl, a legacy way of using perl’s memory management internals to optimize page delivery. It avoided database calls by keeping many objects resident in process memory, using it like a cache so that startup work didn’t have to happen on every request. In modern times, applications have moved away from these kinds of quirky patterns toward more robust caching tiers, such as memcached. We’re still tied to mod_perl for the time being, and it holds us to maintaining Linux servers to get this done.

Also, the engine relied on calling eval() on the code. A lot. The code for the application is stored inside of the database, and while this was novel at the time, it is a structurally awful and incredibly insecure design. It was computationally expensive, and it prevented the kind of caching of compiled code that could keep the server running fast. It meant that you could change the tires on the car while it was still running, but in reality, who the hell needs to do that? This behavior metastasized into other bad patterns, such as the “patching” construct that well-intentioned coders spun up to deal with code changes, which will at some point need to be unwound.
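
To make that pattern concrete, here’s a minimal sketch of the kind of thing being described; the table and column names here are hypothetical, not the engine’s actual schema:

    # Hypothetical illustration of code-in-the-database: fetch a
    # stored snippet and string-eval it on every request.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:everything', 'e2app', 'secret')
        or die $DBI::errstr;

    my ($code) = $dbh->selectrow_array(
        'SELECT code FROM htmlcode WHERE title = ?', undef, 'showchatter');

    # String eval: recompiled on every call, and whatever happens to be
    # in that database row runs with full application privileges.
    my $html = eval $code;
    die $@ if $@;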

Lastly, the site has an archaic model of inheritance which allowed for simplistic patterns in how the data relationships were held, but it made it rather hard to change or create more complex data types, and playing with the database structures was fraught, with zero rollback path. This means it was abused in subtle ways, and that data model needs to be cleaned up. Features all work a bit differently and follow different data storage religions.

The Web Now

The web has changed! It’s a mobile-first world, and E2 performs poorly on mobile. There are a few reasons for this, none of which is anybody’s fault. The core display engine revolves around the concept of nodelets: customizable page sub-sections full of assorted information, which means everyone gets their own page setup. There are something like 30 of them, and as a design tenet it means the page doesn’t hide away information that people use infrequently.

So for instance, take the Epicenter, which pretty much everybody uses. On a modern content creation or manipulation website such as Github or Google Docs (where I am writing this), those options would be out of the way and dynamically presented to the user when needed. On E2 we’re always displaying it: under mobile it eats scarce screen space, and under the desktop version it just crowds the right-hand side. It’s not great design, and while it is customizable, the very concept of customizable UIs has gone by the wayside for the sake of simplicity.

The modern web allows people to make small changes to a page without pulling down a lot more information. These highly dynamic sites have a few different manifestations, but in general just about everything these days is done through an API, which makes the externally facing functionality easier to test and easier to code for both desktop and mobile. E2 has some of these concepts today; for instance, the chatterbox and other nodelets update dynamically. They do this, however, by fetching freshly rendered HTML blocks from the server and jamming them into the page. It’s good for now, but it makes mobile difficult to handle because of the poor separation between the display and the display generator.
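
To illustrate the separation problem on the server side (a rough sketch; these subroutine names are made up, not E2 internals):

    use strict;
    use warnings;
    use JSON::PP qw(encode_json);

    # Today's style, roughly: render an HTML fragment server-side and
    # let the client jam it into the page wholesale.
    sub chatter_as_html {
        my ($messages) = @_;
        return join '', map { "<p>$_->{author}: $_->{text}</p>" } @$messages;
    }

    # API style: return structured data and let each client (desktop,
    # mobile, native app) decide how to render it.
    sub chatter_as_json {
        my ($messages) = @_;
        return encode_json({ messages => $messages });
    }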

The business of E2

The mobile issue described above impacts the core business of E2. Google, which is E2’s main driver of new traffic, penalizes the site for not being good on mobile. While I’d love to plead with them to give us a break since we are an older site, I get it, and these steps are meant to correct those problems and keep us on the path of sustainability. Basically, the site needs constant engineering to stay on top of platform and industry changes; given the way Google’s search is designed, a site that doesn’t keep up will be swept into the dustbin of history.

That dire warning aside, E2 is pretty cheap to maintain now that everything is optimized and clean. It still pays for itself, and costs about $4/day to run, funded by Google ads shown to non-logged-in users. That’s not a high revenue number to hit. The corporation is not a non-profit, so while the occasional unsolicited offer of money to throw into the pot is wonderful, we’re in good shape and we don’t need to hold out our hats. There’s enough cash in the bank to ride out problems for years at that burn rate, so there would be time to course-correct.

The order of operations to the future

This is going to be broken into a few tracks based on where we are and which way is forward. E2 is nothing if not a complicated jigsaw puzzle that needs to be converted over to a modern way of doing things.

Track 1: Cloud

Cloud means a lot of different things, but for us at E2 it means reducing the number of things involved in running servers so that we can focus on the application. Porting things to the cloud is kind of its own treatise, but it means no longer thinking in terms of application servers that we run and manage ourselves, leaving that layer to the cloud provider instead. As has been stated before, we are moving the application further down the path of being a serverless application, but this is a big change because of the way we are shackled to mod_perl. This track and the application track are going to intermingle and feed each other as things go on.

One of the core challenges of the cloud is in the instrumentation of the application. There isn’t much QA or automated testing available, and there isn’t a lot of cloud-provided insight when something doesn’t work.

Things in this bucket:

Crash reporting (Top Priority) - Right now, when you see a “Hamster Error” or a “Server Error!”, those come from separate problem paths, and it is hard to be alerted that something broke. Most errors come in through the user community and get written to the log in a very hard-to-parse format. The log gives me a stacktrace and some other information about who is doing what on the page, but the process still amounts to occasionally scanning the logs for certain strings and parsing them out to figure out which error to go and kill. I need to develop a cloud-based way (likely an Amazon SQS construct) to pull those logs off of the machine and deal with them.
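
As a sketch of what that could look like with the Paws AWS SDK (the queue URL and report fields here are invented for illustration):

    use strict;
    use warnings;
    use Paws;
    use JSON::PP qw(encode_json);

    my $sqs = Paws->service('SQS', region => 'us-east-1');

    # Hypothetical helper: push a structured crash report onto a queue
    # instead of leaving it buried in a server-local log file.
    sub report_crash {
        my (%report) = @_;
        $sqs->SendMessage(
            QueueUrl    => 'https://sqs.us-east-1.amazonaws.com/123456789012/e2-crash-reports',
            MessageBody => encode_json({ %report, timestamp => time }),
        );
    }

    report_crash(error => 'Hamster Error', node_id => 124, user => 'root');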

Secretless configs (This year) - The Amazon keys for several internal resources are not obtained via the IAM instance role, so there’s a lingering potential security problem with that setup. These need to become ephemeral credentials, including the database and Amazon SES sender passwords.
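
For instance (a sketch, assuming we stay on Paws), an IAM instance role lets the SDK pull short-lived credentials from instance metadata, so nothing secret has to live in a config file:

    use strict;
    use warnings;
    use Paws;
    use Paws::Credential::InstanceProfile;

    # Credentials come from the EC2 instance role via the metadata
    # service; they rotate automatically and never touch disk.
    my $paws = Paws->new(config => {
        credentials => Paws::Credential::InstanceProfile->new,
    });

    my $ses = $paws->service('SES', region => 'us-east-1');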

Log parsing (This year) - Logs from our Apache2 instances are currently rotated off to an S3 bucket, where they kind of just sit until they get rotated out to Glacier. It’s not a particularly smart way to do it, but it was the infrastructure (not cloud) way of handling things, and it prevented disk fill-up. Once crash reporting is done, much of the application logging won’t be as big of a deal and will become normal traffic that I can trend on, including important things like 404 and 500 tracking, which we don’t do today.

Removal of the bastion server (Next year) - The bastion server is paid for the entire year, so before that subscription gets re-upped, its jobs should be moved to Amazon Lambda. This is a somewhat complicated task since Lambda doesn’t natively support perl, but there are a number of articles on how to create custom runtimes, and I get what needs to be done. Since there won’t be any more server to log into, crash reporting is going to be particularly key to how I debug problems in what will be an increasingly serverless swarm.
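
For the curious, a custom runtime boils down to a “bootstrap” program that polls Lambda’s documented runtime HTTP API in a loop. A minimal sketch in Perl (the handler logic is a placeholder; prebuilt community Perl runtime layers also exist):

    #!/usr/bin/env perl
    # Minimal sketch of a Lambda custom runtime "bootstrap" in Perl.
    use strict;
    use warnings;
    use HTTP::Tiny;
    use JSON::PP qw(decode_json encode_json);

    my $api  = $ENV{AWS_LAMBDA_RUNTIME_API};
    my $http = HTTP::Tiny->new;

    while (1) {
        # Long-poll for the next invocation.
        my $next = $http->get("http://$api/2018-06-01/runtime/invocation/next");
        my $id   = $next->{headers}{'lambda-runtime-aws-request-id'};

        my $result = handle(decode_json($next->{content}));

        $http->post(
            "http://$api/2018-06-01/runtime/invocation/$id/response",
            { content => encode_json($result) },
        );
    }

    # Placeholder for the real job logic.
    sub handle { my ($event) = @_; return { ok => 1 } }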

Push for serverless computing (The Future) - Once the application is in a good enough state that it doesn’t have prohibitive mod_perl-reliant caching needs inside an Amazon Lambda micro-VM, we can start to eliminate the webheads to save on cost and use serverless tech to make it happen. The whole topic is incredibly interesting stuff, and it should reduce our operating costs by about half. This is blocked on the application rewrites, as we need to make the application run as a stand-alone app server, and I don’t yet know how the current CGI.pm argument parsing interfaces with the Amazon API Gateway and Lambda functionality.
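
My working guess for bridging that gap (purely a sketch; none of this is written yet) is to rebuild a query string from the API Gateway proxy event and hand it to CGI.pm, which happily parses one directly:

    use strict;
    use warnings;
    use CGI;
    use URI::Escape qw(uri_escape);

    # Hypothetical glue: turn an API Gateway proxy event into a CGI
    # object so the legacy argument-parsing code keeps working.
    sub cgi_from_event {
        my ($event) = @_;
        my $params = $event->{queryStringParameters} // {};
        my $qs = join '&',
            map { uri_escape($_) . '=' . uri_escape($params->{$_}) }
            sort keys %$params;
        return CGI->new($qs);    # CGI.pm accepts a raw query string
    }

    my $q = cgi_from_event({ queryStringParameters => { node_id => 124 } });
    print $q->param('node_id'), "\n";    # 124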

Track 2: Application

Phase 1a: Object model conversion (Ongoing)

The entire object model of E2 is about halfway rewritten, with most of the tricky design and emulation pieces underway. Within the code exists an entirely separate execution path which utilizes real controllers in the MVC (Model-View-Controller) application paradigm. This creates a “strangler pattern” where I sweep through the older code blocks and convert them over. It’s going to mean an end to a lot of the traditional internal constructs like htmlcodes, and some of the more complicated display patterns such as show content are going to become templatized blocks which are easier to understand and modify.

There are a few outcomes from this, the first of which is that objects have properties that make modifying the backend storage very easy. So instead of relying on every little implementation calling castVote() correctly, I am dealing with $user->vote($writeup). Also, each node property such as title refers to some kind of storage, but with the indirection layer, that can be easily changed in the future. For instance, $node->{title} relied on the engine always building a hashref the same way, and finding every instance that touches the raw keys was really hard. With $node->title, it becomes a subroutine which returns the title from the backend store. It makes lazy-loading data easy: I don’t have to go and get “title” until I need it, and it can be stored in any table I choose: VARS, nodeParam, or the raw node table itself. It also allows super cool constructs like $writeup->parent->title and other very clean ways of validating data that goes into DB storage.

This is based on the ultra-cool power of Moose, which is a goofy name for a modern object system for Perl. It adds a fair amount of compile and execution time to the project, but that is largely offset by no longer having to parse and evaluate code over and over again.
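
Here’s a minimal sketch of the kind of lazy, storage-agnostic attribute described above (class and method names are illustrative, not E2’s actual ones):

    package E2::Node;
    use Moose;

    # "title" isn't fetched from the backing store until the first
    # time something calls $node->title.
    has title => (
        is      => 'rw',
        isa     => 'Str',
        lazy    => 1,
        builder => '_build_title',
    );

    sub _build_title {
        my ($self) = @_;
        # Could read from the node table, nodeParam, VARS, etc.;
        # callers never know or care where it lives.
        return $self->_fetch_from_storage('title');
    }

    # Placeholder for the real storage lookup.
    sub _fetch_from_storage { my ($self, $key) = @_; return 'Example title' }

    1;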

At the end of the day, it’s going to be simpler to debug, simpler to modify, and much easier to test so that I know that new sweeping changes can be made without breaking anything.

Phase 1b: Templates for everything! (Ongoing)

Right now the HTML for E2 is generated by a pile of different mechanisms, including the old container system, the nodelet blocks, various htmlcode subroutines, and the hundreds of documents lying around. Some of these documents were holders for pieces of functionality, especially administrative functionality like editing utilities, that aren’t part of the core. They directly muck with the database in kind of scary and potentially unsupported ways, and they need to be converted to real supported functions that have a layer of indirection and checking on things.

This means chopping these scripts into templates and having a controller system which cleanly feeds those templates the display information they need. This is a bit of an interim step, but the controllers will then have very clean paths to being ported to Javascript-based controllers in the future; the separation is actually the hard part. It also means going through them with a fine-toothed comb and throwing away the junk that we don’t need. If you visit a page like A year ago today or Supported HTML Tags, that’s the new system. It’s got a few very minor quirks that I’m polishing up, but it’s fully generated by actual templates doing the work.
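
As a sketch of the controller-plus-template split (this uses Template Toolkit syntax for illustration; the names are hypothetical, and E2’s actual template layer may differ):

    use strict;
    use warnings;
    use Template;

    # Controller side: assemble plain data, then hand it to a template.
    my $tt = Template->new(INCLUDE_PATH => 'templates') or die Template->error;

    my $vars = {
        title    => 'A year ago today',
        writeups => [ { title => 'Example writeup', url => '/node/124' } ],
    };

    $tt->process('year_ago_today.tt', $vars) or die $tt->error;

    # templates/year_ago_today.tt might look like:
    #   <h2>[% title %]</h2>
    #   [% FOREACH w IN writeups %]
    #     <p><a href="[% w.url %]">[% w.title %]</a></p>
    #   [% END %]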

Currently a lot of that code is still evaluated out of the database, so this work pulls it out, and while some of these pages are used infrequently enough that the eval cost isn’t a real performance detriment, the point is being able to atomically update the codebase with a single push from git. Updates are currently handled via a series of Ruby deployment scripts that call the AWS OpsWorks recipe functions, which refresh the servers and restart everything. It’s completely transparent to the user and takes about a minute to safely update everything.

Without the atomic updates, I have to write a bunch of emulation code to make sure that everything works or that the site knows about a/b states so that the database then gets updated to use different code. It’s easier to just work out of the git repo, and eventually when everything is just a proper controller, it’ll be much easier to publish features and issue bugfixes.

This will mean the eventual death of editing anything through the web interface other than certain administrative settings. Those current interfaces are more of a bug than a feature, but I don’t mean to take knobs away from anyone who has them today; they'll just be more streamlined.

Track 3: Interface (Upcoming)

Phase 1: Consolidate the Javascript

Currently there’s Javascript everywhere in the application. Previous cuts at functionality plugins had different Javascript libraries that got loaded and would make things work or not work. The major ones are the ajaxUpdate code and some of the editor stuff, but various pages have snippets that need to be brought back into civilization. We’re built on top of jQuery, so instead of having to signal to the page that it needs various javascript components to make it work, the goal is to get everything into one manageable Javascript file that gets versioned in the application. This has already been done for the most part, but there is a little more work to do in terms of pulling in the stray pieces and making sure that they don’t trigger on their mere presence, but on the presence of some preference flag or some other key to loading.

Phase 2: Modernize 3rd party JS

We’re on a very old version of jQuery, and it needs to be updated. There are various bridging libraries out there to help us do the right thing, but there are obvious incompatibilities in the way our JS is used today. This is a somewhat small task, but it’s very important; it will enable the modern libraries that accompany CSS frameworks like Bootstrap, which I think are great ways to get quick wins on looking good.

Phase 3: Convert Javascript to use updated APIs / eliminate ajaxupdate

Once everything is in one place, and before the last of the opcode stuff can be turned down, the ajaxupdate code needs to be moved over from wrapping individual opcodes to using the documented and versioned APIs that are now largely present. There is more testing work and some integration work that needs to happen before we go all-in on this framework, but it is largely in place and is used in several small testbeds within the application. This is a basic rewrite of a lot of the code, but once things are largely in templates, the default can be to refactor the remaining logic down into the javascript and pull it out of the templates where it resides. The potential future home for E2 is a React.js-based application, so this will set up a virtuous refactor pattern.

This also includes having a full API testing suite around them so that the system can detect breakage in the code before it gets shipped. Most modern apps do this, and while the current codebase has hundreds of tests to catch problems, the suite is still very nascent and the backlog is filled with reminders to write more tests.
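
The kind of test meant here is simple; a sketch against a hypothetical versioned endpoint (the URL and payload shape are made up):

    use strict;
    use warnings;
    use Test::More;
    use HTTP::Tiny;
    use JSON::PP qw(decode_json);

    # Smoke-test a versioned API endpoint on a local dev server.
    my $res = HTTP::Tiny->new->get('http://localhost:8080/api/v1/messages');
    ok $res->{success}, 'endpoint responds with a 2xx';

    my $payload = decode_json($res->{content});
    ok exists $payload->{messages}, 'payload includes a messages key';

    done_testing();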

Phase 4: Update CSS and utilize Bootstrap for mobile-first layout

I have been quietly paring down some of the customization features in E2 in order to make a single clean master layout. We might support alternate color options, like the very popular dark mode in macOS Mojave or whatever, but this will reduce the number of variable paths we have to account for in the upgrade. There are a lot of design principles here, but in general, we want the following:

  • No more nodelets - Tools are available on objects when necessary, and core features are well-understood parts of pages. The chatterbox is a unique design, but I look to Facebook Messenger, as well as other novel potential integrations such as Slack or Discord, for ways to help users communicate.
  • Use Bootstrap - Bootstrap’s basic features make for very feature-filled, mobile-first websites. Many sites, even professional ones, use it, so there’s no reason not to jump on that bandwagon.
  • Icons and glyphs instead of text flags - One of E2’s odd anachronisms is the IRC-like use of text flags for a number of things. For instance, we understand that C! means “cool”, but that’s ultimately kind of awful design. A number of free or inexpensive icon font libraries exist, including Font Awesome and Fontastic; you’ve seen them all over the place, perhaps without realizing they were commercial products.

Track 4: The Backend Database

Finally, the last bit of debt we are dragging around is the backend database. It’s the least impactful, because of some mechanisms that have been put in place around it, but it is still very hard to touch and very hard to commit changes to if there were ever sweeping feature changes to be done. While this work is still in the future, there are a few principles to follow:

  • Table joins should go away unless needed
  • VARS should be JSON (a sketch follows this list)
  • Params should also be JSON
  • Heck, maybe everything should be JSON
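
To illustrate the VARS point (the legacy delimited format shown here is approximate, from memory):

    use strict;
    use warnings;
    use JSON::PP qw(encode_json decode_json);

    # Old style, roughly: settings flattened into a delimited string
    # that only the engine knows how to unpack.
    my $legacy = 'numwriteups=25&theme=kernel_blue';

    # New style: the same data round-trips through JSON cleanly,
    # nests naturally, and any tool or language can read it.
    my $vars = { numwriteups => 25, theme => 'kernel_blue' };
    my $json = encode_json($vars);      # e.g. {"numwriteups":25,...}
    my $back = decode_json($json);      # plain hashref again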

Conclusion

Thanks for reading through this 8-page manifesto on the current state of the union here on the site. I wanted to answer the question of what comes before a new interface, and to be an open book about my thoughts and my priorities. They are certainly up for discussion, and there may be ways to accelerate the front-end fixes once a good deal of this work has been done; it’s not an all-or-nothing deal. Thank you to everyone for your continued contribution to the community; I hope to keep it maintained and well-run from a technical perspective for years to come.

--jaybonci