• Some things that I learned making GitLab Pages HTTPS-only

    This week, my most significant contribution yet to the GitLab project was merged into master.

    The contribution adds the ability for GitLab Pages websites to be served only over HTTPS connections, with plain HTTP requests 301-redirected to their secure counterpart.

    The changes required to make this work were not trivial, and actually required two separate patches to be merged: one in Ruby and a second in Go.

    Here’s a random sampling of some things that I learnt during the development process.

    GitLab is a fun project to hack on

    The GitLab application consists of a Rails monolith plus a number of secondary services, including a dedicated HTTP server for GitLab Pages which is written in Go.

    GitLab Pages websites are typically auto-generated by the GitLab CI/CD system. The artifacts of this process are static website assets, such as HTML files, which are served by the Pages server.

    Implementing HTTPS-only pages required two separate changes to the overall application.

    Firstly, the Rails monolith had to be altered to write additional metadata to a per-project configuration file, indicating whether the project’s Pages website should be forced to HTTPS or not. Supporting this were the user interface changes to allow a user to enable or disable the behaviour, and the usual layers of automated testing.

    Secondly, the Go HTTP server had to be updated to parse the newly-added configuration, and adapt its behaviour accordingly (that is, serve a 301-redirect when appropriate).
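
    The real Pages server is written in Go, but the redirect decision itself is small enough to sketch in a few lines of Ruby. This is only an illustration: the `https_only` key and the `pages_response` helper are assumptions made for the example, not the names used by the actual patches.

```ruby
require "json"

# Hypothetical per-project configuration, as the Rails app might write it.
config = JSON.parse('{"https_only": true}')

# Returns a Rack-style [status, headers, body] triple.
# Plain HTTP requests are redirected when the project is HTTPS-only;
# everything else falls through to normal static file serving.
def pages_response(config, scheme:, host:, path:)
  if config["https_only"] && scheme == "http"
    [301, { "Location" => "https://#{host}#{path}" }, []]
  else
    [200, {}, ["(static file contents)"]]
  end
end

status, headers, = pages_response(config, scheme: "http", host: "example.com", path: "/about")
puts "#{status} #{headers['Location']}"
```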

    I love the challenge of implementing a distributed feature across multiple technical domains. Being forced to consider the interaction between the services, error-handling, reliability and so on means that it’s possible to learn much more than the sum of the two individual pull requests.

    Additionally, GitLab’s code quality in general feels very good. The Rails app is written in a modern style, with heavy use of service objects and comprehensive automated testing, including a very slick parallelized CI flow.

    Finally, the GitLab community appears to be very professional and friendly, and contributing a significant feature was a very positive experience. Thanks guys!

    Go is a nice language

    Generally I distrust Google as a corporation, and this political bias had led me to be somewhat suspicious about Go as a language. This turns out to have been a fallacy, and it’s been great to dispel it.

    Go is a much nicer language than I unfairly expected it to be. In its syntax, basic types and use of pointers it immediately reminded me of C, which made me feel comfortable very quickly.

    I also found the Go toolchain to be very productive and enjoyable (although given Google’s investment in the ultra-slick Android toolchain this probably shouldn’t be a surprise). Being able to import modules directly from Git repositories, and auto-format code from the command-line are just two simple examples.

    This project helped me to understand why Go has become so popular, and when it might be considered the right tool to choose in the future.

    Parameterized RSpec tests are a great thing

    When writing RSpec tests, it is common to start with this kind of pattern:

    RSpec.describe "something" do
      before do
        send_request(path, "GET")
      end
    
      context "path is /" do
        let(:path) { "/" }
    
        it { is_successful }
      end
    
      context "path is /abc" do
        let(:path) { "/abc" }
    
        it { is_successful }
      end
    end

    This feels simple and reasonably readable, until a second contextual variable is introduced. Not only do we quickly see code-repetition being introduced, but the nesting also starts to feel unwieldy:

    RSpec.describe "something" do
      before do
        send_request(path, method)
      end
    
      context "path is /" do
        let(:path) { "/" }
    
        context "method is GET" do
          let(:method) { "GET" }
          it { is_successful }
        end
    
        context "method is POST" do
          let(:method) { "POST" }
          it { is_not_successful }
        end
      end
    
      context "path is /abc" do
        let(:path) { "/abc" }
    
        context "method is GET" do
          let(:method) { "GET" }
          it { is_successful }
        end
    
        context "method is POST" do
          let(:method) { "POST" }
          it { is_not_successful }
        end
      end
    end

    Introducing even more contextual levels exacerbates the problem further.

    Enter RSpec::Parameterized, which allows the same examples to be defined using a simple, flat table instead:

    RSpec.describe "something" do
      using RSpec::Parameterized::TableSyntax
    
      where(:path, :method, :success) do
        "/"    | "GET"  | true
        "/abc" | "GET"  | true
        "/"    | "POST" | false
        "/abc" | "POST" | false
      end
    
      with_them do
        before do
          send_request(path, method)
        end
    
        it "returns the expected response" do
          expect(response.success).to eq success
        end
      end
    end

    RSpec will now run each of the rows in the table as a separate example, with inputs corresponding to the column values.
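
    The underlying idea is ordinary table-driven testing, and it can be sketched in plain Ruby without the gem. This is not how RSpec::Parameterized is implemented; `send_request` here is a stand-in stub and `run_table` is a hypothetical helper, both invented for the illustration.

```ruby
# Stand-in for the code under test (an assumption): only GET succeeds.
def send_request(path, method)
  { path: path, success: method == "GET" }
end

# Each row is one example: the inputs, then the expected outcome.
ROWS = [
  # path    method  success
  ["/",    "GET",  true],
  ["/abc", "GET",  true],
  ["/",    "POST", false],
  ["/abc", "POST", false],
].freeze

# Expands every row into a separate check and records whether it passed.
def run_table(rows)
  rows.map do |path, method, expected|
    actual = send_request(path, method)[:success]
    { description: "#{method} #{path}", passed: actual == expected }
  end
end

run_table(ROWS).each do |result|
  puts "#{result[:passed] ? 'PASS' : 'FAIL'} #{result[:description]}"
end
```

    Adding another contextual variable means adding a column, rather than another level of nesting.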

    Although a downside of this approach is that debugging the specs can be more challenging, in test-heavy applications with a lot of contextual variants this feels like a useful option.

    Conclusions

    Implementing HTTPS-only GitLab Pages has been a great learning experience, and a lot of fun! I’d highly recommend anybody who wants to hack on a non-trivial, backend-centric Rails app to check out the GitLab Contributing page and get involved.

  • Launching Fargate instances from AWS Lambda

    Over the last few days, I’ve been experimenting with serverless web application development. This has included testing out Fargate, Amazon’s new managed container deployment service, and the more established Lambda and API Gateway services.

    The end result that I’ve been trying to achieve is to use Lambda to launch, via an API Gateway endpoint, a one-off asynchronous container execution on Fargate. So far, I’ve managed to put most of the jigsaw pieces together with only one major blocking experience.

    The problem was giving the Lambda execution role the requisite permissions to launch ECS instances (which can include Fargate instances) automatically. There are a couple of blog posts on the subject out there, and one in particular states that the Lambda role needs two policies: one that allows the ecs:RunTask action on the relevant resources, and another that allows the iam:PassRole action, so that the task execution role can be passed on to the ECS service itself.

    This seems reasonably clear, but after several hours of trying to make the Visual Editor apply the policies correctly I was still receiving permissions errors when Lambda tried to launch the Fargate container.

    User: arn:aws:sts::123456789012:assumed-role/my-lambda-func-role/myFunc is not authorized to perform: iam:PassRole on resource: arn:aws:iam::123456789012:role/ecsTaskExecutionRole
    

    The solution turned out to be very simple. Switching to the JSON view in the Visual Editor showed that the generated JSON was pretty far from what I was expecting. Directly editing the JSON, and pasting in the following policy statements, resolved the issue immediately.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
            "ecs:RunTask"
          ],
          "Resource": [
            "*"
          ]
        },
        {
          "Sid": "VisualEditor1",
          "Effect": "Allow",
          "Action": [
            "iam:PassRole"
          ],
          "Resource": [
            "arn:aws:iam::123456789012:role/ecsTaskExecutionRole"
          ]
        }
      ]
    }
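
    For context, here is a rough sketch of the kind of Lambda function that needs these permissions, written in Ruby against the AWS SDK's `Aws::ECS::Client#run_task` call. It is an illustration rather than the code I deployed: the environment variable names, the public IP setting and the `run_task_params` helper are all assumptions.

```ruby
# Builds the parameters for Aws::ECS::Client#run_task.
# Kept separate from the handler so that it can be checked without AWS access.
def run_task_params(cluster:, task_definition:, subnets:)
  {
    cluster: cluster,
    task_definition: task_definition,
    launch_type: "FARGATE",
    count: 1,
    network_configuration: {
      awsvpc_configuration: {
        subnets: subnets,
        assign_public_ip: "ENABLED", # assumption: tasks run in public subnets
      },
    },
  }
end

# Lambda entry point. The ECS_* environment variables are hypothetical names.
def handler(event:, context:)
  require "aws-sdk-ecs" # loaded lazily, only when running inside Lambda

  ecs = Aws::ECS::Client.new
  response = ecs.run_task(
    run_task_params(
      cluster: ENV.fetch("ECS_CLUSTER"),
      task_definition: ENV.fetch("ECS_TASK_DEFINITION"),
      subnets: ENV.fetch("ECS_SUBNETS").split(",")
    )
  )

  # Return as soon as the task is accepted; the container runs asynchronously.
  { statusCode: 202, body: response.tasks.map(&:task_arn).join(",") }
end
```

    The `run_task` call is where both statements matter: ecs:RunTask authorizes the launch itself, and iam:PassRole authorizes handing the task definition's execution role to ECS.
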
  • Adding a Jekyll helper script

    After a month or so of blogging with Jekyll, I’m very happy with the platform. In particular, having dived much deeper into its internal workings than I had before, I expect it to be able to support my pretty basic blogging needs well into the future.

    One minor annoyance has been the lack of an easy way to generate new post templates. This is an irritation because having to manually create a new file and manually fill in all the front matter values adds friction to a process that should be as fluid as possible.

    So today I’ve added a simple but useful script to my Jekyll repo, which allows a new post to be generated from the command line with minimal effort.

    It provides the following interface:

    Usage: new [options]
        -t, --title TITLE                Title for post
        -s, --slug SLUG                  Slug for post
        -l, --layout LAYOUT              Layout for post
        -c, --category CATEGORY          Add category for post (can be specified multiple times)
        -f, --force                      Overwrite existing file
    

    So this command:

    ./bin/new -t "Here's a new post" -s new-post-here -c tutorials -c random
    

    generates a new post with an appropriate date and the following front matter:

    ---
    title: Here's a new post
    slug: new-post-here
    layout: post
    categories: tutorials random
    date: '2018-01-30 00:00:00 +0000'
    ---
    
    Hello reader...
    

    I expect that this will make it even easier to post on this blog! If you think you might find it useful, it’s available on GitHub.
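
    For a flavour of how little code this takes, here is a minimal sketch of the option parsing and front matter generation using Ruby's standard OptionParser and YAML libraries. It is a simplified illustration rather than the script itself: the `parse_options` and `front_matter` helpers are names invented here, and writing the file (including the --force check) is left out.

```ruby
require "optparse"
require "yaml"

# Parses the command-line flags into a hash, defaulting the layout to "post".
def parse_options(argv)
  options = { layout: "post", categories: [], force: false }

  OptionParser.new do |opts|
    opts.banner = "Usage: new [options]"
    opts.on("-t", "--title TITLE", "Title for post") { |v| options[:title] = v }
    opts.on("-s", "--slug SLUG", "Slug for post") { |v| options[:slug] = v }
    opts.on("-l", "--layout LAYOUT", "Layout for post") { |v| options[:layout] = v }
    opts.on("-c", "--category CATEGORY", "Add category for post (can be specified multiple times)") do |v|
      options[:categories] << v
    end
    opts.on("-f", "--force", "Overwrite existing file") { options[:force] = true }
  end.parse!(argv)

  options
end

# Renders the front matter block, delimited by "---" lines.
def front_matter(options, date)
  data = {
    "title" => options[:title],
    "slug" => options[:slug],
    "layout" => options[:layout],
    "categories" => options[:categories].join(" "),
    "date" => date.strftime("%Y-%m-%d %H:%M:%S %z"),
  }
  YAML.dump(data) + "---\n"
end

options = parse_options(["-t", "Here's a new post", "-s", "new-post-here", "-c", "tutorials", "-c", "random"])
puts front_matter(options, Time.utc(2018, 1, 30))
```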

  • Using pg_isready to optimize test runs

    While working with Docker Compose to set up a containerized RSpec testing environment today, I ran into a problem.

    RSpec, running in one container, was intermittently trying to access the PostgreSQL database, located in a second container, before it had finished booting. Naturally, this was causing sporadic connection errors and test failures.

    The testing environment was distributed, with many discrete Docker Compose applications running in parallel. And PostgreSQL’s launch time varied considerably, sometimes being under a couple of seconds, but occasionally taking ten seconds or more.

    This meant that naively setting a timeout (sleep 5 && bundle exec rspec…) was an unsatisfactory approach. It would have increased the run time of every test environment, but without completely solving the problem.

    While researching the most appropriate solution, I discovered a useful tool. pg_isready allows a PostgreSQL database’s status to be queried quickly and simply from a script or the command line.

    Once I’d found it, using it in the RSpec container’s entrypoint script was simple:

    #!/bin/bash
    
    set -e
    echo "Launching Rails $RAILS_ENV environment with target: $*"
    
    until pg_isready -q -d my_database_name
    do
      echo "Waiting for database to be ready..."
      sleep 1
    done
    
    bundle exec rake db:reset
    bundle exec rspec "$@"
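
    One caveat is that the loop above will wait forever if the database never becomes ready. A variant with a deadline can be sketched in Ruby as follows; the `wait_until_ready` helper and its default values are assumptions made for the illustration, not part of my actual setup.

```ruby
# Polls the given block until it returns a truthy value or the timeout expires.
# Returns true if the check succeeded, false if the deadline passed first.
def wait_until_ready(timeout: 30, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  until yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(interval)
  end

  true
end

# Example usage (needs pg_isready on the PATH):
#
#   ready = wait_until_ready { system("pg_isready", "-q", "-d", "my_database_name") }
#   abort("Database did not become ready in time") unless ready
```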
  • 2017 roundup part 3: routing_report

    A tiresome but essential aspect of managing large-scale applications is keeping track of cruft, succinctly defined by Wikipedia as

    anything that is left over, redundant and getting in the way.

    Ruby on Rails applications are naturally no exception to this principle. Let’s take two of the most elemental components of a Rails app: routes and actions.

    Rails will not explicitly warn you if a route is added that has no corresponding action, or vice versa. Instead it will simply allow your app to accumulate cruft. Although trying to access such an action or route will immediately throw a RoutingError or return a 404 response, this does not prevent the number of unnecessary routes and actions from increasing over time.

    A disciplined approach to development and careful code review can help to avoid this fate, but the larger an application gets, and the more developers work on it, the more likely it becomes that superfluous routes or actions will accumulate.

    This is where another gem that I released in 2017, routing_report, comes in.

    The gem adds a single Rake task to your Rails application. When you run the task, a report is generated that highlights both routes that have no reciprocal actions, and actions that have no corresponding routes.
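
    Conceptually, the report boils down to two set differences. The following sketch shows the idea on plain arrays of "controller#action" strings. It is a simplification and not the gem's actual code; the `routing_report` method here is a hypothetical helper.

```ruby
require "set"

# Compares the "controller#action" pairs referenced by the routes with the
# pairs actually defined on controllers, and reports the mismatches.
#
# In a Rails app, the inputs could plausibly be collected from
# Rails.application.routes.routes (via each route's defaults) and from each
# controller's action_methods; that collection step is omitted here.
def routing_report(routes, actions)
  routes = routes.to_set
  actions = actions.to_set

  {
    routes_without_actions: (routes - actions).sort,
    actions_without_routes: (actions - routes).sort,
  }
end

report = routing_report(
  %w[posts#index posts#show tags#new],
  %w[posts#index posts#show posts#recent]
)
puts report
```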

    To install the gem, just add it to your Gemfile:

    group :development do
      gem 'routing_report'
    end

    And discover unwanted routes and actions by running the Rake task:

    rake routing_report:run

    Here is the report for the popular open source Rails app GitLab:

    +------------------------------------------------------------------------------+
    | Routes without actions (18)                                                  |
    +------------------------------------------------------------------------------+
    | doorkeeper/token_info#show                                                   |
    | doorkeeper/tokens#create                                                     |
    | doorkeeper/tokens#revoke                                                     |
    | oauth/applications#edit                                                      |
    | oauth/applications#show                                                      |
    | oauth/authorizations#show                                                    |
    | projects/badges#index                                                        |
    | projects/boards#create                                                       |
    | projects/boards#destroy                                                      |
    | projects/boards#update                                                       |
    | projects/boards#update                                                       |
    | projects/branches#new                                                        |
    | projects/merge_requests#diff_for_path                                        |
    | projects/milestones#sort_issues                                              |
    | projects/milestones#sort_merge_requests                                      |
    | projects/services#index                                                      |
    | projects/tags#new                                                            |
    | snippets/notes#delete_attachment                                             |
    +------------------------------------------------------------------------------+
    
    +------------------------------------------------------------------------------+
    | Actions without routes (57)                                                  |
    +------------------------------------------------------------------------------+
    | application#not_found                                                        |
    | application#redirect_back_or_default                                         |
    | devise#_prefixes                                                             |
    | devise/confirmations#create                                                  |
    | devise/confirmations#new                                                     |
    | devise/confirmations#show                                                    |
    | devise/omniauth_callbacks#failure                                            |
    | devise/omniauth_callbacks#passthru                                           |
    | devise/passwords#create                                                      |
    | devise/passwords#edit                                                        |
    | devise/passwords#new                                                         |
    | devise/passwords#update                                                      |
    | devise/registrations#cancel                                                  |
    | devise/registrations#create                                                  |
    | devise/registrations#destroy                                                 |
    | devise/registrations#edit                                                    |
    | devise/registrations#new                                                     |
    | devise/registrations#update                                                  |
    | devise/sessions#create                                                       |
    | devise/sessions#destroy                                                      |
    | devise/sessions#new                                                          |
    | doorkeeper/applications#create                                               |
    | doorkeeper/applications#destroy                                              |
    | doorkeeper/applications#index                                                |
    | doorkeeper/applications#new                                                  |
    | doorkeeper/applications#update                                               |
    | doorkeeper/authorizations#create                                             |
    | doorkeeper/authorizations#destroy                                            |
    | doorkeeper/authorizations#new                                                |
    | doorkeeper/authorized_applications#destroy                                   |
    | doorkeeper/authorized_applications#index                                     |
    | omniauth_callbacks#authentiq                                                 |
    | omniauth_callbacks#cas3                                                      |
    | omniauth_callbacks#failure_message                                           |
    | omniauth_callbacks#ldap                                                      |
    | omniauth_callbacks#saml                                                      |
    | profiles/notifications#user_params                                           |
    | projects/branches#recent                                                     |
    | projects/git_http_client#actor                                               |
    | projects/git_http_client#authenticated_user                                  |
    | projects/git_http_client#authentication_abilities                            |
    | projects/git_http_client#authentication_result                               |
    | projects/git_http_client#redirected_path                                     |
    | projects/git_http_client#user                                                |
    | projects/merge_requests/conflicts#authorize_can_resolve_conflicts!           |
    | projects/network#assign_commit                                               |
    | projects/protected_refs#create                                               |
    | projects/protected_refs#destroy                                              |
    | projects/protected_refs#index                                                |
    | projects/protected_refs#show                                                 |
    | projects/protected_refs#update                                               |
    | sherlock/application#find_transaction                                        |
    | sherlock/file_samples#show                                                   |
    | sherlock/queries#show                                                        |
    | sherlock/transactions#destroy_all                                            |
    | sherlock/transactions#index                                                  |
    | sherlock/transactions#show                                                   |
    +------------------------------------------------------------------------------+
    
    

    The results aren’t foolproof yet: for example, gems like Devise and Doorkeeper that satisfy routes in unconventional ways tend to generate false positives. This would need to be improved before the gem could be used programmatically, for example as part of a continuous integration flow. (Pull requests are welcome, by the way!)

    But for manually highlighting redundant actions and dispensable routes the gem is already a useful tool.

    You can find it on GitHub here: https://github.com/rfwatson/routing_report.

  • 2017 roundup part 2: rack-filter-param

    Hot on the heels of thes is another Ruby gem: this time containing some Rack middleware.

    rack-filter-param solves a simple problem. As the name suggests, it allows HTTP parameters to be filtered from incoming requests based on arbitrary application logic.

    The middleware needs only the most minimal configuration. For example, here it is being configured to strip the client_id parameter from incoming requests, based on some assumed application-level logic:

    use Rack::FilterParam, {
      param: :client_id,
      if: -> (value) { should_I_do_it?(value) } 
    }

    When the application method should_I_do_it? returns a truthy value, the client_id parameter is stripped from the request, which is then passed upstream.
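
    To show the general shape of such a middleware, here is a stripped-down sketch that uses only the standard library. It is not the gem's implementation: the `StripParam` class and its `condition:` option are invented for this example, and it only looks at the query string, ignoring POST bodies.

```ruby
require "uri"

# Minimal Rack-style middleware that removes one query string parameter
# whenever the supplied condition returns a truthy value for it.
class StripParam
  def initialize(app, param:, condition:)
    @app = app
    @param = param.to_s
    @condition = condition
  end

  def call(env)
    pairs = URI.decode_www_form(env.fetch("QUERY_STRING", ""))
    kept = pairs.reject { |key, value| key == @param && @condition.call(value) }

    # Only rewrite the query string if something was actually removed.
    env = env.merge("QUERY_STRING" => URI.encode_www_form(kept)) if kept.size != pairs.size

    @app.call(env)
  end
end

# A trivial app that echoes the query string it receives.
app = ->(env) { [200, {}, [env["QUERY_STRING"]]] }
middleware = StripParam.new(app, param: :client_id, condition: ->(value) { value == "legacy" })

puts middleware.call("QUERY_STRING" => "client_id=legacy&page=2").last.first
```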

    It is a library born of a real-world refactoring problem. A legacy authentication library, which had been part of Mixlr’s API, was being replaced with a newer library. While doing so, it was discovered that a particular HTTP querystring parameter, hard-coded into numerous client apps, triggered some unwanted and buggy behaviour in the new library.

    Fixing the problem by pushing out updates to the client apps, and removing the parameter completely, wasn’t an option. That would have left the app broken for all users who didn’t or couldn’t update to the new version. And patching the new library felt wrong too, because it would only increase future maintenance burden and make it more difficult to keep the library up-to-date.

    The next step was to attempt to remove the parameter using Nginx, but the unwanted behaviour was only triggered when the parameter had certain values. In other words, the removal of the parameter depended on application logic, and application logic belongs in the application and not hidden in the front-end web server configuration.

    So the correct solution appeared to be to manipulate the request at the Rack middleware layer, which is part of the application but kicks in early enough to be able to remove the parameter in good time. And extracting it as a gem kept the application itself clean of legacy code paths, and allowed it to be re-used by anybody in the future.

    You can read more and browse the source code over on my GitHub page.

    https://github.com/rfwatson/rack-filter-param

  • 2017 roundup part 1: Thes gem

    When writing, I often need to refer to a thesaurus. But I find visiting the best online option, www.thesaurus.com, to be an unpleasant and distracting experience.

    So I decided to write a tool to improve matters. It’s called thes and while it’s a very simple utility, it’s also one that I’ve found consistently useful and time-saving over the last few months.

    The idea is simple: enter thes <some search term>, and the gem will search www.thesaurus.com and print the results straight back to the console in table format. Think of it as a command-line thesaurus client, with no bandwidth-wasting JavaScript, assets and adverts or annoyingly colourful design!

    Here’s an example of entering thes incredible:

    [Screenshot: output of thes incredible]

    Thes is available as a gem:

    gem install thes
    

    or, alternatively, on GitHub: https://github.com/rfwatson/thes

  • Adding privacy-conscious share buttons to Jekyll

    Recently, I’ve been getting familiar with Jekyll, the static site/blog generator.

    Firstly, Jekyll is a really great free software project! I’ve deployed websites generated by Jekyll before, but I haven’t spent much time diving beneath the surface and getting to know the library itself. Now that I’ve spent some time with it, I really like the way it works internally: the interface feels very clean, and the plugin API it exposes opens up a lot of possibilities for creativity when building blogs.

    To help familiarize myself with it, I’ve written a Jekyll plugin for the first time. It’s called jekyll-stealthy-share, and it adds Liquid tags to Jekyll that will inject simple, HTML-only share buttons into any static page.

    The share buttons are HTML-only. This is because I have no wish to inject JavaScript code from Facebook, Twitter et al into my blog, nor pass the privacy implications of doing that onto my visitors. Additionally, rendering the buttons myself ensures that I have full control over their appearance.

    Here’s the share buttons in action:

    The plugin does two things:

    • Adds {% stealthy_share_buttons %} and {% stealthy_share_assets %} Liquid tags, that can be injected into posts at will
    • Adds a /assets/share.css static file, using a Jekyll generator

    The share buttons can also be customized and re-ordered, and new sharing templates added in your own site’s content.
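
    To illustrate what HTML-only means in practice, here is a small Ruby sketch that renders share links as plain anchor tags. It is not taken from the plugin: the `share_links` helper, the `share-button` class name and the choice of templates are assumptions for the example.

```ruby
require "erb"
require "uri"

# Share endpoints that accept the page URL (and optionally a title) as
# query parameters, so no third-party JavaScript is needed.
SHARE_TEMPLATES = {
  "Twitter" => "https://twitter.com/intent/tweet?url=%<url>s&text=%<title>s",
  "Facebook" => "https://www.facebook.com/sharer/sharer.php?u=%<url>s",
}.freeze

# Renders one plain <a> tag per sharing service.
def share_links(url:, title:)
  SHARE_TEMPLATES.map do |name, template|
    href = format(
      template,
      url: URI.encode_www_form_component(url),
      title: URI.encode_www_form_component(title)
    )
    %(<a class="share-button" href="#{ERB::Util.html_escape(href)}" rel="noopener">#{name}</a>)
  end.join("\n")
end

puts share_links(url: "https://example.com/post", title: "Hello world")
```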

    See jekyll-stealthy-share on GitHub for more details.

  • Notes from setting up a Jekyll blog on AWS S3 and CloudFront

    Here are my notes from today’s task of setting up this blog.

    • Using Jekyll
      • Easy to set up
      • Built-in syntax highlighting
      • write/commit/push workflow
      • easy to deploy as static website
      • Compatible with GitHub and GitLab pages
    • Theme
    • Hosting
      • Considered GitHub pages, GitLab pages and static hosting using AWS S3
      • Requirements:
        • free (gratis)
        • easy-to-maintain, preferably static/serverless
        • Basic HTTPS support, with redirect from HTTP
      • GitLab Pages
        • Free static site hosting comparable to GitHub Pages
        • Pros:
          • Powerful containerized CI/build system
          • HTTPS support for static site, even on custom domains
        • Cons:
          • No option to force HTTPS
          • Tricky to get Let's Encrypt TLS certs to work with CI/build system - official tutorial is slightly out-of-date
      • GitHub pages
        • Pros:
          • Popular and well-documented
          • HTTPS support, including option to force HTTPS
        • Cons:
          • No HTTPS support for custom domains
      • AWS S3
        • Pros:
          • less managed than GitLab/GitHub - so a bit more setup, but also more flexibility
          • HTTPS support, as well as supporting “redirect objects” which effectively allow us to set up arbitrary 301 redirects on our site programmatically.
          • supports multiple subdomains
        • Cons:
          • Not quite free, but very low cost for low usage
      • Ended up choosing S3
        • Created bucket, enabled static website hosting option
        • Added CloudFront web distribution
          • Follow approach here to avoid 403 errors from S3 when serving paths without an explicit index.html, like /about
        • Added simple deploy script making use of AWS CLI
    • DNS
      • Registered domain
      • Added hosted zone to AWS Route 53
      • Updated nameserver records in Namecheap control panel to point at AWS
      • Added alias record to point at CloudFront distribution
      • Added MX record to point at pre-existing Mailinabox setup
