Laravel and Logstash

As we take on larger clients, we can no longer afford to cowboy our monitoring and alerting. In our scenario we are ingesting logs via Logstash and sending them all to an AWS Elasticsearch instance, and if an entry is of severity ERROR we also send it to AWS Simple Notification Service (which people or services can subscribe to) as well as to PagerDuty.

Input
For each of our services we have an input config which basically says ‘consume this file pattern, call it a laravel file, and add its stack name to the event.’

input {
  file {
    path => "<%= scope['profiles::tether::www_root'] %>/storage/logs/laravel-*.log"
    start_position => "beginning"
    type => "laravel"
    codec => multiline {
      pattern => "^\[%{TIMESTAMP_ISO8601}\] "
      negate => true
      what => previous
      auto_flush_interval => 10
    }
    add_field => {"stack" => "tether"}
  }
}

Filter
Since it is a laravel-type file, we pull out the environment it is running in and the log severity, grab the IP of the instance, build the SNS message subject, and make sure the event timestamp is the one in the log, not the time Logstash touched the event. (Without that last step, you end up with > 1MM entries for a single day the first time you run things.)

filter {
  # Laravel log files
  if [type] == "laravel" {
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\] %{DATA:env}\.%{DATA:severity}: %{GREEDYDATA:message}" }
    }
    ruby {
      code => "event.set('ip', `ip a s eth0 | awk \'/inet / {print$2}\'`)"
    }
    mutate {
      add_field => { "sns_subject" => "%{stack} Alert (%{env} - %{ip})" }
    }
    date {
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
      target => "@timestamp"
    }
  }    
}

Output
And then we pump it around where it needs to be.

If you are upgrading ES from 5.x to 6.x you need the template_overwrite setting, otherwise the new schema doesn’t get imported, and there were some important changes made to it. The scope stuff is Puppet doing its replacements. And there is a bug in 6.4.0 of the amazon_es plugin around template_overwrite…

output {
  amazon_es {
    hosts => ["<%= scope['profiles::laravel::es_host'] %>"]
    region => "us-west-2"
    index => "logstash-<%= scope['environment'] %>-%{+YYYY.MM.dd}"
    template => "/etc/logstash/templates/elasticsearch-template-es6x.json"
    template_overwrite => true
  }
}
output {
  if [severity] == "ERROR" { 
    sns {
      arn => "arn:aws:sns:us-west-2:xxxxxxxxx:<%= scope['environment'] %>-errors"
      region => 'us-west-2'
    }
  }
}

I’m not quite happy with our PagerDuty setup, as the de-duping is running at an instance level right now. Ideally, it would include the reason for the exception as well, but that’s a task for another day.

output {
  if [severity] == "ERROR" { 
      pagerduty {
        event_type => "trigger"
        description => "%{stack} - %{ip}"
        details => {
          timestamp => "%{@timestamp}"
          message => "%{message}"
        }
        service_key => "<%= scope['profiles::laravel::pagerduty'] %>"
        incident_key => "logstash/%{stack}/%{ip}"
      }
  }
}

For the really curious, here is my Puppet stuff for all this. Every machine which has Laravel services gets the first manifest, but some environments have multiple services on them, which is why the input file lives at the service level.

modules/profiles/manifests/laravel.pp

  class { 'logstash':
    version => '1:6.3.2-1',
  }
  $es_host = hiera('elasticsearch')
  logstash::configfile { 'filter_laravel':
    template => 'logstash/filter_laravel.erb'
  }
  logstash::configfile { 'output_es':
    template => 'logstash/output_es_cluster.erb'
  }
  if $environment == 'sales' or $environment == 'production' {
    logstash::configfile { 'output_sns':
      template => 'logstash/output_sns.erb'
    }

    $pagerduty = lookup('pagerduty')
    logstash::configfile { 'output_pagerduty':
      template => 'logstash/output_pagerduty.erb'
    }
  }
  unless $environment == 'development' {
    file { [ '/etc/logstash/templates' ]:
      ensure => 'directory',
      group  => 'root',
      owner  => 'root',
      mode   => 'u=rwx,go+rx'
    }

    file { [ '/etc/logstash/templates/elasticsearch-template-es6x.json' ]:
      ensure => 'present',
      group  => 'root',
      owner  => 'root',
      mode   => 'u=rwx,go+rx',
      source => 'puppet:///modules/logstash/elasticsearch-template-es6x.json',
      require => Class['Logstash']
    }

    logstash::plugin { 'logstash-output-amazon_es': 
      source => 'puppet:///modules/logstash/logstash-output-amazon_es-6.4.1-java.gem',
      ensure => '6.4.1'
    }
  }

modules/profiles/manifests/.pp

  logstash::configfile { 'input_tether':
    template => 'logstash/input_tether.erb'
  }

The next thing I need to work on is consuming the ES data back into our app so we don’t have to log into Kibana or the individual machines to see the log information. I think every view-your-logs solution I’ve seen for Laravel has been based around reading the actual logs on disk, which doesn’t work in a clustered environment or where you have multiple services controlled by a hub one.
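I haven’t built that yet, but the shape of it is just a query against the logstash indexes from the app. A rough sketch, assuming the elasticsearch/elasticsearch PHP client, a hypothetical config key for the host, and the index and field names from the configs above (talking to AWS Elasticsearch also needs request signing, which is its own adventure):

use Elasticsearch\ClientBuilder;

// Pull the most recent ERROR entries for a stack straight from Elasticsearch
// instead of logging into Kibana or the individual machines.
$client = ClientBuilder::create()
    ->setHosts([config('services.elasticsearch.host')]) // hypothetical config key
    ->build();

$results = $client->search([
    'index' => 'logstash-production-*',
    'body'  => [
        'query' => [
            'bool' => [
                'filter' => [
                    // depending on your mapping these may need to be stack.keyword / severity.keyword
                    ['term'  => ['stack' => 'tether']],
                    ['term'  => ['severity' => 'ERROR']],
                    ['range' => ['@timestamp' => ['gte' => 'now-24h']]],
                ],
            ],
        ],
        'sort' => [['@timestamp' => ['order' => 'desc']]],
        'size' => 50,
    ],
]);

foreach ($results['hits']['hits'] as $hit) {
    echo $hit['_source']['@timestamp'] . ' ' . $hit['_source']['message'] . PHP_EOL;
}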

Structured Logs in Laravel (Part 2)

The previous post showed you how to tweak Laravel’s default logging setup to output JSON, which is a key part of creating structured logs. With structured logs you can move yourself towards a more Observable future by tacking on a bunch of extra stuff to your logs, which can then be parsed and acted upon by your various systems.

And Laravel supports this out of the box; it just isn’t called out that obviously in the docs. (It is the ‘contextual information’ section on the Logging doc page, or ‘Errors and Logging’ for pre-5.6 docs.) Basically, you create an array and pass it as the second argument to your logging call, and it gets written out in the ‘context’ part of the log.

ubuntu@default:/var/www/tether$ sudo php artisan tinker
Psy Shell v0.9.7 (PHP 7.1.20-1+ubuntu16.04.1+deb.sury.org+1 — cli) by Justin Hileman
>>> use Ramsey\Uuid\Uuid;
>>> $observationId = Uuid::uuid4()->toString();
=> "daed8173-5bd0-4065-9696-85b83f167ead"
>>> $structure = ['id' => $observationId, 'person' => 'abc123', 'client' => 'def456', 'entry' => 'ghi789'];
=> [
     "id" => "daed8173-5bd0-4065-9696-85b83f167ead",
     "person" => "abc123",
     "client" => "def456",
     "entry" => "ghi789",
   ]
>>> \Log::debug('some debug message here', $structure);
=> null

which gets output like this

{"message":"some debug message here","context":{"id":"daed8173-5bd0-4065-9696-85b83f167ead","person":"abc123","client":"def456","entry":"ghi789"},"level":100,"level_name":"DEBUG","channel":"development","datetime":{"date":"2018-09-03 18:31:31.079921","timezone_type":3,"timezone":"UTC"},"extra":[]}

Of course there is no ‘standard’ for structured logs (nor should there be, as they really are context sensitive), but most of the examples I’ve seen include some sort of id to give context for tracing things around.

Note: The id in this case is solely for dealing with log message output. This is not for application request tracing, which I think is also really interesting but have not delved into yet.

Structured Logs in Laravel

I’ve been following the likes of Charity Majors on the twitters, and one of her big things around Observability is producing logs in a ‘structured’ format. (Loosely defined as ‘something that a machine can easily read and make decisions on.’)

Out of the box, Laravel ships with a logging system that uses Monolog and its LineFormatter, which is Apache-esque.

const SIMPLE_FORMAT = "[%datetime%] %channel%.%level_name%: %message% %context% %extra%\n";

Which means regexes to parse, etc.; these lines are designed more for human consumption than machine consumption.

The hints of how to change the format to a structured (JSON) one are right in the docs, but as the expression goes, ‘an example would be handy here’. So here you go.

/*
|--------------------------------------------------------------------------
| Logging Changes
|--------------------------------------------------------------------------
|
| Structured logs ftw
|
*/
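// Monolog classes used below (add these use statements at the top of bootstrap/app.php):
//   use Monolog\Handler\RotatingFileHandler;
//   use Monolog\Formatter\LineFormatter;
//   use Monolog\Formatter\JsonFormatter;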
$app->configureMonologUsing(function ($monolog) {
    $days = config('app.log_max_files', 5);
 
    // default 
    $path = storage_path() . '/logs/laravel.log';
    $handler = new RotatingFileHandler($path, $days);
    $handler->setFormatter(new LineFormatter(null, null, true, true));
    $monolog->pushHandler($handler);
 
    // structured
    $path = storage_path() . '/logs/laravel.json';
    $handler = new RotatingFileHandler($path, $days);
    $handler->setFormatter(new JsonFormatter());
    $monolog->pushHandler($handler);
});

Drop this right before the ‘return $app;’ in bootstrap/app.php and you’ll have two logs, one the default way and one the new structured way. I’m including both at the moment because we have a bunch of log capture / manipulation stuff built around the default structure that I haven’t changed yet. Once that’s all updated I’ll get rid of the default section.
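For orientation, a Laravel 5.x bootstrap/app.php then looks roughly like this (abridged, with the stock bindings):

<?php

$app = new Illuminate\Foundation\Application(
    realpath(__DIR__.'/../')
);

$app->singleton(Illuminate\Contracts\Http\Kernel::class, App\Http\Kernel::class);
$app->singleton(Illuminate\Contracts\Console\Kernel::class, App\Console\Kernel::class);
$app->singleton(Illuminate\Contracts\Debug\ExceptionHandler::class, App\Exceptions\Handler::class);

// the Logging Changes block from above goes here

return $app;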

There’s been a bunch of noise in the Laravel world around ‘supporting the enterprise’. Adding a ‘log format’ is one of those small enterprise support things that adds huge value.

(And yes, I know, pull requests are welcome, but until then, here is a blog post.)

mobilexco/laravel-scout-elastic: an AWS Elasticsearch driver for Laravel Scout

A large piece of what we’ll be doing the last half of this year is improving the support workflows inside Tether (our MarTech platform) and that includes Search. Being a Laravel shop, it makes sense to start with Scout to see if that gets us close, if not completely over the line.

We use Elasticsearch for other things in Tether, so it made sense to use that as the Scout backend through the super helpful ErickTamayo/laravel-scout-elastic package. And it worked as advertised right out of the box for local development (inside Vagrant with a local Elasticsearch server). But as soon as we moved the code to a cloud environment that used an AWS Elasticsearch instance, it threw all sorts of wacky errors. Turns out, AWS Elasticsearch is different from Elastic’s Elasticsearch; not completely, just in how communication is sent over the wire.

No problem. We’re clearly not the first to discover this, and sure enough there are a number of forks of the package that add this in. But they each commit at least one of the following sins:

  • Required AWS credentials to be checked into your repo in a .env file
  • Used env() to fetch configuration settings which breaks if you are caching configs (and you really should be)
  • Required manual intervention with Elasticsearch while deploying.

These are the result of Laravel being in its awkward teenage years, and they are all very solvable. I just wish it didn’t seem like everything I need requires these kinds of fixes…

Anyhow, mobilexco/laravel-scout-elastic uses the defaultProvider(), which means it will work with IAM roles on your AWS infrastructure to authenticate with Elasticsearch. This is the official AWS-recommended approach and does not require the presence of keys on the server (and all the pain around rotation, etc. that comes with using keys).
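The wiring underneath looks roughly like this. This is a sketch of the pattern rather than the package’s actual code; it assumes the aws-sdk-php CredentialProvider and a SigV4 signing handler along the lines of the one from the jsq/amazon-es-php package (double-check the handler’s class name and constructor against whatever version you pull in):

use Aws\Credentials\CredentialProvider;
use Aws\ElasticsearchService\ElasticsearchPhpHandler; // from jsq/amazon-es-php
use Elasticsearch\ClientBuilder;

// defaultProvider() walks the normal chain: environment variables, the shared
// credentials file, then the instance profile (IAM role). No keys on the box.
$provider = CredentialProvider::defaultProvider();

// The handler signs each request with SigV4 before it goes to AWS Elasticsearch.
$handler = new ElasticsearchPhpHandler('us-west-2', $provider);

$client = ClientBuilder::create()
    ->setHandler($handler)
    ->setHosts([config('laravel-scout-elastic.host')]) // hypothetical config key
    ->build();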

It also publishes conf/laravel-scout-elastic.php for the flags it needs to decide whether to use the Elastic or AWS implementation of Elasticsearch, rather than reading env() directly, so config:cache works. (This should likely be better called out in the Laravel docs for creating packages…)
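The pattern is the standard Laravel one: env() only ever gets called inside the published config file, and everything else reads config(). Something like this (the keys are illustrative, not necessarily the package’s actual ones):

<?php

// conf/laravel-scout-elastic.php (illustrative keys)
return [
    // 'aws' to use the signed AWS client, 'elastic' for a plain Elasticsearch host
    'driver' => env('SCOUT_ELASTIC_DRIVER', 'aws'),
    'aws_region' => env('AWS_REGION', 'us-west-2'),
    'host' => env('ELASTICSEARCH_HOST', 'localhost:9200'),
];

// elsewhere in the package: config('laravel-scout-elastic.driver'), never env()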

The package also includes a new Artisan command (scout:create-index) which can be called via something like AWS CodeDeploy (in the AfterInstall hook) to ensure the index Scout will be using gets created. This is useful if your Elasticsearch access is restricted to only the boxes that need it and those boxes don’t have ssh installed on them. (Artisan commands are run via either CodeDeploy or SSM.)
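Stripped down, the idea behind such a command is small. A simplified sketch (not the package’s exact code) using the elasticsearch-php client, with a hypothetical scout.elasticsearch config block and the AWS signing handler left out for brevity:

<?php

namespace App\Console\Commands;

use Elasticsearch\ClientBuilder;
use Illuminate\Console\Command;

class CreateScoutIndex extends Command
{
    protected $signature = 'scout:create-index';

    protected $description = 'Create the Elasticsearch index Scout will use, if it does not already exist';

    public function handle()
    {
        // In the real package the client comes from the service provider
        // (signed for AWS); this builds a plain one for illustration.
        $client = ClientBuilder::create()
            ->setHosts([config('scout.elasticsearch.host')]) // hypothetical config key
            ->build();

        $index = config('scout.elasticsearch.index'); // hypothetical config key

        // Idempotent on purpose, so it is safe to run on every deploy.
        if ($client->indices()->exists(['index' => $index])) {
            $this->info("Index {$index} already exists");
            return;
        }

        $client->indices()->create(['index' => $index]);
        $this->info("Created index {$index}");
    }
}

The AfterInstall hook then just needs a script that runs php artisan scout:create-index from the release directory.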

Hopefully this saves someone the 12 hours of distracted development it took to come up with this solution.

Client-specific domains with CloudFormation for clients that use Google as email provider

A number of our clients want vanity domains for their experiences, which adds a layer (or two) of operations overhead beyond just having a line item in the invoice. In the spirit of ‘infrastructure as code’-all-the-things, this is now my process for registering new domains for our clients:

  1. Register the domain through Route 53
  2. Delete the hosted zone that is automatically created. (It would be nice if there was an option when getting the domain to not automatically create the hosted zone.)
  3. Log in to Google Apps and add the domain as an alias. When prompted to verify, choose Gandi as the provider and get the TXT record that is needed
  4. Create a CloudFormation stack with this template. Some interesting bits:
    • Tags in the AWS Tag Editor are case sensitive, so ‘client’ and ‘Client’ are not equivalent
    • I think all my stacks will include the ‘CreationDateParameter’ parameter from now on, which gets added as a tag to the Resource[s] that can accept them. This is part of the ‘timebombing’ of resources to make things more resilient. In theory I can also use AWS Config to find Resources that are not tagged and therefore presumably not under CloudFormation control.
    • Same thing for the ‘client’ tag. Though I’m still not keen on that name, or billing_client or such.
    {
      "AWSTemplateFormatVersion": "2010-09-09",
      "Parameters": {
        "ClientNameParameter": {
          "Type": "String",
          "Description": "Which client this domain is for"
        },
        "DomainNameParameter": {
          "Type": "String",
          "Description": "The domain to add a HostedZone for"
        },
        "GoogleSiteVerificationParameter": {
          "Type": "String",
          "Description": "The Google Site Verification TXT value"
        },
        "CreationDateParameter" : {
          "Description" : "Date",
          "Type" : "String",
          "Default" : "2017-08-27 00:00:00",
          "AllowedPattern" : "^\\d{4}(-\\d{2}){2} (\\d{2}:){2}\\d{2}$",
          "ConstraintDescription" : "Date and time of creation"
        }
      },
      "Resources": {
        "clienthostedzone": {
          "Type": "AWS::Route53::HostedZone",
          "Properties": {
            "Name": {"Fn::Join": [".", [{"Ref": "DomainNameParameter"}]]},
            "HostedZoneTags": [
              {
                "Key": "client",
                "Value": {"Ref": "ClientNameParameter"}
              },
              {
                "Key": "CloudFormation",
                "Value": { "Ref" : "CreationDateParameter" }
              }
            ]
          }
        },
        "dnsclienthostedzone": {
          "Type": "AWS::Route53::RecordSetGroup",
          "Properties": {
            "HostedZoneId": {
              "Ref": "clienthostedzone"
            },
            "RecordSets": [
              {
                "Name": {"Fn::Join": [".", [{"Ref": "DomainNameParameter"}]]},
                "Type": "TXT",
                "TTL": "900",
                "ResourceRecords": [
                  {"Fn::Sub": "\"google-site-verification=${GoogleSiteVerificationParameter}\""}
                ]
              },
              {
                "Name": {"Fn::Join": [".", [{"Ref": "DomainNameParameter"}]]},
                "Type": "MX",
                "TTL": "900",
                "ResourceRecords": [
                  "1 ASPMX.L.GOOGLE.COM",
                  "5 ALT1.ASPMX.L.GOOGLE.COM",
                  "5 ALT2.ASPMX.L.GOOGLE.COM",
                  "10 ALT3.ASPMX.L.GOOGLE.COM",
                  "10 ALT4.ASPMX.L.GOOGLE.COM"
                ]
              }
            ]
          }
        }
      }
    }
  5. Update the domain’s nameservers to the ones in our newly created Hosted Zone. I suspect this could be done via a Lambda-backed custom resource, but that’s a couple steps too complicated for me right now. If I have to do this more than once every couple weeks it’ll be worth the learning time.
  6. Validate the domain with Google.
  7. Manually create a certificate for ${DomainNameParameter} and *.${DomainNameParameter}. (For reals, this should be an automatic thing for domains registered in Route 53 and hosted within Route 53.)

And then I need to create an ALB for the domain and point it at the right service. But that’s getting rather yak shave-y. The ALB needs to be added to the ASG for the service, but those are not under CloudFormation control yet, so I need to get them under control first.

HubSpot in an AWS World

We recently moved our corporate website from WPEngine to HubSpot, and as part of that you have to do some DNS trickery. HubSpot helpfully provides instructions for various DNS providers, but not Route 53. Reading the ones they do provide gives a good idea of what is needed:

  1. Add a CNAME for your HubSpot domain as the www record
  2. Add an S3 hosting bucket to redirect everything to www.yourdomain.com
  3. Add a CloudFront distribution to point to your bucket

Now, this is likely 5 minutes of clicking, but AWS should be done with minimal clicking, in favour of using CloudFormation (or Terraform or such). As such, it took about 10 hours…

Lesson 1 – Don’t create your Hosted Zones by hand.

All our existing Hosted Zones in Route 53 were either created by hand (because the domains were registered somewhere else) or created at registration time by Route 53. This is a challenge because CloudFormation cannot edit (to add or update) records in Hosted Zones that were not created by CloudFormation. This meant I needed to use CloudFormation to create a duplicate Hosted Zone, let that propagate through the internets, and then delete the existing one.

Here’s the CloudFormation template for doing that — minus 70+ individual records. Future iterations likely would have Parameters and Outputs sections, but because this was a clone of what was already there I just hardcoded things.

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Resources": {
    "zonemobilexcocom": {
      "Type": "AWS::Route53::HostedZone",
      "Properties": {
        "Name": "mobilexco.com."
      }
    },
    "dnsmobilexcocom": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneId": {
          "Ref": "zonemobilexcocom"
        },
        "RecordSets": [
          {
            "Name": "mobilexco.com.",
            "Type": "MX",
            "TTL": "900",
            "ResourceRecords": [
              "1 ASPMX.L.GOOGLE.COM",
              "5 ALT1.ASPMX.L.GOOGLE.COM",
              "5 ALT2.ASPMX.L.GOOGLE.COM",
              "10 ALT3.ASPMX.L.GOOGLE.COM",
              "10 ALT4.ASPMX.L.GOOGLE.COM"
            ]
          }
        ]
      }
    },
    "dns80808mobilexcocom": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneId": {
          "Ref": "zonemobilexcocom"
        },
        "RecordSets": [
          {
            "Name": "80808.mobilexco.com.",
            "Type": "A",
            "TTL": "900",
            "ResourceRecords": [
              "45.33.43.207"
            ]
          }
        ]
      }
    }
  }
}

Lesson 2 – Don’t forget that DNS is all about caching. You could clone a domain and forget to include the MX record because you blindly trusted the output of CloudFormer, only to realize you had stopped incoming mail overnight; it kept working for you because you had things cached…

Lesson 3 – Even though you are using an S3 Hosted Website to do the redirection, you are not actually using an S3 Hosted Website in the eyes of CloudFront.

This cost me the most grief, as it led me to try to create an S3OriginPolicy, an Origin Access Identity, etc. that I didn’t need.

Note: in order to make this template work, you first need to have issued a certificate for your domain through ACM, which is kind of a pain. My current top ‘AWS Wishlist’ item is auto-provisioning of certificates for domains that are both registered and hosted within your account.

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Parameters": {
    "DomainNameParameter": {
      "Type": "String",
      "Description": "The domain to connect to hubspot (don't include the www.)"
    },
    "HubspotCNameParameter": {
      "Type": "String",
      "Description": "The CName for your hubspot site"
    },
    "AcmCertificateArnParameter": {
      "Type": "String",
      "Description": "ARN of certificate to use in ACM"
    }
  },
  "Resources": {
    "s3mobilexcocom": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "BucketName": {"Ref": "DomainNameParameter"},
        "AccessControl": "Private",
        "WebsiteConfiguration": {
          "RedirectAllRequestsTo": {
            "HostName": {"Fn::Join": ["", ["www.", {"Ref": "DomainNameParameter"}]]},
            "Protocol": "https"
          }
        }
      }
    },
    "dnswwwmobilexcocom": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneId": {
          "Fn::ImportValue" : "hosted-zone-mobilexco-com:HostedZoneId"
        },
        "RecordSets": [
          {
            "Name": {"Fn::Join": ["", ["www.", {"Ref": "DomainNameParameter"}, "."]]},
            "Type": "CNAME",
            "TTL": "900",
            "ResourceRecords": [
              {"Ref": "HubspotCNameParameter"}
            ]
          }
        ]
      }
    },
    "dnsmobilexcocom": {
      "Type": "AWS::Route53::RecordSetGroup",
      "Properties": {
        "HostedZoneId": {
          "Fn::ImportValue" : "hosted-zone-mobilexco-com:HostedZoneId"
        },
        "RecordSets": [
          {
            "Name": {"Fn::Join": ["", [{"Ref": "DomainNameParameter"}, "."]]},
            "Type": "A",
            "AliasTarget": {
              "DNSName": {"Fn::GetAtt": ["httpsDistribution", "DomainName"]},
              "HostedZoneId": "Z2FDTNDATAQYW2"
            }
          }
        ]
      }
    },
    "httpsDistribution" : {
      "Type" : "AWS::CloudFront::Distribution",
      "Properties" : {
        "DistributionConfig": {
          "Aliases": [
            "mobilexco.com"
          ],
          "Origins": [{
            "DomainName": {"Fn::Join": ["", [{"Ref": "DomainNameParameter"}, ".s3-website-", {"Ref": "AWS::Region"}, ".amazonaws.com"]]},
            "Id": "bucketOriginId",
            "CustomOriginConfig": {
              "HTTPPort": 80,
              "HTTPSPort": 443,
              "OriginProtocolPolicy": "http-only"
            }
          }],
          "Enabled": "true",
          "DefaultCacheBehavior": {
            "ForwardedValues": {
              "QueryString": "false"
            },
            "TargetOriginId": "bucketOriginId",
            "ViewerProtocolPolicy": "allow-all"
          },
          "ViewerCertificate": {
            "AcmCertificateArn": {"Ref": "AcmCertificateArnParameter"},
            "SslSupportMethod": "sni-only"
          },
          "PriceClass": "PriceClass_100"
        }
      }
    }
  }
}

Lesson 4 – Naming conventions are a thing. Use them.

They matter as soon as you start doing ImportValue or AWS::CloudFormation::Stack. In theory the ImportValue lines could use DomainNameParameter with Fn::Sub to switch the ‘.’ to a ‘-’ and this would be an entirely generic template, but this is working well enough for me. And of course, your naming convention could be (and likely is) different.

Harmonizing Maintenance Windows

At the moment we are only using RDS and ElastiCache within AWS, but the more services we use, the more maintenance windows are going to come up. Rather than have them at random places around the week and clock, I figure it would be useful to have just a single window that we can subsequently work into our SLAs, etc. Now, I really like the management consoles AWS has, but it’s a lot of clicks to track things down, especially if I start using something like CloudFormation and Autoscaling to make things appear magically.

Scripting to the rescue.

Our applications are PHP based, but at heart I’m a Python guy, so I whipped one up. And aside from the fear of modifying running items, it appears to have worked well.

import boto3
 
maintenance_window = 'sun:09:35-sun:10:35'
 
# rds can have maintenance windows
update_rds = False
rds = boto3.client('rds')
paginator = rds.get_paginator('describe_db_instances')
for response_iterator in paginator.paginate():
    print('Current RDS Maintenance Windows')
    for instance in response_iterator['DBInstances']:
        print('%s: %s UTC' % (instance['DBInstanceIdentifier'], instance['PreferredMaintenanceWindow']))
        if instance['PreferredMaintenanceWindow'].lower() != maintenance_window.lower():
            update_rds = True
 
if update_rds:
    paginator = rds.get_paginator('describe_db_instances')
    for response_iterator in paginator.paginate():
        for instance in response_iterator['DBInstances']:
            if instance['PreferredMaintenanceWindow'].lower() != maintenance_window.lower():
                rds.modify_db_instance(
                    DBInstanceIdentifier=instance['DBInstanceIdentifier'],
                    PreferredMaintenanceWindow=maintenance_window
                )
    paginator = rds.get_paginator('describe_db_instances')
    for response_iterator in paginator.paginate():
        print('Adjusted RDS Maintenance Windows')
        for instance in response_iterator['DBInstances']:
            print('%s: %s UTC' % (instance['DBInstanceIdentifier'], instance['PreferredMaintenanceWindow']))
 
# elasticache can have maintenance windows
update_ec = False
ec = boto3.client('elasticache')
paginator = ec.get_paginator('describe_cache_clusters')
for response_iterator in paginator.paginate():
    print('Current ElastiCache Maintenance Windows')
    for instance in response_iterator['CacheClusters']:
        print('%s: %s UTC' % (instance['CacheClusterId'], instance['PreferredMaintenanceWindow']))
        if instance['PreferredMaintenanceWindow'].lower() != maintenance_window.lower():
            update_ec = True
 
if update_ec:
    paginator = ec.get_paginator('describe_cache_clusters')
    for response_iterator in paginator.paginate():
        for instance in response_iterator['CacheClusters']:
            if instance['PreferredMaintenanceWindow'].lower() != maintenance_window.lower():
                ec.modify_cache_cluster(
                    CacheClusterId=instance['CacheClusterId'],
                    PreferredMaintenanceWindow=maintenance_window
                )
 
    paginator = ec.get_paginator('describe_cache_clusters')
    for response_iterator in paginator.paginate():
        print('Adjusted ElastiCache Maintenance Windows')
        for instance in response_iterator['CacheClusters']:
            print('%s: %s UTC' % (instance['CacheClusterId'], instance['PreferredMaintenanceWindow']))

It’s always a Security Group problem…

I’ve got a number of private subnets within my AWS VPC that are all nice and segregated from each other. But every time I light up a new Ubuntu instance and tell it to ‘apt-get update’ it times out. Now, since these are private subnets I can get away with opening ports wide open, but AWS is always cranky at me for doing so. I feel slightly vindicated that the same behaviour is asked about on Stack Overflow often too, but anyways, I figured it out this week. Finally. And as usual with anything wonky network-wise in AWS, it was a Security Group problem.

  1. First thing, read the docs carefully.
  2. Read it again, more carefully this time
  3. Set up the routing. I actually created 2 custom routing tables rather than modify the Main one; explicit is better than implicit (thanks Python!)
  4. Create an ‘apt’ Security Group to be applied to the NAT instance, with inbound rules from your private VPC address space for HTTP (80), HTTPS (443) and HKP (11371). HTTP is the default protocol for apt, but if you are adding new repos the key is delivered via HTTPS and then validated against the central key servers via HKP. You’ll need outbound rules for those ports too, per the docs

And now you should be able to lock down your servers a bit more.

Faster feedback by limiting information frequency

Code coverage is one of those wacky metrics that straddles the line between useful and vanity. On one hand, it gives you an idea of how safely you can make changes, but on the other it can be a complete fake-out depending on how the tests are constructed. And it can slow your build down.

A lot.

I suspect a lot of our pain is self-induced, but our Laravel application’s ‘build and package’ job jumps from under 2 minutes to around 15 once we turn on code coverage. Ouch.

So I came up with a compromise in the build: the tests always run, but coverage only gets calculated every 15th build (around once a day). Here is what the relevant task for that job now looks like.

# hack around jenkins doing -xe
#set +e
 
mkdir -p jenkins/phpunit
mkdir -p jenkins/phpunit/clover
 
# run coverage only every 15 builds
if [ $(($BUILD_ID%15)) -eq 0 ]; then
  phpunit --log-junit jenkins/phpunit/junit.xml --coverage-clover jenkins/phpunit/clover.xml --coverage-html jenkins/phpunit/clover
else
  phpunit --log-junit jenkins/phpunit/junit.xml
fi
 
# hack around presently busted test
#exit 0

Some things of note:

  • The commented out bits at the beginning and end allow me to force a clean build if I really, really want one
  • My servers are all Ubuntu, so they use ‘dash’ as their shell, which forces slightly different syntax that my fingers never get right the first time
  • I don’t delete the coverage log, so the later ‘publish’ action doesn’t fall down. It just republishes the previous coverage again
  • As we hire more people and the frequency of things landing in the repo increases, I’ll likely increase the spread from 15 to something higher
  • At some point we’ll spend the time to look at why the tests are so slow, but not now.

Using Puppet to manage AWS agents (on Ubuntu)

One of the first things any cloud-ification and/or devops-ification project needs to do is figure out how it is going to manage its assets. In my case, I use Puppet.

AWS is starting to do more intensive integrations into things using agents that sit in your environment. This is a good, if not great, thing. Except if you want to, oh, you know, control what is installed and how in said environment.

Now, it would be extremely nice if AWS took the approach of Puppet Labs and hosted a package repository, which would mean that one could do this in a manifest to install the Code Deploy agent.

  package { 'codedeploy-agent':
    ensure => latest,
  }
 
  service { 'codedeploy-agent':
    ensure  => running,
    enable  => true,
    require => Package[ 'codedeploy-agent' ],
  }

Nothing is ever that easy, of course. If I were using RedHat or Amazon Linux I could just use the source attribute of the package type, as below, to get around the lack of a repository, but I’m using Ubuntu.

  package { 'codedeploy-agent':
    ensure   => present,
    source   => "https://s3.amazonaws.com/aws-codedeploy-us-east-1/latest/codedeploy-agent.noarch.rpm",
    provider => rpm,
  }

So down the rabbit hole I go…

First, I needed a local repository, which I set up via the puppet-reprepro module. That worked well, except for the GPG part. What. A. Pain.

After that, I cracked the install script and fetched the .deb file to install…

$ aws s3 cp s3://aws-codedeploy-us-west-2/latest/VERSION . --region us-west-2
download: s3://aws-codedeploy-us-west-2/latest/VERSION to ./VERSION
$ cat VERSION
{"rpm":"releases/codedeploy-agent-1.0-1.751.noarch.rpm","deb":"releases/codedeploy-agent_1.0-1.751_all.deb"}
$ aws s3 cp s3://aws-codedeploy-us-west-2/releases/codedeploy-agent_1.0-1.751_all.deb . --region us-west-2
download: s3://aws-codedeploy-us-west-2/releases/codedeploy-agent_1.0-1.751_all.deb to ./codedeploy-agent_1.0-1.751_all.deb

…and dropped it into the directory the repo slurps files from.

Aaaannnnnd, nothing.

Turns out that the .deb AWS provides omits the Priority field (which is optional) from its control file, but reprepro wants it to be present. No problem.

$ mkdir contents
$ cd contents/
$ dpkg-deb -x ../codedeploy-agent_1.0-1.751_all.deb .
$ dpkg-deb -e ../codedeploy-agent_1.0-1.751_all.deb ./DEBIAN
$ grep Priority DEBIAN/control
$

Alright. Add in our line.

$ grep Priority DEBIAN/control
Priority: Optional
$

And now to package it all back up

$ dpkg-deb -b . ../codedeploy-agent_1.0-1.751_all.deb
dpkg-deb: building package 'codedeploy-agent' in '../codedeploy-agent_1.0-1.751_all.deb'.

Ta-da! The package is now able to be hosted by a local repository and installed through the standard package type.

But we’re not through yet. AWS wants to check daily to update the package. Sounds good ‘in theory’, but I want to control when packages are updated. Necessitating

  cron { 'codedeploy-agent-update':
    ensure  => absent
  }

Now we’re actually in control.

A few final comments:

  • It’d be nice if AWS would provide a repository to install their agents via apt — so I can selfishly stop managing a repo
  • It’d be nice if the Code Deploy agent had the Priority line in the control file — so I can selfishly stop hacking the .deb myself. The Inspector team’s package does…
  • It’d be nice if AWS didn’t install update scripts for their agents
  • The install scripts for Code Deploy and Inspector are remarkably different. The teams should talk to each other.
  • The naming conventions of the packages for Code Deploy and Inspector are different. The teams should talk to each other.

(Whinging aside, I really do like Code Deploy. And Inspector looks pretty cool too.)