Eloquent Pagination and Map — not the bestest of friends

Eloquent can make things really easy. But sometimes that thing is shooting yourself in the foot with a performance problem.

We recently released a paginated API endpoint for a customer. The first version of the code looked something like this:

return $model
    ->relationship
    ->map(function ($relation) {
        // tonne of heavy lifting
    })
    ->paginate();

It is nothing crazy — just manipulate the objects in the relationship and send it on its way back to the caller. Tests all passed and things were great.

Until it was deployed into production and it took almost 3 minutes to run.

What was happening was that it was doing the ‘tonne of heavy lifting’ against all 3000 models and only then returning the requested page. That’s not quite efficient.

The solution is to flip things around: paginate first, then map.

$items = $model
    ->relationship
    ->paginate();

$mapped = $items
    ->getCollection()
    ->map(function ($relation) {
        // a tonne of heavy lifting
    });

$items->setCollection($mapped);

return $items;
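
Depending on your Laravel version, the paginator may also have a through() helper that rolls the map-and-setCollection dance into a single call. A minimal sketch, assuming through() is available, and calling the relation as a method so paginate() limits things at the query level:

return $model
    ->relationship()
    ->paginate()
    ->through(function ($relation) {
        // a tonne of heavy lifting, but only against the current page
    });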

By mapping only the items in the requested page, the time dropped by around 85% (from almost 3 minutes to somewhere under 30 seconds).

Needless to say, we’ve stopped using the map-then-paginate pattern in our application … and you likely should as well.

Laravel News Catchup

Next up in my ‘eventually I’ll catch up on email’ queue is Laravel News, which anyone who deals with Laravel should be signed up to. Now, this folder has 180 things in it … but most were already read so I’m not going to read them again. Since Halloween, this is what I found interesting…

  • A bit of a deep dive into how email validation works and can be extended
  • Laravel Meta Fields lets you attach random amounts of random metadata to models. The last two applications I’ve been responsible for didn’t do this in nearly as elegant a manner
  • Laravel Log Route Statistics seems like it could be an interesting way to either determine which parts of your application have nice test coverage and/or use production data to guide analysis and refactorings. But it also logs to the database, which could get very noisy in a large-scale application.
  • Laravel Request Logger is from the same guy and is interesting, with the same caveats I suspect.
  • I’m rebuilding VMs for two companies right now and part of that will be integrating Horizon. How to get notified when Laravel Horizon stops running seems like a useful thing to keep in mind, though I’m not sure I’m keen on it being an artisan command. And things will be running in a cluster, sooo, yay, more things to worry about?
  • Hiding Sensitive Arguments in PHP Stack Traces is always a handy thing to keep in mind. Doing this properly means you can de-scope your log files from things like Right To Delete and such
  • NPS gives me hives, but Laravel NPS seems like a straightforward way of requesting and storing it
  • I have to think about our pagination strategy over the next couple of weeks, so Efficient Pagination Using Deferred Joins is rather timely (a rough sketch of the idea is below)
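
I haven’t tried the deferred-join approach yet, but the core idea is to run the expensive LIMIT/OFFSET against a narrow, index-only subquery and then join back for the full rows. A rough sketch of how that might look with the query builder (the table, columns, and the $pageNumber / $perPage variables are all illustrative):

use Illuminate\Support\Facades\DB;

$page = DB::table('posts')
    ->select('id')
    ->orderByDesc('created_at')
    ->forPage($pageNumber, $perPage); // cheap LIMIT/OFFSET over just the index

$posts = DB::table('posts')
    ->joinSub($page, 'page', 'posts.id', '=', 'page.id') // the 'deferred' join back for the full rows
    ->orderByDesc('posts.created_at')
    ->get();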

ArchTech Newsletter

On the upside, my mail is getting nicely sorted. On the downside, it’s now sorted /and/ neglected. So this is one of, hopefully, many posts where I catch up on things. First up is the ArchTech Newsletter, which delivers some highlights from their weekly twitter thread to your inbox, plus some other bits around products and packages they are working on. Subscribe here.

  • LazilyRefreshDatabase looks like a nice cleanup for tests.
  • Sidecar has a tonne of potential. It feels like it farms your queue workers out to AWS Lambda — and so to any language Lambda supports
  • The Road to PHP: Static Analysis is an email drip course on, well, static analysis
  • Laravel 8.x and newer projects shouldn’t be using Guzzle (or heaven forbid, curl) but should be using the built-in HTTP client. Getting it to throw errors is a useful thing to know how to do.
  • Laravel SEO looks like it could reduce some code on some of my projects.
  • Mail Intercept lets you test mail in Laravel by, well, intercepting it rather than Faking it. I think I like this concept. We’ll be testing a tonne of mail for i18n reasons by the end of February so we might make use of this.
  • I don’t think you should ever be storing sensitive data as properties of jobs, but if you insist, then you should be using ShouldBeEncrypted on those jobs (a quick sketch is below)
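
For that last point, a minimal sketch of what opting in looks like (the job name and property are made up; ShouldBeEncrypted is the real Laravel 8+ marker interface):

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldBeEncrypted;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;

class SyncBankDetails implements ShouldQueue, ShouldBeEncrypted
{
    use InteractsWithQueue, Queueable;

    // Hypothetical sensitive payload; because of ShouldBeEncrypted the
    // serialized job (this property included) is encrypted before it is
    // pushed onto the queue and decrypted when it is processed.
    public $accountNumber;

    public function __construct(string $accountNumber)
    {
        $this->accountNumber = $accountNumber;
    }

    public function handle(): void
    {
        // ... do the work with $this->accountNumber
    }
}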

Experimenting where to put MySQL

At MobileXCo, our ‘app’ consists of 5 Laravel apps plus their supporting services like MySQL, Redis, and Elasticsearch, and we have them all in a single Vagrant instance. VMs aren’t ‘cool’ anymore, but our production infrastructure is VM-based (EC2) and onboarding developers is pretty easy as it’s just ‘vagrant up’.

That said, as an experiment I moved where MySQL lived to a couple of places to see if I could simplify the VM by not having it in there. After all, we use RDS to host our database in production, so why not externalize it from development as well?

First, the baseline of having everything in the VM:

Time: 19.69 seconds, Memory: 94.25 MB 

This maths out to about 0.75s per test. Not too too shabby.

Next up was reaching out of the VM to the Host (macOS 10.15.4).

Time: 28.1 seconds, Memory: 94.25 MB

That is 1.08s per test, which puts running the server on the Host on the border of consideration — if setup weren’t as trivial as it is in the VM. (Which is fully configured via the Puppet Provisioner using the same scripts as production. Well, with a couple of minor tweaks through environment checks.)

Lastly, I spent a couple of hours teaching myself about docker-compose and ran MySQL in a container using the mysql:5.7 image. (And if I am going to be honest, this really was an excuse to do said learning.) Port 33306 on the Host forwarded to port 3306 on the Container, so really this is Guest -> Host -> Container, but…

Time: 1.02 minutes, Memory: 94.25 MB

That’s … erm, not awesome at 2.38s per test.

This can’t be a unique configuration, and I find it hard to believe that such a performance discrepancy would not have been addressed, which makes me think there are some network tuning options I don’t know about. If anyone has any ideas on how to tweak things, let me know and I’ll re-run the experiment.

Secure Node Registration in Selenium Grid 4

This is another in the small series of ‘things that have changed in Selenium Grid but I have not yet added to the official docs’ posts.

When Selenium Grid was first created, the expectation was that your Grid Hub was nicely secured inside your network along with your Nodes, so everything could be trusted to communicate. But now that we live in a cloud environment, that assumption isn’t quite as tight as it once was. You could have everything tightly locked down in your AWS account, but if someone who can create instances gets their access key compromised, well, it’s a problem. I know if I were a bad guy, I would be scanning for open Grid Hubs and then figuring out how to register with them. There is a wealth of information to be had: competitive intelligence on new features not available in the wild, account details for production testing, etc.

Last night I pushed a change that prevents rogue Grid Nodes from registering with your Grid Hub (so it will be available in the next alpha, or you can build it yourself now). I don’t know if this has ever happened in the wild, but the fact that it could is enough that it needed to be closed down.

In order to secure Node registration, you need to supply the new argument --registration-secret in a couple of different places. If the secrets do not match, the Node is not registered. This secret should be treated like any other password in your infrastructure, which is to say, not checked into a repo or otherwise exposed. Instead it should be kept in something like HashiCorp Vault or AWS Secrets Manager and only accessed (via automated means) when needed.

Standalone Server

When running your Hub as a single instance, there is only one process, so only one place that needs the secret handed to it:

  java -jar selenium.jar \
       hub \
       --https-private-key /path/to/key.pkcs8 \
       --https-certificate /path/to/cert.pem \
       --registration-secret cheese

Distributed Server

When running your Hub in a distributed configuration, the Distributor and Router servers need to have it.

  java -jar selenium.jar \
       distributor \
       --https-private-key /path/to/key.pkcs8 \
       --https-certificate /path/to/cert.pem \
       -s https://sessions.grid.com:5556 \
       --registration-secret cheese
  java -jar selenium.jar \
       router \
       --https-private-key /path/to/key.pkcs8 \
       --https-certificate /path/to/cert.pem \
       -s https://sessions.grid.com:5556 \
       -d https://distributor.grid.com:5553 \
       --registration-secret cheese

Node

Regardless of your approach to running the Server, the Node needs it too. (Obviously.)

  java -jar selenium.jar \
       node \
       --https-private-key /path/to/key.pkcs8 \
       --https-certificate /path/to/cert.pem \
       --detect-drivers \
       --registration-secret cheese

Detection

When a Node fails to register, two things happen:

  1. A log entry is created at an ERROR level saying a Node did not register correctly. Your Selenium infrastructure needs the same attention to its logging as your production infrastructure, so this should trip an alert to someone in whatever manner a potential security problem would in any other environment.
  2. An event is dropped onto the bus. The Selenium Server ships with a 0mq bus built in, but when deploying in a real environment I would suggest using something like AWS SQS (or your cloud’s equivalent) as your queuing system, and then have something like AWS Lambda watch for these events and trigger actions accordingly.

It should be noted further that these both happen on the side of the Server, not the Node. The rogue Node is not given any indication that secrets are configured or that the secret it sent was incorrect.

I was on the ‘Test Guild Automation Podcast’

Woke up this morning to a note from Joe that my episode of the Test Guild Automation Podcast is now live. It should come as no surprise that I talk about Selenium infrastructure with him. I felt like I was rambling, but I’m pretty sure Joe kept pulling me back on topic — but since I don’t like how my voice sounds on recordings, I’ll have to have you let me know how it turned out.

Secure Communications with Selenium Grid 4

For the last couple of years, my schtick has been that I don’t care about your scripts, just your infrastructure. I’m pretty sure in my talk at SeConf London I mused that it was bonkers that we had gotten away with communicating with the Se Server via HTTP. (I have to deal with vendor audits at work and they get really antsy at any mention of HTTP.) At SeConf Chicago I crashed the Se Grid workshop and asked (knowingly) if I was correct that communication was only via HTTP, hoping someone would fix it for me. Alas, no one took the bait, so at SeConf in London I was describing the problem to Simon, who happened to be creating a ticket (actually, a couple) as I talked, and then I got an alert saying it was assigned to me. The squeaky wheel applies its own grease, it seems.

There are a couple of catch-22s in place before I can update the official Selenium Documentation (have you seen the new doc site? It’s great!), so in lieu of that, here is a quick how-to on something that will be in Selenium 4 Alpha 2 (or now, if you build it yourself).

What is below is the output of ‘info security’ on the new server. (The ‘info’ command is also new and as yet undocumented.)


Selenium Grid by default communicates over HTTP. This is fine for a lot of use cases, especially if everything is contained within the firewall and against test sites with testing data. However, if your server is exposed to the Internet or is being used in environments with production data (or that which has PII) then you should secure it.

Standalone

In order to run the server using HTTPS instead of HTTP you need to start it with the --https-private-key and --https-certificate flags to provide it the certificate and private key (as a PKCS8 file).

  java -jar selenium.jar \
       hub \
       --https-private-key /path/to/key.pkcs8 \
       --https-certificate /path/to/cert.pem

Distributed

Alternatively, if you are starting things individually, you also specify HTTPS when telling each piece where to find the others.

  java -jar selenium.jar \
       sessions \
       --https-private-key /path/to/key.pkcs8 \
       --https-certificate /path/to/cert.pem
  java -jar selenium.jar \
       distributor \
       --https-private-key /path/to/key.pkcs8 \
       --https-certificate /path/to/cert.pem \
       -s https://sessions.grid.com:5556
  java -jar selenium.jar \
       router \
       --https-private-key /path/to/key.pkcs8 \
       --https-certificate /path/to/cert.pem \
       -s https://sessions.grid.com:5556 \
       -d https://distributor.grid.com:5553

Certificates

The Selenium Grid will not operate with self-signed certificates; as a result, you will need to have some provisioned for you by a Certificate Authority of some sort. For experimentation purposes you can use MiniCA to create and sign your certificates.

  minica --domains sessions.grid.com,distributor.grid.com,router.grid.com

This will create minica.pem and minica.key in the current directory, as well as cert.pem and key.pem in a directory called sessions.grid.com, which will have both distributor.grid.com and router.grid.com as alternative names. Because Selenium Grid requires the key to be in PKCS8, you have to convert it.

  openssl pkcs8 \
    -in sessions.grid.com/key.pem \
    -topk8 \
    -out sessions.grid.com/key.pkcs8 \
    -nocrypt

And since we are using a non-standard CA, we have to teach Java about it. To do that, you add it to the cacerts truststore, which by default is $JAVA_HOME/jre/lib/security/cacerts.

  sudo keytool \
      -import \
      -file /path/to/minica.pem \
      -alias minica \
      -keystore $JAVA_HOME/jre/lib/security/cacerts \
      -storepass changeit

Clients

None of the official clients have been updated yet to support this, but if you are using a CA that the system knows about, you can just use an HTTPS Command Executor and everything will work. If you are using a non-standard one (like MiniCA) you will probably have to jump through a hoop or two, similar to here in Python, which basically says “Yes, yes, I know you don’t know about the CA but I do, so just continue along anyways.”

from selenium import webdriver

# quiet urllib3's warnings about the certificate coming from a CA it does not know about
import urllib3
urllib3.disable_warnings()

options = webdriver.FirefoxOptions()
driver = webdriver.Remote(
    command_executor='https://router.grid.com:4444',
    options=options
)

driver.close()

Scrum is an anti-pattern for Continuous Delivery

I’ve been saying that ‘Scrum is an anti-pattern for Continuous Delivery’ for a while, including in last week’s post, which got a ‘huh?’, so here is my beef with Scrum.

Actually, my complaint isn’t with Scrum itself, but with Sprints, and if you remove those then the whole house of cards falls down. (This is similar to my stance on Java, which I do dislike, but I loathe Eclipse, so Java is tolerable in something other than Eclipse. Barely.)

The whole point of Continuous Delivery, to me, is to ‘deliver’ improvements to whatever it is you do, to your customers, ‘continuously.’ Where continuously means, well, continuously. Not ‘at the end of an arbitrary time period which is usually about 2 – 3 weeks in length.’ This is why ‘Mean Time To Production’ is such an important metric to me and drives all other changes to the delivery pipeline.

“But Adam, how will we plan what we do if we don’t get to play Planning Poker every week?” Easy. Your customers will tell you. And that ‘customer’ could be internal. If something is important, you will know. If something is more important than something else, then it will bump that down the queue. This isn’t to say discussing things and figuring out how to slice them into smaller and smaller units isn’t necessary. It absolutely is. And learning how to do this is perhaps one of the hardest problems in software. Which leads to…

“But Adam, this is a big story that will take the full sprint to complete.” Slice it smaller, hide the work-in-progress behind feature flags and still push your local changes daily. (You should be using them anyways to separate feature launch from availability.)

“But Adam, we could deploy at any point — we just do it once a sprint.” Why? You are actively doing a disservice to your customers and your company by holding back things that could improve their experience and make you more money. Disclaimer: this becomes a more real argument when deploying to IoT or other hardware. I don’t want my thermostat to get updated 20 times a day. But if the vendor could do it, I’ll accept that.

“But Adam, we are in a regulated environment and have to do Scrum.” That’s a strawman argument against working with your auditors. See Dave’s recent Continuous Compliance article.

“But Adam, how will we know if we are getting better at estimating?” The same way you do with Scrum or anything else, which is to collect data. This is a bulk food type of problem. When you go to buy, say, peanut butter from the bulk food store, you take in your container and they weigh it before you scoop your peanut-y deliciousness into it, and after. They then do the math to know the weight of just the peanut butter. The same thing can be done here. If you know how long your deploys take, you can do the math from the time the code was started to the time it was available in production, and then remove the fixed time of deployments to get the actual length of time something took. In its entirety, not just ‘in development’. (I don’t actually track this metric right now. Things take the length of time they take. But I think this is sound theory.)

“But Adam, where do all our manual testers fit in this world?” They are just part of the process. This is a key difference between Continuous Deployment and Continuous Delivery. If your process says humans touch it, then humans touch it. But there also needs to be a way to short-circuit around them in the case of an emergency.

“But Adam, our database is so archaic and fragile that deployments are a huge risk and sprints minimize that.” That’s a good place to start changing things. A local company still does releases weekly, overnight on Wednesdays, after 5 years because of this. I’m pretty sure it stopped being a tech problem and was well into a people problem a couple of years ago.

So if not Scrum, then what? The ‘easy’ answer is Kanban. The harder answer is of course ‘it depends’ and likely looks like a tailored version of Kanban that solves your team’s problems. I really like the notion of a work item flowing across a board, but I also dislike enforcing WIP limits and the artificial moving of things left to make room for something else because the tooling requires it.

Let me know what other “But Adam’s” I missed in the comments.

Oh, I’ve got one more.

“But Adam, that is hard.” Yes. Yes it is. (It’s also super fun.)

‘So what would you do?’

Another ‘free consulting is content’ post. The context here is a 10-year-old company where a friend of mine is the VP of Engineering; their delivery pipeline worked … but there were some horrible manual steps (as compared to manually-pushing-a-button steps, which are perfectly acceptable, if not desirable) and things were too custom and black box-y. Oh, and the deploy from CircleCI was just flat out broken right now. The gist of the conversation was ‘if you helped us out, what would it look like?’

What’s interesting is that this, and other conversations like it that I have had in the last month, have really distilled my thoughts around pipelines, which leads to a playbook of sorts, but that’s beyond the scope of this. Aside from the fact that this looks a lot like what the playbook looks like.

Anyhow, here is the ‘only slightly edited’ bit of free consulting I gave.

  1. Check that things that should already be done are done

Root account has a hardware MFA token that is somewhere secure, CloudTrail is enabled and has the fun Lambda script to auto re-enable it if it gets disabled, deletion protection is turned on, etc.

  2. CodeDeploy

Since deploying from CircleCI is busted anyways, get it producing CodeDeploy packages and manually install the agent on all the boxes.

  3. Packerize all images

Standardize on a Linux distro (anything other than Amazon Linux 2 is silly). Create base AMIs with CodeBuild triggered off of GitHub webhooks to the $company-Packer repo. Again, it doesn’t matter which configuration management tool Packer uses — as long as they can justify the choice. And as I mentioned, AWS has given a credible reason to use Ansible with the integration of running playbooks from Systems Manager.

  4. Replace CircleCI with CodePipeline (orchestration) and CodeBuild (build, test and package) — since deploy is already done via CodeDeploy
  5. Feature Flags

Managed via an admin screen into the database (not file-based) to dark launch features to cohorts and/or percentages before full availability. A rough sketch of what such a check might look like is below.
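
To make that concrete, here is a purely hypothetical sketch of a database-backed check (the feature_flags table, its columns, and the rollout math are all made up for illustration):

use Illuminate\Support\Facades\DB;

function featureEnabled(string $flag, int $userId): bool
{
    // Flag rows live in the database so the admin screen can flip them live
    $row = DB::table('feature_flags')->where('name', $flag)->first();

    if (! $row || ! $row->enabled) {
        return false;
    }

    // Cohort dark launch: only users explicitly listed get the feature
    $cohort = json_decode($row->cohort ?? '[]', true);
    if (! empty($cohort) && ! in_array($userId, $cohort)) {
        return false;
    }

    // Percentage rollout: hash the user into a stable 0-99 bucket
    return (crc32($flag . $userId) % 100) < $row->percentage;
}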

  6. Airplane Development

‘Can you do development on an airplane without internet access?’ — so no shared databases, no needing to reach out to the internet for js or fonts or icons, etc. Look at developer onboarding at this point too. Vagrant is great. Docker is the hotness. But Vagrant means you can literally have the same configuration locally as you do in production. Docker can too, of course, if you are going Fargate/ECS.

  7. Health

Monitoring (all the layers, reactive and proactive — all but one of my major outages could have been predicted if I was watching the right things), Logging (centralized, retention), Paging (when and why, and fix the broken windows), Testing-in-Production (it’s the only environment that counts), Health Radiator (there should be a big screen with health indicators, both system and business, in your area), etc.

  8. Autoscaling

Up and Down, at all the layers. Driven by monitoring and logging.

  9. Bring everything under Terraform control

Yes, only at this point. It ‘works’ now — just not the way you want it to. Everything above doesn’t ‘work’. Again, I’d use Terraform over CloudFormation, but for ‘all in on AWS’ CloudFormation is certainly an option. Now if only CloudFormation were considered a first-class citizen inside AWS and supported new features before competitors like Terraform do. CloudFormation still doesn’t have Route 53 Delegation Sets the last time I checked.

  10. Disaster Recovery

‘Can you take the last backup and your Terraform scripts and light up $company in a net-new AWS account, and be able to be down for at most as long as it takes to copy RDS snapshots, losing only the data from the last in-flight backup?’

  11. Move to Aurora

Just because I like the idea of having the ability to have the database trigger Lambda functions.

  12. Observability

Slightly different than Health — basically I would use Honeycomb because Charity, etc. are far too smart.

  13. Chaos Engineering

Self-healing, multi-region, etc. If Facebook can cut the power to their London datacenter and no one notices, $company can do something less dramatic with equal effect.

And then it’s ‘just’ keeping the ship sailing the way you want, making slight corrections in the course along the way.

We need a priest (QA) to bless (test) all our work

A friend of mine pinged me during his commute this morning about my thoughts on weaning a team off of thinking they need ‘a priest (QA) to bless (test) all our work’. ‘Free’ consulting means it gets to be content. :D

Obviously, this is a ‘people problem’. So the approach will vary place-to-place and even within a place. Regardless, though, you need to start by expunging ‘QA as Quality Assurance’ from the organization. They don’t actually ‘Assure’ anything. You, or a half dozen other people, could override them. So ‘Quality Assistance’ is a nicer reframing. Or better still, ‘Testing’.

Then you need to play detective and find out what the inciting event was that caused a) the first ‘QA’ person to be hired, and b) how they got anointed as priests. A smooth transition away from that requires you to know those two things.

Organizationally, I would be interested in:

  • how many things are found by the testers
  • what the categorization is (because those are developer blind spots)
  • how many things that are found actually hold up the build until fixed
  • and of those, how many could have shipped

From a purely technical perspective, some practices that address this:

  • dark launches via feature flags and have new stuff rolled out slowly to user slices
  • acknowledge that production is different than any other environment and is the only environment that matters. To quote Charity; ‘I test in production, and so do you.’
  • the only metric that matters in today’s world is ‘mean time to production’. Something isn’t ‘done’ unless it is in production being used by the target customer. Everything you want to do hinges on that. Put on your wall a whiteboard with ‘number of deploys today’, ‘number of deploys this week’, ‘number of deploys this month’ which you increment each time it goes to production
  • if you think your feature stories are small enough, you need to slice them more
  • not to overload the term, but increase the observability of the application in the more traditional way, not the honeycomb way. If you are pushing to production fast and often, you need to know whether it’s behaving or not, fast and often. Number of logins per 5 minutes, number of registrations per 5 minutes, number of searches per 5 minutes, etc. Every new feature / fix needs to have a measure to know if it is working. (It will take a long time to get to here.)
  • move to trunk based development. Everyone should be pushing code at least once every 2 days. Feature branches allow people to get sloppy.
  • Obviously, TDD is huge in this. (or TAD — I don’t care, just slow down and write some damn tests before committing)
  • Steal from Etsy’s playbook and have your pipeline such that day 1 at <redacted> is pushing to production, and day 2 is paperwork / onboarding. It forces you to get your development environment in shape so you can onboard someone from a bare machine to productive in an hour, and it also breaks the feeling of sanctity around production and creates shared ownership. I believe everyone at Etsy did this, not just developers. (Though obviously non-developers had a borrowed environment and were hand-held.)

MTTP reduction is the whole purpose of building out a Continuous Delivery pipeline. ‘QA Priest’ doesn’t fit time-wise with that. (It’s also why Scrum is a Continuous Delivery anti-pattern.)

But again, this is a Culture thing. To quote Jerry: ‘things are the way they are because that is the way they got there’ — figure that out and you can change the culture.