The secret to killer location queries? Meet the Geohash

Discovering the way our planet has been divided up to make your location queries as fast as lightning.

Over the last few weeks I’ve been learning all about how to write efficient location queries on Firebase’s Firestore database, for my iOS app. And whilst this article specifically talks about Firebase and shows example of Swift code, the concepts here are relevant across most types of databases and coding languages.

2020’s seemingly endless lockdowns and curfews were good for one thing… the time to dig into some long-overdue personal projects. For me, it is has been the design and development of Rally — a new social community for LGBTQ sports players.

Among the many challenges that a social app can pose, location queries are one of the big ones. Not only is the maths hard to get your head around, but depending on the backend framework you are using, it’s very possible that it what you want to do just isn’t supported very well.

This is exactly the case with Firebase’s databases (both Firestore and its Realtime Database).

You see, Firebase don’t offer any “native” location searching capabilities. Whilst they effortlessly provide for 99% of our needs (as software developers), it seems that location queries just didn’t make it into their to-do list.

Their best solution: Geofire

Why Geofire just doesn’t hit the mark

Disclaimer: No disrespect to the developers of Geofire — I love you and appreciate your work. It’s just we need to do some heavy lifting here.

The team at Firebase propose that we integrate Geofire into our workflow. Which works by doing the following:

  • every time you want to save a location (let’s say, the user’s current location), I tell Geofire to save the data to a collection on the server.
  • and whenever I want to get a list of the nearest users to a specific location, I ask it to query the same collection for all the users who are within a certain radius of my chosen location.

Sounds great. But here’s the problem. This returns ALL THE USERS within that radius! That’s not what I want at all. I want to be able to add extra where clauses and return just a specific subset of users who are nearby. I want users who have similar interests, and who have logged in to the platform within the last 30 days, and who have a profile photo, and who have agreed to be shown in the app to people that they don’t know. If I use Geofire, I need to pull EVERY USER within my radius, and then filter out the ones that don’t meet my requirements. That could mean pulling thousands of users, with only a small handful that are relevant.

What I want is to write my usual compound queries — with multiple where clauses — but adding to it a simple where clause for the location part.

That’s not too much to ask, is it?

Why not just query directly on the coordinates…

OK, good question. So, I’m currently sat here in Barcelona, soaking up the wonderful Catalan lifestyle, at coordinates 41.387, 2.1850.

This could be me. Except I’m in a basement office, writing about location queries.

If I were using the Rally app, I’d want to see all the people that are also using the app inside a radius of 10km from my location(which more or less covers the city).

10km is a standard radius when querying for “local” objects

As long as I can estimate the coordinate range, I should be able to do a simple query on the latitude and longitude values for my location:

let barcelonaLat = 41.387
let barcelonaLong = 2.1850
let minLat = barcelonaLat - 0.1 // approx 10km south
let maxLat = barcelonaLat + 0.1 // approx 10km north
let minLong = barcelonaLong - 0.1 // approx 10km west
let maxLong = barcelonaLong + 0.1 // approx 10km east
Firestore.firestore().collection("users")
.whereField("latitude", isGreaterThan: minLat)
.whereField("latitude", isLessThan: maxLat)
.whereField("longitude", isGreaterThan: minLong)
.whereField("longitude", isLessThan: maxLong)
.getDocuments()

But alas 😕 Firebase won’t allow me to perform a greaterThan or lessThan query on different fields (in this case, I’m trying to do both “>” and “<” on both “latitude” and “longitude”). I’m only allowed to use those operators on one single field per query. And in actual fact, I need to save that type of comparison for my “where lastSeenDate is within 30 days” clause.

Also the +0.1 and -0.1 modifiers for the latitude and longitude are very inaccurate and require some hardcore maths to get right.

I need to find a better solution. Maybe if I dig into Geofire and see how they do it, I can get some more ideas.

Welcome to the wonderful world of the “Geohash”

Taking a look at the Geofire code, we can see that alongside each set of coordinates, it also stores a “geohash”.

Sagrada Familia    
Latitude: 41.4036
Longitude: 2.1744
Geohash: sp3e93r37nqn

You can think of a geohash as an “increasingly more precise address for any location on earth”.

Image that we divide the earth up into 16 buckets — something like this:

The bucket proportions on this map are all wrong, but give me a break, ok? 🙃

Now, let’s zoom in on the “s” bucket, and we can see that it divides up into 16 more buckets.

Closing in on the Eastern Mediterranean, we have the “sp” bucket…

… followed by the “sp3" bucket covering most of Catalunya…

…which gives us the “sp3e” bucket for Barcelona.

I live right there! I know, lucky me.

So this is pretty cool! If I’m right, we could definitely make use of this for our location queries.

For example, if I want to get a list of users within 10km of the Sagrada Familia, all I need to do is look for any users that fall inside the “sp3e” bucket. I can then manually remove any which are more than 10km from the location.

Firestore.firestore().collection("users")
.whereField("geohash", equalTo: "sp3e")

For this to work, I’ll need to save my geohashes with exactly the same precision (i.e. number of characters) as I am querying them, otherwise Firebase won’t recognise the match. So I commit to saving all my geohashes with 4 characters (for now…)

Wait a second. It can’t be that easy, can it?

Well, no. It isn’t that easy.

You see, the inherent problem with these buckets is that they divide the land up into… well… buckets. Whilst that might work fine a lot of the time, eventually we’re going to find a user who is located right next to the divide between two buckets.

Like my good friend, Queen Elizabeth II, who I’m sure will be a Rally user when it eventually gets launched. Let’s imagine the Queen for a moment, looking for other LGBTQ people to play sports with, from the comfort of her home in Buckingham Palace, London — right on the edge of bucket “gcpu”.

If I only show users who are in bucket “gcpu”, she’s only going to see users located in south London. But that’s only half the story! So to fix this we clearly need to fetch the users in the “gcpv” bucket as well.

To be able to do this programatically, we need to be able to identify all the buckets which fall within the radius I am interested in, and then fetch all the users that live inside all those buckets. Right?

We can find these buckets by using one of the many libraries that exist for geohashing (I used the awesome Geohashkit by Maxim Veksler for Swift). Here, I can give it a location (either as latitude and longitude coordinates, or directly as a geohash) and the library will tell me the 8 buckets which surround the one I am in (i.e. their neighbouring buckets).

Buckingham 
Palace: gcpu
Neighbours: gcpt, gcpv, u10j
gcps, u10h
gcpe, gcpg, u105

OK so now all I need to do is to call my query nine times, once for each bucket, and I’ll get all the users within my radius of Buckingham Palace.

Wait. NINE TIMES!!! That’s going to kill my app performance. And we all know what that means for monthly bill from Firebase. And anyway, the area that those 9 buckets cover is way too big for what I want. I only need it cover those 2 buckets marked by the circle with radius 10km.

What I need is to harness the neighbours algorithm to identify when my user is located on the border between two buckets, but instead of returning 9 different buckets, I just need the 2 that are on either side of the dividing line. In my example above, I only really want to query the database twice: once for above the line (“gcpv”) and once for below the line (“gcpu”).

Hacking the neighbours algorithm

Ideally what I want is to identify where my radius crosses over into another bucket. And we can do this by zooming down one more level, and finding the neighbouring buckets for the smaller bucket (in this case — “gcpuu”).

Let’s ask Geohashkit for the neighbours to that smaller bucket:

Buckingham 
Palace: gcpuu
Neighbours: gcpv5, gcpvh, gcpvj
gcpug, gcpuv
gcpue, gcpus, gcput

And now let’s remove the last character from each result (which will tell us which bucket each one belongs inside).

Parent 
buckets: gcpv, gcpv, gcpv
gcpu, gcpu
gcpu, gcpu, gcpu
Deduplicated: gcpv, gcpu

This is it! We’ve found the two buckets which cover the entire range of query space. Now we can get all of the users near to Buckingham Palace with just two queries. You’re in luck, Queen Elizabeth.

So, does this work where 4 buckets meet at a point?

Sure it does!

I mean, this place looks pretty cool, not gonna lie.

Deep in the heart of Wisconsin, USA, we can find the lovely town of Poniatowski, which sits right between 4 of the world’s top level (i.e. biggest) buckets: c, f, 9 and d.

Let’s zoom in on the town, and see what we get. I’ve marked on the 4 buckets that meet just outside of the town, and the 10km radius of users we are interested in.

So we know in advance we will need to query 4 buckets this time. Will our algorithm return the correct results???

Poniatowski, Wiscosin:   Coordinates: 45.0329, -90.0740
Geohash: dpbpbph4ekz6
Central bucket: dpbpb
Neighbours: cbpbp, f0000, f0001
9zzzz, dpbpc
9zzzx, dpbp8, dpbp9
Parent buckets: cbpb, f000, f000
9zzz, dpbp
9zzz, dpbp, dpbp
Deduplicated: dpbp, cbpb, f000, 9zzz

There you have it. 4 buckets to fetch, and we get all the users within the 10km radius.

Perfect. 😊

What if we want to change the size of our catchment radius?

So up until now, I’ve been exclusively working with a 4-character geohash (i.e. “sp3e” for Barcelona), which is an area approximately 20km². This is a convenient size because it is more-or-less about the size of a small town and probably about the max distance that a user would want to be from someone they might one day want to meet in person.

But here’s the thing: Imagine my (probably quite lonely) user from Poniatowski, Wisconsin. There’s probably not any other users within 20km² of their location. And I don’t want them to have nobody to hang out with. I’m going to need to re-fetch the query results, but this time stepping up to the parent buckets to catch a wider-area. These 3-character hashes cover an area of approximately 80km² each, so for sure we’ll get some users this time. In the case of Poniatowski, Wisconsin, we would need to fetch for cbp, f00, 9zz and dbp.

But, my Firebase query isn’t going to work anymore. Annoyingly, I’ve saved all my geohashes as 4-character hashes, meaning this won’t return any results. And before you suggest it, Firebase don’t offer a “startsWith” function (instead it asks you to use greaterThan and lessThan on the text, which we have already ruled out as an option above)

Firestore.firestore().collection("users")
.whereField("geohash", equalTo: "dpb") // no results 😕

So, whilst it isn’t the most elegant solution, the only option we are left with is to store each geohash bucket separately. Which to be honest isn’t so bad. I’m not really interested in anything more accurate than 4 characters, or anything bigger than 2. So we just need to add 3 searchable geohashes and one exact one (for our algorithm to use).

users
name: Jack
latitude: 45.00001
longitude: -89.9991
geohashExact: dpbpbph4ekz6
geohash4: dpbp
geohash3: dpb
geohash2: dp
Firestore.firestore().collection("users")
.whereField("geohash3", equalTo: "dpb")

In fact, the code we use for Rally takes it a step further and uses some very handy recursion to keep searching at larger and larger buckets until it finds some data to return. We start by querying the neighbouring 4-character buckets, and if we don’t get more than 20 users returned, we’ll then query the neighbouring 3-character buckets. Then again the 2-character buckets if necessary. We could query the 1-character buckets, but these are so large that it is actually quite unlikely we’ll encounter any useful data — these buckets are almost continental in size.

And of course, the whole reason for saving these geoHashes as direct, equatable strings against each user (or landmark, or restaurant, or event) is so that we can write lovely large compound queries like this:

let buckets = neighbouringBuckets("dpbpbph4ekz6", radiusKM: 70)
for bucket in buckets {
Firestore.firestore().collection("events")
.whereField("geohash3", equalTo: bucket)
.whereField("startDate", greatherThan: today)
.whereField("isPublished", equalTo: true)
.whereField("sports", in: ["football", "american football"])
.whereField("organiserID", isNotEqualTo: currentUserID)
.whereField("isDeleted", equalTo: false)
.getDocuments()
}

Give me a nudge you if want to see the full code, and if enough people are interested I’ll publish it for everyone.

Thanks for reading, and if you’re interested in exploring geohashes a little more, have a play with this mapping tool. Comment the best hashes you can find…

CEO at Gorilla Arm - Mobile App Hotshots

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store