<p><em>gpxz: Building an open global elevation dataset.</em></p>
<h1 id="api-keys">Managing API keys in django</h1>
<p><em>2024-03-10</em></p>
<ul>
<li>Keys are models with their own properties, not just strings.</li>
<li>Users may have multiple keys.</li>
<li>Deleted keys are retained in the DB.</li>
</ul>
<hr />
<h2 id="users-have-multiple-keys">Users have multiple keys</h2>
<p>At some point, one of your customers is going to want to regenerate a new API key. Maybe the old one was exposed in a breach, or they have a policy of rotating all credentials whenever an employee leaves.</p>
<p>If you only have a single key per user (like <code class="language-plaintext highlighter-rouge">user.api_key</code>) the customer won’t be able to access your API from the time the new key is generated up until they have deployed a new release with those new credentials on their end. Giving customers two keys means they can create a new key, build and deploy a new release while requests with the old key are still served, then delete the old key once they’ve confirmed it’s no longer being used.</p>
<p>Two keys are enough to get you started. But I’d still recommend against hardcoding that number (like <code class="language-plaintext highlighter-rouge">user.api_key_1, user.api_key_2</code>): as your customer-base grows, you’ll eventually get an email from someone who wants more than two keys. Perhaps unique keys for development, staging, and production, plus an extra slot to allow for rotation.</p>
<p>So to begin with, our <code class="language-plaintext highlighter-rouge">APIKey</code> model gets a foreign key <code class="language-plaintext highlighter-rouge">user</code> property (which allows a user to have many keys), an <code class="language-plaintext highlighter-rouge">id</code> separate from the access string (so as not to leak database identifiers), as well as <code class="language-plaintext highlighter-rouge">key_value</code> which will store the actual key used for authenticating with the API.</p>
<p>I recommend using <code class="language-plaintext highlighter-rouge">TimeStampedModel</code> as a base class for all django models: it adds <code class="language-plaintext highlighter-rouge">created</code> and <code class="language-plaintext highlighter-rouge">modified</code> timestamps, which are super helpful for debugging and for sorting objects in a meaningful way.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">model_utils.models</span> <span class="kn">import</span> <span class="n">TimeStampedModel</span>


<span class="k">class</span> <span class="nc">APIKey</span><span class="p">(</span><span class="n">TimeStampedModel</span><span class="p">):</span>
    <span class="nb">id</span> <span class="o">=</span> <span class="n">models</span><span class="p">.</span><span class="n">AutoField</span><span class="p">(</span><span class="n">primary_key</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">models</span><span class="p">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">User</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="p">.</span><span class="n">PROTECT</span><span class="p">)</span>
    <span class="n">key_value</span> <span class="o">=</span> <span class="n">models</span><span class="p">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">unique</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">ordering</span> <span class="o">=</span> <span class="p">[</span><span class="s">"created"</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">key_value</span>
</code></pre></div></div>
<h2 id="key-format">Key format</h2>
<p>A GPXZ API key looks something like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ak_e6YkbUK9Fovx8aErYBPm04
</code></pre></div></div>
<p>It starts with a fixed <code class="language-plaintext highlighter-rouge">ak_</code> prefix. This helps keys stand out from other (less important) random strings.</p>
<p>The remainder is a string of random alphanumeric characters. 22 characters gives 130 bits of randomness, which is <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions">generally considered sufficient</a> to be unguessable.</p>
<p>Note this arithmetic assumes keys are validated case-sensitively!</p>
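<p>To sanity-check that entropy figure: each character is drawn uniformly from 62 alphanumerics, so each contributes log2(62) ≈ 5.95 bits. A minimal sketch using Python’s stdlib <code class="language-plaintext highlighter-rouge">secrets</code> module (for illustration only; the helper and constant names here are not from the GPXZ codebase):</p>

```python
import math
import secrets
import string

ALPHANUMERIC = string.ascii_letters + string.digits  # 62 characters.


def generate_key(n_chars: int = 22) -> str:
    """Generate an ak_-prefixed key from a cryptographic RNG."""
    body = "".join(secrets.choice(ALPHANUMERIC) for _ in range(n_chars))
    return "ak_" + body


# Each character contributes log2(62) ~= 5.95 bits: 22 chars gives ~131 bits.
entropy_bits = 22 * math.log2(62)
```

<p>62<sup>22</sup> ≈ 2<sup>131</sup>, comfortably past the 128-bit bar usually considered unguessable.</p>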
<p>There are many ways to generate random strings in Python, but for security-critical code it’s best to rely on a vetted library. In django there’s <code class="language-plaintext highlighter-rouge">django.utils.crypto.get_random_string</code> for this.</p>
<p>Putting that all together gives something like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">django.utils.crypto</span> <span class="kn">import</span> <span class="n">get_random_string</span>
<span class="k">def</span> <span class="nf">_generate_api_key</span><span class="p">():</span>
    <span class="c1"># A fixed "ak_" prefix plus 22 random alphanumeric characters.</span>
    <span class="k">return</span> <span class="s">"ak_"</span> <span class="o">+</span> <span class="n">get_random_string</span><span class="p">(</span><span class="mi">22</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">APIKey</span><span class="p">(</span><span class="n">TimeStampedModel</span><span class="p">):</span>
    <span class="n">key_value</span> <span class="o">=</span> <span class="n">models</span><span class="p">.</span><span class="n">CharField</span><span class="p">(</span>
        <span class="n">max_length</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
        <span class="n">unique</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">default</span><span class="o">=</span><span class="n">_generate_api_key</span><span class="p">,</span>
    <span class="p">)</span>
    <span class="p">...</span>
</code></pre></div></div>
<p>Note that <code class="language-plaintext highlighter-rouge">default=_generate_api_key</code> will fail if added in a migration: the default generator is only evaluated once, tripping the uniqueness constraint. Handling this requires some <a href="https://stackoverflow.com/q/29787853">manual migration management</a>.</p>
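<p>The pitfall is easy to demonstrate in miniature, without any django machinery: a value baked into a migration is computed once and written to every row, while a callable default is invoked fresh per object. (This is a plain-Python illustration, not actual migration code.)</p>

```python
import secrets


def _generate_api_key() -> str:
    # Stand-in for the model's default callable.
    return secrets.token_hex(16)


# At runtime django invokes the callable once per new object,
# so every row gets a fresh, unique value.
per_row_values = [_generate_api_key() for _ in range(3)]

# A naive migration evaluates the default once and backfills that single
# value into every existing row, tripping the unique constraint.
baked_in_value = _generate_api_key()
backfilled_values = [baked_in_value for _ in range(3)]
```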
<h3 id="aside-user-ids-in-api-keys">Aside: user IDs in API keys</h3>
<p>A GPXZ API key <em>actually</em> looks something like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ak_5uOU_e6YkbUK9Fovx8aErYBPm04
</code></pre></div></div>
<p>The extra <code class="language-plaintext highlighter-rouge">5uOU</code> bit is a fixed random unique ID for each user. Including that in the key gives a few tiny benefits:</p>
<ul>
<li>When authenticating a request, both the user details and the key details can be queried in a single round trip to redis.</li>
<li>When we get a key attached to a Sentry error, figuring out the impacted user is a simple lookup.</li>
</ul>
<p>but also has some drawbacks:</p>
<ul>
<li>Key is longer.</li>
<li>Extra complexity on key generation and authentication, meaning more places for things to go wrong in security-critical code.</li>
<li>You have to be careful not to use any information derived from the user ID until the whole key is authenticated.</li>
</ul>
<p>In hindsight, the benefits are so small that this approach isn’t one I’d recommend.</p>
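<p>For completeness, validating a key in this format means splitting off the user segment for the lookup, while still comparing the <em>whole</em> key in constant time. The <code class="language-plaintext highlighter-rouge">parse_key</code> helper below is hypothetical, sketched just for this post:</p>

```python
import secrets


def parse_key(key: str) -> tuple[str, str]:
    """Split a hypothetical 'ak_{user_id}_{random}' key into its segments."""
    prefix, user_id, random_part = key.split("_", 2)
    if prefix != "ak":
        raise ValueError("not an API key")
    return user_id, random_part


def keys_match(presented_key: str, stored_key: str) -> bool:
    # Compare the whole key in constant time; the user-ID segment is only
    # a routing hint and must never short-circuit authentication.
    return secrets.compare_digest(presented_key, stored_key)
```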
<h2 id="track-deleted-keys">Track deleted keys</h2>
<p>Customers should be able to delete API keys, for example to prevent access in the case of a breach. But while we do want to prevent deleted keys from accessing the application, we don’t want to remove them from the database. Keeping a record of old keys helps with debugging historical issues, and with diagnosing customers who are still using outdated keys.</p>
<p>It also lets us show recently deleted GPXZ API keys, which can help a customer do their own debugging before reaching out to you!</p>
<div>
<br />
<a href="/static/blog/img/deleted-key-dashboard.png"><img src="/static/blog/img/deleted-key-dashboard.png" /></a>
<br />
<br />
</div>
<p>We’ll add a couple new properties to our <code class="language-plaintext highlighter-rouge">APIKey</code> model:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timezone</span>


<span class="k">class</span> <span class="nc">APIKey</span><span class="p">(</span><span class="n">TimeStampedModel</span><span class="p">):</span>
    <span class="n">is_active</span> <span class="o">=</span> <span class="n">models</span><span class="p">.</span><span class="n">BooleanField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">deactivated_at</span> <span class="o">=</span> <span class="n">models</span><span class="p">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">null</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="p">...</span>

    <span class="k">def</span> <span class="nf">save</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="c1"># Record the deactivation time the first time a key is deactivated.</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="p">.</span><span class="n">is_active</span> <span class="ow">and</span> <span class="ow">not</span> <span class="bp">self</span><span class="p">.</span><span class="n">deactivated_at</span><span class="p">:</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">deactivated_at</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">(</span><span class="n">timezone</span><span class="p">.</span><span class="n">utc</span><span class="p">)</span>
        <span class="nb">super</span><span class="p">().</span><span class="n">save</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</code></pre></div></div>
<p>then “deleting” a key looks like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">key</span><span class="p">.</span><span class="n">is_active</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">key</span><span class="p">.</span><span class="n">save</span><span class="p">()</span>
</code></pre></div></div>
<p>and remember to filter to active keys only when querying for authentication, e.g. <code class="language-plaintext highlighter-rouge">APIKey.objects.filter(is_active=True, key_value=key_value)</code>.</p>
<h2 id="api-keys-arent-just-strings">API keys aren’t just strings</h2>
<p>To start with, the main properties on your API key might be <code class="language-plaintext highlighter-rouge">is_active</code> and <code class="language-plaintext highlighter-rouge">created</code>.</p>
<p>But over time, you’re likely to need to attach further functionality to keys. For example:</p>
<ul>
<li>Permission scopes to limit the access of keys, e.g. a readonly key for testing.</li>
<li>IP / hostname restrictions, e.g. a key that can safely be used in browser applications because requests from other origins won’t be allowed.</li>
<li>Rate limits, e.g. so a test key doesn’t cause quota errors in production.</li>
<li>A name for each key to keep track of all of these keys with different settings!</li>
</ul>
<p>Even if you don’t expose this functionality yet, you should convert the input key string to a full <code class="language-plaintext highlighter-rouge">APIKey</code> model as early as possible.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">api_key_str</span> <span class="o">=</span> <span class="n">find_api_key_str</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="n">api_key</span> <span class="o">=</span> <span class="n">APIKey</span><span class="p">.</span><span class="n">from_key_value</span><span class="p">(</span><span class="n">api_key_str</span><span class="p">)</span>
<span class="n">api_key</span><span class="p">.</span><span class="n">authorise</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
</code></pre></div></div>
<p>This also lets you keep all the key logic in one place (as methods on <code class="language-plaintext highlighter-rouge">APIKey</code>).</p>
<h2 id="work-with-us">Work with us</h2>
<p>When we’re not busy working on GPXZ we help other companies with Python web development. If you need someone to set up your django API, <a href="/hire">get in touch</a>.</p>
<hr />
<h1 id="eudem-download">EUDEM download</h1>
<p><em>2024-01-20</em></p>
<h2 id="tldr">TLDR</h2>
<p>Download here (23GB): <a href="https://files.gpxz.io/eudem_buffered.zip">https://files.gpxz.io/eudem_buffered.zip</a></p>
<h2 id="eu-dem">EU-DEM</h2>
<p>As of January 2024, the 25m <a href="https://www.eea.europa.eu/en/datahub/datahubitem-view/d08852bc-7b5f-4835-a776-08362e2fbf4b">EU-DEM</a> dataset is no longer available to download via the Copernicus Land Monitoring Service.</p>
<blockquote>
<p>EU-DEM is not maintained anymore by the Copernicus Land Monitoring Service. You can still request access to the archived version by contacting the service desk of the Copernicus Land Monitoring Service at copernicus@eea.europa.eu. We recommend users to check the Copernicus DEM product publicly available at 30 m spatial resolution.</p>
</blockquote>
<p>As mentioned on their website, the Copernicus DEM is a higher quality replacement! But if you still need EU-DEM (e.g. to continue with a project you already started) you can download a version of the dataset I archived earlier.</p>
<p>Note that compared to the source data, it has been buffered by a few pixels (to avoid gaps between files) and the files renamed to match the SRTM-style naming required by <a href="https://github.com/ajnisbet/opentopodata">opentopodata</a> using <a href="https://github.com/ajnisbet/opentopodata/blob/13123436cecc656af994f378ca7534b4199c9910/docs/datasets/eudem.md">this script</a>.</p>
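<p>For reference, SRTM-style names encode a tile’s south-west corner, e.g. <code class="language-plaintext highlighter-rouge">N46E006</code>. A sketch of the convention (the actual renaming was done by the script linked above):</p>

```python
import math


def srtm_tile_name(lat: float, lon: float) -> str:
    """Name a 1x1 degree tile after its south-west corner, SRTM-style."""
    lat_sw = math.floor(lat)
    lon_sw = math.floor(lon)
    ns = "N" if lat_sw >= 0 else "S"
    ew = "E" if lon_sw >= 0 else "W"
    # Latitude is zero-padded to 2 digits, longitude to 3.
    return f"{ns}{abs(lat_sw):02d}{ew}{abs(lon_sw):03d}"
```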
<p>Download a zip of the files here (23GB): <a href="https://files.gpxz.io/eudem_buffered.zip">https://files.gpxz.io/eudem_buffered.zip</a></p>
<h2 id="europe-dems-at-gpxz">Europe DEMs at GPXZ</h2>
<p><a href="/">GPXZ</a> is an API for high-resolution elevation data. We no longer use EU-DEM: we’ve found the 30m Copernicus dataset to be of higher quality as a base dataset. Plus, for many <a href="/dataset">regions</a> in Europe GPXZ has high-resolution lidar data.</p>
<p>If you need elevation data or help processing it, get in touch at <a href="mailto:andrew@gpxz.io">andrew@gpxz.io</a>!</p>
<hr />
<h1 id="raster-api">Raster API</h1>
<p><em>2023-12-05</em></p>
<p>The GPXZ API now supports 2D extracts!</p>
<p>Define a bounding box:</p>
<p><br />
<a href="/static/blog/img/lausanne-bbox.png"><img src="/static/blog/img/lausanne-bbox.png" /></a>
<em class="blog-caption">© OpenStreetMap</em>
<br />
<br /></p>
<p>and provide the bounding coordinates to the new <code class="language-plaintext highlighter-rouge">/v1/elevation/hires-raster</code> endpoint, along with your desired resolution:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">shutil</span>
<span class="kn">from</span> <span class="nn">pathlib</span> <span class="kn">import</span> <span class="n">Path</span>

<span class="kn">import</span> <span class="nn">requests</span>

<span class="c1"># Build request.
</span><span class="n">query_params</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"bbox_top"</span><span class="p">:</span> <span class="s">"46.531"</span><span class="p">,</span>
<span class="s">"bbox_bottom"</span><span class="p">:</span> <span class="s">"46.518"</span><span class="p">,</span>
<span class="s">"bbox_left"</span><span class="p">:</span> <span class="s">"6.629"</span><span class="p">,</span>
<span class="s">"bbox_right"</span><span class="p">:</span> <span class="s">"6.648"</span><span class="p">,</span>
<span class="s">"res_m"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span> <span class="c1"># Metres.
</span> <span class="s">"api-key"</span><span class="p">:</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="s">"GPXZ_API_KEY"</span><span class="p">],</span>
<span class="p">}</span>
<span class="c1"># Query data.
</span><span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="n">get</span><span class="p">(</span>
<span class="s">"https://api.gpxz.io/v1/elevation/hires-raster"</span><span class="p">,</span>
<span class="n">params</span><span class="o">=</span><span class="n">query_params</span><span class="p">,</span>
<span class="n">stream</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">response</span><span class="p">.</span><span class="n">raise_for_status</span><span class="p">()</span>
<span class="c1"># Save to file.
</span><span class="n">dest_path</span> <span class="o">=</span> <span class="n">Path</span><span class="p">(</span><span class="s">"data/lausanne.geotiff"</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">dest_path</span><span class="p">,</span> <span class="s">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">shutil</span><span class="p">.</span><span class="n">copyfileobj</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">raw</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span>
</code></pre></div></div>
<p>The resulting raster can be used for 2D analysis, mapping, or offline data caching.</p>
<p><br />
<a href="/static/blog/img/lausanne-elevation.png"><img src="/static/blog/img/lausanne-elevation.png" /></a>
<em class="blog-caption">Geotiff rendered as a heatmap. The axes are in pixels.</em>
<br />
<br /></p>
<p>For more details, head over to the <a href="/docs#elevation-hires-raster">raster API documentation</a>.</p>
<hr />
<p><em>2023-11-28</em></p>
<h2 id="which-open-elevation-dataset-should-i-use">Which open elevation dataset should I use?</h2>
<p>Download all the datasets covering your domain and compare each one against known ground-truth values, using measurements of quality that align with your usecase.</p>
<h2 id="that-sounds-like-a-lot-of-work">That sounds like a lot of work.</h2>
<p>You could hire a <a href="/hire">geospatial consultant</a> to do this for you!</p>
<h2 id="ok-but-which-open-dataset-should-i-use">OK but which open dataset should I use?</h2>
<p>The best global open elevation dataset (in general, for most usecases, at time of writing) is the <a href="https://spacedata.copernicus.eu/collections/copernicus-digital-elevation-model">Copernicus Elevation Model</a>. It’s 30m resolution, and covers all latitudes. It’s of vastly higher quality than other global datasets (such as SRTM, Mapzen, Aster, AW3D30). At GPXZ we use Copernicus as our base data layer for land elevation.</p>
<p>You can download (a slightly outdated version of) the dataset from <a href="https://registry.opendata.aws/copernicus-dem/">AWS</a> and host your own API with <a href="https://github.com/ajnisbet/opentopodata">OpenTopoData</a>.</p>
<h2 id="which-open-elevation-api-should-i-use">Which open elevation API should I use?</h2>
<p>The GPXZ API serves up the best global dataset of open elevation data. We’re a paid service, but the data you get is open so you’re free to store, modify, and use the results for commercial purposes.</p>
<h2 id="which-free-open-elevation-api-should-i-use">Which free open elevation API should I use?</h2>
<p><a href="https://www.opentopodata.org">OpenTopoData</a> is our sister project: we run a free public elevation API with a small daily quota. You should probably use the Mapzen data source: it’s the best-quality source on the public API.</p>
<hr />
<h1 id="raid-monitoring">RAID disk monitoring with postfix and mailgun</h1>
<p><em>2023-11-01</em></p>
<p>The redundancy of RAID buys you time between disk failure and server failure. But a default RAID setup will happily function with a failed disk, until the next disk fails and your data is lost.</p>
<p>Email notifications need to be configured manually so you can intervene (replace a hard drive) after a disk failure.</p>
<p>At GPXZ we use RAID heavily to work with large datasets. This is how we handle monitoring of RAID arrays, using Mailgun to send email alerts.</p>
<p>These instructions use Ubuntu 20.04: different linux distributions may have config files in different locations.</p>
<h2 id="overview">Overview</h2>
<p><a href="https://raid.wiki.kernel.org/index.php/A_guide_to_mdadm">mdadm</a> and <a href="https://linux.die.net/man/8/smartd">smartd</a> can’t send email directly: they instead pass emails to <a href="https://en.wikipedia.org/wiki/Message_transfer_agent">mail relay software</a> running on your server. This software can send emails directly, but ensuring reliable email delivery is non-trivial, and it’s extra important to have reliable delivery for critical alerts. The <a href="https://www.postfix.org/">postfix</a> mail relay software can accept emails from mdadm/smartd and pass them off to an email API service such as <a href="https://www.mailgun.com/">mailgun</a>.</p>
<p>So our setup will look like</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm/smartd → postfix → mailgun
</code></pre></div></div>
<h2 id="postfix-email-relay">Postfix email relay</h2>
<p>Before getting started, log into the mailgun web UI. Under the SMTP tab of your domain’s settings you’ll see your mailgun SMTP domain (like <code class="language-plaintext highlighter-rouge">smtp.mailgun.org</code>) and your SMTP login (like <code class="language-plaintext highlighter-rouge">postmaster@example.com</code>).</p>
<p>First install postfix</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install </span>postfix
</code></pre></div></div>
<p>There are some options to select during installation:</p>
<ul>
<li>Choose <code class="language-plaintext highlighter-rouge">Satellite System</code> as the mailer type.</li>
<li>Use your server’s <code class="language-plaintext highlighter-rouge">$HOSTNAME</code> as the mail name.</li>
<li>Use your mailgun’s SMTP server as the relay host (e.g., <code class="language-plaintext highlighter-rouge">smtp.mailgun.org</code>).</li>
</ul>
<p>Create the file <code class="language-plaintext highlighter-rouge">/etc/postfix/sasl_passwd</code> to store your mailgun credentials:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nano /etc/postfix/sasl_passwd
</code></pre></div></div>
<p>with the following contents:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{mailgun smtp domain} {mailgun smtp login}:{mailgun smtp password}
</code></pre></div></div>
<p>If you’ve never used SMTP before you may have to reset your SMTP password. This won’t impact your mailgun API key or web login password.</p>
<p>The config file might look like</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>smtp.mailgun.org postmaster@example.com: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxx-xxxxxxxx
</code></pre></div></div>
<p>Lock down the permissions of the credentials file then load it into postfix.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo chmod </span>600 /etc/postfix/sasl_passwd
<span class="nb">sudo </span>postmap /etc/postfix/sasl_passwd
</code></pre></div></div>
<p>Configure domain mapping</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nano /etc/postfix/generic
</code></pre></div></div>
<p>with your domain and your mailgun SMTP server.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@example.com no-reply@[smtp.mailgun.org]:587
</code></pre></div></div>
<p>then load <em>that</em> into postfix</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>postmap /etc/postfix/generic
</code></pre></div></div>
<p>Finally, configure postfix by adding these lines to <code class="language-plaintext highlighter-rouge">/etc/postfix/main.cf</code> (or editing the corresponding lines for any settings that already exist). Replace <code class="language-plaintext highlighter-rouge">smtp.mailgun.org</code> with your mailgun SMTP domain.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nano /etc/postfix/main.cf
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>relayhost = [smtp.mailgun.org]:587
mydestination = localhost.localdomain, localhost
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_sasl_tls_security_options = noanonymous
smtp_sasl_mechanism_filter = AUTH LOGIN
smtp_tls_note_starttls_offer = yes
smtp_generic_maps = hash:/etc/postfix/generic
</code></pre></div></div>
<p>Reload this new config into postfix with</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>systemctl restart postfix
</code></pre></div></div>
<p>You should be all set up to send mail from your server! To test it’s working, you can send a test email to <code class="language-plaintext highlighter-rouge">recipient@example.com</code> with the <code class="language-plaintext highlighter-rouge">mail</code> command:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s2">"Test message from postfix"</span> | mail <span class="nt">-s</span> <span class="s2">"Test message"</span> recipient@example.com
</code></pre></div></div>
<p>If you don’t get an email within a few seconds, something’s broken! Check your spam folder, the Mailgun UI, and the logs in <code class="language-plaintext highlighter-rouge">/var/log/mail*</code>.</p>
<h2 id="mdadm">mdadm</h2>
<p>Now that we can send email from our server, the next step is to tell our disk monitoring software to use it. We’ll start with mdadm, which monitors the health of your RAID array.</p>
<p>Edit the mdadm config</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nano /etc/mdadm/mdadm.conf
</code></pre></div></div>
<p>and add/modify the <code class="language-plaintext highlighter-rouge">MAILADDR</code> setting to the recipient address alerts should be sent to:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MAILADDR recipient@example.com
</code></pre></div></div>
<p>then do a quick test with</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mdadm <span class="nt">--monitor</span> <span class="nt">--test</span> <span class="nt">--oneshot</span> /dev/md0
</code></pre></div></div>
<p>(where <code class="language-plaintext highlighter-rouge">/dev/md0</code> is a RAID array). You should get an email in your <code class="language-plaintext highlighter-rouge">recipient@example.com</code> inbox.</p>
<p>That’s all you need to do to have mdadm send email notification of any errors found during a check. However, it’s not uncommon for mdadm to be configured incorrectly and not actually be performing checks! So it’s worth confirming that regular checks are scheduled. Unfortunately this depends on your OS, but on Ubuntu 22.04 you can check for a timer entry with</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>systemctl list-timers mdcheck_start
</code></pre></div></div>
<h2 id="smartd">smartd</h2>
<p>While mdadm will tell you when a disk has failed, there might be advance warning of this in the disk’s SMART statistics. smartd is a service that can monitor these statistics and alert you if any fall out of spec.</p>
<p>You may need to install smartmontools first if the <code class="language-plaintext highlighter-rouge">smartd</code> command isn’t found:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install </span>smartmontools
</code></pre></div></div>
<p>Modify the configuration file:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>nano /etc/smartd.conf
</code></pre></div></div>
<p>The file contains lots of commented example configurations, plus potentially an uncommented line beginning with <code class="language-plaintext highlighter-rouge">DEVICESCAN</code>. smartd can only handle a single <code class="language-plaintext highlighter-rouge">DEVICESCAN</code> directive, so comment any existing lines out then add</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DEVICESCAN -o on -H -l error -l selftest -t -M test -m recipient@example.com
</code></pre></div></div>
<p>This directive does the following:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">-o on</code>: Enable monitoring.</li>
<li><code class="language-plaintext highlighter-rouge">-H</code>: Check SMART attributes for pre-failure conditions.</li>
<li><code class="language-plaintext highlighter-rouge">-l error -l selftest</code>: Check for errors as well as failed test results.</li>
<li><code class="language-plaintext highlighter-rouge">-t</code>: Check changes in SMART attributes.</li>
<li><code class="language-plaintext highlighter-rouge">-m recipient@example.com</code>: Send email alerts to this address.</li>
<li><code class="language-plaintext highlighter-rouge">-M test</code>: Send a test email when smartd is started.</li>
</ul>
<p>To test this setup restart the smartd service:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>systemctl restart smartd
</code></pre></div></div>
<p>You should get one email for each disk. You can leave the config setting as-is, or remove <code class="language-plaintext highlighter-rouge">-M test</code> from <code class="language-plaintext highlighter-rouge">/etc/smartd.conf</code> to get email alerts only for errors (not for service restarts).</p>The redundancy of RAID buys you time between disk failure and server failure. But a default RAID setup will happily function with a failed disk, until the next disk fails and your data is lost.Upcoming GPXZ dataset update2023-10-05T00:00:00-05:002023-10-05T00:00:00-05:00https://www.gpxz.io/blog/v2023-1<p>The new dataset is nearly here!</p>
<p>Queries to <code class="language-plaintext highlighter-rouge">api.gpxz.io</code> will return data from the new dataset beginning 2023-10-09 23:59 UTC.</p>
<p>Most customers need to do nothing: new queries will soon start to return improved data. Some customers may want to rebuild processed datasets or re-fetch cached data.</p>
<h2 id="api-changes">API changes</h2>
<p>There is only one API change: all requests will now return a <code class="language-plaintext highlighter-rouge">X-DATASET-VERSION</code> header. For now the header will have the value <code class="language-plaintext highlighter-rouge">2023.1</code> and this will be updated for future dataset releases.</p>
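<p>For example, a client that wants to detect dataset changes (to invalidate cached elevations, say) could read this header from its HTTP responses. A minimal sketch, assuming <code class="language-plaintext highlighter-rouge">headers</code> is whatever mapping your HTTP client exposes:</p>

```python
def dataset_version(headers):
    """Return the GPXZ dataset version from response headers, if present."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get('x-dataset-version')
```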
<p>The new sources will be added to the <code class="language-plaintext highlighter-rouge">/v1/elevation/sources</code> endpoint. Sources that are no longer used in the v2023.1 dataset will still appear in the endpoint.</p>
<h2 id="dataset-changes">Dataset changes</h2>
<p>This is a big update that adds new hi-res coverage areas and improves data in existing areas of hi-res coverage.</p>
<h3 id="new-hi-res-coverage">New hi-res coverage</h3>
<ul>
<li>Denmark (whole country, at 1.6m)</li>
<li>Finland (75% of the country, at 2m)</li>
<li>France (most of Metropolitan France plus most overseas territories, at 1m)</li>
<li>Germany (Bayern and NRW, at 1m)</li>
<li>Hong Kong (whole territory, at 0.5m)</li>
<li>Mexico (small parts, at 5m)</li>
<li>Norway (most of the country, at 10m)</li>
<li>Wales (whole country, at 1m)</li>
<li>Scotland (Edinburgh at 50cm; parts at 2m)</li>
</ul>
<h3 id="improved-hi-res-coverage">Improved hi-res coverage</h3>
<ul>
<li>New Zealand (more 1m lidar datasets added)</li>
<li>USA
<ul>
<li>The 10m base coverage for the USA has been updated, resolving many noise issues present outside 1m coverage zones.</li>
<li>1m lidar coverage for all of Nebraska, Iowa, Indiana, Mississippi, Tennessee.</li>
<li>90%+ 1m lidar coverage for Arkansas, Alabama, Georgia, and Florida.</li>
<li>Expanded 1m lidar coverage across the country.</li>
</ul>
</li>
<li>Canada (expanded 1m lidar coverage, notably New Brunswick and Vancouver Island).</li>
<li>GEBCO (worldwide) and EMOD (Europe) bathymetry have been updated to their latest versions (2023 and 2022, respectively).</li>
<li>England (1m lidar coverage now covers most of the country)</li>
<li>Spain (5m coverage now covers the whole country)</li>
</ul>
<h3 id="other-coverage-changes">Other coverage changes</h3>
<ul>
<li>The 10m Iceland data source has been removed, and Iceland coverage now comes from the 30m Copernicus dataset. The 10m data had <a href="/blog/iceland-10m-dtm">numerous issues</a>, so while this is technically a reduction in resolution, the new dataset has much better quality in Iceland.</li>
<li>Many more areas of localised noise/issues have been identified and removed from sources.</li>
</ul>
<h3 id="methodology-changes">Methodology changes</h3>
<ul>
<li>A new algorithm for coastline detection and lidar ocean removal fixes quality issues within 30m of the ocean (where 30m Copernicus land data was sometimes used incorrectly in places the lidar data indicates are water).</li>
</ul>
<h2 id="new-coverage">New coverage</h2>
<div class="coverage-widetainer">
<a class="coverage-map-link" href="/static/img/res-v2023.1-global.png">
<div class="coverage-map-container">
<img src="/static/img/res-v2023.1-global.png" alt="Global coverage map." />
<div class="coverage-map-legend-vtainer">
<div class="coverage-map-legend">
<div class="coverage-legend-entry">
<div class="coverage-legend-color" style="background-color: #df16df"></div>
<div class="coverage-legend-label">0.5m → 2m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color" style="background-color: #ffccff"></div>
<div class="coverage-legend-label">5m → 10m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #feffe5"></div>
<div class="coverage-legend-label">30m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #dfe7ff"></div>
<div class="coverage-legend-label">110m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #eef2ff"></div>
<div class="coverage-legend-label">450m</div>
</div>
</div>
</div>
</div>
</a>
</div>
<div class="coverage-widetainer">
<a class="coverage-map-link" href="/static/img/res-v2023.1-usa.png">
<div class="coverage-map-container">
<img src="/static/img/res-v2023.1-usa.png" alt="USA coverage map." />
<div class="coverage-map-legend-vtainer">
<div class="coverage-map-legend">
<div class="coverage-legend-entry">
<div class="coverage-legend-color" style="background-color: #df16df"></div>
<div class="coverage-legend-label">1m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color" style="background-color: #ffccff"></div>
<div class="coverage-legend-label">10m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #feffe5"></div>
<div class="coverage-legend-label">30m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #eef2ff"></div>
<div class="coverage-legend-label">450m</div>
</div>
</div>
</div>
</div>
</a>
</div>
<div class="coverage-widetainer coverage-grid">
<a class="coverage-map-link" href="/static/img/res-v2023.1-eu.png">
<div class="coverage-map-container">
<img src="/static/img/res-v2023.1-eu.png" alt="Europe coverage map." />
<div class="coverage-map-legend-vtainer">
<div class="coverage-map-legend">
<div class="coverage-legend-entry">
<div class="coverage-legend-color" style="background-color: #df16df"></div>
<div class="coverage-legend-label">1m → 2m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color" style="background-color: #ffccff"></div>
<div class="coverage-legend-label">5m → 10m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #feffe5"></div>
<div class="coverage-legend-label">30m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #dfe7ff"></div>
<div class="coverage-legend-label">110m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #eef2ff"></div>
<div class="coverage-legend-label">450m</div>
</div>
</div>
</div>
</div>
</a>
<a class="coverage-map-link" href="/static/img/res-v2023.1-aunz.png">
<div class="coverage-map-container">
<img src="/static/img/res-v2023.1-aunz.png" alt="Australia and New Zealand coverage map." />
<div class="coverage-map-legend-vtainer">
<div class="coverage-map-legend">
<div class="coverage-legend-entry">
<div class="coverage-legend-color" style="background-color: #df16df"></div>
<div class="coverage-legend-label">1m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color" style="background-color: #ffccff"></div>
<div class="coverage-legend-label">5m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #feffe5"></div>
<div class="coverage-legend-label">30m</div>
</div>
<div class="coverage-legend-entry">
<div class="coverage-legend-color coverage-legend-color-border" style="background-color: #eef2ff"></div>
<div class="coverage-legend-label">450m</div>
</div>
</div>
</div>
</div>
</a>
</div>
<p>Basemaps thanks to <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a>.</p>The new dataset is nearly here!Migrating from Open Topo Data2023-07-06T00:00:00-05:002023-07-06T00:00:00-05:00https://www.gpxz.io/blog/open-topo-data-compatibility<p>GPXZ has a new endpoint that’s compatible with the <a href="https://www.opentopodata.org/">Open Topo Data</a> elevation API.</p>
<p>To migrate from Open Topo Data to GPXZ, you can replace your Open Topo Data url</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://api.opentopodata.org/v1/<dataset_name>
</code></pre></div></div>
<p>with this GPXZ endpoint:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://api.gpxz.io/v1/elevation/otd-compat
</code></pre></div></div>
<p>Most features are supported; check out the <a href="/docs#elevation-otd">documentation</a> for more details!</p>GPXZ has a new endpoint that’s compatible with the Open Topo Data elevation API.Python task queue latency2023-06-16T00:00:00-05:002023-06-16T00:00:00-05:00https://www.gpxz.io/blog/queue-latency<p>Python’s task queue libraries differ greatly in how long it takes to get a result back.</p>
<p>Most of this variation in latency comes from two design decisions:</p>
<ul>
<li>Whether to use polling (slow) or signals/blocking (fast) to communicate with workers.</li>
<li>How much acknowledgements, heartbeats, backoff and retrying to do. These features add reliability, but also add latency due to additional round trips.</li>
</ul>
<p>For GPXZ I sometimes need to wait for task results in the context of an HTTP API request. In this case, latency is very important as there’s a user waiting for the result on the other end! Similarly, I don’t care about graceful failover, preventing double-calculation, or retries: if the task times out or fails I’ll just fail the API request. Any recovery will be too late.</p>
<p>So I wanted to know which of the Python task queue libraries would be best for this use case.</p>
<h2 id="source-code">Source code</h2>
<p>The source code for this benchmark is here: <a href="https://github.com/gpxz/queue-latency-benchmark">gpxz/queue-latency-benchmark</a>.</p>
<h2 id="results">Results</h2>
<ul>
<li>If every nanosecond counts, consider using redis blocking operations directly. In theory, you can get latency down to one RTT along your <code class="language-plaintext highlighter-rouge">client -> backend -> worker</code> path.</li>
<li>Otherwise, dramatiq+rabbitmq isn’t much slower than the theoretical limit.</li>
<li>Celery+redis is a good option if you’re already using redis.</li>
</ul>
<h2 id="candidates">Candidates</h2>
<p>I tried the most popular libraries:</p>
<ul>
<li>rq <span class="blog-queue-v">v1.15.0</span></li>
<li>huey <span class="blog-queue-v">v2.4.5</span></li>
<li>dramatiq <span class="blog-queue-v">v1.14.2</span> with both redis <span class="blog-queue-v">v7.0.11</span> and rabbitmq <span class="blog-queue-v">v3.12.0</span> backends</li>
<li>celery <span class="blog-queue-v">v5.3.0</span> with both redis <span class="blog-queue-v">v7.0.11</span> and rabbitmq <span class="blog-queue-v">v3.12.0</span> backends</li>
</ul>
<p>rq and huey only support redis <span class="blog-queue-v">v7.0.11</span> for job queues. I ran dramatiq and celery twice each: once with redis, and once with rabbitmq <span class="blog-queue-v">v3.12.0</span>.</p>
<p>All candidates used redis as their result backend. Celery is supposed to be able to store results in rabbitmq also, but I couldn’t get that to work.</p>
<h2 id="reckless-queue">Reckless queue</h2>
<p>In addition to the four mature python packages above, I wrote a quick redis task queue optimised for maximum speed. This isn’t too hard when you don’t care about persistence, acknowledgements, features, stability, or correctness.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import uuid

import redis

redis_client = redis.Redis()


class Result:
    def __init__(self, job_name):
        self.job_name = job_name
        self.result_key = f'job_result:{self.job_name}'

    def block_for_return_value(self, timeout=0):
        # brpop blocks until the worker pushes a value (timeout=0 blocks forever):
        # no polling needed.
        return redis_client.brpop(self.result_key, timeout=timeout)[1]

    def save_return_value(self, return_value):
        redis_client.rpush(self.result_key, json.dumps(return_value))


def enqueue(queue_name='default'):
    # Prepare job.
    job_name = str(uuid.uuid4())
    job = {'name': job_name}

    # Push job to queue.
    redis_client.rpush(f'queue:{queue_name}', json.dumps(job))
    return Result(job_name)


def work(queue_name='default'):
    while True:
        _, encoded_job = redis_client.brpop(f'queue:{queue_name}')
        job = json.loads(encoded_job)

        # Our worker only does one thing.
        return_value = tracked_sleep_task()
        result = Result(job['name'])
        result.save_return_value(return_value)


if __name__ == "__main__":
    work()
</code></pre></div></div>
<h2 id="setup">Setup</h2>
<p>The exact syntax was different for each queue library, but the general approach was:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">time</span>
<span class="k">def</span> <span class="nf">tracked_sleep_task</span><span class="p">():</span>
<span class="n">result</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">result</span><span class="p">[</span><span class="s">'started_at'</span><span class="p">]</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="n">result</span><span class="p">[</span><span class="s">'ended_at'</span><span class="p">]</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">return</span> <span class="n">result</span>
<span class="n">enqueued_at</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">job</span> <span class="o">=</span> <span class="n">queue</span><span class="p">.</span><span class="n">submit</span><span class="p">(</span><span class="n">tracked_sleep_task</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">job</span><span class="p">.</span><span class="n">block_for_result</span><span class="p">()</span>
<span class="n">returned_at</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
</code></pre></div></div>
<p>This gives us two different latency values:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">latency_enqueue = started_at - enqueued_at</code>: The time between the client submitting the job and the worker starting it.</li>
<li><code class="language-plaintext highlighter-rouge">latency_result = returned_at - ended_at</code>: The time between the worker finishing the job and the client getting the result.</li>
</ul>
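<p>Concretely, with some made-up timestamps, the two metrics separate queue overhead from the 1 second the task actually spends working:</p>

```python
# Hypothetical timestamps (in seconds) from a single task round trip.
enqueued_at = 100.000   # client submits the job
started_at = 100.012    # worker picks the job up
ended_at = 101.012      # worker finishes after 1s of work
returned_at = 101.015   # client receives the result

latency_enqueue = started_at - enqueued_at  # 12ms waiting in the queue
latency_result = returned_at - ended_at     # 3ms returning the result

# Total queue overhead, excluding the time spent actually working.
overhead = latency_enqueue + latency_result
```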
<p>I repeated the task loop 100 times for each library. The first result was discarded (to avoid high latencies due to connection establishment and cache population).</p>
<h2 id="enqueue-latency">Enqueue latency</h2>
<p>First the latency between submitting a job and the worker running it.</p>
<div>
<br />
<a href="/static/blog/img/q_enqueue.png"><img src="/static/blog/img/q_enqueue.png" /></a>
<em class="blog-caption">Latency to enqueue a job.</em>
<br />
<br />
</div>
<p>The redis broker in dramatiq has a number of timeouts and sleeps when the queue is idle, and these cause huge latency. These timeouts can’t be easily set with the CLI or the high-level API.</p>
<p>You could subclass the broker or hack out the sleeps if sufficiently motivated, but even then, <code class="language-plaintext highlighter-rouge">_RedisConsumer</code> uses polling to check for new jobs, so there will always be significantly more latency than with blocking approaches.</p>
<p>Let’s drop dramatiq+redis and have another look at the results:</p>
<div>
<br />
<a href="/static/blog/img/q_enqueue_fast.png"><img src="/static/blog/img/q_enqueue_fast.png" /></a>
<em class="blog-caption">Latency to enqueue a job (fast results only).</em>
<br />
<br />
</div>
<p>Huey does well but still can’t beat using the raw redis <code class="language-plaintext highlighter-rouge">rpush</code> and <code class="language-plaintext highlighter-rouge">brpop</code> commands.</p>
<h2 id="result-latency">Result latency</h2>
<p>This is the time taken to return the result from the worker.</p>
<div>
<br />
<a href="/static/blog/img/q_result.png"><img src="/static/blog/img/q_result.png" /></a>
<em class="blog-caption">Latency to return a job result.</em>
<br />
<br />
</div>
<p>Huey struggles here: it does polling on a hardcoded 100ms loop. You can manually poll for results (as I did for rq), which would make the latency similar to rq’s result.</p>
<p>rq doesn’t have an API for blocking results so you have to do your own polling. The result shown above is with 10ms polling: it’s a performance improvement on huey because you can choose your own interval. But to get down to the ~1ms median return time of the other libraries you’ll be thrashing your redis instance.</p>
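<p>The manual polling loop is simple enough to sketch. Here <code class="language-plaintext highlighter-rouge">get_result</code> is a hypothetical stand-in for however your queue exposes job results; the trade-off lives in the <code class="language-plaintext highlighter-rouge">interval</code> parameter, which bounds both your added latency and your query rate against redis:</p>

```python
import time

def poll_for_result(get_result, interval=0.01, timeout=5.0):
    """Call get_result() every `interval` seconds until it returns a value."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_result()
        if result is not None:
            return result
        time.sleep(interval)  # each sleep adds up to `interval` of extra latency
    raise TimeoutError('no result before timeout')

# Simulate a result that becomes available after ~30ms.
start = time.monotonic()
value = poll_for_result(lambda: 'done' if time.monotonic() - start > 0.03 else None)
```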
<p>I made a <a href="https://github.com/rq/rq/pull/1939">pull request</a> to remove the need for polling with rq: the results should be much better once that is merged.</p>
<p>Here are the results without rq and huey:</p>
<div>
<br />
<a href="/static/blog/img/q_result_fast.png"><img src="/static/blog/img/q_result_fast.png" /></a>
<em class="blog-caption">Latency to return a job result.</em>
<br />
<br />
</div>
<p>There’s no difference here between the <code class="language-plaintext highlighter-rouge">_rmq</code> and <code class="language-plaintext highlighter-rouge">_redis</code> functions, as all approaches use redis for result serialization, even if the job backend is rabbitmq.</p>
<p>Dramatiq is basically as fast as possible, which is awesome because you get a lot more from dramatiq than from my hacky script!</p>
<h2 id="total-latency">Total latency</h2>
<p>Overall, huey and dramatiq+redis aren’t the best choices for low-latency task queues. That’s a shame, as they’re the simplest to configure (no reliance on celery or rabbitmq)!</p>
<div>
<br />
<a href="/static/blog/img/q_total.png"><img src="/static/blog/img/q_total.png" /></a>
<em class="blog-caption">Total latency (not including time spent working)</em>
<br />
<br />
</div>
<p>Without those libraries there are some decent options:</p>
<div>
<br />
<a href="/static/blog/img/q_total_fast.png"><img src="/static/blog/img/q_total_fast.png" /></a>
<em class="blog-caption">Total latency (not including time spent working)</em>
<br />
<br />
</div>
<p>rq offers good performance for such a simple deployment experience, and should be a bit faster still once blocking results are released.</p>
<p>Dramatiq+rabbitmq was the fastest off-the-shelf task queue tested. Dramatiq is much easier to work with than celery, but rabbitmq is a pain to set up and run.</p>
<p>If speed is important above all else, consider a DIY approach with redis!</p>Python’s task queue libraries differ greatly in how long it takes to get a result back.How to configure ruff2023-05-31T00:00:00-05:002023-05-31T00:00:00-05:00https://www.gpxz.io/blog/ruff<p>Given the <a href="https://beta.ruff.rs/docs/rules/">smörgåsbord</a> of rules and plugins that <a href="https://github.com/charliermarsh/ruff">ruff</a> supports, it’s hard to figure out which settings are good for your average python project.</p>
<h2 id="configuration">Configuration</h2>
<p>First, configure ruff with a <code class="language-plaintext highlighter-rouge">pyproject.toml</code> file at the root of your project.</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">[tool.ruff]</span>
<span class="py">line-length</span> <span class="p">=</span> <span class="mi">100</span>
<span class="py">target-version</span> <span class="p">=</span> <span class="s">"py311"</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">pyproject.toml</code> is the future of python tool configuration: it’s easy to see that your line length and python version are set consistently across your linter, auto-formatter, and import sorter.</p>
<p>All commandline options go into the config file. Then call ruff like this (perhaps in a Makefile or a github actions workflow) with the folders that need checking:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ruff check ./src ./tests
</code></pre></div></div>
<h2 id="how-rules-work">How rules work</h2>
<p>Ruff (and python linting in general) is based around rules. Each rule is known by a code: for example the <code class="language-plaintext highlighter-rouge">F841</code> rule checks for a variable that has a result saved to it but is then never used.</p>
<p>Many of these codes are standardised across different linters, so you can typically google a lint code to find a <a href="https://www.flake8rules.com/rules/F841.html">description of the rule</a>. You can also search the <a href="https://beta.ruff.rs/docs/rules/">ruff rules page</a>.</p>
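<p>For example, this contrived function would trigger F841: <code class="language-plaintext highlighter-rouge">total</code> is computed but never used, which often signals a bug (perhaps the author meant to return it):</p>

```python
def item_count(items):
    total = sum(items)  # F841: local variable `total` is assigned but never used
    return len(items)
```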
<p>The letter(s) at the start of the rule code are used to group similar rules together. For example, all rules starting with <code class="language-plaintext highlighter-rouge">DJ</code> relate to best practices when working with the django web framework. In many parts of ruff, you can use either rule codes like <code class="language-plaintext highlighter-rouge">DJ01</code> to refer to a single rule, or the prefix like <code class="language-plaintext highlighter-rouge">DJ</code> to apply to all rules in the <code class="language-plaintext highlighter-rouge">DJ</code> group.</p>
<h2 id="enabled-rules">Enabled rules</h2>
<p>Rules are enabled with the <code class="language-plaintext highlighter-rouge">select</code> option. It’s best to start with just a few rule groups enabled: run ruff and fix any issues before adding any more rules.</p>
<p>Rule groups <code class="language-plaintext highlighter-rouge">E</code> and <code class="language-plaintext highlighter-rouge">F</code> are enabled by default, and cover the most bang-for-your-buck python issues. Add a comment for every rule you put in the config file, as the letter system is easy to forget long-term.</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">select</span> <span class="p">=</span> <span class="p">[</span>
<span class="s">"E"</span><span class="p">,</span> <span class="c"># pycodestyle</span>
<span class="s">"F"</span><span class="p">,</span> <span class="c"># pyflakes</span>
<span class="p">]</span>
</code></pre></div></div>
<p>Once any issues are fixed, you can add more groups. The list of <a href="https://beta.ruff.rs/docs/rules/">supported groups</a> is long and increasing, but many are only for specific frameworks or so picky that they’re unlikely to materially improve your code. It’s fine to choose a few that look helpful. This is a good start:</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">select</span> <span class="p">=</span> <span class="p">[</span>
<span class="s">"A"</span><span class="p">,</span> <span class="c"># prevent using keywords that clobber python builtins</span>
<span class="s">"B"</span><span class="p">,</span> <span class="c"># bugbear: security warnings</span>
<span class="s">"E"</span><span class="p">,</span> <span class="c"># pycodestyle</span>
<span class="s">"F"</span><span class="p">,</span> <span class="c"># pyflakes</span>
<span class="s">"ISC"</span><span class="p">,</span> <span class="c"># implicit string concatenation</span>
<span class="s">"UP"</span><span class="p">,</span> <span class="c"># alert you when better syntax is available in your python version</span>
<span class="s">"RUF"</span><span class="p">,</span> <span class="c"># the ruff developer's own rules</span>
<span class="p">]</span>
</code></pre></div></div>
<h2 id="ignored-rules">Ignored rules</h2>
<p>Ruff isn’t perfect: linters can throw errors for perfectly fine code, or may include rules that you disagree with.</p>
<p>Your linter works for you! The ultimate goal is to improve your code, not to make a linter happy. So ruff provides a few different ways to suppress errors.</p>
<p>You can suppress a lint error using a comment with the format <code class="language-plaintext highlighter-rouge"># noqa: <rule_code></code>. You should accompany any <code class="language-plaintext highlighter-rouge">noqa</code> override with an explanation for why the rule ought to be ignored.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Configure logging before importing the rest of the app, so import errors
# and logs are correctly handled. Suppress the import order lint rule.
</span><span class="kn">from</span> <span class="nn">myapp</span> <span class="kn">import</span> <span class="n">config</span>
<span class="n">logging</span><span class="p">.</span><span class="n">dictConfig</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">myapp</span> <span class="kn">import</span> <span class="n">backend</span><span class="p">,</span> <span class="n">models</span> <span class="c1"># noqa: E402
</span></code></pre></div></div>
<p>Ruff lint rules can also be disabled project-wide. Disable rules that you disagree with, rules that conflict with a third-party API you have no control over, or rules that would be too time-consuming to fix right now (you can always come back to them later!)</p>
<p>For example, if you use <a href="https://github.com/psf/black">black</a> or a similar code formatter, you may want to skip any format-related rules (and just trust your formatter). Rules can be disabled using the <code class="language-plaintext highlighter-rouge">ignore</code> key in <code class="language-plaintext highlighter-rouge">pyproject.toml</code>. I start most projects with:</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">ignore</span> <span class="p">=</span> <span class="p">[</span>
<span class="s">"E712"</span><span class="p">,</span> <span class="c"># Allow using if x == False, as it's not always equivalent to if x.</span>
<span class="s">"E501"</span><span class="p">,</span> <span class="c"># Suppress line-too-long warnings: trust black's judgement on this one.</span>
<span class="s">"UP017"</span><span class="p">,</span> <span class="c"># Allow timezone.utc instead of datetime.UTC.</span>
<span class="p">]</span>
</code></pre></div></div>
<p>and for django projects add</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">ignore</span> <span class="p">=</span> <span class="p">[</span>
<span class="s">"E712"</span><span class="p">,</span> <span class="c"># Allow using if x == False, as it's not always equivalent to if x.</span>
<span class="s">"E501"</span><span class="p">,</span> <span class="c"># Suppress line-too-long warnings: trust black's judgement on this one.</span>
<span class="s">"A003"</span><span class="p">,</span> <span class="c"># Allow shadowing class attributes: django uses id.</span>
<span class="s">"B904"</span><span class="p">,</span> <span class="c"># Allow unchained exceptions: it's fine to raise 404 in django.</span>
<span class="p">]</span>
</code></pre></div></div>
<p>Before disabling a rule project-wide, it’s helpful to run the linter as normal and read through the warnings, in case the project-level <code class="language-plaintext highlighter-rouge">ignore</code> is suppressing a useful warning.</p>
<h2 id="fixing">Fixing</h2>
<p>Unlike most linters, ruff can automatically fix your code! However, unlike code formatters, ruff performs non-cosmetic changes which may subtly change how your code functions.</p>
<p>If you have complete faith in your test suite, great! Don’t just add <code class="language-plaintext highlighter-rouge">--fix</code> to your ruff command though: that could break some code and its corresponding test in the same way, hiding the failing code behind a passing test. Instead, adopt ruff fixing like this:</p>
<ul>
<li>Run ruff with fixing <em>just</em> on your code, not on your tests: <code class="language-plaintext highlighter-rouge">ruff check --fix ./src</code>.</li>
<li>Check the un-ruffed tests still pass.</li>
<li>Now ruff all your stuff: <code class="language-plaintext highlighter-rouge">ruff check --fix ./src ./tests</code></li>
<li>Check the tests still pass.</li>
<li>Use ruff fixing going forward.</li>
</ul>
<p>If you only “mostly but not entirely” trust your test suite, a more realistic approach is to first enable ruff’s fixing (by adding <code class="language-plaintext highlighter-rouge">--fix</code> to your <code class="language-plaintext highlighter-rouge">ruff</code> command) while starting with no fixable rules in <code class="language-plaintext highlighter-rouge">pyproject.toml</code>:</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">fixable</span> <span class="p">=</span> <span class="p">[]</span>
</code></pre></div></div>
<p>No fixing will be done yet with this config. But going forward, when you get a ruff warning that you know can be unambiguously fixed on your codebase, add it to the fixable list. I often end up with these rules:</p>
<div class="language-toml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">fixable</span> <span class="p">=</span> <span class="p">[</span>
<span class="s">"F401"</span><span class="p">,</span> <span class="c"># Remove unused imports.</span>
<span class="s">"NPY001"</span><span class="p">,</span> <span class="c"># Fix numpy types, which are removed in 1.24.</span>
<span class="s">"RUF100"</span><span class="p">,</span> <span class="c"># Remove unused noqa comments.</span>
<span class="p">]</span>
</code></pre></div></div>
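<p>For reference, here’s how the two lists fit together in one <code class="language-plaintext highlighter-rouge">pyproject.toml</code> section (a sketch assuming ruff’s 0.0.x config layout, where <code class="language-plaintext highlighter-rouge">ignore</code> and <code class="language-plaintext highlighter-rouge">fixable</code> both sit under <code class="language-plaintext highlighter-rouge">[tool.ruff]</code>; the rule selections are just the examples from above):</p>

```toml
[tool.ruff]
# Warn about everything except rules we've explicitly opted out of.
ignore = [
    "E712",  # Allow using if x == False, as it's not always equivalent to if x.
    "E501",  # Trust the formatter on line length.
]
# Only auto-fix rules we've explicitly opted in to.
fixable = [
    "F401",    # Remove unused imports.
    "RUF100",  # Remove unused noqa comments.
]
```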
<h2 id="formatting-fixing-vs-testing-checking">Formatting (fixing) vs testing (checking)</h2>
<p>So how does this fit into other parts of your python tooling?</p>
<p>With fixing enabled, ruff is playing two roles here:</p>
<ul>
<li>As a formatter we want ruff to modify (fix) our code, but don’t care if the code will pass testing.</li>
<li>As a linter we want ruff to test our code, but any issues should be raised as errors and not silently fixed (otherwise running the linter on bad code in CI might pass, as ruff might fix some warnings locally that aren’t in the repo!)</li>
</ul>
<p>Using <code class="language-plaintext highlighter-rouge">pyproject.toml</code> makes it easy to have ruff do these dual duties with a consistent configuration. I like to have a <code class="language-plaintext highlighter-rouge">make fmt</code> command for code formatting and a separate <code class="language-plaintext highlighter-rouge">make test</code> command in a <code class="language-plaintext highlighter-rouge">Makefile</code>:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fmt:
	isort ./src ./tests
	ruff check --fix-only ./src ./tests
	black ./src ./tests

test:
	ruff check ./src ./tests
	black --check ./src ./tests
	pytest
</code></pre></div></div>
<p>In CI you can just run <code class="language-plaintext highlighter-rouge">make test</code>.</p>
<p>If you’re not using a <code class="language-plaintext highlighter-rouge">Makefile</code> you can spell the commands out. For example, your GitHub <code class="language-plaintext highlighter-rouge">ci.yaml</code> might look like:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">build</span>
<span class="na">on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">push</span><span class="pi">]</span>
<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">test</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v1</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Setup</span>
        <span class="na">run</span><span class="pi">:</span> <span class="s">pip install -r requirements.txt</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Lint</span>
        <span class="na">run</span><span class="pi">:</span> <span class="s">ruff check ./src ./tests</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Check formatting</span>
        <span class="na">run</span><span class="pi">:</span> <span class="s">black --check ./src ./tests</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Test</span>
        <span class="na">run</span><span class="pi">:</span> <span class="s">pytest</span>
</code></pre></div></div>
<p>In all of the above examples I have <code class="language-plaintext highlighter-rouge">ruff</code> as the first test command: you want your fastest tests first, so you get early feedback on any issues and save CI minutes when a build fails. Some kinds of issues (like python syntax errors) will be detected by basically any tool in your test chain. I typically go in the following order:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">ruff</code></li>
<li><code class="language-plaintext highlighter-rouge">black</code></li>
<li><code class="language-plaintext highlighter-rouge">isort</code></li>
<li><code class="language-plaintext highlighter-rouge">mypy</code></li>
<li><code class="language-plaintext highlighter-rouge">pytest</code> or <code class="language-plaintext highlighter-rouge">manage.py test</code></li>
</ul>
<h2 id="versioning">Versioning</h2>
<p>You should pin the versions for all your dependencies (using a tool like <a href="https://github.com/jazzband/pip-tools">pip-compile</a>).</p>
<p>But you should especially pin the version of ruff. New rules are constantly being added: without version pinning, your tests will just start failing one day, and you’ll find yourself fixing lint errors on previously-fine code while trying to rush out an unrelated quick fix.</p>
<p>Ruff doesn’t have any dependencies, so just adding</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ruff==0.0.270
</code></pre></div></div>
<p>to your <code class="language-plaintext highlighter-rouge">requirements.txt</code> file will lock things down without contributing to dependency hell.</p>
<p>Every few months, replace <code class="language-plaintext highlighter-rouge">0.0.270</code> with the <a href="https://pypi.org/project/ruff/">latest version on PyPI</a>, then fix any new issues in one sitting.</p>Given the smörgåsbord of rules and plugins that ruff supports, it’s hard to figure out which settings are good for your average python project.Hong Kong elevation data guide2023-05-25T00:00:00-05:002023-05-25T00:00:00-05:00https://www.gpxz.io/blog/hong-kong-dem-guide<p>There is high-resolution elevation data available for Hong Kong at a 50cm resolution. The data isn’t too difficult to download, and captures many of the territory’s recent land reclamation efforts.</p>
<p>If you don’t need high resolution or bare-earth terrain data, the 30m Copernicus dataset is a high-quality global alternative that’s even easier to work with.</p>
<h2 id="high-resolution-datasets">High-resolution datasets</h2>
<p>The Hong Kong government has released four DEMs with territory-wide coverage:</p>
<ul>
<li>A 2020 DSM (recording the elevations of treetops, rooftops, etc)</li>
<li>A 2020 DTM (recording the elevation of the terrain)</li>
<li>A 2010 DSM</li>
<li>A 2010 DTM</li>
</ul>
<p>All datasets are at a 50cm resolution in <a href="https://epsg.io/2326">Hong Kong 1980 Grid System</a> coordinates. The dataset is split into a few thousand tiles, though the 2020 and 2010 datasets have different tiling schemes.</p>
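<p>Since the tiles are delivered in Hong Kong 1980 grid coordinates, you’ll often want to convert between the grid and WGS84 latitude/longitude when working out which tile covers a point of interest. Here’s a minimal sketch using the third-party <code class="language-plaintext highlighter-rouge">pyproj</code> library (the easting/northing pair below is illustrative, not a specific tile corner):</p>

```python
from pyproj import Transformer

# EPSG:2326 (Hong Kong 1980 Grid) -> EPSG:4326 (WGS84).
# always_xy=True orders coordinates as (easting, northing) in
# and (longitude, latitude) out.
transformer = Transformer.from_crs("EPSG:2326", "EPSG:4326", always_xy=True)

# An illustrative grid coordinate near the middle of the territory.
lon, lat = transformer.transform(836000, 816000)
print(f"lat={lat:.4f}, lon={lon:.4f}")
```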
<h2 id="downloading-the-hong-kong-dems">Downloading the Hong Kong DEMs</h2>
<p>The datasets are available to download through the <a href="https://geodata.gov.hk/">Hong Kong Geodata Store</a>.</p>
<p>There’s no option to bulk download all of the actual elevation data. But you can download a GeoJSON file for each dataset, from which the tile URLs can be extracted:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">json</span>
<span class="n">urls</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">"CEDD_DTM_2020_20230410.geojson"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
    <span class="k">for</span> <span class="n">feature</span> <span class="ow">in</span> <span class="n">json</span><span class="p">.</span><span class="n">load</span><span class="p">(</span><span class="n">f</span><span class="p">)[</span><span class="s">"features"</span><span class="p">]:</span>
        <span class="n">urls</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">feature</span><span class="p">[</span><span class="s">"properties"</span><span class="p">][</span><span class="s">"URL"</span><span class="p">])</span>
</code></pre></div></div>
<p>then the tiles can be downloaded individually (with a time delay to avoid overloading their servers):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">shutil</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="kn">import</span> <span class="nn">time</span>
<span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">:</span>
    <span class="n">time</span><span class="p">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">requests</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">stream</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">60</span><span class="p">,</span> <span class="n">verify</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="k">as</span> <span class="n">r</span><span class="p">:</span>
        <span class="n">filename</span> <span class="o">=</span> <span class="n">url</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">'/'</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
        <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="s">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
            <span class="n">shutil</span><span class="p">.</span><span class="n">copyfileobj</span><span class="p">(</span><span class="n">r</span><span class="p">.</span><span class="n">raw</span><span class="p">,</span> <span class="n">f</span><span class="p">)</span>
</code></pre></div></div>
<p>The 2020 files have parentheses in their filenames like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>9SE24B(e812n812,e813n813).tif
</code></pre></div></div>
<p>While these are technically valid filenames, they also broke some of the lower-quality software that I typically use to analyse elevation data. I’d recommend renaming them to remove the brackets!</p>
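<p>A quick way to do that renaming in bulk (a sketch; the <code class="language-plaintext highlighter-rouge">hk_tiles</code> directory name is hypothetical, so point it at wherever you downloaded the tiles):</p>

```python
from pathlib import Path

def clean_name(name: str) -> str:
    # Swap out the characters that trip up fragile GIS tooling.
    return name.replace("(", "_").replace(")", "").replace(",", "-")

# hk_tiles/ is a placeholder for your download directory.
for path in Path("hk_tiles").glob("*.tif"):
    cleaned = clean_name(path.name)
    if cleaned != path.name:
        path.rename(path.with_name(cleaned))
```

<p>With this scheme, <code class="language-plaintext highlighter-rouge">9SE24B(e812n812,e813n813).tif</code> becomes <code class="language-plaintext highlighter-rouge">9SE24B_e812n812-e813n813.tif</code>.</p>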
<h2 id="2010-vs-2020-data">2010 vs 2020 data</h2>
<p>At the terrain level, little differs between the 2010 and 2020 data: some noise reduction and processing improvements on the order of a few metres.</p>
<div class="blog-widetainer">
<br />
<a href="/static/blog/img/hk-lantau.png"><img src="/static/blog/img/hk-lantau.png" /></a>
<em class="blog-caption">A 400m square, centred at 22.2707, 113.9537</em>
<br />
<br />
</div>
<p>However, Hong Kong has done a lot of construction since 2010, and as a result the 2020 dataset has much better coverage of these infrastructure projects. Here’s the Chek Lap Kok Link, which was completed in 2020:</p>
<div class="blog-widetainer">
<br />
<a href="/static/blog/img/hk-airport.png"><img src="/static/blog/img/hk-airport.png" /></a>
<em class="blog-caption">A 800m square, centred at 22.3148, 113.9450</em>
<br />
<br />
</div>
<h2 id="dtm-vs-dsm">DTM vs DSM</h2>
<p>The DSM (Digital Surface Model) does a good job at capturing building footprints and canopy height. In the example below you can see the DSM partially picks up the powerlines running SW to NE: very impressive and not something you often see in country-wide elevation datasets! However the representation isn’t perfect (as expected for something this small), so you may want to further process the DSM to remove noise.</p>
<p>The DTM does a good job of removing buildings and getting down to the terrain: I’d call it Analysis Ready.</p>
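<p>Having matched DSM and DTM layers also means you can estimate building and canopy heights by differencing the two models. Here’s a toy sketch with made-up values (real tiles would be read into arrays with a raster library such as rasterio):</p>

```python
import numpy as np

# Made-up 2x2 grids of elevations in metres: the surface model (DSM)
# and the bare-earth model (DTM) for the same cells.
dsm = np.array([[12.0, 30.5], [8.2, 4.0]])
dtm = np.array([[4.0, 4.5], [4.2, 4.1]])

# Object height (building, tree, ...) is surface minus terrain;
# clip small negatives caused by noise in either model.
heights = np.clip(dsm - dtm, 0.0, None)
```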
<div class="blog-widetainer">
<br />
<a href="/static/blog/img/hk-leipui.png"><img src="/static/blog/img/hk-leipui.png" /></a>
<em class="blog-caption">A 700m square, centred at 22.3633, 114.1415</em>
<br />
<br />
</div>
<h2 id="licence">Licence</h2>
<p>Hong Kong elevation data is offered under a <a href="https://geodata.gov.hk/gs/?p=terms_and_conditions">permissive open licence</a>. Commercial use and modification are allowed, and attribution is required.</p>
<h2 id="elevation-api">Elevation API</h2>
<p>GPXZ is an API for high-quality global elevation data. If you need elevation data for Hong Kong, check out our API <a href="/docs">documentation</a> or reach out at <a href="mailto:andrew@gpxz.io">andrew@gpxz.io</a>!</p>There is high-resolution elevation data available for Hong Kong at a 50cm resolution. The data isn’t too difficult to download, and captures many of the territory’s recent land reclamation efforts.