<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Service-Enhancements | 2i2c</title><link>https://deploy-preview-609--2i2c-org.netlify.app/category/service-enhancements/</link><atom:link href="https://deploy-preview-609--2i2c-org.netlify.app/category/service-enhancements/index.xml" rel="self" type="application/rss+xml"/><description>Service-Enhancements</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 04 May 2026 00:00:00 +0000</lastBuildDate><image><url>https://deploy-preview-609--2i2c-org.netlify.app/media/sharing.png</url><title>Service-Enhancements</title><link>https://deploy-preview-609--2i2c-org.netlify.app/category/service-enhancements/</link></image><item><title>Protecting our hubs against the CopyFail kernel exploit</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/copyfail-mitigation/</link><pubDate>Mon, 04 May 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/copyfail-mitigation/</guid><description>&lt;p>The recently disclosed
&lt;a href="https://copy.fail/" target="_blank" rel="noopener" >CopyFail Linux kernel zero-day&lt;/a> (CVE-2026-31431) opens up a way for code running inside a container to break out onto the underlying node.
We examined our hubs to check whether they were exposed, concluded that they are likely not at risk, and added another layer of protection just in case.&lt;/p>
&lt;h3 id="are-2i2cs-hubs-at-risk">
Are 2i2c&amp;rsquo;s hubs at risk?
&lt;a class="header-anchor" href="#are-2i2cs-hubs-at-risk">#&lt;/a>
&lt;/h3>&lt;p>No. Based on our testing and mitigation efforts, our hubs are not vulnerable to CopyFail.&lt;/p>
&lt;h3 id="why-do-we-think-were-not-at-risk">
Why do we think we&amp;rsquo;re not at risk?
&lt;a class="header-anchor" href="#why-do-we-think-were-not-at-risk">#&lt;/a>
&lt;/h3>&lt;ul>
&lt;li>We tried to reproduce the exploit on a staging hub by following the
&lt;a href="https://github.com/Percivalll/Copy-Fail-CVE-2026-31431-Kubernetes-PoC" target="_blank" rel="noopener" >public Kubernetes proof-of-concept&lt;/a> on both AWS and EKS, and the exploit was unable to break out of the container.&lt;/li>
&lt;li>Existing JupyterHub hardening on Kubernetes from
&lt;a href="https://github.com/jupyterhub/kubespawner/pull/545" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> jupyterhub/kubespawner#545&lt;/a> (originally added by Yuvi in 2021 in response to a different security issue) had already significantly reduced our risk exposure, and the exposure of anyone else running
&lt;a href="https://z2jh.jupyter.org" target="_blank" rel="noopener" >Z2JH&lt;/a> (the standard way to deploy JupyterHub on Kubernetes).&lt;/li>
&lt;li>As an extra layer of protection, we deployed
&lt;a href="https://github.com/iwanhae/copyfail-ebpf-k8s" target="_blank" rel="noopener" >&lt;code>copyfail-ebpf-k8s&lt;/code>&lt;/a> as a daemonset across all of our clusters in
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/8227" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/infrastructure#8227&lt;/a>. This runs on every node and covers all of our hubs (including those on non-commercial cloud infrastructure, like JetStream2). It blocks the specific kernel features that CopyFail depends on. See
&lt;a href="https://github.com/iwanhae/copyfail-ebpf-k8s#quick-start" target="_blank" rel="noopener" >the project&amp;rsquo;s explanation&lt;/a> for how that works.&lt;/li>
&lt;li>We&amp;rsquo;ve upgraded all GKE clusters to use
&lt;a href="https://docs.cloud.google.com/kubernetes-engine/security-bulletins" target="_blank" rel="noopener" >a patched node image&lt;/a> in
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/8230" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/infrastructure#8230&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h3 id="what-else-did-we-look-into">
What else did we look into?
&lt;a class="header-anchor" href="#what-else-did-we-look-into">#&lt;/a>
&lt;/h3>&lt;ul>
&lt;li>
&lt;a href="https://github.com/deckhouse/d8-copy-fail-mitigation" target="_blank" rel="noopener" >Deckhouse&amp;rsquo;s mitigation&lt;/a> was too platform-specific for us.&lt;/li>
&lt;li>
&lt;a href="https://blog.ovhcloud.com/copy-fail-cve-2026-31431-how-to-rapidly-protect-ovhcloud-mks-clusters-from-the-linux-kernel-zero-day/" target="_blank" rel="noopener" >OVHcloud&amp;rsquo;s &lt;code>modprobe&lt;/code> blocking&lt;/a> likely
&lt;a href="https://github.com/aws/containers-roadmap/issues/2808" target="_blank" rel="noopener" >won&amp;rsquo;t work on Amazon Linux 2023&lt;/a>, since the relevant module is built into the kernel image.&lt;/li>
&lt;li>
&lt;a href="https://alas.aws.amazon.com/alas2023.html" target="_blank" rel="noopener" >AL2023 security advisories&lt;/a> list no patched AL2023 image yet, so we can&amp;rsquo;t rely on a kernel-level fix from AWS for now.&lt;/li>
&lt;/ul>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Huge thanks to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/author/georgiana-dolocan/" >Georgiana&lt;/a> for the deep dive into the exploit and into whether we were exposed.&lt;/li>
&lt;li>Thanks to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/author/yuvaraj-yuvi/" >Yuvi&lt;/a> for the 2021 PR that reduced JupyterHub&amp;rsquo;s exposure to this exploit!&lt;/li>
&lt;li>Thanks to
&lt;a href="https://github.com/iwanhae/copyfail-ebpf-k8s" target="_blank" rel="noopener" >iwanhae&lt;/a> for the eBPF daemonset we deployed in Kubernetes, and to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/jupyterhub/" >JupyterHub&lt;/a> for the upstream kubespawner hardening that lowered our exposure.&lt;/li>
&lt;li>Thanks to our collaborators at
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA&lt;/a> for the ongoing conversations about hub security.&lt;/li>
&lt;li>Thanks to our collaborators at
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/pythia/" >Pythia&lt;/a> for supporting ongoing work around security in JupyterHub and BinderHub, especially on non-commercial cloud like JetStream.&lt;/li>
&lt;/ul></description></item><item><title>Upgrading community infrastructure to Kubernetes 1.34 and JupyterHub 4.3.3</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/infra-upgrades-k8s-jupyterhub/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/infra-upgrades-k8s-jupyterhub/</guid><description>&lt;p>We&amp;rsquo;ve completed a major round of infrastructure upgrades across all 2i2c-managed hubs: every hub is now running
&lt;a href="https://kubernetes.io/releases/" target="_blank" rel="noopener" >Kubernetes 1.34&lt;/a> and
&lt;a href="https://z2jh.jupyter.org/en/stable/changelog.html" target="_blank" rel="noopener" >Z2JH helm chart 4.3.3&lt;/a>.&lt;/p>
&lt;p>Running up-to-date versions of both Kubernetes and the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/jupyterhub/" >JupyterHub&lt;/a> helm chart gives our communities the latest features and security fixes, along with the best support and reliability.&lt;/p>
&lt;h2 id="a-new-approach-to-infrastructure-upgrades-upgrading-in-rounds">
A new approach to infrastructure upgrades: upgrading in rounds
&lt;a class="header-anchor" href="#a-new-approach-to-infrastructure-upgrades-upgrading-in-rounds">#&lt;/a>
&lt;/h2>&lt;p>This was the first time we rolled out JupyterHub helm chart upgrades &lt;strong>in rounds&lt;/strong> rather than all at once. By upgrading a subset of hubs at a time, we could identify and fix issues in isolation before they affected the broader network. This made the process safer and more predictable.&lt;/p>
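&lt;p>As a rough illustration of the idea, here is a hypothetical sketch in a few lines of Python. It is not our actual deployment tooling. It splits the hubs into small batches, upgrades one batch, and moves on only once every hub in that batch is healthy. The hub names, round sizes, and the &lt;code>upgrade&lt;/code> and &lt;code>is_healthy&lt;/code> callbacks are placeholders.&lt;/p>
&lt;pre>&lt;code class="language-python"># Hypothetical sketch of a rounds-based rollout. The hub names and the
# upgrade/is_healthy callbacks are placeholders, not 2i2c tooling.

def plan_rounds(hubs, round_sizes):
    """Split hubs into consecutive rounds; any leftover hubs form a final round."""
    rounds, start = [], 0
    for size in round_sizes:
        batch = hubs[start:start + size]
        if not batch:
            break
        rounds.append(batch)
        start += size
    remaining = hubs[start:]
    if remaining:
        rounds.append(remaining)
    return rounds


def roll_out(hubs, round_sizes, upgrade, is_healthy):
    """Upgrade round by round and stop after the first round with a failing hub."""
    upgraded = []
    for batch in plan_rounds(hubs, round_sizes):
        for hub in batch:
            upgrade(hub)
        upgraded.extend(batch)
        failing = [hub for hub in batch if not is_healthy(hub)]
        if failing:
            print(f"Pausing rollout: fix {failing} before the next round")
            break
    return upgraded


hubs = ["staging", "hub-a", "hub-b", "hub-c", "hub-d", "hub-e"]
print(plan_rounds(hubs, [1, 2]))
# [['staging'], ['hub-a', 'hub-b'], ['hub-c', 'hub-d', 'hub-e']]
&lt;/code>&lt;/pre>
&lt;p>Putting a single low-risk hub, such as a staging hub, in the first round keeps the impact of an unexpected regression small. In this sketch, any hubs not covered by the listed round sizes are grouped into one final round.&lt;/p>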
&lt;p>We&amp;rsquo;re planning to perform these kinds of upgrades on a regular schedule for our member communities. Roughly &lt;strong>every 6 months&lt;/strong>, we&amp;rsquo;ll create an issue to make sure nothing falls through the cracks (here&amp;rsquo;s
&lt;a href="https://github.com/2i2c-org/infrastructure/blob/main/.github/workflows/recurrent-k8s-gcp-upgrades.yaml" target="_blank" rel="noopener" >an example config for creating our reminder issues&lt;/a>).&lt;/p>
&lt;p>Check out our
&lt;a href="https://compass.2i2c.org/services/interactive-computing/multiple-hub-upgrades/#making-changes-to-multiple-hubs" target="_blank" rel="noopener" >process docs for multi-hub upgrades&lt;/a> for more information.&lt;/p>
&lt;h2 id="learn-more">
Learn more
&lt;a class="header-anchor" href="#learn-more">#&lt;/a>
&lt;/h2>&lt;p>Check out these changelogs for the improvements these latest updates bring to our clusters and hubs.&lt;/p>
&lt;ul>
&lt;li>
&lt;a href="https://z2jh.jupyter.org/en/stable/changelog.html" target="_blank" rel="noopener" >Z2JH Helm Chart Changelog&lt;/a>&lt;/li>
&lt;li>
&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.34.md" target="_blank" rel="noopener" >Kubernetes 1.34 Changelog&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Thanks to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/author/georgiana-dolocan/" >Georgiana Dolocan&lt;/a> for leading this upgrade effort and establishing the rounds-based approach.&lt;/li>
&lt;li>Thanks to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/author/chris-holdgraf/" >Chris Holdgraf&lt;/a> for adapting and editing Georgiana&amp;rsquo;s notes into a blog post.&lt;/li>
&lt;/ul></description></item><item><title>How regularly upgrading core infrastructure leads to upstream improvements and better infrastructure</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/why-upgrade-regularly/</link><pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/why-upgrade-regularly/</guid><description>&lt;p>Our collaborators at
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA&lt;/a> recently asked us why our policy is to upgrade our infrastructure relatively quickly after new versions come out. Here&amp;rsquo;s the explanation that we shared with them, in case it&amp;rsquo;s useful for others as well.&lt;/p>
&lt;p>In this case, the decision was whether to upgrade to Helm 4, and you can find our
&lt;a href="https://github.com/2i2c-org/initiatives/issues/4" target="_blank" rel="noopener" >rationale in the &lt;code>/initiatives&lt;/code> repository&lt;/a>. Here&amp;rsquo;s a brief summary from Yuvi:&lt;/p>
&lt;p>Fundamentally, it keeps us and the ecosystem moving forward, and drives improvements upstream in both JupyterHub and Helm.&lt;/p>
&lt;p>It has driven these PRs in
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/jupyterhub/" >JupyterHub&lt;/a>:&lt;/p>
&lt;ul>
&lt;li>
&lt;a href="https://github.com/jupyterhub/action-k3s-helm/pull/126" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> jupyterhub/action-k3s-helm#126&lt;/a> (merged)&lt;/li>
&lt;li>
&lt;a href="https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/3797" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> jupyterhub/zero-to-jupyterhub-k8s#3797&lt;/a> (validated, but not merged yet)&lt;/li>
&lt;/ul>
&lt;p>It&amp;rsquo;s also driven improvements to Helm itself; see this bug report, which is being worked on:&lt;/p>
&lt;ul>
&lt;li>
&lt;a href="https://github.com/helm/helm/issues/31919" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> helm/helm#31919&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Upgrading Helm versions can break things (and it has for some of our other communities in the past; see
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/7886#issuecomment-4031310423" target="_blank" rel="noopener" >this example&lt;/a>), so it&amp;rsquo;s important that we do it carefully and on a reasonable timeframe to avoid disruptions.&lt;/p>
&lt;p>We&amp;rsquo;re also discovering, for example, that the new &lt;code>nginx-ingress&lt;/code> controller we had to move to may have issues working with older Helm versions (ongoing work in
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/7995" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/infrastructure#7995&lt;/a>). That feels much more tractable because we can now say &amp;lsquo;OK, let&amp;rsquo;s apply a quick fix now, wait for the Helm 4 rollout, and try again&amp;rsquo; instead of being totally stuck.&lt;/p>
&lt;p>This is similar to the other part of our VEDA objective: rolling out new versions of JupyterHub. If we need to roll out security fixes, it&amp;rsquo;s much easier now because we&amp;rsquo;ve already done the hard work of staying up to date:&lt;/p>
&lt;ul>
&lt;li>
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/7996" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/infrastructure#7996&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>We aren&amp;rsquo;t under that kind of pressure with Helm v3 yet, since it&amp;rsquo;s still supported, but it&amp;rsquo;s much better to do this work early than to wait.&lt;/p>
&lt;p>If you encounter a bug in popular open source software, you can often just &amp;lsquo;wait&amp;rsquo; for it to be fixed. But this isn&amp;rsquo;t just about time: someone somewhere has to put in the &lt;em>effort&lt;/em> of filing helpful upstream bug reports, getting the bug fixed, and testing to make sure the fix works. This is an example of 2i2c continuing to contribute this &lt;em>effort&lt;/em> upstream wherever we can.&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Thanks to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA&lt;/a> for collaborating deeply with us on infrastructure questions like this.&lt;/li>
&lt;/ul></description></item><item><title>Enabling CloudBank to safely manage their own cluster infrastructure</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/cloudbank-self-service/</link><pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/cloudbank-self-service/</guid><description>&lt;p>We recently enabled
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/cloudbank/" >CloudBank&lt;/a> to run Terraform changes for their cluster without needing to wait on 2i2c engineers for each request. They run 50+ hubs for various community colleges, and we want to enable them to self-serve as much of that work as possible. When we introduced home directory quotas, they were no longer able to set up hubs without help from 2i2c engineers. Our goal was to empower them to set up new hubs safely while still benefiting from the home directory quota work.&lt;/p>
&lt;figure id="figure-cloudbank-simplifies-cloud-access-for-computer-science-research-and-education">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="CloudBank simplifies cloud access for computer science research and education." srcset="
/blog/cloudbank-self-service/featured_hu47b0024f802a2569dc8459bb45285f77_14544_3e2af71d895a3af46826ba1d224a2bf2.webp 400w,
/blog/cloudbank-self-service/featured_hu47b0024f802a2569dc8459bb45285f77_14544_d054f36eb6161bf5a999ff8a409ac162.webp 760w,
/blog/cloudbank-self-service/featured_hu47b0024f802a2569dc8459bb45285f77_14544_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/cloudbank-self-service/featured_hu47b0024f802a2569dc8459bb45285f77_14544_3e2af71d895a3af46826ba1d224a2bf2.webp"
width="411"
height="88"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
CloudBank simplifies cloud access for computer science research and education.
&lt;/figcaption>&lt;/figure>
&lt;p>To do this safely, we needed to avoid granting access to shared Terraform state that could impact other communities. Following
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/6797#pullrequestreview-3246004031" target="_blank" rel="noopener" >Yuvi&amp;rsquo;s suggestion&lt;/a>, we migrated CloudBank&amp;rsquo;s Terraform state to their own GCP project, so that infrastructure changes from the CloudBank team are isolated to their cluster only, making this safe to try. This lets CloudBank run commands like &lt;code>terraform plan&lt;/code> and &lt;code>terraform apply&lt;/code> themselves, meaning that CloudBank can deploy and update a hub without 2i2c engineers in the loop.&lt;/p>
&lt;p>This is a good example of how we aim to balance &lt;strong>community autonomy&lt;/strong> with &lt;strong>infrastructure safety&lt;/strong>. CloudBank can now self-serve routine operations while our broader infrastructure remains protected.&lt;/p>
&lt;h2 id="learn-more">
Learn more
&lt;a class="header-anchor" href="#learn-more">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/6795" target="_blank" rel="noopener" >The infrastructure issue describing this work&lt;/a>&lt;/li>
&lt;li>
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/7339" target="_blank" rel="noopener" >A hub deployed by CloudBank using this workflow&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Thanks to Sean Morris and the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/cloudbank/" >CloudBank&lt;/a> team at
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/bids/" >UC Berkeley&lt;/a> for collaborating on this workflow.&lt;/li>
&lt;/ul></description></item><item><title>Improving our community hub reliability and stability in Q4 2025</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/infrastructure-reliability-q4-2025/</link><pubDate>Tue, 16 Dec 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/infrastructure-reliability-q4-2025/</guid><description>&lt;p>This year we&amp;rsquo;ve prioritized &lt;strong>making the cloud safe to try&lt;/strong> for our member communities. This has driven work in monitoring, alerting, and automating infrastructure so that we resolve small problems before they become big problems. In the last quarter of 2025, we wrapped up this effort by testing the following hypothesis:&lt;/p>
&lt;blockquote>
&lt;p>We can reduce P1 incidents if we shorten the time to act on current alerts and learnings from prior incidents.&lt;/p>
&lt;/blockquote>
&lt;p>Here&amp;rsquo;s what we accomplished and what we learned.&lt;/p>
&lt;h2 id="what-we-accomplished">
What we accomplished
&lt;a class="header-anchor" href="#what-we-accomplished">#&lt;/a>
&lt;/h2>&lt;p>In short: we&amp;rsquo;re now much more confident in the stability of community infrastructure.
Here&amp;rsquo;s a snapshot of our new incident dashboard, which shows high-level trends for the stability of our infrastructure:&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Dashboard of the PagerDuty status page for 2i2c" srcset="
/blog/infrastructure-reliability-q4-2025/featured_hu04df3383ec51b90b248012f6472de1e6_185237_a47d9c707f54757cba94700be6c3c216.webp 400w,
/blog/infrastructure-reliability-q4-2025/featured_hu04df3383ec51b90b248012f6472de1e6_185237_a6c12809ca27d3fc4c1c81f7b28ea33a.webp 760w,
/blog/infrastructure-reliability-q4-2025/featured_hu04df3383ec51b90b248012f6472de1e6_185237_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/infrastructure-reliability-q4-2025/featured_hu04df3383ec51b90b248012f6472de1e6_185237_a47d9c707f54757cba94700be6c3c216.webp"
width="760"
height="394"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;em>See the real-time status of our community hubs at
&lt;a href="http://status.2i2c.org" target="_blank" rel="noopener" >status.2i2c.org&lt;/a>&lt;/em>&lt;/p>
&lt;h3 id="we-improved-infrastructure-reliability-for-our-communities">
We improved infrastructure reliability for our communities
&lt;a class="header-anchor" href="#we-improved-infrastructure-reliability-for-our-communities">#&lt;/a>
&lt;/h3>&lt;p>We made several technology and team process improvements that led to these benefits for our communities:&lt;/p>
&lt;ol>
&lt;li>We are now more likely to catch outages before a community reports them to us.&lt;/li>
&lt;li>We are now less likely to have an outage happen more than once, or affect more than one community, because we consistently fix the issues that cause outages.&lt;/li>
&lt;/ol>
&lt;p>We saw a clear drop in critical alerts that required immediate response:&lt;/p>
&lt;ul>
&lt;li>For August and September we had an average of 7 outages/month (6 detected by our alerts, 1 reported by a community)&lt;/li>
&lt;li>In October, November, and December we had an average of about 3 outages/month (9 in October, 0 in November, and 1 in December, with only one of these reported by a community)&lt;/li>
&lt;/ul>
&lt;h3 id="we-became-more-efficient-responsive-and-focused">
We became more efficient, responsive, and focused
&lt;a class="header-anchor" href="#we-became-more-efficient-responsive-and-focused">#&lt;/a>
&lt;/h3>&lt;p>We also got several team benefits from this work:&lt;/p>
&lt;ol>
&lt;li>We get fewer interruptions and distractions from deeper work.&lt;/li>
&lt;li>We have clear assignment policies, so everyone knows who is responsible for acting on alerts.&lt;/li>
&lt;li>We avoid the invisible work of falling down rabbit holes when responding to outages.&lt;/li>
&lt;li>We decreased the stress and pressure of doing upgrades, making them easier to split into sprint items and more likely to get done consistently.&lt;/li>
&lt;/ol>
&lt;h2 id="the-improvements-we-made">
The improvements we made
&lt;a class="header-anchor" href="#the-improvements-we-made">#&lt;/a>
&lt;/h2>
&lt;h3 id="infrastructure-improvements">
Infrastructure improvements
&lt;a class="header-anchor" href="#infrastructure-improvements">#&lt;/a>
&lt;/h3>&lt;ul>
&lt;li>Created a
&lt;a href="http://status.2i2c.org" target="_blank" rel="noopener" >status page for all 2i2c community hubs&lt;/a>, giving our team and communities visibility into the status of our infrastructure.&lt;/li>
&lt;li>Created an alert that triggers when two servers fail to start consecutively in a 30-minute time window.&lt;/li>
&lt;li>Improved deployment infrastructure so that we can roll out sub-chart upgrades to individual clusters, allowing us to roll out major changes in batches.&lt;/li>
&lt;li>Removed our &amp;ldquo;configurator&amp;rdquo; application from community hubs, because it was causing more confusion than it was resolving.&lt;/li>
&lt;li>Allowed servers to start even when users hit their storage quotas.&lt;/li>
&lt;li>Carried out a number of upgrades to Kubernetes and to the support services that we run alongside each community hub.&lt;/li>
&lt;/ul>
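&lt;p>The server start alert above can be read in more than one way. The Python sketch below shows one plausible interpretation. It is an assumption made for illustration, not our actual alert definition. The alert fires when the two most recent server start attempts both failed and happened within 30 minutes of each other. The &lt;code>(timestamp, succeeded)&lt;/code> event format is hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python"># Hypothetical sketch of one reading of the alert rule, not the real definition.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


def should_alert(events, window=WINDOW):
    """events: list of (timestamp, succeeded) tuples, one per server start attempt.

    Returns True when the two most recent attempts both failed and the second
    came no more than `window` after the first.
    """
    attempts = sorted(events)  # oldest first
    if len(attempts) >= 2:
        (t1, ok1), (t2, ok2) = attempts[-2:]
        return not ok1 and not ok2 and window >= t2 - t1
    return False


start = datetime(2025, 11, 3, 9, 0)
events = [
    (start, True),
    (start + timedelta(minutes=5), False),
    (start + timedelta(minutes=20), False),
]
print(should_alert(events))  # True: two failures 15 minutes apart
&lt;/code>&lt;/pre>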
&lt;h3 id="process-improvements">
Process improvements
&lt;a class="header-anchor" href="#process-improvements">#&lt;/a>
&lt;/h3>&lt;ul>
&lt;li>Made a team commitment to prioritize issues from
&lt;a href="https://2i2c.org/incident-reports" target="_blank" rel="noopener" >incident reports&lt;/a> and other stability-related problems.&lt;/li>
&lt;li>Defined incident
&lt;a href="https://infrastructure.2i2c.org/topic/monitoring-alerting/escalation-policies/" target="_blank" rel="noopener" >escalation policies&lt;/a> using the
&lt;a href="http://status.2i2c.org" target="_blank" rel="noopener" >status page&lt;/a> to calibrate the urgency of our response to the severity of incidents.&lt;/li>
&lt;li>Defined &amp;ldquo;on-call&amp;rdquo; procedures so our team knows when and how to be more responsive to outages.&lt;/li>
&lt;li>Time-boxed our alert response process to avoid accidentally falling down rabbit holes for non-urgent problems.&lt;/li>
&lt;li>Created a more reliable process for
&lt;a href="https://infrastructure.2i2c.org/topic/monitoring-alerting/escalation-policies/" target="_blank" rel="noopener" >responding to incidents&lt;/a> and writing
&lt;a href="https://2i2c.org/incident-reports" target="_blank" rel="noopener" >incident reports&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h2 id="looking-forward">
Looking forward
&lt;a class="header-anchor" href="#looking-forward">#&lt;/a>
&lt;/h2>&lt;p>After this push around infrastructure reliability, we&amp;rsquo;re significantly more confident in the stability and transparency of our community hub infrastructure. This will deliver better service for our member communities and free up more of our time to engage with them instead of fighting infrastructure fires.&lt;/p>
&lt;p>We will continue to improve our infrastructure, and have a better foundation to do so incrementally in the coming quarters. Here are a few things we&amp;rsquo;d still like to improve:&lt;/p>
&lt;ol>
&lt;li>We still need to improve how reliably we complete follow-up actions from incidents (e.g., writing incident reports). When a process doesn&amp;rsquo;t fit into planning &amp;amp; scoping ceremonies, we struggle to follow it consistently.&lt;/li>
&lt;li>We&amp;rsquo;d like to improve our testing framework for major upgrades across all hubs (e.g., Kubernetes version upgrades) to catch bugs before communities do.&lt;/li>
&lt;/ol>
&lt;h2 id="learn-more">
Learn more
&lt;a class="header-anchor" href="#learn-more">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>
&lt;a href="http://status.2i2c.org/" target="_blank" rel="noopener" >2i2c Status Page&lt;/a>&lt;/li>
&lt;li>
&lt;a href="https://infrastructure.2i2c.org/hub-deployment-guide/runbooks/on-call/" target="_blank" rel="noopener" >On-call procedures documentation&lt;/a>&lt;/li>
&lt;li>
&lt;a href="https://github.com/2i2c-org/infrastructure" target="_blank" rel="noopener" >Infrastructure repository&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Faster reporting of user home directory sizes</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/faster-home-directory-reporting/</link><pubDate>Tue, 09 Dec 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/faster-home-directory-reporting/</guid><description>&lt;p>Storage quotas help users avoid running out of space unexpectedly and give administrators visibility into capacity planning. However, storage usage can change rapidly, so administrators need up-to-date information to know whether users are close to hitting their limits.&lt;/p>
&lt;p>We&amp;rsquo;ve improved how quickly hub administrators can see user home directory sizes across our JupyterHubs. This makes monitoring more responsive and adds quota limit visibility that wasn&amp;rsquo;t possible before.&lt;/p>
&lt;h2 id="using-jupyterhub-home-nfs-for-near-instant-disk-usage-metrics">
Using &lt;code>jupyterhub-home-nfs&lt;/code> for near-instant disk usage metrics
&lt;a class="header-anchor" href="#using-jupyterhub-home-nfs-for-near-instant-disk-usage-metrics">#&lt;/a>
&lt;/h2>&lt;p>Our existing storage monitoring tool,
&lt;a href="https://github.com/2i2c-org/prometheus-dirsize-exporter" target="_blank" rel="noopener" >&lt;code>prometheus-dirsize-exporter&lt;/code>&lt;/a>, deliberately runs slowly to avoid excessive disk I/O. This meant home directory metrics could be &lt;strong>hours out of date&lt;/strong> on systems with many users or large directories. Plus, there was no way to report user quota limits at all.&lt;/p>
&lt;p>Our home directory storage is managed by
&lt;a href="https://github.com/2i2c-org/jupyterhub-home-nfs/" target="_blank" rel="noopener" >&lt;code>jupyterhub-home-nfs&lt;/code>&lt;/a>, which enforces per-user quotas, so it is also well placed to expose usage and limit information as Prometheus metrics using data from the underlying filesystem quota system. Because this information is already tracked by the filesystem, it&amp;rsquo;s available immediately without scanning individual files.&lt;/p>
&lt;p>We made two key improvements:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Make disk usage reporting almost instantaneous&lt;/strong>. We made &lt;code>jupyterhub-home-nfs&lt;/code> export &lt;code>total_size_bytes&lt;/code> and &lt;code>hard_limit_bytes&lt;/code> metrics to Prometheus for near-instant reporting. We used the same metric names and namespace as &lt;code>prometheus-dirsize-exporter&lt;/code> for compatibility. See
&lt;a href="https://github.com/2i2c-org/jupyterhub-home-nfs/pull/76" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/jupyterhub-home-nfs#76&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Allow this to be used upstream in JupyterHub Grafana Dashboards&lt;/strong> so that it can support both types of disk usage reporting. This means users of the upstream
&lt;a href="https://github.com/jupyterhub/grafana-dashboards" target="_blank" rel="noopener" >JupyterHub Grafana dashboards&lt;/a> get the same useful view of home directory usage, regardless of whether the metric comes from &lt;code>prometheus-dirsize-exporter&lt;/code> or &lt;code>jupyterhub-home-nfs&lt;/code>. See
&lt;a href="https://github.com/2i2c-org/prometheus-dirsize-exporter/pull/29" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/prometheus-dirsize-exporter#29&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
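&lt;p>To make the mechanism concrete, here is a minimal Python sketch. It is not the actual &lt;code>jupyterhub-home-nfs&lt;/code> code. It turns per-user quota data that the filesystem already tracks into Prometheus text-format gauges, without scanning any files. The input dictionary, the &lt;code>dirsize_&lt;/code> prefix, and the &lt;code>directory&lt;/code> label are assumptions meant to mirror &lt;code>prometheus-dirsize-exporter&lt;/code> naming.&lt;/p>
&lt;pre>&lt;code class="language-python"># Hypothetical sketch, not the real jupyterhub-home-nfs exporter.
# Quota usage and limits come from data the filesystem already tracks,
# so producing these metrics does not require walking any directories.


def render_quota_metrics(quotas, namespace="dirsize"):
    """Render Prometheus text-format gauges from {directory: (used, limit)}."""
    lines = []
    for metric in ("total_size_bytes", "hard_limit_bytes"):
        name = f"{namespace}_{metric}"
        lines.append(f"# TYPE {name} gauge")
        for directory, (used, limit) in sorted(quotas.items()):
            value = used if metric == "total_size_bytes" else limit
            lines.append(f'{name}{{directory="{directory}"}} {value}')
    return "\n".join(lines) + "\n"


# Example quota data (made up): bytes used and hard limit per home directory.
example = {"alice": (1_500_000_000, 10_000_000_000), "bob": (42, 10_000_000_000)}
print(render_quota_metrics(example))
&lt;/code>&lt;/pre>
&lt;p>In this scheme, both exporters emit the same metric names, so a dashboard query can read from either source without changes.&lt;/p>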
&lt;p>These changes were
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/7261" target="_blank" rel="noopener" >deployed across all our communities&lt;/a>, so administrators can now access current home directory information &lt;strong>within minutes&lt;/strong> regardless of directory size.&lt;/p>
&lt;figure id="figure-home-directory-usage-dashboard-showing-total-size-metrics-from-jupyterhub-home-nfs-and-other-data-from-prometheus-dirsize-exporter">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Home Directory Usage dashboard showing total size metrics from jupyterhub-home-nfs and other data from prometheus-dirsize-exporter" srcset="
/blog/faster-home-directory-reporting/featured_hu5e6047328de0a056370b6f6f7ca4f2f4_42503_ededa5ff37780d5501ea74e6e73f6926.webp 400w,
/blog/faster-home-directory-reporting/featured_hu5e6047328de0a056370b6f6f7ca4f2f4_42503_a995b186c4e39c1fd078545f235e8394.webp 760w,
/blog/faster-home-directory-reporting/featured_hu5e6047328de0a056370b6f6f7ca4f2f4_42503_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/faster-home-directory-reporting/featured_hu5e6047328de0a056370b6f6f7ca4f2f4_42503_ededa5ff37780d5501ea74e6e73f6926.webp"
width="760"
height="152"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Home Directory Usage dashboard showing total size metrics from jupyterhub-home-nfs and other data from prometheus-dirsize-exporter
&lt;/figcaption>&lt;/figure>
&lt;h2 id="try-it-out">
Try it out
&lt;a class="header-anchor" href="#try-it-out">#&lt;/a>
&lt;/h2>&lt;p>2i2c member organizations can try this out now. If you have access to your hub&amp;rsquo;s Grafana instance, you can see these new metrics in the &lt;em>Home Directory Usage&lt;/em> dashboard:&lt;/p>
&lt;ol>
&lt;li>Open your hub&amp;rsquo;s
&lt;a href="https://docs.2i2c.org/admin/monitoring/grafana-dashboards/" target="_blank" rel="noopener" >Grafana dashboard&lt;/a>.&lt;/li>
&lt;li>Go to &lt;code>Dashboards&lt;/code> -&amp;gt; &lt;code>JupyterHub Default Dashboards&lt;/code> -&amp;gt; &lt;code>Home Directory Usage&lt;/code>.&lt;/li>
&lt;li>Check the table for up-to-date &lt;em>total size&lt;/em> and &lt;em>quota limit&lt;/em> values.&lt;/li>
&lt;/ol>
&lt;p>For more details, see our
&lt;a href="https://docs.2i2c.org/admin/monitoring/disk-usage/" target="_blank" rel="noopener" >docs on filesystem and disk dashboards&lt;/a>.&lt;/p>
&lt;h2 id="coming-next">
Coming next
&lt;a class="header-anchor" href="#coming-next">#&lt;/a>
&lt;/h2>&lt;p>We&amp;rsquo;d like to build on this work to enable &lt;strong>alerting when individual users near their disk quotas&lt;/strong>. This will make it easier to track user disk usage reliably across a community. Progress is tracked in this issue:
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/7166" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/infrastructure#7166&lt;/a>&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>This was a directed contribution supported by
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA&lt;/a> to enable more proactive monitoring and alerting for hub administrators.&lt;/li>
&lt;/ul></description></item><item><title>Adding User Group Insights to Cloud Cost Dashboards with Grafana</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/cloud-cost-groups/</link><pubDate>Mon, 24 Nov 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/cloud-cost-groups/</guid><description>&lt;p>We are excited to announce that we have extended our cloud cost dashboards in Grafana to display costs filtered by user group! This new feature allows administrators to monitor and manage cloud expenses based on user group memberships in JupyterHub.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Group cloud cost dashboard showing cost breakdowns by user groups" srcset="
/blog/cloud-cost-groups/featured_hu34fd6e3a049030056ef3072c1a0427ac_131153_c2b7e8d83fe14bfbc24fc804e952e390.webp 400w,
/blog/cloud-cost-groups/featured_hu34fd6e3a049030056ef3072c1a0427ac_131153_694d12ebbde0c6b897972885357ca71d.webp 760w,
/blog/cloud-cost-groups/featured_hu34fd6e3a049030056ef3072c1a0427ac_131153_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/cloud-cost-groups/featured_hu34fd6e3a049030056ef3072c1a0427ac_131153_c2b7e8d83fe14bfbc24fc804e952e390.webp"
width="760"
height="388"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;div class="alert alert-">
&lt;div>
Available only for dedicated AWS clusters (excluding CloudBank-managed accounts). Support for deployments on GCP is planned for the future.
&lt;/div>
&lt;/div>
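&lt;p>Conceptually, filtering costs by user group combines per-user costs with JupyterHub group memberships. The minimal Python sketch below shows one way to do that aggregation; the data shapes, the &lt;code>group_costs&lt;/code> function, and the &lt;code>"(no group)"&lt;/code> bucket are hypothetical and are not taken from &lt;code>jupyterhub-cost-monitoring&lt;/code>. Because a JupyterHub user can belong to several groups, group totals in this sketch can overlap.&lt;/p>

```python
from collections import defaultdict


def group_costs(user_costs, memberships):
    """Sum per-user costs into per-group totals.

    user_costs: {username: cost}
    memberships: {group_name: [username, ...]}
    A user in several groups contributes to each of those groups, so
    group totals may overlap. Users in no group are reported under the
    hypothetical label "(no group)".
    """
    totals = defaultdict(float)
    grouped = set()
    for group, users in memberships.items():
        for user in users:
            if user in user_costs:
                totals[group] += user_costs[user]
                grouped.add(user)
    # Keep costs from users outside every group visible.
    for user, cost in user_costs.items():
        if user not in grouped:
            totals["(no group)"] += cost
    return dict(totals)
```

&lt;p>One consequence of this design: summing the group totals would double-count users who are in more than one group, so a hub-wide total should come from the per-user costs directly.&lt;/p>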
&lt;h2 id="learn-more">
Learn more
&lt;a class="header-anchor" href="#learn-more">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Take a look at the
&lt;a href="https://docs.2i2c.org/admin/monitoring/cost-users-groups/#group-cloud-costs" target="_blank" rel="noopener" >Community Hub Guide&lt;/a> to see what&amp;rsquo;s new&lt;/li>
&lt;li>Check out the documentation of the
&lt;a href="https://jupyterhub-cost-monitoring.readthedocs.io/en/latest/" target="_blank" rel="noopener" >2i2c-org/jupyterhub-cost-monitoring&lt;/a> project to see how it all works&lt;/li>
&lt;li>
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/author/jenny-wong/" >Jenny&lt;/a> recently presented her work on the cost monitoring system at
&lt;a href="https://events.linuxfoundation.org/jupytercon/" target="_blank" rel="noopener" >JupyterCon 2025&lt;/a> earlier this month. Watch a
&lt;a href="https://youtu.be/M5x3bTgRzVs?si=P2c3Ngb8v7f4ks0I" target="_blank" rel="noopener" >video&lt;/a> or look at the
&lt;a href="https://docs.google.com/presentation/d/1N8V7dna1atpRmcbpgZ0-VL5cbOQfwYfXTstudT2ierY/edit?usp=sharing" target="_blank" rel="noopener" >slides&lt;/a>.&lt;/li>
&lt;/ul>
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSff-u-sWFuwO1-VTgk2Ir7f1nfUUlLevQk_Vkk_jnmcI1nJnw/viewform?usp=pp_url&amp;amp;entry.648332035=https://deploy-preview-609--2i2c-org.netlify.app/blog/cloud-cost-groups/" target="_blank" rel="noopener" class="text-decoration-none">
&lt;div class="alert alert-info d-flex align-items-start p-3" role="button" style="transition: all 0.2s ease; box-shadow: 0 2px 4px rgba(0,0,0,0.1);" onmouseover="this.style.backgroundColor='#b3e5fc'; this.style.boxShadow='0 4px 8px rgba(0,0,0,0.15)'; this.style.transform='translateY(-1px)'" onmouseout="this.style.backgroundColor=''; this.style.boxShadow='0 2px 4px rgba(0,0,0,0.1)'; this.style.transform='translateY(0)'" onfocus="this.style.backgroundColor='#b3e5fc'; this.style.boxShadow='0 4px 8px rgba(0,0,0,0.15)'; this.style.transform='translateY(-1px)'" onblur="this.style.backgroundColor=''; this.style.boxShadow='0 2px 4px rgba(0,0,0,0.1)'; this.style.transform='translateY(0)'">
&lt;div class="fw-bold mb-1">&lt;span style="font-weight:bold">Give us feedback!&lt;/span> Click here to provide feedback that will help us make this more impactful.&lt;/div>
&lt;/div>
&lt;/a>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>
&lt;a href="https://github.com/sunu" target="_blank" rel="noopener" >Tarashish&lt;/a> @
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/devseed/" >Development Seed&lt;/a> for collaborating on this project with us.&lt;/li>
&lt;li>
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA&lt;/a> and the DSE Team at NASA MSFC ODSI for funding much of this work.&lt;/li>
&lt;li>
&lt;a href="https://github.com/kyle-lesinger" target="_blank" rel="noopener" >Kyle Lesinger&lt;/a> from the NASA MSFC Office of Data Science and Informatics for providing valuable feedback and bug reports during development.&lt;/li>
&lt;/ul></description></item><item><title>Enabling transparent cloud cost monitoring with user-level dashboards</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/cloud-cost-monitoring/</link><pubDate>Tue, 30 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/cloud-cost-monitoring/</guid><description>&lt;p>We are excited to announce that &lt;strong>dashboards to monitor cloud usage and costs at a per-user level&lt;/strong> are now available! See the
&lt;a href="https://docs.2i2c.org/admin/monitoring/cost-users" target="_blank" rel="noopener" >cost monitoring documentation&lt;/a> for more information.&lt;/p>
&lt;p>A key goal of 2i2c is to make the cloud safe for science. By providing transparent cost monitoring, we give communities the confidence that they won&amp;rsquo;t face unexpected bills and can better understand how their usage patterns translate to cloud costs. This visibility is especially valuable in our shared platform model, where each community gets their own independent hub while benefiting from shared infrastructure expertise.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Cloud cost monitoring dashboard showing user-level usage and cost breakdowns" srcset="
/blog/cloud-cost-monitoring/featured_huf9d99e2c2d9f92e6fb1ca11e35925122_105209_ad88b7920c8a8ef63cb6031d96b8917d.webp 400w,
/blog/cloud-cost-monitoring/featured_huf9d99e2c2d9f92e6fb1ca11e35925122_105209_cb20ec3b020712afb2b8d73e53f13db8.webp 760w,
/blog/cloud-cost-monitoring/featured_huf9d99e2c2d9f92e6fb1ca11e35925122_105209_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/cloud-cost-monitoring/featured_huf9d99e2c2d9f92e6fb1ca11e35925122_105209_ad88b7920c8a8ef63cb6031d96b8917d.webp"
width="760"
height="427"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The user-level cost breakdown allows communities to identify individual usage trends and manage their resources more effectively. Communities can now see exactly how their computational work translates to cloud spending, enabling better resource planning and budget management.&lt;/p>
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSff-u-sWFuwO1-VTgk2Ir7f1nfUUlLevQk_Vkk_jnmcI1nJnw/viewform?usp=pp_url&amp;amp;entry.648332035=https://deploy-preview-609--2i2c-org.netlify.app/blog/cloud-cost-monitoring/" target="_blank" rel="noopener" class="text-decoration-none">
&lt;div class="alert alert-info d-flex align-items-start p-3" role="button" style="transition: all 0.2s ease; box-shadow: 0 2px 4px rgba(0,0,0,0.1);" onmouseover="this.style.backgroundColor='#b3e5fc'; this.style.boxShadow='0 4px 8px rgba(0,0,0,0.15)'; this.style.transform='translateY(-1px)'" onmouseout="this.style.backgroundColor=''; this.style.boxShadow='0 2px 4px rgba(0,0,0,0.1)'; this.style.transform='translateY(0)'" onfocus="this.style.backgroundColor='#b3e5fc'; this.style.boxShadow='0 4px 8px rgba(0,0,0,0.15)'; this.style.transform='translateY(-1px)'" onblur="this.style.backgroundColor=''; this.style.boxShadow='0 2px 4px rgba(0,0,0,0.1)'; this.style.transform='translateY(0)'">
&lt;div class="fw-bold mb-1">&lt;span style="font-weight:bold">Give us feedback!&lt;/span> Click here to provide feedback that will help us make this more impactful.&lt;/div>
&lt;/div>
&lt;/a>
&lt;h2 id="learn-more">
Learn more
&lt;a class="header-anchor" href="#learn-more">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>
&lt;a href="https://docs.2i2c.org/admin/monitoring/cost-users" target="_blank" rel="noopener" >Cost monitoring documentation&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Tarashish @
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/devseed/" >Development Seed&lt;/a> for working on this with us.&lt;/li>
&lt;li>
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA&lt;/a> for funding much of this work.&lt;/li>
&lt;li>Andy @
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/openscapes/" >Openscapes&lt;/a>, Alex @ Development Seed and Sarah @
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/earthscope/" >Earthscope&lt;/a> for giving us close feedback.&lt;/li>
&lt;/ul></description></item><item><title>Demonstrating our infrastructure's reliability with a hub status page for our communities</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/status-page/</link><pubDate>Tue, 23 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/status-page/</guid><description>&lt;p>One of 2i2c&amp;rsquo;s goals is to &lt;strong>make the cloud safe for science&lt;/strong>.
A big part of this is making the black box of commercial cloud infrastructure more predictable and reliable for our member communities, across our network of community hubs that all operate autonomously.&lt;/p>
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSff-u-sWFuwO1-VTgk2Ir7f1nfUUlLevQk_Vkk_jnmcI1nJnw/viewform?usp=pp_url&amp;amp;entry.648332035=https://deploy-preview-609--2i2c-org.netlify.app/blog/status-page/" target="_blank" rel="noopener" class="text-decoration-none">
&lt;div class="alert alert-info d-flex align-items-start p-3" role="button" style="transition: all 0.2s ease; box-shadow: 0 2px 4px rgba(0,0,0,0.1);" onmouseover="this.style.backgroundColor='#b3e5fc'; this.style.boxShadow='0 4px 8px rgba(0,0,0,0.15)'; this.style.transform='translateY(-1px)'" onmouseout="this.style.backgroundColor=''; this.style.boxShadow='0 2px 4px rgba(0,0,0,0.1)'; this.style.transform='translateY(0)'" onfocus="this.style.backgroundColor='#b3e5fc'; this.style.boxShadow='0 4px 8px rgba(0,0,0,0.15)'; this.style.transform='translateY(-1px)'" onblur="this.style.backgroundColor=''; this.style.boxShadow='0 2px 4px rgba(0,0,0,0.1)'; this.style.transform='translateY(0)'">
&lt;div class="fw-bold mb-1">&lt;span style="font-weight:bold">Give us feedback!&lt;/span> Click here to provide feedback that will help us make this more impactful.&lt;/div>
&lt;/div>
&lt;/a>
&lt;p>To that end, we&amp;rsquo;ve created a &lt;strong>status page for 2i2c&amp;rsquo;s network of community hubs&lt;/strong>. It is a source of truth that provides a high-level picture of the stability of our infrastructure, lets a community know if their hub is experiencing a problem, and gives us a heads-up when things aren&amp;rsquo;t working as expected. You can check it out at:&lt;/p>
&lt;p>👉
&lt;a href="http://status.2i2c.org" target="_blank" rel="noopener" >&lt;strong>&lt;code>status.2i2c.org&lt;/code>&lt;/strong>&lt;/a>&lt;/p>
&lt;figure id="figure-the-2i2c-status-page-gives-communities-a-high-level-view-of-the-uptime-for-our-entire-network-of-community-hubs">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="The 2i2c Status Page gives communities a high-level view of the uptime for our entire network of community hubs." srcset="
/blog/status-page/featured_hudcfdf27a10fbca3598dd77978ccf2720_32254_9ef7032d23926ce36366369d281d2883.webp 400w,
/blog/status-page/featured_hudcfdf27a10fbca3598dd77978ccf2720_32254_5436d606ae4a0e6afa5aa56893ba0c9d.webp 760w,
/blog/status-page/featured_hudcfdf27a10fbca3598dd77978ccf2720_32254_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/status-page/featured_hudcfdf27a10fbca3598dd77978ccf2720_32254_9ef7032d23926ce36366369d281d2883.webp"
width="760"
height="476"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
The 2i2c Status Page gives communities a high-level view of the uptime for our entire network of community hubs.
&lt;/figcaption>&lt;/figure>
&lt;p>While we make status more visible, we&amp;rsquo;re also
&lt;a href="https://github.com/2i2c-org/team-compass/pull/1021" target="_blank" rel="noopener" >streamlining our incident response processes&lt;/a> in order to more quickly respond to outages when they occur (ideally, before a community has even noticed!).&lt;/p>
&lt;p>There are still plenty of improvements we&amp;rsquo;d like to make: for example, we&amp;rsquo;re focusing on major outages right now, but would like to extend some level of reporting for &lt;em>degraded&lt;/em> service, like unexpectedly slow start times.&lt;/p>
&lt;h2 id="learn-more">
Learn more
&lt;a class="header-anchor" href="#learn-more">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>👉
&lt;a href="https://2i2c-hubs.trust.pagerduty.com/posts/dashboard" target="_blank" rel="noopener" >The status page&lt;/a>&lt;/li>
&lt;li>👉
&lt;a href="https://docs.2i2c.org/admin/reliability/status-page" target="_blank" rel="noopener" >The status page documentation&lt;/a>&lt;/li>
&lt;li>👉
&lt;a href="https://github.com/2i2c-org/team-compass/pull/1021" target="_blank" rel="noopener" >Our new process for incident response&lt;/a>&lt;/li>
&lt;li>👉 Follow an
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/6417" target="_blank" rel="noopener" >in-progress initiative to improve the reliability of our infrastructure&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Reducing base infrastructure costs on AWS with smarter instance types</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/aws-cost-reduction/</link><pubDate>Wed, 17 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/aws-cost-reduction/</guid><description>&lt;p>We&amp;rsquo;ve been working to reduce the base costs of running our cloud infrastructure on AWS by switching to more efficient instance types for our core nodes. This is the core infrastructure we use to ensure hubs are &amp;ldquo;always available&amp;rdquo; for users, even when no one is actively using a hub. By moving from older &lt;code>r5.xlarge&lt;/code> instances to newer, more efficient &lt;code>r8i-flex.large&lt;/code> instances, we&amp;rsquo;ve significantly reduced these baseline costs while maintaining the same level of service. Here&amp;rsquo;s a plot of daily savings for the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/geojupyter/" >GeoJupyter community&lt;/a>.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="EC2 cost reduction over time for one community" srcset="
/blog/aws-cost-reduction/featured_hu7020069b8cb86380af07e0c76d10ae22_44063_19edd4b646cb3f1a6d166bea2bffbdfb.webp 400w,
/blog/aws-cost-reduction/featured_hu7020069b8cb86380af07e0c76d10ae22_44063_68563ef0e9111d0c57347d0d3d8f5a32.webp 760w,
/blog/aws-cost-reduction/featured_hu7020069b8cb86380af07e0c76d10ae22_44063_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/aws-cost-reduction/featured_hu7020069b8cb86380af07e0c76d10ae22_44063_19edd4b646cb3f1a6d166bea2bffbdfb.webp"
width="760"
height="394"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>The graph above shows the impact on EC2 node costs specifically (this doesn&amp;rsquo;t include the entire cost of always-on infrastructure, but represents a significant portion). We are rolling out this change to all &lt;em>new&lt;/em> clusters, and starting to work through our pre-existing AWS clusters.&lt;/p>
&lt;h2 id="learn-more">
Learn more
&lt;a class="header-anchor" href="#learn-more">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/6721" target="_blank" rel="noopener" >Pull request implementing the instance type changes&lt;/a>&lt;/li>
&lt;li>
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/6756" target="_blank" rel="noopener" >Our rollout plan for existing clusters&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Incident report: UC Merced user throttling during class startup</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/incident-ucmerced-user-throttling/</link><pubDate>Tue, 16 Sep 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/incident-ucmerced-user-throttling/</guid><description>&lt;p>On August 29, 2025, our cloud infrastructure team experienced an incident on the UC Merced community hub when students tried to log in simultaneously at the start of class. For more detailed technical information about this incident, see our
&lt;a href="https://github.com/2i2c-org/incident-reports/blob/main/reports/2025-08-29-ucmerced-too-many-users-throttled.pdf" target="_blank" rel="noopener" >full incident report&lt;/a>.&lt;/p>
&lt;h2 id="what-happened">
What happened
&lt;a class="header-anchor" href="#what-happened">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Students experienced issues when they all tried to log in to the hub at the same time at the start of class.&lt;/li>
&lt;li>The concurrent spawn limit was reached quickly due to the large number of users starting up simultaneously.&lt;/li>
&lt;li>New nodes had to be brought up by the autoscaler, which took roughly 10 minutes from start to end.&lt;/li>
&lt;li>Users who tried again after 1 minute weren&amp;rsquo;t guaranteed to get their servers started immediately since new nodes were still spinning up.&lt;/li>
&lt;li>This was an &amp;ldquo;expected&amp;rdquo; scale-up event but the lack of clear messaging caused users to interpret it as instability.&lt;/li>
&lt;/ul>
&lt;h2 id="what-we-learned">
What we learned
&lt;a class="header-anchor" href="#what-we-learned">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>We need better communication so users understand when infrastructure slowness is &amp;ldquo;expected&amp;rdquo; vs. &amp;ldquo;unstable&amp;rdquo;.&lt;/li>
&lt;li>We need better alerting for concurrent user startup throttling: we found out about this issue from users rather than from automated monitoring.&lt;/li>
&lt;li>We learned that JupyterHub&amp;rsquo;s metrics don&amp;rsquo;t properly expose &lt;code>429&lt;/code> (Too Many Requests) status codes in our dashboards.&lt;/li>
&lt;li>This will happen again if we don&amp;rsquo;t have proper scaling limits and node provisioning strategies for sudden user influxes.&lt;/li>
&lt;/ul>
&lt;h2 id="resolution">
Resolution
&lt;a class="header-anchor" href="#resolution">#&lt;/a>
&lt;/h2>&lt;p>We implemented several fixes:&lt;/p>
&lt;ul>
&lt;li>Increased the concurrent spawn limit from 64 to 100.&lt;/li>
&lt;li>Moved UC Merced users onto larger nodes to reduce the number of node spin-ups needed. This will cost more in cloud resources but result in fewer scale-up events.&lt;/li>
&lt;li>Created action items to improve logging, alerting, and monitoring for similar incidents.&lt;/li>
&lt;/ul>
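&lt;p>To see why raising the limit helps, here is a deliberately simplified Python model of a login burst. It assumes, hypothetically, that every student arrives at the same moment, that each accepted spawn stays pending for a fixed time (roughly the ten minutes the autoscaler needed here), that students who receive a &lt;code>429&lt;/code> response retry every minute, and that JupyterHub rejects new spawn requests while the number of pending spawns is at the concurrent spawn limit. The class size and timings are illustrative, not figures from the incident, and &lt;code>simulate_burst&lt;/code> is not JupyterHub code.&lt;/p>

```python
def simulate_burst(students, limit, spawn_seconds, retry_seconds, horizon_seconds):
    """Simulate a login burst against a concurrent spawn limit.

    Toy model (assumptions, not JupyterHub code): all students arrive at
    t=0, each accepted spawn stays pending for spawn_seconds, rejected
    students retry every retry_seconds, and new spawns are rejected with
    HTTP 429 while pending spawns are at the limit.
    Returns a list of (time, accepted_now, still_waiting) tuples.
    """
    waiting = students
    pending_until = []  # finish time of each pending spawn
    history = []
    t = 0
    while waiting > 0 and horizon_seconds >= t:
        # Spawns that have finished by time t no longer count as pending.
        pending_until = [finish for finish in pending_until if finish > t]
        free_slots = max(limit - len(pending_until), 0)
        accepted = min(free_slots, waiting)
        pending_until.extend([t + spawn_seconds] * accepted)
        waiting -= accepted
        history.append((t, accepted, waiting))
        t += retry_seconds
    return history
```

&lt;p>With these illustrative inputs, &lt;code>simulate_burst(150, 64, 600, 60, 1200)&lt;/code> accepts 64 students at once, 64 more after 600 seconds, and the last 22 after 1200 seconds, while a limit of 100 accepts everyone within 600 seconds. In this toy model the limit only decides how many students wait for a later round; it does not make nodes start any faster.&lt;/p>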
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Thanks to UC Merced students and instructors for reporting the issue through our support system.&lt;/li>
&lt;/ul></description></item><item><title>Our Product and Service goals for Q3 2025</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/q3-goals/</link><pubDate>Tue, 22 Jul 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/q3-goals/</guid><description>&lt;p>As we enter Q3 2025, our focus remains on enabling better cost controls for our communities and increasing flexibility for end-users. In line with our commitment to transparency, we’re sharing our platform and service objectives for the quarter and inviting feedback to ensure our direction reflects what matters most to the communities we serve. See our
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/q2-product-goals/" >product goals from the previous quarter here&lt;/a>.&lt;/p>
&lt;p>The themes below offer a high-level snapshot of where we aim to evolve our offerings in the coming months.&lt;/p>
&lt;blockquote>
&lt;p>⭐ Connect with us&lt;/p>
&lt;ul>
&lt;li>
&lt;a href="https://docs.google.com/forms/u/1/d/1PXGM9_j0nyFLR1FEs7fz9x1vUuhs-J4IvoWGy5r-2aE/edit?fromCopy=true&amp;amp;ct=2" target="_blank" rel="noopener" >Give us feedback&lt;/a> about our direction and how it can improve.&lt;/li>
&lt;li>
&lt;a href="https://forms.fillout.com/t/uQHVMkgvsuus" target="_blank" rel="noopener" >Fund parts of this work&lt;/a> if you&amp;rsquo;re interested in making something happen.&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;h2 id="demonstrable-reliability-of-our-infrastructure">
Demonstrable reliability of our infrastructure
&lt;a class="header-anchor" href="#demonstrable-reliability-of-our-infrastructure">#&lt;/a>
&lt;/h2>&lt;p>Hub management has many moving parts, and things can go wrong. We want tighter control over the reliability of our infrastructure and better visibility into the status of our community hubs for administrators and members. We will take steps to improve &lt;strong>alerting&lt;/strong>, &lt;strong>uptime&lt;/strong>, and &lt;strong>overall platform reliability&lt;/strong>, as well as review our internal &lt;strong>incident response practices&lt;/strong>. Our goal is to improve the reliability and responsiveness of our interactive computing hubs for both administrators and users throughout their community hub lifecycle.&lt;/p>
&lt;h2 id="user-level-costs-monitoring-and-group-level-usage-monitoring">
User-level cost monitoring and group-level usage monitoring
&lt;a class="header-anchor" href="#user-level-costs-monitoring-and-group-level-usage-monitoring">#&lt;/a>
&lt;/h2>&lt;p>We&amp;rsquo;ll build on our recent
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/aws-cost-attribution/" >Grafana dashboard usage monitoring&lt;/a> to add &lt;strong>usage monitoring for user groups&lt;/strong> and begin work on &lt;strong>user-level costs monitoring&lt;/strong>. These are steps toward group-level cost tracking, which will enable better management of team and departmental expenses, especially in large institutions.&lt;/p>
&lt;h2 id="improving-our-incident-response-capability">
Improving our Incident Response capability
&lt;a class="header-anchor" href="#improving-our-incident-response-capability">#&lt;/a>
&lt;/h2>&lt;p>As a special case of doubling down on infrastructure reliability, we&amp;rsquo;re making a concerted effort to improve our incident response processes. This will allow us to respond more reliably and transparently to issues as they arise, and to give our communities confidence about the steps we&amp;rsquo;re taking to resolve them. Our goal is to improve response time and quality, and connect this with infrastructure reliability improvements.&lt;/p>
&lt;h2 id="piloting-a-feature-co-funding-model">
Piloting a feature co-funding model
&lt;a class="header-anchor" href="#piloting-a-feature-co-funding-model">#&lt;/a>
&lt;/h2>&lt;p>This year, 2i2c has experimented with a &lt;strong>collaborative community funding model for platform development&lt;/strong>. This is a way to share funding across communities, and to invite community champions to co-fund projects on the 2i2c roadmap. Our goal is to accelerate upstream development with funded time while minimizing costs for each community. Over the next quarter, we’ll share more details on how the program works, how communities can participate, and what we&amp;rsquo;re learning along the way.&lt;/p>
&lt;p>&lt;strong>Early candidates for community funding:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Compute quotas&lt;/strong>: Building on our user and group management foundation, we aim to let admins assign compute quotas to users or groups, increasing cost control and easing departmental budgeting.&lt;/li>
&lt;li>&lt;strong>Canvas authentication&lt;/strong>: Integrating with Canvas will enable management of users and groups within Canvas, while using the same data for usage and cost tracking.&lt;/li>
&lt;li>&lt;strong>nbgitpuller UX improvements&lt;/strong>: Enhance nbgitpuller for sharing interactive projects via a simple link, focusing on error handling, conflict resolution, and per-user authentication for content retrieval.&lt;/li>
&lt;/ul>
&lt;p>As we develop this initiative, we’ll need to learn how to credit community partners who co-sponsor work and align our roadmap with the interests of the upstream communities we support.&lt;/p>
&lt;p>If you have suggestions for improving this process, or if any features interest your community,
&lt;a href="https://forms.fillout.com/t/uQHVMkgvsuus" target="_blank" rel="noopener" >reach out to us&lt;/a> to discuss joining other 2i2c communities in funding and accelerating their development.&lt;/p>
&lt;h2 id="another-update-coming-in-q4">
Another update coming in Q4
&lt;a class="header-anchor" href="#another-update-coming-in-q4">#&lt;/a>
&lt;/h2>&lt;p>Each quarter, we’ll share an update like this to outline our product priorities and track progress. When planning for Q4, we’ll review what we’ve accomplished and provide a community update. Stay tuned!&lt;/p>
&lt;p>Meanwhile,
&lt;a href="https://docs.google.com/forms/u/1/d/1PXGM9_j0nyFLR1FEs7fz9x1vUuhs-J4IvoWGy5r-2aE/edit?fromCopy=true&amp;amp;ct=2" target="_blank" rel="noopener" >let us know what you think&lt;/a> about our direction. Your feedback helps us provide the best value to our communities. Thank you!&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Our strategic and organization-level work is supported by a grant from
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/navigation/" >The Navigation Fund&lt;/a> and fees from
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/members/" >our member organizations&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Announcing `jupyterhub-groups-exporter`: monitor usage based on JupyterHub group membership with Prometheus and Grafana</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-groups-exporter/</link><pubDate>Wed, 11 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-groups-exporter/</guid><description>&lt;p>Managing user groups in JupyterHub can be a challenging task, especially in environments with dynamic user bases and complex group structures. This post describes how we can leverage the latest group management features in JupyterHub, along with Prometheus and Grafana, to monitor group-level resource usage effectively.&lt;/p>
&lt;blockquote>
&lt;p>⭐ &lt;strong>Members of 2i2c&amp;rsquo;s community network&lt;/strong> can use this feature in their hubs by
&lt;a href="https://docs.2i2c.org/admin/monitoring/cost-users" target="_blank" rel="noopener" >following our cost attribution documentation&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./featured.png" alt="Grafana User Group Diagnostics Dashboard showing a memory usage over time with each line aggregating usage over a different user group." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="motivation">
Motivation
&lt;a class="header-anchor" href="#motivation">#&lt;/a>
&lt;/h2>&lt;p>Hub admins have good reason to monitor usage and costs by user group:
data on group-level usage lets them advocate for better funding and cost-recovery models. Group-level resource monitoring can help them answer questions like:&lt;/p>
&lt;ul>
&lt;li>How many people participated in our workshop group?&lt;/li>
&lt;li>How much GPU compute is our power user group using?&lt;/li>
&lt;li>Is our resource usage cost-effective for X group persona or Y group persona?&lt;/li>
&lt;/ul>
&lt;p>Current methods and workarounds include:&lt;/p>
&lt;ul>
&lt;li>ring-fencing resources for specific user group personas, e.g. creating a separate hub for a workshop group or a separate Dask cluster for a power user group, which increases the admin burden of managing multiple hub instances&lt;/li>
&lt;li>writing custom scripts to aggregate existing per-user metrics into groups, which can be time-consuming and error-prone&lt;/li>
&lt;/ul>
&lt;h2 id="jupyterhub-and-user-groups">
JupyterHub and user groups
&lt;a class="header-anchor" href="#jupyterhub-and-user-groups">#&lt;/a>
&lt;/h2>&lt;p>Recent upstream developments in JupyterHub group management, such as
&lt;a href="https://jupyterhub.readthedocs.io/en/latest/reference/authenticators.html#authenticator-managed-group-membership" target="_blank" rel="noopener" >Authenticator-managed group membership&lt;/a>, make this work especially timely. For technical details of these upstream contributions, see the GitHub PRs
&lt;a href="https://github.com/jupyterhub/oauthenticator/pull/735" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> jupyterhub/oauthenticator#735&lt;/a> and
&lt;a href="https://github.com/jupyterhub/oauthenticator/pull/498" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> jupyterhub/oauthenticator#498&lt;/a>.&lt;/p>
&lt;p>Users can access JupyterHub using a variety of authentication methods. Authentication providers like GitHub have built-in user management features that allow admins to create and manage user groups. These groups can then be configured in JupyterHub to authorize access to the hub, as well as control access to certain hardware profiles.&lt;/p>
&lt;p>Following the key upstream contributions above, we can leverage
&lt;a href="https://jupyterhub.readthedocs.io/en/stable/reference/authenticators.html#authenticator-managed-group-membership" target="_blank" rel="noopener" >Authenticator-managed group membership&lt;/a> to automatically pass user group memberships from the authentication layer to JupyterHub itself. This allows us to capitalize on JupyterHub&amp;rsquo;s REST API to retrieve user group memberships from other
&lt;a href="https://jupyterhub.readthedocs.io/en/latest/reference/services.html" target="_blank" rel="noopener" >services&lt;/a>, such as exporting them as Prometheus metrics.&lt;/p>
&lt;h2 id="exporting-user-group-memberships-to-prometheus">
Exporting user group memberships to Prometheus
&lt;a class="header-anchor" href="#exporting-user-group-memberships-to-prometheus">#&lt;/a>
&lt;/h2>&lt;p>The
&lt;a href="https://github.com/2i2c-org/jupyterhub-groups-exporter" target="_blank" rel="noopener" >&lt;code>jupyterhub-groups-exporter&lt;/code>&lt;/a> project provides a
&lt;a href="https://jupyterhub.readthedocs.io/en/latest/reference/services.html" target="_blank" rel="noopener" >service&lt;/a> that integrates with JupyterHub to export user group memberships as Prometheus metrics. This component is readily deployable as part of any JupyterHub instance, such as a standalone deployment or a Zero to JupyterHub deployment on Kubernetes.&lt;/p>
&lt;p>The exporter provides a
&lt;a href="https://prometheus.io/docs/concepts/metric_types/" target="_blank" rel="noopener" >Gauge metric&lt;/a> called &lt;code>jupyterhub_user_group_info&lt;/code>, which contain the following labels:&lt;/p>
&lt;ul>
&lt;li>&lt;code>namespace&lt;/code> – the Kubernetes namespace where the JupyterHub is deployed&lt;/li>
&lt;li>&lt;code>usergroup&lt;/code> – the name of the user group&lt;/li>
&lt;li>&lt;code>username&lt;/code> – the unescaped username of the user&lt;/li>
&lt;li>&lt;code>username_escape&lt;/code> – the escaped username&lt;/li>
&lt;/ul>
&lt;p>Escaped usernames are useful because Kubernetes restricts which characters are valid in pod label values (this limit does not apply to pod annotations). Storing both forms lets us match escaped usernames to their more human-readable unescaped counterparts.&lt;/p>
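&lt;p>To make the shape of this metric concrete, the Python sketch below (standard library only, and not the exporter&amp;rsquo;s actual implementation) renders memberships as &lt;code>jupyterhub_user_group_info&lt;/code> samples in the Prometheus text exposition format, assuming a constant value of 1 per (group, user) pair, as is common for info-style gauges. The &lt;code>escape_username&lt;/code> helper is hypothetical: it follows an escapism-style scheme (keep lowercase letters and digits, hex-encode every other UTF-8 byte behind a &lt;code>-&lt;/code>), and the escaping applied by your spawner may differ.&lt;/p>
&lt;pre>&lt;code class="language-python">import string

SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def escape_username(username):
    """Hypothetical escaping: keep [a-z0-9], encode every other UTF-8 byte as -xx."""
    escaped = []
    for char in username:
        if char in SAFE_CHARS:
            escaped.append(char)
        else:
            escaped.extend(f"-{byte:02x}" for byte in char.encode("utf-8"))
    return "".join(escaped)


def label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_group_info(namespace, memberships):
    """Render (usergroup, username) pairs as jupyterhub_user_group_info samples."""
    lines = [
        "# HELP jupyterhub_user_group_info JupyterHub user group memberships.",
        "# TYPE jupyterhub_user_group_info gauge",
    ]
    for usergroup, username in memberships:
        labels = {
            "namespace": namespace,
            "usergroup": usergroup,
            "username": username,
            "username_escape": escape_username(username),
        }
        rendered = ",".join(f'{key}="{label_value(val)}"' for key, val in labels.items())
        lines.append(f"jupyterhub_user_group_info{{{rendered}}} 1")
    return "\n".join(lines) + "\n"
&lt;/code>&lt;/pre>
&lt;p>For example, &lt;code>render_group_info("staging", [("staff", "user@example.com")])&lt;/code> yields a sample whose labels include &lt;code>username="user@example.com"&lt;/code> and &lt;code>username_escape="user-40example-2ecom"&lt;/code>.&lt;/p>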
&lt;p>Exposing this metric on an endpoint for Prometheus to scrape lets us join group data with a range of usage metrics to gain group-level insights. Here is an example PromQL query that retrieves memory usage by user group:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-promql" data-lang="promql">&lt;span class="line">&lt;span class="cl">&lt;span class="k">sum&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nv">container_memory_working_set_bytes&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nl">name&lt;/span>&lt;span class="o">!=&lt;/span>&lt;span class="p">&amp;#34;&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nl">pod&lt;/span>&lt;span class="o">=~&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">jupyter-.*&lt;/span>&lt;span class="p">&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nl">namespace&lt;/span>&lt;span class="o">=~&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">$hub_name&lt;/span>&lt;span class="p">&amp;#34;}&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">on&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">namespace&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">pod&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">group_left&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">annotation_hub_jupyter_org_username&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">usergroup&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">group&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nv">kube_pod_annotations&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nl">namespace&lt;/span>&lt;span class="o">=~&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">$hub_name&lt;/span>&lt;span class="p">&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nl">annotation_hub_jupyter_org_username&lt;/span>&lt;span class="o">=~&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">.*&lt;/span>&lt;span class="p">&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nl">pod&lt;/span>&lt;span class="o">=~&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">jupyter-.*&lt;/span>&lt;span class="p">&amp;#34;}&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">by&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">pod&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">namespace&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">annotation_hub_jupyter_org_username&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">on&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">namespace&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">annotation_hub_jupyter_org_username&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">group_left&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">usergroup&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="k">group&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="kr">label_replace&lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">jupyterhub_user_group_info&lt;/span>&lt;span class="p">{&lt;/span>&lt;span class="nl">namespace&lt;/span>&lt;span class="o">=~&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">$hub_name&lt;/span>&lt;span class="p">&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nl">username&lt;/span>&lt;span class="o">=~&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">.*&lt;/span>&lt;span class="p">&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nl">usergroup&lt;/span>&lt;span class="o">=~&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">$user_group&lt;/span>&lt;span class="p">&amp;#34;},&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">annotation_hub_jupyter_org_username&lt;/span>&lt;span class="p">&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">$1&lt;/span>&lt;span class="p">&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">username&lt;/span>&lt;span class="p">&amp;#34;,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="s">(.+)&lt;/span>&lt;span class="p">&amp;#34;&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">by&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">annotation_hub_jupyter_org_username&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">usergroup&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">namespace&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">by&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">(&lt;/span>&lt;span class="nv">usergroup&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nv">namespace&lt;/span>&lt;span class="o">)&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;h2 id="visualizing-user-group-resource-usage-with-grafana">
Visualizing user group resource usage with Grafana
&lt;a class="header-anchor" href="#visualizing-user-group-resource-usage-with-grafana">#&lt;/a>
&lt;/h2>&lt;p>The PromQL query above is long and fiddly to construct by hand! Fortunately, you can benefit from an
&lt;a href="https://github.com/jupyterhub/grafana-dashboards/pull/149" target="_blank" rel="noopener" >upstream contribution&lt;/a> to the
&lt;a href="https://github.com/jupyterhub/grafana-dashboards" target="_blank" rel="noopener" >jupyterhub/grafana-dashboards&lt;/a> project, where we have encapsulated these PromQL queries in Jsonnet code that generates Grafana dashboard visualizations with the
&lt;a href="https://grafana.github.io/grafonnet/index.html" target="_blank" rel="noopener" >Grafonnet&lt;/a> library. If you have a Kubernetes cluster running JupyterHub, try deploying these Grafana dashboards and let us know what you think!&lt;/p>
&lt;p>The PromQL query above is visualized in the &lt;strong>Memory Usage&lt;/strong> panel of the &lt;strong>User Groups Diagnostics&lt;/strong> Grafana dashboard (see also the corresponding screenshot at the top of this post). This dashboard mirrors its counterpart, the &lt;strong>User Diagnostics&lt;/strong> dashboard, but visualizes resource usage per group rather than per user &amp;#x1f389;&lt;/p>
&lt;h2 id="future-work">
Future work
&lt;a class="header-anchor" href="#future-work">#&lt;/a>
&lt;/h2>&lt;p>We have laid the foundation for joining user group data with usage metrics in Prometheus by exporting memberships from JupyterHub&amp;rsquo;s database. This opens up new ways to extend observability systems to group-level reporting and monitoring.&lt;/p>
&lt;p>Future directions for this work include:&lt;/p>
&lt;ul>
&lt;li>visualizing cloud cost by user group in Grafana&lt;/li>
&lt;li>developing more group-level reporting and monitoring dashboards&lt;/li>
&lt;li>introducing group-level resource quotas&lt;/li>
&lt;/ul>
&lt;p>What do you think? How would you like to see JupyterHub&amp;rsquo;s group management features evolve? Have you tried deploying this yourself?
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSff-u-sWFuwO1-VTgk2Ir7f1nfUUlLevQk_Vkk_jnmcI1nJnw/viewform?usp=header" target="_blank" rel="noopener" >We welcome your feedback&lt;/a> and feel free to open GitHub issues or make contributions to the repositories mentioned in this post.&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;p>Thanks to the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/jupyterhub/" >JupyterHub project&lt;/a> for their collaboration and review of this work.&lt;/p></description></item><item><title>Solving classes of problems, rather than just an instance of a problem (with an example)</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/automating-support-upgrades/</link><pubDate>Mon, 09 Jun 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/automating-support-upgrades/</guid><description>
&lt;h2 id="the-problem">
The Problem
&lt;a class="header-anchor" href="#the-problem">#&lt;/a>
&lt;/h2>&lt;p>Two of the communities we serve,
&lt;a href="https://nmfs-openscapes.github.io/" target="_blank" rel="noopener" >NMFS Openscapes&lt;/a> and
&lt;a href="https://book.cryointhecloud.com" target="_blank" rel="noopener" >CryoCloud&lt;/a>, reported issues with starting GPU nodes on their hubs. Upon investigation, I discovered that the
&lt;a href="https://github.com/kubernetes/autoscaler" target="_blank" rel="noopener" >cluster autoscaler&lt;/a> seemed to have suddenly stopped recognizing that GPUs were available in the cluster at all, and hence wasn&amp;rsquo;t provisioning the nodes. Restarting the cluster-autoscaler pod fixed the issue for both communities.&lt;/p>
&lt;h2 id="an-incomplete-solution">
An incomplete solution
&lt;a class="header-anchor" href="#an-incomplete-solution">#&lt;/a>
&lt;/h2>&lt;p>But is that the end of the story? Not if we want to provide reliable long-term infrastructure to communities with minimal
&lt;a href="https://sre.google/sre-book/eliminating-toil/" target="_blank" rel="noopener" >toil&lt;/a> on the part of 2i2c engineers!&lt;/p>
&lt;p>One of the engineering principles I&amp;rsquo;m trying to have us embody more intentionally and structurally is the idea that we fix &lt;strong>whole classes of problems, rather than just an individual instance of the problem&lt;/strong>. Fixing the immediate issue is &lt;em>not enough&lt;/em>: we need to understand what &lt;strong>class of issues&lt;/strong> was manifesting itself in this particular fashion, and fix &lt;em>that&lt;/em>.&lt;/p>
&lt;h2 id="what-was-the-class-of-issues-we-could-fix-here">
What was the &lt;strong>class of issues&lt;/strong> we could fix here?
&lt;a class="header-anchor" href="#what-was-the-class-of-issues-we-could-fix-here">#&lt;/a>
&lt;/h2>&lt;p>Digging in, I realized that our version of cluster-autoscaler was a little behind the latest release. I &lt;em>presumed&lt;/em> this was a bug in cluster-autoscaler (given that a restart fixed it, which suggests a bug involving internal state). To me, the &lt;em>class of problem&lt;/em> here is that we were not rolling out releases to our &amp;ldquo;supporting infrastructure&amp;rdquo; fast enough. Perhaps if we had been on the most recent cluster-autoscaler release, this issue would never have happened.&lt;/p>
&lt;p>Additionally, this failure to scale up was reported to us by the community rather than by an automated alert. We should change that too!&lt;/p>
&lt;h2 id="structured-solutions">
Structured solutions
&lt;a class="header-anchor" href="#structured-solutions">#&lt;/a>
&lt;/h2>&lt;p>We follow a two-week sprint cycle, and I love the (hard-won) structure it provides us. I don&amp;rsquo;t want to arbitrarily start work that disrupts what we have already committed to within that structure. However, we also take support requests seriously and try to work them into the sprint. So I timeboxed myself to one hour and saw what I could accomplish. It turns out, a lot!&lt;/p>
&lt;ol>
&lt;li>I
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/6183" target="_blank" rel="noopener" >upgraded all our support components&lt;/a>, tested them, and rolled them out to &lt;em>all&lt;/em> our communities! This included upgrading Grafana, Prometheus, nginx-ingress as well as the cluster-autoscaler. This also restarts the cluster-autoscaler across our clusters, fixing this issue for other communities (if any had it).&lt;/li>
&lt;li>I
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/6182" target="_blank" rel="noopener" >re-enabled&lt;/a> the automatic once a month PR for upgrading these support tasks. We had switched to doing them on a manual sprint cadence, but clearly that was not fast enough nor automated enough. We will instead work these into the sprint once the bot opens the PR. Credit to
&lt;a href="https://github.com/consideratio" target="_blank" rel="noopener" >Erik Sundell&lt;/a> for initially setting this up&lt;/li>
&lt;li>Created
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/6185" target="_blank" rel="noopener" >an issue&lt;/a> to track creating an alert for this kind of failure, and put it in our sprint backlog.&lt;/li>
&lt;li>(In an additional fifteen-minute timebox) Wrote this blog post, to communicate what we have done to both the affected communities and others.&lt;/li>
&lt;/ol>
&lt;p>By timeboxing myself, I didn&amp;rsquo;t upset our sprint cadence and was able to continue doing other work I had committed to in the sprint, while also fixing this &lt;em>class of issues&lt;/em> to the best of my ability.&lt;/p>
&lt;h2 id="moving-forward">
Moving forward
&lt;a class="header-anchor" href="#moving-forward">#&lt;/a>
&lt;/h2>&lt;p>While as a team we have been &lt;em>implicitly&lt;/em> trying to solve whole classes of issues, rather than individual instances of an issue, for a while, I want us to do it &lt;em>explicitly&lt;/em> from now on. Communicating this out to our communities is an important part of that, as is internal team training. This blog post is the former, and we are continually working on the latter :)&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Thanks to the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/openscapes/" >OpenScapes&lt;/a> and
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/cryocloud/" >CryoCloud&lt;/a> communities for working with us closely on infrastructure to identify improvements like this.&lt;/li>
&lt;/ul></description></item><item><title>Launching Jupyter Book for 2i2c Communities</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/jb-for-communities/</link><pubDate>Thu, 08 May 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/jb-for-communities/</guid><description>&lt;p>We&amp;rsquo;re excited to announce out-of-the-box support for
&lt;a href="https://next.jupyterbook.org" target="_blank" rel="noopener" >Jupyter Book 2&lt;/a> for our community members. This allows communities to create and share knowledge bases together for their community workflows. This post describes the motivation behind this new functionality, and how you can learn more about the project.&lt;/p>
&lt;blockquote>
&lt;p>⭐ &lt;strong>Members of 2i2c&amp;rsquo;s community network&lt;/strong> can use this feature in their hubs by following
&lt;a href="https://docs.2i2c.org/user/sharing/documentation" target="_blank" rel="noopener" >our documentation and sharing guide&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;p>A core component of our mission to make research and education more &lt;em>impactful&lt;/em>, &lt;em>accessible&lt;/em>, and &lt;em>delightful&lt;/em> is leveraging our unique
&lt;a href="https://2i2c.org/communities/" target="_blank" rel="noopener" >global network of communities&lt;/a> to make meaningful improvements to the open-source tools that power their work. Learning from one community can then provide value to our entire network, e.g.,
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/pace-hackweek/" >our work with PACE on speeding up their CNN model training&lt;/a>.&lt;/p>
&lt;p>Central to our communities&amp;rsquo; work is the importance of sharing new findings, best practices, and community resources. Across our network, we have seen communities creating their own &amp;ldquo;books&amp;rdquo; that provide a home for this kind of content. Many of these books feature the concept of a &amp;ldquo;landing page&amp;rdquo; that welcomes new members, establishes an identity, and provides jumping-off points (or &amp;ldquo;calls to action&amp;rdquo;) to more detailed resources.&lt;/p>
&lt;p>Until now, each community has been required to undertake this work independently. 2i2c believes that by
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/community-ownership/" >building upon existing open-source tools&lt;/a> like
&lt;a href="https://next.jupyterbook.org" target="_blank" rel="noopener" >Jupyter Book 2&lt;/a>, we can help communities focus on the &lt;em>content&lt;/em> of their home, rather than spending time worrying about its &lt;em>appearance&lt;/em>. To that end, we have been working on
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/5045" target="_blank" rel="noopener" >an initiative&lt;/a> to allow communities to rapidly build interactive starter documentation and provide users with a rich, interactive, and informative onboarding experience. Through this initiative, we have:&lt;/p>
&lt;ul>
&lt;li>Improved the user experience of launching into interactive compute environments from a Jupyter Book.&lt;/li>
&lt;li>Built components into the Jupyter Book &amp;ldquo;book theme&amp;rdquo; for low-density landing page content like call-to-action blocks.&lt;/li>
&lt;li>Extended our service to co-locate community documentation alongside community hubs (i.e., &lt;code>docs.hub.2i2c.cloud&lt;/code>).&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./landing-page.png" alt="Screenshot of the 2i2c Showcase Hub landing page" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
(A screenshot of the 2i2c
&lt;a href="https://docs.showcase.2i2c.cloud/" target="_blank" rel="noopener" >Showcase Hub&lt;/a> landing page, featuring a simple banner image and call-to-action.)&lt;/p>
&lt;p>To take advantage of this feature, communities can use the
&lt;a href="https://github.com/2i2c-org/community-docs-template" target="_blank" rel="noopener" >&lt;code>2i2c-org/community-docs-template&lt;/code>&lt;/a> to deploy a Jupyter Book site to GitHub Pages. This template demonstrates simple usage of Jupyter Book 2 for computational content and landing page creation, and establishes the necessary CD workflows for web publication. Meanwhile, 2i2c can update our domain name management to point the &lt;code>docs.hub.2i2c.cloud&lt;/code> nested subdomain to the newly deployed documentation.&lt;/p>
&lt;p>For more information, see
&lt;a href="https://docs.2i2c.org/user/sharing/documentation" target="_blank" rel="noopener" >our community documentation for deploying Jupyter Books&lt;/a>.&lt;/p>
&lt;p>Developing these new capabilities taught us a lot about what makes building &amp;ldquo;good&amp;rdquo; community documentation so difficult. A wide range of bespoke website-building tools and integration quirks previously made it challenging for communities both to keep documentation current with internal changes and to keep up with necessary software updates. We also learned that by trading bespoke complexity for simplicity and readability, we could build a solution that scales to multiple communities and consequently carries a lower maintenance burden.&lt;/p>
&lt;p>With these improvements, we have initiated a conversation about what a more unified &amp;ldquo;look and feel&amp;rdquo; for our network might entail, and how it might benefit our communities. Much more can be done to build on this first step, and we are eager to gather feedback on how to improve these features for users.&lt;/p>
&lt;p>To learn more about this work, consider exploring a minimal example on
&lt;a href="https://docs.showcase.2i2c.cloud/" target="_blank" rel="noopener" >our Showcase Hub&lt;/a>, and check out
&lt;a href="https://docs.2i2c.org/user/sharing/documentation" target="_blank" rel="noopener" >our service guide&lt;/a>.
&lt;a href="https://docs.google.com/forms/d/e/1FAIpQLSff-u-sWFuwO1-VTgk2Ir7f1nfUUlLevQk_Vkk_jnmcI1nJnw/viewform" target="_blank" rel="noopener" >Let us know&lt;/a> what you think!&lt;/p></description></item><item><title>Offering Jetstream2-powered hub support at 2i2c</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/jetstream2-persistent-hub/</link><pubDate>Mon, 28 Apr 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/jetstream2-persistent-hub/</guid><description>&lt;p>When we first committed to offer
&lt;a href="https://jetstream-cloud.org/index.html" target="_blank" rel="noopener" >Jetstream2&lt;/a> support at 2i2c, Jetstream2,
&lt;a href="https://docs.openstack.org/magnum/latest/" target="_blank" rel="noopener" >Magnum&lt;/a>,
&lt;a href="https://www.openstack.org/" target="_blank" rel="noopener" >OpenStack&lt;/a>,
&lt;a href="https://cluster-api.sigs.k8s.io/" target="_blank" rel="noopener" >ClusterAPI&lt;/a> were all new concepts that we hadn&amp;rsquo;t used at 2i2c before.
And although the initial exercise of reading about each of them independently was confusing, learning how they actually glued together was the key.
This post is about Jetstream2, 2i2c persistent hub offerings, and the learning that took place in the process.&lt;/p>
&lt;blockquote>
&lt;p>⭐ &lt;strong>Members of 2i2c&amp;rsquo;s community network&lt;/strong> can determine their eligibility and learn about Jetstream2 in
&lt;a href="https://docs.2i2c.org/community-lead/about/cloud-providers#jetstream2" target="_blank" rel="noopener" >our supported cloud providers documentation&lt;/a>. If needed,
&lt;a href="https://docs.2i2c.org/support" target="_blank" rel="noopener" >reach out to 2i2c for support&lt;/a>.&lt;/p>
&lt;/blockquote>
&lt;h2 id="context">
Context
&lt;a class="header-anchor" href="#context">#&lt;/a>
&lt;/h2>&lt;p>At 2i2c, we want to be able to deploy k8s clusters on different cloud providers. Simplified, this is what we use:&lt;/p>
&lt;ul>
&lt;li>&lt;code>Infrastructure as code&lt;/code> to describe, deploy and manage the actual physical infrastructure from the cloud providers&lt;/li>
&lt;li>Cloud-specific CLIs to authenticate to this infrastructure&lt;/li>
&lt;li>
&lt;a href="https://helm.sh/" target="_blank" rel="noopener" >&lt;code>Helm&lt;/code>&lt;/a> to deploy and manage k8s resources onto this infrastructure&lt;/li>
&lt;li>And finally
&lt;a href="https://kubernetes.io/docs/reference/kubectl/" target="_blank" rel="noopener" >&lt;code>kubectl&lt;/code>&lt;/a> to interact with all of these k8s resources&lt;/li>
&lt;/ul>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./2i2c-generic-infra.png" alt="image" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
(Main tools used at 2i2c to deploy and manage k8s clusters on different cloud providers)&lt;/p>
&lt;p>On cloud providers like GCP, AWS, and Azure, Kubernetes support feels like an atomic feature of the cloud provider and works out of the box. On Jetstream2, however, k8s support is not a single built-in feature; it is assembled from several layers.&lt;/p>
&lt;h2 id="jetstream2-kubernetes-support-stack">
Jetstream2 Kubernetes support stack
&lt;a class="header-anchor" href="#jetstream2-kubernetes-support-stack">#&lt;/a>
&lt;/h2>&lt;p>Jetstream2 is a collection of supercomputers that are part of the
&lt;a href="https://access-ci.org/" target="_blank" rel="noopener" >ACCESS cyberinfrastructure&lt;/a>. This ACCESS infrastructure groups together super computers like Jetstream2 (but not limited to it), into a mesh that creates the impression of a single, virtual system that scientists can openly access and interactively use.&lt;/p>
&lt;p>It offers Infrastructure as a Service (IaaS), that allows users to deploy VMs and manage environments dynamically. And the piece that enables this Infrastructure as a Service feature is OpenStack.&lt;/p>
&lt;h3 id="openstack-and-magnum">
OpenStack and Magnum
&lt;a class="header-anchor" href="#openstack-and-magnum">#&lt;/a>
&lt;/h3>&lt;p>OpenStack is an open source platform made of multiple projects that help build and manage both private and public cloud infrastructure.&lt;/p>
&lt;p>For our use case, one of the most relevant OpenStack sub-projects is Magnum. Magnum provides container orchestration engines, such as Kubernetes (but not limited to it), for deploying and managing containers.&lt;/p>
&lt;p>Initially, Kubernetes support was provided through a project called
&lt;a href="https://wiki.openstack.org/wiki/Heat" target="_blank" rel="noopener" >Heat&lt;/a>. However, that approach proved hard to manage and maintain, and upgrading a cluster was extremely difficult. So Jetstream2 migrated to a new driver, the
&lt;a href="https://docs.openstack.org/magnum-capi-helm/latest/user_docs/index.html" target="_blank" rel="noopener" >Cluster API Magnum driver&lt;/a>, which offers a more native k8s integration.&lt;/p>
&lt;h3 id="cluster-api-and-capi-helm-driver">
Cluster API and CAPI helm driver
&lt;a class="header-anchor" href="#cluster-api-and-capi-helm-driver">#&lt;/a>
&lt;/h3>&lt;p>Cluster API (CAPI) itself is a Kubernetes project for declaratively creating and managing k8s clusters.&lt;/p>
&lt;p>The Helm driver, on the other hand, acts as a bridge between OpenStack’s Magnum and Kubernetes’ Cluster API (CAPI). Its main goal is to manage the lifecycle (create, scale, upgrade, destroy) of Kubernetes-conformant clusters using a declarative API.&lt;/p>
&lt;p>To do this, Cluster API provides an API for managing the various components of a Kubernetes cluster. Conceptually, this looks like a Kubernetes cluster managing other Kubernetes clusters: the former, named the ‘CAPI management cluster’, provides the API for managing the latter, the workload clusters.&lt;/p>
&lt;h3 id="decomposing-the-previous-atomic-feature">
Decomposing the previous atomic feature
&lt;a class="header-anchor" href="#decomposing-the-previous-atomic-feature">#&lt;/a>
&lt;/h3>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./Jetstream2-and-tent.png" alt="image" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
(Comparison between Jetstream2 and other cloud providers when it comes to k8s support)&lt;/p>
&lt;p>Magnum is part of the OpenStack tent and it’s the first layer on top of Jetstream2 towards achieving k8s support.&lt;/p>
&lt;p>The CAPI Helm driver provides the CAPI support. It is the last piece needed to link a k8s cluster down to the Jetstream2 hardware it is deployed on.&lt;/p>
&lt;h2 id="challenges">
Challenges
&lt;a class="header-anchor" href="#challenges">#&lt;/a>
&lt;/h2>&lt;p>The Jetstream2-OpenStack stack is not a simple one. It’s a complex stack of technologies, and each of the connection points can be challenging to debug and fix when something doesn&amp;rsquo;t work, especially when you are among the first to pilot this new Magnum driver setup.&lt;/p>
&lt;p>So we expected to face some issues along the way. We were able to work around them, though, and add Jetstream2 to our service menu. Below are some of the issues we faced:&lt;/p>
&lt;ol>
&lt;li>We have to create Terraform resources in sequence, which takes longer, because a race condition causes concurrent nodegroup creation requests to fail&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>
&lt;a href="https://bugs.launchpad.net/magnum/&amp;#43;bug/2097946" target="_blank" rel="noopener" >bugs.launchpad.net/magnum/+bug/2097946&lt;/a>&lt;/li>
&lt;/ul>
&lt;ol start="2">
&lt;li>The role and labels of the nodegroups don&amp;rsquo;t get propagated to the actual nodes, so we cannot apply our own labels to the nodes through their nodegroups&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>
&lt;a href="https://github.com/azimuth-cloud/capi-helm-charts/issues/84" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> azimuth-cloud/capi-helm-charts#84&lt;/a>&lt;/li>
&lt;/ul>
&lt;ol start="3">
&lt;li>The node count and min node count cannot be set to 0, so each nodegroup has to have at least 1 node&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>
&lt;a href="https://bugs.launchpad.net/magnum/&amp;#43;bug/2098002" target="_blank" rel="noopener" >bugs.launchpad.net/magnum/+bug/2098002&lt;/a>&lt;/li>
&lt;/ul>
&lt;ol start="4">
&lt;li>A default-worker nodegroup is created in addition to the default-control plane nodegroup, and we cannot delete it due to the same issue as in 2.&lt;/li>
&lt;li>The latest CAPI Helm chart version causes autoscaling to stop working in a persistent hub setup, so we had to downgrade to a previous version&lt;/li>
&lt;/ol>
&lt;ul>
&lt;li>
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/5601" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/infrastructure#5601&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="conclusion">
Conclusion
&lt;a class="header-anchor" href="#conclusion">#&lt;/a>
&lt;/h2>&lt;p>The biggest plus is the people. We got support from
&lt;a href="https://github.com/julianpistorius" target="_blank" rel="noopener" >Julian Pistorius&lt;/a>, who helped us a lot to both fix and validate some of the behaviours we were experiencing. Going through the
&lt;a href="https://jetstream-cloud.org/contact/index.html" target="_blank" rel="noopener" >Jetstream2 support process&lt;/a> was also a pleasant experience: the team answered very promptly and was very kind.&lt;/p>
&lt;p>Jetstream2 has a big advantage over other cloud providers: its openness through the ACCESS program. This is very handy for researchers and less costly than other cloud providers. Being able to offer hubs through the ACCESS program lets 2i2c make hubs accessible to more researchers, more cost-efficiently.&lt;/p>
&lt;p>The higher complexity also comes with more control over the infrastructure, which has its own advantages.&lt;/p>
&lt;p>Challenges aside, the experience was a good one and the outcome was positive: 2i2c is now able to deploy both mybinder.org-like hubs and persistent storage hubs on Jetstream2 hardware, from the same cloud-agnostic infrastructure.&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;p>Thanks to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/pythia/" >Project Pythia&lt;/a> for funding and collaborating with us on this work.&lt;/p></description></item><item><title>Enforcing per-user storage quotas now available on GCP</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/per-user-storage-quota-gcp/</link><pubDate>Tue, 25 Feb 2025 14:18:04 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/per-user-storage-quota-gcp/</guid><description>&lt;p>Building upon our previous work developing
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/per-user-storage-quota/" >per-user storage quotas for our AWS infrastructure&lt;/a>, we are pleased to announce that this feature is now available for GCP-hosted hubs!&lt;/p>
&lt;p>To provide this feature on GCP, we updated our infrastructure provisioning system to create persistent disks and to enable automatic backups of those disks for disaster recovery. However, the systems we had already developed for AWS, such as
&lt;a href="https://github.com/2i2c-org/jupyterhub-home-nfs" target="_blank" rel="noopener" >&lt;code>jupyterhub-home-nfs&lt;/code>&lt;/a> and our alerting system through
&lt;a href="https://prometheus.io/docs/alerting/latest/alertmanager/" target="_blank" rel="noopener" >Prometheus Alertmanager&lt;/a>, are vendor agnostic and work right out of the box with the new architecture!&lt;/p>
&lt;p>If you would like to try this feature on your 2i2c-managed JupyterHub,
&lt;a href="https://docs.2i2c.org/support" target="_blank" rel="noopener" >please get in touch&lt;/a>.&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;p>This project was developed and deployed in collaboration with
&lt;a href="https://developmentseed.org/team/tarashish-mishra/" target="_blank" rel="noopener" >Tarashish Mishra&lt;/a> from
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/devseed/" >Development Seed&lt;/a>, funded through the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA project&lt;/a>.&lt;/p></description></item><item><title>Open infrastructure for collaborative geoscience with Project Pythia: Learning how to deploy a BinderHub on Jetstream2</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/jetstream-binderhub/</link><pubDate>Wed, 12 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/jetstream-binderhub/</guid><description>
&lt;h2 id="project-pythia-and-the-jupyter-notebook-obsolescence-problem">
Project Pythia and the &amp;ldquo;Jupyter notebook obsolescence&amp;rdquo; problem
&lt;a class="header-anchor" href="#project-pythia-and-the-jupyter-notebook-obsolescence-problem">#&lt;/a>
&lt;/h2>&lt;p>
&lt;a href="https://projectpythia.org/" target="_blank" rel="noopener" >Project Pythia&lt;/a> provides educational resources for essential software tools that enable open, reproducible and scalable geoscience, such as the
&lt;a href="https://pangeo.io" target="_blank" rel="noopener" >Pangeo&lt;/a> stack of packages (Xarray, Dask, Jupyter). Their &lt;em>Cookbooks&lt;/em> are crowdsourced, community-curated, and open-source collections of Jupyter notebooks that demonstrate how to use these tools for cloud-native, geoscientific workflows (see our
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/project-pythia-cookoff/" >Project Pythia Cookoff&lt;/a> blog post). However, &amp;ldquo;Jupyter notebook obsolescence&amp;rdquo; is a common problem: tutorials that were created a few years ago may no longer work due to changes in the software ecosystem, which hampers the reproducibility of scientific results. A reproducible execution environment and the infrastructure to support it are essential for the long-term sustainability of these educational resources.&lt;/p>
&lt;h2 id="leveraging-nsf-funded-cyberinfrastructure-for-binderhub">
Leveraging NSF-funded cyberinfrastructure for BinderHub
&lt;a class="header-anchor" href="#leveraging-nsf-funded-cyberinfrastructure-for-binderhub">#&lt;/a>
&lt;/h2>&lt;p>A
&lt;a href="https://binderhub.readthedocs.io/en/latest/" target="_blank" rel="noopener" >BinderHub&lt;/a> allows users to dynamically create custom computing environments from
&lt;a href="https://mybinder.readthedocs.io/en/latest/introduction.html#what-is-a-binder" target="_blank" rel="noopener" >Binder-ready&lt;/a> repositories containing computational notebooks and configuration files that describe the software environment required to run them. A public Binder service exists at
&lt;a href="https://mybinder.org/" target="_blank" rel="noopener" >mybinder.org&lt;/a> (see our blog post about
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/binder-singlenode/" >joining the mybinder federation&lt;/a> 🎉) and is a successful example of how open cloud infrastructure can accommodate reproducible execution environments.&lt;/p>
&lt;p>The resources available on such a public service are limited, so 2i2c and Project Pythia have been exploring how to deploy a BinderHub backed by larger resources from the NSF-funded cloud computing platform
&lt;a href="https://jetstream-cloud.org/" target="_blank" rel="noopener" >Jetstream2&lt;/a>. This allows for larger simultaneous user loads, such as at workshops, as well as access to more powerful distributed and parallelized workflows required to process large geoscientific datasets, under a persistent resource allocation.&lt;/p>
&lt;h2 id="learning-how-to-deploy-on-openstack">
Learning how to deploy on OpenStack
&lt;a class="header-anchor" href="#learning-how-to-deploy-on-openstack">#&lt;/a>
&lt;/h2>&lt;p>Jetstream2 uses
&lt;a href="https://www.openstack.org" target="_blank" rel="noopener" >OpenStack&lt;/a> to manage pools of compute, storage and networking resources; for our purposes, we specifically use the OpenStack
&lt;a href="https://docs.openstack.org/magnum/latest/" target="_blank" rel="noopener" >Magnum&lt;/a>
&lt;a href="https://specs.openstack.org/openstack//magnum-specs/specs/bobcat/clusterapi-driver.html" target="_blank" rel="noopener" >Cluster API driver&lt;/a> to manage Kubernetes for our deployment.&lt;/p>
&lt;p>Cluster API needs a &lt;code>CAPI management cluster&lt;/code> in order to manage other Kubernetes clusters, called workload clusters. On Jetstream2, this management cluster is created and operated by the Jetstream2 team, so the only task left for us is creating and configuring the workload cluster.&lt;/p>
&lt;p>For the workload cluster we used the
&lt;a href="https://registry.terraform.io/providers/terraform-provider-openstack/openstack/latest/docs" target="_blank" rel="noopener" >OpenStack Terraform provider&lt;/a> to define the cluster template, the cluster itself and the node groups in a reproducible way.&lt;/p>
&lt;p>Once the cluster infrastructure was created on Jetstream2, deploying BinderHub was seamless because the 2i2c hub infrastructure is also cloud agnostic: it was no different from deploying on the other cloud providers we already supported.&lt;/p>
&lt;p>We also ran into some limitations of the OpenStack Magnum driver project. These were expected, since it is a relatively recent project that is still gaining adoption; we reported all of them upstream and hope they will be fixed soon.&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>
&lt;a href="https://jetstream-cloud.org/" target="_blank" rel="noopener" >Jetstream2&lt;/a>: Explore ACCESS allocation and Julian Pistorius for technical support&lt;/li>
&lt;li>Thanks to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/pythia/" >Project Pythia&lt;/a> for funding and collaborating with us on this work.&lt;/li>
&lt;li>
&lt;a href="https://www.zonca.dev/posts/2024-12-11-jetstream_kubernetes_magnum" target="_blank" rel="noopener" >Andrea Zonca&lt;/a> for preliminary work on Kubernetes deployments on Jetstream2&lt;/li>
&lt;/ul></description></item><item><title>Towards frictionless, portable, and sustainable reproducibility with Binder</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/frictionless-reproducibility/</link><pubDate>Mon, 10 Feb 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/frictionless-reproducibility/</guid><description>&lt;p>Last December I had an opportunity to discuss the current and future state of the open publishing ecosystem at a workshop hosted by
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/hhmi/" >HHMI&lt;/a>&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>. While 2i2c doesn&amp;rsquo;t primarily focus on &amp;ldquo;publishing&amp;rdquo; workflows, we do support communities on a journey that often &lt;em>leads to publishing&lt;/em>, and we make choices about technology in our
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/platform/" >community hub platform&lt;/a> that can support different kinds of publishing outcomes.&lt;/p>
&lt;p>After listening to folks across the open science and publishing ecosystem, I noticed a common challenge:&lt;/p>
&lt;ul>
&lt;li>Publishers care about &lt;strong>reproducibility&lt;/strong> of computational narratives and the &lt;strong>interactivity&lt;/strong> that computation can provide.&lt;/li>
&lt;li>But they &lt;strong>lack the capacity to manage computational infrastructure&lt;/strong> in a way that is flexible enough for all of their authors.&lt;/li>
&lt;/ul>
&lt;p>This post is a reflection on how ecosystems like Jupyter and managed community hubs could solve some of these challenges.&lt;/p>
&lt;h2 id="a-community-experiment-to-provide-reproducible-environments-for-published-pre-prints">
A community experiment to provide reproducible environments for published pre-prints
&lt;a class="header-anchor" href="#a-community-experiment-to-provide-reproducible-environments-for-published-pre-prints">#&lt;/a>
&lt;/h2>&lt;p>Many of 2i2c&amp;rsquo;s communities already care about reproducibility and sharing their computational narratives. That&amp;rsquo;s one reason that we&amp;rsquo;ve
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-binderhub-gesis/" >been improving reproducible environment sharing with Binder&lt;/a>,
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/project-pythia-cookoff/" >integrating Jupyter Book into our community cloud platform&lt;/a>, and
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/binder-singlenode/" >supporting the mybinder.org federation&lt;/a>.&lt;/p>
&lt;p>However, communities often want to &lt;strong>publish&lt;/strong> rather than just &lt;strong>share&lt;/strong>. Publishing is more structured, invites particular kinds of feedback, and requires more Quality Assurance. There&amp;rsquo;s a huge ecosystem of publishers and services that support formal publishing, and they ensure things like discoverability, long-term archivability, versioning, peer review, DOI referencing, etc.&lt;/p>
&lt;p>We recently piloted
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/hhmi-spyglass-mysql/" >running a BinderHub for Biorxiv publications with the Loren Frank lab and HHMI&lt;/a>, and found this to be a nice proof-of-concept. While the &amp;ldquo;published article&amp;rdquo; lives on
&lt;a href="https://www.biorxiv.org/" target="_blank" rel="noopener" >Biorxiv&lt;/a>, the computational infrastructure and environment are provided by a BinderHub (in this case managed by 2i2c, but anybody could manage a hub in this way).&lt;/p>
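&lt;p>The link a reader clicks in a workflow like this is just a URL pointing at a BinderHub. As a minimal sketch, assuming the repository lives on GitHub, a BinderHub launch link follows the &lt;code>/v2/gh/ORG/REPO/REF&lt;/code> path pattern used by mybinder.org, with an optional &lt;code>labpath&lt;/code> query parameter to open a specific file in JupyterLab. The helper &lt;code>binder_launch_url&lt;/code> and the example names are hypothetical, and the hub used in the pilot may have a different base URL.&lt;/p>

```python
from urllib.parse import quote, urlencode


def binder_launch_url(binderhub, org, repo, ref="HEAD", labpath=None):
    """Build a launch link for a GitHub repository on a BinderHub.

    binderhub: base URL of the hub, e.g. "https://mybinder.org".
    ref: branch, tag, or commit; slashes in branch names are percent-encoded.
    labpath: optional file to open in JupyterLab once the server starts.
    """
    base = binderhub.rstrip("/")
    # "gh" selects BinderHub's GitHub repository provider
    url = f"{base}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"
    if labpath:
        url += "?" + urlencode({"labpath": labpath})
    return url


print(binder_launch_url("https://mybinder.org", "org", "repo", "main", "analysis.ipynb"))
```

&lt;p>A link built this way pins the exact repository and ref, which is what lets a reader reproduce the environment the author used rather than whatever is current.&lt;/p>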
&lt;figure id="figure-the-biorxiv-binder-pilot-workflow-an-author-used-a-2i2c-managed-binderhub-to-generate-a-reproducible-environment-for-their-paper-they-included-a-link-to-this-environment-in-the-abstract-readers-could-click-this-link-and-be-taken-to-a-fully-interactive-environment-to-explore-the-ideas-in-the-paper-and-reproduce-its-computation">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="The Biorxiv Binder pilot workflow. An author used a 2i2c-managed BinderHub to generate a reproducible environment for their paper. They included a link to this environment in the abstract. Readers could click this link, and be taken to a fully interactive environment to explore the ideas in the paper and reproduce its computation." srcset="
/blog/frictionless-reproducibility/images/reproduce-biorxiv_hu4a264c61637efbf8d7f2942ea4842e16_81975_a42b91103eedf29a9950ce3ecc9b2396.webp 400w,
/blog/frictionless-reproducibility/images/reproduce-biorxiv_hu4a264c61637efbf8d7f2942ea4842e16_81975_846aca88b6b4c1f715e97860fd4515d0.webp 760w,
/blog/frictionless-reproducibility/images/reproduce-biorxiv_hu4a264c61637efbf8d7f2942ea4842e16_81975_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/frictionless-reproducibility/images/reproduce-biorxiv_hu4a264c61637efbf8d7f2942ea4842e16_81975_a42b91103eedf29a9950ce3ecc9b2396.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
The Biorxiv Binder pilot workflow. An author used a 2i2c-managed BinderHub to generate a reproducible environment for their paper. They included a link to this environment in the abstract. Readers could click this link, and be taken to a fully interactive environment to explore the ideas in the paper and reproduce its computation.
&lt;/figcaption>&lt;/figure>
&lt;p>This was a nice proof-of-concept, though I think that broader adoption of this type of workflow would require a deeper connection between publisher workflows and open source communities.&lt;/p>
&lt;h2 id="could-we-enable-communities-to-bring-their-computational-environments-with-them-when-publishing">
Could we enable communities to &amp;ldquo;bring their computational environments with them&amp;rdquo; when publishing?
&lt;a class="header-anchor" href="#could-we-enable-communities-to-bring-their-computational-environments-with-them-when-publishing">#&lt;/a>
&lt;/h2>&lt;p>Currently, a community&amp;rsquo;s computational environment and data are &lt;em>not accessible to publishers&lt;/em>. Could we relax this by allowing JupyterHub and Binder to be re-used by external services like publishers? This could allow a community&amp;rsquo;s hub to act like an &lt;strong>external service for reproducibility&lt;/strong> that could be used by one of the many publishing platforms out there. It would require making improvements around a few different areas of Jupyter:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>JupyterHub&lt;/strong> and &lt;strong>BinderHub&lt;/strong> would need improved workflows around external authentication so that services could easily request kernels from a hub.&lt;/li>
&lt;li>&lt;strong>Jupyter Book&lt;/strong> and &lt;strong>MyST&lt;/strong> would need the ability to power computation on a page from a variety of computational back-ends, potentially defined by a user (e.g., via
&lt;a href="https://thebe.readthedocs.io/en/stable/" target="_blank" rel="noopener" >Thebe&lt;/a> and some UI design in Jupyter Book).&lt;/li>
&lt;li>&lt;strong>JupyterLab&lt;/strong> and other user interfaces may need better ways to define and share their environments and content, so that tools like Jupyter Book and Binder can use them.&lt;/li>
&lt;/ul>
&lt;p>We&amp;rsquo;d also need &lt;strong>integration work&lt;/strong> for the various publishers to leverage this technology for their infrastructure. This is a significant lift - a lot of publishers use &lt;em>very old&lt;/em> and bespoke technology in their systems. However, there&amp;rsquo;s also hope that a subset of the publishing ecosystem is ready to try things like this.&lt;/p>
&lt;h2 id="there-are-many-publishing-organizations-innovating-with-open-source">
There are many publishing organizations innovating with open source
&lt;a class="header-anchor" href="#there-are-many-publishing-organizations-innovating-with-open-source">#&lt;/a>
&lt;/h2>&lt;p>I learned that &lt;strong>there&amp;rsquo;s a lot of interest in innovating around publishing workflows&lt;/strong>, as well as &lt;strong>building on top of open source communities and standards&lt;/strong>. We don&amp;rsquo;t need the whole industry to move at once (it won&amp;rsquo;t), but we do need a critical mass of organizations who are interested in innovating. This might be possible with more publishing-focused products that integrate heavily with open source.&lt;/p>
&lt;p>For example,
&lt;a href="https://curvenote.com" target="_blank" rel="noopener" >Curvenote&lt;/a> is a publishing and communication platform that builds heavily on top of the Jupyter and MyST ecosystems. They co-lead many of the open source projects they use in their platform. Curvenote builds largely around the
&lt;a href="https://mystmd.org" target="_blank" rel="noopener" >MyST Markdown document engine&lt;/a>, which means they could more easily integrate improvements around Portable Computation in the Jupyter ecosystem.&lt;/p>
&lt;p>I hope that the broader publishing ecosystem moves in this direction. Because Jupyter is largely based around open standards and protocols, it should be possible for publishers to leverage the BinderHub API and the
&lt;a href="https://repo2docker.readthedocs.io/en/latest/specification.html" target="_blank" rel="noopener" >Reproducible Execution Environment Specification&lt;/a> to integrate computation that powers their reproducible articles. This would allow a community&amp;rsquo;s members to connect their hub&amp;rsquo;s reproducible environment with each published article, as sketched in the figure below.&lt;/p>
&lt;figure id="figure-publishers-could-re-use-the-computational-environments-from-a-communitys-hub-resulting-in-a-de-duplication-of-infrastructure-and-effort-and-bridging-the-gap-between-where-a-community-does-its-work-and-where-it-submits-new-ideas-for-publication-note-these-are-hypothetical-for-now-but-we-think-publishing-platforms-like-these-are-a-good-starting-point">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Publishers could re-use the computational environments from a community&amp;#39;s hub, resulting in a de-duplication of infrastructure and effort, and bridging the gap between where a community does its work, and where it submits new ideas for publication. (note these are hypothetical for now, but we think publishing platforms like these are a good starting point!)" srcset="
/blog/frictionless-reproducibility/featured_hu264951e48eab912d65d96f304a781973_78955_1befbd691e5e09f18b88c1137e6fdfb0.webp 400w,
/blog/frictionless-reproducibility/featured_hu264951e48eab912d65d96f304a781973_78955_323b65966995a7d0bd4a17e65491d337.webp 760w,
/blog/frictionless-reproducibility/featured_hu264951e48eab912d65d96f304a781973_78955_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/frictionless-reproducibility/featured_hu264951e48eab912d65d96f304a781973_78955_1befbd691e5e09f18b88c1137e6fdfb0.webp"
width="760"
height="428"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Publishers could re-use the computational environments from a community&amp;rsquo;s hub, resulting in a de-duplication of infrastructure and effort, and bridging the gap between where a community does its work, and where it submits new ideas for publication. (note these are hypothetical for now, but we think publishing platforms like these are a good starting point!)
&lt;/figcaption>&lt;/figure>
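&lt;p>To give a flavour of what using the BinderHub API could look like for a publisher: BinderHub exposes a &lt;code>/build/PROVIDER/SPEC&lt;/code> endpoint that streams build progress as server-sent events, ending with a &lt;code>ready&lt;/code> event that carries the running server&amp;rsquo;s &lt;code>url&lt;/code> and &lt;code>token&lt;/code> (or a &lt;code>failed&lt;/code> event). This minimal sketch, in which &lt;code>wait_for_ready&lt;/code> is a hypothetical helper, only parses that stream. It assumes a public BinderHub with no extra authentication, which is exactly the part that would need new work for publisher integrations.&lt;/p>

```python
import json


def wait_for_ready(event_lines):
    """Return (url, token) from the first "ready" event in a BinderHub build stream.

    event_lines: an iterable of text lines from the server-sent event stream
    returned by a BinderHub /build/... endpoint.
    """
    for line in event_lines:
        line = line.strip()
        # Event payloads arrive on "data:" lines; blank lines and
        # keep-alive comments are skipped.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        phase = event.get("phase")
        if phase == "failed":
            raise RuntimeError(event.get("message", "build failed"))
        if phase == "ready":
            return event["url"], event.get("token")
    raise RuntimeError("stream ended before the server was ready")
```

&lt;p>In practice the lines could come from something like &lt;code>requests.get(build_url, stream=True).iter_lines(decode_unicode=True)&lt;/code>, where &lt;code>build_url&lt;/code> points at the hub&amp;rsquo;s build endpoint for the article&amp;rsquo;s repository.&lt;/p>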
&lt;p>Integrating with publishing in this way would allow communities to leverage their pre-existing infrastructure as part of their scholarly workflows. If a community had their own capacity to manage Binder infrastructure they could do so, or they could use a service provider like
&lt;a href="https://2i2c.org" target="_blank" rel="noopener" >2i2c&lt;/a> to manage it for them. This would distribute the responsibility of infrastructure management to those who are in the best position to do so - the communities that do the work.&lt;/p>
&lt;h2 id="how-could-we-sustain-the-cost-of-running-computation-for-published-articles">
How could we sustain the cost of running computation for published articles?
&lt;a class="header-anchor" href="#how-could-we-sustain-the-cost-of-running-computation-for-published-articles">#&lt;/a>
&lt;/h2>&lt;p>This raises an important question: how would you sustain services like these? Communities are already nervous about the cost of computation for their workflows. Public services like
&lt;a href="https://mybinder.org" target="_blank" rel="noopener" >mybinder.org&lt;/a> are free and accessible, but not scalable, nor suitable for complex or mission-critical workflows&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>. Would community stakeholders pay for privileged access to BinderHubs that could reproduce and share their computational narratives? Would publishers be willing to pay a percentage of the cloud and management costs associated with reproduction? Could we use this to sustain a larger public service like mybinder.org?&lt;/p>
&lt;p>We don&amp;rsquo;t have any answers yet, but we&amp;rsquo;re keen to try. Our colleague Jim Colliander recently explored some of these ideas in a talk recorded for
&lt;a href="https://agu.confex.com/agu/fm24/meetingapp.cgi/Paper/100644" target="_blank" rel="noopener" >AGU 2024&lt;/a>.&lt;/p>
&lt;figure>
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/D5s2HbaulZw?si=GCeDPpr2WobIuu4w" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen>&lt;/iframe>
&lt;figcaption>
&lt;p>A talk from 2i2c team member
&lt;a href="https://2i2c.org/author/jim-colliander/" target="_blank" rel="noopener" >Jim Colliander&lt;/a> discussing the right to reproduce computational ideas, the importance of enabling frictionless reproducibility, and how we might sustain such a service.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;p>One thing seems clear - there is an imperative to make reproducibility and interaction frictionless. This is both to ensure the scientific integrity of the work being done, and also to make computational ideas more accessible to the world. Technologies and service partnerships like these can help ensure the broader community&amp;rsquo;s right to reproduce the work of others.&lt;/p>
&lt;h2 id="exploring-frictionless-portable-computing-at-2i2c">
Exploring Frictionless Portable Computing at 2i2c
&lt;a class="header-anchor" href="#exploring-frictionless-portable-computing-at-2i2c">#&lt;/a>
&lt;/h2>&lt;p>2i2c often plays a role in &lt;em>bridging user communities and open source communities&lt;/em> through cycles of development and collaboration; perhaps we could do the same for the publishing community. We&amp;rsquo;d like to explore some tooling improvements that lay a foundation for these workflows, and will report back on our experiments in the coming months.&lt;/p>
&lt;p>&lt;strong>If you are interested in collaborating&lt;/strong>, &lt;a href="mailto:hello@2i2c.org">please reach out&lt;/a>. We&amp;rsquo;d love to hear from organizations from the scholarly publishing community to understand where these ideas have holes or need significant new development. I&amp;rsquo;d also love feedback on sustainability models to ensure these services can be relied on as part of the publishing ecosystem. In the meantime, hopefully these ideas serve as an inspiration for what is possible, and where we might be heading with 2i2c&amp;rsquo;s service and the broader publishing ecosystem.&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/hhmi/" >HHMI&lt;/a> for organizing and hosting this workshop.&lt;/li>
&lt;li>
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/jupyter-book/" >the Jupyter Book community&lt;/a> for their collaboration and feedback on these ideas.&lt;/li>
&lt;/ul>
&lt;div class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1">
&lt;p>There&amp;rsquo;s a
&lt;a href="https://incentivizingopen.org/2025/03/new-paradigms-in-research-communication-continuing-thediscussion/" target="_blank" rel="noopener" >follow-up meeting&lt;/a> for those who are interested.&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2">
&lt;p>The costs associated with running
&lt;a href="https://mybinder.org" target="_blank" rel="noopener" >mybinder.org&lt;/a> have historically been shouldered by donations from organizations such as
&lt;a href="https://ovhcloud.com" target="_blank" rel="noopener" >OVH&lt;/a>,
&lt;a href="https://google.com" target="_blank" rel="noopener" >Google&lt;/a>,
&lt;a href="https://notebooks.gesis.org/" target="_blank" rel="noopener" >GESIS&lt;/a>,
&lt;a href="https://curvenote.com" target="_blank" rel="noopener" >Curvenote&lt;/a>, and now
&lt;a href="https://2i2c.org" target="_blank" rel="noopener" >2i2c&lt;/a>. These donations are not guaranteed, and do not scale directly with the number of users.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/div></description></item><item><title>Announcing backups for GCP-hosted hubs!</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/gcp-filestore-backups/</link><pubDate>Fri, 07 Feb 2025 13:08:22 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/gcp-filestore-backups/</guid><description>&lt;p>2i2c are pleased to announce the development and deployment of automated backups of home directories on GCP-hosted hubs!&lt;/p>
&lt;p>We have developed the
&lt;a href="https://github.com/2i2c-org/gcp-filestore-backups" target="_blank" rel="noopener" >&lt;code>gcp-filestore-backups&lt;/code> project&lt;/a> that regularly creates backups of JupyterHub home directories for disaster recovery purposes. The project is a Python wrapper around the
&lt;a href="https://cloud.google.com/sdk/gcloud" target="_blank" rel="noopener" >&lt;code>gcloud&lt;/code> tool&lt;/a> that regularly requests backups of the Filestore instance hosting JupyterHub&amp;rsquo;s user home directories, by default once a day. The script also manages retention: it checks how recently the last backup was made and how old the existing backups are, and by default deletes any backup older than 5 days.&lt;/p>
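&lt;p>The scheduling and retention decision described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the code of &lt;code>gcp-filestore-backups&lt;/code> itself: the function &lt;code>plan_backups&lt;/code> is hypothetical, works only on backup creation times, and leaves the actual &lt;code>gcloud&lt;/code> calls out.&lt;/p>

```python
from datetime import datetime, timedelta, timezone


def plan_backups(backup_times, now, interval=timedelta(days=1), retention=timedelta(days=5)):
    """Decide whether to take a new backup and which old backups to delete.

    backup_times: creation times (timezone-aware datetimes) of existing backups.
    Returns (needs_new_backup, backups_to_delete).
    """
    latest = max(backup_times, default=None)
    # Take a new backup if none exists or the newest is at least `interval` old
    needs_new = latest is None or now - latest >= interval
    # Delete every backup whose age exceeds the retention period
    to_delete = [t for t in backup_times if now - t > retention]
    return needs_new, to_delete


now = datetime.now(timezone.utc)
print(plan_backups([now - timedelta(days=6)], now))
```

&lt;p>Passing &lt;code>retention=timedelta(days=2)&lt;/code> would mirror the 2-day retention period mentioned here. Keeping the decision separate from the &lt;code>gcloud&lt;/code> calls makes it easy to test without touching real Filestore instances.&lt;/p>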
&lt;p>Having these backups enabled means that, in the unlikely and unfortunate case of data loss or corruption, we can restore the hub&amp;rsquo;s home directories to a recent state from no more than 1 day before the incident.&lt;/p>
&lt;p>We have deployed &lt;code>gcp-filestore-backups&lt;/code> to all of our currently running GCP hubs, with a retention period of 2 days. If you would like to discuss this further with us,
&lt;a href="https://docs.2i2c.org/support" target="_blank" rel="noopener" >please get in touch!&lt;/a>&lt;/p>
&lt;p>As ever, this project has been developed openly in line with our
&lt;a href="https://2i2c.org/right-to-replicate/" target="_blank" rel="noopener" >Right to Replicate&lt;/a> so you can deploy it against your own infrastructure!&lt;/p></description></item><item><title>Enforcing per-user storage quotas with `jupyterhub-home-nfs`</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/per-user-storage-quota/</link><pubDate>Tue, 28 Jan 2025 09:57:28 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/per-user-storage-quota/</guid><description>&lt;p>When sharing a storage disk between users, as is usually the case in a JupyterHub deployment, it is important to put in guardrails so that one user cannot consume the entire storage capacity at the expense of the other users.
To this end, 2i2c, in close collaboration with
&lt;a href="https://developmentseed.org" target="_blank" rel="noopener" >Development Seed&lt;/a>, have developed the
&lt;a href="https://github.com/2i2c-org/jupyterhub-home-nfs" target="_blank" rel="noopener" >&lt;code>jupyterhub-home-nfs&lt;/code> project&lt;/a>, a Helm chart that enforces per-user storage quotas.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Note that this feature is currently available only to AWS-hosted hubs and will be rolled out to other cloud providers in the future.
&lt;/div>
&lt;/div>
&lt;p>Under the hood, the Helm chart runs
&lt;a href="https://github.com/nfs-ganesha/nfs-ganesha" target="_blank" rel="noopener" >NFS Ganesha&lt;/a> as an in-cluster NFS server, backed by
&lt;a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/storage_administration_guide/ch-xfs" target="_blank" rel="noopener" >XFS&lt;/a> as the underlying filesystem. Storage quota is enforced through XFS&amp;rsquo;s native quota management utility &lt;code>xfs_quota&lt;/code>.&lt;/p>
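&lt;p>To illustrate how &lt;code>xfs_quota&lt;/code> can cap a single user&amp;rsquo;s directory, here is a minimal sketch that builds the commands for XFS &lt;em>project&lt;/em> quotas. It assumes each home directory is mapped to its own project ID on a filesystem mounted with project quotas enabled. The helper name, paths, and ID are hypothetical, and this is not necessarily how &lt;code>jupyterhub-home-nfs&lt;/code> configures quotas internally.&lt;/p>

```python
def xfs_project_quota_commands(mountpoint, user_dir, project_id, hard_limit="10g"):
    """Build xfs_quota invocations that cap one user's home directory.

    Assumes the XFS filesystem at `mountpoint` is mounted with project quotas
    (prjquota) and that `user_dir` gets its own numeric project ID.
    """
    return [
        # Set up the project: tag every file under user_dir with project_id
        ["xfs_quota", "-x", "-c", f"project -s -p {user_dir} {project_id}", mountpoint],
        # Set a hard block limit for that project
        ["xfs_quota", "-x", "-c", f"limit -p bhard={hard_limit} {project_id}", mountpoint],
    ]


for cmd in xfs_project_quota_commands("/export", "/export/alice", 1001, "5g"):
    print(cmd)
```

&lt;p>Each command would be run as root, for example with &lt;code>subprocess.run(cmd, check=True)&lt;/code>. Returning argument lists rather than shell strings avoids quoting issues with the &lt;code>-c&lt;/code> expressions.&lt;/p>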
&lt;p>Since this feature moves our infrastructure away from managed filesystems (such as AWS&amp;rsquo;s Elastic File System) that cannot support per-user storage quotas, we have also developed monitoring and alerting mechanisms that will let us know when the disks are getting full, and automated back-ups for disaster recovery.&lt;/p>
&lt;p>If you would like to try this on your 2i2c-managed hub,
&lt;a href="https://docs.2i2c.org/support" target="_blank" rel="noopener" >please get in touch&lt;/a>.&lt;/p>
&lt;p>This project can also be used with &lt;em>any&lt;/em> Kubernetes-based JupyterHub, as per our
&lt;a href="https://2i2c.org/right-to-replicate/" target="_blank" rel="noopener" >Right to Replicate policy&lt;/a>, so please try it out on your own deployment and let us know what you think!&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;p>This project was developed and deployed in collaboration with
&lt;a href="https://developmentseed.org/team/tarashish-mishra/" target="_blank" rel="noopener" >Tarashish Mishra&lt;/a> from
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/devseed/" >Development Seed&lt;/a>, funded through the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA project&lt;/a>.&lt;/p></description></item><item><title>2i2c hubs now run JupyterHub 5.0</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub5-upgrade/</link><pubDate>Fri, 17 Jan 2025 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub5-upgrade/</guid><description>&lt;p>We are excited to announce that all 2i2c hubs now run JupyterHub 5.0!&lt;/p>
&lt;p>This upgrade brings some exciting new features and improvements. Highlights include:&lt;/p>
&lt;ol>
&lt;li>The option to enable
&lt;a href="https://jupyterhub.readthedocs.io/en/5.0.0/tutorial/sharing.html" target="_blank" rel="noopener" >user-initiated server sharing&lt;/a>&lt;/li>
&lt;li>
&lt;a href="https://jupyterhub.readthedocs.io/en/5.0.0/reference/authenticators.html#authenticator-managed-roles" target="_blank" rel="noopener" >Authenticator-managed roles&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>JupyterHub 5 will also enable us to offer per-group shared directories in the future (see the
&lt;a href="https://github.com/NASA-IMPACT/veda-jupyterhub/issues/61" target="_blank" rel="noopener" >tracking issue&lt;/a>).&lt;/p>
&lt;p>Check out the
&lt;a href="https://jupyterhub.readthedocs.io/en/latest/howto/upgrading-v5.html" target="_blank" rel="noopener" >JupyterHub 5.0 migration&lt;/a> docs or the
&lt;a href="https://jupyterhub.readthedocs.io/en/5.0.0/reference/changelog.html#id3" target="_blank" rel="noopener" >changelog&lt;/a> for more details.&lt;/p></description></item><item><title>`frx-challenges`: A new tool to host data challenges for Frictionless Research Exchanges</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/frx/</link><pubDate>Fri, 06 Dec 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/frx/</guid><description>&lt;p>2i2c is pleased to announce the &lt;code>frx-challenges&lt;/code> project, a new open source tool to help communities host data challenges on shared infrastructure:&lt;/p>
&lt;p>
&lt;a href="https://github.com/2i2c-org/frx-challenges" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/frx-challenges&lt;/a>&lt;/p>
&lt;p>This project aims to make it easier for administrators to provide a service that enables users to &lt;strong>submit code and data&lt;/strong> that are &lt;strong>evaluated on secure infrastructure with access to private data and resources&lt;/strong>. It also provides a leaderboard that helps users compare their performance against others.&lt;/p>
&lt;figure id="figure-an-example-leaderboard-for-a-data-challenge-taken-from-the-cellmap-challengehttpscellmapchallengejaneliaorg-users-make-submissions-that-are-run-against-secure-and-private-infrastructure-and-data-and-provides-feedback-about-the-submissions-performance-learn-more-about-the-frx-challenges-project-here-https2i2corgfrx-challenges">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="An example leaderboard for a data challenge, taken from the [Cellmap Challenge](https://cellmapchallenge.janelia.org/). Users make submissions that are run against secure and private infrastructure and data, and provides feedback about the submission&amp;#39;s performance. Learn more about the FRX challenges project here: https://2i2c.org/frx-challenges/" srcset="
/blog/frx/images/leaderboard_hu1c5275577555814ddf920c106a29e815_883850_e8cf5edfc5977cc915c200d3d338ce2b.webp 400w,
/blog/frx/images/leaderboard_hu1c5275577555814ddf920c106a29e815_883850_64bd6d34f7f589cd624b5702b3fc5904.webp 760w,
/blog/frx/images/leaderboard_hu1c5275577555814ddf920c106a29e815_883850_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/frx/images/leaderboard_hu1c5275577555814ddf920c106a29e815_883850_e8cf5edfc5977cc915c200d3d338ce2b.webp"
width="75%"
height="417"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
An example leaderboard for a data challenge, taken from the
&lt;a href="https://cellmapchallenge.janelia.org/" target="_blank" rel="noopener" >Cellmap Challenge&lt;/a>. Users make submissions that are run against secure and private infrastructure and data, and provides feedback about the submission&amp;rsquo;s performance. Learn more about the FRX challenges project here:
&lt;a href="https://2i2c.org/frx-challenges/" target="_blank" rel="noopener" >2i2c.org/frx-challenges/&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;p>It is designed to be lightweight and flexible, and can be run on a variety of shared infrastructure. For those who wish to run this project on cloud infrastructure, we&amp;rsquo;ve also published a
&lt;a href="https://2i2c.org/frx-challenges-helm-chart/" target="_blank" rel="noopener" >Helm Chart to help you deploy &lt;code>frx-challenges&lt;/code> with Kubernetes&lt;/a>.&lt;/p>
&lt;p>While it can be run on its own, we believe that it naturally complements other tools and services for interactive computing and data, such as &lt;strong>JupyterHub&lt;/strong>, &lt;strong>Jupyter Book&lt;/strong>, and &lt;strong>Binder&lt;/strong>. More on that below.&lt;/p>
&lt;p>Below is a brief description of the motivation behind this project.&lt;/p>
&lt;h2 id="what-are-frictionless-research-exchanges">
What are Frictionless Research Exchanges?
&lt;a class="header-anchor" href="#what-are-frictionless-research-exchanges">#&lt;/a>
&lt;/h2>&lt;p>The project is heavily inspired by David Donoho&amp;rsquo;s vision of &lt;strong>Frictionless Research Exchanges&lt;/strong> (FRX) as described in
&lt;a href="https://arxiv.org/abs/2310.00865" target="_blank" rel="noopener" >&lt;em>Data Science at the Singularity&lt;/em>&lt;/a>.&lt;/p>
&lt;p>In this article, Donoho describes three key pillars for Frictionless Research Exchanges:&lt;/p>
&lt;blockquote>
&lt;p>The three initiatives are related but separate; and all three have to come together, and in a particularly strong way, to provide the conditions for the new era. Here they are:&lt;/p>
&lt;ul>
&lt;li>[FR-1: Data] datafication of everything, with a culture of research data sharing. One can now find datasets publicly available online on a bewildering variety of topics, from chest x-rays to cosmic microwave background measurements to uber routes to geospatial crop identifications.&lt;/li>
&lt;li>[FR-2: Re-execution] research code sharing including the ability to exactly re-execute the same complete workflow by different researchers.&lt;/li>
&lt;li>[FR-3: Challenges] adopting challenge problems as a new paradigm powering scientific research. The paradigm includes: a shared public dataset, a prescribed and quantified task performance metric, a set of enrolled competitors seeking to outperform each other on the task, and a public leaderboard. Thousands of such challenges with millions of entries have now taken place, across many fields.&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;p>We surveyed the existing landscape and felt that [FR-1] and [FR-2] were already well served by a variety of tools and services for community workspace infrastructure (e.g., JupyterHub:
&lt;a href="https://jupyterhub.readthedocs.io" target="_blank" rel="noopener" >jupyterhub.readthedocs.io&lt;/a>), sharable computational environments (e.g., BinderHub:
&lt;a href="https://binderhub.readthedocs.io" target="_blank" rel="noopener" >binderhub.readthedocs.io&lt;/a>), authoring and reading computational narratives (e.g., Jupyter Book:
&lt;a href="https://jupyterbook.org" target="_blank" rel="noopener" >jupyterbook.org&lt;/a> and MyST:
&lt;a href="https://mystmd.org" target="_blank" rel="noopener" >mystmd.org&lt;/a>), and data I/O tools and standards (e.g., Zarr:
&lt;a href="https://zarr.readthedocs.io" target="_blank" rel="noopener" >zarr.readthedocs.io&lt;/a> and Intake:
&lt;a href="https://intake.readthedocs.io" target="_blank" rel="noopener" >intake.readthedocs.io&lt;/a>).&lt;/p>
&lt;p>However, &lt;strong>[FR-3: Challenges]&lt;/strong> was a clear missing piece: we could not identify any community-managed infrastructure that facilitated data challenges. Filling this gap is the goal of &lt;code>frx-challenges&lt;/code>.&lt;/p>
&lt;h2 id="why-facilitate-data-challenges">
Why facilitate data challenges?
&lt;a class="header-anchor" href="#why-facilitate-data-challenges">#&lt;/a>
&lt;/h2>&lt;p>Running data challenges is harder than you might think! While it is simple enough to run somebody else&amp;rsquo;s code locally, data challenges require a systematic, secure, and automated approach to accepting and evaluating submissions in a fair and repeatable way. Here are some of the biggest problems to tackle:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Submissions must retain user and team identity&lt;/strong>, which means that we must keep track of users and their submissions over time, since data challenges are designed to encourage iterative improvement and optimization.&lt;/li>
&lt;li>&lt;strong>Evaluations must use potentially complex resources and data&lt;/strong>, since many data challenges publicly share a small dataset and then evaluate submissions against a much more complex dataset.&lt;/li>
&lt;li>&lt;strong>Evaluations must be totally secure&lt;/strong>, so that submissions can&amp;rsquo;t do nefarious things like mine cryptocurrency or extract the challenge&amp;rsquo;s private data in unintended ways.&lt;/li>
&lt;li>&lt;strong>Evaluations must be automated&lt;/strong>, so that running the challenge does not require extensive human intervention and can scale to many users.&lt;/li>
&lt;li>&lt;strong>Evaluations must be flexible&lt;/strong>, so that the infrastructure can accept many types of submissions (e.g., code, data, or model weights), run them in arbitrary environments designed by the organizers, and run them on the right hardware to get the job done.&lt;/li>
&lt;/ul>
&lt;p>These are just a few of the major challenges that we&amp;rsquo;ve tried to address with &lt;code>frx-challenges&lt;/code>, and we&amp;rsquo;re excited to see how it goes with our first assisted community challenge: the
&lt;a href="https://cellmapchallenge.janelia.org/" target="_blank" rel="noopener" >Cellmap Challenge&lt;/a>.&lt;/p>
&lt;p>If you&amp;rsquo;re interested in learning more or participating in this project, follow along at its GitHub repository:&lt;/p>
&lt;p>
&lt;a href="https://github.com/2i2c-org/frx-challenges" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> 2i2c-org/frx-challenges&lt;/a>&lt;/p>
&lt;p>The project is still in its &lt;strong>very early stages&lt;/strong>, and we expect it to evolve significantly. We welcome feedback on how it can more effectively serve a variety of communities.&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;p>Thanks to the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/hhmi/" >Howard Hughes Medical Institute&lt;/a> (HHMI) for collaborating with us on the
&lt;a href="https://cellmapchallenge.janelia.org/" target="_blank" rel="noopener" >Cellmap Challenge&lt;/a>, which led to the creation of this project.&lt;/p>
&lt;p>Thanks to Kristen Ratan and
&lt;a href="https://strategiesos.org/about/" target="_blank" rel="noopener" >Strategies for Open Science&lt;/a> (Stratos) for enabling this collaboration, and providing strategic guidance and support.&lt;/p></description></item><item><title>Improving the logged in home page experience in JupyterHub with `jupyterhub-fancy-profiles`</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-fancy-profiles-rollout/</link><pubDate>Mon, 18 Nov 2024 12:55:20 -0800</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-fancy-profiles-rollout/</guid><description>&lt;p>On most research oriented JupyterHub installations, users would like to customize their server (the environment, resources available, etc) after logging in. In Kubernetes based JupyterHub environments, a
&lt;a href="https://z2jh.jupyter.org/en/latest/jupyterhub/customizing/user-environment.html#using-multiple-profiles-to-let-users-select-their-environment" target="_blank" rel="noopener" >profile list&lt;/a> provides this functionality.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./classic-profiles.png" alt="image" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
(Profile List for the NASA VEDA JupyterHub with the default implementation from KubeSpawner)&lt;/p>
&lt;p>The profile list is the de facto &amp;ldquo;logged-in homepage&amp;rdquo; for these users, since it is the first thing they see after logging in.&lt;/p>
&lt;p>In collaboration with
&lt;a href="https://developmentseed.org/" target="_blank" rel="noopener" >Development Seed&lt;/a>, funded by our
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-binderhub-gesis/" >earlier grant&lt;/a> from
&lt;a href="https://www.gesis.org/home" target="_blank" rel="noopener" >GESIS&lt;/a> as well as the
&lt;a href="https://www.earthdata.nasa.gov/data/tools/veda" target="_blank" rel="noopener" >NASA VEDA project&lt;/a>, we have been building the
&lt;a href="https://github.com/2i2c-org/jupyterhub-fancy-profiles" target="_blank" rel="noopener" >&lt;code>jupyterhub-fancy-profiles&lt;/code>&lt;/a> project to improve this experience.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./fancy-profiles.png" alt="image" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
(Profile List for the NASA VEDA JupyterHub with &lt;code>jupyterhub-fancy-profiles&lt;/code>)&lt;/p>
&lt;p>Last week, we rolled this new experience out to all 2i2c-managed JupyterHubs! Here&amp;rsquo;s a quick rundown of what it enables:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Descriptions for choices in the dropdowns, making it much easier for users to know what they are getting with each environment (or resource selection).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Fully backwards compatible with the existing KubeSpawner profile list implementation. In our PR to
&lt;a href="https://github.com/2i2c-org/infrastructure/pull/5083" target="_blank" rel="noopener" >roll this out&lt;/a> to all hubs, you notice that we didn&amp;rsquo;t have to change the structure of any profile lists! So you can safely roll this out to your hubs too without needing to fundamentally change how your profiles are set up.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It is a modern web app (built with
&lt;a href="https://react.dev/" target="_blank" rel="noopener" >react&lt;/a>), just like the JupyterHub admin panel. This allows us to evolve and satisfy user needs much faster, as well as expanding the pool of people who can contribute to the project!&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Support for dynamically building images using
&lt;a href="https://mybinder.org" target="_blank" rel="noopener" >mybinder.org&lt;/a> style repositories! It talks to the
&lt;a href="https://github.com/jupyterhub/binderhub/" target="_blank" rel="noopener" >binderhub&lt;/a> API so users can build reproducible environments as they wish without admin involvement nor needing to fully understand how docker and containers work. Our
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-binderhub-gesis/" >earlier blog post&lt;/a> has more information.&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./fancy-profiles-build.png" alt="image" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>This is just the start, and thanks to ongoing funding from the
&lt;a href="https://www.earthdata.nasa.gov/data/tools/veda" target="_blank" rel="noopener" >NASA VEDA&lt;/a> project, we are going to continue making improvements to this experience.&lt;/p>
&lt;h2 id="use-this-in-your-jupyterhub">
Use this in your JupyterHub
&lt;a class="header-anchor" href="#use-this-in-your-jupyterhub">#&lt;/a>
&lt;/h2>&lt;p>As with everything we build at 2i2c (per our
&lt;a href="https://2i2c.org/right-to-replicate/" target="_blank" rel="noopener" >right to replicate&lt;/a> policy), this project can be used with &lt;em>any&lt;/em> JupyterHub installation that uses Kubernetes. There are
&lt;a href="https://github.com/2i2c-org/jupyterhub-fancy-profiles/?tab=readme-ov-file#how-to-use" target="_blank" rel="noopener" >instructions&lt;/a> in the README. Please try it out on yours and let us know what you think!&lt;/p>
&lt;h2 id="credit">
Credit
&lt;a class="header-anchor" href="#credit">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>The project was initiated with funding generously provided by
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/gesis/" >GESIS&lt;/a> (see our
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-binderhub-gesis/" >earlier blog post&lt;/a>).&lt;/li>
&lt;li>
&lt;a href="https://developmentseed.org/team/sanjay-bhangar/" target="_blank" rel="noopener" >Sanjay Bhangar&lt;/a> and
&lt;a href="https://oliverroick.net/" target="_blank" rel="noopener" >Oliver Roick&lt;/a> from
&lt;a href="https://developmentseed.org/" target="_blank" rel="noopener" >Development Seed&lt;/a> for advocating for this project and contributing heavily to it.&lt;/li>
&lt;li>The
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/nasa-veda/" >NASA VEDA&lt;/a> project (in particular,
&lt;a href="https://github.com/freitagb/" target="_blank" rel="noopener" >Brian Freitag&lt;/a> and
&lt;a href="https://github.com/wildintellect" target="_blank" rel="noopener" >Alex Mandel&lt;/a>), for continued funding (in the form of engineering time) plus being early adopters!&lt;/li>
&lt;/ul></description></item><item><title>Track and manage cloud costs using Grafana</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/aws-cost-attribution/</link><pubDate>Fri, 15 Nov 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/aws-cost-attribution/</guid><description>&lt;p>
&lt;figure id="figure-grafana-dashboard-showing-cloud-costs-broken-down-by-compute-storage-and-other-components-for-the-openscapeshttpsopenscapesorg-hub">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Screenshot of a graph showing total daily costs per component." srcset="
/blog/aws-cost-attribution/featured_hu0a1ce7d8654f8efa8d798b6fefc5ebab_212463_55733394a3e42b9cab8734939a78d9bd.webp 400w,
/blog/aws-cost-attribution/featured_hu0a1ce7d8654f8efa8d798b6fefc5ebab_212463_025709b2a5b75f5862165f203ded6cd4.webp 760w,
/blog/aws-cost-attribution/featured_hu0a1ce7d8654f8efa8d798b6fefc5ebab_212463_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/aws-cost-attribution/featured_hu0a1ce7d8654f8efa8d798b6fefc5ebab_212463_55733394a3e42b9cab8734939a78d9bd.webp"
width="760"
height="485"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Grafana dashboard showing cloud costs broken down by compute, storage and other components for the
&lt;a href="https://openscapes.org/" target="_blank" rel="noopener" >Openscapes&lt;/a> hub.
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>We are pleased to unveil a new feature for tracking cloud costs within our Grafana dashboards! Community Champions can now monitor the cost and usage of their 2i2c-managed hubs through a dashboard that displays up-to-date aggregated costs as well as detailed breakdowns for more granular reporting.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Note that this feature is currently available to AWS hosted hubs only and will be rolled out to other cloud providers in the future.
&lt;/div>
&lt;/div>
&lt;h2 id="accessing-the-cloud-cost-dashboard">
Accessing the cloud cost dashboard
&lt;a class="header-anchor" href="#accessing-the-cloud-cost-dashboard">#&lt;/a>
&lt;/h2>&lt;p>Community Champions can view the Cloud Cost dashboard from their Grafana instance (please see the
&lt;a href="https://docs.2i2c.org/admin/monitoring/grafana-dashboards#getting-a-grafana-account" target="_blank" rel="noopener" >Service Guide&lt;/a> for how to gain access).&lt;/p>
&lt;p>From the main menu of Grafana, navigate to &lt;em>Dashboards &amp;gt; Cloud cost dashboards &amp;gt; Cloud cost attribution&lt;/em> to view the dashboard.&lt;/p>
&lt;h2 id="understanding-the-cloud-cost-dashboard">
Understanding the cloud cost dashboard
&lt;a class="header-anchor" href="#understanding-the-cloud-cost-dashboard">#&lt;/a>
&lt;/h2>&lt;p>A typical 2i2c-managed deployment comprises a staging hub and a production hub, although some communities have additional hubs, such as a workshop hub. By default, costs are not broken down per hub unless the community has opted in to this feature.&lt;/p>
&lt;p>The dashboard consists of several panels:&lt;/p>
&lt;ul>
&lt;li>Daily costs&lt;/li>
&lt;li>Daily costs per hub (opt-in only)&lt;/li>
&lt;li>Total daily costs per component&lt;/li>
&lt;li>Daily costs per component per hub (opt-in only)&lt;/li>
&lt;/ul>
&lt;video muted autoplay loop >
&lt;source src="https://deploy-preview-609--2i2c-org.netlify.app/blog/aws-cost-attribution/demo.mp4" type="video/mp4">
&lt;/video>
&lt;p>For more detailed information on the data that each panel displays, please consult our
&lt;a href="https://docs.2i2c.org/admin/monitoring/cost-users#understanding-the-cloud-cost-dashboard" target="_blank" rel="noopener" >Service Guide&lt;/a> for reference.&lt;/p>
&lt;h2 id="sharing-cost-reports">
Sharing cost reports
&lt;a class="header-anchor" href="#sharing-cost-reports">#&lt;/a>
&lt;/h2>&lt;p>The dashboard can be shared with other community members and stakeholders so they can understand usage and cost patterns. Community Champions can export data to a CSV file, or they can generate a snapshot of the Grafana dashboard and share a public link.&lt;/p>
&lt;p>For instructions on how to export data from the dashboard, please see our
&lt;a href="https://docs.2i2c.org/admin/monitoring/cost-users#sharing-cost-reports" target="_blank" rel="noopener" >Service Guide&lt;/a> for reference.&lt;/p>
&lt;h2 id="next-steps">
Next steps
&lt;a class="header-anchor" href="#next-steps">#&lt;/a>
&lt;/h2>&lt;p>We would love to know whether this feature is useful and how it can be improved. We will be contacting individual communities to gather feedback, so please share your thoughts with us!&lt;/p>
&lt;p>We will work on rolling out this feature to GCP-hosted clusters in the future. Stay tuned to find out when it becomes available to your community.&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Thank you to Erik for spearheading the rollout effort and to the rest of the 2i2c team for their support.&lt;/li>
&lt;li>Thanks to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/openscapes/" >Openscapes&lt;/a> and
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/cryocloud/" >Cryocloud&lt;/a> communities for providing valuable insights during the prototyping and testing phase, and for funding part of this work.&lt;/li>
&lt;/ul></description></item><item><title>Low storage alerting for the UToronto cluster</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/utoronto-storage-monitoring/</link><pubDate>Fri, 14 Jun 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/utoronto-storage-monitoring/</guid><description>&lt;p>
&lt;figure id="figure-the-utoronto-hub-landing-page">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./cover-featured.png" alt="The UToronto hub landing page" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
The UToronto hub landing page
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>2i2c has operated the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/utoronto/" >University of Toronto&lt;/a> hub since 2021, and this hub supports over 6000 educators and learners a day! With a community of this size, file storage can quickly grow out of control and cause problems.&lt;/p>
&lt;p>The 2i2c engineering team has implemented a
&lt;a href="https://github.com/2i2c-org/infrastructure/issues/3320" target="_blank" rel="noopener" >low-storage alerting system&lt;/a> for Microsoft Azure, so that they can take remedial action before the filesystem runs out of disk space.&lt;/p>
&lt;p>Great job, team! 🚀&lt;/p>
&lt;p>
&lt;figure id="figure-utoronto-hub-usage">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="UToronto hub usage" srcset="
/blog/utoronto-storage-monitoring/usage_hu6ed2c2eb2e5ce90c08322c45dfdcb7ee_34488_a1a0f10d97ad208b72fb1666180aec3a.webp 400w,
/blog/utoronto-storage-monitoring/usage_hu6ed2c2eb2e5ce90c08322c45dfdcb7ee_34488_c1795edfa28486d37f54c3cbd33086a7.webp 760w,
/blog/utoronto-storage-monitoring/usage_hu6ed2c2eb2e5ce90c08322c45dfdcb7ee_34488_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/utoronto-storage-monitoring/usage_hu6ed2c2eb2e5ce90c08322c45dfdcb7ee_34488_a1a0f10d97ad208b72fb1666180aec3a.webp"
width="756"
height="500"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
UToronto hub usage
&lt;/figcaption>&lt;/figure>
&lt;/p></description></item><item><title>Security report for jupyter-server-proxy: CVE-2024-28179</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/cve-jupyter-server-proxy/</link><pubDate>Tue, 19 Mar 2024 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/cve-jupyter-server-proxy/</guid><description>
&lt;h2 id="what-happened">
What happened?
&lt;a class="header-anchor" href="#what-happened">#&lt;/a>
&lt;/h2>&lt;p>A few weeks ago, the JupyterHub team discovered a security vulnerability in
&lt;a href="https://jupyter-server-proxy.readthedocs.io/en/latest/" target="_blank" rel="noopener" >the &lt;code>jupyter-server-proxy&lt;/code> package&lt;/a> that would allow potential unauthenticated access to a JupyterHub via WebSockets, allowing unauthenticated users to run arbitrary code on the JupyterHub.
&lt;code>jupyter-server-proxy&lt;/code> is used by many communities to provide alternative user interfaces like RStudio and remote desktops.&lt;/p>
&lt;p>This vulnerability was detected by the JupyterHub team, with leadership from 2i2c&amp;rsquo;s engineers. It was resolved through upstream contributions to the JupyterHub project, and we have deployed a fix that mitigates this vulnerability for all the hubs 2i2c manages.&lt;/p>
&lt;h2 id="does-this-impact-my-2i2c-community-hub">
Does this impact my 2i2c community hub?
&lt;a class="header-anchor" href="#does-this-impact-my-2i2c-community-hub">#&lt;/a>
&lt;/h2>&lt;p>We do not believe that any of 2i2c&amp;rsquo;s communities were impacted by this vulnerability, and
&lt;a href="https://github.com/2i2c-org/infrastructure/blob/f86d128a0d045163e72802f6df287a6f46d4b738/helm-charts/basehub/values.yaml#L296" target="_blank" rel="noopener" >a patch&lt;/a> has now been pushed to all community hubs to resolve this issue.&lt;/p>
&lt;p>If your community was vulnerable to this problem, you might experience slightly longer server startup times while we work out a long-term solution.&lt;/p>
&lt;p>Since the vulnerable package is installed in the Docker images used by our communities, we will be reaching out over the next few weeks to put a more permanent fix in place.&lt;/p>
&lt;h2 id="where-can-i-learn-more">
Where can I learn more?
&lt;a class="header-anchor" href="#where-can-i-learn-more">#&lt;/a>
&lt;/h2>&lt;p>See
&lt;a href="https://github.com/jupyterhub/jupyter-server-proxy/security/advisories/GHSA-w3vc-fx9p-wp4v" target="_blank" rel="noopener" >the JupyterHub security advisory for CVE-2024-28179&lt;/a> for more information about the security vulnerability, including details on the mitigation we have put in place to protect our communities.&lt;/p>
&lt;h2 id="conclusion">
Conclusion
&lt;a class="header-anchor" href="#conclusion">#&lt;/a>
&lt;/h2>&lt;p>We&amp;rsquo;re grateful that the JupyterHub community was quick to acknowledge, respond to, and resolve this security vulnerability after it was brought to their attention.
We&amp;rsquo;re also proud that 2i2c&amp;rsquo;s engineers helped the JupyterHub team throughout the process.&lt;/p>
&lt;p>This allowed our team to resolve the problem before it impacted any of 2i2c&amp;rsquo;s communities.
Because 2i2c community infrastructure is managed in a central location, we were able to resolve this for over 80 communities with a single team rather than expecting each community to learn about and fix this problem on their own.&lt;/p>
&lt;p>We also believe this reflects the healthy upstream relationships that we hope to encourage with our team&amp;rsquo;s
&lt;a href="https://compass.2i2c.org/open-source/" target="_blank" rel="noopener" >Open Source strategy and practices&lt;/a>.
By working with the JupyterHub community and pushing changes upstream, we&amp;rsquo;ve resolved this issue for &lt;em>any&lt;/em> user of &lt;code>jupyter-server-proxy&lt;/code>, not just 2i2c&amp;rsquo;s own ecosystem.
In particular, because 2i2c runs hubs for many communities via Kubernetes, we were able to identify a solution that did not require every user image to be updated (as described in the &lt;strong>For JupyterHub admins of Z2JH installations&lt;/strong> section of the security advisory).&lt;/p>
&lt;p>We believe that all of these lead to a healthier, safer ecosystem of open source tools ❤️.&lt;/p></description></item><item><title>Integrating BinderHub with JupyterHub: Empowering users to manage their own environments</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-binderhub-gesis/</link><pubDate>Wed, 03 Jan 2024 16:56:14 -0800</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-binderhub-gesis/</guid><description>&lt;p>&lt;em>Thanks to
&lt;a href="https://www.gesis.org/en/institute/staff/person/arnim.bleier" target="_blank" rel="noopener" >Arnim Bleier&lt;/a>,
&lt;a href="https://jnywong.github.io/" target="_blank" rel="noopener" >Jenny Wong&lt;/a>,
&lt;a href="https://github.com/GeorgianaElena" target="_blank" rel="noopener" >Georgiana Elena&lt;/a>,
&lt;a href="https://github.com/damianavila" target="_blank" rel="noopener" >Damián Avila&lt;/a>,
&lt;a href="https://colliand.com/" target="_blank" rel="noopener" >Jim Colliander&lt;/a> and
&lt;a href="https://github.com/jmunroe" target="_blank" rel="noopener" >James Munroe&lt;/a> for contributing to this blog post&lt;/em>&lt;/p>
&lt;p>
&lt;a href="https://mybinder.org" target="_blank" rel="noopener" >mybinder.org&lt;/a> is a very popular service that allows end users to specify and share the environment (languages, packages, etc) required for their notebooks to run correctly by placing
&lt;a href="https://repo2docker.readthedocs.io/en/latest/config_files.html#config-files" target="_blank" rel="noopener" >configuration files&lt;/a> they are already familiar with (like &lt;code>requirements.txt&lt;/code> or &lt;code>environment.yml&lt;/code>) along with their notebooks. While not without its own set of challenges, this is extremely powerful because it puts control of the &lt;em>environment&lt;/em> in the hands of the people who write the code. They can customize the environment to fit the needs of their code, instead of having to fit their code into the environment that admins have made available.&lt;/p>
&lt;p>But, mybinder.org (and the
&lt;a href="https://github.com/jupyterhub/binderhub/" target="_blank" rel="noopener" >BinderHub&lt;/a> software that powers it) is built for &lt;em>sharing&lt;/em> your work after you are done with it, &lt;em>not&lt;/em> for actively doing work. BinderHubs often do not have persistent storage nor persistent user identity, and UX is centered around &lt;em>ephemeral&lt;/em> interactivity that can be shared with others (via a link), rather than &lt;em>persistent&lt;/em> interactivity that a single user repeatedly comes back to.
&lt;a href="https://jupyter.org/hub" target="_blank" rel="noopener" >JupyterHub&lt;/a> is more commonly used for this kinda workflow, but doesn&amp;rsquo;t currently have the ability for users to easily build their own environments. Admins who are &lt;em>running&lt;/em> the JupyterHub can make
&lt;a href="https://z2jh.jupyter.org/en/stable/jupyterhub/customizing/user-environment.html#using-multiple-profiles-to-let-users-select-their-environment" target="_blank" rel="noopener" >multiple environments&lt;/a> available for users to choose from, but this still puts admins in the critical path for environment customization.&lt;/p>
&lt;p>Our
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/gesis-2i2c-collaboration-update/" >collaboration&lt;/a> with
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/gesis/" >GESIS&lt;/a>,
&lt;a href="https://www.nfdi4datascience.de" target="_blank" rel="noopener" >NFDI4DS&lt;/a>, and
&lt;a href="https://www.cessda.eu" target="_blank" rel="noopener" >CESSDA&lt;/a>, aims to bring this flexibility to JupyterHub directly. We aim to empower users to decide for themselves which applications and dependencies are installed on a per-project basis. Our work enables communities with heterogeneous requirements to share a single Hub. Our approach frees administrators from being overwhelmed by installation requests and transforms the JupyterHub platform into a platform for collaborative computational reproducibility. In this update, we report on our progress and upcoming steps in this project.&lt;/p>
&lt;h2 id="what-does-a-binderhub-do-exactly">
What does a BinderHub do, exactly?
&lt;a class="header-anchor" href="#what-does-a-binderhub-do-exactly">#&lt;/a>
&lt;/h2>&lt;p>It is helpful to understand that BinderHub has three primary responsibilities:&lt;/p>
&lt;ol>
&lt;li>Present a UI to the end user for them to provide details on what to build (this is what you see when you go to mybinder.org)&lt;/li>
&lt;li>Call out to
&lt;a href="https://github.com/jupyterhub/repo2docker" target="_blank" rel="noopener" >repo2docker&lt;/a> in a scalable way to actually &lt;em>build and push&lt;/em> an image containing the environment for the given repository, and show the user logs as this build process happens. This also allows users to debug issues with their build more easily.&lt;/li>
&lt;li>Talk to a JupyterHub instance to launch a user server with the built Docker image, and redirect the user to it.&lt;/li>
&lt;/ol>
&lt;p>(2) is really the &lt;em>core&lt;/em> feature of BinderHub, so we focused on figuring out how to make it available to JupyterHub users. It was really important to us that this was done in a way that can be sustainably used by &lt;em>everyone&lt;/em>, not just 2i2c. This blog post discusses the improvements we made across projects in the Jupyter ecosystem to get this done.&lt;/p>
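&lt;p>To make responsibilities (1) and (2) a little more concrete, here is a minimal Python sketch of the two URLs involved for a GitHub repository. The first is the page a user visits. The second is the endpoint that builds the image and streams its logs. The sketch assumes BinderHub&amp;rsquo;s &lt;code>gh&lt;/code> provider prefix, the &lt;code>/v2/&lt;/code> launch path and the &lt;code>/build/&lt;/code> endpoint. The helper names are hypothetical and are not part of any BinderHub API.&lt;/p>
&lt;pre>&lt;code class="language-python"># Hypothetical helpers: how one GitHub repository maps onto BinderHub's
# UI page (responsibility 1) and its build endpoint (responsibility 2).
from urllib.parse import quote


def repo_spec(org: str, repo: str, ref: str) -> str:
    """Build the 'org/repo/ref' spec used with the GitHub ('gh') provider."""
    # Assumption: refs that contain '/' (e.g. branch names) are URL-encoded.
    return f"{org}/{repo}/{quote(ref, safe='')}"


def launch_page_url(binderhub: str, org: str, repo: str, ref: str) -> str:
    """URL of the page a user visits to start a build and launch (responsibility 1)."""
    return f"{binderhub.rstrip('/')}/v2/gh/{repo_spec(org, repo, ref)}"


def build_api_url(binderhub: str, org: str, repo: str, ref: str) -> str:
    """URL of the endpoint that builds the image and streams logs (responsibility 2)."""
    return f"{binderhub.rstrip('/')}/build/gh/{repo_spec(org, repo, ref)}"


if __name__ == "__main__":
    print(launch_page_url("https://mybinder.org", "binder-examples", "requirements", "HEAD"))
    print(build_api_url("https://mybinder.org", "binder-examples", "requirements", "HEAD"))
&lt;/code>&lt;/pre>
&lt;p>Keeping the two URLs separate mirrors the point of this project: a JupyterHub only needs the second one. It can bring its own UI and do its own launching.&lt;/p>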
&lt;h2 id="demo">
Demo
&lt;a class="header-anchor" href="#demo">#&lt;/a>
&lt;/h2>&lt;p>But first, here is a very quick demo of what this looks like right now!&lt;/p>
&lt;!-- generated from original .mov screen recording with `ffmpeg -i screencast.mov -c:v libx264 screencast.mp4` -->
&lt;p>&lt;video src="./screencast.mp4" autoplay muted controls>&lt;/video>&lt;/p>
&lt;p>This is very much a work in progress, but the basic flow can be seen clearly. Users see a Server Options menu after they log into JupyterHub. They can specify the two primary things that determine the server configuration:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>The resources allocated (RAM, CPU and optionally a GPU)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The environment (container image) used, which can be specified in one of 3 ways:&lt;/p>
&lt;ol type="a">
&lt;li>A pre-selected list of environments (container images), provided by the administrators who set up this JupyterHub.&lt;/li>
&lt;li>A blank text box where users can enter any publicly available Docker image they want.&lt;/li>
&lt;li>A mybinder.org-style way to specify a GitHub repository, which is then dynamically built into a Docker image for the user!&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ol>
&lt;p>So what did we need to do to accomplish this, in a way that&amp;rsquo;s very upstream friendly and usable by everyone (and not just 2i2c)?&lt;/p>
&lt;h2 id="a-standalone-binderhub-service-helm-chart">
A Standalone &lt;code>binderhub-service&lt;/code> helm chart
&lt;a class="header-anchor" href="#a-standalone-binderhub-service-helm-chart">#&lt;/a>
&lt;/h2>&lt;p>The default upstream
&lt;a href="https://github.com/jupyterhub/binderhub/tree/main/helm-chart" target="_blank" rel="noopener" >BinderHub helm chart&lt;/a> &lt;em>includes&lt;/em> a JupyterHub as a dependency, and configures itself to be used primarily in a manner similar to
&lt;a href="https://mybinder.org" target="_blank" rel="noopener" >mybinder.org&lt;/a>. As the person who helped make that choice early on, I can tell you why it was made - for convenience! And it &lt;em>was&lt;/em> very convenient, as it allowed us to get mybinder.org going fast. However, it makes it difficult to install a BinderHub service &lt;em>alongside&lt;/em> an existing JupyterHub. To this end, we have created a standalone
&lt;a href="https://github.com/2i2c-org/binderhub-service/" target="_blank" rel="noopener" >BinderHub helm chart&lt;/a>, designed to be installed &lt;em>alongside&lt;/em> an existing JupyterHub, so we can use it &lt;em>purely&lt;/em> to build images. This allows the BinderHub instance to be used as a
&lt;a href="https://jupyterhub.readthedocs.io/en/stable/reference/services.html" target="_blank" rel="noopener" >JupyterHub Service&lt;/a>, which is what we want.&lt;/p>
&lt;p>While this helm chart is currently under the 2i2c GitHub org, the hope is that it can eventually migrate to a
&lt;a href="https://github.com/jupyterhub/team-compass/issues/519" target="_blank" rel="noopener" >jupyterhub-contrib&lt;/a> organization (once it is created), or it can become the upstream helm chart for BinderHub if enough work can be done in BinderHub to allow it to serve use cases like mybinder.org.&lt;/p>
&lt;p>As part of this work, we also added a way for BinderHub to run in
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1647" target="_blank" rel="noopener" >API only mode&lt;/a>, so we can fully turn off the UI &lt;em>and&lt;/em> launching ability of BinderHub. This change decoupled the
&lt;a href="#what-does-a-binderhub-do-exactly" >three responsibilities of BinderHub&lt;/a> we discussed previously, allowing us to bring our own UI and JupyterHub. BinderHub could now be used &lt;em>purely&lt;/em> for its scalable image building features, which is exactly what we want!&lt;/p>
&lt;h2 id="sustainably-extending-kubespawners-profilelist">
Sustainably extending KubeSpawner&amp;rsquo;s &lt;code>profileList&lt;/code>
&lt;a class="header-anchor" href="#sustainably-extending-kubespawners-profilelist">#&lt;/a>
&lt;/h2>&lt;p>We identified KubeSpawner&amp;rsquo;s &lt;code>profileList&lt;/code> feature as the ideal location for the UI to dynamically build environments (container images). There, a dynamically built image becomes just another &amp;lsquo;environment choice&amp;rsquo; that people can pick, alongside the resources their server needs. From an end-user perspective, it was also the logical place to specify a repository to build into an environment. Users could already choose some pre-built environments there, as well as other resources such as memory or GPUs. From a maintainer perspective, building on an existing feature rather than adding a separate one helps with the long-term maintenance of the JupyterHub projects.&lt;/p>
&lt;p>However, the implementation of &lt;code>profileList&lt;/code> was not easy to extend at the time. So
&lt;a href="https://github.com/jupyterhub/kubespawner/pull/724" target="_blank" rel="noopener" >this PR&lt;/a> made it easier to extend in more complex ways, without making the implementation in KubeSpawner itself complicated. This PR had &lt;em>no&lt;/em> visible effect for end users. Even so, it was an extremely important step that let us experiment with the UI in a &lt;em>sustainable&lt;/em> way, without having to rely on upstream. These kinds of changes can sometimes be hard to sell to stakeholders, but they are extremely important for a continuous and sustainable relationship with upstream.&lt;/p>
&lt;h2 id="implementing-unlisted_choice-feature-in-kubespawner">
Implementing &lt;code>unlisted_choice&lt;/code> feature in KubeSpawner
&lt;a class="header-anchor" href="#implementing-unlisted_choice-feature-in-kubespawner">#&lt;/a>
&lt;/h2>&lt;p>The &lt;code>profileList&lt;/code> feature was built to allow JupyterHub &lt;em>admins&lt;/em> to specify an explicit list of container images the end user can choose from. It had no way to use a choice that was &lt;em>not&lt;/em> pre-approved by the admin. We needed that ability because the BinderHub API builds a new Docker image for each environment a user wants, so that image cannot be chosen from a pre-approved list. We had to add this feature to KubeSpawner safely, and in a way that was generally useful to everyone. Many other communities had been asking for such a feature anyway: the ability to simply &amp;lsquo;type in&amp;rsquo; an image and have that be used.&lt;/p>
&lt;p>
&lt;a href="https://www.earthdata.nasa.gov/esds/veda" target="_blank" rel="noopener" >NASA VEDA&lt;/a> was one such community, so we partnered with
&lt;a href="https://github.com/batpad/" target="_blank" rel="noopener" >Sanjay Bhangar&lt;/a> from
&lt;a href="https://developmentseed.org/" target="_blank" rel="noopener" >Development Seed&lt;/a> (an organization that helps run NASA VEDA) to implement this feature. Engineers from 2i2c contributed heavily to this feature as well, and after &lt;em>several&lt;/em> PRs (
&lt;a href="https://github.com/jupyterhub/kubespawner/pull/735" target="_blank" rel="noopener" >1&lt;/a>,
&lt;a href="https://github.com/jupyterhub/kubespawner/pull/766" target="_blank" rel="noopener" >2&lt;/a>,
&lt;a href="https://github.com/jupyterhub/kubespawner/pull/773" target="_blank" rel="noopener" >3&lt;/a>,
&lt;a href="https://github.com/jupyterhub/kubespawner/pull/774" target="_blank" rel="noopener" >4&lt;/a> and
&lt;a href="https://github.com/jupyterhub/kubespawner/pull/777" target="_blank" rel="noopener" >5&lt;/a>), this feature is now available for everyone to use!&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./screenshot-featured.png" alt="Screenshot of Kubernetes Profiles with Unlisted Choice" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>A key component of doing &lt;em>sustainable&lt;/em> upstream work is that every addition needs to be useful on its own to a broad group of people. This change was very helpful for many communities that wanted to give their users the freedom to pick whatever image they want, regardless of whether they wanted to use dynamic image building or not. The broad interest allowed us to build a coalition with other interested parties, and get the change accepted upstream more easily!&lt;/p>
&lt;h2 id="jupyterhub-fancy-profiles">
&lt;code>jupyterhub-fancy-profiles&lt;/code>
&lt;a class="header-anchor" href="#jupyterhub-fancy-profiles">#&lt;/a>
&lt;/h2>&lt;p>Once we had all these pieces in place, it was time to actually work on the frontend UI that would allow users to build images dynamically and launch them. Since this will replace the default &lt;code>profileList&lt;/code> UI, it should also let users select different resources (RAM, CPU, etc.) as needed, as well as type in an existing image if they desire. So it was a full re-implementation of the &lt;code>profileList&lt;/code> frontend.&lt;/p>
&lt;p>This is ongoing now at the
&lt;a href="https://github.com/yuvipanda/jupyterhub-fancy-profiles" target="_blank" rel="noopener" >jupyterhub-fancy-profiles&lt;/a> project. It is a pure frontend web application, using modern frontend tooling (
&lt;a href="https://react.dev/" target="_blank" rel="noopener" >React&lt;/a>,
&lt;a href="https://webpack.js.org/" target="_blank" rel="noopener" >webpack&lt;/a>,
&lt;a href="https://babeljs.io/" target="_blank" rel="noopener" >Babel&lt;/a>, etc) and written in JavaScript. It&amp;rsquo;s gone through a few revisions, but the demo provided earlier in the blog post is in its current state. Because the default profileList implementation is pure HTML / CSS with very &lt;em>minimal&lt;/em> JS, it is limited in what kind of UX it could have. &lt;code>jupyterhub-fancy-profiles&lt;/code> aims to be very helpful &lt;em>even&lt;/em> when dynamic image-building features are not enabled on a JupyterHub. We hope to roll this out to a few JupyterHubs and improve it over time based on feedback.&lt;/p>
&lt;h2 id="jupyterhubbinderhub-clienthttpswwwnpmjscompackagejupyterhubbinderhub-client-npm-package">
&lt;a href="https://www.npmjs.com/package/@jupyterhub/binderhub-client" target="_blank" rel="noopener" >&lt;code>jupyterhub/@binderhub-client&lt;/code>&lt;/a> npm package
&lt;a class="header-anchor" href="#jupyterhubbinderhub-clienthttpswwwnpmjscompackagejupyterhubbinderhub-client-npm-package">#&lt;/a>
&lt;/h2>&lt;p>While building &lt;code>jupyterhub-fancy-profiles&lt;/code>, we wanted to use the &lt;em>same&lt;/em> JavaScript code that the BinderHub frontend uses to interact with the BinderHub API, instead of re-implementing it. However, the existing BinderHub JavaScript code was not easily consumable by external projects. We refactored the code, added tests, adopted modern JS practices and published the
&lt;a href="https://www.npmjs.com/package/@jupyterhub/binderhub-client" target="_blank" rel="noopener" >&lt;code>@jupyterhub/binderhub-client&lt;/code> NPM package&lt;/a>. It can be used not just by &lt;code>jupyterhub-fancy-profiles&lt;/code> but by any external project that needs to talk to the BinderHub API.&lt;/p>
&lt;p>This had to be done in such a way that current BinderHub installations (such as mybinder.org) do not break. That took quite a few pull requests:
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1689" target="_blank" rel="noopener" >1&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1693" target="_blank" rel="noopener" >2&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1694" target="_blank" rel="noopener" >3&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1741" target="_blank" rel="noopener" >4&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1742" target="_blank" rel="noopener" >5&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1758" target="_blank" rel="noopener" >6&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1761" target="_blank" rel="noopener" >7&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1771" target="_blank" rel="noopener" >8&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1773" target="_blank" rel="noopener" >9&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1775" target="_blank" rel="noopener" >10&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1778" target="_blank" rel="noopener" >11&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1779" target="_blank" rel="noopener" >12&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1781" target="_blank" rel="noopener" >13&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1782" target="_blank" rel="noopener" >14&lt;/a>,
&lt;a href="https://github.com/jupyterhub/binderhub/pull/1783" target="_blank" rel="noopener" >15&lt;/a>. This refactoring work was very helpful to us, and also appreciated by the broader community.&lt;/p>
&lt;h2 id="defending-against-cryptojacking-with-cryptnono">
Defending against cryptojacking with &lt;code>cryptnono&lt;/code>
&lt;a class="header-anchor" href="#defending-against-cryptojacking-with-cryptnono">#&lt;/a>
&lt;/h2>&lt;p>For Open Science to flourish, we need to allow access to resources without login / paywalls wherever possible. A new threat to this has been
&lt;a href="https://www.interpol.int/en/Crimes/Cybercrime/Cryptojacking" target="_blank" rel="noopener" >cryptojacking&lt;/a> - where attackers use up any and all available free compute to mine cryptocurrencies. This has affected &lt;em>many&lt;/em> folks on the internet, including
&lt;a href="https://www.bleepingcomputer.com/news/security/github-actions-being-actively-abused-to-mine-cryptocurrency-on-github-servers/" target="_blank" rel="noopener" >GitHub Actions&lt;/a> and mybinder.org, the primary public BinderHub installation. mybinder.org has some extra protections against cryptojacking that aren&amp;rsquo;t easily usable elsewhere, and this has unfortunately meant that the demo JupyterHubs we have with these features enabled have been behind a login wall. I personally believe login walls are long term antithetical to open science, and so this was an important problem to solve.&lt;/p>
&lt;p>
&lt;a href="https://github.com/cryptnono/cryptnono" target="_blank" rel="noopener" >cryptnono&lt;/a> is an open source project designed to help fight cryptojacking, and as part of this grant we ported some of this functionality out of mybinder.org specific code into cryptnono, so other deployments may also benefit from it! We also migrated to using the super efficient
&lt;a href="https://ebpf.io/" target="_blank" rel="noopener" >ebpf&lt;/a> Linux Kernel subsystem, allowing for more complex heuristics to catch a much broader range of cryptomining activity. We have been slowly tweaking the config on mybinder.org, and it has proven to be very effective! This will be very helpful for &lt;em>anyone&lt;/em> who wants to provide a JupyterHub (or any other computational service) without a login wall. If you are interested in using cryptnono in this fashion, please
&lt;a href="https://github.com/cryptnono/cryptnono/issues" target="_blank" rel="noopener" >reach out to us&lt;/a> so we can work together!&lt;/p>
&lt;h2 id="explored-pathways-that-were-then-discarded">
Explored pathways that were then discarded
&lt;a class="header-anchor" href="#explored-pathways-that-were-then-discarded">#&lt;/a>
&lt;/h2>&lt;p>Here are approaches we tried and then decided against:&lt;/p>
&lt;ul>
&lt;li>
&lt;a href="https://github.com/consideRatio/repo2docker-service" target="_blank" rel="noopener" >repo2docker-service&lt;/a>, a separate JupyterHub service that could &lt;em>only&lt;/em> build images. As we worked on it, we realized that it was replicating a lot of features that BinderHub already has, so we pivoted to working on BinderHub directly instead.&lt;/li>
&lt;li>Building off of
&lt;a href="https://github.com/plasmabio/tljh-repo2docker" target="_blank" rel="noopener" >tljh-repo2docker&lt;/a>. While this already had a nice UI, it would be hard to port it to run on a distributed Kubernetes environment without it becoming a &amp;lsquo;hard fork&amp;rsquo;.&lt;/li>
&lt;/ul>
&lt;p>While these did slow the project down, they have allowed us to be very confident that the methods we have chosen are sustainable in the long term.&lt;/p>
&lt;h2 id="want-to-try-this-out">
Want to try this out?
&lt;a class="header-anchor" href="#want-to-try-this-out">#&lt;/a>
&lt;/h2>&lt;p>We have a demo of this running at
&lt;a href="https://imagebuilding-demo.2i2c.cloud" target="_blank" rel="noopener" >imagebuilding-demo.2i2c.cloud&lt;/a>, but unfortunately as we are still fine-tuning &lt;code>cryptnono&lt;/code> config, at this moment it is not open to the public. Please
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/blog/jupyterhub-binderhub-gesis/mailto:yuvipanda@2i2c.org" >contact me&lt;/a> with your GitHub account if you want access, and promise to not be a cryptominer and you shall be granted access.&lt;/p>
&lt;p>Want to set this up on your own JupyterHub? There is some
&lt;a href="https://github.com/2i2c-org/binderhub-service/pull/72" target="_blank" rel="noopener" >work in progress&lt;/a> documentation and more is being worked on. Drop a line in the linked pull request and we&amp;rsquo;ll be happy to help. The eventual goal is for &lt;em>anyone&lt;/em> to be able to simply follow documentation and set this up for themselves.&lt;/p>
&lt;p>We also have user facing documentation on using this service on
&lt;a href="https://docs.2i2c.org/user/environment/dynamic-imagebuilding#dynamic-image-building" target="_blank" rel="noopener" >docs.2i2c.org&lt;/a>.&lt;/p>
&lt;h2 id="future-work">
Future work
&lt;a class="header-anchor" href="#future-work">#&lt;/a>
&lt;/h2>&lt;p>This work is not complete, of course, and there is a lot more to be done.&lt;/p>
&lt;ol>
&lt;li>mybinder.org also helps you distribute your &lt;em>content&lt;/em>, not just the environment for your code to run in. Since JupyterHub usually comes with a persistent home directory for the user,
&lt;a href="https://github.com/jupyterhub/nbgitpuller/" target="_blank" rel="noopener" >nbgitpuller&lt;/a> is commonly used for this purpose instead. We should explore ways to integrate nbgitpuller (and other ways to distribute content) in the future.&lt;/li>
&lt;li>More thorough documentation for how you can recreate what is in the demo for yourself in your own JupyterHub installation.&lt;/li>
&lt;li>Better UX for specifying images, including figuring out how to &amp;lsquo;save&amp;rsquo; them for future reuse.&lt;/li>
&lt;li>Better compatibility with mybinder.org, particularly in allowing other sources of environments (not just GitHub, but Zenodo, raw git repositories, etc) and URL compatibility.&lt;/li>
&lt;li>Better authentication workflow between the frontend and the BinderHub API.&lt;/li>
&lt;/ol>
&lt;h2 id="credit">
Credit
&lt;a class="header-anchor" href="#credit">#&lt;/a>
&lt;/h2>&lt;p>All this work would not be possible without a large group of collaborators!&lt;/p>
&lt;ul>
&lt;li>From 2i2c:
&lt;a href="https://github.com/consideRatio" target="_blank" rel="noopener" >Erik Sundell&lt;/a>,
&lt;a href="https://github.com/GeorgianaElena" target="_blank" rel="noopener" >Georgiana Elena&lt;/a>,
&lt;a href="https://words.yuvi.in/" target="_blank" rel="noopener" >Yuvi&lt;/a>,
&lt;a href="https://github.com/jmunroe" target="_blank" rel="noopener" >James Munroe&lt;/a>, and
&lt;a href="https://github.com/damianavila" target="_blank" rel="noopener" >Damián Avila&lt;/a>.&lt;/li>
&lt;li>The
&lt;a href="https://github.com/gesiscss/persistent_BinderHub/" target="_blank" rel="noopener" >persistent BinderHub&lt;/a> project was the direct inspiration for all this work, with particular thanks to
&lt;a href="https://github.com/bitnik" target="_blank" rel="noopener" >Kenan Erdogan&lt;/a>.&lt;/li>
&lt;li>The
&lt;a href="https://github.com/plasmabio/tljh-repo2docker" target="_blank" rel="noopener" >tljh-repo2docker&lt;/a> project, which explores similar ideas in the context of running only on a single node.&lt;/li>
&lt;li>The broad JupyterHub and MyBinder.org community, particularly
&lt;a href="https://github.com/manics" target="_blank" rel="noopener" >Simon Li&lt;/a> and
&lt;a href="https://github.com/minrk/" target="_blank" rel="noopener" >MinRK&lt;/a>.&lt;/li>
&lt;li>Funding generously provided by
&lt;a href="http://gesis.org" target="_blank" rel="noopener" >GESIS&lt;/a> in cooperation with NFDI4DS (project number:
&lt;a href="https://gepris.dfg.de/gepris/projekt/460234259?context=projekt&amp;amp;task=showDetail&amp;amp;id=460234259&amp;amp;" target="_blank" rel="noopener" >460234259&lt;/a>) and
&lt;a href="https://www.cessda.eu" target="_blank" rel="noopener" >CESSDA&lt;/a>.&lt;/li>
&lt;li>
&lt;a href="https://www.gesis.org/en/institute/staff/person/arnim.bleier" target="_blank" rel="noopener" >Arnim Bleier&lt;/a> from GESIS was &lt;em>instrumental&lt;/em> in making this project happen.&lt;/li>
&lt;/ul></description></item><item><title>A QGIS desktop in the cloud with JupyterHub</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/qgis-greenland/</link><pubDate>Sat, 05 Aug 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/qgis-greenland/</guid><description>&lt;p>
&lt;figure id="figure-the-qgreenland-researcher-workshophttpsqgreenland-workshop-2023-researchergithubio">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="QGreenland Researcher Workshop" srcset="
/blog/qgis-greenland/featured_hu9562f6f382c010541578bd7b61c7cb4a_187505_a085d691b13f577db24aead3ae21385c.webp 400w,
/blog/qgis-greenland/featured_hu9562f6f382c010541578bd7b61c7cb4a_187505_9bd1833b93cf6052c08ff407d4528940.webp 760w,
/blog/qgis-greenland/featured_hu9562f6f382c010541578bd7b61c7cb4a_187505_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/qgis-greenland/featured_hu9562f6f382c010541578bd7b61c7cb4a_187505_a085d691b13f577db24aead3ae21385c.webp"
width="760"
height="498"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
The
&lt;a href="https://qgreenland-workshop-2023-researcher.github.io/" target="_blank" rel="noopener" >QGreenland Researcher Workshop&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;/p>
&lt;p>JupyterHub is a versatile platform that can serve a desktop with Geospatial Information Systems (GIS) software in the cloud. This was demonstrated by the QGreenland Researcher Workshop that was hosted by the NASA
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/cryocloud/" >CryoCloud&lt;/a> hub. The hands-on workshop trained 25-30 researchers, from Germany, India, France, Canada, Poland and the United States, on how to work with geospatial data in an open science framework.&lt;/p>
&lt;h2 id="qgreenland-overview">
QGreenland Overview
&lt;a class="header-anchor" href="#qgreenland-overview">#&lt;/a>
&lt;/h2>&lt;p>
&lt;a href="https://qgreenland.org/" target="_blank" rel="noopener" >QGreenland&lt;/a> is an open-source geospatial data package designed for QGIS, a community-owned GIS platform. It focuses on Greenland, offering researchers and educators a comprehensive toolset for FAIR (findable, accessible, interoperable and reproducible) data analysis. The package integrates a variety of datasets into a single, easy-to-use data-viewing and analysis platform, supporting both offline and online use. This makes it particularly valuable for remote fieldwork and areas with limited internet access.&lt;/p>
&lt;h2 id="workshop-success">
Workshop Success
&lt;a class="header-anchor" href="#workshop-success">#&lt;/a>
&lt;/h2>&lt;p>The QGreenland workshop demonstrated several key benefits of using JupyterHub for cloud-based GIS:&lt;/p>
&lt;ul>
&lt;li>Accessibility: Participants from across the world could access the same powerful GIS tools through a web browser, eliminating the need for complex local installations while enhancing reproducibility.&lt;/li>
&lt;li>Cloud block storage: Provisioning each user with their own elastic block store disk, instead of relying on a traditional NFS file store, gave much faster data access, reducing load times from 5 minutes to under 3 seconds.&lt;/li>
&lt;li>Cost Efficiency: Utilizing the
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/collaborators/cryocloud/" >CryoCloud&lt;/a> JupyterHub instance managed by 2i2c drastically cut down setup costs and time, with only minimal cloud operating expenses of roughly $1/person/day.&lt;/li>
&lt;/ul>
&lt;h2 id="conclusion">
Conclusion
&lt;a class="header-anchor" href="#conclusion">#&lt;/a>
&lt;/h2>&lt;p>The success of the QGreenland workshop underscores the potential of integrating interactive software applications in JupyterHub. This approach not only democratizes access to advanced geospatial tools but also fosters a collaborative research environment. We look forward to supporting more workshops for QGreenland in the future!&lt;/p>
&lt;p>&lt;em>Want to know more? Check out the companion post by QGreenland on the
&lt;a href="https://blog.jupyter.org/desktop-gis-software-in-the-cloud-with-jupyterhub-ddced297019a" target="_blank" rel="noopener" >Jupyter Blog&lt;/a>&lt;/em>&lt;/p>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>
&lt;a href="https://cires.colorado.edu/people/trey-stafford" target="_blank" rel="noopener" >Trey Stafford&lt;/a>
&lt;a href="https://cires.colorado.edu/" target="_blank" rel="noopener" >(CIRES)&lt;/a>&lt;/li>
&lt;li>
&lt;a href="https://cires.colorado.edu/people/matthew-fisher" target="_blank" rel="noopener" >Matthew Fisher&lt;/a>
&lt;a href="https://cires.colorado.edu/" target="_blank" rel="noopener" >(CIRES)&lt;/a>&lt;/li>
&lt;li>*Fisher, M., *T. Stafford, T. Moon, and A. Thurber (2023). QGreenland (v3) [software], National Snow and Ice Data Center.&lt;/li>
&lt;li>Snow, Tasha, Millstein, Joanna, Scheick, Jessica, Sauthoff, Wilson, Leong, Wei Ji, Colliander, James, Pérez, Fernando, Munroe, James, Felikson, Denis, Sutterley, Tyler, &amp;amp; Siegfried, Matthew. (2023).
&lt;a href="https://book.cryointhecloud.com" target="_blank" rel="noopener" >CryoCloud JupyterBook&lt;/a> (2023.01.26). Zenodo.
&lt;a href="https://doi.org/10.5281/zenodo.7576602" target="_blank" rel="noopener" >10.5281/zenodo.7576602&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>* Denotes co-equal lead authorship&lt;/p></description></item><item><title>CILogon usage at 2i2c</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/cilogon-integration/</link><pubDate>Fri, 24 Feb 2023 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/cilogon-integration/</guid><description>
&lt;h2 id="about-cilogon">
About CILogon
&lt;a class="header-anchor" href="#about-cilogon">#&lt;/a>
&lt;/h2>&lt;p>
&lt;a href="https://www.cilogon.org" target="_blank" rel="noopener" >CILogon&lt;/a> is an open source service provider that allows users to log in against over 4000 various identity providers, including campus identity providers. The available identity providers are members of
&lt;a href="https://incommon.org/federation/" target="_blank" rel="noopener" >InCommon&lt;/a>, a federation of universities and other organizations that provide single sign-on access to various resources.&lt;/p>
&lt;h2 id="cilogon-and-2i2c">
CILogon and 2i2c
&lt;a class="header-anchor" href="#cilogon-and-2i2c">#&lt;/a>
&lt;/h2>&lt;p>For the past year, 2i2c has been successfully using CILogon for more than fifteen of the hubs it manages.&lt;/p>
&lt;p>Currently, most of the hubs that use it are hubs for communities in education that want to manage their hub access through their own institutional providers.&lt;/p>
&lt;p>By using a tool like CILogon, we allow hub access to be managed not only through the communities&amp;rsquo; institutional providers, but also through social providers like GitHub and Google. Because both authentication mechanisms can coexist, there&amp;rsquo;s no need to provide specific credentials for 2i2c staff in order to have access to the hub. This reduces both the burden on institutions&amp;rsquo; IT departments and the complexity of a hub deployment.&lt;/p>
&lt;p>Moreover, as we migrate away from our current Auth0 setup, we expect the number of hubs using CILogon to increase further over the coming year.&lt;/p>
&lt;h2 id="the-setup">
The setup
&lt;a class="header-anchor" href="#the-setup">#&lt;/a>
&lt;/h2>&lt;p>The setup that 2i2c uses is based on two important tools: the CILogon administrative client and the JupyterHub CILogonOAuthenticator.&lt;/p>
&lt;h3 id="the-cilogon-administrative-client">
The CILogon administrative client
&lt;a class="header-anchor" href="#the-cilogon-administrative-client">#&lt;/a>
&lt;/h3>&lt;p>The
&lt;a href="https://cilogon.github.io/oa4mp/server/manuals/dynamic-client-registration.html" target="_blank" rel="noopener" >2i2c administrative client&lt;/a> provided by CILogon allowed us to automatically manage the CILogon OAuth applications needed for authenticating into the hub.&lt;/p>
&lt;p>For each hub that uses CILogon, we dynamically create an OAuth
&lt;a href="https://cilogon.github.io/oa4mp/server/manuals/dynamic-client-registration.html" target="_blank" rel="noopener" >client application&lt;/a> in CILogon and store the credentials safely, using the script at
&lt;a href="https://github.com/2i2c-org/infrastructure/blob/3312f373f0aa59fbc98dc1c8161aa9623b68726b/deployer/cilogon_app.py" target="_blank" rel="noopener" >cilogon_app.py&lt;/a>. The script can also used for &lt;code>updating&lt;/code> the callback URLs of an existing OAuth application, &lt;code>deleting&lt;/code> a CILogon OAuth application when a hub is removed or changes authentication methods, &lt;code>getting&lt;/code> details about an existing OAuth application, &lt;code>getting all&lt;/code> existing 2i2c CILogon OAuth applications.&lt;/p>
&lt;h3 id="the-jupyterhub-cilogonoauthenticator">
The JupyterHub CILogonOAuthenticator
&lt;a class="header-anchor" href="#the-jupyterhub-cilogonoauthenticator">#&lt;/a>
&lt;/h3>&lt;p>For CILogon&amp;rsquo;s integration with JupyterHub&amp;rsquo;s authentication workflow, we&amp;rsquo;re using the
&lt;a href="https://github.com/jupyterhub/oauthenticator/blob/main/oauthenticator/cilogon.py" target="_blank" rel="noopener" >&lt;strong>CILogonOAuthenticator&lt;/strong>&lt;/a>, which is part of the
&lt;a href="https://oauthenticator.readthedocs.io/en/latest/" target="_blank" rel="noopener" >JupyterHub OAuthenticator project&lt;/a>. This is what allows JupyterHub to use common OAuth providers for authentication, and it&amp;rsquo;s also a base for writing other Authenticators with any OAuth 2.0 provider.&lt;/p>
&lt;p>As part of 2i2c&amp;rsquo;s integration with the JupyterHub CILogonOAuthenticator, several important upstream fixes and enhancements to
&lt;a href="https://github.com/jupyterhub/oauthenticator" target="_blank" rel="noopener" >&lt;code>oauthenticator&lt;/code>&lt;/a> were identified and made. For example, the
&lt;a href="https://github.com/jupyterhub/oauthenticator/security/advisories/GHSA-r7v4-jwx9-wx43" target="_blank" rel="noopener" >GHSA-r7v4-jwx9-wx43&lt;/a> vulnerability was reported and fixed, and a
&lt;a href="https://oauthenticator.readthedocs.io/en/latest/how-to/migrations/upgrade-to-15.html" target="_blank" rel="noopener" >migration guide&lt;/a> was written that describes the breaking changes and walks users step by step through updating their use of the JupyterHub CILogonOAuthenticator.&lt;/p>
&lt;p>Read more about how CILogon is set up for use at 2i2c in
&lt;a href="https://infrastructure.2i2c.org/hub-deployment-guide/configure-auth/cilogon.html" target="_blank" rel="noopener" >the docs&lt;/a>.&lt;/p>
&lt;h2 id="celebration">
Celebration
&lt;a class="header-anchor" href="#celebration">#&lt;/a>
&lt;/h2>&lt;p>Thanks to the partnership between 2i2c and CILogon, this past year we were able to integrate CILogon into 2i2c&amp;rsquo;s infrastructure and to see its importance, its usefulness, and the great support it provides for 2i2c and the communities we serve.&lt;/p>
&lt;p>We are now happy to announce that the partnership between 2i2c and CILogon has been extended for another year!&lt;/p>
&lt;p>&lt;strong>Acknowledgements&lt;/strong>: The upstream
&lt;a href="https://oauthenticator.readthedocs.io/en/latest" target="_blank" rel="noopener" >&lt;code>jupyterhub-oauthenticator&lt;/code>&lt;/a> project mentioned in this post as being used at 2i2c is a JupyterHub package, kindly developed and maintained by the
&lt;a href="https://discourse.jupyter.org/c/jupyterhub/" target="_blank" rel="noopener" >JupyterHub community&lt;/a> and the 2i2c integration described was developed by
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/organization/" >the 2i2c engineering team&lt;/a>. Also, this post was edited by
&lt;a href="https://jbasney.net/" target="_blank" rel="noopener" >Jim Basney&lt;/a>.&lt;/p></description></item><item><title>Tech update: Multiple JupyterHubs, multiple clusters, one repository.</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/ci-cd-improvements/</link><pubDate>Tue, 19 Apr 2022 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/ci-cd-improvements/</guid><description>&lt;p>2i2c manages the configuration and deployment of multiple Kubernetes clusters and JupyterHubs from
&lt;a href="https://github.com/2i2c-org/infrastructure" target="_blank" rel="noopener" >a single open infrastructure repository&lt;/a>.
This is a challenging problem, as it requires us to centralize information about a number of &lt;em>independent&lt;/em> cloud services, and deploy them in an efficient and reliable manner.
Our initial attempt at this had a number of inefficiencies, and we recently completed an overhaul of our configuration and deployment infrastructure.&lt;/p>
&lt;p>This post is a short description of what we did and the benefit that it had.
It covers the technical details and provides links to more information about our deployment setup.
We hope that it helps other organizations make similar improvements to their own infrastructure.&lt;/p>
&lt;h2 id="our-problem">
Our problem
&lt;a class="header-anchor" href="#our-problem">#&lt;/a>
&lt;/h2>&lt;p>2i2c&amp;rsquo;s problem is similar to that of many large organizations that have independent sub-communities within them.
We must centralize the operation and configuration of JupyterHubs in order to boost our efficiency in developing and operating them, but must also treat these hubs &lt;em>independently&lt;/em> because their user communities are not necessarily related, and because we want communities to
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/right-to-replicate/" >be able to replicate their infrastructure on their own&lt;/a>.&lt;/p>
&lt;p>A year ago, we built the first version of our deployment infrastructure at
&lt;a href="https://github.com/2i2c-org/infrastructure" target="_blank" rel="noopener" >&lt;i class='fa-brands fa-github'>&lt;/i> github.com/2i2c-org/infrastructure&lt;/a>.
Over the last year of operation, we identified a number of major shortcomings:&lt;/p>
&lt;ul>
&lt;li>Within a Kubernetes cluster, we deployed hubs sequentially, not in parallel. This grew out of the common practice of
&lt;a href="https://sre.google/workbook/canarying-releases/" target="_blank" rel="noopener" >canary deployments&lt;/a>, which allowed us to test changes on a &lt;strong>staging hub&lt;/strong> before rolling them out to a &lt;strong>production hub&lt;/strong>, but it also meant that production hubs waited on one another even though they were independent.&lt;/li>
&lt;li>We used a single configuration file for all hubs within a cluster, which led to confusion and difficulty in identifying a hub-specific configuration.&lt;/li>
&lt;li>Moreover, any change to a hub within a cluster caused a re-deploy of &lt;em>all hubs on that cluster&lt;/em>. This is because we did not know whether a given change touched cluster-wide configuration or hub-specific configuration.&lt;/li>
&lt;/ul>
&lt;h2 id="our-goal">
Our goal
&lt;a class="header-anchor" href="#our-goal">#&lt;/a>
&lt;/h2>&lt;p>We spent several weeks discussing a plan to resolve these major problems. Here were our goals:&lt;/p>
&lt;ul>
&lt;li>We should be able to &lt;strong>upgrade a specific hub&lt;/strong> alone, by inspecting which configuration files have been added or modified.&lt;/li>
&lt;li>&lt;strong>Production hubs should be upgraded in parallel&lt;/strong>, since they run independently of one another.&lt;/li>
&lt;li>We should &lt;strong>use staging hubs as &amp;ldquo;canary&amp;rdquo; deployments&lt;/strong> and not continue upgrading production hubs if the staging hub fails.&lt;/li>
&lt;/ul>
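&lt;p>The last two goals amount to an ordering rule: deploy the staging hub first as a canary, then deploy the production hubs in parallel only if the canary succeeds. The in-process sketch below illustrates that rule with Python&amp;rsquo;s &lt;code>concurrent.futures&lt;/code>. It is not how 2i2c implements it (the workflow described later uses GitHub Actions jobs instead), and &lt;code>deploy_hub&lt;/code> is a hypothetical stand-in for a real Helm upgrade.&lt;/p>

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_hub(cluster, hub):
    """Hypothetical stand-in for a real Helm upgrade; returns True on success."""
    print(f"deploying {cluster}/{hub}")
    return True


def deploy_cluster(cluster, prod_hubs, deploy=deploy_hub):
    """Deploy the staging canary, then the prod hubs in parallel if it succeeded."""
    if not deploy(cluster, "staging"):
        # Canary failed: skip every production hub on this cluster.
        return {}
    # Production hubs are independent of each other, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        futures = {hub: pool.submit(deploy, cluster, hub) for hub in prod_hubs}
    # Leaving the with-block waits for all submitted deploys to finish.
    return {hub: future.result() for hub, future in futures.items()}


print(deploy_cluster("example", ["prod-a", "prod-b"]))
```

&lt;p>Passing &lt;code>deploy&lt;/code> as a parameter keeps the ordering logic separate from how a single hub is actually deployed, which also makes the canary rule easy to test with a fake deploy function.&lt;/p>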
&lt;h2 id="an-overview-of-our-changes">
An overview of our changes
&lt;a class="header-anchor" href="#an-overview-of-our-changes">#&lt;/a>
&lt;/h2>&lt;p>To accomplish this, we needed to identify which hubs required an upgrade based on the files added or modified in a Pull Request.
This took a lot of discussion and iteration on design, and so we share it below in the hopes that it is helpful to others!&lt;/p>
&lt;h3 id="improvements-to-our-code-and-structure">
Improvements to our code and structure
&lt;a class="header-anchor" href="#improvements-to-our-code-and-structure">#&lt;/a>
&lt;/h3>&lt;p>We made a few major changes to
&lt;a href="https://github.com/2i2c-org/infrastructure" target="_blank" rel="noopener" >the infrastructure repository&lt;/a> to facilitate the deployment logic described above.
Here are the major changes we implemented:&lt;/p>
&lt;ul>
&lt;li>We separated each hub&amp;rsquo;s configuration into its own file, or set of files. For example,
&lt;a href="https://github.com/2i2c-org/infrastructure/blob/master/config/clusters/2i2c/staging.values.yaml" target="_blank" rel="noopener" >here is 2i2c&amp;rsquo;s &lt;code>staging&lt;/code> hub configuration&lt;/a>.&lt;/li>
&lt;li>We created a separate &lt;code>cluster.yaml&lt;/code> file that holds the canonical list of hubs deployed to that cluster and the configuration file(s) associated with each one. For example,
&lt;a href="https://github.com/2i2c-org/infrastructure/blob/master/config/clusters/2i2c/cluster.yaml" target="_blank" rel="noopener" >here is 2i2c&amp;rsquo;s GKE cluster configuration&lt;/a>, which contains a reference to the previously mentioned
&lt;a href="https://github.com/2i2c-org/infrastructure/blob/master/config/clusters/2i2c/cluster.yaml#L14-L26" target="_blank" rel="noopener" >staging hub&lt;/a>.&lt;/li>
&lt;li>We updated
&lt;a href="https://github.com/2i2c-org/infrastructure/tree/master/deployer" target="_blank" rel="noopener" >our deployer module&lt;/a> to do the following things:
&lt;ul>
&lt;li>Inspect the list of files modified in a Pull Request.&lt;/li>
&lt;li>From this list, calculate the names of the hubs that require an upgrade, and the name of each hub&amp;rsquo;s cluster.&lt;/li>
&lt;li>Trigger a GitHub Actions workflow that deploys changes in parallel for each cluster/hub pair.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
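&lt;p>A minimal sketch of the file-to-hub step might look like the following. It assumes the layout described above, with each cluster&amp;rsquo;s files directly under &lt;code>config/clusters/CLUSTER/&lt;/code>, and that each hub entry in an already-parsed &lt;code>cluster.yaml&lt;/code> lists its configuration files under a &lt;code>helm_chart_values_files&lt;/code> key. The function name, that key and the sample cluster are assumptions for illustration, not the deployer&amp;rsquo;s real code. The sketch also treats any edit to &lt;code>cluster.yaml&lt;/code> as touching every hub on that cluster.&lt;/p>

```python
from pathlib import PurePosixPath

# Hypothetical, already-parsed cluster.yaml contents keyed by cluster name.
CLUSTERS = {
    "example": {
        "hubs": [
            {"name": "staging", "helm_chart_values_files": ["staging.values.yaml"]},
            {"name": "prod", "helm_chart_values_files": ["prod.values.yaml"]},
        ]
    }
}


def hubs_to_upgrade(changed_files, clusters):
    """Map changed file paths to the (cluster, hub) pairs that need a deploy."""
    targets = set()
    for path in map(PurePosixPath, changed_files):
        parts = path.parts
        # Only files directly under config/clusters/CLUSTER/ are considered here.
        if len(parts) != 4 or parts[:2] != ("config", "clusters"):
            continue
        cluster_name, filename = parts[2], parts[3]
        cluster = clusters.get(cluster_name)
        if cluster is None:
            continue
        for hub in cluster["hubs"]:
            # Assumption: a cluster.yaml edit may affect any hub on the cluster.
            if filename == "cluster.yaml" or filename in hub["helm_chart_values_files"]:
                targets.add((cluster_name, hub["name"]))
    return sorted(targets)


changed = ["config/clusters/example/staging.values.yaml", "docs/index.md"]
print(hubs_to_upgrade(changed, CLUSTERS))  # [('example', 'staging')]
```

&lt;p>Because each hub&amp;rsquo;s files are listed explicitly in &lt;code>cluster.yaml&lt;/code>, a change to one hub&amp;rsquo;s values file selects only that hub, instead of re-deploying every hub on the cluster.&lt;/p>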
&lt;p>In addition to these structural and code changes, we also developed new GitHub Actions workflows that control the entire process.&lt;/p>
&lt;h3 id="a-github-actions-workflow-for-upgrading-our-jupyterhubs">
A GitHub Actions workflow for upgrading our JupyterHubs
&lt;a class="header-anchor" href="#a-github-actions-workflow-for-upgrading-our-jupyterhubs">#&lt;/a>
&lt;/h3>&lt;p>We defined a new GitHub Actions workflow that carries out the logic described above.
Its jobs are all defined in
&lt;a href="https://github.com/2i2c-org/infrastructure/blob/master/.github/workflows/deploy-hubs.yaml" target="_blank" rel="noopener" >this &lt;code>deploy-hubs.yaml&lt;/code> configuration file&lt;/a>.
Here are the major jobs in this workflow, and what each does:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;code>generate-jobs&lt;/code>: Generate a list of clusters/hubs that must be upgraded, given the files that are changed in a Pull Request.&lt;/p>
&lt;ul>
&lt;li>Evaluate an input list of added/modified files in a PR&lt;/li>
&lt;li>Decide if the added/modified files warrant an upgrade of a hub&lt;/li>
&lt;li>Generate a list of hubs and clusters that require upgrades, and some extra details:
&lt;ul>
&lt;li>Does the support chart that is deployed to the cluster also need an upgrade?&lt;/li>
&lt;li>Does a staging hub on this cluster require an upgrade?&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>This produces two outputs that are used in subsequent steps:&lt;/p>
&lt;ul>
&lt;li>A &lt;strong>human-readable table&lt;/strong> including information on &lt;em>why&lt;/em> a given deployment requires an upgrade (using the excellent
&lt;a href="https://github.com/Textualize/rich" target="_blank" rel="noopener" >Rich library&lt;/a>).&lt;/li>
&lt;li>&lt;strong>JSON outputs&lt;/strong> that can be interpreted by GitHub Actions as sets of matrix jobs to run.&lt;/li>
&lt;/ul>
&lt;figure id="figure-our-staging-and-support-hub-job-matrix-tells-github-actions-to-deploy-staging-and-support-upgrades-that-act-as-canaries-and-stop-production-deploys-if-they-fail">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our staging and support hub job matrix tells GitHub Actions to deploy staging and support upgrades that act as canaries and stop production deploys if they fail." srcset="
/blog/ci-cd-improvements/images/staging-hub-matrix_hu7a1bb3fb06e3f581f944c2d267a10ff9_107479_c22eca1370111aa2970fd6f3a1e28585.webp 400w,
/blog/ci-cd-improvements/images/staging-hub-matrix_hu7a1bb3fb06e3f581f944c2d267a10ff9_107479_c450c36b33a99013d3cbbbf4d20f017f.webp 760w,
/blog/ci-cd-improvements/images/staging-hub-matrix_hu7a1bb3fb06e3f581f944c2d267a10ff9_107479_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/ci-cd-improvements/images/staging-hub-matrix_hu7a1bb3fb06e3f581f944c2d267a10ff9_107479_c22eca1370111aa2970fd6f3a1e28585.webp"
width="760"
height="529"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Our staging and support hub job matrix tells GitHub Actions to deploy staging and support upgrades that act as canaries and stop production deploys if they fail.
&lt;/figcaption>&lt;/figure>
&lt;/li>
&lt;li>
&lt;p>&lt;code>upgrade-support-and-staging&lt;/code>: Upgrade the support chart and the staging hub on each cluster. The support chart is a &amp;ldquo;shared infrastructure&amp;rdquo; Helm chart that controls services shared across all hubs on the cluster, and the staging hub acts as the canary for that cluster&amp;rsquo;s production hubs.&lt;/p>
&lt;ul>
&lt;li>Accepts the JSON list described above to determine what to do next&lt;/li>
&lt;li>Parallelises over clusters&lt;/li>
&lt;li>Upgrades each cluster&amp;rsquo;s support chart if required&lt;/li>
&lt;li>Upgrades a staging hub for the cluster if required (for canary deployments, this is always required if at least one production hub is to be upgraded on the cluster)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;code>filter-generate-jobs&lt;/code>: Allows us to treat the support / staging hubs as canary deployments for all the production hubs on a cluster.&lt;/p>
&lt;ul>
&lt;li>If a staging/support deploy fails, removes any production hub jobs for the corresponding cluster.&lt;/li>
&lt;li>Allows production deploys to continue on &lt;em>other clusters&lt;/em>.&lt;/li>
&lt;/ul>
&lt;figure id="figure-our-production-hub-job-matrix-tells-github-actions-which-hubs-to-update-with-new-changes-these-are-triggered-if-a-clusters-stagingsupport-job-does-not-fail">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Our production hub job matrix tells GitHub Actions which hubs to update with new changes. These are triggered if a cluster&amp;#39;s staging/support job does not fail." srcset="
/blog/ci-cd-improvements/images/prod-hub-matrix_huad3521b0ae4afb8512dab5e3fdf016b6_36691_e0646a77211fee9ce2bb65237f8949ce.webp 400w,
/blog/ci-cd-improvements/images/prod-hub-matrix_huad3521b0ae4afb8512dab5e3fdf016b6_36691_f5a5462797c024cb828e58497c4a1c1d.webp 760w,
/blog/ci-cd-improvements/images/prod-hub-matrix_huad3521b0ae4afb8512dab5e3fdf016b6_36691_1200x1200_fit_q75_h2_lanczos_3.webp 1200w"
src="https://deploy-preview-609--2i2c-org.netlify.app/blog/ci-cd-improvements/images/prod-hub-matrix_huad3521b0ae4afb8512dab5e3fdf016b6_36691_e0646a77211fee9ce2bb65237f8949ce.webp"
width="760"
height="515"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Our production hub job matrix tells GitHub Actions which hubs to update with new changes. These are triggered if a cluster&amp;rsquo;s staging/support job does not fail.
&lt;/figcaption>&lt;/figure>
&lt;/li>
&lt;li>
&lt;p>&lt;code>upgrade-prod-hubs&lt;/code>: Deploy updates to each production hub.&lt;/p>
&lt;ul>
&lt;li>Accepts the JSON list described above to determine what to do next&lt;/li>
&lt;li>Parallelises over each production hub that requires an upgrade&lt;/li>
&lt;li>Deploys the relevant changes to that hub&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
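&lt;p>To make the hand-off between these jobs concrete, the sketch below builds the two job lists that &lt;code>generate-jobs&lt;/code> could emit as JSON, then applies the &lt;code>filter-generate-jobs&lt;/code> rule. The field names (&lt;code>cluster_name&lt;/code>, &lt;code>hub_name&lt;/code>, &lt;code>upgrade_support&lt;/code>, &lt;code>upgrade_staging&lt;/code>) and the sample data are hypothetical, and the real workflow&amp;rsquo;s JSON shape may differ. In a workflow file, a JSON string like this would typically be turned into a job&amp;rsquo;s &lt;code>strategy.matrix&lt;/code> with the &lt;code>fromJSON&lt;/code> expression.&lt;/p>

```python
import json

# Hypothetical inputs: hub targets from the file-mapping step, plus clusters
# whose support chart configuration changed.
HUB_TARGETS = [("example", "staging"), ("example", "prod"), ("other", "prod")]
SUPPORT_CHANGED = {"other"}


def build_matrices(hub_targets, support_changed):
    """Return (staging_and_support_jobs, prod_jobs) as lists of dicts."""
    staging = {
        cluster: {"cluster_name": cluster, "upgrade_support": True, "upgrade_staging": False}
        for cluster in sorted(support_changed)
    }
    prod = []
    for cluster, hub in hub_targets:
        job = staging.setdefault(
            cluster,
            {"cluster_name": cluster, "upgrade_support": False, "upgrade_staging": False},
        )
        # Canary rule: any hub change on a cluster upgrades its staging hub first.
        job["upgrade_staging"] = True
        if hub != "staging":
            prod.append({"cluster_name": cluster, "hub_name": hub})
    return list(staging.values()), prod


def filter_prod_jobs(prod_jobs, failed_clusters):
    """Drop prod jobs on clusters whose staging/support canary failed."""
    return [job for job in prod_jobs if job["cluster_name"] not in failed_clusters]


staging_jobs, prod_jobs = build_matrices(HUB_TARGETS, SUPPORT_CHANGED)
print(json.dumps(staging_jobs))
# Suppose the canary failed on "other": only prod jobs on "example" survive.
print(json.dumps(filter_prod_jobs(prod_jobs, failed_clusters={"other"})))
# [{"cluster_name": "example", "hub_name": "prod"}]
```

&lt;p>Filtering per cluster, rather than failing the whole run, is what lets production deploys continue on clusters whose canaries succeeded.&lt;/p>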
&lt;h2 id="concluding-remarks">
Concluding Remarks
&lt;a class="header-anchor" href="#concluding-remarks">#&lt;/a>
&lt;/h2>&lt;p>We think that this is a nice balance of infrastructure complexity and flexibility.
It allows us to separate the configuration of each hub and cluster, which makes each more maintainable by us, and is more aligned with a community&amp;rsquo;s
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/right-to-replicate/" >Right to Replicate&lt;/a> their infrastructure.
It allows us to remove the interdependence of deploy jobs that do not &lt;em>need&lt;/em> to be dependent, which makes our deploys more efficient.
Finally, it allows us to make &lt;em>targeted deploys&lt;/em> more effectively, which reduces the amount of toil and unnecessary waiting associated with each change. (It also
&lt;a href="https://github.blog/2021-04-22-environmental-sustainability-github/" target="_blank" rel="noopener" >reduces our carbon footprint by reducing unnecessary GitHub Action time&lt;/a>).&lt;/p>
&lt;p>We hope that this is a useful resource for others to follow if they also maintain JupyterHubs for multiple communities.
If you have any ideas of how we could further improve this infrastructure, please reach out on GitHub!
If you would like 2i2c to
&lt;a href="https://2i2c.org/service/" target="_blank" rel="noopener" >manage a hub for your community&lt;/a>, please
&lt;a href="mailto:hello@2i2c.org" >send us an email&lt;/a>.&lt;/p>
&lt;p>&lt;em>&lt;strong>Acknowledgements&lt;/strong>: The infrastructure described in this post was developed by
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/organization/team.md" >the 2i2c engineering team&lt;/a>, and this post was edited by
&lt;a href="https://deploy-preview-609--2i2c-org.netlify.app/author/chris-holdgraf" >Chris Holdgraf&lt;/a>.&lt;/em>&lt;/p></description></item><item><title>Incident report: Brief description of the incident</title><link>https://deploy-preview-609--2i2c-org.netlify.app/blog/incident-report/</link><pubDate>Wed, 01 Jan 1000 00:00:00 +0000</pubDate><guid>https://deploy-preview-609--2i2c-org.netlify.app/blog/incident-report/</guid><description>&lt;p>On MMMM DD, YYYY our cloud infrastructure team experienced an incident with the XXXXX community hub. [See this issue for the full report](LINK TO ISSUE IN 2i2c-org/incident-reports).&lt;/p>
&lt;h2 id="what-happened">
What happened
&lt;a class="header-anchor" href="#what-happened">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>___ thing happened.&lt;/li>
&lt;li>This resulted in ___.&lt;/li>
&lt;li>It happened because ___.&lt;/li>
&lt;li>It was resolved by ___.&lt;/li>
&lt;/ul>
&lt;h2 id="what-we-learned">
What we learned
&lt;a class="header-anchor" href="#what-we-learned">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>We need to ___.&lt;/li>
&lt;li>We learned that ___.&lt;/li>
&lt;li>This will happen again if ___.&lt;/li>
&lt;/ul>
&lt;h2 id="acknowledgements">
Acknowledgements
&lt;a class="header-anchor" href="#acknowledgements">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Thanks to ___ for helping us identify the problem.&lt;/li>
&lt;li>Thanks to ___ for helping us implement a solution.&lt;/li>
&lt;/ul>
&lt;h2 id="extra-examples">
EXTRA EXAMPLES
&lt;a class="header-anchor" href="#extra-examples">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>content/blog/2025/incident-ucmerced-user-throttling/index.md&lt;/li>
&lt;/ul></description></item></channel></rss>