Experiencing disruptions

Cloud WN network problem

Computing Elements

Investigating - We are investigating a potential issue that might affect the uptime of one of our services. We are sorry for any inconvenience this may cause you. This incident post will be updated once we have more information.


HPC - Broken Omnipath switch

Batch System

We have two Omnipath switches, and one of them has failed on the HPC nodes. These switches carry the message exchange for parallel jobs in the Slurm compute queue. We are waiting for a replacement to be shipped to us.


Altamira supercomputer: Altamira supercomputer related systems.
Batch System: Slurm batch system for Altamira. Status: Maintenance
Login nodes: Altamira login nodes (login1, login2). Status: Operational
Cloud Infrastructure: OpenStack Cloud infrastructure.
Grid and HTC: General purpose batch system and high throughput compute system.
Web and miscellaneous services: Web services, wiki pages and other services.
AAI: Authentication, Authorization and Identity systems.
Networking: Internal and external networking.
Storage systems: Distributed storage systems.

Incident history


September 28, 2024 at 11:31 AM UTC

Urgent electrical incident

Resolved in under a minute
September 12, 2024 at 8:37 AM UTC

Building transformer maintenance 2024

Resolved after 29h 22m of downtime
August 6, 2024 at 7:32 AM UTC

Cinder update to Wallaby

During the service upgrade, no new volumes or snapshots of volumes can be created. All volumes already created can be used without any problem.
July 23, 2024 at 6:30 AM UTC

[CLOUD] Upgrade Compute and Network OpenStack services

Resolved in under a minute
April 11, 2024 at 12:02 PM UTC

Database down

Resolved after 3h 57m of downtime
January 29, 2024 at 8:00 AM UTC

Disruptive failures in the OpenStack cloud cluster

Resolved after 48h 42m of downtime
December 18, 2023 at 8:00 AM UTC

Urgent Slurm upgrade to address multiple CRITICAL vulnerabilities

Resolved after 5h 0m of downtime
November 22, 2023 at 5:18 PM UTC

Building transformer maintenance

Resolved after 110h 11m of downtime
November 8, 2023 at 9:41 AM UTC

Nextcloud Upgrade

Resolved after 1h 49m of downtime
October 19, 2023 at 6:00 AM UTC

Slurm upgrade

Resolved after 31h 0m of downtime
