Experiencing disruptions

Building transformer maintenance

Batch System, User Interfaces, Computing Elements, Indico Agenda pages, Wordpress pages, OpenStack Compute Nodes

[Update: 29 Nov. 12am]

  • HPC Altamira is operational again.
  • Work is ongoing to bring HTC back into operation.

[Update: 28 Nov. 11am]

  • There are some issues on HPC with making the new Slurm nodes available with the new GPFS storage.
  • HTC is still down.
  • Cloud is operational and the virtual machines are up. However, some machines were in ERROR state or do not boot properly; they will be reviewed today.
  • New Cloud and HPC physical machines are installed but not yet up.

[English version below]

Dear users,

Due to…

Altamira supercomputer: Altamira supercomputer related systems.
Batch System: Slurm batch system for Altamira. Status: Disrupted
Login nodes: Altamira login nodes (login1, login2). Status: Operational
Cloud Infrastructure: OpenStack Cloud infrastructure.
Grid and HTC: General purpose batch system and high throughput compute system.
Web and miscellaneous services: Web services, wiki pages and other services.
AAI: Authentication, Authorization and Identity systems.
Networking: Internal and external networking.
Storage systems: Distributed storage systems.

Incident history

November 22, 2023 at 5:18 PM UTC
Building transformer maintenance
▲ This issue is not resolved yet

November 8, 2023 at 9:41 AM UTC
Nextcloud Upgrade
Resolved after 1h 49m of downtime

October 19, 2023 at 6:00 AM UTC
Slurm upgrade
Resolved after 31h 0m of downtime

September 7, 2023 at 9:15 AM UTC
Cloud network problem
Resolved after 100h 0m of downtime

April 10, 2023 at 3:34 AM UTC
General power failure
Resolved after 3h 54m of downtime

March 6, 2023 at 8:00 AM UTC
Router upgrade
Resolved after 4h 0m of downtime

February 8, 2023 at 11:20 AM UTC
External Network VLAN change
Resolved after 20h 40m of downtime

February 1, 2023 at 9:15 AM UTC
Spectrum Scale low performance
Resolved after 239h 59m of downtime

January 19, 2023 at 7:30 AM UTC
Nextcloud Upgrade
Resolved after 60m of downtime

December 14, 2022 at 8:55 AM UTC
Cloud DHCP failed
Resolved after 1h 4m of downtime
