Releases: clustervision/trinityX
Release 15.1u1
Bug fixes
- Hot-fix: set the Open OnDemand version to 4.0.3
- Added CRB repository for non-OpenHPC based slurm installation
- AlertX Drainer no longer starts when Slurm is not installed
Release 15.1
New features
- Login image support. The login image provides Open OnDemand functionality and shell access for users
- openSUSE image support. A playbook is now provided to build an openSUSE image
- External Floating IP support for HA setups where access to the active controller is needed
- Improved AlertX functionality:
  - Silencing Alerts is now supported
  - A more scalable approach in Overview for larger clusters
  - Rules for HA related degradations
- Introduction of python3 libraries to render configurations for slurm, genders and more:
  - Luna based configurations are now extendable by pre-configured defaults
  - Support for GRES resources, based on a dedicated gres.conf file
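As an illustration of what rendering a dedicated gres.conf could look like, here is a minimal Python sketch. It is not the TrinityX library: the `Gres` class and `render_gres_conf` function are hypothetical names invented for this example. Only the output keys (`NodeName`, `Name`, `Type`, `File`) come from Slurm's gres.conf format.

```python
from dataclasses import dataclass


@dataclass
class Gres:
    """One GRES entry (hypothetical helper, not part of TrinityX)."""
    node_name: str   # e.g. "node[001-002]"
    name: str        # GRES name, e.g. "gpu"
    file: str        # device path(s), e.g. "/dev/nvidia[0-1]"
    type: str = ""   # optional GRES type, e.g. "a100"


def render_gres_conf(entries):
    """Render Gres entries as gres.conf text, one line per entry."""
    lines = []
    for e in entries:
        parts = [f"NodeName={e.node_name}", f"Name={e.name}"]
        if e.type:
            # Type is optional in gres.conf, so emit it only when set
            parts.append(f"Type={e.type}")
        parts.append(f"File={e.file}")
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"
```

Calling `render_gres_conf([Gres("node[001-002]", "gpu", "/dev/nvidia[0-1]", "a100")])` yields a single line of the form `NodeName=... Name=gpu Type=a100 File=...`.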
Release 15u2
New features
- Support for panfs, lustre, gpfs and beegfs as external source for shared_fs_disk, used in HA setups.
- Introduction of manual fstype in shared_fs_disk, used in HA to tell TrinityX that the admin will take care of the mount point.
- Task to disable legacy Prometheus rules, paving the way for upgrading TrinityX 14.x to 15.
Bug fixes
- luna 2.1 introduced a newer way of setting up interfaces during iPXE, but this broke cloud provisioning. Fixed by allowing the looped interface discovery to be skipped.
Release 15u1
Bug fix
- Fixed the fallback mechanism: setting up HA without using any provided shared_fs_disks resulted in a playbook termination
Release 15
New features
- AlertX - command-line and graphical application to manage Prometheus alerts and rules, and to manage Node Health Checking (NHC)
- NHC drainer - nodes triggered by the NHC rule are drained from jobs. Currently only Slurm is supported.
- Per Job statistics - detailed breakdown per job for resource utilization and power consumption
- Beta ARM support. Note that currently only homogeneous clusters are supported. Controller(s) and nodes are expected to share the same architecture: ARM+ARM or x86+x86
- additional prometheus exporters for collecting more metrics including GPU, Hardware config and state
- OOD application for changing a user’s password
- Improved/extended grafana panels
- luna 2.1
- Open OnDemand 4.0.0
- latest OpenHPC release 2.9 for EL8 and 3.2.1 for EL9
- Prometheus 3.1.0
- HA setups support cross mount shared disk exports, allowing passive/standby controllers to access the shared filesystems
Release 14.4u4
Fixes
- Fix for NFS or direct mounts for shared_fs_disks
- Added lchroot wrapper for OOD OSImage app
- OpenLDAP role now better handles filesystems that are not POSIX/attr compliant
- mariadb datadir overlap fix for HA setups
Release 14.4u3
Fixes
- A recent Code Server commit broke form.yml.erb. TrinityX now uses a fixed release.
Release 14.4u2
Fixes
- Better reporting on SELinux mismatch
- Better hostname checking
- Improved external fqdn discovery which is a key component for Grafana and OOD
- Fixes for reported issues
Release 14.4u1
Fixes
- Ubuntu image creation workarounds for RH family debootstrap/Ubuntu repository name changes
- Multi-host Ansible runs based on an inventory hosts file (e.g. HA controllers) work as intended again
- Removal of legacy roles and code
Release 14.4
New features
- Kubernetes K3s integration
- ZFS grafana panel
- Infiniband exporter added for Infiniband analyzer link troubleshooter support
- Introducing password quality constraints
- Retries added to dnf/yum Ansible calls to overcome failures caused by troubled or busy repositories
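The retry entry above refers to Ansible tasks, where retrying is normally expressed with the `retries`, `delay` and `until` task keywords. The release notes do not show the actual task changes, so the following Python sketch only illustrates the general retry pattern. `run_with_retries` and its parameters are hypothetical names, not TrinityX code.

```python
import time


def run_with_retries(action, retries=3, delay=0.0):
    """Call action() until it succeeds or the attempts are exhausted."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except Exception as err:
            last_error = err
            # Wait before the next attempt, but not after the final one
            if attempt < retries:
                time.sleep(delay)
    # Every attempt failed: surface the most recent error to the caller
    raise last_error
```

A caller would wrap the flaky operation, for example a package installation against a busy repository, in a function and pass it as `action`.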
Fixes
- OpenLDAP fix: no longer starts at boot time on HA pairs
- EPEL repo link fix to point to latest release