
Nutanix Hardware diagnostics

Disk

(Note: If using AHV or ESXi, run these commands from the CVM. If using Hyper-V, run them from the host.)

  • List the disks

lsscsi

  • Test the health of an individual drive

sudo smartctl -x /dev/sdX -T permissive
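smartctl packs its verdict into an exit-status bitmask (documented under RETURN VALUES in `man smartctl`), which is easy to misread as a simple pass/fail. A minimal bash sketch for decoding it:

```shell
#!/usr/bin/env bash
# Decode smartctl's exit-status bitmask (see "RETURN VALUES" in `man smartctl`).
# Bits 0-2 are tool/communication errors; bits 3-5 reflect drive health.
smart_health() {
  local status=$1
  if (( status & 8 )); then          # bit 3: SMART status returned "DISK FAILING"
    echo "FAILING"
  elif (( status & 48 )); then       # bits 4-5: attributes at/below threshold
    echo "PREFAIL"
  elif (( status & 7 )); then        # bits 0-2: smartctl could not query the disk
    echo "TOOL-ERROR"
  else
    echo "OK"
  fi
}

# Usage (commented out because it needs real hardware):
# sudo smartctl -x /dev/sdX -T permissive > /dev/null; smart_health $?
```

This is a sketch of the documented bitmask only; on a healthy drive you still want to review the full `-x` output for reallocated/pending sector counts.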

  • List the internal LSI card's links and their status

sudo ./lsiutil -a 12,0,0 20

  • Check for any errors on the LSI card and internal links

sudo /usr/local/nutanix/cluster/lib/lsi-sas/lsiutil -p 1 -a 12,0,0,0 20

  • Print hades config for the disks

edit-hades -p

For Hyper-V hosts, these commands can be found under C:\Program Files\Nutanix\Utils\

Power Supply

  • Check if PSU is online

ESXi: /ipmitool sensor list | grep -i PS

AHV: ipmitool sensor list | grep -i PS

Hyper-V: ipmiutil.exe sensor list | grep -i PS

The output of each of these commands should show PS1 and PS2 with a value of 0x1 or 01, which indicates the PSU is OK and functional. Any other value could indicate a problematic PSU.
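The 0x1/01 check can be scripted. A small sketch that parses the value column of a sensor line like the one `ipmitool sensor list` produces (the sample line and the column position are assumptions; the layout varies by platform):

```shell
#!/usr/bin/env bash
# Flag any PSU sensor line whose value column is not 0x1/01.
# Assumes a pipe-separated layout with the value in the second field.
check_psu_line() {
  local value
  value=$(echo "$1" | awk -F'|' '{gsub(/ /, "", $2); print $2}')
  case $value in
    0x1|01) echo "OK" ;;
    *)      echo "SUSPECT ($value)" ;;
  esac
}

# Illustrative sample line (not from a real system):
check_psu_line "PS1 Status | 0x1 | discrete | 0x0100| na | na | na | na | na | na"
```

Run it against each line of `ipmitool sensor list | grep -i PS` to get a quick per-PSU verdict.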

  • Check the power distribution information for each PSU (Run from CVM):

for i in $(ipmiips); do echo "Node with IPMI IP $i" && /home/nutanix/foundation/lib/bin/smcipmitool/SMCIPMITool $i ADMIN ADMIN pminfo; done


Nutanix Terminology

Technologies Covered

ACS – Acropolis Container Services is an opinionated Kubernetes-based container service offered to customers. ACS offers several pre-configured open-source services, such as Elasticsearch/Fluent Bit/Kibana (EFK) as a logging stack, Flannel for networking, Prometheus for alerts, and Docker as a container runtime. It also comes with the Kubernetes Volume Plugin (KVP) pre-loaded and configured.

Genesis – The key role of Genesis is to be the cluster component and service manager. Genesis is a process that runs on each node and is responsible for service interactions (start/stop/etc.) as well as for the initial configuration. It runs independently of the cluster and does not require the cluster to be configured or running; the only requirement for Genesis to run is that Zookeeper is up and running. The cluster_init and cluster_status pages are displayed by the Genesis process.

Hades – Hades is a disk monitoring service that handles all base hard drive operations, such as adding, formatting, and removing disks. Its purpose is to simplify the break-fix procedures for disks and to automate several tasks that previously required manual user actions.

Hardware – Includes the nodes, chassis, disks, memory, and any of the other physical components that may need to be investigated or fixed. The current list of supported NX hardware is available on the Nutanix support portal. OEM vendors are responsible for their own hardware support, but we can take a cursory look at any potential hardware-related issues.

Kubernetes – Kubernetes (k8s) is an open-source container orchestration tool that allows for pluggable add-ons using standard interfaces such as the Container Network Interface (CNI), the Container Runtime Interface (CRI), and most recently, the Container Storage Interface (CSI). Kubernetes is a core component of other Nutanix offerings, including Acropolis Container Services (ACS), MicroService Platform (MSP), and Sherlock. Nutanix also offers a storage plugin for k8s, known as the Kubernetes Volume Plugin (KVP).

LCM/Upgrades – LifeCycle Manager (LCM) is the 1-click upgrade process for firmware and software on Nutanix clusters. This feature can be accessed through the Prism UI. LCM provides two operations: (1) Inventory, which detects what can be managed on a cluster, and (2) Update, which updates a component to a given version.

MSP – MicroService Platform is an internal-only platform used to deploy and manage microservices. MSP is composed of two pieces, Central MSP and Services MSP, which separate the management plane from the control and data planes, respectively. MSP VMs provide various standardized services to all of the applications running on the MSP cluster, such as the EFK stack for logging, Prometheus for alerting, Panacea integration, and LCM support.

Zookeeper  – Zookeeper’s key role is to act as the cluster configuration manager. Zookeeper stores all of the cluster configuration including hosts, IPs, state, etc. and is based upon Apache Zookeeper. This service runs on three nodes in the cluster, one of which is elected as a leader. The leader receives all requests and forwards them to its peers. If the leader fails to respond, a new leader is automatically elected. Zookeeper is accessed via an interface called Zeus.

Stargate – Stargate is the IO manager for the Acropolis Distributed Storage Fabric (ADSF) and is the primary process responsible for reading and writing data to the disks in the cluster. It is extremely efficient and powerful in this regard, but it relies on other processes for functions such as maintaining data or determining where data is physically located (metadata). Stargate is also responsible for "in-line" data operations (such as fingerprinting and compression) and works in concert with the Curator process to execute post-process operations (post-process dedup, post-process compression, EC-X). Stargate has two main locations to place data, the oplog and the extent store, which lets it differentiate between IO patterns and place data in the most sensible location based on whether the IO is random or sequential.

Cassandra/Medusa – Cassandra is the process that maintains our metadata tables across the cluster. It is based on the open-source Apache Cassandra project, with a heavy amount of modification from development over the years since its adoption. Medusa is an interface that other processes use to interact with Cassandra.

Curator – If Stargate is the "IO manager", that would make Curator the "data-at-rest manager": it takes care of operations that occur on data after the data has been initially written to disk. Beyond the post-process transforms mentioned above, Curator is also responsible for tasks like detecting corruption, balancing disk usage, ILM down-migrations, data deletions, etc. Curator scans the metadata table on a periodic and an on-demand basis (as of AOS 5.0), which allows it to target specific problem areas or perform general maintenance of the data on the cluster.

Pithos – Pithos is a wrapper for vdisk information within a Nutanix cluster. Pithos is also used to store critical information used by the Acropolis process for VM management.

Data Transforms – Compression, Deduplication (Dedup), and Erasure Coding (EC-X) are the three data transforms that Nutanix uses. All of these are used for space-saving for customer storage, but all three have different applications and best-practices.
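As an illustration of the deduplication idea only (this uses sha1sum as a stand-in fingerprint function, not Nutanix's actual fingerprinting implementation), identical data chunks produce identical fingerprints, so only one copy needs to be stored:

```shell
#!/usr/bin/env bash
# Toy dedup sketch: fingerprint each chunk, keep one copy per unique fingerprint.
fingerprint() { printf '%s' "$1" | sha1sum | awk '{print $1}'; }

a=$(fingerprint "same block of data")
b=$(fingerprint "same block of data")
c=$(fingerprint "different block")

[ "$a" = "$b" ] && echo "duplicate detected: store one copy"
[ "$a" != "$c" ] && echo "distinct blocks: store both"
```

The same fingerprint-and-compare idea underlies both in-line and post-process dedup; only when in the write path the comparison happens differs.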

OSS – Object Storage Service (OSS) is the storage and retrieval of unstructured blobs of data and metadata using an HTTP API. Instead of breaking files down into blocks to store it on disk using a filesystem, it deals with whole objects stored over the network. These objects could be an image file, logs, HTML files, or any self-contained blob of bytes. They are unstructured because there is no specific schema or format they need to follow.

ABS – Acropolis Block Services (ABS) is a native scale-out block storage solution that provides direct block-level access via the iSCSI protocol to the Acropolis Distributed Storage Fabric (ADSF). It enables enterprise applications running on external servers to leverage the benefits of the hyperconverged Nutanix architecture.

Cerebro – Cerebro is responsible for the replication and DR capabilities of DSF.  This includes the scheduling of snapshots, the replication to remote sites, and the site migration/failover.

Async DR – Async DR takes snapshots of VMs and replicates them to a remote Nutanix cluster. These snapshots can be used to restore or clone an existing VM. Async DR can also be used for disaster recovery, powering VMs up on the remote site.

Near-Sync DR – Near-Sync DR leverages a newer snapshotting technology known as Lightweight Snapshots (LWS) to take faster snapshots (down to a 1-minute RPO). These LWS snapshots roll up into a regular 15-minute snapshot.

Metro Availability – Metro Availability (Metro) provides stretch-cluster capabilities that allow shared storage to span two Nutanix clusters. This provides a near-zero RTO and an RPO of 0.

3rd Party Backups – There are several 3rd-party backup products that customers often use. It's useful to understand how these work, since we frequently get questions about how they integrate with Nutanix. Some of these include HYCU, Veeam, AWS, Azure, Commvault, Rubrik, and Veritas.


1 Corinthians 2:9

1 Corinthians 2:9 (TB) But as it is written: "What no eye has ever seen, and no ear has ever heard, and what has never arisen in the human heart: all that God has prepared for those who love Him."