verify_testcell_config.sh

The verify_testcell_config.sh script provides a quick means of checking certain system parameters. Although it was primarily developed so that developers could quickly assess how a system has been configured, it is also a good tool for customers to use after building a new computer.

The script does a variety of checks, then outputs information to a terminal in 3 different sections:

SYSTEM INFORMATION

CYFLEX SPECIFIC INFORMATION

HARDWARE/DRIVER INFORMATION

A brief explanation of each check is given below.

################## SYSTEM INFORMATION ##################

Kernel version
Displays the current kernel running on this system.
Reference the documentation on cyflex.com for which kernel you should be running at any given time.
To check the kernel yourself from the command line, execute the following:

uname -r

Cummins Mail RPM Installed
For Cummins customers, this rpm attempts to automatically configure the necessary system files to allow sending automated emails from scripts, or from CyFlex programs such as event response.
If it is not installed, you can execute

sudo yum install cyflex_mail_cummins_ctc

ntpd RPM Installed
The Network Time Protocol daemon (ntpd) is an operating system program that keeps the system time in sync with a designated server. For Cummins customers, you can install this with the following command:

sudo yum install cyflex_ntp_ctc

For non-Cummins customers, reference cyflex.com or Oracle documentation for configuring the daemon and selecting a time server to sync with. What is most important is to ensure that you’re able to reach the server you select, and all the computers at your site are syncing to the same time server.

ntp source
This displays your timeserver source by parsing the

chronyc tracking

command output. Execute that command to see more information about your time server connection.
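
As a rough illustration of the parsing described above, the sketch below pulls the server name out of the "Reference ID" line of chronyc tracking output. The function name ntp_source is hypothetical, and the parsing actually done inside verify_testcell_config.sh may differ.

```shell
# Print the time source named in parentheses on the "Reference ID" line.
# Reads `chronyc tracking` output on stdin.
ntp_source() {
    awk -F'[()]' '/^Reference ID/ { print $2; exit }'
}

# Example (requires chronyc on the system):
#   chronyc tracking | ntp_source
```

Reading from stdin keeps the function usable on saved output as well as on a live system.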

ntp source reachable
This simply checks that the computer is able to ping the time server it is configured to sync with. If you are not able to ping the time server, remedy this to ensure your system clock is properly synced.

Firewall service disabled
This will indicate Yes if the firewall systemd service has been disabled. For opto22 and gantner IO to properly work, this must be disabled. For more information execute

systemctl status firewalld.service

Kernel update excluded
This will indicate Yes if

exclude=kernel*

is found in the /etc/yum.conf file. This is done to ensure a user doesn’t unintentionally perform a kernel upgrade when updating another package.
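
A minimal sketch of this kind of check is shown below. The function name kernel_update_excluded is hypothetical, and it assumes the exclude line is not commented out; the real script may match the line differently.

```shell
# Print Yes if an active "exclude=" line mentioning kernel* exists in the
# given yum configuration file (default: /etc/yum.conf), otherwise No.
kernel_update_excluded() {
    if grep -Eq '^[[:space:]]*exclude=.*kernel\*' "${1:-/etc/yum.conf}" 2>/dev/null; then
        echo "Yes"
    else
        echo "No"
    fi
}

# Example:
#   kernel_update_excluded /etc/yum.conf
```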

Core dump limit correctly set
For core dumps to work correctly, the output of

ulimit -c

should be 819200. If the limit is not correctly set on your system, execute the following:

sudo yum install security_limits

or if it is already installed but out of date,

sudo yum update security_limits

Locked memory correctly set
For floger to work correctly, the output of

ulimit -l

should be 819200. If the limit is not correctly set on your system, execute the following:

sudo yum install security_limits

or if it is already installed but out of date,

sudo yum update security_limits

Autologin enabled
Indicates Yes if AutomaticLoginEnable=TRUE is found in /etc/gdm/custom.conf

Automatic login user
Indicates what user is configured to automatically log in.
This should be the user following the AutomaticLogin= argument in /etc/gdm/custom.conf
This will only be displayed if Autologin is enabled.
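
The two autologin entries can be reproduced by hand with something like the sketch below. The function names are hypothetical, and the match is case-sensitive on TRUE because that is the form named above; the real script may be more or less strict.

```shell
# Print Yes if AutomaticLoginEnable=TRUE appears in the GDM config file
# (default: /etc/gdm/custom.conf), otherwise No.
autologin_enabled() {
    if grep -Eq '^[[:space:]]*AutomaticLoginEnable[[:space:]]*=[[:space:]]*TRUE' \
            "${1:-/etc/gdm/custom.conf}" 2>/dev/null; then
        echo "Yes"
    else
        echo "No"
    fi
}

# Print the user name that follows AutomaticLogin= in the same file.
autologin_user() {
    sed -n 's/^[[:space:]]*AutomaticLogin[[:space:]]*=[[:space:]]*//p' \
        "${1:-/etc/gdm/custom.conf}" 2>/dev/null | head -n 1
}

# Example:
#   [ "$(autologin_enabled)" = "Yes" ] && autologin_user
```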

vnc service enabled
This will indicate Yes if the x0vncserver.service is found to be enabled by systemd. It must be enabled in order for it to automatically start after boot.
For more details on this service, execute the following

systemctl --user status x0vncserver.service

VNC services are configured during the system installation process via the post_install.8.6.sh script if the user checks the “Install VNC” checkbox. Reference the Oracle installation document on cyflex.com for additional details.

vnc service active
This will indicate Yes if the x0vncserver.service is found to be active by systemd. For more details on this service, execute the following

systemctl --user status x0vncserver.service

System crash rpms:
The next 5 sections detail system configuration used to help troubleshoot system crashes. These are not required, but are helpful in troubleshooting when computers mysteriously shut themselves down or freeze.
For installation instructions on any of these, reference section 5 of the installation manual “Configuring Oracle Linux 8.x to Capture a Crash Dump”
https://cyflex.com/wp-content/uploads/Oracle-64-Bit-Install.pdf

crash rpm installed
This will indicate Yes if the crash rpm is installed and No if it is not. The crash rpm installs the crash program, which is used to analyze core dump files. This rpm is not required, but is useful when troubleshooting system crash scenarios.

kernel debug installed
This will indicate Yes if kernel-uek-debuginfo rpm is installed for your current kernel version. This rpm is not required – but is useful when troubleshooting system crash scenarios.

kdump service loaded
This will indicate Yes if the kdump systemd service is loaded. The service being loaded means that if the system were to freeze during this session, a crash file would be generated. This does not indicate that the service is enabled to automatically start on boot. To see more information, execute the following:

systemctl status kdump

kdump service enabled
This will indicate Yes if the kdump systemd service is enabled, meaning it will automatically load on boot. To see more information, execute the following:

systemctl status kdump

Number of recorded system crashes
This counts the number of directories located in the /var/crash directory, which indicates the number of times the system has captured a kernel crash. If the directory doesn’t exist, this value will display 0.
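
A sketch of the counting logic, assuming only the top-level entries of the directory are considered. The function name count_crash_dirs is hypothetical.

```shell
# Print the number of directories directly under the crash directory
# (default: /var/crash). Prints 0 if the directory does not exist.
count_crash_dirs() {
    if [ -d "${1:-/var/crash}" ]; then
        find "${1:-/var/crash}" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Example:
#   count_crash_dirs /var/crash
```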

Home and root directories < 80% capacity
This check parses the output of the df -h command to check the Use% value for -root and -home.
This isn’t a foolproof check, but most customers tend to put specs, cell, and data in one of these locations and cyflex resides in ol-root, so this ensures those particular partitions are not nearing capacity.

To see more details on your disk space, execute

df -h

for a simplified human-readable output, or see the man page for df for more possible arguments.

If you get “N/A” for this entry, the parsing didn’t work with your partitioning method. This doesn’t indicate a problem; just use the df -h command to verify your disk capacity.
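
The capacity check can be approximated with the sketch below, which reads df -h output on stdin. The function name partitions_below_limit is hypothetical, and it assumes device names ending in -root or -home (for example /dev/mapper/ol-root); other partitioning schemes produce N/A, as described above.

```shell
# Print Yes if every -root/-home filesystem is below the limit (default 80),
# No if any is at or above it, and N/A if no matching filesystem is found.
partitions_below_limit() {
    awk -v limit="${1:-80}" '
        $1 ~ /-(root|home)$/ {
            found = 1
            use = $5               # Use% column, e.g. "40%"
            sub(/%/, "", use)
            if (use + 0 >= limit) over = 1
        }
        END {
            if (!found)    print "N/A"
            else if (over) print "No"
            else           print "Yes"
        }'
}

# Example:
#   df -h | partitions_below_limit 80
```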

rc-local service enabled
This will indicate Yes if the rc-local service is enabled, meaning it will automatically load on boot. For more information on the status of your specific rc-local service, execute the following

systemctl status rc-local

rc-local is a legacy function still carried over where the commands located in /etc/rc.local will be executed when the computer boots. As the header of this file indicates, it is recommended to move away from this and generate standalone systemd services for this functionality. However, it is still supported at this time (so long as it works…).

NOTE the service is indeed rc-local and the file is rc.local. This is not a typo.

rc-local service loaded
This will indicate Yes if the rc-local service is loaded. This essentially means the rc.local script was executed, either during boot or manually if someone ran

sudo systemctl start rc-local.service

from the command line during troubleshooting.

rc-local script successfully executed
This will indicate Yes if the rc-local service has a result of “success”. If you have an error somewhere in your /etc/rc.local script, then the service may be successfully executing the script, but the script itself is failing. There are ample reasons it could be failing, such as

- A syntax error in the script itself
- You’re executing a command within the script that is failing
- A boot order-of-operations issue. This is very common. An example of this issue would be trying to set permissions on a serial port, and at the time the script is executed, that serial port hasn’t been created yet. If this occurs, it is recommended you create a dedicated service and ensure it runs later during the boot process. Consult Oracle Linux documentation for instructions for this, or reach out to TRP for assistance.

A starting point for troubleshooting this would be to execute

systemctl status rc-local.service

and see if the output gives any indication of the issue.

You can also run

systemctl stop rc-local.service
systemctl start rc-local.service

to restart the service and see if it is successful. If it succeeds from the command line but not during boot, it is most likely a boot order issue.
If restarting still isn’t successful, go through your script line-by-line and investigate why the commands are failing. Depending on the command, you may be able to execute each command (as root) from the command line and investigate which one is failing.

Additionally, you could disable the service by executing

systemctl disable rc-local.service
slay_stuff      ** if cyflex is running
systemctl reboot

After this, copy your rc.local script to /tmp, make it executable, and add

set -x

to the top of the file. Now execute it from the terminal (as root) and investigate why it is failing based on the output you see in the terminal.
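
The debug-copy steps above could be scripted roughly as follows. The function name make_debug_copy and the default destination /tmp/rc.local.debug are illustrative assumptions. This sketch places set -x directly after the shebang line, if there is one, so the interpreter line stays first.

```shell
# Copy an rc.local script (default: /etc/rc.local) to a debug location,
# insert "set -x" so each command is echoed as it runs, and make the
# copy executable.
make_debug_copy() {
    awk 'NR == 1 && /^#!/ { print; print "set -x"; next }
         NR == 1          { print "set -x" }
                          { print }' "${1:-/etc/rc.local}" > "${2:-/tmp/rc.local.debug}" \
        && chmod +x "${2:-/tmp/rc.local.debug}"
}

# Example (run the resulting copy as root):
#   make_debug_copy /etc/rc.local /tmp/rc.local.debug
#   sudo /tmp/rc.local.debug
```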

######### CYFLEX SPECIFIC INFORMATION ###############

SRR module loaded
SRR RPM Installed
SRR is the driver that handles timer events for CyFlex. This must be installed and loaded in order for CyFlex to run.

If it is not installed, you can execute

sudo yum install srr-64-devel

SVN_SPEC_HOST set
This will indicate Yes if SVN_SPEC_HOST is found in /etc/profile.d/cyflex.sh
This is only required if you are using SVN to back up your test cell files.

/cell/cell_name exists
This will indicate Yes if this file exists. It does not do any further checks that it is configured correctly.

/cell/site_special exists
This will indicate Yes if this file exists. It does not do any further checks that it is configured correctly.

SITE defined in /cell/site_special
This will indicate Yes if SITE is defined in /cell/site_special. It does not verify it is configured correctly.

/.cellspecs/<site name>/<cell name> exists
This will indicate Yes if this directory exists. If using svn for your test cell backup, this directory is where your backup files will be located and is where the specsbackup script will execute the relevant check out/in commands for your files.
This is only required if you are using SVN to back up your test cell files.

SVN_SPEC_HOST in /.cellspecs/<site name>/<cell name>
This is only required if you are using SVN to back up your test cell files.

All /data/ subdirs exist
This verifies all subdirectories of data that are (potentially) required for CyFlex operation have been created.
If this indicates No, you can remedy this by using the mk_data_dirs_tc command. Read the usage page of that command to learn more.

Below is a list of directories which must exist for this to indicate Yes:

compressed6
compressed
meters/occ
meters/utl
meters/occ_complete
transfer
transfer/PAM_datapoint
transfer/PAM_header
transfer/PC_format
transfer/logrPAM_data
transfer/mua_data
transfer/mcparts
PAMtestids
errors
mua_data
mcparts
PC_format
PAM_datapoint
spc
PAM_header
turbo
fuel_log
logrPAM_header
logrPAM_data
faults
utl
utl/complete
utl/ready
utl/hold
utl/ready
dlog
dlog/ready
dlog/hold
dlog/complete
dlog/logging
darts_datapoint
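
To see which directories are missing before running mk_data_dirs_tc, a loop like the sketch below can help. The function name missing_data_dirs is hypothetical, and the directory names passed in the example are only a few entries from the list above, not the complete set.

```shell
# Print each subdirectory that does not exist under the given base directory.
# Usage: missing_data_dirs <base_dir> <subdir> [<subdir> ...]
# No output means every listed subdirectory exists.
missing_data_dirs() {
    data_root="$1"
    shift
    for sub in "$@"; do
        [ -d "$data_root/$sub" ] || echo "$sub"
    done
}

# Example:
#   missing_data_dirs /data compressed transfer utl/ready dlog/logging
```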

All suid bits set correctly
Certain cyflex programs need suid bits set a particular way so they can run with root privileges. This should all be handled during the standard CyFlex install. If this script indicates the bits are not set correctly, follow instructions on cyflex.com to reinstall your particular cyflex version.
If that doesn’t correct the issue, contact TRP.

Qt 5 Version Installed
Displays the current version of Qt 5 that is installed.
Reference cyflex.com documentation for what version is the latest.

CyFlex version installed
Displays what cyflex version is installed.

Correct java rpms installed
For cyflex version 7.0.20 and later you should NOT have cyflex_java rpm installed and you should have the cyflex_sys_log_replicate rpm installed. This line will indicate YES if this is the case and NO if it is not.

CyFlex Error Viewer installed
This will indicate Yes if the cyflex_error_viewer rpm is installed and No if it is not. This rpm is not required. If you would like to install it, execute the following:

sudo yum install cyflex_error_viewer

Electronic logbook present
This will indicate Yes if /specs/log/elogbook.db is present. It does NOT verify that the database is the latest format, but the elb command should automatically make these updates.

Number of core dumps on this system
Indicates how many files are located in /data/errors with “-core.” in the filename. This should generally correspond to the number of core dumps present on the system.

Error database size
Simple file size of the error database /data/errors/error.db
This will only output if an error database is present on the system, which may not be the case if cyflex has not yet run or a prior database hasn’t yet been copied over. This is simply displayed for information purposes. There is no hard-set maximum size for an error database, but this can give a quick idea of how many errors are being generated on a particular system.

Number of CyFlex errors generated in last 24 hrs
This will display an integer value indicating the number of errors that have been generated on this system in the last 24 hours. It will only be displayed if an error database is present, and if the /cyflex/bin/get_errs command is found (indicating cyflex is installed).
This is simply displayed for information purposes. There is no hard-set number of errors that indicates a problem, but fewer errors is better for overall system performance.

Number of CyFlex apps running
This simply counts the output of sin names to give an indication of how many CyFlex apps are running.

CyFlex memory properly allocated
This will display “Yes” if running “show_mem” does not generate any errors. If you have errors, execute

show_mem

to see what values are overrun and correct them by updating sys_start in the go.scp.
NOTE that this will report “No” if show_mem is generating an error that you are simply nearing a memory limit, even if you have not exceeded it. If this is the case, the best action to take is to increase the memory limit.

############## HARDWARE/DRIVER INFORMATION ##############

NVIDIA graphics card installed
This parses the output of lspci to see if an NVIDIA graphics card is detected in a pci slot.

NVIDIA driver manually installed
If an NVIDIA graphics card is detected and the nvidia module is loaded, but none of the NVIDIA driver rpms provided by TRP is installed, it is assumed that an NVIDIA driver has been installed manually, possibly because the card in use is not supported by any of the rpms offered by TRP.
This entry will not be displayed if an NVIDIA graphics card is not detected from the lspci output.

NVIDIA rpm installed
This indicates if one of the NVIDIA driver rpms provided by TRP has been installed. This entry will not be displayed if an NVIDIA graphics card is not detected from the lspci output.

nouveau graphics driver in use
The nouveau driver that comes installed with the Oracle OS can be used with NVIDIA graphics cards, although it has been found to lead to graphics lockups in many cases and is not recommended. This entry will not be displayed if an NVIDIA graphics card is not detected from the lspci output.

NVIDIA rpm number
TRP hosts 4 different NVIDIA drivers (all from NVIDIA) on our server. They can be manually installed, or one will automatically be picked during the install process by the post_install script. This check simply shows which driver is installed. This entry will not be displayed if an NVIDIA graphics card is not detected from the lspci output.

Intel i915 module loaded
This is the driver used by default when you are using onboard Intel graphics instead of an NVIDIA card.

Devicemaster driver installed
This will indicate No if the nslink module is not running. If the module is running, you should see the following entries:

nslink service loaded
This will indicate Yes if the nslink systemd service is loaded. This does not indicate that the service is enabled to automatically start on boot. To see more information, execute the following:

systemctl status nslink

nslink service enabled
This will indicate Yes if the nslink systemd service is enabled, meaning it will automatically load on boot. To see more information, execute the following:

systemctl status nslink

Read/write permissions correctly set for devicemaster
When using a Comtrol devicemaster TCP to serial hub, the nslink module must be loaded. The check that the module is loaded comes from parsing the output of lsmod.
nslink is managed by systemd. For it to function, the service must be active. For it to function on reboot, the service must be enabled.

To check the status, run

systemctl status nslink

To start the service,

systemctl start nslink

To ensure it automatically starts on boot,

systemctl enable nslink

There are different means of setting permissions for the serial ports. The tar file used to install the devicemaster drivers will include detailed instructions for setting up and troubleshooting that device.

Rocketport driver RPM installed
This will indicate Yes if either the rocketport card or rocketport infinity card is detected in the output of lspci. The rocketport card is no longer available, but many systems still use them. They plug into the PCI slot of the motherboard. The rocketport infinity card is the successor and plugs into the PCIe slot. These are serial devices.
Generally speaking, the same rocketport infinity driver can be used for any serial rocketport card that installs in the PCI express slot, no matter how many ports it has.

Rocketport module loaded
This will indicate Yes if the rocket module is found to be loaded in the output of lsmod for a rocketport card.

RocketPort Infinity Express RPM installed
This will indicate Yes if the rp_infinity_express rpm is installed.

RocketPort Infinity Express module installed
This will indicate Yes if the rp2 module is found to be loaded in the output of lsmod for a rocketport infinity card.
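
The lsmod-based checks for the nslink, rocket, and rp2 modules all follow the same pattern, which can be sketched as below. The function name module_loaded is hypothetical; the sketch reads lsmod output on stdin and compares the first column exactly, so a module whose name merely contains the search string is not counted.

```shell
# Print Yes if the named module appears in the first column of lsmod
# output (read from stdin), otherwise No. The header line is skipped.
module_loaded() {
    if awk -v mod="$1" 'NR > 1 && $1 == mod { found = 1 } END { exit !found }'; then
        echo "Yes"
    else
        echo "No"
    fi
}

# Example:
#   lsmod | module_loaded nslink
#   lsmod | module_loaded rp2
```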

Read/write permissions correctly set for devicemaster
This checks that read/write permissions for the test cell user are currently set.
These permissions are sometimes configured in the rc-local service managed by systemd, the status of which can be seen by checking the output of

systemctl status rc-local.service

Counter Timer module loaded
This will indicate Yes if the tc9513 module is loaded in the output of lsmod. This module is only required if you’re using MTL IO from the dark ages.

CanFD rpm installed
This will indicate Yes if the cyflex-canfd rpm is installed. This rpm will download and install the drivers from PEAK.
Install this rpm whether you are using a PEAK PCIe CAN card installed on the motherboard or the CanFD USB dongle from PEAK.

CAN device present
This will indicate Yes if the output of

ifconfig -a

indicates a CAN device is present.
It does not validate anything else is properly configured, or that the device has been enabled or configured. For more information on CAN check the usage for

CanDbc
CanFD
enable_candbc_nopasswd

EtherCAT rpm installed
This will indicate Yes if the cyflex-ethercat rpm is installed. This rpm does not need to be installed if you’re not communicating with any EtherCAT devices.

MAC address for EtherCAT configured
This will indicate Yes if anything is found to be configured after MASTER0= in /etc/sysconfig/ethercat. It does not verify you’ve correctly put a MAC address in that file.
This will only be displayed if the cyflex-ethercat rpm has been installed.

Number of EtherCAT MAC addresses configured
Indicates an integer value of how many MASTERS have been configured in /etc/sysconfig/ethercat. This will only be displayed if the cyflex-ethercat rpm has been installed.
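
A sketch of how the configured masters might be counted is shown below. The function name count_ethercat_masters is hypothetical, and the line format is an assumption: the sketch accepts either MASTER<N>= or MASTER<N>_DEVICE= followed by a non-empty value, and ignores commented-out lines. Like the check described above, it does not validate that the value is a real MAC address.

```shell
# Print how many MASTER<N> entries have a non-empty value in the EtherCAT
# sysconfig file (default: /etc/sysconfig/ethercat). Prints 0 if the file
# cannot be read.
count_ethercat_masters() {
    if [ -r "${1:-/etc/sysconfig/ethercat}" ]; then
        grep -Ec '^[[:space:]]*MASTER[0-9]+(_DEVICE)?=[[:space:]]*"?[^"[:space:]]' \
            "${1:-/etc/sysconfig/ethercat}"
    else
        echo 0
    fi
}

# Example:
#   count_ethercat_masters /etc/sysconfig/ethercat
```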

Generic driver configured for EtherCAT
Ensures the generic driver has been specified in the bottom of the /etc/sysconfig/ethercat config file. This will only be displayed if the cyflex-ethercat rpm has been installed.