C Program to read and interpret SMART log of an NVMe Drive

Just like SATA drives, NVMe drives also provide SMART log data. The SMART log, in case you have not come across it, is a summary of the health and usage statistics that a drive maintains. SMART stands for "Self-Monitoring, Analysis, and Reporting Technology". It gives us information such as the temperature of the drive, host data written, available spare space and so on. We can put this information to use, for example to monitor the health of the drive.

We will write a C program to fetch and interpret the raw SMART log data of an NVMe drive. Please note that this code has been written for Linux (Ubuntu in particular).

STEPS to send the read SMART Log command to an NVMe drive in Linux (it's easy! we just have to send one admin command: Get Log Page):
1. Open the device file in O_RDWR mode.
2. Allocate 512 bytes to receive and store the SMART data.
3. Prepare the "nvme_admin_cmd" structure. This structure is defined in the Linux header <linux/nvme_ioctl.h>.
4. Call ioctl() to submit the admin command to the driver.
5. Interpret the returned raw data.

Step 1:
We give the device file-path as an argument to the executable (./a.out /dev/nvme0). We can open the file with the simple open call.

int fd;  // to store the file descriptor
fd = open(argv[1], O_RDWR);
if (fd < 0) {  // open() returns -1 on failure; 0 is a valid descriptor
    perror("could not open device file");
    return 1;
} else printf("device file opened successfully %d\n", fd);

Step 2:
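You can malloc the buffer or simply declare an array. Using unsigned char avoids sign surprises when the bytes are interpreted later.

unsigned char data[SMART_LEN] = {0};  // SMART_LEN is a macro for 512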
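As an alternative to indexing a raw byte array, the 512-byte buffer can be described by a struct that mirrors the byte offsets used later in this post. This is only a sketch: the struct and its field names are hypothetical (they are not taken from any kernel header), and only byte-sized members are used so that the compiler adds no padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 512-byte SMART log; names are illustrative only. */
struct smart_log {
    uint8_t critical_warning;        /* byte 0 */
    uint8_t temperature[2];          /* bytes 1-2, little-endian */
    uint8_t avail_spare;             /* byte 3 */
    uint8_t spare_thresh;            /* byte 4 */
    uint8_t percent_used;            /* byte 5 */
    uint8_t rsvd6[26];               /* bytes 6-31 */
    uint8_t data_units_read[16];     /* bytes 32-47 */
    uint8_t data_units_written[16];  /* bytes 48-63 */
    uint8_t host_reads[16];          /* bytes 64-79 */
    uint8_t host_writes[16];         /* bytes 80-95 */
    uint8_t ctrl_busy_time[16];      /* bytes 96-111 */
    uint8_t power_cycles[16];        /* bytes 112-127 */
    uint8_t power_on_hours[16];      /* bytes 128-143 */
    uint8_t unsafe_shutdowns[16];    /* bytes 144-159 */
    uint8_t media_errors[16];        /* bytes 160-175 */
    uint8_t num_err_log_entries[16]; /* bytes 176-191 */
    uint8_t rsvd192[320];            /* bytes 192-511 */
};

/* Compile-time checks that the layout matches the offsets used in this post. */
_Static_assert(sizeof(struct smart_log) == 512, "SMART log is 512 bytes");
_Static_assert(offsetof(struct smart_log, data_units_read) == 32, "offset 32");
```

A variable of this type could be handed to the driver in place of the array, since both occupy exactly 512 bytes.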

Step 3:
There are many fields in the nvme_admin_cmd structure, but we only need a few of them for the Get Log Page command.

struct nvme_admin_cmd cmd = {
    .opcode = 0x02,  // opcode of the Get Log Page admin command: see the NVMe document
    .nsid = 0xffffffff,  // namespace independent
    .addr = (__u64)(unsigned long)data,  // address of the buffer that receives the data
    .data_len = SMART_LEN,  // macro for SMART length: it is 512
    .cdw10 = 0x007F0002,  // command dword 10: see the NVMe document below
    // bit 31:28 RESERVED = 0h
    // bit 27:16 NUMBER OF DWORDS = 07Fh (zero-based value: 128 dwords = 512B)
    // bit 15:08 RESERVED = 00h
    // bit 7:0 LOG PAGE IDENTIFIER = 02h (SMART log page)
};
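The constant 0x007F0002 can also be derived rather than hard-coded. The sketch below packs CDW10 from a log page identifier and a buffer length, using the bit layout listed in the comments above. The function name is hypothetical, and it assumes the length is a non-zero multiple of 4 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: build CDW10 for Get Log Page.
 * bits 27:16 = number of dwords, stored as a zero-based value
 * bits  7:0  = log page identifier
 * Assumes len_bytes is a non-zero multiple of 4. */
uint32_t get_log_page_cdw10(uint8_t log_id, uint32_t len_bytes)
{
    uint32_t numd = len_bytes / 4 - 1;      /* 512 bytes -> 128 dwords -> 0x7F */
    return ((numd & 0xFFFu) << 16) | log_id;
}
```

Calling get_log_page_cdw10(0x02, 512) gives 0x007F0002, the same value used in the structure above.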

Step 4:
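Call the ioctl function to submit the command.

int ret;
ret = ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);  // fd is the file descriptor
// "NVME_IOCTL_ADMIN_CMD" is the request code that tells the driver to run an admin command.
// "cmd" is the structure that we prepared above.
// ret is 0 on success, negative if the ioctl itself failed, positive for an NVMe status code.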
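It helps to tell the three kinds of return value apart before touching the buffer. The sketch below is one way to do that. The names are hypothetical, and the split of a positive value into status code (bits 7:0) and status code type (bits 10:8) reflects my understanding of how the Linux NVMe driver passes the completion status back, so treat it as an assumption and check it against your kernel.

```c
/* Hypothetical classification of the value returned by the admin-command ioctl. */
enum cmd_result { CMD_OK, CMD_SYSCALL_ERROR, CMD_NVME_STATUS };

enum cmd_result classify_ioctl_result(int ret)
{
    if (ret < 0)
        return CMD_SYSCALL_ERROR;   /* the call failed, errno has the reason */
    if (ret > 0)
        return CMD_NVME_STATUS;     /* the controller completed with an error */
    return CMD_OK;                  /* the buffer now holds the log page */
}

/* Assumed layout of a positive return value: SC in bits 7:0, SCT in bits 10:8. */
int status_code(int ret)      { return ret & 0xFF; }
int status_code_type(int ret) { return (ret >> 8) & 0x7; }
```

Only in the CMD_OK case should the program go on to interpret the data.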

Step 5:
Interpret the raw data returned by the above ioctl call. When the call completes successfully, the data buffer holds the requested log page.
This link holds the NVMe document which explains the SMART log data (go to Get Log Page in the admin commands).

I have also attached screenshots of the document at the end.
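Byte 0 of the log is the Critical Warning field, where each bit flags one condition. The sketch below decodes it with a small table. The bit meanings are as I read them in the NVMe specification (bit 3 is read-only mode, bit 4 is the volatile memory backup failure), and the function and table names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Meaning of bits 0..4 of the Critical Warning byte (byte 0 of the SMART log). */
const char *const warning_text[5] = {
    "available spare space has fallen below the threshold",
    "temperature has crossed a critical threshold",
    "device reliability has been degraded",
    "media has been placed in read-only mode",
    "volatile memory backup device has failed",
};

/* Hypothetical helper: store a message for every set bit, return how many were stored. */
size_t decode_critical_warning(uint8_t cw, const char *out[], size_t max)
{
    size_t n = 0;
    for (size_t bit = 0; bit < 5 && n < max; bit++) {
        if (cw & (1u << bit))
            out[n++] = warning_text[bit];
    }
    return n;
}
```

A return value of 0 means the drive is not reporting any critical warning.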

Below is the complete code. It sends the command  and also attempts to interpret some of the attributes. Try and run this code on your Linux Machine (you will need an NVMe drive for this code to work – sorry).

GITHUB link for download :

 

#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>
#include <fcntl.h>
#include <linux/types.h>

#define SMART_LEN 512

// Read a little-endian 64-bit value byte by byte (avoids unaligned pointer casts).
// The counters are 128 bits wide in the log; only the low 64 bits are printed here.
static unsigned long long le64(const unsigned char *p){
    unsigned long long v = 0;
    for(int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

int main(int argc, char* argv[]){
    if(argc < 2){
        printf("kindly give the device file name\n");
        return 1;
    }
    int fd;
    fd = open(argv[1], O_RDWR);
    if(fd < 0){  // open() returns -1 on failure
        perror("could not open device file");
        return 1;
    }else printf("device file opened successfully %d\n", fd);

    unsigned char data[SMART_LEN] = {0};  // zeroed receive buffer
    struct nvme_admin_cmd cmd = {
        .opcode = 0x02,  // Get Log Page admin command
        .nsid = 0xffffffff,
        .addr = (__u64)(unsigned long)data,
        .data_len = SMART_LEN,
        .cdw10 = 0x007F0002,  // bit 31:28 RESERVED = 0h
        // bit 27:16 NUMBER OF DWORDS = 07Fh (zero-based: 128 dwords = 512B)
        // bit 15:08 RESERVED = 00h
        // bit 7:0 LOG PAGE IDENTIFIER = 02h
    };

    int ret;
    ret = ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
    close(fd);

    if(ret == 0) printf("successful \n");
    else{
        // ret < 0: the ioctl itself failed; ret > 0: NVMe status from the controller
        printf("failed %d\n", ret);
        return 1;
    }

    printf("SMART LOG DETAILS\n\n");
    if(data[0]&0x1)
        printf("--Available Spare Space has fallen below the threshold\n");
    if(data[0]&0x2)
        printf("--Temperature has exceeded a critical threshold\n");
    if(data[0]&0x4)
        printf("--Device Reliability has been degraded\n");
    if(data[0]&0x8)
        printf("--Media has been placed in read-only mode\n");
    if(data[0]&0x10)
        printf("--Volatile memory backup device has failed\n");
    unsigned temp = data[1] | (data[2] << 8);  // bytes 1-2, little-endian, in Kelvin
    if(temp)
        printf("--Temperature: %u K\n", temp);
    printf("--Available spare space: %d %%\n", (int)data[3]);
    printf("--Available spare threshold: %d %%\n", (int)data[4]);
    printf("--Percentage used(goes upto 255): %d %%\n", (int)data[5]);
    printf("--Data Units Read(1000 units of 512B): %llu \n", le64(data+32));
    printf("--Data Units Written(1000 units of 512B): %llu \n", le64(data+48));
    printf("--Host Read Commands: %llu \n", le64(data+64));
    printf("--Host Write Commands: %llu \n", le64(data+80));
    printf("--Controller Busy Time: %llu \n", le64(data+96));
    printf("--Power Cycles: %llu \n", le64(data+112));
    printf("--Power On Hours: %llu \n", le64(data+128));
    printf("--Unsafe Shutdowns: %llu \n", le64(data+144));
    printf("--Media Errors: %llu \n", le64(data+160));
    printf("--Number of Error Information Log Entries: %llu \n", le64(data+176));

    return 0;
}
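The raw numbers become easier to read once they are converted to familiar units. The sketch below shows three small helpers that work on a buffer without needing a drive: one reads a little-endian 64-bit field, one turns the temperature bytes (reported in Kelvin) into degrees Celsius, and one turns a Data Units value into bytes. All function names are hypothetical, and the Celsius conversion uses a plain integer offset of 273.

```c
#include <stdint.h>

/* Read 8 bytes stored least-significant byte first. */
uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Bytes 1-2 of the SMART log hold the temperature in Kelvin (little-endian). */
int smart_temperature_celsius(const uint8_t *log)
{
    unsigned kelvin = log[1] | ((unsigned)log[2] << 8);
    return (int)kelvin - 273;
}

/* One data unit stands for 1000 blocks of 512 bytes. */
uint64_t data_units_to_bytes(uint64_t units)
{
    return units * 1000u * 512u;
}
```

For example, a Data Units Written value read with read_le64(log + 48) can be passed to data_units_to_bytes to get the total bytes written by the host.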

RESULT:

[Figure: sample output of the above code]

 

SCREEN-SHOTS of the document explaining the meaning of the bytes of data returned:

[Screenshots: two pages of the NVMe document describing the SMART log layout]

 

GOOD DAY, GOOD PEOPLE;

2 thoughts on "C Program to read and interpret SMART log of an NVMe Drive"

  1. Thanks for sharing. This sample code is a good start to understand NVMe programming. If you can share more about NVMe programming will be great. Something like, READ/WRITE screening, Defect Screening, Soft Repair (Over Provisioning, TRIM, SMART Reset), and ID Change (Passport Edit)
