Error Notifications when Running Scheduled Jobs

There are several ways to send a notification from R when a scheduled job has finished running, but we also want to be notified when the script errors partway through. Scheduled jobs are normally set up using Rscript. A script that is being scheduled should generally be in good enough shape to run every time without erroring. However, there are times when this is not the case: the data from the previous day hasn’t updated, the response of an API call has changed, or a new edge case has appeared that hasn’t previously been tested. When this happens the script exits, and the only way to find out is to actively search the log files for where the error happened.

Here we will look at a way to get notified within seconds whenever an Rscript job errors.

Set-Up

Creating an automated job is simple enough to do in R; there are different packages available depending on the OS the job is being scheduled on.

{taskscheduleR} for Windows
{cronR} for Linux/Unix systems

To run a script every day at 6am, we can run the following:

# taskscheduleR
taskscheduleR::taskscheduler_create(
  taskname = "my_daily_process", 
  rscript = "path/to/script", 
  schedule = "DAILY", 
  starttime = "06:00"
)

# cronR
cronR::cron_add(
  command = cronR::cron_rscript("path/to/script"),
  frequency = "daily",
  at = "06:00"
)

Error Handling

`.Last`

.Last is a useful hook. If .Last has been assigned a function in the global environment, then that function is run when you quit your R session, just before R fully shuts down. In the documentation of quit(), the runLast argument, which controls whether or not .Last should be executed, defaults to TRUE. However, when a script run with Rscript hits an error, R halts without calling the .Last function, so any notification code placed in it will not run.

To get around this, we can change how errors are handled when run within the scheduled job by changing the error option to enable running the .Last function.

options(error = \() quit(save = "no", status = 1, runLast = TRUE))
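To see the effect of this option, here is a minimal sketch that writes a throwaway script to a temporary file and runs it with Rscript. The script contents, the "cleanup ran" text, and the assumption that the Rscript executable sits in R.home("bin") are illustrative and not part of the original set-up.

```r
# Write a small script that defines .Last, sets the error option, then fails
script <- tempfile(fileext = ".R")
writeLines(c(
  '.Last <- function() cat("cleanup ran\\n")',
  'options(error = function() quit(save = "no", status = 1, runLast = TRUE))',
  'stop("something went wrong")'
), script)

# Run it in a fresh, non-interactive R process and capture its output
rscript <- file.path(R.home("bin"), "Rscript")
out <- suppressWarnings(
  system2(rscript, c("--vanilla", shQuote(script)), stdout = TRUE, stderr = TRUE)
)

print(out)                 # captured stdout and stderr of the child process
print(attr(out, "status")) # non-zero exit status set by quit(status = 1)
```

Because the handler quits with status = 1, the scheduler (or any calling process) can still tell that the job failed, while .Last gets the chance to run first.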

Logging

Automated jobs normally log their output somewhere. For example, when a cron job runs on Linux, the output is by default sent to /var/spool/mail/user. However, in this case we want that information available within the e-mail. To do this we can create a temporary file and use sink() to divert both the standard output and the warning and error messages to this file, which we can then attach to the e-mail.

log_filename <- tempfile(fileext = ".log")
log_file <- file(log_filename, open = "wt")
sink(log_file)
sink(log_file, type = "message")

To stop diverting output to the log file before it is attached, we add the following calls to the .Last function:

sink()
sink(type = "message")
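As a quick check of the logging approach, the following sketch diverts output to a temporary log, restores the defaults, and reads the file back. The logged text is made up for illustration, and the explicit close() is an addition here to make sure everything is flushed to disk before the file is read.

```r
# Divert standard output and messages to a temporary log file
log_filename <- tempfile(fileext = ".log")
log_file <- file(log_filename, open = "wt")
sink(log_file)
sink(log_file, type = "message")

cat("step 1 complete\n")         # standard output
message("a diagnostic message")  # message stream (stderr)

# Restore the default destinations, then close the connection
sink()
sink(type = "message")
close(log_file)

# Both lines should now be in the log file
log_lines <- readLines(log_filename)
print(log_lines)
```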

E-mail

Finally, we need a way to send the results to the user. For this I have used the {blastula} package. Within the e-mail, we can include a brief message saying whether or not the job has run successfully, and the log file to help diagnose the error.

For more about setting up e-mails to send using SMTP, have a read of the {blastula} vignette on the topic.

Final Script

job_completed <- FALSE

if (!interactive()) {
  .Last <- function() {
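    # Stop diverting output, then close the connection so the log is fully
    # written to disk before it is attached
    sink()
    sink(type = "message")
    close(log_file)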
    
    if (job_completed) {
      job_status <- "Success"
      email_body <- blastula::md(
        "Hello,\n\nCongratulations, the job has run successfully!"
      )
    } else {
      job_status <- "Error"
      email_body <- blastula::md(
        "Hello,\n\nThe job has errored. Please take a look at the logs to diagnose where the error occurred."
      )
    }
    
    email_content <- blastula::compose_email(body = email_body)
    email_content <- blastula::add_attachment(email_content, log_filename)
    
    blastula::smtp_send(
      email = email_content,
      to = "username@gmail.com",
      from = "username@gmail.com",
      subject = paste("Scheduled Job Result:", job_status),
      credentials = blastula::creds_file(file = "gmail_creds")
    )
    
    file.remove(log_filename)
  }
  
  options(error = \() quit(save = "no", status = 1, runLast = TRUE))
  
  log_filename <- tempfile(fileext = ".log")
  log_file <- file(log_filename, open = "wt")
  sink(log_file)
  sink(log_file, type = "message")
}

... # Code for automated process

job_completed <- TRUE
# end of script

Notes

A lot of this functionality is wrapped in if (!interactive()) because when we run the script interactively to debug an issue, we want to avoid quitting the session when we reach the erroneous code.
Extra information can be included with the e-mail to save time when debugging the error, such as saving the current workspace when the process errors and including it as an attachment.
If you only want to receive the e-mail when the job errors, then add an early return() in .Last that checks whether job_completed is TRUE.
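The last two notes can be sketched together as follows. This is only an outline under stated assumptions: notify_failure() is a hypothetical stand-in for the {blastula} e-mail code in the final script, and the workspace is saved to a temporary .RData file with save.image().

```r
job_completed <- FALSE

# Hypothetical helper: in the real script this would compose and send the
# e-mail, attaching every file it is given
notify_failure <- function(attachments) {
  message(
    "Would send failure e-mail with: ",
    paste(basename(attachments), collapse = ", ")
  )
  invisible(attachments)
}

.Last <- function() {
  # Early return: nothing to send when the job finished successfully
  if (job_completed) {
    return(invisible(NULL))
  }

  # Save the global workspace so it can be inspected after the failure
  workspace_file <- tempfile(fileext = ".RData")
  save.image(file = workspace_file)

  notify_failure(workspace_file)
}
```

Loading the attached .RData file in an interactive session with load() restores the objects that existed when the job failed.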

Alternatives

Sending the notification as a push notification using {RPushBullet} rather than by e-mail.
Using a tool like Apache Airflow to monitor all scheduled jobs.

