Freeing Up Storage

No matter how big your storage pools are, it seems that the recorded content always expands to fit the space available. If you are like us, you never manage to get enough time to watch everything you want to watch so that, eventually, you are bound to start running out of space for new recordings. When the time comes, you could just buy some larger hard drives or you could bite the bullet and clean up the disks instead.

In this section, we describe two scripts that can help you decide what to clean up: which recording files are orphaned from the database, using up space for nothing, and can simply be deleted; which of the remaining programs should be watched first to free up the most space immediately; and which programs are so old that you'll likely never watch them.

Obviously, orphaned recording files are the low-hanging fruit, since nobody even knows that they exist. Despite MythTV's best efforts to keep track of things, orphaned recordings are sometimes created when it loses the database pointer to a recording file in a storage group, especially when storage groups are spread across multiple systems. This occurs most often when database updates are done on the master backend without the benefit of communication with the secondary backends (e.g. when a network connection goes down).

For example, MythTV will occasionally ask you if you want to delete a database entry (i.e. for a recorded program) because the file associated with it cannot be found. The reason may well be that MythTV queried the remote, secondary backend that owns the file but the secondary backend didn't respond. The clever MythTV code then assumes that the file doesn't exist (rather than that the backend simply can't talk right now because the network is down). If you allow it to proceed with the database delete, an orphaned file will be created.

On other occasions, power failures, system failures, etc., may lead to orphaned files with no database entries. Regardless of how they are created, orphaned files occupy space, thereby reducing the amount of recording capacity available and leading to cluttered storage groups. When you only have three or four terabytes of storage left for recording, it's not a good thing (tm). MythTV has no idea that these files even exist so it's never going to do anything about them.

A special script that just looks for orphans and optionally cleans them up can be installed somewhere on your MythTV backends. This script will also look for associated thumbnail image files that have not been deleted, as well as orphaned image files themselves. A good spot to install it is in /usr/local/bin:

     #!/bin/bash
     #
     # orphan - Shell script to find all orphaned MythTV recordings.
     #
     #
     # Usage
     # -----
     #
     # orphan [-r] storage_path [host] [user] [password]
     #
     # The optional flag parameters, if specified, must come first.  These
     # parameters begin with '-' and are as noted above.
     #
     #      -r                Causes any orphans to be automatically removed.
     #
     #      storage_path      The full path (minus trailing slash) of the
     #                        storage directory that is to be checked for orphan
     #                        recordings.
     #
     #      host              The optional, remote host where the database is
     #                        kept.  If you are running this script on a
     #                        secondary backend, you should supply the hostname
     #                        of the master backend.  The default is no
     #                        hostname, which will cause the query to be run
     #                        on localhost.
     #
     #      user              The optional user name to connect to the database
     #                        with.  The default, if omitted, is obtained from
     #                        the MythTV "mysql.txt" file.  If this file cannot
     #                        be found, the default is "mythtv".
     #
     #                        Typically, you would need to supply this parameter
     #                        if you are using a remote host for the database
     #                        and the local "mysql.txt" file doesn't exist or
     #                        has an incorrect user name.
     #
     #      password          The optional password for the user chosen.  The
     #                        default, if omitted, is obtained from the MythTV
     #                        "mysql.txt" file.  If this file cannot be found,
     #                        the default is "mythtv" (yeah, like that'll work
     #                        -- you should probably use the password that you
     #                        set when you configured MythTV).
     #
     #                        Typically, you would need to supply this parameter
     #                        if you are using a remote host for the database
     #                        and the local "mysql.txt" file doesn't exist or
     #                        has an incorrect password.
     #
     #
     # Description
     # -----------
     #
     # This script can be used by the user, when they are running out of space,
     # to delete MythTV recordings that are orphaned.  An orphaned recording is
     # one that is found in the recordings storage directory but which is not
     # referenced in the MythTV database.
     #
     # Orphans can be safely deleted so the "-r" flag can be used to
     # automatically remove them, although you may want to run this script
     # without the "-r" flag first, to see what it is thinking of deleting.
     # Otherwise, cleanup can be done manually by the user from the command line.
     #
     #
     # Revision History
     # ----------------
     #
     # E. Wilde    2009Oct6   Initial coding.
     # E. Wilde    2018Jun22  Use mysql.txt, automatically remove orphans, scan
     #                        for orphaned image files.
     #
     ##########################################################################
     #
     # The possible locations of the MythTV "mysql.txt" file, in order of usage.
     #
     MythConfigs=("/etc/mythtv/mysql.txt" \
         "/usr/local/share/mythtv/mysql.txt" \
         "/usr/share/mythtv/mysql.txt" \
         "/usr/local/etc/mythtv/mysql.txt")
     #
     # Default values.  You can change these, if you don't want to have to
     # supply these values on the command line all the time.
     #
     DefUser="mythtv"
     DefPasswd="mythtv"
     #
     # Flags for automatic removal, etc.
     #
     AutoClean="no"
     #
     # Let's see if we can figure out the MythTV database userid and password.
     #
     for MythConfig in "${MythConfigs[@]}"; do
         if [ -f $MythConfig ]; then
             . $MythConfig
             if [ x"$DBUserName" != "x" ]; then
                 DefUser=$DBUserName
             fi
             if [ x"$DBPassword" != "x" ]; then
                 DefPasswd=$DBPassword
             fi
         fi
     done
     #
     # Next, examine any parameters passed to this script to see if they begin
     # with a dash, thereby indicating an optional flag.  These must occur first
     # because, as soon as the first parameter without a dash occurs, we're
     # done.
     #
     # Also note that single letter flags must occur before multiple letter
     # flags, in case the single letter flags start with the same letter.
     #
     while [ $# != 0 ]; do
         if [ "${1:0:1}" == "-" ]; then
             case "$1" in
                 -r)
                     AutoClean="yes"
                     ;;
             esac
         else
             break
         fi
         shift
     done
     #
     # See if there's a storage path supplied.
     #
     if [ x"$1" == "x" ]; then
         echo Storage directory path omitted
         exit 1
     fi
     if [ ! -d $1 ]; then
         echo Storage directory $1 not found
         exit 1
     fi
     #
     # See if there's a hostname supplied.
     #
     if [ x"$2" == "x" ]; then
         Host=""
     else
         Host=" -h$2"
     fi
     #
     # See if there's a userid and password supplied.
     #
     if [ x"$3" == "x" ]; then
         Userid=$DefUser
     else
         Userid=$3
     fi
     if [ x"$4" == "x" ]; then
         Password=$DefPasswd
     else
         Password=$4
     fi
     #
     # Get a listing of all of the recording files in the given directory and
     # see if they're referenced in the database.
     #
     ls -1 $1/*.avi $1/*.m4v $1/*.mkv $1/*.mp4 $1/*.mpg $1/*.nuv 2>&1 | \
         while read Line; do
         #
         # Strip the path name off the recording file name.  Only the file names
         # are stored in the database.  Note that the file names can contain
         # spaces so you need to use double quotes around them.
         #
         File=`echo "$Line" | sed -ne "s|.*$1/\(.*\)|\1|p"`
         if [ ! -f "$1/$File" ]; then continue; fi
         #
         # Let's see if we can find the recording in the database.  If the
         # query itself fails (e.g. the database is unreachable), skip the
         # file rather than mistaking an empty answer for an orphan.
         #
         Status=`mysql -e "select chanid from recorded where basename='$File';" -N -u$Userid -p$Password$Host mythconverg`
         if [ $? -ne 0 ]; then
             echo Database query failed, skipping $File
             continue
         fi
         if [ x"$Status" == "x" ]; then
             #
             # Remove it, if we were asked to do so.
             #
             if [ "x$AutoClean" == "xyes" ]; then
                 echo Recording $File is orphaned, removed
                 rm -f "$1/$File" >/dev/null 2>&1
                 ls -1 "$1/$File"*.png 2>&1 | while read List; do
                     Image=`echo "$List" | sed -ne "s|.*$1/\(.*\)|\1|p"`
                     if [ ! -f "$1/$Image" ]; then continue; fi
                     echo "  Image $Image also removed"
                     rm -f "$1/$Image" >/dev/null 2>&1
                 done
             else
                 echo Recording $File is orphaned
             fi
         fi
     done
     #
     # Now that we've (possibly) deleted all of the orphan recording files, get
     # a listing of all of the image files in the given directory and see if
     # their associated recording files are referenced in the database.
     #
     ls -1 $1/*.png 2>&1 | while read Line; do
         #
         # Strip the path name off the image file name.  Only the file names
         # are stored in the database.  Note that the file names can contain
         # spaces so you need to use double quotes around them.
         #
         Image=`echo "$Line" | sed -ne "s|.*$1/\(.*\)|\1|p"`
         if [ ! -f "$1/$Image" ]; then continue; fi
         #
         # Strip the ".png" extension off the file so that we can look for the
         # associated recording file.
         #
         File=`echo "$Image" | \
             sed -rne "s/^(.*\.(avi|m4v|mkv|mp4|mpg|nuv))\\..*$/\1/p"`
         #
         # Let's see if we can find the recording in the database.  If the
         # query itself fails, skip the image rather than mistaking an empty
         # answer for an orphan.
         #
         Status=`mysql -e "select chanid from recorded where basename='$File';" -N -u$Userid -p$Password$Host mythconverg`
         if [ $? -ne 0 ]; then
             echo Database query failed, skipping $Image
             continue
         fi
         if [ x"$Status" == "x" ]; then
             #
             # Remove it, if we were asked to do so.
             #
             if [ "x$AutoClean" == "xyes" ]; then
                 echo Image $Image is orphaned, removed
                 rm -f "$1/$Image" >/dev/null 2>&1
             else
                 echo Image $Image is orphaned
             fi
         fi
     done
     #
     # That's all for Ray Nance.
     #
     exit 0
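
The script above asks the database about each file in turn. As a minimal sketch of the same orphan test, assuming that every known basename has already been dumped into a text file (one per line), the comparison can also be done in a single pass as a set difference. The function name "list_orphans" and the file "known.txt" are hypothetical, chosen only for illustration, and GNU find is assumed:

```shell
# list_orphans DIR KNOWN_FILE
# Prints the names of the recording files in DIR that do not appear in
# KNOWN_FILE (one basename per line, e.g. dumped from the database).
# Hypothetical helper for illustration; not part of MythTV.
list_orphans() {
    local dir="$1" known="$2"
    # List candidate recordings by file name only, then drop every
    # name that exactly matches a whole line of KNOWN_FILE.
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.avi' -o -name '*.m4v' -o -name '*.mkv' \
           -o -name '*.mp4' -o -name '*.mpg' -o -name '*.nuv' \) \
        -printf '%f\n' | sort | grep -vxFf "$known"
}
```

The list of known names could come from one query such as "select basename from recorded", run with the same mysql options that the script uses. Nothing is deleted here; the function only prints the candidates.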

However, before you proceed with cleaning up all of the orphans, we should note that the contrib directory has a Perl script which can be used to identify orphaned recording files and optionally reinsert them into the database. If you wish to use it to recover orphaned recordings instead of just deleting them, you should unzip it and install it somewhere where it can be run (e.g. /usr/local/bin):

     su
     cp /usr/share/doc/mythtv-backend/contrib/myth.rebuilddatabase.pl.gz \
        /usr/local/bin
     cd /usr/local/bin
     gunzip myth.rebuilddatabase.pl.gz
     chmod ugo+x myth.rebuilddatabase.pl

Information about how to run this script can be found at:

     http://www.mythtv.org/wiki/Myth.rebuilddatabase.pl

Further good candidates for cleanup are old recordings that have been hanging around forever and that are large in size (depending on the source, capture method and content type, the file size of a recording on disk can bear little relationship to its length in minutes). If you are ever planning to watch these recordings, watching the largest ones first can be the most productive in terms of reclaiming space. If you aren't going to watch these recordings, seeing them listed as really old could provide you with the incentive needed to delete them.

Even when it comes to recent recordings, watching the largest ones first can give back the most space in the least amount of viewing time. Thus, using a list of recordings ordered by disk size, to assist you with deciding which recordings to watch first, will prove to be the most fruitful approach when space needs to be reclaimed.
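
As a rough illustration of ordering by disk size, the following sketch lists the recording files in a directory, largest first. It assumes GNU find and works on file names only; mapping a file name back to a program title still needs the database. The function name "largest_first" is hypothetical:

```shell
# largest_first DIR [COUNT]
# Prints "size-in-bytes filename" for the COUNT largest recording
# files in DIR (default 10), largest first.
# Hypothetical helper for illustration; not part of MythTV.
largest_first() {
    local dir="$1" count="${2:-10}"
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.avi' -o -name '*.m4v' -o -name '*.mkv' \
           -o -name '*.mp4' -o -name '*.mpg' -o -name '*.nuv' \) \
        -printf '%s %f\n' | sort -nr | head -n "$count"
}
```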

Finally, depending on your setting of the number of days to keep deleted recordings before really deleting their recording files, deleted files can hang around for some time before their space is released. By listing all of the deleted files, you can decide to delete them manually, ahead of time, if you really want to reclaim the space sooner.

The cleanup script, shown below, can be used to assist with choosing which orphaned recording files to delete, which recordings to watch first, and which old recordings to simply delete. It can be installed in /usr/local/bin as well:

     #!/bin/bash
     #
     # cleanup - Shell script to help the user clean up MythTV recordings.
     #
     #
     # Usage
     # -----
     #
     # cleanup storage_path [age] [host] [user] [password]
     #
     #      storage_path      The full path (minus trailing slash) of the storage
     #                        directory that is to be checked for orphaned and
     #                        aged recordings.
     #
     #                        Note that you can give a list of storage
     #                        directories to check by enclosing the list in
     #                        double quotes and separating each individual
     #                        path by a space.
     #
     #      age               The optional age in days.  Any recordings that are
     #                        older than this many days should be included in the
     #                        list of recordings to be cleaned up.  The default,
     #                        if omitted, is 365.
     #
     #      host              The optional, remote host where the database is
     #                        kept.  If you are running this script on a
     #                        secondary backend, you should supply the hostname
     #                        of the master backend.  The default is no hostname,
     #                        which will cause the query to be run on
     #                        localhost.
     #
     #      user              The optional user name to connect to the database
     #                        with.  The default, if omitted, is obtained from
     #                        the MythTV "mysql.txt" file.  If this file cannot
     #                        be found, the default is "mythtv".
     #
     #                        Typically, you would need to supply this parameter
     #                        if you are using a remote host for the database
     #                        and the local "mysql.txt" file doesn't exist or
     #                        has an incorrect user name.
     #
     #      password          The optional password for the user chosen.  The
     #                        default, if omitted, is obtained from the MythTV
     #                        "mysql.txt" file.  If this file cannot be found,
     #                        the default is "mythtv" (yeah, like that'll work
     #                        -- you should probably use the password that you
     #                        set when you configured MythTV).
     #
     #                        Typically, you would need to supply this parameter
     #                        if you are using a remote host for the database
     #                        and the local "mysql.txt" file doesn't exist or
     #                        has an incorrect password.
     #
     #
     # Description
     # -----------
     #
     # This script can be used by the user, when they are running out of space,
     # to decide which MythTV recordings should be cleaned up.  Cleanup must be
     # done manually by the user, either through the MythTV UI or, in the case of
     # orphaned recordings, using the "orphan" script or from the command line.
     #
     # Orphaned recordings are listed first since the user will, presumably,
     # want to delete them right away.  Recordings that are marked as deleted
     # but which have not yet been deleted are listed next -- the user may want
     # to hasten their demise.  Finally, the listing of old programs then
     # follows in order, sorted by size.  The expectation is that the user then
     # watches the largest ones first and deletes the recordings.
     #
     # Given the premise of the preceding paragraph, for orphaned files, the
     # file name is listed.  The user can delete them from the command line with
     # "rm", or they can use the "orphan" script to delete them automatically
     # (this is probably a good idea, since it also cleans up the image turd
     # files that MythTV loves to leave lying around).  For programs that aren't
     # orphaned but that meet the age cutoff, the program name, instead of the
     # file name, is given, along with the program's size, so that the user can
     # find the program in MythTV's recorded programs listing and watch it.
     #
     #
     # Revision History
     # ----------------
     #
     # E. Wilde    2012May7   Initial coding.
     # E. Wilde    2012Oct24  Use mysql.txt, allow multiple paths, prioritize
     #                        orphan reporting.
     # E. Wilde    2018Jun22  Look for deleted recordings.
     #
     ##########################################################################
     #
     # The possible locations of the MythTV "mysql.txt" file, in order of usage.
     #
     MythConfigs=("/etc/mythtv/mysql.txt" \
         "/usr/local/share/mythtv/mysql.txt" \
         "/usr/share/mythtv/mysql.txt" \
         "/usr/local/etc/mythtv/mysql.txt")
     #
     # Default values.  You can change these, if you don't want to have to
     # supply these values on the command line all the time.
     #
     DefAge=365
     DefUser="mythtv"
     DefPasswd="mythtv"
     #
     # Let's see if we can figure out the MythTV database userid and password.
     #
     for MythConfig in "${MythConfigs[@]}"; do
         if [ -f $MythConfig ]; then
             . $MythConfig
             if [ x"$DBUserName" != "x" ]; then
                 DefUser=$DBUserName
             fi
             if [ x"$DBPassword" != "x" ]; then
                 DefPasswd=$DBPassword
             fi
         fi
     done
     #
     # See if there's a storage path supplied.
     #
     if [ x"$1" == "x" ]; then
         echo Storage directory path omitted
         exit 1
     fi
     #
     # Check every path in the (possibly space-separated) list.
     #
     for MythPath in $1; do
         if [ ! -d $MythPath ]; then
             echo Storage directory $MythPath not found
             exit 1
         fi
     done
     #
     # See if there's an age supplied.
     #
     if [ x"$2" == "x" ]; then
         Age=$DefAge
     else
         Age=$2
     fi
     #
     # See if there's a hostname supplied.
     #
     if [ x"$3" == "x" ]; then
         Host=""
     else
         Host=" -h$3"
     fi
     #
     # See if there's a userid and password supplied.
     #
     if [ x"$4" == "x" ]; then
         Userid=$DefUser
     else
         Userid=$4
     fi
     if [ x"$5" == "x" ]; then
         Password=$DefPasswd
     else
         Password=$5
     fi
     #
     # Start with the title.
     #
     echo Recordings stored in: $1
     #
     # Create a temp file that we can use for sorting.
     #
     TmpFile=`mktemp /tmp/CleanupRecorded.XXXXXXXXXX`
     if [ ! $TmpFile ]; then
         echo Unable to create temporary file, sorting not possible
     fi
     #
     # Process all of the paths that we were given.
     #
     for MythPath in $1; do
         #
         # Get a listing of all of the recording files in the given directory,
         # that are older than the time given, and get their information from
         # the database.
         #
         find $MythPath -mtime +$Age \
                 -regex '.*\.\(avi\|m4v\|mkv\|mp4\|mpg\|nuv\)' | \
             while read Line; do
                 #
                 # Strip the path name off the recording file name.  Only the file
                 # names are stored in the database.
                 #
                 File=`echo "$Line" | sed -ne "s|.*$MythPath/\(.*\)|\1|p"`
                 if [ ! -f "$MythPath/$File" ]; then continue; fi
                 #
                 # Let's see if we can find the recording in the database.  If
                 # not, it is orphaned.  All of the orphans are listed first
                 # since, presumably, the user will want to delete them right
                 # away.  Deleted files are listed next.
                 #
                 Info=`mysql -e "select recgroup, title, starttime, subtitle from recorded where basename='$File';" -N -u$Userid -p$Password$Host mythconverg`
                 #
                 # Strip off the recording group.
                 #
                 RecGroup=`echo $Info | sed -ne "s/^\([a-zA-Z0-9]*\) .*$/\1/p"`
                 Info=`echo $Info | sed -ne "s/^[a-zA-Z0-9]* \(.*\)$/\1/p"`
                 #
                 # Decide whether the file is orphaned, deleted, or still in
                 # play.
                 #
                 if [ x"$Info" == "x" ]; then
                     echo Recording $File is orphaned
                 else
                     FileSize=`stat -c %s "$MythPath/$File"`
                     if [ x"$RecGroup" == x"Deleted" ]; then
                         echo Recording $File is marked deleted
                         echo "  $FileSize, $Info"
                     else
                         if [ $TmpFile ]; then
                             echo $FileSize, $Info >>$TmpFile
                         else
                             echo $FileSize, $Info
                         fi
                     fi
                 fi
             done
     done
     #
     # If we were able to capture the output, sort it in order of descending
     # size.
     #
     if [ $TmpFile ]; then
         sort -nr $TmpFile
         rm -f $TmpFile
     fi
     #
     # That's all for Ray Nance.
     #
     exit 0
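
The sizes that the cleanup script prints are raw byte counts, which are fine for sorting but hard to read at a glance. As a small sketch, assuming the "size, description" line format shown in the script, the output can be piped through a filter that rewrites the leading byte count in GiB. The function name "to_gib" is hypothetical:

```shell
# to_gib
# Reads lines of the form "bytes, description" on standard input and
# rewrites the leading byte count in GiB.  Lines that do not start
# with a number (such as the "is orphaned" messages) pass through
# unchanged.  Hypothetical helper for illustration.
to_gib() {
    awk '/^[0-9]+,/ {
             size = $1 + 0            # "$1" is e.g. "4294967296,"
             sub(/^[0-9]+,/, sprintf("%.1f GiB,", size / 1073741824))
         }
         { print }'
}
```

Because the conversion happens after the script has sorted its output, the largest-first ordering is preserved, for instance when run as "cleanup /var/lib/mythtv/recordings 180 | to_gib" (the path is only an example).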

MythTV seems to be able to lose other files and database entries as well as the orphans mentioned above. On one occasion a backend was found to be hoarding deleted files from years ago. This being the case, you might want to check the deleted recordings by hand, using MySQL:

     mysql -uroot -p mythconverg
       select title, starttime, lastmodified, deletepending as dp
         from recorded where recgroup='Deleted' order by lastmodified asc;

This will show you all of the files that are pending deletion. The last modified date indicates when the file was deleted. If that date is a long time ago (MythTV calculates the retention period for deleted files as the delta between today's date and the last modified date), you know that MythTV has lost the deleted file and it ain't never going to get deleted. You can delete it by hand (don't forget to delete any associated image files).
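
To judge whether a last modified date is "a long time ago", the delta described above can be worked out by hand. The following is a minimal sketch of that arithmetic, assuming GNU date; it is not MythTV's own code, the function name "days_since" is hypothetical, and reading both timestamps as UTC is a simplification that can shift the result by part of a day:

```shell
# days_since TIMESTAMP [NOW]
# Prints the whole number of days between TIMESTAMP and NOW (default:
# the current time).  Both are "yyyy-mm-dd hh:mm:ss" strings, read as
# UTC to keep the arithmetic free of daylight-saving jumps.
# Hypothetical helper for illustration.
days_since() {
    local then_s now_s
    then_s=$(date -u -d "$1" +%s) || return 1
    now_s=$(date -u -d "${2:-now}" +%s) || return 1
    echo $(( (now_s - then_s) / 86400 ))
}
```

Feeding it a lastmodified value from the query above and comparing the result against your configured retention period shows how overdue a deleted recording is.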

You can also check to see if any of the deleted recordings don't have a recording file in any of the storage groups. If that's the case, you can use MySQL to delete the recording entry:

     mysql -uroot -p mythconverg
       delete from recorded
         where title='xxxx'
           and starttime='yyyy-mm-dd hh:mm:ss' and recgroup='Deleted';

As a matter of fact, if you simply delete a recording's row from the recorded table like this and then run the orphan script (above), with the "-r" flag, all of the deleted recordings that you wish to get rid of will be cleaned up, along with any associated image files, and you'll be in business.
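
Checking whether a deleted recording still has a file in any of the storage directories can be sketched as follows. This assumes that you already know the recording's basename (the column the scripts above query) and that all of the directories are reachable from the machine where it runs; directories on another backend would have to be checked there. The function name "find_recording" is hypothetical:

```shell
# find_recording BASENAME DIR...
# Prints the full path of BASENAME in the first DIR that holds it and
# returns 0, or prints nothing and returns 1 if no directory has it.
# Hypothetical helper for illustration; not part of MythTV.
find_recording() {
    local name="$1" dir
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}
```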

Lastly, as we noted above, what MythTV does with deleted files depends on your setting of the number of days to keep deleted recordings before their recording files are really deleted. If you wish to see all of the pertinent delete parameters, you can select their rows using MySQL:

     mysql -uroot -p mythconverg
       select * from settings where value like '%delete%';