How can I get debug information from my program in the field?

A little history: I've been working in this industry since the CP/M days, and it seems to be an inescapable fact that a certain percentage of bugs never show up in testing. They only appear in the field, with real users hammering on the system and doing all that annoying crap that real-world entities engage in. This is especially true of real-time systems, which is the field I started in (embedded military stuff).

Once you accept this fact of life, you realize that for this class of bugs all the ASSERTs in the world aren't going to help you because you're working with release builds, and you need something you can use in the field.

So, let's take an example. I can embed debug output in the code with something like this:

if (m_bRuntimeDebugEnabled)
{
   CString csDebug;
   csDebug.Format ("Mainfrm_MyFunc: max A_elements=%d max A_ units=%d",
                   wMaxAE,
                   byNoOfUnits);
   DebugMessage (csDebug);
}

where m_bRuntimeDebugEnabled is a global flag set up from the registry. DebugMessage is the debug handler, which can be as simple or as complex as you like, but would typically look something like the following. Here m_ctrl_listEvents is a DDX variable assigned to an unsorted listbox in my debug dialog.

#define MAX_DBG_MSGS    250

void CMainFrame::DebugMessage (const char * pszString)
{
   int      iLbRetVal;
   CString  csMsg;
   CTime    TimeNow;
   CString  csTime;

   if (m_bRuntimeDebugEnabled) // set elsewhere (from registry)
   {
      TimeNow = time (NULL);
      csTime  = TimeNow.Format (TEXT("%H:%M:%S > "));
      csMsg   = csTime;
      csMsg  += pszString;
                
      iLbRetVal = m_ctrl_listEvents.AddString (csMsg);

      if ((iLbRetVal != LB_ERR) && 
          (iLbRetVal != LB_ERRSPACE))
      {
         // Select the inserted item to keep it in view.
         m_ctrl_listEvents.SetCurSel (iLbRetVal) ;
      }

      // If we've reached the limit of maintained debug o/p, then
      // delete the first line.
      if (m_ctrl_listEvents.GetCount () > MAX_DBG_MSGS)
         m_ctrl_listEvents.DeleteString (0) ;
                   
      // optional extra : allow output to be diverted to an
      // external trace application. m_bCopyToTrace is also
      // set from the registry at startup.

      if (m_bCopyToTrace)
      {
         csMsg += "\n";
         OutputDebugString (csMsg);
      }
   }
}

If you can't use MFC classes, then use this native Win32 alternative (this is the version in the download):

#define DBG_TIME_STRLEN 12
#define MAX_DBG_LEN     120
#define MAX_DBG_MSGS    250

void CMainFrame::DebugMessage (const char * pszString)
{
   static char szLocalDbg [MAX_DBG_LEN + DBG_TIME_STRLEN] = {0};
   LRESULT litem ;
   time_t TimeNow ;
   struct tm * pTime ;

   if (m_bRuntimeDebugEnabled) // set elsewhere (from registry)
   {
      TimeNow = time (NULL) ;
      pTime = localtime (&TimeNow) ;
      strftime (szLocalDbg, DBG_TIME_STRLEN, "%H:%M:%S >", pTime) ;
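      // Bounded append: leave room for the optional " \n" and the terminator.
      strncat (szLocalDbg, pszString,
               sizeof (szLocalDbg) - strlen (szLocalDbg) - 3) ;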

      SendDlgItemMessage (IDC_DBG_MSGLIST,
                          LB_ADDSTRING,
                          0,
                          (LPARAM) (LPSTR) szLocalDbg) ;

      // optional extra : allow output to be diverted to an
      // external trace application. m_bCopyToTrace is also
      // set from the registry at startup.
         
      if (m_bCopyToTrace)
      {
         strcat (szLocalDbg, " \n");
         OutputDebugString (szLocalDbg) ;
      }
      litem = SendDlgItemMessage (IDC_DBG_MSGLIST,LB_GETCOUNT,0,0L);

      // If we've reached the limit of maintained debug o/p, then
      // delete the first line.
      if (litem > MAX_DBG_MSGS)
      {
         litem = SendDlgItemMessage (IDC_DBG_MSGLIST,
                                     LB_DELETESTRING,0,0L);
      }
      SendDlgItemMessage (IDC_DBG_MSGLIST,LB_SETCURSEL,(litem-1),0L);
   }
}
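
Stripped of the listbox, both versions come down to the same bookkeeping: timestamp the message, append it, and drop the oldest line once the limit is exceeded. As a rough sketch of just that logic in portable C++ (the DebugLog class and its members are hypothetical names used for illustration, not part of the download), it might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <deque>
#include <string>

// Hypothetical stand-in for the debug listbox: keeps only the newest
// 'capacity' messages, dropping the oldest first.
class DebugLog
{
public:
   explicit DebugLog (std::size_t capacity) : m_capacity (capacity)
   {
      assert (capacity > 0);
   }

   // Prefix the message with HH:MM:SS and append it, trimming the
   // oldest entry when the limit is exceeded.
   void Add (const std::string & msg)
   {
      char szTime [16] = {0};
      std::time_t now = std::time (nullptr);
      std::tm * pTime = std::localtime (&now);
      if (pTime != nullptr)
         std::strftime (szTime, sizeof (szTime), "%H:%M:%S > ", pTime);

      m_lines.push_back (std::string (szTime) + msg);
      if (m_lines.size () > m_capacity)
         m_lines.pop_front ();
   }

   std::size_t Count () const { return m_lines.size (); }
   const std::string & Line (std::size_t i) const { return m_lines.at (i); }

private:
   std::size_t             m_capacity;
   std::deque<std::string> m_lines;
};
```

Keeping this part separate from the window code also makes it easy to check the trimming behaviour without a GUI.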

If you already have a suitable dialog hanging around, you only have to add a listbox, and a mechanism for holding and clearing it (my dialog is normally hidden, and is made visible by a weird keystroke sequence). Having the ability to tell the debug code to copy its output to OutputDebugString via m_bCopyToTrace helps, because I can then use a debug output catcher application like DBWIN32, which can save the output to a file and also lets me catch the debug output from multiple apps, all neatly serialised. This is very handy when you're developing multi-app systems.

You pay a little price in code fat for the debug function and the formatting code, but if all you're doing is stuff like tellbacks (i.e. "called this function","exited this function") then even that isn't much, and when debug is disabled you're only paying the performance penalty of a boolean check, which is not much.
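
One way to keep tellbacks cheap to write is a small scope object that reports entry in its constructor and exit in its destructor. The following is only a sketch under assumptions: ScopeTrace, g_bRuntimeDebugEnabled and g_debugSink are hypothetical names, with the sink standing in for a call to DebugMessage.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical global switch, standing in for m_bRuntimeDebugEnabled.
static bool g_bRuntimeDebugEnabled = false;

// Hypothetical sink; in a real app this would forward to DebugMessage.
static std::function<void (const std::string &)> g_debugSink;

// Emits "called X" on construction and "exited X" on destruction,
// but only if debug was enabled when the scope was entered.
class ScopeTrace
{
public:
   explicit ScopeTrace (const char * pszFunc)
      : m_pszFunc (pszFunc),
        m_bActive (g_bRuntimeDebugEnabled && g_debugSink)
   {
      assert (pszFunc != nullptr);
      if (m_bActive)
         g_debugSink (std::string ("called ") + m_pszFunc);
   }

   ~ScopeTrace ()
   {
      // Re-check the sink in case it was cleared while we were in scope.
      if (m_bActive && g_debugSink)
         g_debugSink (std::string ("exited ") + m_pszFunc);
   }

   ScopeTrace (const ScopeTrace &) = delete;
   ScopeTrace & operator= (const ScopeTrace &) = delete;

private:
   const char * m_pszFunc;
   bool         m_bActive;
};
```

Putting a line such as ScopeTrace trace ("Mainfrm_MyFunc"); at the top of a function then gives both tellbacks, and with debug disabled the cost stays at a boolean test on the way in and another on the way out.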

Once you have the basic technique down pat, you can add extra features, such as an OnCommand handler which calls DebugMessage to give you a record of all the user's actions, so you can tell exactly what they did, like this one:

BOOL CMainFrame::OnCommand(WPARAM wParam, LPARAM lParam)
{
   TCHAR szLabel [50];
   HMENU hMenu = 0;

   if (m_bRuntimeDebugEnabled)
   {
      hMenu = ::GetMenu (GetSafeHwnd());
      if (hMenu)
      {
         if (::GetMenuString (hMenu,
                              LOWORD(wParam),
                              szLabel,
                              (sizeof(szLabel) / sizeof(szLabel[0])) - 1,
                              MF_BYCOMMAND))
         {
            wsprintf (m_szDebug,
                      "Menu: User selected '%s'", 
                      szLabel);
            DebugMessage (m_szDebug);
         }
      }
   }
   return CFrameWnd::OnCommand(wParam, lParam);
}

You can add similar handlers to dialogs to capture button pushes etc. It's up to you how far you want to take it. You can even send your output to an external file for study off-site, but then performance (and disk space!) can become an issue.
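
If you do write to a file, one simple way to keep disk usage in check is to give the log a byte budget and drop anything that would exceed it. This is a minimal sketch, not the code from the download: CappedFileLog is a hypothetical helper, and the byte count is approximate on Windows, where text-mode streams expand '\n' to "\r\n".

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: writes lines to a file until a byte budget is
// used up, and drops any line that would exceed it.
class CappedFileLog
{
public:
   CappedFileLog (const std::string & path, std::size_t maxBytes)
      : m_out (path.c_str (), std::ios::out | std::ios::trunc),
        m_remaining (maxBytes)
   {
      assert (maxBytes > 0);
   }

   // Returns true if the line was written, false if it was dropped.
   bool Write (const std::string & line)
   {
      const std::size_t needed = line.size () + 1;   // line plus '\n'
      if (!m_out.is_open () || needed > m_remaining)
         return false;

      m_out << line << '\n';
      m_out.flush ();   // survives a crash, but this is the performance cost
      m_remaining -= needed;
      return true;
   }

   std::size_t Remaining () const { return m_remaining; }

private:
   std::ofstream m_out;
   std::size_t   m_remaining;
};
```

Flushing after every line is a deliberate trade-off: it is slower, but the last few messages before a crash are usually the ones you want.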