4

From apps I've written and one I've inherited, I have a continuing desire to better understand the thread-safety issues of loading data on a background thread. Suppose I have a simple, single-window Windows Forms app with a "Load" button and a BackgroundWorker:

My App in the Visual Studio Designer

The button's Click handler calls loadBackgroundWorker.RunWorkerAsync(), and the worker's DoWork handler creates and initializes an object of type Document which, after loading, it stores in the form's LoadedDocument property. In the worker's RunWorkerCompleted handler, a MessageBox displays the properties of the LoadedDocument. I know this is all hard to visualize, so I'm including complete code. Sorry that it makes the question so long to read.

Here's the form's code:

using System;
using System.ComponentModel;
using System.Windows.Forms;

namespace BackgroundLoadTest
{
    public partial class Form1 : Form
    {
        private Document _loadedDocument;
        public Document LoadedDocument
        {
            get
            {
                lock (this)
                {
                    return _loadedDocument;
                }
            }
            set
            {
                lock (this)
                {
                    _loadedDocument = value;
                }
            }
        }

        public Form1()
        {
            InitializeComponent();
            loadBackgroundWorker.DoWork += new DoWorkEventHandler(loadBackgroundWorker_DoWork);
            loadBackgroundWorker.RunWorkerCompleted += new RunWorkerCompletedEventHandler(loadBackgroundWorker_RunWorkerCompleted);
        }

        void loadBackgroundWorker_DoWork(object sender, DoWorkEventArgs e)
        {
            Document d = new Document();
            d.Property1 = "Testing";
            d.Property2 = 1;
            d.Property3 = 2;
            this.LoadedDocument = d;
        }

        void loadBackgroundWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
        {
            MessageBox.Show("Document loaded with Property1 = " +
                LoadedDocument.Property1 + ", Property2 = " +
                LoadedDocument.Property2 + ", Property3 = " +
                LoadedDocument.Property3);
        }

        private void loadButton_Click(object sender, EventArgs e)
        {
            loadBackgroundWorker.RunWorkerAsync();
        }
    }
}

Here's the code for the Document class:

using System;

namespace BackgroundLoadTest
{
    public class Document
    {
        public string Property1 { get; set; }
        public double Property2 { get; set; }
        public int Property3 { get; set; }
    }
}

My question is:

What thread-safety/memory-visibility problems do you see with this code, or what would you do differently given the goal of loading data on the background thread and eventually using the loaded data on the UI thread?

Is the locking in the LoadedDocument property sufficient to ensure that data initialized in the background thread will be visible to the UI thread? Is the locking necessary? I really want to understand the seemingly very common problem of loading complex documents on a background thread while keeping the GUI responsive, and I know it's tricky stuff.

Edit: to be clear, what I'm most concerned about here is memory visibility. I want to be sure that all the data initialization done by the background thread becomes visible to the GUI thread when the worker completes. I don't want changes getting stuck in a CPU cache and remaining invisible to threads on other CPUs. I don't know how to state my concerns better because they're still rather vague to me.

adv12
  • 8,443
  • 2
  • 24
  • 48
  • @ScottChamberlain, thanks. Are the results of atomic operations guaranteed to be immediately visible to other threads? That is, can I trust that even without locking, a variable set in a thread on one CPU will always be seen as modified when viewed by a thread on another CPU? Synchronization I get, but memory visibility really confuses me. – adv12 Dec 19 '14 at 21:00
  • (In case it wasn't clear, it's CPU caches I'm worried about. I want to know that the result of an assignment doesn't just happen in a cache on one CPU without being made visible to other CPUs.) – adv12 Dec 19 '14 at 21:02
  • Forgot about that, yes locking would ensure the latest value. I would modify your question to be more clear that CPU cache issues is what you are concerned about and not protecting concurrent access. – Scott Chamberlain Dec 19 '14 at 21:07
  • You should do all the document-loading work on the background thread, using a local `Document` instance (local to the thread function). Once you're done with loading the document, only then assign it to the form's document variable. If not, other threads may try to access various document properties before they're ready, unless you have locking _inside_ the document class. – xxbbcc Dec 19 '14 at 21:12
  • @xxbbcc, that's exactly what I was trying to do. Did I screw up? – adv12 Dec 19 '14 at 21:14
  • @adv12 No, I think I either missed that code snippet or you added it in an edit. :) – xxbbcc Dec 19 '14 at 21:15

1 Answers1

3

Locking around your getters and setters do nothing, assigning a reference type to a variable is an atomic operation.

This is plain wrong. Locking introduces memory barriers and thereby prevents instruction reordering and makes cached values visible to other threads. Accessing fields or properties (which also access fields) from different threads without synchronization isn't guaranteed to always work and can't be considered correct code.

What you're doing is accessing the LoadedDocument property from both your background thread and your UI thread. As you have implemented locking in there, this is correct code and will be thread safe.

The DoWorkEventArgs argument in your loadBackgroundWorker_DoWork method has a Result property which should be used to set the result of the background work. The RunWorkerCompletedEventArgs.Result property then can be used to access this value. Try the following:

    void loadBackgroundWorker_DoWork(object sender, DoWorkEventArgs e)
    {
        Document d = new Document();
        d.Property1 = "Testing";
        d.Property2 = 1;
        d.Property3 = 2;
        e.Result = d;
    }

    void loadBackgroundWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
    {
        this.LoadedDocument = (Document)e.Result;
        MessageBox.Show("Document loaded with Property1 = " +
            LoadedDocument.Property1 + ", Property2 = " +
            LoadedDocument.Property2 + ", Property3 = " +
            LoadedDocument.Property3);
    }

This tutorial is one of the most comprehensive and understandable resources in regard to multithreading in .NET which I would highly recommend. Your question would have been answered here.


Edit: Clarification of how BackgroundWorker synchronizes stuff

Still, I'm curious what magic goes on in BackgroundWorker that makes data passed via e.Result fully visible to the GUI thread.

Looking into reference source of Background worker, it is not really obvious how the result is synchronized between threads:

    private void WorkerThreadStart(object argument)
    {
        object workerResult = null;
        Exception error = null;
        bool cancelled = false;

        try
        {
            DoWorkEventArgs doWorkArgs = new DoWorkEventArgs(argument);
            OnDoWork(doWorkArgs);
            if (doWorkArgs.Cancel)
            {
                cancelled = true;
            }
            else
            {
                workerResult = doWorkArgs.Result;
            }
        }
        catch (Exception exception)
        {
            error = exception;
        }

        RunWorkerCompletedEventArgs e = 
            new RunWorkerCompletedEventArgs(workerResult, error, cancelled); 

        asyncOperation.PostOperationCompleted(operationCompleted, e);
    }

This happens on the background thread. The last line then marshals back to the UI thread. Looking further down the stack, there are no lock statements or other synchronization directives there. So how is this made thread safe?

Looking into the RunWorkerCompletedEventArgs, we find no synchronization code either. But there is some strange attribute over there:

[HostProtection(SharedState = true)]
public class RunWorkerCompletedEventArgs : AsyncCompletedEventArgs

MSDN explains:

When SharedState is true, it indicates that a state is exposed that might be shared between threads.

So putting this attribute above your class obviously makes its members thread safe by synchronizing their access. Is this awesome? I think so. Should you use this in your code? Probably not.

Frank
  • 4,461
  • 13
  • 28
  • Thanks. I'd seen the `Result` property before and forgot about it. Still, I'm curious what magic goes on in `BackgroundWorker` that makes data passed via `e.Result` fully visible to the GUI thread. Wouldn't it be some sort of locking akin to what I had in my `LoadedDocument` property? – adv12 Dec 19 '14 at 21:23
  • Btw, the Albahari document is great. I downloaded it recently and read most of it, but it's a lot to take in. Thanks for the link to the `BackgroundWorker` section. I'll give it another look. – adv12 Dec 19 '14 at 21:25
  • 1
    @ScottChamberlain I'm sorry, but I think this is not true: `RunWorkerCompletedEventArgs` is constructed in the background thread. The instance fields, including `Result` are set by the constructor on the same background thread. Then, the reference to the EventArgs instance is sent to the UI thread using `SynchronizationContext.Post`. This doesn't magically synchronize instance fields of the EventArgs. What makes it thread safe is the [HostProtection(SharedState = true)] attribute on the `RunWorkerCompletedEventArgs` class. – Frank Dec 19 '14 at 22:02
  • @adv12 I'm sorry, I didn't see your lock statements before. I've updated my response with additional information. – Frank Dec 19 '14 at 22:03
  • @Frank, no prob. Thanks for the additional information. I'm calling this question answered. – adv12 Dec 19 '14 at 22:05