One problem you may face is the need to check if an application is already running or not. You may not want a user to run multiple instances of an application for various reasons.

So, how does one go about doing this?

You can use the Mutex class. Yes, Mutex. While this if a funny name it does have its purpose.

To read more about Mutex, click ‘here

Basic summation of what Mutex is: A synchronization primitive that can also be used for interprocess synchronization.

So what does this mean? Basically, Mutex will provide the same functionality as the lock statement. This really means that Mutex is a bit redundant, however there is one advantage that Mutex has over lock:

Mutex provides a computer-wide lock versus just an application-wide lock.

So, this is really cool!

How will this work in our application? Let’s cut to the chase and look at code.

I have created a sample application called ThreadingConsole. My assembly and namespace are ThreadingConsole for consistency.

Now, I know you are really here to just see code that works and not read a bunch of mumbo jumbo about how it is supposed to work. Without further ado, the code:

using System;
using System.Threading;

namespace ThreadingConsole
{
    class Program
    {
        private static readonly Mutex AppMutex = new Mutex(false, "ThreadingConsole");

        static void Main(string[] args)
        {
            if(!AppMutex.WaitOne(TimeSpan.FromSeconds(5), false))
            {
                Console.WriteLine("Another instance of the app is currently running.");
                Console.WriteLine("Hit any key to exit");
                Console.ReadLine();
                return;
            }

            Console.WriteLine("Running - hit any key to exit");
            Console.ReadLine();
        }
    }
}

Now that  you have the code, you can copy and paste this into your application, build it. Navigate to your bin/debug directory. Launch the application and then launch it again. You will find that the 2nd instance will give you the proper error message and you can move on.

Exciting huh?

So, this post was not to explain the nitty gritty of what Mutex is (that is what the click ‘here’ link is for).

I hope you found this useful.

Happy programming.

So, I came across an issue today where I needed to test internal sealed classes in a separate testing project. Well, since you can’t instantiate internal sealed classes, I was in a tight spot.

So, I figured out a work around, and also discovered that there are resources out there on google that would have guided me towards this if I had just taken the time to search versus trying to figure it out on my own.

I remembered that in each project there is an AssemblyInfo.cs file. One of the assembly values is:
InternalsVisibleTo()

So, i set the following:
[assembly: InternalsVisibleTo("TestingProject.Tests")]

I compiled my project and went to my testing project. Behold, success!

I hope this helps others out there!

So, I came across the idea that I needed to come up with a way to dynamically set the where clause for a LINQ statement. There are many articles out there that give different examples of how to do this. Some show how you can pass in “string” values such as “id = 2” and then you can append your where clause this way.

Personally, I did not like any of these solutions. Granted my solution is not the first time it has been used, and I did not invent it, however I found that it works quite well and I would share with the world how to do this.

Not only am I going over how to do dynamic where clauses, I will be showing you a structure that I like to use when dealing with data and business in my applications. It provides an easy way to organize your code and when you need to make changes later, you only ever have to update a few spots.

Before we start coding and getting into the how’s, I want to make sure that you can follow along with me. To do that, we need to prepare some stuff.

If you don’t have the tools to follow along, then keep reading and you will get to the how it is done shortly.

Step 1:

We are going to create a database that we can shove a table into. I have SQL 2005 installed on my local machine.

I created a database called LINQTester, and added 1 table:

CREATE TABLE TestData

(

    NameKey INT IDENTITY NOT NULL PRIMARY KEY,

    FirstName VARCHAR(20),

    LastName VARCHAR(20)

)

We don’t need millions or records in this table, just a couple to prove a point. If you want to test performance, you can insert those records later, but performance is not the key point here (even though there is not a performance issue to really be discussed).

I am using the following insert statements to add the data:

INSERT INTO TestData VALUES ('Jason', 'Heine')

INSERT INTO TestData VALUES ('Bubba', 'Jones')

INSERT INTO TestData VALUES ('William', 'Tell')

INSERT INTO TestData VALUES ('Willie', 'Wonka')

 

So, now we have our data, we need to setup our project in Visual Studio. I am using Visual Studio 2008.

Here is the folder structure we are going to end up with (yes, yes, it may seem like a lot, but I am organizing it this way for a purpose).

If you want to go ahead and create this structure, with empty files, that will work. I will explain what each file is used for along the way.

image

We are going to start with our DBML file. Since we only have 1 table, that is all we need to add. Once you have added your table to the DBML file, make sure you build your project (this will enable the designer stuff to generate for the table).

We are going to start with the CoreConnection.cs

The CoreConnection.cs file is used for building the IDbConnection object, which will be used to physically connect to the database.

I am not going into the details of why all this works, if you need to ask questions, please do so in the comments.

Contents of the CoreConnection.cs file:

using System.Configuration;

using System.Data;

using System.Data.SqlClient;

 

namespace DynamicLinqTester.DataService.Core

{

    public class CoreConnection

    {

        public static IDbConnection GetConnection()

        {

            return GetConnection(ConfigurationManager.AppSettings["CoreDatabase"]);

        }

 

        public static IDbConnection GetConnection(string databaseName)

        {

            var builder = new SqlConnectionStringBuilder();

            builder["Data Source"] = ConfigurationManager.AppSettings["DatabaseServer"];

            builder["Integrated Security"] = true;

            builder["Initial Catalog"] = databaseName;

            builder["MultipleActiveResultSets"] = true;

            IDbConnection iConnection = new SqlConnection(builder.ConnectionString);

            return iConnection;

        }

    }

}

Once you have this, we are going to move onto the TestContextFactory.cs

The TestContextFactory.cs is just an initializer for your DataContext. This will be utilized in the BaseQuery.cs file.

Contents of TestContextFactory.cs

namespace DynamicLinqTester.DataService.Context

{

    public class TestContextFactory

    {

        public static TestDataContext CreateContext()

        {

            return new TestDataContext(CoreConnection.GetConnection());

        }

    }

}

The next thing we want to do is setup our object. Our object will be used to pass the information back to the program, and we will map the data from our table to this object.

Let’s take a look at our BaseQuery.cs file. This will be what each Query class uses to initialize the data context and dispose of it when it is done.

using DynamicLinqTester.DataService.Context;

 

namespace DynamicLinqTester.DataService.Queries

{

    public abstract class BaseQuery

    {

        public readonly TestDataContext CoreContext;

 

        protected BaseQuery()

        {

            CoreContext = TestContextFactory.CreateContext();

        }

 

        ~BaseQuery()

        {

            CoreContext.Dispose();

        }

    }

}

So, on to OTestTable.cs

Contents:

namespace DynamicLinqTester.Objects

{

    public class OTestTable

    {

        public int Id { get; set; }

        public string FirstName { get; set; }

        public string LastName { get; set; }

    }

}

Now that we have our object, we need to create our “mapper”. The mapper is what we are going to use to say “Column 1” = “Property 1”. Doing it like this will mean you don’t need to use object initialization on your methods, and it will help reduce your code by a lot.

Open TestTableMap.cs (this is our mapper file)

Contents:

using System;

using DynamicLinqTester.DataService.Context;

using DynamicLinqTester.Objects;

 

namespace DynamicLinqTester.DataService.Mappers

{

    public class TestTableMap

    {

        public static readonly Func<TestData, OTestTable> DataToObject = mapper =>

           new OTestTable

           {

               FirstName = mapper.FirstName,

               LastName = mapper.LastName,

               Id = mapper.NameKey

 

           };

    }

}

Okay you may be wondering what all this means. If you do not understand the Func<> method, I would suggest reading up on it on MSDN. They will do a far better job then I could.

Again, the main purpose of this is to map the table to your local object.

Now, we want to move to the heart of this blog entry, the where clause.

Before we do that, look at the following LINQ query:

var query = from c in CoreContext.TestDatas

            where c.LastName == "Heine"

            select new OTestTable

                       {

                           FirstName = c.FirstName,

                           LastName = c.LastName,

                           Id = c.NameKey

                       };

This is how you would normally write your LINQ, you would have your where clause and then your object initialization. You can do it without the initialization, but you will end up with more code.

The mapper part get’s rid of the new OTestTable object. Doing so enables you to take multiple queries and use the same mapping without having to update lots of methods when you add/remove a column.

Now, what we want is something that looks like this:

return CoreContext.TestDatas.Where(whereClause)

                                .Select(TestTableMap.DataToObject)

                                .FirstOrDefault();

As you can see, we are replacing the object initialization with the TestTableMap.DataToObject. Now, what about the “whereClause”, what is that?

This is the apex of the blog entry.

Here is the full method with this LINQ statement (this goes in TestTableQueries.cs)

using System;

using System.Linq;

using System.Linq.Expressions;

using DynamicLinqTester.DataService.Context;

using DynamicLinqTester.DataService.Mappers;

using DynamicLinqTester.Objects;

 

namespace DynamicLinqTester.DataService.Queries.TestQueries

{

    public class TestTableQueries : BaseQuery

    {

        public OTestTable GetTestCode(Expression<Func<TestData, bool>> whereClause)

        {

            return CoreContext.TestDatas.Where(whereClause)

                                            .Select(TestTableMap.DataToObject)

                                            .FirstOrDefault();

        }

    }

}

Okay, again you can see we are using the Func<> again, but also we are using the Expression<>. If you do not use the Expression<> LINQ will not know how to evaluate the where clause into a SQL statement, and you will get a SELECT * FROM Table with no where clause.

So, how do we call something like this?

Open up testMethodsEngine.cs

Now add the following code:

using DynamicLinqTester.DataService.Queries.TestQueries;

using DynamicLinqTester.Objects;

 

namespace DynamicLinqTester.Business.TestMethods

{

    public class TestMethodsEngine

    {

        private readonly TestTableQueries QueryEngine;

 

        public TestMethodsEngine()

        {

            QueryEngine = new TestTableQueries();

        }

 

        public OTestTable GetTestCode(string lastName)

        {

            return QueryEngine.GetTestCode(id => id.LastName == lastName);

        }

    }

}

We are using a simple lambda expression to say Table.Column[Value] = ‘value’

If you want to add multiple statements to your where clause you can make it look like this:

public OTestTable GetTestCode(string lastName, string firstName)

{

    return QueryEngine.GetTestCode(id => id.LastName == lastName && id.FirstName == firstName);

}

Now, you control what data you bring back via your business layer. Your data layer no longer controls the business requirements. You give the business the ability to query how it needs to without dictating what the business needs to query.

Let’s open our program.cs and add some test methods:

Contents:

using System;

using DynamicLinqTester.Business.TestMethods;

using DynamicLinqTester.Objects;

 

namespace DynamicLinqTester

{

    class Program

    {

        static void Main(string[] args)

        {

            CodeSearchTest();

            Console.WriteLine(Environment.NewLine);

            Console.ReadLine();

        }

 

        private static void CodeSearchTest()

        {

            var engine = new TestMethodsEngine();

            DateTime startTime = DateTime.Now;

            OTestTable code = engine.GetTestCode("Heine");

            DateTime stopTime = DateTime.Now;

 

            TimeSpan span = stopTime - startTime;

            Console.WriteLine("Searching for Last Name 'Heine' in TestTable");

            Console.WriteLine("Time in MS:" + span.TotalMilliseconds);

            Console.WriteLine(code.FirstName);

        }

    }

}

Also, before we run this, we need to modify our app.config file to point to the proper database and server:

 

<configuration>

  <appSettings>

    <add key="DatabaseServer" value="SERVER"/>

    <add key="CoreDatabase" value="LINQTester"/>

  </appSettings>

</configuration>

When you execute your program you should see the following output:

image

Well, I hope you enjoyed this little bit of information. Feel free to ask me questions, or add comments.

Happy Coding!

The other day I had come across a problem. I needed to be able to sort one of my generic lists by several different properties.

Let us take the following code:

class Program
    {
        static void Main(string[] args)
        {
            var myList = new List<MyObject>
                                        {
                                           new MyObject{SortId = "D"},
                                           new MyObject{SortId = "B"},
                                           new MyObject{SortId = "C"},
                                           new MyObject{SortId = "A"}
                                        };

            foreach (MyObject o in myList)
            {
                Console.WriteLine(o.SortId);
            }

           Console.ReadLine();
        }
    }       

    class MyObject
    {
        public string SortId { get; set; }
    }

If you run this code you will see the following output:

image

So, we want to be able to sort this list.

The way this is normally done is like this:

myList.Sort((s1, s2) => s1.SortId.CompareTo(s2.SortId));

If you add this little bit of code right before your foreach statement you get the following output:

image

This works great. Now, we want to add a new property to our object. Let’s call is Value for simplicity.

Here is our new object:

class MyObject
    {
        public string SortId { get; set; }
        public string Value { get; set; }
    }

We are now going to update or object initialization with the new property:

var myList = new List<MyObject>
{
    new MyObject{SortId = "D", Value = "Value1"},
    new MyObject{SortId = "B", Value = "Value3"},
    new MyObject{SortId = "C", Value = "Value4"},
    new MyObject{SortId = "A", Value = "Value2"}
};

So, now we want to sort of the Value column or the SortId column.

Normally we would just add an if statement based on some parameters on which one we wanted to sort by.

Something like this (yes, this would be done a lot better if we were to pursue this avenue):

string sortBy = "Value";
            if (sortBy == "SortId")
            {
                myList.Sort((s1, s2) => s1.SortId.CompareTo(s2.SortId));
            }
            else
            {
                myList.Sort((s1, s2) => s1.Value.CompareTo(s2.Value));

            }

So, how are we going to write our sorting without all these giant If() statements in our code? I want to avoid as much code in my business logic as possible. Especially when it comes to doing if() statements.

Let’s look at the Func<,> method in c#

I am not going to go into detail on what Func really is, you can read up on it here: MSDN : Func

We are going to create an extension methos off of MyObject. This extension method will allow us to call the .Sort() directly from our object.

Let’s take a look at our method. This is the complete method which does the sort.

 

    public static class ObjectExtension
    {
        public static List<MyObject> Sort<T>(this List<MyObject> o
                                                , Func<MyObject, T> keySort)
                                             where T : IComparable
        {
            o.Sort((a, b) => keySort(a).CompareTo(keySort(b)));
            return o;
        }
    }

 

So what is happening here?

We are passing in an expression (func) which will extract the value which will be used to compare. You add the constraint where T: IComparable to allow the expression to be compared. If you do not have this you will get a compile error saying that CompareTo does not exist.

Okay, so how do I call something like this?

First, before we continue, make sure you make your MyObject class public, so your extension method will work.

    public class MyObject
    {
        public string SortId { get; set; }
        public string Value { get; set; }
    }

 

Now, we are going to update our program to call the extension method. We will have the same object initialization the only difference is we will be calling our extension method to do the sort.

static void Main(string[] args)
        {
          var myList = new List<MyObject>
                                 {
                                   new MyObject{SortId = "D", Value = "Value1"},
                                   new MyObject{SortId = "B", Value = "Value3"},
                                   new MyObject{SortId = "C", Value = "Value4"},
                                   new MyObject{SortId = "A", Value = "Value2"}
                                };

            myList.Sort(myObject => myObject.Value);

            foreach (MyObject o in myList)
            {
                Console.WriteLine(o.SortId + " - " + o.Value);
            }

           Console.ReadLine();
        }

As you can see we are calling myList.Sort() but this time we are using an expression to say, use this property to sort by.

If we run this we get the following result (which is what we want):

image

Great! We have accomplished the core of this article, which was to dynamically sort our objects based on the parameters.

Now, when you actually call your sort method, you will probably need to provide some if() statements in order to change which property you are going to pass in. The beauty in this is your client is now in control of which parameters to sort by and as the business layer, you don’t need to worry about the sorting, you just provide the functionality to sort based on whatever parameter is passed in (which falls into the Open Closed Principle, your business logic is open for extension but closed for modification).

 

One of the key problems with LINQ is you cannot do cross data context joins. Now, there is a good reason why you can’t do this using the standard LINQ framework. You can run into security issues and you can run into potentially strange data results.

Now, typically when you have multiple data contexts you are connecting to multiple databases for each context. There are some cases when you would want to have multiple data contexts for the same database.

In the example I am going to show you how you can have 2 databases, with 2 data contexts and join them together in the same query.

Before we begin, I am going to be setting up some test databases with some tables and test data. This is to show that you can actually have 2 different databases and have 2 different data contexts and join them together.

Step 1:

Creating the database:

Create database DatabaseA
GO
Create Database DatabaseB
GO

 

Now that we have our databases, we can create our tables and insert some dummy data.

Create Table DatabaseA.dbo.TableADatabaseA
(
    PrimaryKeyA INT IDENTITY(1, 1)
    , TableAValue CHAR(1)
)
GO
Create Table DatabaseB.dbo.TableBDatabaseB
(
    PrimaryKeyB INT IDENTITY(1, 1)
    , TableAJoinValue INT
    , TableBValue CHAR(1)
)
GO
INSERT INTO DatabaseA.dbo.TableADatabaseA VALUES ('A')
INSERT INTO DatabaseB.dbo.TableBDatabaseB VALUES (1, 'B')
GO

Okay great. We now have two databases, two tables and a single row in each table to work with.

If we were to run this query in management studio:

 

select
TB.TableBValue from DatabaseB.dbo.TableBDatabaseB TB
INNER JOIN DatabaseA.dbo.TableADatabaseA TA
on TA.PrimaryKeyA = TB.TableAJoinValue

We would get the result of “B” in our query results.

We want to get the same result using LINQ.

Step 2:

In Visual Studio I created a sample project which contains my 2 data contexts:

image

In my ContextA I have the following table from our DatabaseA:

image

In my ContextB i have the following table from our DatabaseB:

image

 

Step 3:

Now that we have our database and our data contexts setup, we need to write some code!

Before we get into the actual working code, let’s look at code that we think would work:

 

var contextA = new ContextADataContext();
var contextB = new ContextBDataContext();
var query = from a in contextA.TableADatabaseAs
                   join b in contextB.TableBDatabaseBs
                     on a.PrimaryKeyA equals b.TableAJoinValue
                        select b;
Console.WriteLine(query.First().TableBValue);

This code looks just fine. However, if you were to execute it, you will get the following error:

 

image

So, how are we going to fix this?

We are going to create two methods. Each method will be  returning IEnumerable<T>

Let’s create those methods so you can see what I am talking about.

Here is the code for those methods:

static IEnumerable<TableADatabaseA> GetTableA()
        {
            var context = new ContextADataContext();
            return (from t in
                        context.TableADatabaseAs
                    select t).AsQueryable();

        }

        static IEnumerable<TableBDatabaseB> GetTableB()
        {
            var context = new ContextBDataContext();
            return (from t in
                        context.TableBDatabaseBs
                    select t).AsQueryable();
        }

 

As you can see, I am returning an IEnumerable<T> but the query is being returned as AsQueryable(). This is the key, returning AsQueryable(). If you were to return as AsEnumerable() you would get the same error.

Now, for our main query:

            var query = from a in GetTableA()
                        join b in GetTableB()
                        on a.PrimaryKeyA equals b.TableAJoinValue
                        select b;
            Console.WriteLine(query.First().TableBValue);

As you can see, our tables turn into methods.

If you were to execute this you now see this, which is the result we want!

image

Congratulations! We now have a cross data context JOIN!!

Now, with this you do end up with a drawback. When it executes your queries it will execute them separately and then do the join after the queries execute. Which means you lose the lazy loading.

If you look at SQL profiler you can see this happening when you execute this query:

image

While this may not be the best practice for SQL, it does provide you with a solution to actually do a JOIN on multiple data contexts.

I hope that this helps you get a good start on how to do a cross data context join.

One of the major issues that I have come across in the past with writing tests for my application is when I get to the point where I need to test data.

There are a couple of things you can do with this.

1. You can create a data base with actual data that you can query by changing your connection string to this temporary database.

2. You can create fake data and use dependency injection to test your business logic.

First, let’s talk about creating a test database. While this is good you are relying on the fact that you actually have to be able to connect to this database. What happens when you are in a continuous integration environment and the server that does all the builds does not have access to the server with the database?

This brings us to the next topic. Fake Data!

So, some of you may be pondering what fake data really is.

What is not fake data?

Fake data is not a database in some SQL server, or Oracle if you prefer.

What fake data is.

Fake data is data that you create either in your objects or in some other static location, such as xml. I would not recommend xml, because you now have to work with opening new larger memory objects just to get your fake data.

So what does this all mean?

Okay, let’s begin with some code. It is always easier to explain a coding concept with code.

First of all, we need a base set of code to work from. I have here a really basic set of classes with methods as shown:

using System;

namespace FakeDataProject
{
    class Program
    {
        static void Main()
        {
            var myBusinessClass = new MyBusinessClass();
            MyObjectForTest result = myBusinessClass.MyMethodThatReturnsString();
            Console.WriteLine(result.Value1);
            Console.ReadLine();
        }
    }

    class MyObjectForTest
    {
        public string Value1 { get; set; }
        public string Value2 { get; set; }
    }

    class MyBusinessClass
    {
        public MyObjectForTest MyMethodThatReturnsString()
        {
            var dataLayerClass = new MyDataLayerClass();
            return dataLayerClass.MyDataLayerMethodThatReturnsString();
        }
    }

    class MyDataLayerClass
    {
        public MyObjectForTest MyDataLayerMethodThatReturnsString()
        {
            //this really calls the database 
            //(and no, this is not the fake data yet)
            //however this will give you an idea of where we are going for testing...
            return new MyObjectForTest
                       {
                           Value1 = "Some value 1",
                           Value2 = "Some value 2"
                       };
        }
    }
}

So, in this code we have the following:

MyObjectForTest – This is our generic object which we will be using to create fake data off of.

MyBusinessClass – This is our core logic.

MyDataLayerClass – Our DAL, where we are going to connect to the database.

 

If you were to copy and paste this code, it will run and you will see a result. Your result should be “Some value 1”

Okay now that all that works. We need to move into the part where we create fake data. If you noticed, we already have fake data in our MyDataLayerClass. While this is true, it is not the ultimate result we want. In reality we will have a database connection.

Now, because we want to use your favorite testing framework for this next step, your code may vary slightly from mine. I will be using NUnit and Resharper.

I have created a new class called Tester.cs

I have placed in my tester class one test with a very basic test.

using NUnit.Framework;

namespace FakeDataProject
{
    [TestFixture]
    public class Tester
    {
        private MyBusinessClass BusinessClass;
        [SetUp]
        public void Setup()
        {
            BusinessClass = new MyBusinessClass();
        }

        [Test]
        public void Length_Of_MyObjectForTest_Variable1_Should_Be_12()
        {
            Assert.That(BusinessClass.MyMethodThatReturnsString().Value1.Length == 12);
        }
    }
}

 

When you run the test, you will get a pass.

Now, we need to get down to the gritty detail on how we are going to produce fake data.

There are a few framework changes that need to happen for this.

First of all, when you write Unit Tests and when you do TDD, you should always be using Interfaces. The reason for this is because you can create fake services which inherit the same interfaces that your real service will inherit.

So, let’s change our main code to use an interface on the service.

interface iMyDataLayerClass
    {
        MyObjectForTest MyDataLayerMethodThatReturnsString();
    }

    class MyDataLayerClass : iMyDataLayerClass
    {
        public MyObjectForTest MyDataLayerMethodThatReturnsString()
        {
            //this really calls the database 
            //(and no, this is not the fake data yet)
            //however this will give you an idea of where we are going for testing...
            return new MyObjectForTest
                       {
                           Value1 = "Some value 1",
                           Value2 = "Some value 2"
                       };
        }
    }

 

As you can see, I took the code from above and just placed an interface on the data layer class.

In our business layer, we need to utilize the interface versus the class.

I went ahead and created a private interface variable and instantiated the class in the method (we will break this out later.)

    class MyBusinessClass
    {
        private iMyDataLayerClass DataLayerClass;
        public MyObjectForTest MyMethodThatReturnsString()
        {
            DataLayerClass = new MyDataLayerClass();
            return DataLayerClass.MyDataLayerMethodThatReturnsString();
        }
    }

 

At this point, if you run your test again, it will still pass. Which means we were able to refactor the core logic and our code still works.

Here is where it gets fun. We are going to use a little thing called Dependency Injection to pass in the data service into the business class. Why are we going to do this? The reason for this is so we can do several things. For one, the business layer no longer is coupled with the data layer. We can now use multiple data layers with the same interface which will help your application be a bit more portable. You can now test your business logic with a fake data service!

Here is the business layer modified to have the interface passed into the business constructor:

class MyBusinessClass
    {

        private readonly iMyDataLayerClass DataLayerClass;

        public MyBusinessClass(iMyDataLayerClass dataLayerClass)
        {
            DataLayerClass = dataLayerClass ?? new MyDataLayerClass();
        }

        public MyObjectForTest MyMethodThatReturnsString()
        {
            return DataLayerClass.MyDataLayerMethodThatReturnsString();
        }
    }

What is happening here?

We have a private variable which is the interface for the data layer.

Our constructor has a parameter to pass in the interface. Inside the constructor, we have a null check for the interface, if it is null we will instantiate a new instance of the class.

Our method now only has 1 line of code, return the result of the data layer.

Granted, our business logic will probably have more to it, but again, this is just a demo.

At this point, if you build, you are going to get an error. The reason for this is your constructor now requires a parameter.

Part of TDD and writing Unit Tests, is your tests will evolve.

Before we update our tests, we need to make our main program work. We need to update our Main() method with the new constructor:

static void Main()
        {
            iMyDataLayerClass dataLayerClass = new MyDataLayerClass();
            var myBusinessClass = new MyBusinessClass(dataLayerClass);
            MyObjectForTest result = myBusinessClass.MyMethodThatReturnsString();
            Console.WriteLine(result.Value1);
            Console.ReadLine();
        }

Now that we have done that, we can update our testing class.

Here is the code with the update:

private MyBusinessClass BusinessClass;
        private iMyDataLayerClass IMyDataLayerClass;

        [SetUp]
        public void Setup()
        {
            IMyDataLayerClass = new MyDataLayerClass();
            BusinessClass = new MyBusinessClass(IMyDataLayerClass);
        }

As you can see, we have a new interface for the data layer and in the setup we are instantiating the class for the data layer.

If you run your test, it should still pass.

Now, we finally get to the point where we can create the fake data.

Inside the tester.cs file I am going to create a new class called FakeDataService. This class will inherit the interface that we created for the real data service:

    class FakeDataService : iMyDataLayerClass
    {
        public MyObjectForTest MyDataLayerMethodThatReturnsString()
        {
            throw new NotImplementedException();
        }
    }

As you can see, we now have all the methods that our interface requires. What we want to do at this point is update our test fixture to use the new fake data service. We update the Setup() method as follows:

        [SetUp]
        public void Setup()
        {
            IMyDataLayerClass = new FakeDataService();
            BusinessClass = new MyBusinessClass(IMyDataLayerClass);
        }

Now, run your test. You will get a failure. If you get a System.NotImplementedException, then you are doing the right thing. You now know that you are  using the new fake data service instead of the real data service.

Now we need to make our test pass. Here is where the fake data comes into play.

You can do this one of a couple of ways.

The best practice is to create an ObjectMother class, which contains all your objects with “fake” data. You can create methods in your ObjectMother such as:

public static MyObject GetValidObject()

public static MyObject GetInvalidObject()

With these methods you can return an instance of that object with valid/invalid data. Each of which you can run tests against. This is important because sometimes your real data layer is going to return invalid data. Doing this will enable you to reuse that valid/invalid data for other tests.

So in this example, I will create a small ObjectMother class which we will use to get our fake data.

class ObjectMother
    {
        public static MyObjectForTest GetValidData()
        {
            return new MyObjectForTest
                       {
                           Value1 = "123456789012",
                           Value2 = "000"
                       };
        }
    }

 

In my fake data service, I will be calling the ObjectMother to get the valid data:

class FakeDataService : iMyDataLayerClass
    {
        public MyObjectForTest MyDataLayerMethodThatReturnsString()
        {
            return ObjectMother.GetValidData();
        }
    }

So, what we are doing here, is we are pretending we are connecting to a database. The object mother is the database, the fake service is the data layer and we are passing in that data layer, using Dependency Injection, into our business layer to handle our logic.

If we run our test after adding this code. It should pass.

One of the ways to know that this is working is to debug your test. When you debug you can see that when you get into your business layer, it is really using the fake data service and not the real one.

Also remember, that the main point of this is to test your business logic based on data that you would receive from the database. You don’t want to test the database to see if that works or not. Your goal should be to test the business layer and your business layer should handle if your data layer does not return valid data.

This article is a continuation of Cursors. Previous Parts can be viewed here:

Cursors : An In Depth Look – Part 1

Cursors : An In Depth Look – Part 2 

Cursors : An In Depth Look – Part 3

Now that we have looked at query execution time, I/O and CPU statistics we are now going to look at overall system performance when running these operations.

 

One of the greatest tools for a DBA is the performance monitor. You can see what your system is doing at the exact event of a query that is being executed.

So we have executed our query once in each scenario. What happens if you execute the query thousands of times? Well, we should know what happens to our system when this happens because normal SQL servers don’t sit idle with 1 query execution every day.

In order to test this process, we can’t use the standard messages window or the execution plan. We need to use Profiler. Setting up profiler is a whole different beast and that will not be covered here.

To test this setup, I will be outputting my results to a table.

NOTE: This is normally a VERY bad thing and you should not do this. I am doing it for this example to save the time of exporting to a file and then back into a table.

I will be running these tests in the same order as the previous articles, Cursor, Set-Based and Set-Based with Temporary table.

One of the key differences in this test is I will be creating a dump table so that our cursor and set-based operations will actually do something versus just sitting there and looking pretty.

Create Setup Environment

First thing I am doing to do is create the dump table:

CREATE TABLE DumpTable (filler CHAR(200))

We also want to make sure that statistics are turned off and your actual execution plan is turned off from our previous examples.

You also want to set the nocount on, so that your message window does not get flooded with all the execution messages.

The Operations

We will be using the following cursor for your test. The cursor will process 100 rows.

DECLARE
    @keycol AS INT,
    @filler AS CHAR(200)
DECLARE C CURSOR FAST_FORWARD
    FOR SELECT
            keycol,
            filler
        FROM
            dbo.T1 WHERE keycol <= 100
OPEN C
FETCH NEXT FROM C INTO @keycol, @filler
WHILE @ @fetch_status = 0
    BEGIN
        INSERT INTO DumpTable VALUES(@filler)
        FETCH NEXT FROM C INTO @keycol, @filler
    END
CLOSE C
DEALLOCATE C
GO 1000

Here is the set-based operation we will be using (non-temporary table)

DECLARE
    @keycol AS INT

SELECT
    @keycol = keycol
FROM
    (SELECT TOP (1) keycol
      FROM dbo.T1 where keycol < 101) AS D
WHILE @ @rowcount = 1
    BEGIN
        INSERT INTO DumpTable values (@keycol)

        SELECT
            @keycol = keycol
        FROM
            (SELECT TOP (1) keycol
              FROM dbo.T1 WHERE keycol > @keycol and keycol < 101
            ) AS D 

    END
GO 1000

Here is the set-based operation with temporary table that we will be using

DECLARE @Counter INT
DECLARE @TempValue CHAR(1)
DECLARE @MaxCount INT
DECLARE @keycol CHAR(200)
Create Table #tmpCursor
    (
      id int not null
             identity(1, 1)
             primary key,
      filler char(200)
    )

CREATE NONCLUSTERED INDEX [IX_CursorDump] ON #tmpCursor ( filler ASC )

INSERT INTO
    #tmpCursor
    SELECT
        filler
    FROM
        dbo.T1
    TABLESAMPLE(100 ROWS) REPEATABLE(200)

SELECT @Counter = Count(*) FROM #tmpCursor
SET @MaxCount = 100
WHILE( @Counter < @MaxCount )

    BEGIN
        SELECT @keycol = keycol from #tmpCursor WHERE keycol = @counter
        INSERT INTO DumpTable values (@keycol)
        SET @Counter = @Counter + 1
    END

DROP TABLE #tmpCursor
GO 1000

While running profiler and performance monitor at the same time we get some interesting results.

For performance monitor I am monitoring the following statistics (These are just the main items out of 120 counters):

Logical Disk: Disk Reads/Sec

Logical Disk: Disk Writes/Sec

Logical Disk: Avg. Disk Bytes/Write

Physical Disk: Disk Read Bytes/Sec

Physical Disk: Disk Write Bytes/Sec

Logical Disk: Current Disk Queue Length

Processor: Interrupts/ Sec

SQL Server: Wait Statistics : Page IO Latch Waits

The Results

 

Okay, if you have ever run performance monitor you will know that it can be difficult to read. I am not going to go into super details on the following graphs. They are for visual representation only. The purpose here is to show you visually what is happening to your system when you run each of these operations. While some counters may be higher then others and visa versa, we are looking at how much overhead your machine has when running cursor or set-based operations.

The Cursor

Cursor-8

The Set-Based Operation (Non Temporary Table)

Cursor-9

The Set-Based Operation (Temporary Table)

Cursor-10

 

As you can see, there is a HUGE difference between the activity from each of the query types. Some of the processes for the temporary table are higher (Yellow Line, which is the Page IO Latch Waits). This means that the temporary table has a higher I/O rate then CPU and Memory. The way around this is to either use a table variable (different discussion on this) or make sure your tempdb is on a different disk (it should be anyway).

Conclusion

So, have we really learned anything here? Sure we have, we have learned that cursors have a huge amount of overhead. Set-based operations don’t have as much overhead.

So, what can you decide from this? Should I use a cursor or not? Well, like it was said in the beginning. It all depends on what you need it for. Just because you have to write a bit of extra code to make a temporary table and make your "fake” cursor work that way, does not make it bad. If you are looking for performance and do not want to kill your server, I would suggest the temporary table version of the cursor.

Please feel free to comment and add your input to these articles.

This article is a continuation of Cursors. Previous Parts can be viewed here:

Cursors : An In Depth Look – Part 1

Cursors : An In Depth Look – Part 2 

This section will go into the finer details of the cursor and set-based operations. We will be looking at what is going on behind the scenes to get a better understanding of how these operations work, and to show you how to calculate performance so you can determine the best approach for your query, either cursor or set-based.

We will be looking at the STATISTICS IO and the Query Execution Plan when we execute these queries.

Before we dive into these details, we are going to explore some definitions that you need to know when looking at the Query Execution Plan.

Copyright Information:

The following definitions were taken directly from “Inside SQL Server 2005 T-SQL Programming” by Itzik Ben-Gan.

Estimated I/O Cost and Estimated CPU Cost:

The estimated part of the operator’s cost associated with that particular resource (I/O or CPU). These measures will help you identify whether the operator is I/O or CPU intensive.

Estimated Operator Cost:

The cost associated with the particular operator.

Estimated Subtree Cost:

The cumulative cost associated with the whole subtree up to the current node.

In order to not overload our results window, we need to make sure we get a TOP 1 operation when doing the initial selects. This way we get 1 result set. The results will be slightly skewed but we are only really looking for the basic I/O operations for each type of query.

Before you run these queries, run the following command AND turn on the Actual Execution Plan by hitting Control-M on your keyboard.

SET STATISTICS IO ON

Also, we need to make sure you clear your cache for each type, so we can compare cold cache against warm cache results.

The query for clearing your cache is located in Part 2 of this article.

 

The Cursor

We are first going to execute the cursor. The query will be the same as our previous cursor query from Part 2 of this article, the only difference is we are going to add TOP 1 to the operation.

DECLARE
    @keycol AS INT
   ,@filler AS CHAR(200)
DECLARE C CURSOR FAST_FORWARD
    FOR SELECT TOP 1
            keycol
           ,filler
        FROM
            dbo.T1
OPEN C
FETCH NEXT FROM C INTO @keycol, @filler
WHILE @@fetch_status = 0
    BEGIN
        FETCH NEXT FROM C INTO @keycol, @filler
    END
CLOSE C
DEALLOCATE C 

 

When you execute your query, we are going to look at the message window.

Note: your results may vary slightly, however, all in all the results should be very close.

Cold Cache:

Table 'T1'. Scan count 1, logical reads 3, physical reads 2,
  read-ahead reads 0, lob logical reads 0, lob physical reads 0,
  lob read-ahead reads 0.

Warm Cache:


Table 'T1'. Scan count 1, logical reads 3, physical reads 0,
  read-ahead reads 0, lob logical reads 0, lob physical reads 0,
  lob read-ahead reads 0.

As you can see, table T1 did 1 scan, 3 logical reads and 2 physical reads total for the cold cache and the warm cache had the same except for the physical reads because the data was in the cache.

NOTE: If you do not use the FAST_FORWARD Command you will get an ADDITIONAL 3 logical reads from a ‘Worktable’

Logical reads are from memory, physical reads are from disk.

When you look at your execution plan you will see something like the following (this will be the same for both cold and warm cache).

Cursor Execution Plan

Because we have a clustered index on the primary key and we did not specify actual key columns, we got an Index Scan. If we had specified key columns we would have an index seek (which provides better performance).

Now, the key aspect of this is if you move your mouse over the Clustered Index Scan you will see something like the following:

Cursor-2

The scary part about this is the estimated number of rows at 1 million and the Operator Cost and Subtree Cost of 20.5966.

Set-Based Operations

We are going to run our set based operation (query taken from Part 2 but without the loop for demonstration purposes)

 

DECLARE
    @keycol AS INT,
    @filler AS CHAR(200) 

SELECT
    @keycol = keycol, @filler = filler
FROM
    (SELECT TOP (1) keycol,filler
      FROM dbo.T1 ORDER BY keycol) AS D 

SELECT
    @keycol = keycol,
    @filler = filler
FROM
    (SELECT TOP (1) keycol, filler
      FROM dbo.T1 WHERE keycol = @keycol
      ORDER BY keycol
    ) AS D

You will see that the STATISTICS IO is the same on the logical and physical reads. The best part about this is the execution plan:

Cursor-4

Here you can see that we only have 1 estimated row versus 1 million and the Operator Cost is significantly lower then before!

I ran the same process using a temporary table and with the results I have created a table to show you the output from the Execution Plan:

 

 

Cursor

Set-Based /w Top

Set-Based /w Temp

Estimated I/O

19.4965

19.4965

0.006250

Estimated CPU

1.10016

1.10016

0.000336

Estimated Operator Cost

20.5966

0.0032831

0.006586

Estimated Subtree Cost

20.5966

0.0032831

0.006586

Estimated Number of Rows

1000000

1

1

As you can see, running the cursor (even with the TOP 1 operator) will still have significant overhead on the machine. Using a set-based operation has less Operator and Subtree cost and significant estimated number of rows. Using the temporary table method, you can see that all numbers are drastically reduced.

So, just by looking at this, you can see that a Set-Based operation using a temporary table for row by row manipulation has the best impact on the machine.

 

Continue to Cursors: An In Depth Look Part 4

This article is a continuation of Cursors : An In Depth Look – Part 1

Testing Cursors and Set-Based Operations (Single Query Execution)

To help speed up the testing (not the process its self)  you can check the box “Discard Results After Execution” option in SSMS. This will ensure that when your query executes it will throw away the data and will not generate the output, hence giving you a little more time for testing.

We are first going to setup our test environment and then we will scan the same amount of data with set-based code and a cursor. We will be testing with 1 million records, each row containing a bit more than 200 bytes.

Note: Testing results may vary on the computer you are using. Hard drive and processor differences will drastically change the results of these operations.

If you would like to follow along with the examples I have step by step instructions on what you need to do. I entered these examples as I was doing them in SSMS so there should not be anything missing.

Setup the Environment

We are going to create a temporary database for these examples. This same database will be used in all parts of this article.

Create the database:

USE Master
IF EXISTS ( SELECT
                NULL
            FROM
                sys.databases
            WHERE
                name = 'CursorTest' )
    BEGIN
        ALTER DATABASE CursorTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE
        DROP DATABASE CursorTest
    END

GO
CREATE DATABASE CursorTest
GO
USE CursorTest

Next we are going to create a table that we will be using to scan data from..

IF OBJECT_ID('dbo.T1') IS NOT NULL
    DROP TABLE dbo.T1
GO
CREATE TABLE T1
    (
     keycol INT NOT NULL
                IDENTITY(1, 1)
    ,filler CHAR(200)
    )

Okay, now we are going to insert the 1,000,000 records into this table. You can do this simply by adding an insert statement with GO 1000000 after the statement (by the way, this is an undocumented feature of TSQL).

INSERT INTO T1 (filler) VALUES ('a')
GO 1000000 

When you execute this statement you will see in your messages window, after it has finished, the following message:

Beginning execution loop
Batch execution completed 1000000 times.

Now that you have a million rows, you want to apply a clustered unique index to the table. This will improve your performance greatly. Not only that, we don’t want to wait hours for a simple test like this.

CREATE UNIQUE CLUSTERED INDEX idx_keycol on dbo.T1(keycol);


Okay, we now have our data setup. We can write the code for the cursor and the set-based operation. Before we run the code we want to make sure that our cache is cleared so we do not use memory for this operation. In a production environment, your queries will most likely hit the cache, but for demonstration purposes, we want to create as much as a what-if environment as possible.

So, how do you clear the cache in SQL? Run the following statement:

DBCC DROPCLEANBUFFERS;


We will be using this statement quite a bit during this process, so if I am referring to the “Clear Cache” statement, we will be running this statement.

Let the Testing Begin

Okay, we are finally to the fun part. The actual testing. We are going to run a query against what is called “cold cache” and then run the same query against what is called “warm cache”.

Make sure you have executed your clear cache statement first.

Execute the following query:

SELECT keycol, filler FROM dbo.T1

When running the query against cold cache, it took 6 seconds on my machine to execute.

Note: If you had not enabled the “Discard Results After Execution" your query would have taken substantially longer to produce the results in your output window.

Execute the query again. You are now running this against the warm cache. This time it only took 1 second to execute on my machine. So, as you can see, cached queries will outperform non cached queries quite a bit.

The Cursor

Now we are going to test our cursor operation.

Clear your cache with the clear cache statement and write the following code. We are going to execute the code twice, once against cold cache and the other against the warm cache. Initially we are only going to look at the speed of the query execution. We will dive into the statistics a bit later.

DECLARE
    @keycol AS INT
   ,@filler AS CHAR(200)
DECLARE C CURSOR FAST_FORWARD
    FOR SELECT
            keycol
           ,filler
        FROM
            dbo.T1
OPEN C
FETCH NEXT FROM C INTO @keycol, @filler
WHILE @@fetch_status = 0
    BEGIN
        FETCH NEXT FROM C INTO @keycol, @filler
    END
CLOSE C
DEALLOCATE C

Running against cold cache on my machine this query took 27 seconds to execute. When I executed it again against the warm cache, it took 15 seconds. Using the warm cache you can see that the query exhibited the same type of speed increase based on our initial query (1 second result). However, as you can see, we have a significantly higher execution time for both the cold and warm cache for the same query. The only difference is we now have the overhead of the cursor.

If you really look at the cursor, you will notice that it is not doing anything. It is just a simple cursor that loops. There is no complex row by row manipulation happening. Imagine the overhead if we added more complex code to it.

Set-Based Operation

First we want to clear our cache (using the clear cache statement above). We then want to execute the following set-based operation query.

DECLARE
    @keycol AS INT,
    @filler AS CHAR(200) 

SELECT
    @keycol = keycol, @filler = filler
FROM
    (SELECT TOP (1) keycol,filler
      FROM dbo.T1 ORDER BY keycol) AS D 

WHILE @@rowcount = 1
    BEGIN
        SELECT
            @keycol = keycol,
            @filler = filler
        FROM
            (SELECT TOP (1) keycol, filler
              FROM dbo.T1 WHERE keycol > @keycol
              ORDER BY keycol
            ) AS D
    END

Against the cold cache, this process took 29 seconds. This was 2 seconds slower then a cursor against the cold cache! Against the warm cache it took 11 seconds, which comes out to 4 seconds faster. At this point you are probably thinking there is not much of a difference between the set-based operation and the cursor.

You may also be thinking that a cursor might be faster then a set-based operation. Well, as I pointed out initially, they can be faster in some operations. Looking at the queries we have here, they are not full of complex operations inside of the loops.

The important part here is the process is slightly cleaner then working with a cursor. Also, it is not just the speed of the execution that is important, it is what goes on behind the scenes. We are getting to that, just be patient.

Temporary Table Option (Set-Based Operation Modified)

There is another way to write this query (which is not in the book) by using a temporary table. Using a temporary table would allow you to gather specific data first versus doing key column identification on a table.

Here is how the temporary table would work.

DECLARE @Counter INT
DECLARE @TempValue CHAR(1)

CREATE TABLE #tmpCursor
    (
      id INT NOT NULL
             IDENTITY(1, 1)
             PRIMARY KEY,
      filler CHAR(200)
    )

CREATE NONCLUSTERED INDEX [IX_CursorDump] ON #tmpCursor ( filler ASC )

INSERT INTO
    #tmpCursor
    SELECT
        filler
    FROM
        dbo.T1
    TABLESAMPLE(10 PERCENT)

SELECT @Counter = COUNT(*) FROM #tmpCursor
PRINT @Counter
WHILE( @Counter > 0 )
    BEGIN
        SET @Counter = @Counter - 1
    END

DROP TABLE #tmpCursor

As you can see, I used the TABLESAMPLE at 10 percent. This will gather ~10% of the rows in the table. Which is what you are most likely going to do when working with row by row operations. Now, if you really did need to process every single row in our 1 million rows in the table, I would not recommend using a temporary table as the overhead of filling up the tempdb would be tremendous versus using the TOP operator.

Running this query on cold cache took 10 seconds and returned 97432 rows. On warm cache it took 9 seconds at 98800 rows. Not much of a difference because we are gathering a random 10% of rows. If we use the REPEATTABLE(SEED) and gather the same rows each time, cold cache took 10 seconds at 100738 rows and warm cache took 5 seconds resulting in the same rows.

As you can see, if you need to do row by row operations with a specific set of query based data, you can get substantial improvements to your set-based operations.

 

Continue to Cursors : An In Depth Look Part 3

I am breaking this article out into a few parts, only due to the length of the content. I don’t want to overwhelm you with tons of reading that you will not finish.

So, first things first:

Copyright Information:

This information on Cursors comes from “Inside Microsoft SQL Server 2005 T-SQL Programming” written by Itzik Ben-Gan. All examples on this page are taken directly from Chapter 3 of the book. Some of the examples have been modified only to eliminate processes that occurred in previous chapters of the book, such as creating test data in other tables.

Okay, now that that is out of the way, let us get to the meat of the article. As stated in my copyright information, this information is taken from Chapter 3 of the book. One thing I really like about Inside Microsoft SQL Server 2005 T-SQL Programming is that it provides great in depth knowledge on how things work. The one thing that I wish it had was the gory details on how things work.

Well, I am here to provide you with the gory details on cursors.

Overview

Now, one way to start an argument is to talk about coding standards in a room full of developers. If you ever get into this situation you know you will be spending a ton of time figuring out what is the best approach on your current argument. Eventually you will be arguing about cursors. Some people love them and some people hate them.

Cursors are something in SQL that can be a good thing and a bad thing and both at the same time.

The purpose of this article is to show you the good and the bad and also to show you how you can use set-based operations to avoid cursors. According to Itzik Ben-Gan the goal of his chapter on cursors is to “show you how to use them wisely”.

Why Avoid Cursors

Cursors conflict with the foundation of the relational model. When you use a cursor you are applying procedural logic versus using set-based logic.

This means that you write code with iterations and you focus on the “how” to deal with data.

Applying set-based logic you will typically, but not all cases, write far less code as you focus on “what” you want and not how to get it.

Why Use Cursors

Cursors have a lot of overhead involved with the row-by-row manipulation. You will need to calculate and estimate the cursor overhead of your query and make sure you cannot do your query with a set based operation before moving to a cursor. There are cases where a cursor will yield better performance then a set based operation. This article will show you how to see the performance differences between the set based operation and the cursor.

One important aspect of cursors is that they can request and assume ordered data as input. Queries can accept only a relational input, which by definition cannot assume a particular order. This is where the difference comes in when identifying the case where you may actually want to use a cursor as it might be faster. Examples of these are running aggregations and ranking calculations. The I/O cost involved with the cursor activity plus the cursor overhead might actually be lower then a set-based operation which does perform a greater amount of I/O.

What is the Cursor Overhead

For the most part, efficiently writing set-based solutions will outperform cursor-based solutions for two reasons:

  1. You empower the optimizer to do what it is good at, which is generating multiple valid execution plans and then choosing from the most efficient one. When you use a cursor-based solution, you are pretty much forcing the optimizer to go with a rigid plan that does not have much room for optimization.
  2. Row by row manipulation can create a ton of overhead. You can see that scanning a table with a simple query versus scanning with a cursor has a significant difference in the processing time.

 

Okay, so how do we go about “proving” that a cursor has more overhead then a set-based query? Well, we test it of course!

 

Continue to Cursors : An In Depth Look Part 2