Analyzing GitHub LINQ usage – the results

A quick recap – we’ve written a small application to analyze how C# developers are using LINQ. If you haven’t read the first part and want to learn how we did it, go to the previous blog post: Analyzing GitHub LINQ usage – Introducing LinqAnalyzer.

But before we begin, let’s discuss what exactly happened once we ran LinqAnalyzer.

Lies, damned lies, and statistics

After running LinqAnalyzer for a few hours we got very interesting results.

We also found a few bugs along the way, which we solved; you can see how in the webinar: Debugging Complex Code.

Analyzing many open source projects does have its share of challenges – we’ve discovered that not all projects marked as “C#” parsed well, and for some of them we could not create semantic models. At the moment we’ve decided to leave those projects out of our analysis, so they are not included in the final results.

We did manage to gather 200 projects, which seems enough to determine how C# developers are using LINQ. Among the projects we’ve sampled you can see some of the leading open source projects in our world, from many disciplines:

  • Caliburn.Micro and Prism from the MVVM world
  • FakeItEasy, NSubstitute and FluentAssertions from the unit testing world
  • SignalR, Nancy, AutoMapper, Newtonsoft.Json, ReactiveUI – and more

I’ve exported the results to an Excel file which you can download and analyze yourself.

Which flavor of LINQ do developers use?

The first question we needed answered was how many C# projects use LINQ. From the projects we’ve checked it seems that most indeed do:

[Chart: LINQ usage]

Out of 200 projects, fewer than 10% (19) do not use LINQ, and most of the rest use both the Fluent and Query APIs.

Upon seeing those results we immediately understood that we needed to support debugging of both LINQ flavors…

When we set out to add LINQ debugging capabilities to OzCode we thought that most developers prefer the fluent/extension-method style. We were right, at least according to the results above, but even more developers used both styles, in some cases even mixing one inside the other.
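To make the two flavors concrete, here is a minimal sketch of the same filter-and-project written both ways, plus one way the styles can end up mixed. The sample data and variable names are made up for illustration and are not taken from any of the analyzed repositories.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, for illustration only.
int[] numbers = { 5, 3, 8, 1, 9, 2 };

// Fluent (extension method) syntax.
var fluent = numbers.Where(n => n > 2).Select(n => n * 10).ToList();

// Query syntax: the compiler translates this into Where/Select calls.
var query = (from n in numbers
             where n > 2
             select n * 10).ToList();

// Mixed: a fluent call (Any) nested inside a query expression,
// and another fluent call (OrderBy) applied to the query's result.
var mixed = (from n in numbers
             where numbers.Any(m => m == n + 1)
             select n).OrderBy(n => n).ToList();

Console.WriteLine(string.Join(", ", fluent)); // 50, 30, 80, 90
Console.WriteLine(string.Join(", ", query));  // 50, 30, 80, 90
Console.WriteLine(string.Join(", ", mixed));  // 1, 2, 8
```

A debugger that only understands one of the two forms would lose track of the data halfway through the third statement.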

Lesson learnt – when you have a theory about your users’ needs, it’s easier to perform an experiment and make sure you’re on the right track. This method is preferable to the usual method of arguing till you’re blue in the face.

Deep Diving into query usage

It was interesting to check the 9 projects which only used the query syntax and see what made them use it:

Repository name          Lines of code  LINQ calls  LOC per LINQ call  Operators used
libgit2sharp             63813          64          15953.25           from, where, let, select
FakeItEasy               38045          63          9511.25            from, where, select, let
kudu                     68304          40          13660.8            from, where, select, let, descending
Xamarin.Forms            26083          28          3260.375           from, join, equals, select, where, let, group, by
letsencrypt-win-simple   3526           18          1175.333333        from, where, select
shadowsocks-windows      24953          8           6238.25            from, let, where, select
react-native-windows     42230          4           21115              from, select
Exceptionless            33613          3           11204.33333        from, where, select
Bonobo-Git-Server        20359          2           10179.5            from, select

I expected to see a lot of let operators and multiple from clauses, which is where the SQL-like syntax shines, but was amazed to find out this is not the case.
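As a rough illustration of why let and multiple from clauses favor the query syntax, here is the same query written both ways. The data and names are hypothetical, and the fluent version is only one possible hand translation, not necessarily what the compiler generates.

```csharp
using System;
using System.Linq;

// Hypothetical data, for illustration only.
var orders = new[]
{
    new { Customer = "Ann", Items = new[] { 4, 6 } },
    new { Customer = "Bob", Items = new[] { 1 } },
};

// Query syntax: two from clauses plus a let that names an intermediate value.
var viaQuery = (from o in orders
                from price in o.Items
                let doubled = price * 2
                where doubled > 5
                select $"{o.Customer}: {doubled}").ToList();

// Fluent syntax: the range variables have to be carried along by hand.
var viaFluent = orders
    .SelectMany(o => o.Items, (o, price) => new { o, price })
    .Select(x => new { x.o, x.price, doubled = x.price * 2 })
    .Where(x => x.doubled > 5)
    .Select(x => $"{x.o.Customer}: {x.doubled}")
    .ToList();

Console.WriteLine(string.Join("; ", viaQuery));  // Ann: 8; Ann: 12
Console.WriteLine(string.Join("; ", viaFluent)); // Ann: 8; Ann: 12
```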

In fact, looking at this table I can see that the bottom two (maybe three) repositories used a simple from..where..select, which I always found to be more readable using method calls.

It seems that some people prefer the query API and use it extensively. I know I’m the other way around: I prefer to use method calls as opposed to the from x in y form, but I guess it’s mostly a matter of personal taste.

Most used LINQ operators

Now that I had populated the database I was able to run a simple query to count and sort the operators and find out which operators were used the most – out of 87,615 LINQ operators:
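The query itself ran against the database and is not shown here. As a sketch of the same count-and-sort idea, assuming the operator occurrences were available in memory as a plain list of names (a hypothetical input), it could look like this:

```csharp
using System;
using System.Linq;

// Hypothetical input: one entry per operator occurrence found by the analyzer.
string[] occurrences = { "Select", "Where", "Select", "ToList", "from", "Select", "Where" };

// Group identical operator names, count each group, then sort by count
// (descending), breaking ties by name so the output is stable.
var counts = occurrences
    .GroupBy(op => op)
    .Select(g => new { Operator = g.Key, Count = g.Count() })
    .OrderByDescending(x => x.Count)
    .ThenBy(x => x.Operator, StringComparer.Ordinal)
    .ToList();

foreach (var c in counts)
    Console.WriteLine($"{c.Operator} {c.Count}");
```

Note that the grouping is case sensitive, which is why the fluent Select and the query keyword select show up as separate rows in the results.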

[Chart: LINQ operators count]

The most used operators were:

Select 9450
ToList 9013
Where 6584
ToArray 6350
Single 5790
Any 5479
from 5046
Count 4841
FirstOrDefault 4666
First 4311
select 4064
where 2333
Contains 1760
OrderBy 1499
All 1284
SelectMany 1262
Range 1179
Concat 982
Last 696

I’ve marked the query syntax operators in orange and there are no surprises there – the most used operator is from (which is kind of mandatory), followed by the classics – select and where.

By now we know enough about how developers prefer to use LINQ, so the fact that most of the operators on this list are from the fluent variety does not shock us. Just like in the query syntax, the first 5 places have the basic Select (1st) and Where (3rd), and we also see ToList/ToArray – which feels like a bit of cheating since they’re not necessarily used as part of a “normal” LINQ query.

I find the fact that Single is used more than SingleOrDefault a good sign, since it means that the code using it is not riddled with endless null checks, although I’m left wondering why FirstOrDefault comes before First, even though they are pretty close.
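For readers who don’t use these four every day, here is a small sketch of how they differ. The arrays are made-up sample data.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, for illustration only.
string[] names = { "ada", "bob", "cy" };
string[] empty = { };

// First and Single throw InvalidOperationException when nothing matches...
bool firstThrew = false;
try { empty.First(); }
catch (InvalidOperationException) { firstThrew = true; }

// ...while the OrDefault variants return default(T), which is null for
// reference types, so the caller needs a null check afterwards.
var firstOrNull = empty.FirstOrDefault();

// Single also throws when more than one element matches.
bool singleThrew = false;
try { names.Single(n => n.Length == 3); }
catch (InvalidOperationException) { singleThrew = true; }

// With exactly one match, Single simply returns it.
string onlyShort = names.Single(n => n.Length == 2);

Console.WriteLine($"{firstThrew} {firstOrNull == null} {singleThrew} {onlyShort}");
// True True True cy
```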

Other than that, I think anyone who has ever used LINQ would find that the results align with their experience.

Least used LINQ operators

Another interesting piece of data we were after was which operators were least frequently used. We got the following:

Union 213
Repeat 200
Min 199
group 167
by 167
ThenBy 151
DefaultIfEmpty 146
descending 128
ThenByDescending 85
Intersect 84
Zip 80
ElementAtOrDefault 73
Average 65
TakeWhile 48
ascending 39
SkipWhile 35
ToLookup 29
Join 12
LongCount 8
GroupJoin 5

Note that we have a minor “feature” – group and by are shown as two different operators; in a way they are (at least implementation-wise).
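For reference, this is how the two keywords appear together in a query expression, next to the single fluent GroupBy call that does the same job. The word list is made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, for illustration only.
string[] words = { "apple", "avocado", "banana", "blueberry", "cherry" };

// Query syntax: "group" names the element, "by" names the key.
var viaQuery = (from w in words
                group w by w[0] into g
                select $"{g.Key}: {g.Count()}").ToList();

// Fluent syntax: a single GroupBy call.
var viaFluent = words
    .GroupBy(w => w[0])
    .Select(g => $"{g.Key}: {g.Count()}")
    .ToList();

Console.WriteLine(string.Join(", ", viaQuery));  // a: 2, b: 2, c: 1
Console.WriteLine(string.Join(", ", viaFluent)); // a: 2, b: 2, c: 1
```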

Other than that, we can see that the “order by” operators – descending/ThenByDescending and ascending – are not that common. Another point of interest is that the fluent Join/GroupJoin are among the least used. We’ve noticed that join (query API) was used only 383 times, which means that developers do not use LINQ to join data that much. I guess they prefer to hold the data in the way that is easiest to consume and use Where/Select instead. I get it: in code we can (and IMHO should) use pointers instead of trying to normalize data as if it’s saved in a relational database.
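A small sketch of what I mean, using made-up customers and orders: the first two queries join two “normalized” collections by id, while the last one skips the join because each order already holds a reference to its customer.

```csharp
using System;
using System.Linq;

// Hypothetical "normalized" data: orders point at customers by id.
var customers = new[] { new { Id = 1, Name = "Ann" }, new { Id = 2, Name = "Bob" } };
var orders = new[]
{
    new { CustomerId = 1, Total = 10 },
    new { CustomerId = 2, Total = 7 },
    new { CustomerId = 1, Total = 5 },
};

// Fluent Join.
var fluentJoin = orders
    .Join(customers, o => o.CustomerId, c => c.Id, (o, c) => $"{c.Name}: {o.Total}")
    .ToList();

// Query syntax join.
var queryJoin = (from o in orders
                 join c in customers on o.CustomerId equals c.Id
                 select $"{c.Name}: {o.Total}").ToList();

// Alternative: each order holds a reference to its customer,
// so a plain Select is enough and no join is needed.
var ann = new { Name = "Ann" };
var bob = new { Name = "Bob" };
var linkedOrders = new[]
{
    new { Customer = ann, Total = 10 },
    new { Customer = bob, Total = 7 },
    new { Customer = ann, Total = 5 },
};
var viaReference = linkedOrders.Select(o => $"{o.Customer.Name}: {o.Total}").ToList();

Console.WriteLine(string.Join("; ", viaReference)); // Ann: 10; Bob: 7; Ann: 5
```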

Looking at the list above you’ll notice that TakeWhile and SkipWhile came in the last 10 operators. On the other hand, Take (625) and Skip (655) are more popular – I guess there are more scenarios in which the simple form is easier to use and/or more readable.
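To show the difference between the count-based and predicate-based forms, here is a short sketch on a made-up array. Note that TakeWhile and SkipWhile stop testing at the first element that fails the predicate, which is not the same as filtering with Where.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, for illustration only.
int[] values = { 1, 2, 10, 3, 4 };

// Count-based: take or skip a fixed number of elements.
var firstThree = values.Take(3).ToArray();  // 1, 2, 10
var afterThree = values.Skip(3).ToArray();  // 3, 4

// Predicate-based: stop at the first element that fails the test.
var whileSmall = values.TakeWhile(v => v < 5).ToArray();   // 1, 2
var fromFirstBig = values.SkipWhile(v => v < 5).ToArray(); // 10, 3, 4

// Where keeps every matching element, including those after the 10.
var allSmall = values.Where(v => v < 5).ToArray();         // 1, 2, 3, 4

Console.WriteLine(string.Join(", ", whileSmall));
Console.WriteLine(string.Join(", ", fromFirstBig));
```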

Conclusion

Here at OzCode we’ve learnt quite a lot from this research and we promise to keep developing LINQ debugging according to the community’s needs.

We’ve learnt that we need to support the less used LINQ query syntax – especially for developers who mix both LINQ syntaxes, one inside the other.

We’ve learnt which operators are more popular – and which are the “bread and butter” of C# developers.

All in all, a good day’s work.

  • Greg Dennis

    While it’s a great idea to validate usage in order to determine what you need to support, I don’t think that you can draw conclusions of “popularity” from these results. The “least used” methods are that because either the opportunity or need to use them is less frequent (Join, Union, Intersect) or they are lesser known (Zip, DefaultIfEmpty). Maybe a weighted result is in order, but then you’d have to determine weights for each method…

  • bondsbw

    I just noticed, every repository in this analysis was modified between 10/8/2016 and 10/18/2016. I’m curious if that date bias is skewing your results.