Analyzing GitHub LINQ usage – Introducing LinqAnalyzer

Are you a C# developer? If you are reading this blog, we bet you are.

Do you use LINQ?

If you do, you know that while LINQ’s declarative nature makes for very readable code, that code is almost impossible to debug. That’s why we felt that no debugging tool (e.g. OzCode) would be complete without the ability to debug LINQ code in a simple and intuitive way.

Want to try our amazing LINQ debugging feature?
Register for our Early Access Preview for free and get the latest bits!

If you know your LINQ, you know that there are two ways to use it: the fluent, extension-method-based API, or the SQL-like query syntax (in fact there’s a third way, which is to use both together). We were curious as to which “flavor” of LINQ most developers out there prefer, and whether a clear preference even exists.
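To make the three flavors concrete, here is a minimal sketch of the same idea (keeping the even numbers of a sequence) written each way. The class and method names are made up for this illustration and do not come from LinqAnalyzer.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the LINQ "flavors"; not part of LinqAnalyzer.
public static class LinqFlavors
{
    // Fluent flavor: extension methods chained with lambdas.
    public static IEnumerable<int> EvenSquaresFluent(IEnumerable<int> numbers) =>
        numbers.Where(n => n % 2 == 0)
               .Select(n => n * n);

    // Query flavor: SQL-like syntax that the compiler translates
    // into the same Where/Select calls.
    public static IEnumerable<int> EvenSquaresQuery(IEnumerable<int> numbers) =>
        from n in numbers
        where n % 2 == 0
        select n * n;

    // Mixed flavor: a query expression with a fluent call chained onto it.
    public static int CountEvenMixed(IEnumerable<int> numbers) =>
        (from n in numbers
         where n % 2 == 0
         select n).Count();
}
```

Both of the first two methods produce the same sequence for the same input, so the choice between them is purely one of style, which is exactly what we wanted to measure.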

Since this was such an important question for our future development, we decided to put our best man on the job. We gave him a specific objective, something along the lines of: “find a way to check how developers out there are using LINQ”…

What we got was a program that analyzes projects on GitHub to find out exactly that.

Architecture overview

The idea was to analyze GitHub, where quite a few open source projects live.

The application uses Octokit.NET to acquire each repository’s metadata and source code, which are then passed to several analyzers built on Roslyn. Both libraries are themselves long-time GitHub tenants.

The flow of the program is simple:

  1. Download the metadata (name, URL etc.) of a few repositories
  2. For each repository, download its zipped source code
  3. Extract the archive
  4. Analyze each solution found, looking for both types of LINQ API
  5. Save the results in MongoDB


[Figure: LinqAnalyzer architecture]

The source code for this project can be found at OzCode’s GitHub repository.

Points of interest

As far as the code goes, it’s pretty straightforward, and you can go right ahead and browse through it.

The analysis part of the project is done using the following classes:

[Figure: LinqAnalyzer class dependency graph]

The driving force behind this whole process is called AnalyzerManager, and it performs the following tasks:

  1. Get a list of repositories from GitHub
  2. Run a few simple validations
  3. Run the following method for each repository:
private async Task<RepositoryStatistics> AnalizeProjectAsync(Repository repository)
{
	// Download the repository's zipped source code to a temporary directory.
	var downloaded = await _codeRepository.DownloadSourceRepositoryCodeAsync(repository, ProjectTempDownloadDirectory);

	// Extract the archive and get the folder that holds the sources.
	var projectFolder = _fileEngine.Extract(downloaded);

	var statistics = new RepositoryStatistics(repository);

	// Analyze every semantic model built from the sources, accumulating the results.
	foreach (var semanticModel in _semanticModelFactory.CreateSemanticModels(projectFolder))
	{
		_semanticModelAnalyzer.Analyze(semanticModel, statistics);
	}

	return statistics;
}

As you can see, this method downloads the repository’s source code (main branch) and then extracts it to a temporary folder.

The interesting bit starts with the call to SemanticModelFactory, which creates the semantic models that will later be analyzed. That class finds all of the solutions in the source folder, iterates over each file in every project, and tries to build a semantic representation of that file.
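For readers who have not used Roslyn before, the following is a rough sketch of what building a semantic model for a single piece of source text can look like. It is not the actual SemanticModelFactory, which works on whole solutions. The class name SemanticModelSketch, the assembly name and the choice of references are assumptions made for this example, and the reference list may be incomplete on some runtimes.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical sketch; the real SemanticModelFactory walks solutions and projects.
public static class SemanticModelSketch
{
    public static SemanticModel CreateFromSource(string sourceText)
    {
        // Parse the text into a syntax tree (pure syntax, no type information yet).
        SyntaxTree tree = CSharpSyntaxTree.ParseText(sourceText);

        // Assumed references: the core library and the assembly that holds Enumerable.
        MetadataReference[] references =
        {
            MetadataReference.CreateFromFile(typeof(object).Assembly.Location),
            MetadataReference.CreateFromFile(typeof(System.Linq.Enumerable).Assembly.Location),
        };

        // A compilation binds the tree to its references; nothing is emitted to disk.
        CSharpCompilation compilation = CSharpCompilation.Create(
            "AnalysisSketch",
            new[] { tree },
            references);

        // The semantic model answers "what does this syntax node mean?" questions.
        return compilation.GetSemanticModel(tree);
    }
}
```

The semantic model matters because syntax alone cannot tell whether a call such as Where really resolves to a LINQ method or to something unrelated that happens to share the name.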

The other interesting class is SemanticModelAnalyzer, which in turn analyzes those semantic models using two classes: FluentLinqAnalyzer and QueryLinqAnalyzer.
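To give a feel for what telling the two flavors apart can involve, here is a simplified, purely syntactic sketch. It is not the code of the analyzers named above. It counts query expressions directly and treats any member call whose name appears in a small, hand-picked list as a fluent LINQ call. The class name LinqFlavorCounter and the method-name list are assumptions for this example, and a name-based check like this can miscount methods that merely share a name with a LINQ operator.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical sketch of a syntax-only flavor counter; not the real analyzers.
public static class LinqFlavorCounter
{
    // Assumed, deliberately incomplete list of LINQ operator names.
    private static readonly HashSet<string> LinqMethodNames = new HashSet<string>
    {
        "Where", "Select", "SelectMany", "OrderBy", "OrderByDescending",
        "GroupBy", "Join", "Any", "All", "First", "FirstOrDefault",
        "Single", "SingleOrDefault", "Count", "Sum", "ToList", "ToArray"
    };

    public static (int QueryCount, int FluentCount) Count(string sourceText)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(sourceText).GetRoot();

        // Every "from ... select ..." expression is one QueryExpressionSyntax node.
        int queryCount = root.DescendantNodes()
            .OfType<QueryExpressionSyntax>()
            .Count();

        // Calls of the form something.Name(...) whose Name is a known LINQ operator.
        int fluentCount = root.DescendantNodes()
            .OfType<InvocationExpressionSyntax>()
            .Select(invocation => invocation.Expression)
            .OfType<MemberAccessExpressionSyntax>()
            .Count(access => LinqMethodNames.Contains(access.Name.Identifier.Text));

        return (queryCount, fluentCount);
    }
}
```

A more precise version would ask the semantic model which method each call binds to, rather than trusting the name alone.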

The result is returned using the RepositoryStatistics class, which is then saved to the DB.

The story so far

After running the application for a while, we got some interesting results. Obviously we could not sample the whole of GitHub, but after a few runs we ended up with a good number of repositories. Since we started with the most popular ones, we have some very interesting results regarding some big players… but all of that is the topic of the next blog post.

Stay tuned!