Collaboration and Engagement Measurement Demo Tool

I believe that human resources are the most important asset of any project. The only way to measure their engagement is to track collaboration activities and contributions to the team's growth.

Preamble

For the last several years, I have been working in big organizations, and it looks to me like the same problem is present in each of them: it is very difficult to find the right person to make things happen. The official structure of a company usually doesn’t reflect the actual state and the real distribution of forces. You probably know the feeling: only after 3-6 months of working in a corporate environment do you begin to understand who is more responsive and who you need to talk to in order to get essential information and move forward with your project. It becomes significantly more complicated when you start working on cross-domain initiatives. You need to talk to several people you've never met before and build productive collaboration quickly.

Collaboration drives projects.

I have already found a solution that works fine for me. Before the first meeting with a person, I check the “Context”. This “Context” can be gathered from the different resources available to you; the most reliable are, of course, internal corporate resources such as the code repository, Confluence, or other collaboration tools. The data gives me some insight into the person I need to talk to and into who the main contributor in the target scope is, but it is important to interpret this information properly. Some of the conclusions I have come to so far:
  - A lot of isolated hard work (coding, accounting, etc.) doesn’t mean that the person is keen to assist you with it; diving deep into one's own duties is one of the ways to avoid teamwork.
  - Activities related to professional growth, such as workshops, learning sessions, and talks, might be misleading; be careful with them.
  - Commenting on and improving someone else’s work is highly meaningful.
  - Statistics should not be considered without trends; people might get tired or be going through a tough life situation. (The worst case is to approach the right person at the wrong time.)

Then I came up with an idea: why not apply Data Engineering and some Data Science techniques to do this research automatically and much more effectively? When you do this investigation manually, you can answer the question “Is he a good person to talk to about X?”, but I want an answer to the question “Who is a good person to talk to about X?”

In general, I believe in the theory of Givers and Takers by Adam Grant. Check this TED talk for more details.

Idea
The initial idea is to crawl some communication platforms (chat, social network, forum, etc.) and persist all the communication activities. This data can then be analyzed with different approaches, both statically and dynamically. The main requirements for the selected platforms are completeness and openness. Completeness is important because we want to be sure that the data is valuable and that we haven't missed any important part of it. Openness is important for a demo project because I don't want to be limited by security settings during my research, as I don't know how deep it will go.

This tool has a purely demonstrative objective: to show which metrics are collected and how they can be visualized for further investigation.

GitHub as a Source of Data
For the demo, I have chosen GitHub as the communication platform. GitHub is quite complete and has a very friendly, well-documented API for crawling. I understand that some conversations happen on Gitter and are invisible to me, but I suppose that the most valuable ones happen "in place", i.e. in the code.
Each communication activity is persisted as a weighted directed edge between two objects (in this project the objects are persons) with a timestamp. Since I want to be able to explore specific scopes, a set of tags is assigned to each edge based on where the activity happened; in the current implementation the tags are the repository owner, the repository name, and the programming language. As my initial idea is to calculate the amount of intent to communicate and to contribute, the weights of the edges should reflect exactly this aspect (starting a conversation or asking for advice should have a higher weight than answering). There are different types of communication between developers on GitHub:
1. Creating a pull request: an edge from the developer to the person who merges it, low weight.
2. Leaving a comment in a pull request: edges between the author of the comment and all participants above in that conversation; entering the conversation has a high weight, responding has a lower one.
3. Merging a pull request: an edge from the "merged by" person to the developer, low weight.
4. Communication activities related to issues (not implemented yet because of some concerns).
So at the end of the day, I have a collection of weighted edges, each with a timestamp and a set of tags.
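
The edge model described above could be sketched roughly as follows. This is only an illustration, not the tool's actual code: the `Edge` class, the function names, and the numeric weights are hypothetical (the text only says "low", "high" and "lower"). It also assumes that comment edges point from the comment author to the earlier participants, and that "entering" means a person's first comment in the conversation.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical weights: the text only says "low", "high" and "lower".
WEIGHT_LOW = 1.0      # creating or merging a pull request
WEIGHT_ENTER = 3.0    # a person's first comment in a conversation
WEIGHT_RESPOND = 2.0  # any later comment by the same person


@dataclass(frozen=True)
class Edge:
    source: str          # person who performed the activity
    target: str          # person the activity was addressed to
    weight: float
    timestamp: datetime
    tags: frozenset      # repository owner, repository name, language


def pull_request_edges(developer, merged_by, created_at, merged_at, tags):
    """Types 1 and 3: creation (developer -> merger) and merge (merger -> developer)."""
    tags = frozenset(tags)
    return [
        Edge(developer, merged_by, WEIGHT_LOW, created_at, tags),
        Edge(merged_by, developer, WEIGHT_LOW, merged_at, tags),
    ]


def comment_edges(author, participants_above, is_first_comment, timestamp, tags):
    """Type 2: one edge from the comment author to each earlier participant."""
    weight = WEIGHT_ENTER if is_first_comment else WEIGHT_RESPOND
    return [
        Edge(author, person, weight, timestamp, frozenset(tags))
        for person in participants_above
        if person != author  # assumed: no self-loops
    ]
```

Keeping the tags as a set on every edge means a later query can select a scope (an owner, a repository, or a language) without going back to the crawled data.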

Implementation

I have built an online application to validate this idea.

Data Retrieval
To retrieve data, enter a list of tags into the text field and click the Load button.

As a result, all the edges containing at least one of the input tags are retrieved into the different perspectives.
Two types of specifiers are supported:
'+' specifier: every edge in the result must contain all tags marked with '+'
'-' specifier: edges containing a tag marked with '-' are excluded from the result set
Examples:
- all the vertices with the 'akka' tag
- all vertices with the 'akka' or 'scala' tags
- all vertices with the 'akka' and 'scala' tags
- all vertices with the 'scala' tag but not related to 'akka'
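
A minimal sketch of how such a tag filter might work is shown below. The exact query syntax of the application is not reproduced here, so the sketch assumes whitespace-separated tags with an optional leading '+' or '-'; the function names `parse_query` and `matches` are hypothetical.

```python
def parse_query(query):
    """Split a query string into (any_of, required, excluded) tag sets."""
    any_of, required, excluded = set(), set(), set()
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])   # '+tag': must be present
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])   # '-tag': must be absent
        else:
            any_of.add(token)         # plain tag: at least one must be present
    return any_of, required, excluded


def matches(edge_tags, query):
    """Return True if an edge with the given tags belongs to the result set."""
    any_of, required, excluded = parse_query(query)
    tags = set(edge_tags)
    if tags & excluded:
        return False
    if not required <= tags:
        return False
    # With no plain tags in the query, only '+' and '-' constrain the result.
    return not any_of or bool(tags & any_of)
```

Under these assumptions, the four examples above would correspond to queries such as `akka`, `akka scala`, `+akka +scala`, and `scala -akka`.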

When the edges are loaded, you can see the time range of the retrieved data on the slider below.
By moving the slider's handles you can narrow the time range of the retrieved edges.

Data Representation as Graph with Weighted Vertices

On the first tab, Graph, the data is represented as a directed graph with weighted vertices. The weight of a vertex is equal to the sum of the weights of all its outgoing edges. The vertices with the maximum weight are highlighted in green. When you hover the mouse over a vertex, information about that vertex and all adjacent vertices is highlighted. The gold label displays information about the selected vertex in the following format:
<vertex name>: <vertex weight> (<number of adjacent vertices>)

The green labels display information about the adjacent vertices in the following format:
<vertex name>: <percentage of the selected vertex's weight associated with this adjacent vertex> (<number of adjacent vertices>)
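
The vertex weights and the percentages shown in the labels could be computed along these lines. This is a sketch under assumptions: edges are plain `(source, target, weight)` tuples, the function names are hypothetical, and only outgoing neighbours contribute to the percentage, since the vertex weight itself is defined over outgoing edges.

```python
from collections import defaultdict


def vertex_weights(edges):
    """Weight of a vertex = sum of the weights of its outgoing edges."""
    totals = defaultdict(float)
    for source, target, weight in edges:
        totals[source] += weight
        totals[target] += 0.0  # vertices that only receive still appear, with weight 0
    return dict(totals)


def outgoing_shares(edges, selected):
    """Percentage of the selected vertex's weight that goes to each neighbour."""
    per_target = defaultdict(float)
    for source, target, weight in edges:
        if source == selected:
            per_target[target] += weight
    total = sum(per_target.values())
    if total == 0:
        return {}
    return {target: 100.0 * w / total for target, w in per_target.items()}
```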

Analysis of Trends

On the second tab, Trends, the data is represented as a scatter plot for each vertex in the result set. Legend entries can be disabled and enabled.

Analysis of Centrality
On the third tab, the data is represented as a graph whose vertices are weighted according to their Betweenness Centrality (BC).
Betweenness Centrality is an important topological property of social graphs.
For a given vertex, it is equal to the number of shortest paths between all pairs of other vertices that pass through that vertex.
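
For illustration, betweenness centrality on a small undirected, unweighted graph can be computed with Brandes' algorithm, sketched below. This is not necessarily how the application computes it. Note that the standard definition used here counts a fraction when a pair of vertices has several shortest paths, and that the adjacency-dictionary input format is an assumption of this sketch.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """adjacency: {vertex: set of neighbours}; every neighbour must also be a key."""
    centrality = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        predecessors = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        sigma[s] = 1
        distance[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:            # w is seen for the first time
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    # In an undirected graph every pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in centrality.items()}
```

In a star-shaped graph, for example, the hub lies on the shortest path between every pair of leaves, so it gets the highest score while the leaves get zero.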

From Directed Graph to Undirected
For some exercises, I need an undirected graph instead of a directed one. The initial graph I have is a directed graph with weighted edges.
To build an undirected graph from the directed one, I use the idea of the Coefficient of Reciprocity (CR).

The Coefficient of Reciprocity is based on the relative difference between the amount of interaction from A to B and the amount of interaction from B to A.
If this difference is small, CR approaches 1; if there is only one directed edge between vertices A and B, or one of the directed edges weighs much less than the inverse one, CR approaches 0.
The user can control the transformation from the directed graph to the undirected one with the Epsilon parameter.
Epsilon defines the minimum value of CR between A and B for which an undirected edge is added. Epsilon can be in the range (0, 1] (0 is temporarily excluded for computational reasons).

Examples:
When Epsilon = 0 %, an undirected edge is added if there is at least one edge A -> B or B -> A with a weight greater than 0;
When Epsilon = 100 %, an undirected edge is added only if the weight of edge A -> B equals the weight of edge B -> A.
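
The transformation can be sketched as follows. The exact CR formula is not given in this text, so the sketch assumes one plausible form, CR = 1 - |w(A->B) - w(B->A)| / (w(A->B) + w(B->A)), which equals 1 for equal weights and 0 for a one-sided interaction. The function names are hypothetical, and unlike the application the sketch also accepts Epsilon = 0.

```python
from collections import defaultdict


def reciprocity(w_ab, w_ba):
    """Assumed CR: 1 minus the relative difference of the two directed weights."""
    total = w_ab + w_ba
    if total <= 0:
        return 0.0
    return 1.0 - abs(w_ab - w_ba) / total


def to_undirected(edges, epsilon):
    """edges: iterable of (source, target, weight); epsilon in [0, 1].

    Returns {frozenset({a, b}): CR} for every pair whose CR is at least epsilon.
    """
    totals = defaultdict(float)
    for source, target, weight in edges:
        if source != target:
            totals[(source, target)] += weight  # merge parallel edges
    undirected = {}
    for (a, b), w_ab in totals.items():
        cr = reciprocity(w_ab, totals.get((b, a), 0.0))
        if w_ab > 0 and cr >= epsilon:
            undirected[frozenset((a, b))] = cr
    return undirected
```

With Epsilon = 0 every pair that has any interaction is kept, and with Epsilon = 1 only perfectly balanced pairs survive, which matches the two examples above.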

If you want to have your repository crawled, just contact me.
Any feedback, comments, and especially criticism are welcome.
