Commit d8b0836

2 parents 50609e1 + 341d107 commit d8b0836

4 files changed (+497, -16 lines)


README.md

Lines changed: 91 additions & 2 deletions
```diff
@@ -113,10 +113,99 @@ This is an example of a valid SPDX file:
 
 This is the list of the metrics generated by this tool:
 
+### Pony factor
+
+The metric is defined as the number of individuals, who produce up to the
+first 50% of the total number of code contributions (in descending order)
+within a given time period.
+
+A low Pony Factor implies a high dependency on these individuals, making the
+project vulnerable if they were to leave.
+
+*Also known as: Lottery Factor, Bus Factor, Contributor Absence Factor.*
+
+### Elephant factor
+
+The metric is defined as the number of unique organizations producing up to the
+first 50% of the total number of code contributions (in descending order)
+within a given time period.
+
+It was first defined by Bitergia, and it applies the concept of the
+Pony Factor metric and takes it to contributing Organizations.
+
+Contributions are focused on Git commits, and the organization
+is determined by the email address of the commit author.
+
+### Number of contributing organizations
+
+This metric quantifies the total number of distinct organizations whose members
+have made contributions to an open source project over a specified period.
+
+Contributions are focused on Git commits, and the organization
+is determined by the email address of the commit author.
+
+### Number of organizations contributing recently
+
+This metric quantifies the number of unique organizations whose members have
+actively made contributions to an open source project within the last 90 days.
+
+Contributions are focused on Git commits, and the organization
+is determined by the email address of the commit author.
+
+### Number of recent contributors
+
+This metric quantifies the total count of unique individuals who contributed
+within the last 90 days.
+
+### Number of recent commits
+
+This metric counts the total number of commits made to the project within the
+last 90 days.
+
+### Contributor Growth Rate
+
+This metric measures the growth rate of active contributors, defined as the
+number of people sending one or more code contributions (in this case,
+Git commits) in a given period.
+
+To calculate it, the period is split into two halves, and the number of active
+contributors in each half is compared. The growth rate is the difference
+between the second half and the first half, divided by the number of active
+contributors in the first half.
+
+```math
+GrowthRate (t_1, t_2) = \frac{C_a(t_2) - C_a(t_1)}{C_a(t_1)}
+```
+
+### Contributor Growth
+
+This metric measures the growth of active contributors, defined as the
+number of people sending one or more code contributions (in this case,
+Git commits) in a given period.
+
+To calculate it, the period is split into two halves, and the number of active
+contributors in each half is compared. Growth is the difference between the
+second half and the first half.
+
+```math
+Growth (t_1, t_2) = C_a(t_2) - C_a(t_1)
+```
+
+### Number of active branches
+
+This metric refers to the count of branches within a project's version control
+repository (for this case, Git) that have seen recent development activity,
+usually indicated by new commits.
+
+### Days since last commit
+
+This metric shows the number of days since the last commit was submitted to the
+repository or the project.
+
+### Other metrics
+
 - Number of commits per repository
 - Number of developers per repository
-- Number of developers producing up to 50% of the total contributions
-- Number of companies producing up to 50% of the total number of code contributions
 - File type metrics (code, binaries or other)
 - Commit side metrics (added lines and removed lines)
 - Message size metrics (total, mean and median)
```
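The Pony factor definition above can be illustrated with a short sketch. It assumes per-author commit counts are held in a `collections.Counter` (as `self.contributors` is in the diff) and follows the loop shape visible in `get_elephant_factor`: walk the counts in descending order and stop once the running share strictly exceeds the threshold. `pony_factor` is a hypothetical helper, not the tool's API.

```python
from collections import Counter


def pony_factor(contributions: Counter, threshold: float = 0.5) -> int:
    """Count the top contributors needed to exceed `threshold` of all commits.

    Hypothetical helper; mirrors the loop used by get_elephant_factor in the diff.
    """
    total = sum(contributions.values())
    if total == 0:
        return 0

    cumulative = 0
    factor = 0
    # most_common() yields (key, count) pairs in descending order of count
    for _, count in contributions.most_common():
        cumulative += count
        factor += 1
        if cumulative / total > threshold:
            break
    return factor


commits_per_author = Counter({"alice": 6, "bob": 3, "carol": 1})
print(pony_factor(commits_per_author))  # 1: alice alone holds 60% of commits
```

Because the comparison is strict (`>`), a contributor who brings the running share to exactly 50% does not end the loop: with counts 5, 3 and 2 the result is 2. The Elephant factor is the same computation applied to per-organization counts.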

grimoirelab_metrics/metrics.py

Lines changed: 170 additions & 12 deletions
```diff
@@ -27,7 +27,12 @@
 
 from opensearchpy import OpenSearch, Search
 
-from grimoirelab_toolkit.datetime import str_to_datetime
+from grimoirelab_toolkit.datetime import (
+    str_to_datetime,
+    datetime_utcnow,
+    datetime_to_utc,
+    InvalidDateError,
+)
 
 logging.getLogger("opensearch").setLevel(logging.WARNING)
 
```

```diff
@@ -52,15 +57,31 @@
 class GitEventsAnalyzer:
     def __init__(
         self,
+        from_date: datetime.datetime | None = None,
+        to_date: datetime.datetime | None = None,
         code_file_pattern: str | None = None,
         binary_file_pattern: str | None = None,
         pony_threshold: float = 0.5,
         elephant_threshold: float = 0.5,
         dev_categories_thresholds: tuple[float, float] = (0.8, 0.95),
     ):
+        # Define the default dates if not provided
+        if from_date:
+            self.from_date = datetime_to_utc(from_date)
+        else:
+            self.from_date = datetime_utcnow() - datetime.timedelta(days=365)
+        if to_date:
+            self.to_date = datetime_to_utc(to_date)
+        else:
+            self.to_date = datetime_utcnow()
+
         self.total_commits: int = 0
+        self.recent_commits: int = 0
         self.contributors: Counter = Counter()
-        self.companies: Counter = Counter()
+        self.contributors_growth: dict[str, set] = {"first_half": set(), "second_half": set()}
+        self.organizations: Counter = Counter()
+        self.recent_organizations: set = set()
+        self.recent_contributors: set = set()
         self.file_types: dict = {"code": 0, "binary": 0, "other": 0}
         self.added_lines: int = 0
         self.removed_lines: int = 0
```
```diff
@@ -74,6 +95,8 @@ def __init__(
         self.last_commit: str | None = None
         self.first_commit_date: datetime.datetime | None = None
         self.last_commit_date: datetime.datetime | None = None
+        self.active_branches: set = set()
+        self._half_period = self.from_date + (self.to_date - self.from_date) / 2
 
     def process_events(self, events: iter(dict[str, Any])):
         for event in events:
```
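The constructor now defaults to a one-year window ending at the current time and precomputes the midpoint that splits contributors into a first and second half. The sketch below shows that date arithmetic with only the standard library. `datetime.now(timezone.utc)` stands in for `datetime_utcnow()`, and treating naive datetimes as UTC is an assumption about what `datetime_to_utc` does. `analysis_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def analysis_window(from_date=None, to_date=None):
    """Return (from_date, to_date, half_period) as UTC-aware datetimes.

    Hypothetical helper: defaults to the 365 days ending now, as in the diff.
    Naive datetimes are assumed to already be in UTC.
    """
    now = datetime.now(timezone.utc)

    def to_utc(dt):
        # Treat naive datetimes as UTC; convert aware ones to UTC
        if dt.tzinfo is None:
            return dt.replace(tzinfo=timezone.utc)
        return dt.astimezone(timezone.utc)

    start = to_utc(from_date) if from_date else now - timedelta(days=365)
    end = to_utc(to_date) if to_date else now
    # Midpoint separating the "first_half" and "second_half" contributor sets
    half = start + (end - start) / 2
    return start, end, half


start, end, half = analysis_window(datetime(2024, 1, 1), datetime(2024, 1, 31))
print(half)  # 2024-01-16 00:00:00+00:00
```

With the default window, the midpoint falls 182 days and 12 hours after the start.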
```diff
@@ -82,9 +105,10 @@ def process_events(self, events: iter(dict[str, Any])):
 
             event_data = event.get("data")
 
-            self.total_commits += 1
-            self.contributors[event_data[AUTHOR_FIELD]] += 1
-            self._update_companies(event_data)
+            self._update_commit_count(event_data)
+            self._update_branches(event_data)
+            self._update_contributors(event_data)
+            self._update_organizations(event_data)
             self._update_file_metrics(event_data)
             self._update_message_size_metrics(event_data)
             self._update_first_and_last_commit(event_data)
```
```diff
@@ -95,6 +119,9 @@ def get_commit_count(self):
     def get_contributor_count(self):
         return len(self.contributors)
 
+    def get_organization_count(self):
+        return len(self.organizations)
+
     def get_pony_factor(self):
         """Number of individuals producing up to 50% of the total number of code contributions"""
 
```
```diff
@@ -113,15 +140,15 @@ def get_pony_factor(self):
         return pony_factor
 
     def get_elephant_factor(self):
-        """Number of companies producing up to 50% of the total number of code contributions"""
+        """Number of organizations producing up to 50% of the total number of code contributions"""
 
         partial_contributions = 0
         elephant_factor = 0
 
-        if len(self.companies) == 0:
+        if len(self.organizations) == 0:
             return 0
 
-        for _, contributions in self.companies.most_common():
+        for _, contributions in self.organizations.most_common():
             partial_contributions += contributions
             elephant_factor += 1
             if partial_contributions / self.total_commits > self.elephant_threshold:
```
```diff
@@ -209,6 +236,48 @@ def get_developer_categories(self):
             "casual": casual,
         }
 
+    def get_recent_organizations(self):
+        """Return the number of recent organizations."""
+
+        return len(self.recent_organizations)
+
+    def get_recent_contributors(self):
+        """Return the number of contributors from the last 90d."""
+
+        return len(self.recent_contributors)
+
+    def get_recent_commits(self) -> int:
+        """Return the number of commits in the last 90d."""
+
+        return self.recent_commits
+
+    def get_growth_of_contributors(self):
+        """Return the growth of contributors by period."""
+
+        first_half = len(self.contributors_growth["first_half"])
+        second_half = len(self.contributors_growth["second_half"])
+
+        return second_half - first_half
+
+    def get_growth_rate_of_contributors(self):
+        """Return the growth of contributors by period."""
+
+        first_half = len(self.contributors_growth["first_half"])
+        second_half = len(self.contributors_growth["second_half"])
+
+        if first_half == 0 and second_half == 0:
+            return 0
+        elif first_half == 0 and second_half != 0:
+            # It increased infinitely
+            return second_half
+        else:
+            return (second_half - first_half) / first_half
+
+    def get_active_branch_count(self):
+        """Return the number of active branches."""
+
+        return len(self.active_branches)
+
     def get_analysis_metadata(self):
         """Return metadata about the analysis."""
 
```
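The two growth getters compare the sizes of the `first_half` and `second_half` author sets, following the README formulas. The sketch below reproduces that arithmetic, including the special case in which the first half is empty and the rate falls back to the raw second-half count. `growth` and `growth_rate` are hypothetical names.

```python
def growth(first_half: set, second_half: set) -> int:
    """Growth = C_a(t2) - C_a(t1): difference in active contributors."""
    return len(second_half) - len(first_half)


def growth_rate(first_half: set, second_half: set) -> float:
    """GrowthRate = (C_a(t2) - C_a(t1)) / C_a(t1), with the diff's edge cases."""
    c1, c2 = len(first_half), len(second_half)
    if c1 == 0 and c2 == 0:
        return 0
    if c1 == 0:
        # The ratio is undefined here; the diff returns the second-half count instead
        return c2
    return (c2 - c1) / c1


first = {"alice", "bob"}
second = {"bob", "carol", "dave"}
print(growth(first, second), growth_rate(first, second))  # 1 0.5
```

A contributor active in both halves (here `bob`) appears in both sets, so the metric compares head counts per half rather than counting newcomers.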
```diff
@@ -226,13 +295,78 @@ def get_analysis_metadata(self):
 
         return metadata
 
-    def _update_companies(self, event):
+    def get_days_since_last_commit(self):
+        """Return the number of days since the last commit."""
+
+        if not self.last_commit_date:
+            return None
+
+        days_since_last_commit = (self.to_date - self.last_commit_date).days
+
+        return days_since_last_commit
+
+    def _update_commit_count(self, event_data):
+        """Update the commit count and commits by period."""
+
+        # Update total commits
+        self.total_commits += 1
+
+        # Update commits by period
         try:
-            author = event[AUTHOR_FIELD]
-            company = author.split("@")[1][:-1]
-            self.companies[company] += 1
+            commit_date = str_to_datetime(event_data.get("CommitDate"))
+            days_interval = (self.to_date - commit_date).days
+        except (ValueError, TypeError, InvalidDateError):
+            return
+
+        if days_interval <= 90:
+            self.recent_commits += 1
+
+    def _update_contributors(self, event_data):
+        author = event_data[AUTHOR_FIELD]
+
+        self.contributors[author] += 1
+
+        # Update contributor growth
+        try:
+            commit_date = event_data.get("CommitDate")
+            commit_date = str_to_datetime(commit_date)
+        except (ValueError, TypeError, InvalidDateError):
+            commit_date = None
+
+        if commit_date and self._half_period:
+            if commit_date < self._half_period:
+                self.contributors_growth["first_half"].add(author)
+            else:
+                self.contributors_growth["second_half"].add(author)
+
+        # Update contributors by period
+        try:
+            commit_date = str_to_datetime(event_data.get("CommitDate"))
+            days_interval = (self.to_date - commit_date).days
+        except (ValueError, TypeError, InvalidDateError):
+            pass
+        else:
+            if days_interval <= 90:
+                self.recent_contributors.add(author)
+
+    def _update_organizations(self, event_data):
+        try:
+            author = event_data[AUTHOR_FIELD]
+            organization = author.split("@")[1][:-1]
         except (IndexError, KeyError):
+            return
+
+        self.organizations[organization] += 1
+
+        # Update organizations by period
+        try:
+            commit_date = str_to_datetime(event_data.get("CommitDate"))
+            days_interval = (self.to_date - commit_date).days
+        except (ValueError, TypeError, InvalidDateError):
             pass
+        else:
+            if days_interval <= 90:
+                self.recent_organizations.add(organization)
 
     def _update_file_metrics(self, event):
         if "files" not in event:
```
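`_update_commit_count`, `_update_contributors` and `_update_organizations` all apply the same recency test. Each parses `CommitDate`, subtracts it from `to_date`, and treats the commit as recent when the whole-day difference is at most 90. The standard-library sketch below assumes the `Thu Feb 22 18:01:08 2024 +0100` format seen in the test fixture. The diff itself parses with `str_to_datetime`, and `is_recent` is a hypothetical helper.

```python
from datetime import datetime, timezone

GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"  # e.g. "Thu Feb 22 18:01:08 2024 +0100"
RECENT_DAYS = 90


def is_recent(commit_date, to_date: datetime, days: int = RECENT_DAYS) -> bool:
    """Return True if commit_date is at most `days` whole days before to_date.

    Unparseable or missing dates are treated as not recent, like the diff's
    except branches.
    """
    try:
        parsed = datetime.strptime(commit_date, GIT_DATE_FORMAT)
    except (TypeError, ValueError):
        return False
    # timedelta.days is floored, matching (self.to_date - commit_date).days
    return (to_date - parsed).days <= days


to_date = datetime(2024, 5, 1, tzinfo=timezone.utc)
print(is_recent("Thu Feb 22 18:01:08 2024 +0100", to_date))  # True (68 days)
print(is_recent("Mon Jan 01 00:00:00 2024 +0000", to_date))  # False (121 days)
```

Only an upper bound is checked. A commit dated after `to_date` therefore produces a negative interval and is also counted as recent, both in this sketch and in the diff.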
```diff
@@ -241,6 +375,7 @@ def _update_file_metrics(self, event):
         for file in event["files"]:
             if not file["file"]:
                 continue
+
             # File type metrics
             if self.re_code_pattern.search(file["file"]):
                 self.file_types["code"] += 1
```
```diff
@@ -283,6 +418,19 @@ def _update_first_and_last_commit(self, event):
             self.last_commit = commit
             self.last_commit_date = commit_date
 
+    def _update_branches(self, event_data):
+        """Identify the refs that are branches and update the active branches."""
+
+        if "refs" not in event_data:
+            return
+
+        for ref in event_data["refs"]:
+            if "refs/heads/" not in ref:
+                continue
+
+            branch_name = ref.split("refs/heads/")[1]
+            self.active_branches.add(branch_name)
+
 
 def get_repository_metrics(
     repository: str,
```
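`_update_branches` keeps only refs that contain `refs/heads/` and records the name that follows that prefix. The fixture change in this commit, which gives one commit a `refs/heads/main` ref, exercises this path. The sketch below applies the same filtering to a list of commit `data` dicts. `active_branches` is a hypothetical function, and the sample refs other than `refs/heads/main` are made up.

```python
def active_branches(events: list[dict]) -> set[str]:
    """Collect branch names from the 'refs' of commit data dicts.

    Mirrors _update_branches: only refs containing 'refs/heads/' count.
    """
    branches: set[str] = set()
    for event_data in events:
        for ref in event_data.get("refs", []):
            if "refs/heads/" not in ref:
                continue  # e.g. tags such as refs/tags/v1.0 are ignored
            branches.add(ref.split("refs/heads/")[1])
    return branches


commits = [
    {"refs": ["refs/heads/main"]},
    {"refs": []},
    {"refs": ["refs/heads/feature/login", "refs/tags/v1.0"]},
    {},  # no 'refs' key at all
]
print(sorted(active_branches(commits)))  # ['feature/login', 'main']
```

Because the split is on the full `refs/heads/` prefix, branch names that contain slashes, such as `feature/login`, are kept whole.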
```diff
@@ -331,6 +479,8 @@ def get_repository_metrics(
     events = get_repository_events(os_conn, opensearch_index, repository, from_date, to_date)
 
     analyzer = GitEventsAnalyzer(
+        from_date=from_date,
+        to_date=to_date,
         code_file_pattern=code_file_pattern,
         binary_file_pattern=binary_file_pattern,
         pony_threshold=pony_threshold,
```
```diff
@@ -341,8 +491,16 @@ def get_repository_metrics(
 
     metrics["metrics"]["total_commits"] = analyzer.get_commit_count()
     metrics["metrics"]["total_contributors"] = analyzer.get_contributor_count()
+    metrics["metrics"]["total_organizations"] = analyzer.get_organization_count()
     metrics["metrics"]["pony_factor"] = analyzer.get_pony_factor()
     metrics["metrics"]["elephant_factor"] = analyzer.get_elephant_factor()
+    metrics["metrics"]["recent_organizations"] = analyzer.get_recent_organizations()
+    metrics["metrics"]["recent_contributors"] = analyzer.get_recent_contributors()
+    metrics["metrics"]["recent_commits"] = analyzer.get_recent_commits()
+    metrics["metrics"]["contributor_growth"] = analyzer.get_growth_of_contributors()
+    metrics["metrics"]["contributor_growth_rate"] = analyzer.get_growth_rate_of_contributors()
+    metrics["metrics"]["active_branches"] = analyzer.get_active_branch_count()
+    metrics["metrics"]["days_since_last_commit"] = analyzer.get_days_since_last_commit()
 
     if from_date and to_date:
         days = (to_date - from_date).days
```

tests/data/events.json

Lines changed: 3 additions & 1 deletion
```diff
@@ -1166,7 +1166,9 @@
         "parents": [
             "c2ecce0815cae0f293201541bbd557b1549774e7"
         ],
-        "refs": [],
+        "refs": [
+            "refs/heads/main"
+        ],
         "Author": "User One <[email protected]>",
         "AuthorDate": "Thu Feb 22 18:01:08 2024 +0100",
         "Commit": "User One <[email protected]>",
```
