What is Proof of Contribution? (technical)

Proof of Contribution measures project impact by modeling open-source software projects, their versions, and their dependencies as nodes and edges in a graph. The algorithm, built upon the foundation of Google's PageRank, uses probability distribution to assign scores to the nodes—teaRank scores—which represent the likelihood of randomly navigating to a particular node.

Proof of Contribution uses various inputs and anti-spam mechanisms to accurately measure a software project’s impact. To output a teaRank score, the algorithm may consider these aspects of a software project:

  • Project influence, using the project’s number of dependents as a proxy

  • Project impact over time, using the age of the project as a proxy

Where does the data come from?

A project’s teaRank is the measure of its impact in the open-source software ecosystem, and that value depends largely on its number of dependents. However, in the ever-expanding landscape of software development, the efficiency and reliability of tracking said dependents can be very difficult. Proof of Contribution uses a revolutionary approach that achieves this by delving deep into the intricate web of dependencies, dependents, and correlations within project ecosystems.

Proof of Contribution uses an algorithm that retrieves data for a project's dependents and dependencies from package managers. This data is then correlated and used by the Proof of Contribution algorithm to calculate a project’s teaRank.

The package managers that we’ll be supporting in the tea Protocol V1 are:

How does data collection and processing work?

Proof of Contribution sources data on each individual project from the package managers that have the project in their registries. By sourcing data from multiple package managers, within a degree of accuracy, we can determine a project's impact on the open-source software ecosystem as a whole.

Here's a high-level step-by-step explanation of how the algorithm works:

  1. Query package managers: Proof of Contribution starts by querying package manager registries daily, to gather information on projects and their dependencies.

  2. Finds and indexes projects: Once a project is found, Proof of Contribution adds the project's metadata, including dependents and dependencies to its database and indexes it to query it later.

  3. Adds projects to the tea graph: Proof of Contribution builds a comprehensive OSS projects graph that maps the relationships between projects and their dependencies.

Using the built-in graph, Proof of Contribution can categorize incoming relationships as dependents and outgoing relationships as dependencies. The Proof of Contribution algorithm then uses this data to calculate a project’s teaRank. The image below shows that A depends on B, D, and F while all others depend on A or one another.

Measuring cumulative impact

Proof of Contribution evaluates how the impact of every project in the OSS ecosystem changes over time and delivers an analysis that demonstrates which OSS projects consistently deliver substantial value. To measure this, it splits the open-source graph into distinct time periods, or delta (δ).

For each time period, the algorithm calculates the teaRank of each project using information from previous periods to inform the teaRank calculation in the current period. This provides a nuanced and long-term history of a project’s significance, plus incorporates all the versions of a software project when determining its impact.

The number of δ time periods that Proof of Contribution uses for its analysis is an important design choice. Using too few δ intervals may not quantify the full impact of an OSS project, while using too many intervals can significantly disincentivize newly established projects. Using excessive δ intervals also creates a computational burden by requiring the tea Protocol to reconstruct the entire open-source graph δ times to generate teaRank.

Spam resistance

Proof of Contribution is modeled to resist spam based on two primary frameworks—tree and width attacks. Taking this approach to combat spam is effective because spam techniques for open-source software frequently involve hijacking existing software projects or creating complex dependency networks that contain malicious code.

Tree attack

Imagine a pyramid of software dependents. A malicious actor creates a long chain of dependent projects to artificially inflate the importance of each project in the chain, aiming to unfairly generate rewards for each software dependency.

Width attack

Picture a single software project surrounded by many dependent projects. A malicious actor creates a large number of false dependents to artificially inflate the impact of the project at the center.

Combating tree & width attacks

Proof of Contribution combats tree attacks in part by quantifying the longest dependency path that a software project traces to a project with no dependents—known as its tree limit. The algorithm defends against width attacks in part by calculating a software project’s width limit, or its total number of software dependents. Proof of Contribution consistently monitors tree and width limits over δ time intervals to identify—and potentially flag as spam—any abrupt changes to a software project’s position in the OSS ecosystem.

Proof of Contribution also defends against tree attacks by using the advanced notation Kappa, or κ, to throttle the influence of open-source software projects. κ is a measure of self influence, which is derived from Proof of Contribution’s use of self edges—edges from nodes (OSS projects) to themselves. Kappa ranges in value from 0 to 1.

κ (Kappa)

We designed κ based on the concept of influence throttling so that spammers find it difficult to take advantage of the ranking system, even when they gain control of a significant portion of projects in the network via existing or new projects.

The approach entails inducing self-edges on every project in the graph and assigning an edge weight between 0 and 1, which we represent as κ. All the remaining edges for each project are reindexed such that their edge weights sum to 1 - κ. We effectively constrict the flow of rank from one project to another. This mechanism has two effects:

  1. Mitigates the potential for spamming the network by inserting dependencies into projects that are deep in the dependency graph.

  2. Simplifies the problem to allow fair teaRank score for impactful base projects, i.e., ones residing at the bottom of the dependency stack

When designing the value of κ, we considered the implications of tree spam networks and the need to establish limits on the extent to which a spam network could receive undue influence. κ acts as an adjustable tool to fine-tune the influence a project has on itself, striking a balance between preventing spam and accurately measuring the impact of open-source projects.

δ (Delta)

While κ addresses the influence of projects in the immediate neighborhood, δ focuses on impact over time. It allows teaRank to capture the evolving impact of projects as the open-source ecosystem grows.

We define δ as a period in which we will iteratively construct the open-source graph. At each period, we will calculate the teaRank scores using the scores from previous periods as edge weights in the current period. Essentially, we capture the cumulative value and impact of projects over time, providing a holistic view of their significance.

Selecting an optimal value for δ requires some trade-offs. Lower δ values indicate that the graph is built at more frequent intervals, allowing teaRank to quickly adapt and identify projects that consistently exhibit significant impact across different periods. It is also computationally intensive and involves numerous graph recalculations. Higher δ values build the graph less frequently and allows newer projects to have an equal opportunity to demonstrate their value per cycle.

Balancing κ and δ

Assigning too much computational authority to κ can skew the distribution of rewards heavily toward established open-source software projects—while relying too heavily on δ may stifle innovation by disfavoring OSS projects that are just emerging. The Proof of Contribution algorithm is calibrated to avoid these pitfalls while simultaneously operating in a resource-efficient way.

The initial values of δ and κ have been defined by conducting extensive data analysis but can be adjusted. Starting during tea’s Incentivized Testnet phase, tea developers will closely monitor the performance of teaRank and potentially recommend adjustments to δ or κ to the teaDAO.

While teaRank parameters are necessary, solely relying on them may not be sufficient. Excessive tweaking of the algorithm to counter spam attacks risks distorting the true impact of each project. To strike the right balance, we have developed two strategies—limits and reward thresholds—that not only aid in spam detection but also introduce a cost to malicious actors for manipulating the system to acquire undeserved rewards.

Compression

To decrease the disparity between the most dependent on and the least dependent on projects, the tea Protocol compresses the raw teaRank scores by carrying out a few actions. These actions involve converting raw teaRank scores into their logarithmic equivalents which significantly reduces that disparity while keeping that score's relative magnitude. Through this compression, the disparity between the two projects changes from 2,500,000 times to 100,000 times.

Using this compression algorithm, the tea Protocol can differentiate between niche projects with low contributors and dependents, and highly used projects with a significant number of dependents and contributors. Through that differentiation, the tea Protocol assigns a teaRank reward threshold to be a value that is above that of niche projects and minimize spam rewards. The ideal threshold eliminates any token rewards distributed to malicious actors while minimizing the rewards lost by legitimate, impactful projects.

Rewards Thresholds and Limits

Proof of Contribution uses a combination of thresholds and limits to further boost its resistance to spam. The algorithm’s threshold and limit mechanisms are focused around project teaRank scores.

  • teaRank score threshold - The tea Protocol uses Proof of Contribution to calculate and distribute teaRank rewards only to protocol-registered OSS projects with teaRank scores above a governance-defined threshold.

  • teaRank score limit - Proof of Contribution applies edge weights and uses self edges to establish upper bounds on the teaRank score that a software project can obtain.

Software projects registered with the tea Protocol benefit from Proof of Contribution’s use of thresholds and limits, which allows TEA tokens to be distributed fairly among only OSS projects with substantial impact. The teaRank score threshold deters spammers, while the teaRank score limit constricts the ability of spam projects to earn undeserved rewards. Using thresholds and limits also aids in spam detection.

Proof of Contribution aims to set the teaRank score threshold to the precise level that eliminates the distribution of token rewards to malicious actors while minimizing the instances of legitimate, impactful projects with teaRank scores below that threshold. The algorithm aims to set the teaRank score limit to a level that simultaneously rewards long-standing, impactful OSS projects and welcomes important new projects to the OSS ecosystem.

Last updated