What is the percentage of authorship in TortoiseSVN statistics?

 三个人999 posted on 2023-01-18 09:19
  • svn
  • In the statistics section of TortoiseSVN there is something called the percentage of authorship. What is it? How is it calculated? How is it useful?

    1 answer
    • The percentage of authorship is a metric intended to quantify the contribution of each committer.

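      In theory it should really be based on line changes, aggregated over the whole history of a file and with decreasing weights. In addition, some kind of heuristic could be applied to give less weight to whitespace-only changes such as indentation fixes. Roughly speaking, this metric should answer the question "whom should I talk to if I want to understand/fix/improve this part of the code?"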
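
The whitespace heuristic mentioned above could look roughly like the following. This is only a sketch: `StripWhitespace`, `LineChangeWeight` and the 0.1 factor are hypothetical names and values chosen for illustration, not anything taken from TortoiseSVN.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: remove every whitespace character so that two
// versions of a line can be compared by content only.
std::string StripWhitespace(const std::string& line)
{
    std::string out;
    std::copy_if(line.begin(), line.end(), std::back_inserter(out),
                 [](unsigned char c) { return !std::isspace(c); });
    return out;
}

// Hypothetical weight for one changed line: an edit that only touches
// whitespace (for example re-indentation) counts for much less.
// The 0.1 factor is an arbitrary illustrative choice.
double LineChangeWeight(const std::string& before, const std::string& after)
{
    const bool whitespaceOnly = StripWhitespace(before) == StripWhitespace(after);
    return whitespaceOnly ? 0.1 : 1.0;
}
```

Such a per-line weight would then be multiplied by whatever history-based weight is in use before being added to the author's total.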

      In practice, here is the actual code:

      void CStatGraphDlg::GatherData()
      {
          // Sanity check
          if ((m_parAuthors==NULL)||(m_parDates==NULL)||(m_parFileChanges==NULL))
              return;
          m_nTotalCommits = m_parAuthors->GetCount();
          m_nTotalFileChanges = 0;
      
          // Update m_nWeeks and m_minDate
          UpdateWeekCount();
      
          // Now create a mapping that holds the information per week.
          m_commitsPerUnitAndAuthor.clear();
          m_filechangesPerUnitAndAuthor.clear();
          m_commitsPerAuthor.clear();
      
          int interval = 0;
          __time64_t d = (__time64_t)m_parDates->GetAt(0);
          int nLastUnit = GetUnit(d);
          double AllContributionAuthor = 0;
      
          // Now loop over all weeks and gather the info
          for (LONG i=0; i<m_nTotalCommits; ++i)
          {
              // Find the interval number
              __time64_t commitDate = (__time64_t)m_parDates->GetAt(i);
              int u = GetUnit(commitDate);
              if (nLastUnit != u)
                  interval++;
              nLastUnit = u;
              // Find the authors name
              CString sAuth = m_parAuthors->GetAt(i);
              if (!m_bAuthorsCaseSensitive)
                  sAuth = sAuth.MakeLower();
              tstring author = tstring(sAuth);
              // Increase total commit count for this author
              m_commitsPerAuthor[author]++;
              // Increase the commit count for this author in this week
              m_commitsPerUnitAndAuthor[interval][author]++;
              CTime t = m_parDates->GetAt(i);
              m_unitNames[interval] = GetUnitLabel(nLastUnit, t);
              // Increase the file change count for this author in this week
              int fileChanges = m_parFileChanges->GetAt(i);
              m_filechangesPerUnitAndAuthor[interval][author] += fileChanges;
              m_nTotalFileChanges += fileChanges;
      
              //calculate Contribution Author
              double  contributionAuthor = CoeffContribution((int)m_nTotalCommits - i -1) * fileChanges;
              AllContributionAuthor += contributionAuthor;
              m_PercentageOfAuthorship[author] += contributionAuthor;
          }
      
          // Find first and last interval number.
          if (!m_commitsPerUnitAndAuthor.empty())
          {
              IntervalDataMap::iterator interval_it = m_commitsPerUnitAndAuthor.begin();
              m_firstInterval = interval_it->first;
              interval_it = m_commitsPerUnitAndAuthor.end();
              --interval_it;
              m_lastInterval = interval_it->first;
              // Sanity check - if m_lastInterval is too large it could freeze TSVN and take up all memory!!!
              assert(m_lastInterval >= 0 && m_lastInterval < 10000);
          }
          else
          {
              m_firstInterval = 0;
              m_lastInterval = -1;
          }
      
          // Get a list of authors names
          LoadListOfAuthors(m_commitsPerAuthor);
      
          // Calculate percent of Contribution Authors
          for (std::list<tstring>::iterator it = m_authorNames.begin(); it != m_authorNames.end(); ++it)
          {
              m_PercentageOfAuthorship[*it] =  (m_PercentageOfAuthorship[*it] *100)/ AllContributionAuthor;
          }
      
          // All done, now the statistics pages can retrieve the data and
          // extract the information to be shown.
      
      }
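
Stripped of the dialog bookkeeping, the authorship part of `GatherData()` does two things: it adds `CoeffContribution(distance from end) * fileChanges` to each author's total, then scales the totals so they sum to 100. The sketch below reproduces those two steps in isolation. The body of `CoeffContribution` is not shown in the excerpt, so `AssumedCoeff` is a stand-in with an assumed half-life decay, and the `Commit` struct is likewise hypothetical. Whether the end of the list holds the newest or the oldest commit depends on how the arrays are filled, which the excerpt does not show.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <vector>

// One log entry: who committed and how many files the commit touched.
struct Commit
{
    std::string author;
    int fileChanges;
};

// ASSUMPTION: stand-in for CoeffContribution(), whose body is not shown.
// The weight halves for every `halfLife` commits of distance from the end
// of the list; the real function may use a different formula.
double AssumedCoeff(int distFromEnd, double halfLife = 10.0)
{
    return std::pow(0.5, distFromEnd / halfLife);
}

// Accumulate weighted file changes per author, then scale the totals
// so that all authors together sum to 100.
std::map<std::string, double> PercentageOfAuthorship(const std::vector<Commit>& commits)
{
    std::map<std::string, double> share;
    double total = 0.0;
    const int n = static_cast<int>(commits.size());
    for (int i = 0; i < n; ++i)
    {
        const double c = AssumedCoeff(n - i - 1) * commits[i].fileChanges;
        share[commits[i].author] += c;
        total += c;
    }
    if (total > 0.0) // avoid dividing by zero when nothing was changed
    {
        for (auto& entry : share)
            entry.second = entry.second * 100.0 / total;
    }
    return share;
}
```

Note that the input here is the number of changed files per commit, not changed lines, so the practical metric is coarser than the line-based one described in theory.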
      

      This metric was inspired by some ideas for Git statistics (I am copying them here because I find them interesting and links break easily):

      Terminology
       There are four types of users: Maintainers, Developers, Bug-fixers, 
       and regular Users. The first three are all Contributors.
      
      Name: Maintainer (Contributor)
      Description: The Maintainer reviews commits and branches from other 
      Contributors and decides which ones to integrate into a 'master' branch.
      
      Name: Developer (Contributor)
      Description: The Developer contributes enhancements to the project, 
      e.g. they add new content or improve existing content.
      
      Name: Bug-fixer (Contributor)
      Description: The Bug-fixer locates 'bugs' (as something unwanted that 
      needs to be corrected) in the content and 'fixes' them.
      
      Name: User
      Description: The User uses the content, be it in their daily work or 
      every now and then for a specific purpose. 
      
      
      Use cases
      
      A model where other Contributors review commits is assumed in all use 
      cases. When references are made to a Contributor addressing another 
      Contributor to adjust their behavior as the result of data mined, it
      should be kept in mind that the Contributor should foremost be the one 
      to do this. Using this information to, say, spend more time checking 
      one's own commits for bugs when working on a specific part of the
      content of one's own accord is often more effective than doing
      so only after being asked. </disclaimer>? :P
      
      Name: Finding a Contributor that is active in a specific bit of content.
      Description:
            Whenever a Contributor needs to know about other Contributors 
      that are active in a specific part of the content they query git for
      this information. This could be used to figure out whom to send a copy 
      of a commit (someone who has recently worked on the content a commit 
      modifies is likely to be interested in such a commit). This 
      information may be easily gathered with, say, git blame. Aggregating 
      its output (in the background if need be to maintain speedy response
      times), it is trivial to determine whether a Contributor has more 
      commits/lines of change than a predefined amount. The main difference 
      with git blame is that its output is aggregated over the history of 
      the content, for a specific Contributor, whereas git blame only shows 
      the latest changes.
      
      Name: Finding which commits touch the parts of the content that a 
            commit touches.
      Description:
            There are several reasons that one might want to know which 
      commits touch the parts of the content that a commit touches. This 
      may be implemented similar to how git blame works, only instead of 
      'stopping' after having found the author of a line, the search 
      continues up to a certain date in the past.
      
      Name: Integrating the found 'bug introducing' commit with the git 
            commit message system.
      Description:
            When a Bug-fixer sends out a commit to fix a bug it might be 
      useful for them to find out where exactly the bug was introduced. 
      Using the 'which commit touched the content this commit touches' 
      technique optional candidates may be retrieved. After picking which of
      the found commits caused the bug, this information may then be 
      automatically added to the commit's description. This not only 
      allows the Bug-fixer to make clear the origin of their commit, but also 
      makes it possible to later unambiguously determine a bug/fix pair. Note 
      that this is automated, no user input is required to determine which 
      commit caused the bug, only the picking of 'cause' commits requires 
      input from the user.
      
      Name: Finding the Authors that introduce a lot of/almost no bugs to 
            the content.
      Description:
            Contributors might be interested to know which of the Developers 
      introduce a lot of bugs, or the contrary, which introduce almost no 
      bugs to the content. This information is highly relevant to the 
      Maintainer as they may now focus the time they spend on reviewing 
      commits on those that stem from Developers that turn out to often
      introduce bugs. On the other hand, Developers that usually do not 
      introduce bugs need less reviewing time. While such information is 
      usually known to the experienced Maintainer (as they know their main 
      contributors well), it can be helpful to new maintainers, or as a 
      pointer that the opinion of the Maintainer about a specific Developer 
      needs to be adjusted. Bug-fixers on the other hand can use this 
      information to address the Developer that introduces most of the bugs 
      they fix, perhaps with advice on how to prevent future bugs from being
      introduced.
      
      Name: Finding the Contributor that accepted a lot of/almost no bugs 
            into the content.
      Description:
            Similar to finding the Authors that write the bugs, there are 
      other Contributors that 'accept' the commit. Either passively, by not
      commenting when the commit is sent out for review, or actively, by 
      'acknowledging' (acked-by), 'signing off' (signed-off-by) or 'testing' 
      (tested-by) a commit. When actively doing so, this can later be traced
      and then be used in the same ways as for Authors.
      
      Name: Finding parts of the content in which a lot of bugs are 
            introduced and fixed
      Description:
            When a Developer decides to change part of the content, it would 
      be interesting for them to know that many before them introduced bugs 
      when working on that part of the content. Knowing this the Developer 
      might ask for all such buggy commits to try and learn from the 
      mistakes made by others and prevent making the same mistake. A 
      Maintainer might use this information to spend extra time reviewing a
      commit from a 'bug prone' part of the content. 
      
      Name: Finding parts of the content a particular Contributor introduces
            a lot of/almost no bugs to.
      Description:
            When trying to decide whether to ask a specific Contributor to 
      work on part of the content it might be useful to not only know how 
      actively they work on that part of the content, but also if they 
      introduced a lot of bugs to that part, or perhaps fixed many. Similar 
      to the more general case, this can be split out between modifying 
      content and 'accepting' modifications. This information may be used to 
      decide to ask a Contributor to spend more time on a specific part of 
      the content before sending in a commit for review.
      
      Name: Finding how many bugs were introduced/fixed in a period of time
      Description:
            As bugs are recognized by their fixes, it is always possible to 
      match a bug to its fix. Both commits have a time stamp and with those 
      the time between bug and fix can be calculated. Aggregating this data 
      over all known bug(fixes) the amount of unfixed bugs may be found over 
      a specified period of time. For example, finding the amount of fixed 
      bugs between two releases, or how many bugs were not fixed within one 
      release cycle. This number might then be calculated over several time
      frames (say, each release), after which it is possible to track 
      'content quality' throughout releases. If this information is then 
      graphed one can find extremes in this figure (for example, a release 
      cycle in which a lot of bugs were fixed, or one that introduced many). 
      Knowing this the Contributors may then determine the cause of such and 
      learn from that.
      
      Name: Finding how much work a contributor has done over a period of 
            time.
      Description:
            When working in a team in which everybody is expected to do 
      approximately the same amount of work it is interesting to see how 
      much work each Contributor actually does. This allows the team to 
      discuss any extremes and attempt to handle these so as to distribute the 
      work more evenly. 
            When work is being done by a large group of people it is 
      interesting to know the most active Contributors since these usually 
      are the ones with most knowledge on the content. The other way around, 
      it is possible to determine if a specific Contributor is 'active 
      enough' for a specific task (such as mentoring).
      
      Name: Finding whether a Contributor is mostly a Developer or a
            Bug-fixer.
      Description:
            To all Contributors it is interesting to know if they spend most
      of their time fixing bugs, or contributing enhancements to the content.
      This information could also be queried over a specific time frame, for
      example 'weekends vs. workdays' or 'holidays vs. non-holidays'.
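
To make the "bugs introduced/fixed in a period of time" use case above a little more concrete, here is a minimal sketch of the counting step. It assumes the bug/fix pairs have already been determined somehow; the `BugFixPair` and `PeriodStats` types and the `CountBugs` function are hypothetical and not part of Git or TortoiseSVN.

```cpp
#include <ctime>
#include <vector>

// Hypothetical record pairing a bug-introducing commit with its fix.
// fixedAt == 0 means that no fix is known yet.
struct BugFixPair
{
    std::time_t introducedAt;
    std::time_t fixedAt;
};

struct PeriodStats
{
    int introduced = 0; // bugs introduced inside [begin, end)
    int fixed = 0;      // bugs fixed inside [begin, end)
    int openAtEnd = 0;  // bugs introduced before `end` and not fixed before `end`
};

// Count bugs for one period, for example the time between two releases.
PeriodStats CountBugs(const std::vector<BugFixPair>& pairs,
                      std::time_t begin, std::time_t end)
{
    PeriodStats s;
    for (const BugFixPair& p : pairs)
    {
        const bool hasFix = p.fixedAt != 0;
        if (p.introducedAt >= begin && p.introducedAt < end)
            ++s.introduced;
        if (hasFix && p.fixedAt >= begin && p.fixedAt < end)
            ++s.fixed;
        if (p.introducedAt < end && (!hasFix || p.fixedAt >= end))
            ++s.openAtEnd;
    }
    return s;
}
```

Calling `CountBugs` once per release cycle would give the series of numbers that the use case suggests graphing to track 'content quality' over time.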
      


      Answered 2023-01-18 09:23