Mining a Change-Based Software Repository



Although state-of-the-art software repositories based on versioning system information are useful to assess the evolution of a software system, the information they contain is limited in several ways. Versioning systems such as CVS or Subversion store only snapshots of text files, which leads to a loss of information: the exact sequence of changes between two versions is hard to recover. In this paper we present an alternative information repository which stores incremental changes to the system under study, retrieved from the IDE used to build the software. We then use this change-based model of system evolution to assess when refactorings happen in two case studies, and compare our findings with refactoring detection approaches on classical versioning system repositories.

The nature of information found in software repositories determines what we can infer from it. Conversely, information missing from a software repository hampers the quality of the research we perform: What is stored in a repository is of prime importance. However, another characteristic limits the choice of

Open source software development should strive for even greater code maintainability
free download

Unlike traditional closed source software (CSS), OSS can be freely used, modified, and redistributed. Its source code is also freely accessible. A study of almost six million lines of code tracks how freely accessible source code holds up against time and multiple iterations

Static source code checking for user-defined properties
free download

Only a small fraction of the output generated by typical static analysis tools tends to reveal serious software defects. There are two main causes for this phenomenon. The first is that the typical static analyzer casts its nets too broadly, reporting everything reportable, rather

Plaggie: GNU-licensed source code plagiarism detection engine for Java exercises
free download

ABSTRACT A source code plagiarism detection engine, Plaggie, is presented. It is a stand-alone Java application that can be used to check Java programming exercises. Plaggie's functionality is similar to that of the previously published JPlag web service, but unlike JPlag, Plaggie

A convolutional attention network for extreme summarization of source code
free download

Attention mechanisms in neural networks have proved useful for problems in which the input and output do not have fixed dimension. Often there exist features that are locally translation invariant and would be valuable for directing the model's attention, but previous attentional

Using Heuristic Search Techniques To Extract Design Abstractions From Source Code .
free download

As modern software systems are large and complex, appropriate abstractions of their structure are needed to make them more understandable and, thus, easier to maintain. Software clustering tools are useful to support the creation of these abstractions. In this

Identifying authorship by byte-level n-grams: The source code author profile (scap) method
free download

Source code author identification deals with identifying the most likely author of a computer program, given a set of predefined author candidates. There are several scenarios where digital evidence of this kind plays a role in investigation and adjudication, such as code

Structured generative models of natural source code
free download

We study the problem of building generative models of natural source code (NSC); that is, source code written by humans and meant to be understood by humans. Our primary contribution is to describe new generative models that are tailored to NSC. The models are

Bimodal modelling of source code and natural language
free download

We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets. The aim is to bring together recent work on statistical modelling of source code and work on bimodal models of images and natural

Reading source code .
free download

Source code is, among other things, a text to be read. In this paper I argue that reading source code is a key activity in software maintenance, and that we can profitably apply experiences and reading systems from text databases to the problem of reading source

Visualizing Software Product Line Variabilities in Source Code .
free download

Implementing software product lines is a challenging task. Depending on the implementation technique the code that realizes a feature is often scattered across multiple code units. This way it becomes difficult to trace features in source code which hinders

Source code instrumentation and quantification of events
free download

ABSTRACT Aspect-Oriented Programming is making quantified programmatic assertions over programs that otherwise are not annotated to receive these assertions. Varieties of AOP systems are characterized by which quantified assertions they allow, what they permit in the

A UNIX clone with source code for operating systems courses
free download

Students learn by doing, not by listening. Physicists and chemists have long understood this, which is why students in these fields are required to perform experiments in the laboratory and write up their findings. Computer scientists also realize this basic truth, so many courses

Architecture of a source code exploration tool: A software engineering case study
free download

We discuss the design of a software system that helps software engineers (SEs) to perform the task we call just in time comprehension (JITC) of large bodies of source code. We discuss the requirements for such a system and how they were gathered by studying SEs at

Phishing websites detection based on phishing characteristics in the webpage source code
free download

ABSTRACT World Wide Web Consortium (W3C) is the international standards organization for the World Wide Web (www). It develops standards, specifications and recommendations to enhance the interoperability and maximize consensus about the content of the web and

IRiSS-A Source Code Exploration Tool.
free download

Abstract IRiSS (Information Retrieval based Software Search) is a software exploration tool that uses an indexing engine based on an information retrieval method. IRiSS is implemented as an add-in to the Visual Studio .NET development environment and it allows

Source code review of the Diebold voting system
free download

This report is a security analysis of the Diebold voting system, which consists primarily of the AccuVote-TSX (AV-TSX) DRE, the AccuVote-OS (AV-OS) optical scanner, and the GEMS election management system. It is based on a study of the system's source code that we

Learning Unified Features from Natural and Programming Languages for Locating Buggy Source Code .
free download

Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source code according to a bug report remains a great challenge in software maintenance. Many previous studies

PDE4Java: Plagiarism Detection Engine For Java, Source Code : A Clustering Approach.
free download

The educational community across the world is facing the increasing problem of plagiarism. This widespread problem has motivated the need of an efficient, robust and fast detection procedure that is difficult to be achieved manually. The Plagiarism Detection Engine for Java

Intent operationalisation for source code generation
free download

In the research on software development, there was less achievement in an efficient general development methodology that could be effective and sufficient in dealing with a wide range of software problems related to different domains. Also a challenge of having a universal
repositories: the number of available case studies [12]. This pragmatic reason explains why researchers base themselves on popular repositories such as CVS and Subversion despite their limitations. Indeed, most open source projects give free access to their repositories, including industrial-size case studies such as Apache, Eclipse, Mozilla or Linux. This large availability comes at a price. Versioning systems are designed to be used in a variety of contexts and hence must lower their assumptions about the objects they version. CVS and Subversion, the most popular versioning systems in the open-source world, only assume that the objects they version are files. They are thus used in a variety of situations, from

files to system documentation or binary files. Even though Estublier et al. write in [6] that one of the next steps for versioning systems research is to break the assumption of language independence, this is not yet the case in practice. We claim that these assumptions are too weak for researchers to perform precise research on

evolution. Basing an analysis only on the successive versions of a source tree of code files requires heavy pre-processing to raise the abstraction level beyond files and directories. Versioning systems have another limitation that degrades the quality of the information they contain: they only update their repositories when a developer explicitly checks in their work. Ideally, updates to the repository should come often so that they are as small and precise as possible, but this cannot be guaranteed.

This paper explores the benefits and drawbacks obtained by breaking the assumption that a popular repository must be used. We instead created a software repository designed to store a maximal amount of information about an evolving piece of software. In particular, we do not use a versioning system, but built from the ground up a change-based software repository which fetches domain-specific information from an Integrated Development Environment (IDE). Being change-based means that the evolution of a software system is no longer modelled as a sequence of versions: using an IDE allows us to store, as first-class citizens, the actual changes which were performed on the system to obtain its latest version. This model better matches the actual evolution of a system, since we reproduce how developers actually change the system.

To validate our approach we chose to study when a particular kind of change, namely refactoring, is applied to a software system, based on two case studies. We compare the findings of this application of our approach with other work in which refactorings are detected in a classical versioning system repository.
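To make the contrast between versions and changes concrete, the following is a minimal sketch of a change-based history, not the repository described in this paper. The names `Change` and `ChangeRepository`, the four change kinds, and the choice to treat a rename as the refactoring of interest are all assumptions made for illustration. The sketch stores each change as a first-class record and derives any past state of the system by replaying the changes in order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One recorded change (hypothetical schema, for illustration only)."""
    timestamp: int
    kind: str          # assumed kinds: "add", "modify", "remove", "rename"
    entity: str        # e.g. a method name such as "Account.balance"
    payload: str = ""  # new source for add/modify, new name for rename


@dataclass
class ChangeRepository:
    """Stores changes, not versions; versions are derived by replay."""
    changes: List[Change] = field(default_factory=list)

    def record(self, change: Change) -> None:
        self.changes.append(change)

    def replay(self, until: Optional[int] = None) -> Dict[str, str]:
        """Rebuild the system state (entity -> source) up to a timestamp."""
        state: Dict[str, str] = {}
        for change in sorted(self.changes, key=lambda c: c.timestamp):
            if until is not None and change.timestamp > until:
                break
            if change.kind in ("add", "modify"):
                state[change.entity] = change.payload
            elif change.kind == "remove":
                state.pop(change.entity, None)
            elif change.kind == "rename" and change.entity in state:
                state[change.payload] = state.pop(change.entity)
        return state

    def refactoring_times(self) -> List[int]:
        """Timestamps of changes that this sketch counts as refactorings."""
        return sorted(c.timestamp for c in self.changes if c.kind == "rename")


repo = ChangeRepository()
repo.record(Change(1, "add", "Account.balance", "return total"))
repo.record(Change(2, "modify", "Account.balance", "return self.total"))
repo.record(Change(3, "rename", "Account.balance", "Account.current_balance"))

print(repo.replay(until=2))
print(repo.replay())
print(repo.refactoring_times())
```

In this sketch the rename is recorded explicitly, so finding when it happened is a lookup. With snapshots alone, the same rename would have to be inferred by comparing two versions of the files.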
