cloc
Count Lines of Code
cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
Latest release: v1.84 (September 22, 2019)
cloc moved to GitHub in September 2015 after being hosted
at http://cloc.sourceforge.net/ since August 2006.
- Quick Start
- Overview
- Download
- License
- Why Use cloc?
- Other Counters
- Building a Windows Executable
- Basic Use
- Options
- Recognized Languages
- How it Works
- Advanced Use
- Complex regular subexpression recursion limit
- Limitations
- Requesting Support for Additional Languages
- Reporting Problems
- Acknowledgments
- Copyright
Quick Start ▲
Step 1: Download cloc (several methods, see below).
Step 2: Open a terminal (cmd.exe on Windows).
Step 3: Invoke cloc to count your source files, directories, archives,
or git commits.
The executable name differs depending on whether you use the
development source version (cloc), source for a
released version (cloc-1.82.pl), or a Windows executable
(cloc-1.82.exe). On this page, cloc is the generic term
used to refer to any of these.
a file
a directory
an archive
We'll pull cloc's source zip file from GitHub, then count the contents:
a git repository, using a specific commit
This example uses code from
PuDB, a fantastic Python debugger.
each subdirectory of a particular directory
Say you have a directory with three different git-managed projects,
Project0, Project1, and Project2. You can use your shell's looping
capability to count the code in each. This example uses bash:
Overview ▲
Translations:
Arabic,
Armenian,
Belarussian,
Bulgarian,
Hungarian,
Portuguese,
Serbo-Croatian,
Romanian,
Slovakian,
Tamil
cloc counts blank lines, comment lines, and physical lines of source
code in many programming languages. Given two versions of
a code base, cloc can compute differences in blank, comment, and source
lines. It is written entirely in Perl with no dependencies outside the
standard distribution of Perl v5.6 and higher (code from some external
modules is embedded within cloc) and so is
quite portable. cloc is known to run on many flavors of Linux, FreeBSD,
NetBSD, OpenBSD, Mac OS X, AIX, HP-UX, Solaris, IRIX, z/OS, and Windows.
(To run the Perl source version of cloc on Windows one needs
ActiveState Perl 5.6.1 or higher, Strawberry Perl, Cygwin,
MobaXTerm with the Perl plug-in installed, or
a mingw environment and terminal such as provided by Git for Windows.
Alternatively one can use the Windows binary of cloc
generated with PAR::Packer
to run on Windows computers that have neither Perl nor Cygwin.)
cloc contains code from David Wheeler's
SLOCCount,
Damian Conway and Abigail's Perl module
Regexp::Common,
Sean M. Burke's Perl module
Win32::Autoglob,
and Tye McQueen's Perl module
Algorithm::Diff.
Language scale factors were derived from Mayes Consulting, LLC web site
http://softwareestimator.com/IndustryData2.htm.
Install via package manager
Depending on your operating system, one of these installation methods may work for you:
npm install -g cloc # https://www.npmjs.com/package/cloc
sudo apt install cloc # Debian, Ubuntu
sudo yum install cloc # Red Hat, Fedora
sudo dnf install cloc # Fedora 22 or later
sudo pacman -S cloc # Arch
sudo emerge -av dev-util/cloc # Gentoo https://packages.gentoo.org/packages/dev-util/cloc
sudo apk add cloc # Alpine Linux
sudo pkg install cloc # FreeBSD
sudo port install cloc # Mac OS X with MacPorts
brew install cloc # Mac OS X with Homebrew
choco install cloc # Windows with Chocolatey
scoop install cloc # Windows with Scoop
Note: I don't control any of these packages.
If you encounter a bug in cloc using one of the above
packages, try with cloc pulled from the latest stable release here
on github (link follows below) before submitting a problem report.
Stable release
https://github.com/AlDanial/cloc/releases/latest
Development version
https://github.com/AlDanial/cloc/raw/master/cloc
License ▲
cloc is licensed under the
GNU General Public License, v 2,
excluding portions which
are copied from other sources. Code
copied from the Regexp::Common, Win32::Autoglob, and Algorithm::Diff
Perl modules is subject to the
Artistic License.
Why Use cloc? ▲
cloc has many features that make it easy to use, thorough, extensible, and portable:
- Exists as a single, self-contained file that requires minimal installation effort---just download the file and run it.
- Can read language comment definitions from a file and thus potentially work with computer languages that do not yet exist.
- Allows results from multiple runs to be summed together by language and by project.
- Can produce results in a variety of formats: plain text, SQL, JSON, XML, YAML, comma separated values.
- Can count code within compressed archives (tar balls, Zip files, Java .ear files).
- Has numerous troubleshooting options.
- Handles file and directory names with spaces and other unusual characters.
- Has no dependencies outside the standard Perl distribution.
- Runs on Linux, FreeBSD, NetBSD, OpenBSD, Mac OS X, AIX, HP-UX, Solaris, IRIX, and z/OS systems that have Perl 5.6 or higher. The source version runs on Windows with either ActiveState Perl, Strawberry Perl, Cygwin, or MobaXTerm+Perl plugin. Alternatively on Windows one can run the Windows binary which has no dependencies.
Other Counters ▲
If cloc does not suit your needs here are other freely available counters to consider:
Other references:
- QSM's directory of code counting tools.
- The Wikipedia entry for source code line counts.
Regexp::Common, Digest::MD5, Win32::Autoglob, Algorithm::Diff
Although cloc does not need Perl modules outside those found in the
standard distribution, cloc does rely on a few external modules. Code
from three of these external modules--Regexp::Common, Win32::Autoglob,
and Algorithm::Diff--is embedded within cloc. A fourth module,
Digest::MD5, is used only if it is available. If cloc finds
Regexp::Common or Algorithm::Diff installed locally it will use those
installations. If it doesn't, cloc will install the parts of
Regexp::Common and/or Algorithm::Diff it needs to temporary directories
that are created at the start of a cloc run then removed when the run is
complete. The necessary code from Regexp::Common v2.120 and
Algorithm::Diff v1.1902 is embedded within the cloc source code (see
subroutines Install_Regexp_Common() and Install_Algorithm_Diff()).
Only three lines are needed from Win32::Autoglob and these are included
directly in cloc.
Additionally, cloc will use Digest::MD5 to validate uniqueness among
equally-sized input files if Digest::MD5 is installed locally.
A parallel processing option, --processes=N, was introduced with
cloc version 1.76 to enable faster runs on multicored machines. However,
to use it, one must have the module Parallel::ForkManager installed.
This module does not work reliably on Windows so parallel processing
will only work on Unix-like operating systems.
The Windows binary is built on a computer that has both Regexp::Common
and Digest::MD5 installed locally.
Building a Windows Executable ▲
The Windows downloads cloc-1.70.exe and cloc-1.72.exe were
built with PAR::Packer and Strawberry Perl 5.24.0.1
on an Amazon Web Services t2.micro instance running Microsoft Windows Server 2008
(32 bit for 1.70 and 1.72; 64 bit for 1.74).
Releases 1.74 through 1.84
were built on a 32 bit Windows 7 virtual machine (IE11.Win7.For.Windows.VirtualBox.zip
pulled from https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/)
using Strawberry Perl 5.26.1.1.
The cloc-1.66.exe executable was built with PAR::Packer
on a 32 bit Windows 7 VirtualBox image
pulled from https://dev.windows.com/en-us/microsoft-edge/tools/vms/linux/
and running on an Ubuntu 15.10 host.
The virtual machine ran
Strawberry Perl version 5.22.1.
Windows executables of cloc versions
1.60 and earlier were built with
perl2exe on a 32 bit Windows
XP computer. A small modification was made to the cloc source code
before passing it to perl2exe; lines 87 and 88 were uncommented:
Is the Windows executable safe to run? Does it have malware?
Ideally, no one would need the Windows executable because they
have a Perl interpreter installed on their machines and can
run the cloc source file.
On centrally-managed corporate Windows machines, however,
this may be difficult or impossible.
The Windows executable distributed with cloc
is provided as a best-effort virus-free and malware-free .exe.
You are encouraged to run your own virus scanners against the
executable and also check sites such as
https://www.virustotal.com/ .
The entries for recent versions are:
cloc-1.84.exe:
https://www.virustotal.com/gui/file/e73d490c1e4ae2f50ee174005614029b4fa2610dcb76988714839d7be68479af/detection
cloc-1.82.exe:
https://www.virustotal.com/#/file/2e5fb443fdefd776d7b6b136a25e5ee2048991e735042897dbd0bf92efb16563/detection
cloc-1.80.exe:
https://www.virustotal.com/#/file/9e547b01c946aa818ffad43b9ebaf05d3da08ed6ca876ef2b6847be3bf1cf8be/detection
cloc-1.78.exe:
https://www.virustotal.com/#/file/256ade3df82fa92febf2553853ed1106d96c604794606e86efd00d55664dd44f/detection
cloc-1.76.exe:
https://www.virustotal.com/#/url/c1b9b9fe909f91429f95d41e9a9928ab7c58b21351b3acd4249def2a61acd39d/detection
cloc-1.74_x86.exe:
https://www.virustotal.com/#/file/b73dece71f6d3199d90d55db53a588e1393c8dbf84231a7e1be2ce3c5a0ec75b/detection
cloc 1.72 exe:
https://www.virustotal.com/en/url/8fd2af5cd972f648d7a2d7917bc202492012484c3a6f0b48c8fd60a8d395c98c/analysis/
cloc 1.70 exe:
https://www.virustotal.com/en/url/63edef209099a93aa0be1a220dc7c4c7ed045064d801e6d5daa84ee624fc0b4a/analysis/
cloc 1.68 exe:
https://www.virustotal.com/en/file/c484fc58615fc3b0d5569b9063ec1532980281c3155e4a19099b11ef1c24443b/analysis/
cloc 1.66 exe:
https://www.virustotal.com/en/file/54d6662e59b04be793dd10fa5e5edf7747cf0c0cc32f71eb67a3cf8e7a171d81/analysis/1453601367/
Why is the Windows executable so large?
Windows executables of cloc versions 1.60 and earlier, created with
perl2exe as noted above, are about 1.6 MB, while versions 1.62 and 1.64, created
with PAR::Packer, are 11 MB.
Version 1.66, built with a newer version of PAR::Packer, is about 5.5 MB.
Why are the PAR::Packer executables so
much larger than those built with perl2exe? My theory is that perl2exe
uses smarter tree pruning logic
than PAR::Packer, but that's pure speculation.
Create your own executable
The most robust option for creating a Windows executable of
cloc is to use ActiveState's Perl Development Kit.
It includes a utility, perlapp, which can build stand-alone
Windows, Mac, and Linux binaries of Perl source code.
perl2exe will also do the trick. If you do have perl2exe, make the
minor modification at lines 84-87 of the cloc source code that is
necessary to build a cloc Windows executable.
Otherwise, to build a Windows executable with pp from
PAR::Packer, first install a Windows-based Perl distribution
(for example Strawberry Perl or ActivePerl) following their
instructions. Next, open a command prompt, aka a DOS window, and install
the PAR::Packer module. Finally, invoke the newly installed pp
command with the cloc source code to create an .exe file:
If you installed the portable
version of Strawberry Perl, you will need to run portableshell.bat first
to properly set up your environment.
Basic Use ▲
cloc is a command line program that takes file, directory, and/or
archive names as inputs. Here's an example of running cloc against the
Perl v5.22.0 source distribution:
To run cloc on Windows computers, one must first open up a command (aka DOS) window and invoke cloc.exe from the command line there.
Options ▲
Recognized Languages ▲
The above list can be customized by reading language definitions from a
file with the --read-lang-def or --force-lang-def options.
These file extensions map to multiple languages:
- cl files could be Lisp or OpenCL
- cls files could be Visual Basic, TeX or Apex Class
- cs files could be C# or Smalltalk
- d files could be D or dtrace
- f files could be Fortran 77 or Forth
- fnc files could be Oracle PL or SQL
- for files could be Fortran 77 or Forth
- fs files could be F# or Forth
- inc files could be PHP or Pascal
- itk files could be Tcl or Tk
- jl files could be Lisp or Julia
- lit files could be PL or M
- m files could be MATLAB, Mathematica, Objective C, MUMPS or Mercury
- p6 files could be Perl or Prolog
- pl files could be Perl or Prolog
- PL files could be Perl or Prolog
- pp files could be Pascal or Puppet
- pro files could be IDL, Qt Project, Prolog or ProGuard
- ts files could be TypeScript or Qt Linguist
- ui files could be Qt or Glade
- v files could be Verilog-SystemVerilog or Coq
cloc has subroutines that attempt to identify the correct language based
on the file's contents for these special cases. Language identification
accuracy is a function of how much code the file contains; .m files with
just one or two lines for example, seldom have enough information to
correctly distinguish between MATLAB, Mercury, MUMPS, or Objective C.
Languages with file extension collisions are difficult to customize with
--read-lang-def or --force-lang-def as they have no mechanism to
identify languages with common extensions. In this situation one must
modify the cloc source code.
How It Works ▲
cloc's method of operation resembles SLOCCount's: First, create a list
of files to consider. Next, attempt to determine whether or not found
files contain recognized computer language source code. Finally, for
files identified as source files, invoke language-specific routines to
count the number of source lines.
A more detailed description:
1. If the input file is an archive (such as a .tar.gz or .zip file),
   create a temporary directory and expand the archive there using a
   system call to an appropriate underlying utility (tar, bzip2, unzip,
   etc.), then add this temporary directory as one of the inputs. (This
   works more reliably on Unix than on Windows.)
2. Use File::Find to recursively descend the input directories and make
   a list of candidate file names. Ignore binary and zero-sized files.
3. Make sure the files in the candidate list have unique contents
   (first by comparing file sizes, then, for equally sized files,
   by comparing MD5 hashes of the file contents with Digest::MD5). For each
   set of identical files, remove all but the first copy, as determined
   by a lexical sort. The removed
   files are not included in the report. (The --skip-uniqueness switch
   disables the uniqueness tests and forces all copies of files to be
   included in the report.) See also the --ignored= switch to see which
   files were ignored and why.
4. Scan the candidate file list for file extensions which cloc
   associates with programming languages (see the --show-lang and
   --show-ext options). Files which match are classified as
   containing source
   code for that language. Each file without an extension is opened
   and its first line read to see if it is a Unix shell script
   (anything that begins with #!). If it is a shell script, the file is
   classified by that scripting language (if the language is
   recognized). If the file does not have a recognized extension or is
   not a recognized scripting language, the file is ignored.
5. All remaining files in the candidate list should now be source files
   for known programming languages. For each of these files:
   - Read the entire file into memory.
   - Count the number of lines (= L_original).
   - Remove blank lines, then count again (= L_non_blank).
   - Loop over the comment filters defined for this language. (For
     example, C++ has two filters: (1) remove lines that start with
     optional whitespace followed by // and (2) remove text between
     /* and */.) Apply each filter to the code to remove comments.
     Count the leftover lines (= L_code).
   - Save the counts for this language:
     - blank lines = L_original - L_non_blank
     - comment lines = L_non_blank - L_code
     - code lines = L_code
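The per-file counting in the last step can be pictured with a small Python sketch. This is an illustration only, not cloc's Perl implementation: the function names and the two regular expressions are assumptions made for this example, and blank lines left behind once a comment is removed are dropped before the final count.

```python
import re


def remove_slash_slash_lines(text):
    # Filter (1): drop lines that are only optional whitespace followed by //.
    return re.sub(r"(?m)^[ \t]*//.*$", "", text)


def remove_block_comments(text):
    # Filter (2): drop text between /* and */, possibly spanning lines.
    return re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)


CPP_FILTERS = [remove_slash_slash_lines, remove_block_comments]


def count_lines(source, comment_filters):
    original = source.splitlines()                   # L_original
    non_blank = [ln for ln in original if ln.strip()]  # L_non_blank
    code = "\n".join(non_blank)
    for comment_filter in comment_filters:
        code = comment_filter(code)
    # Lines emptied by the filters no longer count as code.
    code_lines = [ln for ln in code.splitlines() if ln.strip()]  # L_code
    return {
        "blank": len(original) - len(non_blank),
        "comment": len(non_blank) - len(code_lines),
        "code": len(code_lines),
    }


SAMPLE = """\
// header comment
int main() {

    /* block
       comment */
    return 0;  // trailing
}
"""

print(count_lines(SAMPLE, CPP_FILTERS))
```

Note that the line `return 0;  // trailing` is counted as code: filter (1) only removes lines that begin with the comment marker.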
The options modify the algorithm slightly. The --read-lang-def option
for example allows the user to read definitions of comment filters,
known file extensions, and known scripting languages from a file. The
code for this option is processed between Steps 2 and 3.
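The uniqueness test in Step 3 (size first, then MD5, keeping the lexically first copy) might look like the following Python sketch. It works on an in-memory mapping of path to file contents so that it is self-contained; the function name and sample paths are made up for this illustration, and cloc itself does this in Perl with Digest::MD5.

```python
import hashlib
from collections import defaultdict


def unique_files(contents_by_path):
    """Return the paths that survive a size-then-MD5 uniqueness test."""
    by_size = defaultdict(list)
    # Sorting first means the lexically first copy in each group wins.
    for path in sorted(contents_by_path):
        by_size[len(contents_by_path[path])].append(path)

    kept = []
    for paths in by_size.values():
        if len(paths) == 1:
            kept.extend(paths)  # a unique size needs no hashing
            continue
        seen_digests = set()
        for path in paths:
            digest = hashlib.md5(contents_by_path[path]).hexdigest()
            if digest not in seen_digests:
                seen_digests.add(digest)
                kept.append(path)
    return sorted(kept)


# Hypothetical inputs: two identical files plus two distinct ones.
FILES = {
    "b/util.c": b"int x;\n",
    "a/util.c": b"int x;\n",
    "main.c": b"int y;\n",
    "big.c": b"int main(){}\n",
}

print(unique_files(FILES))
```

Hashing only files that share a size avoids reading most files twice, which is presumably why the size comparison comes first.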
Advanced Use ▲
Remove Comments from Source Code ▲
How can you tell if cloc correctly identifies comments? One way to
convince yourself cloc is doing the right thing is to use its
--strip-comments option to remove comments and blank lines from files, then
compare the stripped-down files to originals.
Let's try this out with the SQLite amalgamation, a C file containing all
code needed to build the SQLite library along with a header file:
The extension argument given to --strip-comments is arbitrary; here nc was used as an abbreviation for "no comments".
cloc removed over 31,000 lines from the file:
We can now compare the original file, sqlite3.c and the one stripped of
comments, sqlite3.c.nc with tools like diff or vimdiff and see what
exactly cloc considered comments and blank lines. A rigorous proof that
the stripped-down file contains the same C code as the original is to
compile these files and compare checksums of the resulting object files.
First, the original source file:
Next, the version without comments:
cloc removed over 31,000 lines of comments and blanks but did not modify the source code in any significant way since the resulting object file matches the original.
Work with Compressed Archives ▲
Versions of cloc before v1.07 required an
--extract-with=CMD option to tell cloc how
to expand an archive file. Beginning with v1.07 this extraction is
attempted automatically. At the moment the automatic extraction method works
reasonably well on Unix-type OS's for the following file types:
.tar.gz, .tar.bz2, .tar.xz, .tgz, .zip, .ear, .deb.
Some of these extensions work on Windows if one has WinZip installed
in the default location (C:\Program Files\WinZip\WinZip32.exe).
Additionally, with newer versions of WinZip, the
command line add-on (http://www.winzip.com/downcl.htm)
is needed for correct operation; in this case one would invoke cloc with
something like
Ref. http://sourceforge.net/projects/cloc/forums/forum/600963/topic/4021070?message=8938196
In situations where the automatic extraction fails, one can try the
--extract-with=CMD option to count lines of code within tar files, Zip files, or
other compressed archives for which one has an extraction tool.
cloc takes the user-provided extraction command and expands the archive
to a temporary directory (created with File::Temp),
counts the lines of code in the temporary directory,
then removes that directory. While not especially helpful when dealing
with a single compressed archive (after all, if you're going to type
the extraction command anyway why not just manually expand the archive?)
this option is handy for working with several archives at once.
For example, say you have the following source tarballs on a Unix machine
perl-5.8.5.tar.gz
Python-2.4.2.tar.gz
and you want to count all the code within them. The command would be
If that Unix machine has GNU tar (which can uncompress and extract in
one step) the command can be shortened to
On a Windows computer with WinZip installed in c:\Program Files\WinZip
the command would look like
Java .ear files are Zip files that contain additional Zip
files. cloc can handle nested compressed archives without
difficulty--provided all such files are compressed and archived in the
same way. Examples of counting a Java .ear file in Unix and Windows:
Differences ▲
The --diff switch allows one to measure the relative change in
source code and comments between two versions of a file, directory,
or archive. Differences reveal much more than absolute code
counts of two file versions. For example, say a source file
has 100 lines and its developer delivers a newer version with
102 lines. Did the developer add two comment lines,
or delete seventeen source
lines and add fourteen source lines and five comment lines, or did
the developer
do a complete rewrite, discarding all 100 original lines and
adding 102 lines of all new source? The diff option tells how
many lines of source were added, removed, modified or stayed
the same, and how many lines of comments were added, removed,
modified or stayed the same.
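To make the four categories concrete, here is a rough Python sketch that classifies lines as same, modified, added, or removed. It uses Python's difflib as a stand-in for the Algorithm::Diff module that cloc actually uses, so its alignment and counts are not guaranteed to match cloc's; the rule for splitting an unequal "replace" block into modified plus added or removed lines is an assumption made for this example.

```python
import difflib


def diff_counts(old_lines, new_lines):
    counts = {"same": 0, "modified": 0, "added": 0, "removed": 0}
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        n_old, n_new = i2 - i1, j2 - j1
        if tag == "equal":
            counts["same"] += n_old
        elif tag == "delete":
            counts["removed"] += n_old
        elif tag == "insert":
            counts["added"] += n_new
        else:  # "replace": pair lines up, the surplus is added or removed
            paired = min(n_old, n_new)
            counts["modified"] += paired
            counts["removed"] += n_old - paired
            counts["added"] += n_new - paired
    return counts


OLD = ["a", "b", "c"]
NEW = ["a", "x", "c", "d"]
print(diff_counts(OLD, NEW))
```

Here the file grew by one line, yet the breakdown shows one modified line and one added line, which is the kind of detail an absolute count hides.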
Differences in blank lines are handled much more coarsely
because these are stripped by cloc early on. Unless a
file pair is identical, cloc will report only differences
in absolute counts of blank lines. In other words, one
can expect to see only entries for 'added' if the second
file has more blanks than the first, and 'removed' if the
situation is reversed. The entry for 'same' will be non-zero
only when the two files are identical.
In addition to file pairs, one can give cloc pairs of
directories, or pairs of file archives, or a file archive
and a directory. cloc will try to align
file pairs within the directories or archives and compare diffs
for each pair. For example, to see what changed between
GCC 4.4.0 and 4.5.0 one could do
Be prepared to wait a while for the results though; the --diff
option runs much more slowly than an absolute code count.
To see how cloc aligns files between the two archives, use the
--diff-alignment option
to produce the file align.txt which shows the file pairs as well
as files added and deleted. The symbols == and != before each
file pair indicate if the files are identical (==)
or if they have different content (!=).
Here's sample output showing the difference between the Python 2.6.6 and 2.7
releases:
A pair of errors occurred.
The first pair was caused by timing out when computing diffs of the file
Python-X/Mac/Modules/qt/_Qtmodule.c in each Python version.
This file has > 26,000 lines of C code and takes more than
10 seconds--the default maximum duration for diff'ing a
single file--on my slow computer. (Note: this refers to
performing differences with
the sdiff() function in the Perl Algorithm::Diff module,
not the command line diff utility.) This error can be
overcome by raising the time to, say, 20 seconds
with --diff-timeout 20.
The second error is more problematic. The files
Python-X/Mac/Modules/qd/qdsupport.py
include Python docstrings (text between pairs of triple quotes)
containing C comments. cloc treats docstrings as comments and handles them
by first converting them to C comments, then using the C comment removing
regular expression. Nested C comments yield erroneous results, however.
Create Custom Language Definitions ▲
cloc can write its language comment definitions to a file or can read
comment definitions from a file, overriding the built-in definitions.
This can be useful when you want to use cloc to count lines of a
language not yet included, to change association of file extensions
to languages, or to modify the way existing languages are counted.
The easiest way to create a custom language definition file is to
make cloc write its definitions to a file, then modify that file:
creates the file my_definitions.txt which can be modified
then read back in with either the --read-lang-def or
--force-lang-def option. The difference between the options is that the
former merges language definitions from the given file in with
cloc's internal definitions, with cloc's taking precedence
if there are overlaps. The --force-lang-def option, on the
other hand, replaces cloc's definitions completely.
This option has a disadvantage in preventing cloc from counting
languages whose extensions map to multiple languages
as these languages require additional logic that is not easily
expressed in a definitions file.
Each language entry has four parts:
- The language name starting in column 1.
- One or more comment filters starting in column 5.
- One or more filename extensions starting in column 5.
- A 3rd generation scale factor starting in column 5.
This entry must be provided
but its value is not important
unless you want to compare your language to a hypothetical
third generation programming language.
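The four-part layout can be illustrated with a toy parser in Python. Treat every detail here as an assumption: the keywords filter, extension, and 3rd_gen_scale, the language MyLang, its extension, and its filter are invented for this sketch, and the authoritative format is whatever cloc's --write-lang-def option emits.

```python
# Hypothetical entry: name in column 1, the other parts indented to column 5.
SAMPLE = r"""
MyLang
    filter remove_matches ^\s*#
    extension myl
    3rd_gen_scale 1.00
"""


def parse_lang_defs(text):
    languages = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        if not line[0].isspace():
            # A line starting in column 1 opens a new language entry.
            current = {"filters": [], "extensions": [], "scale": None}
            languages[line.strip()] = current
        elif current is not None:
            key, _, value = line.strip().partition(" ")
            if key == "filter":
                current["filters"].append(value.strip())
            elif key == "extension":
                current["extensions"].append(value.strip())
            elif key == "3rd_gen_scale":
                current["scale"] = float(value)
    return languages


print(parse_lang_defs(SAMPLE))
```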
A filter defines a method to remove comment text from the source file.
For example the entry for C++ looks like this
C++ has two filters: first, remove lines matching
Regexp::Common's C++ comment regex.
The second filter using remove_inline is currently
unused. Its intent is to identify lines with both
code and comments and it may be implemented in the future.
A more complete discussion of the different filter options may appear
here in the future. The output of cloc's
--write-lang-def
option should provide enough examples
for motivated individuals to modify or extend cloc's language definitions.
Combine Reports ▲
If you manage multiple software projects you might be interested in
seeing line counts by project, not just by language.
Say you manage three software projects called MariaDB, PostgreSQL, and SQLite.
The teams responsible for each of these projects run cloc on their
source code and provide you with the output.
For example, the MariaDB team does
and provides you with the file mariadb-10.1.txt.
The contents of the three files you get are
While these three files are interesting, you also want to see
the combined counts from all projects.
That can be done with cloc's --sum-reports option:
The report combination produces two output files, one for sums by
programming language (databases.lang) and one by project
(databases.file).
Their contents are
Report files themselves can be summed together. Say you also manage
development of Perl and Python and you want to keep track
of those line counts separately from your database projects. First
create reports for Perl and Python separately:
then sum these together with
Finally, combine the combination files:
One limitation of the --sum-reports feature is that the individual counts must
be saved in the plain text format. Counts saved as
XML, JSON, YAML, or SQL will produce errors if used in a summation.
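Conceptually, the summation produces two views of the same numbers: totals per language and totals per input report. The Python sketch below shows that bookkeeping on already-parsed counts; parsing cloc's plain text reports is not shown, and the report names and numbers are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical parsed reports: language -> (files, blank, comment, code).
REPORTS = {
    "project_a.txt": {"C": (2, 10, 20, 100), "Perl": (1, 5, 5, 50)},
    "project_b.txt": {"C": (3, 30, 40, 300)},
}


def sum_reports(reports):
    by_language = defaultdict(lambda: [0, 0, 0, 0])
    by_report = {}
    for name, languages in reports.items():
        totals = [0, 0, 0, 0]
        for language, counts in languages.items():
            for i, value in enumerate(counts):
                by_language[language][i] += value  # sums by language
                totals[i] += value                 # sums by report
        by_report[name] = tuple(totals)
    return {lang: tuple(c) for lang, c in by_language.items()}, by_report


by_language, by_report = sum_reports(REPORTS)
print(by_language)
print(by_report)
```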
SQL ▲
Cloc can write results in the form of SQL table create and insert
statements for use
with relational database programs such as SQLite, MySQL,
PostgreSQL, Oracle, or Microsoft SQL.
Once the code count information is in a database,
the information can be interrogated and displayed in interesting ways.
A database created from cloc SQL output has two tables,
metadata and t:
Table metadata:

| Field     | Type |
|-----------|------|
| timestamp | text |
| project   | text |
| elapsed_s | text |

Table t:

| Field    | Type    |
|----------|---------|
| project  | text    |
| language | text    |
| file     | text    |
| nBlank   | integer |
| nComment | integer |
| nCode    | integer |
| nScaled  | real    |

The metadata table contains information about when the cloc run
was made. The --sql-append switch allows one to combine
many runs in a single database; each run adds a
row to the metadata table.
The code count information resides in table t.
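As a quick way to get a feel for this schema, the Python sketch below builds both tables in an in-memory SQLite database and runs two queries against table t. The rows are invented placeholders, not real cloc output, and the table definitions are written by hand from the field lists above rather than copied from cloc's generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table metadata (timestamp text, project text, elapsed_s text);
    create table t (project text, language text, file text,
                    nBlank integer, nComment integer, nCode integer,
                    nScaled real);
""")

# Hypothetical rows standing in for two cloc runs.
ROWS = [
    ("proj_a", "C", "src/big.c", 10, 20, 300, 300.0),
    ("proj_a", "C", "src/small.c", 1, 2, 30, 30.0),
    ("proj_b", "Perl", "lib/tool.pl", 5, 5, 90, 90.0),
]
conn.executemany("insert into t values (?, ?, ?, ?, ?, ?, ?)", ROWS)


def longest_file(db):
    # Longest file over all projects, counting every physical line.
    return db.execute(
        "select project, file, nBlank + nComment + nCode as nL "
        "from t order by nL desc limit 1"
    ).fetchone()


def code_by_language(db):
    # Code lines per language within each project, largest first.
    return db.execute(
        "select project, language, sum(nCode) from t "
        "group by project, language "
        "order by project, sum(nCode) desc"
    ).fetchall()


print(longest_file(conn))
print(code_by_language(conn))
```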
Let's repeat the code count examples of Perl, Python, SQLite, MariaDB and
PostgreSQL tarballs shown in the Combine Reports example above, this time
using the SQL output options and the SQLite database engine.
The --sql switch tells cloc to generate output in the form
of SQL table create and insert commands. The switch takes
an argument of a file name to write these SQL statements into, or,
if the argument is 1 (numeric one), streams output to STDOUT.
Since the SQLite command line program, sqlite3, can read
commands from STDIN, we can dispense with storing SQL statements to
a file and use --sql 1 to pipe data directly into the
SQLite executable:
The --sql-project mariadb part is optional; there's no need
to specify a project name when working with just one code base. However,
since we'll be adding code counts from four other tarballs, we'll only
be able to identify data by input source if we supply a
project name for each run.
Now that we have a database we will need to pass in the --sql-append
switch to tell cloc not to wipe out this database but instead add more data:
Now the fun begins--we have a database, code.db, with lots of
information about the five projects and can query it
for all manner of interesting facts.
Which is the longest file over all projects?
sqlite3's default output format leaves a bit to be desired.
We can add an option to the program's rc file, ~/.sqliterc, to show column headers:
One might be tempted to also include
in ~/.sqliterc
but this causes problems when the output has more than
one row since the widths of entries in the first row govern the maximum
width for all subsequent rows. Often this leads to truncated output--not
at all desirable. One option is to write a custom SQLite output
formatter such as sqlite_formatter, included with cloc.
To use it, simply pass sqlite3's STDOUT into sqlite_formatter
via a pipe:
If the "Project File" line doesn't appear, add .header on to your
~/.sqliterc file as explained above.
What is the longest file over all projects?
What is the longest file in each project?
Which files in each project have the most code lines?
Which C source files with more than 300 lines have a comment ratio below 1%?
What are the ten longest files (based on code lines) that have no comments at all? Exclude header, .html, and YAML files.
What are the most popular languages (in terms of lines
of code) in each project?
Custom Column Output ▲
Cloc's default output is a text table with five columns:
language, file count, number of blank lines, number of comment
lines and number of code lines. The switches --by-file,
--3, and --by-percent generate additional information but
sometimes even those are insufficient.
The --sql option described in the previous section offers the
ability to create custom output. This section has a pair of examples
that show how to create custom columns.
The first example includes an extra column, Total, which is the
sum of the numbers of blank, comment, and code lines.
The second shows how to include the language name when running
with --by-file.
Example 1: Add a "Totals" column.
The first step is to run cloc and save the output to a relational database,
SQLite in this case:
(the tar file comes from the
YAML-C++ project).
Second, we craft an SQL query that returns the regular cloc output
plus an extra column for totals, then save the SQL statement to
a file, query_with_totals.sql:
Third, we run this query through SQLite using the counts.db database.
We'll include the -header switch so that SQLite prints the
column names:
The extra column for Total is there but the format is unappealing.
Running the output through sqlite_formatter yields the desired result:
The next section,
Wrapping cloc in other scripts,
shows one way these commands can be combined
into a new utility program.
Example 2: Include a column for "Language" when running with --by-file.
Output from --by-file omits each file's language to save screen real estate;
file paths for large projects can be long and including an extra 20 or so
characters for a Language column can be excessive.
As an example, here are the first few lines of output using the same
code base as in Example 1:
The absence of language identification for each file
is a bit disappointing, but
this can be remedied with a custom column solution.
The first step, creating a database, matches that from Example 1 so
we'll go straight to the second step of creating the desired
SQL query. We'll store this one in the file by_file_with_language.sql:
Our desired extra column appears when we pass this custom SQL query
through our database:
Wrapping cloc in other scripts ▲
More complex code counting solutions are possible by wrapping
cloc in scripts or programs. The "total lines" column from
example 1 of Custom Column Output
could be simplified to a single command with this shell script (on Linux):
Saving the lines above to total_columns.sh and making it
executable (chmod +x total_columns.sh) would let us do
to directly get
Other examples:
- Count code from a specific branch of a web-hosted
git repository and send the results as a .csv email attachment:
https://github.com/dannyloweatx/checkmarx
git and UTF8 pathnames ▲
cloc's --git option may fail if you work with directory or
file names with UTF-8 characters (for example, see issue 457).
The solution, described at
https://stackoverflow.com/questions/22827239/how-to-make-git-properly-display-utf-8-encoded-pathnames-in-the-console-window,
is to apply this git configuration command:
git config --global core.quotepath off
Your console's font will need to be capable of displaying
Unicode characters.
Third Generation Language Scale Factors ▲
cloc versions before 1.50 by default computed, for the provided inputs, a
rough estimate of how many lines of code would be needed to write the
same code in a hypothetical third-generation computer language.
To produce this output one must now use the --3 switch.
Scale factors were derived from the 2006 version of language gearing ratios
listed at the Mayes Consulting web site,
http://softwareestimator.com/IndustryData2.htm, using this equation:
cloc scale factor for language X = 3rd generation default gearing ratio / language X gearing ratio
For example, cloc 3rd generation scale factor for DOS Batch = 80 / 128 = 0.625.
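The arithmetic can be checked with a few lines of Python. Only the two gearing ratios quoted above (80 and 128) come from the text. The function names and the 1000-line input are assumptions for illustration, as is the step of multiplying a line count by the scale factor to get a third-generation equivalent.

```python
# Gearing ratios quoted in the text; only DOS Batch is given there.
THIRD_GEN_DEFAULT_GEARING = 80
GEARING_RATIOS = {"DOS Batch": 128}

def scale_factor(language):
    """3rd generation default gearing ratio / gearing ratio of `language`."""
    return THIRD_GEN_DEFAULT_GEARING / GEARING_RATIOS[language]

def third_gen_equivalent(language, lines_of_code):
    """Assumed use of the factor: scale a physical line count."""
    return lines_of_code * scale_factor(language)

print(scale_factor("DOS Batch"))                # 0.625
print(third_gen_equivalent("DOS Batch", 1000))  # 625.0
```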
The biggest flaw with this approach is that gearing ratios are defined
for logical lines of source code, not physical lines (which cloc counts).
The values in cloc's 'scale' and '3rd gen. equiv.' columns should be
taken with a large grain of salt.
Complex regular subexpression recursion limit ▲
cloc relies on the Regexp::Common module's regular expressions to remove
comments from source code. If comments are malformed, for example if the
/* start comment marker appears in a C program without a corresponding */
marker, the regular expression engine could enter a recursive
loop, eventually triggering the warning
Complex regular subexpression recursion limit.
The most common cause for this warning is the existence of comment markers
in string literals. While language compilers and interpreters are smart
enough to recognize that "/*" (for example) is a string and not a comment,
cloc is fooled. File path globs, as in this line of JavaScript
are frequent culprits.
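To see how a comment marker inside a string misleads a regex-based comment remover, consider the sketch below. It uses a deliberately naive pattern, not the Regexp::Common expressions cloc relies on, and it shows only the miscount, not the Perl recursion warning. The JavaScript snippet is hypothetical.

```python
import re

# Hypothetical JavaScript input: the glob string contains the "/*" marker.
JS_SOURCE = '''var sources = glob("src/*.js");
var total = sources.length;
/* report the result */
console.log(total);
'''

def naive_strip_block_comments(text):
    """Remove everything from the first /* to the next */, even across lines."""
    return re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)

stripped = naive_strip_block_comments(JS_SOURCE)
code_lines = [line for line in stripped.splitlines() if line.strip()]

# The "/*" inside the string is taken as a comment start, so the rest of
# line 1 and all of line 2 are swallowed along with the real comment.
print(len(code_lines))  # 2, although the input has 3 lines of code
```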
In an attempt to overcome this problem, a different
algorithm that removes comment markers in strings can be enabled
with the --strip-str-comments switch. Doing so, however,
has drawbacks: cloc will run more slowly and the output of
--strip-comments will contain strings that no longer match the input source.
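The general idea of neutralizing comment markers inside strings before removing comments can be sketched as follows. This is an illustration of the technique, not cloc's implementation: the string-literal pattern, the function names, and the "xx" placeholder are assumptions, and the JavaScript input is hypothetical.

```python
import re

# Hypothetical JavaScript input: the glob string contains the "/*" marker.
JS_SOURCE = '''var sources = glob("src/*.js");
var total = sources.length;
/* report the result */
console.log(total);
'''

# Matches double- or single-quoted literals, allowing backslash escapes.
STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"' r"|'(?:\\.|[^'\\])*'")

def scrub_markers_in_strings(text, placeholder="xx"):
    """Replace comment markers that occur inside string literals."""
    def scrub(match):
        literal = match.group(0)
        for marker in ("/*", "*/", "//"):
            literal = literal.replace(marker, placeholder)
        return literal
    return STRING_LITERAL.sub(scrub, text)

def strip_block_comments(text):
    """Remove /* ... */ comments, including ones that span lines."""
    return re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)

scrubbed = scrub_markers_in_strings(JS_SOURCE)
code_lines = [line for line in strip_block_comments(scrubbed).splitlines()
              if line.strip()]

print(len(code_lines))  # 3
print(code_lines[0])    # var sources = glob("srcxx.js");
```

The first printed code line shows the second drawback mentioned above: the string "src/*.js" has become "srcxx.js" and no longer matches the input. The extra pass over every string literal also hints at why such an approach is slower. The simple pattern here can itself be misled, for instance by an apostrophe inside a comment.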
Limitations ▲
Identifying comments within source code is trickier than one might expect.
Many languages would need a complete parser to be counted correctly.
cloc does not attempt to parse any of
the languages it aims to count and therefore is an imperfect tool.
The following are known problems:
If you suspect your code has such strings, use the
--strip-str-comments
switch to select the algorithm that removes
embedded comment markers. Its use will render the five lines above as
and therefore return a count of five lines of code.
See the
previous section
for the drawbacks of using --strip-str-comments.
Requesting Support for Additional Languages ▲
If cloc does not recognize a language you are interested in counting,
create a GitHub issue
requesting support for your language. Include this information:
Reporting Problems ▲
If you encounter a problem with cloc, first check to see if
you're running with the latest version of the tool:
If the version is older than the most recent release
at https://github.com/AlDanial/cloc/releases, download the
latest version and see if it solves your problem.
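The comparison itself is easy to script. The sketch below assumes version strings of the form 1.84 or v1.84; the function names are hypothetical, and the latest release number still has to be looked up on the releases page by hand.

```python
def parse_version(text):
    """Turn a version string such as '1.84' or 'v1.84' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))

def is_outdated(installed, latest):
    """True when the installed version is older than the latest release."""
    return parse_version(installed) < parse_version(latest)

# The installed version could be captured with, for example:
#     subprocess.run(["cloc", "--version"], capture_output=True, text=True).stdout
print(is_outdated("1.82", "v1.84"))  # True
print(is_outdated("1.84", "v1.84"))  # False
```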
If the problem happens with the latest release, submit
a new issue at https://github.com/AlDanial/cloc/issues only
if you can supply enough information for anyone reading the
issue report to reproduce the problem.
That means providing
Problem reports that cannot be reproduced will be ignored and
eventually closed.
Acknowledgments ▲
Wolfram Rösler provided most of the code examples in the test suite.
These examples come from his Hello World collection.
Ismet Kursunoglu found errors with the MUMPS counter and provided
access to a computer with a large body of MUMPS code to test cloc.
Tod Huggins gave helpful suggestions for the Visual Basic filters.
Anton Demichev found a flaw with the JSP counter in cloc v0.76
and wrote the XML output generator for the --xml option.
Reuben Thomas pointed out that ISO C99 allows // as a comment
marker, provided code for the --no3 and --stdin-name options
and for counting the m4 language,
and suggested several user-interface enhancements.
Michael Bello provided code for the --opt-match-f, --opt-not-match-f,
--opt-match-d, and --opt-not-match-d options.
Mahboob Hussain inspired the --original-dir and --skip-uniqueness options,
found a bug in the duplicate file detection logic and improved the JSP filter.
Randy Sharo found and fixed an uninitialized variable bug for shell
scripts having only one line.
Steven Baker found and fixed a problem with the YAML output generator.
Greg Toth provided code to improve blank line detection in COBOL.
Joel Oliveira provided code to let --exclude-list-file handle
directory name exclusion.
Blazej Kroll provided code to produce an XSLT file, cloc-diff.xsl,
when producing XML output for the --diff option.
Denis Silakov enhanced the code which generates cloc.xsl when
using --by-file and --by-file-by-lang options, and
provided an XSL file that works with --diff output.
Andy (awalshe@sf.net) provided code to fix several bugs:
correct output of --counted
so that only files that are used in the code count appear and
that results are shown by language rather than file name;
allow --diff output from multiple runs to be summed
together with --sum-reports.
Jari Aalto created the initial version of cloc.1.pod and
maintains the Debian package for cloc.
Mikkel Christiansen (mikkels@gmail.com) provided counter definitions
for Clojure and ClojureScript.
Vera Djuraskovic from Webhostinggeeks.com provided the Serbo-Croatian translation.
Gill Ajoft of Ajoft Softwares provided the Bulgarian translation.
The Knowledge Team provided the Slovakian translation.
Erik Gooven Arellano Casillas provided an update to the MXML counter to
recognize Actionscript comments.
Gianluca Casati created the cloc CPAN package.
Mary Stefanova provided the Polish translation.
Ryan Lindeman implemented the --by-percent feature.
Kent C. Dodds, @kentcdodds, created and maintains the npm package of cloc.
Viktoria Parnak provided the Ukrainian translation.
Natalie Harmann provided the Belarussian translation.
Nithyal at Healthcare Administration Portal provided the Tamil translation.
Patricia Motosan provided the Romanian translation.
The Garcinia Cambogia Review Team provided the Arabic translation.
Gajk Melikyan provided the Armenian translation for http://studybay.com.
Hungarian translation courtesy of Zsolt Boros.
Sietse Snel implemented the parallel
processing capability available with the --processes=N switch.
The development of cloc was partially funded by the Northrop Grumman
Corporation.
Copyright ▲
Copyright (c) 2006-2018, Al Danial