Description
I've recently come into a state of being where my thoughts can be jumbled and I am
constantly forgetting things. I have this page here to deal with the consequences of these
newfound afflictions. Let's hope I don't forget about this page, though.
Please note that if the writing seems raw, it is. I'm trying to get the thoughts down in
writing.
The thoughts
Odds and Ends
-
Software engineering is simply a game in which players attempt to find the best
mapping between some real-world thing and a programming language, preferably a
bijective one.
Ranking Text for Relevance vis-a-vis Newman-Girvan Modularity
Log:
11/7/2011: Over the summer of 2011 I participated in some research to develop
algorithms that use node attributes in addition to link information to find community
structure in networks. Like any good research, it got me thinking about the subject matter
outside of my normal allocation of time to it, usually while doing everyday tasks.
Given the community structure tilt, I read a lot of Mark Newman,
specifically regarding a particular metric of community structure that he developed with
Michelle Girvan: modularity. I began thinking about how modularity
compares the expected occurrence of edges with the observed occurrence, but a teensy bit more
generally, which then led me to its applicability to text relevance-ranking.
1/18/2012: Continuing what I didn't finish above: It seems like the idea about the
difference between expected and observed and its indication of SIGNIFICANCE can be applied
in other arenas. As a frequenter of www.google.com, I often find ideas about searching for
texts by keywords coming to mind, and of course one day those thoughts and my more
general expected-vs-observed thoughts on Newman-Girvan met. A light bulb turned on.
This modularity can be applied to searching for documents in a body of documents. I guess we
just call this "search" now, given the prevalence of search engines. Anyway, the
frequency of each word can be calculated both on a per-document basis and over the entire
corpus. The former would act as the observed frequency, and the latter as the expected (that
took me a little bit of time to figure out). Using both, we can calculate a modularity for
each word per document, presumably as the observed frequency minus the expected one, much as
modularity does for edges. These would be the building blocks for a rank of each
document for a given search phrase: some sort of aggregation of the modularities of the
words in the search phrase.
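To make the idea concrete, here is a minimal sketch in Python. Everything specific in it is my assumption rather than a settled design: the per-word score is simply observed relative frequency minus expected relative frequency, the aggregation over the search phrase is a plain sum, tokenization is lowercase whitespace splitting, and the names `relative_frequencies` and `rank_documents` are made up for illustration.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of the token list (empty list -> empty map)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def rank_documents(documents, query):
    """Rank document ids by the summed (observed - expected) score of the query words.

    documents: dict mapping a document id to its text.
    Returns a list of (doc_id, score) pairs, best match first.
    """
    # Assumption: lowercase whitespace splitting is good enough for a sketch.
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}

    # Expected frequency: each word's share of the whole corpus.
    corpus_tokens = [tok for toks in tokenized.values() for tok in toks]
    expected = relative_frequencies(corpus_tokens)

    query_words = query.lower().split()
    scores = {}
    for doc_id, toks in tokenized.items():
        # Observed frequency: each word's share of this one document.
        observed = relative_frequencies(toks)
        # Assumption: aggregate the per-word scores with a plain sum.
        scores[doc_id] = sum(
            observed.get(word, 0.0) - expected.get(word, 0.0) for word in query_words
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "a": "maven branch maven release",
    "b": "network community structure",
    "c": "maven network graph",
}
ranking = rank_documents(docs, "maven")  # order of ids: a, then c, then b
```

A document that uses a query word more often than the corpus as a whole gets a positive score, and one that uses it less often (or not at all) gets a negative one. Whether a sum is the right aggregation, or whether the score should be normalized somehow, is still open.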
Branching/Versioning Strategies
Log:
08/27/2011: I've been thinking about the correct way to handle branching and
versioning of software from the perspective of using Apache Maven. This could have been
sparked by a recent incident at GSI where we realized that the way we have gone about branching,
and the inherent merging of trunk changes, is a little flawed.
Principles
Some principles, some might say 'commandments', that seem like they 'make sense', or at
least have some sort of good structure to me are listed here.
-
Versioning format should be a tri-numbering system; A.B.C for instance. We'll call
numbers in the A-slot a major version, in the B-slot a minor version, and in the
C-slot, a patch version. As such, incrementing A, B, or C will be referred to as
a major, minor, or patch release, respectively.
-
This is simply because development, it seems (and here you may see my
inexperience come into play), breaks down into large and/or backwards-incompatible
changes, small and backwards-compatible changes, and
backwards-compatible bug fixes.
-
With this in mind, we respectively map the version numbers A, B, and C above to
the three previously mentioned containers. The result is that releases differing only in
patch version may differ only by backwards-compatible defect fixes, releases differing
in minor version may differ by at most small, backwards-compatible
changes, and releases differing in major version may differ by large
and/or backwards-incompatible changes; i.e. any change. Note too that if you are
changing a version number, then the release should contain a change
that warrants the change in the version number.
That being said, one could put a limit on the number of small changes that can
go into a minor release, demanding that a major release occur at that limit.
There probably should be no limit to defect fixes.
-
Unfortunately this assumes that we have a definition of large/small changes and
backwards compatibility. This is for you to decide, really, but nonetheless I
like to think of large changes as new features and small ones as enhancements to
current features with shades of gray. I have no real definition for backwards
compatible changes, other than the stupid/obvious 'it shouldn't disallow other
software from using it upon upgrade'. The definition of backwards compatibility
seems like it could vary highly by language choice, although I have yet to
confirm this for myself.
-
Branching should be done lazily. This has to be the case when branching for
projects, but a branch for a patch should never be made until a defect fix needs to
go into the source, and likewise for a minor release branch and small changes.
-
Branches that are created for patch releases or minor releases should only be made
from tags, never in-development snapshots. This wasn't exactly plain-as-day to me at
first, so I figured I'd mention it.
-
The branch naming convention and version naming convention examples follow.
Branching release A.B.C for patches should result in an SCM branch with the name
"A.B.x" and the version "A.B.(C+1)-SNAPSHOT". Likewise, branches for
minor releases should have the name "A.x" and the version "A.(B+1)-SNAPSHOT".
-
In general, versions of the form X1.X2.X3-SNAPSHOT are the development versions
targeting an X1.X2.X3 release, with the understanding that the rightmost
zeros of the version are left off. Hence the trunk's version should never
contain any version number other than the next major release
number, e.g. 10-SNAPSHOT.
-
When a defect is found, it should be traced back as far as possible, or at least, as
far back as required/desired. Once a solution is determined, it should be committed
to that "far-back place" and then merged to every current patch/minor release
branch.
-
Branches that are being cut for projects should have a branch name in all lowercase
with words being separated by hyphens. The name should respect the project,
descriptively. Its version should be 'X-branch-name-SNAPSHOT' where X is the planned
release version. For example, a project that is aiming for version 4.0.0 and is
tasked with integrating Apache Tiles into the project could have a branch name of
'tiles-integration' and a version number of 4-tiles-integration-SNAPSHOT. Note the
non-specification of rightmost zeros, like above.
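The naming conventions above are mechanical enough to sketch in code. The following Python is only an illustration under a few assumptions of mine: bumping a slot resets the slots to its right to zero (the principles don't say this explicitly), tags are plain dotted integers like "1.2.3", and all the names here (`Version`, `parse`, `patch_branch`, `minor_branch`, `project_branch`) are hypothetical helpers, not anything Maven or an SCM provides.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A tri-number version A.B.C; missing slots default to zero."""
    major: int
    minor: int = 0
    patch: int = 0

    def bump(self, kind):
        """Return the next major, minor, or patch version.

        Assumption: slots to the right of the bumped slot reset to zero.
        """
        if kind == "major":
            return Version(self.major + 1, 0, 0)
        if kind == "minor":
            return Version(self.major, self.minor + 1, 0)
        if kind == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown release kind: {kind!r}")

    def short(self):
        """Render with the rightmost zeros left off, e.g. 4.0.0 -> '4'."""
        parts = [self.major, self.minor, self.patch]
        while len(parts) > 1 and parts[-1] == 0:
            parts.pop()
        return ".".join(str(p) for p in parts)


def parse(text):
    """Parse 'A', 'A.B', or 'A.B.C' into a Version."""
    return Version(*(int(part) for part in text.split(".")))


def patch_branch(tag):
    """Branch name and development version for patching release A.B.C."""
    v = parse(tag)
    return f"{v.major}.{v.minor}.x", f"{v.bump('patch').short()}-SNAPSHOT"


def minor_branch(tag):
    """Branch name and development version for the next minor release."""
    v = parse(tag)
    return f"{v.major}.x", f"{v.bump('minor').short()}-SNAPSHOT"


def project_branch(target, description):
    """Lowercase, hyphen-separated branch name plus its SNAPSHOT version."""
    name = "-".join(description.lower().split())
    return name, f"{parse(target).short()}-{name}-SNAPSHOT"


patch_name, patch_version = patch_branch("1.2.3")
minor_name, minor_version = minor_branch("1.2.3")
trunk_version = f"{parse('9.4.2').bump('major').short()}-SNAPSHOT"
```

Note that this only derives names. It does nothing to enforce that patch and minor branches are cut from tags rather than snapshots, which would need a check against the SCM itself.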