Description

I've recently come into a state of being where my thoughts can be jumbled and I am constantly forgetting things. I have this page here to deal with the consequences of these newfound afflictions. Let's hope I don't forget about this page, though.
Please note that if the writing seems raw, it is. I'm trying to get the thoughts down in writing.

The thoughts

Odds and Ends

Software engineering is simply a game in which players attempt to find the best mapping between some real-world thing and a programming language, preferably a bijective one.

Ranking Text for Relevance vis-a-vis Newman-Girvan Modularity

Log:

11/7/2011: Over the summer of 2011 I participated in some research to develop algorithms that use node attributes, in addition to link information, to find community structure in networks. Like any good research, it gets you thinking about the subject matter outside of the time you normally allocate to it; for me, that is usually while doing everyday tasks. Given the community structure tilt, I read a lot of Mark Newman, specifically regarding modularity, a metric of community structure that he developed with Michelle Girvan. I began thinking about how modularity compares the expected occurrence of edges with the observed occurrence, but a teensy bit more generally, which then led me to its applicability to text relevance-ranking.

1/18/2012: Continuing what I didn't finish above: it seems like the idea that the difference between expected and observed indicates SIGNIFICANCE can be applied in other arenas. Being a frequenter of www.google.com, the idea of searching for texts by keywords often comes to mind, and of course one day those thoughts met my more general expected-vs-observed thoughts on Newman-Girvan. A light bulb turned on.
This modularity idea can be applied to searching for documents in a body of documents. I guess we just call this "search" now, given the prevalence of search engines. Anyway, the frequency of each word can be calculated both per document and over the entire corpus. The former would act as the observed frequency, and the latter as the expected (that took me a little bit of time to figure out). Using both we can calculate a modularity-like score for each word per document, for instance the observed frequency minus the expected one. These scores would be the building blocks for ranking each document against a given search phrase, through some yet-to-be-decided aggregation of the scores of the words in the phrase.
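A minimal sketch of the idea in Python, under some loud assumptions that are not settled above: the per-word score is taken to be the observed relative frequency minus the expected one, the aggregation over the search phrase is a plain sum, and tokenizing is a lowercase split on whitespace. The names `relative_frequencies` and `rank_documents` are made up for illustration.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of the tokens (empty input gives an empty map)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def rank_documents(documents, query):
    """Return (doc_id, score) pairs, best first.

    documents: dict mapping a document id to its text.
    The score is the sum over query words of (observed - expected) frequency.
    Both the difference and the sum are assumptions of this sketch.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}

    # Expected frequency: each word's share of the entire corpus.
    corpus_tokens = [t for tokens in tokenized.values() for t in tokens]
    expected = relative_frequencies(corpus_tokens)

    query_words = query.lower().split()
    scores = {}
    for doc_id, tokens in tokenized.items():
        # Observed frequency: each word's share of this one document.
        observed = relative_frequencies(tokens)
        scores[doc_id] = sum(
            observed.get(word, 0.0) - expected.get(word, 0.0) for word in query_words
        )
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

A document that uses a query word more often than the corpus as a whole gets a positive contribution from that word, and one that uses it less often gets a negative contribution, which is the expected-vs-observed comparison carried over from modularity.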

Branching/Versioning Strategies

Log:

08/27/2011: I've been thinking about the correct way to handle branching and versioning of software from the perspective of using Apache Maven. This may have been sparked by a recent incident at GSI, where we realized that the way we have gone about branching, and the merging of trunk changes that comes with it, is a little flawed.

Principles

Listed here are some principles, some might say 'commandments', that 'make sense' to me, or at least seem to have some sort of good structure.

Versioning format should be a tri-numbering system, A.B.C for instance. We'll call numbers in the A-slot a major version, in the B-slot a minor version, and in the C-slot a patch version. As such, incrementing A, B, or C will be referred to as a major, minor, or patch release, respectively.

This is simply because development, it seems (and here you may see my inexperience come into play), breaks down into large and/or backwards-incompatible changes, small and backwards-compatible changes, and backwards-compatible bug fixes.
With this in mind, we respectively map the version numbers A, B, and C above to these three containers. The result is that two versions differing only in the patch number may differ by nothing more than backwards-compatible defect fixes, two versions differing in the minor number may differ by at most small and backwards-compatible changes, and two versions differing in the major number may differ by large and/or backwards-incompatible changes, i.e. any change. Note too that if you are changing a version number, then the release should contain a change that justifies the change in that number.
That being said, one could put a limit on the number of small changes that can go into a minor release, demanding that a major release occur at that limit. There probably should be no limit to defect fixes.
Unfortunately this assumes that we have a definition of large/small changes and of backwards compatibility. This is for you to decide, really, but nonetheless I like to think of large changes as new features and small ones as enhancements to current features, with shades of gray in between. I have no real definition for backwards-compatible changes, other than the stupid/obvious 'it shouldn't prevent other software from using it upon upgrade'. The definition of backwards compatibility seems like it could vary highly by language choice, although I have yet to confirm this to myself.
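The A.B.C scheme above can be sketched in Python as follows. The helper names (`parse_version`, `bump`, `release_type`) are hypothetical, and resetting the lower slots to zero on a major or minor bump is an assumption of the sketch, since the notes above do not spell it out.

```python
SLOTS = ("major", "minor", "patch")


def parse_version(text):
    """Parse 'A', 'A.B' or 'A.B.C' into (A, B, C); missing rightmost slots count as 0."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected at most three numbers, got {text!r}")
    return tuple(parts + [0] * (3 - len(parts)))


def bump(version, release):
    """Return the version that a major, minor, or patch release produces."""
    major, minor, patch = parse_version(version)
    if release == "major":
        return f"{major + 1}.0.0"  # assumption: lower slots reset to zero
    if release == "minor":
        return f"{major}.{minor + 1}.0"  # assumption: patch slot resets to zero
    if release == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release type {release!r}")


def release_type(old, new):
    """Name the leftmost slot in which two versions differ, or None if they match."""
    for slot, a, b in zip(SLOTS, parse_version(old), parse_version(new)):
        if a != b:
            return slot
    return None
```

Under the principles above, the result of `release_type` bounds what may have changed between two versions: 'patch' allows only backwards-compatible defect fixes, 'minor' also allows small backwards-compatible changes, and 'major' allows anything.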

Branching should be done lazily. This has to be the case when branching for projects, but a branch for a patch should never be made until a defect fix needs to go into the source, and likewise for a minor release branch and small changes.
Branches that are created for patch releases or minor releases should only be made from tags, never in-development snapshots. This wasn't exactly plain-as-day to me at first, so I figured I'd mention it.
- Examples of the branch naming and version naming conventions follow. Branching release A.B.C for patches should result in an SCM branch named "A.B.x" with version "A.B.(C+1)-SNAPSHOT". Likewise, a branch for minor releases should be named "A.x" and have version "A.(B+1)-SNAPSHOT".
- In general, a version of X1.X2.X3-SNAPSHOT is the development line targeting a release of version X1.X2.X3, with the understanding that the rightmost zeros of the version are left off. Hence the only version number that should ever appear in the trunk's version is the next major release number, e.g. 10-SNAPSHOT.
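The two naming conventions above can be written down as a small Python sketch. The function names are hypothetical, and `trunk_version` assumes the "next major release" is simply one more than the major number of the latest tag.

```python
def _slots(tag):
    """Split a release tag 'A.B.C' into three ints."""
    a, b, c = (int(p) for p in tag.split("."))
    return a, b, c


def patch_branch(tag):
    """Branch name and Maven version for a patch branch cut from tag A.B.C."""
    a, b, c = _slots(tag)
    return f"{a}.{b}.x", f"{a}.{b}.{c + 1}-SNAPSHOT"


def minor_branch(tag):
    """Branch name and Maven version for a minor-release branch cut from tag A.B.C."""
    a, b, _ = _slots(tag)
    # The target is A.(B+1).0; the rightmost zero is left off.
    return f"{a}.x", f"{a}.{b + 1}-SNAPSHOT"


def trunk_version(latest_tag):
    """Trunk only ever carries the next major release number (an assumption on 'next')."""
    a, _, _ = _slots(latest_tag)
    return f"{a + 1}-SNAPSHOT"
```

For a tag of 2.3.1 this gives a patch branch "2.3.x" at version "2.3.2-SNAPSHOT" and a minor branch "2.x" at version "2.4-SNAPSHOT".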
When a defect is found, it should be traced back as far as possible, or at least, as far back as required/desired. Once a solution is determined, it should be committed to that "far-back place" and then merged to every current patch/minor release branch.
Branches that are being cut for projects should have a branch name in all lowercase, with words separated by hyphens. The name should describe the project. Its version should be 'X-branch-name-SNAPSHOT', where X is the planned release version. For example, a project that is aiming for version 4.0.0 and is tasked with integrating Apache Tiles into the project could have a branch name of 'tiles-integration' and a version of 4-tiles-integration-SNAPSHOT. Note the non-specification of rightmost zeros, like above.
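A rough Python sketch of the project-branch convention, assuming the branch name is derived from a free-text project description by lowercasing it and replacing every run of non-alphanumeric characters with a hyphen. That derivation, and the name `project_branch`, are illustrative assumptions rather than part of the convention itself.

```python
import re


def project_branch(description, target_version):
    """Return (branch name, Maven version) for a project branch.

    description: free-text project name, e.g. "Search Rewrite" (hypothetical).
    target_version: the planned release version, e.g. "3.1.0".
    """
    # Lowercase, hyphen-separated words.
    name = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")

    # Leave off rightmost zeros, but always keep the major number.
    parts = target_version.split(".")
    while len(parts) > 1 and int(parts[-1]) == 0:
        parts.pop()
    planned = ".".join(parts)

    return name, f"{planned}-{name}-SNAPSHOT"
```

With a hypothetical project "Search Rewrite" planned for 3.1.0, this yields the branch name "search-rewrite" and the version "3.1-search-rewrite-SNAPSHOT".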