Description

I've recently come into a state of being where my thoughts can be jumbled and I am constantly forgetting things. I have this page here to deal with the consequences of these newfound afflictions. Let's hope I don't forget about this page, though.
Please note that if the writing seems raw, it is. I'm trying to get the thoughts down in writing.

The thoughts

Odds and Ends

  1. Software engineering is simply a game by which players attempt to find the best mapping between some real-world thing and a programming language; preferably a bijective one.

Ranking Text for Relevance vis-a-vis Newman-Girvan Modularity

Log:

11/7/2011: Over the summer of 2011 I participated in some research to develop algorithms that use node attributes, in addition to link information, to find community structure in networks. Like any good research, it gets you thinking about the subject matter outside of your normal allocation of time to it; for me, that is usually while doing everyday tasks. Given the community structure tilt, I read a lot of Mark Newman, specifically regarding modularity, a metric of community structure that he developed with Michelle Girvan. I began thinking about how modularity compares the expected occurrence of edges with the observed occurrence, but a teensy bit more generally, which then led me to its applicability to text relevance ranking.
 
1/18/2012: Continuing what I didn't finish above: It seems like the idea that the difference between expected and observed indicates SIGNIFICANCE can be applied in other arenas. As a frequenter of www.google.com, I often find ideas about searching for texts by keyword coming to mind, and of course one day those thoughts met my more general expected-vs-observed thoughts on Newman-Girvan modularity. A light bulb turned on.
This modularity can be applied to searching for documents in a body of documents. I guess we just call this "search" now, given the prevalence of search engines. Anyway, the frequency of each word can be calculated both on a per-document basis and over the entire corpus. The former would act as the observed frequency, and the latter as the expected (that took me a little bit of time to figure out). Using both, we can calculate a modularity for each word per document. These per-word modularities will somehow be used as the building blocks for a rank of each document for a given search phrase: some sort of aggregation of the modularities of the words in the search phrase.
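To get this idea down in something runnable, here is a minimal Python sketch. It is only one possible reading of the note above. It assumes whitespace tokenization, relative frequencies, "modularity" of a word taken as observed minus expected frequency, and a plain sum as the aggregation over the search phrase. None of those choices are settled here, and the function names are made up for illustration.

```python
from collections import Counter


def word_frequencies(tokens):
    """Relative frequency of each word in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def rank_documents(docs, query):
    """Rank documents for a query by summed (observed - expected) frequency.

    docs:  dict mapping a document name to its text.
    query: search phrase as a single string.
    Returns a list of (name, score) pairs, best score first.
    """
    # Assumption: lowercase + whitespace split is good enough as a tokenizer.
    tokenized = {name: text.lower().split() for name, text in docs.items()}

    # Expected frequency of each word: its frequency over the entire corpus.
    corpus_tokens = [tok for toks in tokenized.values() for tok in toks]
    expected = word_frequencies(corpus_tokens)

    query_words = query.lower().split()
    scores = {}
    for name, toks in tokenized.items():
        # Observed frequency of each word: its frequency within this document.
        observed = word_frequencies(toks)
        # Assumption: aggregate the per-word values with a plain sum.
        scores[name] = sum(
            observed.get(word, 0.0) - expected.get(word, 0.0)
            for word in query_words
        )
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "build-notes": "maven maven build",
        "graph-notes": "graph community structure",
    }
    for name, score in rank_documents(docs, "maven"):
        print(name, round(score, 3))
```

A word that shows up in a document more often than the corpus would predict gets a positive value, and one that shows up less often gets a negative value, which is the expected-vs-observed comparison carried over from edges to words.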

Branching/Versioning Strategies

Log:

08/27/2011: I've been thinking about the correct way to handle branching and versioning of software from the perspective of using Apache Maven. This could have been sparked by a recent incident at GSI where we realized that the way we have gone about branching, and the inherent merging of trunk changes, is a little flawed.

Principles

Listed here are some principles (some might say 'commandments') that seem to 'make sense' to me, or at least have some sort of good structure.
  1. Versioning format should be a tri-numbering system; A.B.C for instance. We'll call numbers in the A-slot a major version, in the B-slot a minor version, and in the C-slot a patch version. As such, incrementing any of A, B, or C will be referred to as a major, minor, or patch release, respectively.
  2. Branching should be done lazily. This has to be the case when branching for projects, but a branch for a patch should never be made until a defect fix needs to go into the source, and likewise for a minor release branch and small changes.
  3. Branches that are created for patch releases or minor releases should only be made from tags, never in-development snapshots. This wasn't exactly plain-as-day to me at first, so I figured I'd mention it.
  4. When a defect is found, it should be traced back as far as possible, or at least, as far back as required/desired. Once a solution is determined, it should be committed to that "far-back place" and then merged to every current patch/minor release branch.
  5. Branches that are being cut for projects should have a branch name in all lowercase with words being separated by hyphens. The name should describe the project. Its version should be 'X-branch-name-SNAPSHOT' where X is the planned release version. For example, a project that is aiming for version 4.0.0 and is tasked with integrating Apache Tiles into the project could have a branch name of 'tiles-integration' and a version number of 4-tiles-integration-SNAPSHOT. Note the non-specification of rightmost zeros.
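
The numbering and naming rules in principles 1 and 5 can be sketched in a few lines of Python. This is just a rough illustration, not anything Maven provides; the function names are hypothetical. Two things here are assumptions that the principles don't actually state: that a major or minor release resets the lower slots to zero, and that digits are allowed inside a branch-name word.

```python
import re


def parse_version(version):
    """Split an 'A.B.C' string into (major, minor, patch) integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version, release):
    """Return the next version for a 'major', 'minor', or 'patch' release."""
    major, minor, patch = parse_version(version)
    # Assumption: slots to the right of the incremented one reset to zero.
    if release == "major":
        return f"{major + 1}.0.0"
    if release == "minor":
        return f"{major}.{minor + 1}.0"
    if release == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release type: {release}")


def trim_rightmost_zeros(version):
    """Drop trailing '.0' parts, always keeping at least the major number."""
    parts = version.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return ".".join(parts)


def project_branch_version(planned_version, branch_name):
    """Build 'X-branch-name-SNAPSHOT' for a project branch."""
    # Lowercase words separated by hyphens (digits allowed as an assumption).
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", branch_name):
        raise ValueError(f"bad branch name: {branch_name}")
    return f"{trim_rightmost_zeros(planned_version)}-{branch_name}-SNAPSHOT"


if __name__ == "__main__":
    print(bump("1.4.2", "minor"))
    print(project_branch_version("2.1.0", "search-rewrite"))
```

With a hypothetical branch 'search-rewrite' planned for release 2.1.0, the last call builds the version 2.1-search-rewrite-SNAPSHOT, and a name like 'Search_Rewrite' is rejected.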