Search results for category: foss
Language learning on FOSS vs OSX
My primary desktop is OSX and for the most part, it suits my purpose well. 99% of the time, everything just works, there are few hassles and things behave the way you expect them to. The Ubuntu releases are close but not quite there yet for me.
As great a desktop as OSX is though, it only shines in the most commonly used functions. In many niche areas, FOSS can provide superior solutions by virtue of the huge variety of applications, developers and freedom to develop on it. One such area that comes to mind these days is language learning.
I'm barely able to read in Chinese and trying to work my way through a website or document is pretty much impossible without a dictionary. I particularly need a dictionary that can immediately translate words I highlight on screen. On OSX, the options are pretty limited. Most solutions are shareware or paid software. Right now I'm using TranslateIt! which works fairly well. I highlight the word(s) I want translated, hit a keyboard combination and TranslateIt! pops up, hopefully with the translations. It is one extra step though. The paid version for TranslateIt! includes the functionality to get immediate translation after I highlight it.
My colleagues on FOSS desktops don't need to pay for this. StarDict comes with most distributions and does the highlight/translate thing right out of the box. It's invaluable in a mixed Chinese/English environment like Exoweb. Besides the software, the power of FOSS shows up in StarDict's dictionaries, which are varied and extremely useful. So much so that TranslateIt! actually uses StarDict dictionaries and all the translations I am using are StarDict's. Without StarDict, TranslateIt! would actually be useless to me.
My favorite StarDict dictionaries include (all zh_CN -> en):
- cedict-gb dictionary (has pinyin and tone marks. Must have)
- langdao-ce-gb (a much larger vocabulary but translations sometimes not precise. No pinyin)
- Chinese idiom dictionary (a dictionary translating Chinese idioms. Unfortunately the translations are Chinese->Chinese and some colleagues have said its translations are suspect)
Two are under the GPL while the cedict library is under its own license similar to cc-by-nc.
While I'm sticking to TranslateIt! until such time as StarDict works natively under OSX, I would simply be unable to read anything at all in Chinese without StarDict's dictionaries.
What Happens When You Turn Fsync Off On Postgres
We use the PostgreSQL database extensively to handle a fairly large amount of data. Our largest single database is over 25G in size, with a fair amount of transactions going through it daily. As such, we've had to do a lot of optimization over time. One of our experiments was turning fsync off on one of our non-critical databases. In retrospect, this probably was not that great an idea ...
This database was a non-critical but fairly write intensive database. It logged a lot of information, largely in the form of inserts. Inserts in postgres can be a bit slow sometimes since insertions tend to lock the same section of the index until the insert is complete, forcing all inserts to go in sequence. Updates are usually a lot faster if you're updating different rows since they don't all rely on the same section of the index and can often be done simultaneously.
The fsync option slows this down even further, since postgres then waits for the data to be flushed to disk successfully before continuing on with the next operation. Not a problem for low traffic databases, but if you attempt to insert hundreds of transactions a second, the milliseconds spent waiting for the disk to write the data completely really hurt. fsync ensures data integrity, especially in the case of unexpected power failure, but at the price of speed.
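To get a feel for the cost described above, here is a rough sketch that appends small records to a plain file, once forcing every record to disk with os.fsync and once leaving it to the OS. It is only an analogy at the file level, not how postgres itself writes data; the function names, file names and record count are made up for illustration.

```python
import os
import tempfile
import time


def write_records(path, count, sync_each):
    """Append `count` small records; optionally force each one to disk."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for i in range(count):
            f.write(b"log entry %d\n" % i)
            if sync_each:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the disk
    return time.perf_counter() - start


def compare(count=200):
    """Return (seconds_with_fsync, seconds_without) for `count` records."""
    with tempfile.TemporaryDirectory() as d:
        synced = write_records(os.path.join(d, "synced.log"), count, True)
        unsynced = write_records(os.path.join(d, "unsynced.log"), count, False)
    return synced, unsynced


if __name__ == "__main__":
    synced, unsynced = compare()
    print("fsync per record: %.4fs, no fsync: %.4fs" % (synced, unsynced))
```

How big the gap is depends entirely on the disk and filesystem, so treat the numbers as indicative only.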
Since this was a non-critical database and losing data wasn't really a problem (we could either recreate it or live without it), we turned fsync off on this database. All went well for months, until we actually did suffer a power failure. During the busiest period possible. Good old Murphy.
At any rate, once we brought everything back up, things seemed to work as usual ... for about 30 minutes. Then we realized our servers were frequently losing connection to this particular database. Investigations revealed that the postgres processes were terminating themselves with messages like "Error: out of memory" or complaints about data inconsistency. Yep, we got our first corrupted Postgres database. The first one I've encountered in over 7 years of using this database.
I have to admit, I had very little clue on how to recover a corrupted database and each database was corrupted slightly differently. Initially it appeared only the indexes were damaged and a reindex removed most of the problems. Later we found that there was some damage to the tables themselves (took a long time to find that) and we attempted to restore through a backup. The Write Ahead Log (WAL) backup proved to be useless. Those were corrupted or inconsistent. Strangely enough, the database could still do a pg_dump, so we just dumped out all the data and reloaded it back in the database. This ultimately fixed everything.
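For reference, the dump-and-reload step can be sketched roughly as below. This assumes a plain SQL dump loaded into a freshly created database; the database names and the dump path are placeholders, not what we actually used, and the helper functions are hypothetical.

```python
import subprocess


def dump_and_reload_commands(old_db, new_db, dump_file):
    """Build the commands for a plain SQL dump of old_db and a reload into new_db."""
    return [
        ["pg_dump", "-f", dump_file, old_db],     # write a plain SQL dump
        ["createdb", new_db],                     # create an empty database
        ["psql", "-d", new_db, "-f", dump_file],  # replay the dump into it
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "logs", "logs_restored" and the dump path are placeholder names.
    for cmd in dump_and_reload_commands("logs", "logs_restored", "/tmp/logs.sql"):
        print(" ".join(cmd))
```

Printing the commands first and only then passing them to run_all makes it easy to check them before touching a damaged database.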
Moral of the story - don't turn fsync off unless you really know what you're doing, including how to detect database corruption and fix it. Our biggest problem was that postgres, unlike MySQL, does not scream "Table/database corruption!" immediately. It took us a while to determine what the problem was. Then again, unless you turn off fsync, it is probably something that almost never happens on postgres. I've had tons of corrupted MySQL databases. This is my first corrupted postgres database.
The Writing On The Wall ...
Just a quickie on a thought that struck me while responding to a friend's email - too many students are not aware of the quiet FOSS wave that is happening under the surface, particularly here in China. They are still focusing their learning efforts on Java, C# and Windows without a thought to picking up basic FOSS skills.
Yet if you pay attention, you can see the trend. From today's news, HP made $25 million in sales directly related to Debian GNU/Linux. This is just one Linux distribution mind you, and one with no commercial company or advertising budget behind it. Exoweb is slightly responsible for that as we (and our clients) bought more than $15k of servers from them last year. :)
In our daily work integrating with third party providers, we are seeing more and more FOSS based products. Not all are fully FOSS based - many are WIMP (Windows, IIS, MySQL and PHP). This FOSS trend is especially noticeable from startup companies or companies with limited legacy infrastructure. We love this of course, since they are typically easier for us to integrate, host and maintain. Yes, we (or our clients) do pay a fair amount for these products so they are making a lot of money while being FOSS based.
My point? For anyone who really wants to be in the tech industry, FOSS skills, no matter how basic, will give you a valuable advantage. Otherwise you will be locking yourself out of a rapidly growing portion of the industry. Besides, as a craftsman, wouldn't you want to know every tool available? Otherwise how can you use the right tool for the job?
Circular Dependencies When Upgrading Debian Testing (Etch)
With an office of 30+ users who run debian testing on their desktops, it's not a big surprise that any problems with debian testing can really come and bite us. Recently, a few developers who had been particularly slow with their upgrading hit a really bad circular dependency bug that basically stopped their upgrade dead in the water and prevented them from going any further. The bug in question is the initramfs-tools, kernel 2.6 and udev circular dependency.
The main problem is that udev requires a _running_ >= kernel 2.6.12 (soon to be >= kernel 2.6.15) to even be installed. It is not enough that you are just about to install the kernel. You must be running the latest kernel, which means the kernel must already be installed. The kernels on the other hand, depend on initramfs-tools .. which depends on udev. So udev will not install until you are running a kernel >= 2.6.12 but you cannot install those kernels unless udev is installed ... ouch.
Those who upgraded frequently enough hit that sweet spot when the latest debian kernel was 2.6.12 but did not require udev, so it could all be installed just fine. It did require a reboot after installing the kernel to install udev, as documented in the notes, but it was possible to continue. Those who took too long, or fixed their kernel to a particular version for various reasons eventually hit this bug when they did upgrade.
In the end, a few of the developers that were not quite so familiar with debian ended up reinstalling their system from scratch (debian testing install CD drops a >= 2.6.12 kernel in right away, avoiding the problem). There is a way to break this circular dependency without reinstalling though.
The 2.6.15 kernel (and possibly earlier versions as well. Did not check) does not absolutely require initramfs-tools. It is only the default option. Running dpkg -I on a kernel package shows:
Package: linux-image-2.6.15-1-k7
Version: 2.6.15-4
Section: base
Priority: optional
Architecture: i386
Depends: module-init-tools (>= 0.9.13), initramfs-tools | yaird | linux-initramfs-tool
linux-initramfs-tool is a virtual package, so useless for us there. However, yaird is also an acceptable dependency. The solution then is to install yaird first, removing initramfs-tools, then install the rest of the mess (linux-image-2.6.15-1-x, udev).
Users of kernels that are too old may still be out of luck though, as hints given in the debian bug report suggest that even yaird requires a not too old 2.6 kernel.
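The loop, and why yaird breaks it, can be sketched as a small dependency graph. This is a simplified model, not how apt actually resolves anything: "running-kernel" is a made-up node standing in for udev's requirement that a new enough kernel is already booted, and the sketch assumes yaird does not itself pull udev back in.

```python
def find_cycle(deps, start):
    """Depth-first search; return a dependency cycle reachable from start, or None."""
    path = []
    done = set()

    def visit(node):
        if node in path:
            # Found a node already on the current path: that is the cycle.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        for dep in deps.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    return visit(start)


# Simplified graph with the default initramfs-tools choice.
with_initramfs_tools = {
    "udev": ["running-kernel"],
    "running-kernel": ["linux-image"],
    "linux-image": ["module-init-tools", "initramfs-tools"],
    "initramfs-tools": ["udev"],
}

# Same graph, but satisfying the kernel's initramfs dependency with yaird.
with_yaird = dict(with_initramfs_tools)
with_yaird["linux-image"] = ["module-init-tools", "yaird"]

if __name__ == "__main__":
    print(find_cycle(with_initramfs_tools, "udev"))
    print(find_cycle(with_yaird, "udev"))
```

With the default choice the search walks from udev back to udev; with yaird substituted it finds no cycle.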
Ah well. It is an unstable time again in debian testing, after the relative calm while sarge was being prepared for debian stable. There are quite a few circular dependencies now and people are reporting problems upgrading. In some cases, those upgrading from rather old debian sarge systems to the latest testing report that their desktop environments have become flaky (gnome and kde both). Switching to the other desktop, or purging/reinstalling those desktops seems to fix things.
Amazingly though, KDE 3.5.1 has made it into debian testing a mere 20 days (or less, I only noticed it today) after its official release. Certainly not the slow debian days anymore.
Public Key Missing in Apt
Just a quick blog about some key weirdness in the debian testing apt-get setup. Not sure where the exact problem stems from, but since the new year started, all debian updates are signed with the 2006 gpg key, which my testing systems did not seem to have. So you would end up with this error after doing an update:
Get:1 http://box.exoweb.net testing Release.gpg [378B]
...
Fetched 2810kB in 24s (114kB/s)
Reading package lists... Done
W: GPG error: http://box.exoweb.net testing Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 010908312D230C5F
The problem being that the public key at the end is not recognized. Looking at the key management utility for apt (apt-key) didn't show any simple way for it to download the correct key from the debian keyring, so I ended up having to use a bit of a kludge. These were the commands I had to run (as root):
gpg --keyserver keyring.debian.org --recv-key 2D230C5F
gpg --armor --export 2D230C5F | apt-key add -
The first line downloads the public key and adds it to the root user's list of public keys. The second command exports this from the root user's keylist to apt-key. The cleanest way to do this would probably be to use wget to get the actual key from its appropriate location, then pipe it to apt-key (it would be a one liner too). However, that is clunkier to do since one has to look up the appropriate location of the key, etc. In the end, adding just one public key to root's keyring was no big deal.
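Note that the ID passed to gpg is just the last 8 hex digits of the long ID in the NO_PUBKEY warning. A small sketch of pulling that out of the apt output and building the same two commands follows; the helper names are made up, and it only prints the commands rather than running them.

```python
import re


def missing_key_ids(apt_output):
    """Return the long key IDs named in NO_PUBKEY warnings, in order."""
    return re.findall(r"NO_PUBKEY ([0-9A-F]{16})", apt_output)


def import_commands(long_id, keyserver="keyring.debian.org"):
    """Build the two commands from the post for one missing key."""
    short_id = long_id[-8:]  # the short ID is the last 8 hex digits
    return [
        "gpg --keyserver %s --recv-key %s" % (keyserver, short_id),
        "gpg --armor --export %s | apt-key add -" % short_id,
    ]


if __name__ == "__main__":
    warning = (
        "W: GPG error: http://box.exoweb.net testing Release: The following "
        "signatures couldn't be verified because the public key is not "
        "available: NO_PUBKEY 010908312D230C5F"
    )
    for key in missing_key_ids(warning):
        for cmd in import_commands(key):
            print(cmd)
```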
Ah well, back to your regularly scheduled hacking ...
Year End Thoughts
With 4 hours left to the end of 2005, it's time to reflect on how the year has gone and prepare for the coming of the new year. Overall, it has been a good year for me, though it has felt like I've been hanging on to a runaway train for dear life. For FOSS, it has been a great year as it continues to grow by leaps and bounds.
The greatest challenge in 2005 was for Exoweb to find and integrate good people into the team as we ramped up to satisfy client demands. Exoweb grew from an 8 man outfit when I joined in 2004 to a 34 person company today, and is still growing rapidly. We would actually be larger if we had not had such a tough time hiring good people. In the last year, we have streamlined our HR process, allowing us to screen 10 times the number of candidates we could previously, with minimal impact to the daily operations of Exoweb.
Beyond just increasing bodies though, Exoweb feels so much better a place to work in now. It was a nice place before that, but in the last year, we have managed to strengthen the company culture, added a bunch of really smart people and started processes to ensure that we are constantly improving. I can honestly say that the current Exoweb team is the smartest, most competent tech team I have had the pleasure of working with. It is both a joy and a challenge to work with intelligent people who are far more knowledgeable than you in their areas of expertise. We may or may not have the stellar brain matter that Google is reputed to have, but the current team can definitely give any other team a real challenge.
2006 will bring its own challenges, no doubt. The company culture is relatively young and it will be challenged quite a bit as it tries to accommodate the changing desires, needs and eccentricities of our growing and maturing developers.
Another thing that I am proud of is that Exoweb's contributions back to FOSS projects are growing. While we mostly filed bug reports in 2003 and before, we started contributing code to small projects in 2004 and that trend has only accelerated in 2005. Since we became active users of django, several patches have been accepted into the main trunk and we should hopefully be contributing even greater functionality soon. We have patches in various other small projects such as EaseXML (formerly XMLObject), and identified performance improvements in projects such as PostgreSQL. We recently instituted a contribute-back policy, where developers can spend up to 10% of their working hours on FOSS projects, just contributing back to the community that makes our business possible.
Incidentally, the usage of FOSS is on the rise in China, even if the community does not appear to be that visible as yet. A growing percentage of the candidates going through our HR process are listing FOSS skills and projects on their resumes. More and more companies are using FOSS technologies in their daily work. We are also getting more business inquiries specifically seeking our FOSS skill set.
Finally, every competent FOSS person I know is fully employed and in huge demand. I know, because I tried to poach every single one not working in Exoweb :). For those who kept asking, "how can you find a job with FOSS skills?" a year or two ago ... HAH! Everyone I know has options - if they were not happy where they are right now, they could find a new job so very easily.
It has been a good, busy year. I am really looking forward to 2006 - more challenges, more growth and hopefully a bit more free time to relax and really play with technology again.
Goodbye apt-get
So it's finally time to leave apt-get, at least on my desktop machine, and go down the path of bsd and ports. I gave up on fink for my software packaging needs and switched to OpenDarwin instead. The main reason for the choice was that OpenDarwin has a larger and more updated package list, so it requires less manual downloading and package dependency resolving. The small price of learning a slightly different controlling program (port vs apt-get) was more than outweighed by the ease of typing 'sudo port install x' and just walking off and letting it do all the work. The lack of a central repository (OpenDarwin downloads the tarballs directly from the original site, not from a central repository) makes it difficult to use a repository cache, but I guess that's irrelevant since I am the only one in the office that uses OpenDarwin anyway.
Also took care of the clunky OpenOffice issue by installing NeoOfficeJ. My colleagues have not had good experiences with the older versions but the latest version does not seem too bad. Fires up relatively quickly and seems stable, but I must admit that I have not used it too extensively. Will know more later.
One minor annoyance with OSX is the emacs keybindings, or rather the lack of two of them. Almost all the commonly used ones are here, even the obscure ones. However, there are two that I use very frequently that are not found in the text editor programs - Meta-f and Meta-b. These go forward and back one word, respectively. I have ctrl-f and ctrl-b, which move one letter at a time, but I have gotten so used to the one word movements that I seriously feel their absence. In terminal, all the Meta-x keys are done via option-x, and in text editors, some of this is there. option-backspace deletes back one whole word, the same as Meta-backspace in emacs. However, the option-f and option-b keys produce some strange characters, instead of replicating their Meta-x behaviours. Ah well. At least option-arrow key still moves one word at a time, but I'm too lazy to move my entire hand the few cm required to hit them :)
One other note - never use transparencies on anything you want a fast response time from. I had a slight transparency going on my terminals and was wondering why text-mode emacs was moving dog-slow. Once the transparency was shut off, it was back to the good old speeds you have come to expect and love from text terminals. Even with the high-end graphics cards that all these machines come with, transparency is still very expensive on the CPU.
Powerbook day 5
So it has been 5 days with this nice little bit of hardware and I am liking it greatly. It occurs to me though, that my blogs do not reflect this. I am more likely to point out its shortcomings and difficulties than all the things I love about it. So I thought I'd spend this blog writing about it. I'll try to gloss over the stuff that most other people rave about and focus on what is special about it to me as a full-time developer.
Ok, the usual stuff people rave about:
- Beautiful, well thought out design
- It Just Works!
- Consistent UI
- Sleeps and recovers in less than 4 seconds
- High quality hardware
The other stuff that makes me use it as my primary workstation, despite its currently lower performance:
1. Power users welcome
While Apple may have been one of the first to properly implement a GUI and has one of the slickest interfaces around, the mouse is almost totally optional! People like me who like to work primarily through the keyboard interface can actually navigate around purely with the keyboard, using the mouse very rarely. For me, this is a significant speed increase as keyboard shortcuts are far faster than using the mouse. Additionally, because the key bindings are very similar to emacs (my main development environment), everything is comfortable and familiar.
One other cool thing about emacs key bindings is the very limited hand movement. You can reach just about all the keys without moving your hand more than a centimeter from standard rest position. You do not even have to move your hands the ~10 cm required to hit the arrow keys (3-4 cm on a laptop), thereby not disrupting your typing speed. Yes, I am anal about these things. But it does make a serious productivity difference to me.
2. *nix base
While the flashy GUI protects new users from having to deal with the guts and internals of the OS, you can still pop open the hood as necessary. Fire up the terminal program and you can see which processes are the memory/cpu hogs, what crazy programs are running riot, etc. You can fire up your trusty shell scripts to perform routine tasks, run all the *nix applications which we know and love, etc. Get down to the OS level and figure out exactly why that peripheral is not working, if you can.
Granted, this is no different from any *nix environment, but it is a *nix environment with few of the warts. In many ways, it feels like a *nix desktop done _right_.
3. It Just Works!
Ok, so I ran out of other points. But this is a point I cannot emphasize enough. Almost everything works without problems, in exactly the way you would expect it to in a perfect world. One click/keypress often produces the results you are looking for. Things run effortlessly. You waste so little time fighting the GUI/OS and spend so much time just being productive. All the little annoyances that bugged you, be they in the Linux, BSD or Windows world, seem to be magically gone here. No need to spend days or weeks tweaking/configuring it.
Ultimately, the powerbook and osx are what I consider the ultimate combination - the simplicity that Windows promised, the power and security of *nix, all combined in one very well designed package. It is a testament to what attention to detail and usability can do. The hardware, while of high quality, is commodity. Any other company can manufacture materials of comparable quality. However, no one can put it together, hardware and software, the way Apple has done with their computers or iPod.
Powerbook Day 4
It is closing in on my 96th hour with the powerbook and it has been a fun experience so far. The first 48 hours were when I had to do the bulk of my learning - getting used to the "Mac way", installing applications and waiting for stuff to download. After that, it has been mostly smooth sailing.
The only issues I've encountered so far are the relative slowness of the ppc chip and OpenOffice.org clunkiness. The OpenOffice.org clunkiness is just because the standard installation uses X11 and is not well integrated with the rest of the desktop at all. Things like one click in email to open a document just do not work. It is a cumbersome multi step process. I'm currently downloading NeoOfficeJ (based on OOo) but with the slow download speeds in China it really takes a while. Still, not a big deal.
The more serious issue is the performance of the system for number crunching. After I had set up my development environment, I was surprised to find that CPU intensive unit tests in my projects took twice as long as they did on my old laptop. Functional tests (fire up all the servers, do black box testing using simulated browsers) took 4-5 times longer. This makes the powerbook substandard as a development station.
A simple test of this is to run code like this:
a = 1
for i in range(1,1000000):
    a = (a + i)/i
On my laptop and workstation (1.5GHz AMD Athlon, underpowered by today's standards), this takes 0.8 seconds to run. On the default python program on OS X, this takes 1.3 seconds! Using the version of python obtained from fink, this drops to 1.1 seconds, but still far slower than the Athlon systems. This is all using python 2.3.5.
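For anyone wanting to repeat the comparison on a current interpreter, here is a minimal sketch of a timing wrapper around the same loop. It is not the harness I used for the numbers above. It assumes Python 3, so it uses // to keep the integer division that / performed on python 2.3; the function names and the best-of-three approach are just illustrative choices.

```python
import time


def crunch(n=1000000):
    """The loop from the post; // keeps Python 2's integer division."""
    a = 1
    for i in range(1, n):
        a = (a + i) // i
    return a


def time_crunch(repeats=3):
    """Return the best wall-clock time in seconds over a few runs."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        crunch()
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


if __name__ == "__main__":
    print("best of 3: %.2f seconds" % time_crunch())
```

Taking the best of a few runs reduces noise from whatever else the desktop is doing at the time.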
I have not looked in detail into the reasons behind this, but I suspect that it is in part because OS X is tuned to be a desktop, not a server. Responsiveness of the GUI is paramount.
At any rate, this is disappointing, but not a deal breaker for me. While it means I have to run the intensive functional tests on a separate, development machine, I can still use the powerbook for most of my daily work. I always have a workstation somewhere within ssh distance of me anyway, and sadly enough, most of my daily work does not involve CPU intensive activity anymore. Things such as responsive email, web browsing and document editing are more important.
It's Powerbook Time!
I finally gave in to the dark side this Saturday. Could not resist it any more. I finally broke down and ... got me a Mac. A powerbook to be exact. Retired my old x86 laptop and went down the path of OS X and the PPC (temporarily). It has been an interesting 48 hours as I tinker with my machine, getting it ready for duty as my development machine on Monday.




