01.18.08

What mapreduce is

Posted in Coding at 10:47 pm by David Kellogg

I knew it was only a matter of time before a pedantic database supporter would chime in that mapreduce is not the greatest thing since sliced bread.

Mapreduce is very cool, and the last people likely to understand this are database programmers.

What is mapreduce?
Mapreduce stems from functional programming in languages such as Scheme. At the University of Illinois, I thought the CS program did students a disservice by teaching them Scheme. Apparently not. The simplest mapreduce function I can think of is counting words.

every good boy does good

Let’s create a map function that simply places a one (1) next to each word, then sort the resulting pairs.

(boy, 1)
(does, 1)
(every, 1)
(good, 1)
(good, 1)

Next, let’s create a reduce function that adds up the numbers for keys that appear more than once.

(boy, 1)
(does, 1)
(every, 1)
(good, 2)
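The two steps above can be sketched in a few lines of Python. This is only a single-machine illustration; the function names map_words and reduce_counts are made up for this example and do not come from Hadoop or any other framework.

```python
from itertools import groupby
from operator import itemgetter


def map_words(text):
    """Map step: emit a (word, 1) pair for every word in the text."""
    return [(word, 1) for word in text.split()]


def reduce_counts(pairs):
    """Reduce step: add up the values of all pairs that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


# Sort the mapped pairs so equal keys sit next to each other.
mapped = sorted(map_words("every good boy does good"))
counts = reduce_counts(mapped)

print(mapped)
print(counts)
```

The sort between the two steps is what lets the reduce function see both copies of "good" side by side.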

That’s mapreduce. With 1 trillion words to count, mapreduce becomes far more useful. The data can start on many nodes, then end up counted and sorted across many nodes. Even better, numbers are not necessary here. URLs can serve as both keys and values, such as (“www.amazon.com”, “ebay.com”). Using this model, reverse links can be counted, Monte Carlo simulations can run, and the result is PageRank. There are many uses. Mapreduce does not fit every coding problem, but it opens many doors for formerly difficult problems.
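Here is a rough sketch of the reverse-link idea, where both the key and the value are URLs. The link data is invented for illustration (example.org is a placeholder), and the names map_reverse and reduce_inbound are hypothetical. It stops at counting inbound links and does not attempt PageRank itself.

```python
from collections import defaultdict

# Hypothetical link data: (source page, page it links to).
links = [
    ("www.amazon.com", "ebay.com"),
    ("example.org", "ebay.com"),
    ("ebay.com", "www.amazon.com"),
]


def map_reverse(link_pairs):
    """Map step: flip each (source, target) pair so the target becomes the key."""
    return [(target, source) for source, target in link_pairs]


def reduce_inbound(pairs):
    """Reduce step: collect every source page that points at the same target."""
    inbound = defaultdict(list)
    for target, source in pairs:
        inbound[target].append(source)
    return dict(inbound)


inbound_links = reduce_inbound(map_reverse(links))
inbound_counts = {target: len(sources) for target, sources in inbound_links.items()}

print(inbound_links)
print(inbound_counts)
```

The shape is the same as the word count: the map step decides what the key is, and the reduce step combines everything that landed under that key.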

Why would database people not understand this?
Database folks appear to be in a rut. They are overly concerned with optimization. They create the index, continue to add to the index, and retrieve data very quickly using the index. What mapreduce does appears wasteful. It creates the index once, cannot add to the index, and throws away all of the work each time it is run. I think someone in the Hadoop world should solve this problem of throwing away the index, but that’s only an optimization. Optimizations do not count where a paradigm shift has occurred. Now, from the article:

The database community has learned the following three lessons from the 40 years that have unfolded since IBM first released IMS in 1968.
* Schemas are good.
* Separation of the schema from the application is good.
* High-level access languages are good.

Are schemas good? This implies that data should be strongly typed. Now we are getting into the strongly-typed argument that seems never to be won between Java and C vs. Perl and PHP. My guess by extension is that Perl and PHP are bad and Hadoop is bad as well.

Schemas and applications could be separated, but that makes sense only in a database world. In mapreduce, the programmer is given control over his data. It’s called freedom. This also sounds too much like MVC arguments. Yes, databases and MVC save money in some cases, but in other cases, they just hold back creativity and development.

High-level languages are good. I agree. In a way, mapreduce programs are written in a high-level language in all cases. The loop that you think you write in Java or C is actually torn apart and run on many nodes. It only appears that the loop runs on one machine, yet it runs on many machines in tiny pieces.
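To give a feel for a loop being torn apart, here is a toy sketch that splits the word list into chunks, counts each chunk separately, and merges the partial counts. Threads stand in for nodes here, which is an assumption made purely for illustration; a real cluster ships the chunks to separate machines. The names split, count_chunk, and distributed_count are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(words):
    """Work done by one 'node': count only the words it was handed."""
    return Counter(words)


def split(words, n_chunks):
    """Cut the word list into roughly equal pieces, one per worker."""
    size = max(1, -(-len(words) // n_chunks))  # ceiling division
    return [words[i:i + size] for i in range(0, len(words), size)]


def distributed_count(words, n_workers=3):
    """Count each chunk on its own worker, then merge the partial counts."""
    chunks = split(words, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


totals = distributed_count("every good boy does good".split())
print(totals)
```

The caller writes what looks like one count over one list, yet no single worker ever sees the whole list.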

Some of you out there should try a little mapreduce programming and see how it screws with your mind. It’s wonderful to feel different about a loop. I feel just as good doing this as when I first learned SQL.

01.17.08

10 absolute “No!s” for coders

Posted in Coding at 9:30 pm by David Kellogg

I enjoyed 10 Absolute “Nos!” for Freelancers. This appears to be sound advice. As a counter to this sage advice, I give you

10 Absolute “No!s” for coders

1. Can you comment your code? No!
People ask me all the time if I can comment my code. I just won’t do it. Commenting my code simply invites lesser programmers to come along and steal my job. Remember this: commenting your code will only get you fired.

2. Can you use this/that design pattern? No!
Design patterns merely sell books and create bloated code. The answer is no. Too many devs rely on a singleton that really should have been a global. Why does no one praise global variables anymore? They are all scared by the design pattern mafia.

3. Can you stop using a global variable called temp4 and be more descriptive? No!
I’m old school. If you can’t remember all of your variables and what they do, you should not be writing code. Pure and simple, you should just sit down and remember this stuff.

4. Can you use this/that language for this project? No!
Assembly is good enough for Steve Gibson. It’s good enough for me. People who do not code in assembly regularly simply lose touch with reality quickly. Then people ask why I even need an assembler. Why not machine code? Machine code is never ever portable!

5. Can you get me some coffee? No!
People think that because you are not a high-paid, suit-wearing consultant, you have to do everything for them. I stop at coffee. I will get you water if I am getting a cup, but not coffee. It’s too demeaning.

6. Can you take a shower? No!
I’ve been here for 22 hours. I don’t have time to take a friggin shower.

7. Can you write down how the config node works for the rest of us? No!
See 1 above.

8. Can you fix your buffer overflow or memory leak? No!
I can find many better things to do than fix a memory leak that probably came from someone else’s library.

9. Can you write this as a reusable library? No!
Why would anyone ever want to use my code?

10. Can you write your code on more than one line? No!
There is a reason for semicolons to exist. They separate code. If you can’t read my code, you should get emacs or some IDE to format it for you.

Strangely, one programmer once worked with me who said no to maybe 8 of these requests. I was flabbergasted. Here was a real-life Bartleby who refused to do everything that comes naturally to most coders. I have seen multiple functions written on one line. That’s fine, but very hard to debug. I’m still not sure what to do with my temp3 and temp4 variables. Maybe they mean something. For the record, I never asked another coder to get me coffee.

01.15.08

Ted Stevens predicted Twitter downage

Posted in Uncategorized at 9:42 pm by David Kellogg

After Twitter fell apart during Macworld Expo, I thought about Ted Stevens.

Ten movies streaming across that, that Internet, and what happens to your own personal Internet? I just the other day got… an Internet was sent by my staff at 10 o’clock in the morning on Friday, I got it yesterday. Why? Because it got tangled up with all these things going on the Internet commercially. They want to deliver vast amounts of information over the Internet. And again, the Internet is not something that you just dump something on. It’s not a big truck. It’s a series of tubes.

It seems Twitter had trouble scaling those tubes.