Sunday, June 7, 2009

transactional memory

concurrent programming with shared memory is complicated. if there was a way to make such programming easy, shared memory could remain an alternative to message passing in the presence of large amounts of thin cores.

transactional memory aims to automate the selection and acquisition of locks. whether transactional memory can be done efficiently, and adopted incrementally is still uncertain.

there are both software and hardware implementations of transactional memory. at first they seemed to me sequential evolutionary steps but that is not the case. Most HTM implementations operate at cache line granularity and use cache coherence protocols to track and maintain different copies of shared memory areas. most STMs operate at object level and use locks, temporary copies of objects, and additional data structures to keep track of transactions and mutated state. the fundamental problem of HTM is transactions that exceed cache sizes or that last longer than one scheduling quantum. the fundamental problem of STM is overhead. it seems natural to expect a balanced hybrid to offer the best of both worlds.

transactional memory implementations differ in certain key attributes. the choices often depend on whether commits or aborts are expected to be more frequent.

shared state can be modified in place or on a separate copy. in place modification requires a second copy to be created to maintain the original state in case of an abort. in place modification makes commits cheap.

concurrency can be handled optimistically - several threads can enter the same critical section concurrently. if the threads do not conflict the locking would have been overhead. if a conflict is detected all but one thread are aborted.

conflicts can be detected as soon as they occur or only at commit time. late detection saves overhead in the absense of contention. early detection involves more bookkeeping work, but when a conflict occurs it skips the useless computation in the doomed portion of the transaction. late detection can also allow computation on inconsistent state. th results will eventually be discarded but it can result in exceptions that wouldn't occur otherwise.

the semantics of nested transactions can be defined in two ways
for aborts:
  • a nested transaction aborts the parent transaction.
  • a nested transaction allows the parent to proceed.
and for commits:
  • changes committed in a nested transaction become visible to the parent transaction.
  • changes committed in a nested transaction become visible in global state.
IO and cooperation between transactional and non-transactional software are some of the challenges for the future. The classic chicken-and-egg problem is naturally a key blocker. Intel has a lot of eggs in this basket if they expect anyone to buy 1000 core chips.

Saturday, May 23, 2009

yeah

I bought a pair of pants. Six days later I notice the stitching has come apart. No need for vexation. I go to the store, I show the pants and the receipt to the nice guy behind the counter. He calls the manager just to be sure. The manager decides to come over and look at the pants.

The manager then says the pants are fine, the hole is probably of my own doing, but she's not saying it is, she's merely examining the threading. In the end she tells the cashiers to refund me and she leaves. I get my refund. When the cashier ends with "I'm sorry about your pants, they really should last more than six days," I realize that I'm not mad at the cashiers, I'm mad at the manager.

Flip the tables. You're the cashier. A calm customer comes in for a routine refund of a defective good. Suddenly, your boss butts in, she tells the customer to fuck off, she orders you to finish the transaction, and she retreats to the office.

-

Thursday, May 21, 2009

clarksfield turbo

though clarksfield chips are reported to be clocked at measly 1.6, 1.8, and 2.0ghz, nehalem's turbo boosts them up to 3,2ghz for a lone running thread.

the current top of the line mobile core2 chips x9100 and qx9300 are dualcore @ 3,06ghz and quad core @ 2,53ghz, both with 1066mhz fsb. clarksfield uses dual channel 1333mhz memory, and has DMI instead of the desktop variant's QPI.

conclusion: a 2ghz clarksfield will beat an x9100 in single-thread performance, and a qx9300 in parallel performance. clarksfieldy macbook pros should be out right around the time my summer vacation ends.

how bad is DES?

DES was kicked out of srs bsns over a decade ago, yet there are no bruteforce des crackers available for home PCs. DES encryption/decryption is not well suited for the instruction set of a general purpose computer. in 1999 distributed.net harnessed the idle cpu time of around 100.000 computers together with EFF's deep crack to break des within 24 hours. the peak of 170 billion decryptions every second on 100.000 computers averages out to 1,7 million decryptions per second per computer.

for computers 10 times as fast now as those of 1999, a large amount of botnet time would be consumed in cracking just a single key. the current GPGPUs aren't particularly well suited for DES cracking either. a single decryption takes a lot of cycles and a lot of power.

FPGAs are optimal for DES. the described implementation is a 37 stage pipeline composed of single cycle steps that changes keys, plaintext, ciphertext and operation mode on every cycle without stalls.

in 2006, german researchers implemented a 9000 euro device with a total power consumption of 600 watts consisting of 120 FPGAs, each capable of 400 million encryptions per second. they eventually raised the clock speed so that an exhaustive search of the DES keyspace takes 12.8 days. DES is futher susceptible to time/memory tradeoff attacks which copacabana performs.

-

Tuesday, May 19, 2009

how wicket protects you from CSRF

the beauty of wicket finally became apparent to me when I wrote my first delete operation.

the traditional delete works like this:

miau.biz/delete.php?id=5

instead, wicket sets up a bounded number of continuations that the user can invoke. each continuation corresponds to one of the allowed deletions. if an item isn't allowed to be deleted, that continuation does not exist, and neither does there exist a way to invoke it.

an attacker can however induce a victim to invoke any existing continuation against their will.

say the url to delete id 5 is:

miau.biz/::wicket::invokeContinuation:2

putting that url in an img tag will make the browser load it automatically. oops.

the suggested solution is to include a hard to guess token with each link, and requiring separate login for dangerous operations.

wicket uses null as the name of the default PageMap, and sequential numbering for page id's, page versions and markup id's.

wicket has a builtin fix against csrf. you aren't forced to use their solution however. in fact nextPageId() in Session is protected and not final; I can override that with

private static SecureRandom random = new SecureRandom();

synchronized protected int nextPageId() {
return random.nextInt(Integer.MAX_VALUE);
}

not every request gets a unique id, but many requests do. and the whole thing is fairly transparent.

Wicket's CryptedUrlWebRequestCodingStrategy has a fallback mechanism to allow you to also use urls that aren't gibberish:
   public RequestParameters decode(final Request request)
{
String url = request.decodeURL(request.getURL());
String decodedQueryParams = decodeURL(url);
if (decodedQueryParams != null)
{
// The difficulty now is that this.defaultStrategy.decode(request)
// doesn't know the just decoded url which is why must create
// a fake Request for.
Request fakeRequest = new DecodedUrlRequest(request, url, decodedQueryParams);
return defaultStrategy.decode(fakeRequest);
}

return defaultStrategy.decode(request);
}

*cough*

Sunday, May 17, 2009

eclipse containers for fun and profit

given a multi module project in eclipse, where project X depends on project Y,
when I close project Y,
I want project X to depend on a jar of project Y.

eclipse classpath containers are a mechanism for dynamic late binding of classpath elements.

ibm has a tutorial that requires you to register on their site. m2eclipse does the whole thing for maven projects, but also 99 other things, most of which involve running random maven tasks when your trying to write code.

despite all the static methods and integers enumerations, eclipse makes the whole thing easy; including creating your own plugin and publishing an update-site for it.

the heart of the class path container is the is the IClasspathContainer. The getClasspathEntries() method returns whatever is relevant for that situation. in my case it's either a JavaCore.newProjectEntry() or a JavaCore.newLibraryEntry() depending on whether the referenced project is open or closed at the time. For some reason JavaCore.newVariableEntry() comes out as K_SOURCE which doesn't work for a jar. so I use wrapped the resolved path from newVariableEntry() into a newLibraryEntry to make it a K_BINARY. your type is K_APPLICATION, your path is your unique identifier, and stuff after the slash in the path is parameters for your container. in my case the parameters might be the other projects that this project depends on.

then you need a ClasspathContainerInitializer, which will call JavaCore.setClasspathContainer with a new instance of YourContainer on the dependent projects everytime you open or close one the dependee projects. this then sends an F_CLASSPATH_CHANGED signal to those concerned.

you'll know that a project was opened or closed, because you added an IElementChangedListener with JavaCore.addElementChangedListener().

Your initializer is registered by adding it at the extension point org.eclipse.jdt.core.classpathContainerInitializer. never mind the xml, eclipse will generate it for you.

to allow the addition of containers from the build path configuration menu, eclipse requires a classpathContainerPage. either create a dummy one and register it, or add your container to the classpath file by hand.

Finally, I exported my plugin, and then binary imported the same. This is a nonrecoverable operation that replaces the source code of the plugin with the class files.
*cough*

Sunday, May 10, 2009

throwing money at it

sometime in the future apple will get past the awful 1.6.0_07. meanwhile joel on software, did a wonderful piece on using money/ssd drives to speed up their build. I wasn't able to test this myself so I am glad joel did it for me, and confirmed my expectations that compiling/testing is cpu/memory bandwidth bound rather than io bound. our whole build takes up around 200 megabytes of physical space with all its dependencies and the jdk takes another 200 megs. further, a good part of both chunks is probably never loaded. so with enough memory there would be virtually no need for disk io.

what would be interesting beyond this would be to figure out if ssd could help in the overall workloads we use, that is (again): reading email, im'ing, running virtualbox, using an IDE, building, using the browser, profiling, running jetty. all at once. and that could hard to measure unless the difference was so great it would be obvious in actual use.

then I had the chance to work on a brand new windows box with a t5500 (a 1,66ghz core2duo), 2gigs of ram, and corporate antivirus software set to scan all class files, and read michael feather's excellent book working effectively with legacy code.

by the time I highlighted a block of code in eclipse, held down alt and hit the up arrow and nothing happened. 5 seconds elapsed, there was screech from the harddrive for a few seconds and then my method moved up a few lines. I was just smiling from the absurdity.

michael feathers points out the importance of having a fast feedback cycle and his book also alleviated my fears that, while i am waiting for the computer to do something it's natural for my mind to wander off and get frustrated, rather than spending that time subconciously processing the customer's business logic. I've actually found myself trying to optimize my day so that when I have to go to the bathroom, I'll hold out a bit so I can start a build, or get an update from version control before I go. then I realized I was way off putting any mental effort into this.

we now have some good steam going into getting our sprawling build back under control. the experience so far has reminded me of the importance of communication. I just can't guess what others are thinking from just looking at them and watching them act. I really need to make the effort to conciously communicate with people in the same room, the next room, and farther away to find out. and it is often very enlighting to do so.

as an example, it turns out part of the frustration of the slow build was of my own doing. the whole issue could be circumvented not by cleaning up the build, getting faster hardware, optimizing our continuous integration environment, switching to a more branch friendly version control system, but by the simplest solution of all. it all goes away if you just stop running the build.

I also started using a remote git repository on my own server to move scripts around between home and work. I'm expecting it to be even better than posting stuff on my blog and copypasting it at work.

for those on jaunty, who do run the build:
#!/bin/bash
function notify {
test -x "`which notify-send`" && `which notify-send` -u $1 -c $2 -i $3 "$4" "$5"
}
/usr/bin/time -f "%E real" -o ./time.txt buildr $@
if [ $? -eq 0 ]
then
notify normal transfer.complete sunny "successful build" "`cat time.txt`"
else
notify critical transfer.error error "failed build" "`tail -1 time.txt`"
fi
rm ./time.txt
and this too, because it uses the totally sweet erlang icon in the notifications!
       case regexp:match(ServerName, "compute.amazonaws.com") of
nomatch -> ets:delete(instances, Nick),
os:cmd("/usr/bin/notify-send -i erlang -c device.error -u normal 'joe experienced an error'"),
bot:irc_privmsg(Socket, Nick, "some kind of error occurred.");
enjoy.