Premature optimization is the root of all evil

January 17, 2016

A year of using Git: the good, the bad, and the ugly

I have been working with Git for about a year now, and I think I am ready to sum up my experience. Before Git I used SVN, and before that Perforce, TFS, and some other products, but I will use SVN as a prototypical “non-Git” system.

Git buys you flexibility and performance at the price of a greatly increased workflow complexity.

SVN will work better for you if:

  1. All your developers work under single management, AND
  2. All developers can get relatively fast access to a central server.

Conversely, you should prefer Git if:

  • There are multiple independent groups of developers that can contribute to the project, AND/OR
  • It is difficult to provide all developers with fast access to a central server.

Here is my take on the good, the bad, and the ugly sides of Git.

GOOD: great performance, full local source control that works even when disconnected from the server, ability to move work quickly between servers without losing history.

BAD: complexity of workflow, more entities to keep track of, lots of new confusing terms, “one repo is one project” policy, limited/outdated information about the remote, misleading messages. Commit labels are not integers, which complicates build/version number generation.

UGLY: you can lose work by deleting branches or tags.


January 7, 2016

Thank G-d for StackOverflow

I have just run into a problem: a test passes fine on the CI server but breaks on my machine. Reason: it picks up the wrong config file. Root cause: when you run tests from multiple assemblies via the Resharper test runner, it plays games with AppDomains for the sake of optimization. Let’s say you are testing assemblies A.DLL and B.DLL. If you run tests from B.DLL separately, they will be executed in an AppDomain that points to B.DLL.config. However, if you run tests from both A and B in one session, they will all be executed in a single AppDomain, most likely pointing to A.DLL.config, and thus tests from B.DLL may fail.

It turns out there is an option to turn off this optimization, but obviously it is not easy to find among 10,000 Visual Studio options. Thankfully, someone raised a similar question on StackOverflow.

They even submitted a feature request to JetBrains.

In the comments to that request, JetBrains support says this is already fixed, but it is not clear in which Resharper version. I have sent a query to JetBrains support to find out and am waiting for their response.

January 2, 2016

PHP SimpleXML is Broken: Part 2

In a nutshell, SimpleXML has two things broken:

1. Elements that represent “empty” tags evaluate to false. This is weird design.
2. Other elements sometimes evaluate to false due to bugs.

I covered problem #1 in the previous post. This post is about #2.

Unfortunately, due to bugs in the SimpleXML parser, a non-empty element may evaluate to false. No version I tested is completely bug-free, and the difference between correct and incorrect behavior may be as subtle as adding a space outside the element in question.

The code I use for testing is along these lines:

$xml = simplexml_load_string($text);
$node = $xml->xpath($xpath)[0];
if ($node) echo "Evaluates to true!"; else echo "Evaluates to false!";

Here are the test results:

Test 1:
  $text:   <root><b>text</b></root>
  $xpath:  /root/b
  Result:  <b>text</b>
  Boolean: PHP 5.4.4 – true; PHP 5.4.45 – true; PHP 5.6.15 – true

Test 2:
  $text:   <root><a/><b>text</b></root>
  $xpath:  /root/b
  Result:  <b>text</b>
  Boolean: PHP 5.4.4 – false; PHP 5.4.45 – true; PHP 5.6.15 – true

Test 3 (same as Test 2, note the extra space before "</root>"):
  $text:   <root><a/><b>text</b> </root>
  $xpath:  /root/b
  Result:  <b>text</b>
  Boolean: PHP 5.4.4 – false; PHP 5.4.45 – false; PHP 5.6.15 – true

Test 4 (same as the first one, only with a namespace):
  $text:   <x:root xmlns:x="urn:q"><a /></x:root>
  $xpath:  /x:root
  Result:  <?xml version="1.0"?>
           <x:root xmlns:x="urn:q"><a/></x:root>
  Boolean: PHP 5.4.4 – true; PHP 5.4.45 – true; PHP 5.6.15 – true

Test 5 (same as previous, but using <x:a /> instead of <a />):
  $text:   <x:root xmlns:x="urn:q"><x:a /></x:root>
  $xpath:  /x:root
  Result:  <?xml version="1.0"?>
           <x:root xmlns:x="urn:q"><x:a/></x:root>
  Boolean: PHP 5.4.4 – false; PHP 5.4.45 – false; PHP 5.6.15 – false

Test 6:
  $text:   <x:root xmlns:x="urn:q"><x:b>text</x:b></x:root>
  $xpath:  /x:root
  Result:  <?xml version="1.0"?>
           <x:root xmlns:x="urn:q"><x:b>text</x:b></x:root>
  Boolean: PHP 5.4.4 – false; PHP 5.4.45 – false; PHP 5.6.15 – false

I believe evaluating non-null objects to false was not a good idea in the first place, but worse yet, PHP cannot even do it consistently, so the result of an if ($node) check is never reliable.
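Given these bugs, the safest check seems to be one that sidesteps boolean conversion of the element entirely. Here is a sketch that tests the xpath() result itself, which is an array of matches (or false on error):

```php
<?php
$xml = simplexml_load_string("<root><a/><b>text</b></root>");

// xpath() returns an array of SimpleXMLElement matches, or false on error.
// Testing the array, rather than the element, avoids the buggy
// SimpleXMLElement-to-boolean conversion altogether.
$nodes = $xml->xpath("/root/b");
if ($nodes !== false && count($nodes) > 0) {
    $node = $nodes[0];
    echo "Found: " . $node . "\n"; // prints "Found: text"
}
```

This works the same on every PHP version I tested, because an array's truthiness (unlike a SimpleXMLElement's) is well defined.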

PHP SimpleXML is Broken: Part 1

The following code does not work as expected:

$xml = simplexml_load_string("<root><a/><b>text</b></root>");
$node = $xml->xpath("/root/a")[0]; 
if ($node) process($node); // process($node) may not be called for some valid nodes

Unlike most other objects, a SimpleXMLElement may evaluate as false even when it is not null. Specifically, SimpleXMLElements representing empty tags (like <a />) will evaluate as false, so process($node) won’t be called for them.

Yes, they made a special case for it in the language definition. A DOMElement representing an empty tag, for example, won’t evaluate to false, at least not according to the documentation.
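For comparison, here is the same kind of lookup with the DOM extension, where a missing node is reported as null and an explicit null check behaves as expected (a sketch):

```php
<?php
$doc = new DOMDocument();
$doc->loadXML("<root><a/><b>text</b></root>");

$xp = new DOMXPath($doc);
// DOMNodeList::item() returns the matched node, or null if there is none,
// so the explicit null check works even for empty tags like <a/>.
$node = $xp->query("/root/a")->item(0);
if ($node !== null) {
    echo $node->nodeName . "\n"; // prints "a"
}
```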

So, in my previous post “Node is null” should read “Node evaluates to false”, but it is still a bug, since the node in question was not an empty tag.

When you write if (something), that something will be converted to Boolean per the conversion rules below. Most of them make sense, but some of them totally don’t. If you want to check for null, you’d better do it explicitly: if (something !== null).

PHP Boolean Conversion Rules

Per the PHP documentation on Boolean (comments are mine):

When converting to boolean, the following values are considered FALSE:

  • the boolean FALSE itself. Totally makes sense.
  • the integer 0 (zero). Like in C/C++, so it is expected, even if it may not make perfect sense.
  • the float 0.0 (zero). Like in C/C++, so it is expected, even though it makes even less sense.
  • the empty string. Not like in C/C++, but it sort of makes sense.
  • the string “0”. Why?! Because it represents the integer 0? How about the string “0.0” then? Or the string “false”?
  • an array with zero elements. Sort of makes sense.
  • an object with zero member variables (PHP 4 only). Does not make sense, but they deprecated it.
  • the special type NULL (including unset variables). Totally makes sense.
  • SimpleXML objects created from empty tags. Why?! Who came up with this wonderful idea?

Rest in peace, dear rule of least astonishment. You should never have ventured into that cruel PHP land.
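The rules above are easy to check from the command line; this sketch prints the conversion results, including the SimpleXML special case:

```php
<?php
// Values the manual lists as FALSE:
var_dump((bool)0);        // bool(false)
var_dump((bool)0.0);      // bool(false)
var_dump((bool)"");       // bool(false)
var_dump((bool)"0");      // bool(false)
var_dump((bool)array());  // bool(false)
var_dump((bool)null);     // bool(false)

// ...and their surprisingly truthy neighbors:
var_dump((bool)"0.0");    // bool(true)
var_dump((bool)"false");  // bool(true)

// The SimpleXML special case: an element created from an empty tag is falsy,
// even though it is a perfectly valid, non-null object.
$xml = simplexml_load_string("<root><a/></root>");
var_dump((bool)$xml->a);  // bool(false)
```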

PHP bug + version dependency hell

I have discovered a bug in PHP 5.4.4: a simple XPath query does not work.

$xml = simplexml_load_string("<root><a/><b>text</b></root>");
$node = $xml->xpath("/root/b")[0]; // $node incorrectly evaluates to FALSE (see Update 2)

This is so ridiculous, I could not believe my eyes. Nevertheless, it’s a bug. It does not occur with the latest version of PHP (5.6.15), although I could not find a specific fix that would be responsible for this in the PHP bugs database.

The trouble is, PHP 5.6.15 no longer supports Apache 2.2, which means I have to upgrade to Apache 2.4. This is probably a good idea anyway, but I hate this kind of domino effect.

Alternatively, I could ditch SimpleXML and switch to DOMDocument. Or write the whole thing I was writing (which is a test project anyway) in something more civilized like ASP.NET. Anyhow, fun stuff!

Update: PHP 5.4.45 does not have the bug. PHP guys seem to be doing a good job backporting bug fixes to older versions. Revolution averted :)

Update 2: Correction: in PHP 5.4.4 the node was not actually NULL; it was (incorrectly) evaluating to FALSE as if it were an empty tag. This, however, is still a bug. The fix in PHP 5.4.45 is also only partial. See my next post.

December 31, 2015

WCF Restful service: returning data in different formats depending on request parameter

Suppose I want to return a list of users in JSON or CSV format depending on a query parameter.

MyService.svc/users?format=json returns JSON
MyService.svc/users?format=csv returns CSV

The question is: how do I implement it with WCF (leaving aside the discussion on whether it’s a good idea)?

The answer: the multi-format method must have return type Stream. The programmer is fully responsible for formatting the response body in all cases, including setting Content-Type.

    [ServiceContract]
    public interface ITestService
    {
        [OperationContract]
        [WebInvoke(Method = "GET", UriTemplate = "users?format={format}")]
        Stream GetUsers(string format);
    }

Solution attempts that did not work

1. You cannot have two methods that differ only in query parameters.

    [ServiceContract]
    public interface ITestService // Does not work as expected
    {
        [OperationContract]
        [WebInvoke(Method = "GET", UriTemplate = "users?format=json", ResponseFormat = WebMessageFormat.Json)]
        List<User> GetUsersAsJson();

        [OperationContract]
        [WebInvoke(Method = "GET", UriTemplate = "users?format=csv")]
        Stream GetUsersAsCsv();
    }

This throws an exception at runtime saying one cannot have two methods whose URI templates differ only by query parameters.

2. You cannot have a method that returns object. I expected WCF to turn off response formatting when the runtime type of the returned object is Stream. This turned out not to be the case: WCF turns off response formatting only when the method’s static return type is Stream. See my question on StackOverflow.

Things to pay attention to

1. If you create a MemoryStream and write into it, make sure to reset the stream position to the beginning (e.g. stream.Position = 0) before returning it. If you don’t, the response body will be empty.

2. Don’t forget to set Content-Type and (optionally) Content-Disposition.

3. JSON must be serialized by hand, since WCF will not format a Stream result for you.

4. Don’t forget to properly handle the case of bad request parameters.


December 25, 2015

Christmas carol: XML Signatures


  • XML signatures are trouble, because they are ridiculously hard to implement right.
  • XML signatures in PHP are double trouble, because of lack of decent libraries.
  • .NET security libraries are mildly annoying, because they cannot read RSA private keys from a PEM file, which is the standard format for storing RSA private keys.


My team currently maintains a project that provides single sign-on for 3rd party web sites. These days single sign-on appears in many places on the web: a 3rd party web page displays a “login with your Facebook account” button, you click on it, Facebook verifies your identity and sends a secure message to the 3rd party web site confirming that you have been authenticated. In my case, the authenticator is not Facebook, but our company’s platform.

I tried to implement a sample 3rd party site using PHP and SAML. I chose PHP because my home web site uses it, and because our server is implemented in .NET, so by using PHP on the other side I can prove cross-platform interoperability.

SAML Request is a simple GET

I finished the SAML request part relatively quickly: it amounts to a large GET request where the XML is compressed and signed with the 3rd party’s private key. The request and the signature are supplied as separate request parameters. The URL has its share of long namespaces and long URL-encoded nonsense, but this is all relatively benign.

To verify that the SAML request came from an authorized partner, the SSO platform looks up the Issuer field from the request, finds the corresponding partner public key in its database, and verifies the cryptographic signature using a standard library.

SAML Response has an XML Signature

The SAML response tells the partner whether the user was authenticated. It can take multiple forms, but in my particular case it is a POST containing an XML document. The authenticity of the information within the document is verified by an XML Signature.

Creating XML Signature

Creating an XML signature involves many more steps than simply digitally signing a bunch of bytes. Like most other things designed by W3C and OASIS, an XML signature has so many variants and options on top of a standard RSA signature that it is basically impossible to get right without a library, and writing that library is not an easy task.

An XML signature allows signing many “references” at once. In theory a reference can be any URI; in practice it is either the entire current document (empty URI) or a fragment of the current document ("#fragmentID"). Each reference is canonicalized, and then its “digest” (cryptographic hash) is computed. The digest is a regular SHA1 or SHA256 hash; this step does not use any secret keys.

In the second step, all digests are gathered in a <SignedInfo> structure. Additionally, it may also contain the issuer’s public RSA key, or even an entire X.509 certificate. Besides the RSA key, a certificate contains issuer information such as location and name, and may be signed by a trusted Certificate Authority. SignedInfo is then canonicalized and digitally signed with the issuer’s private key. The SignedInfo with the digests and the signature is added to the original XML document, making it signed.

Note that SignedInfo may contain no key information at all, in which case the verifier will have to find the issuer’s public key from other sources, e.g. from a database of known issuers. Even if SignedInfo does contain a key, the verifier must ascertain that this key indeed comes from the issuer and not from some malicious man in the middle.

To verify an XML signature, one must:

  1. Ensure that the issuer’s public key is valid by tracing its certificate to a trusted authority, or by checking a database of known issuer keys.
  2. Canonicalize the SignedInfo node.
  3. Verify that digital signature of the SignedInfo node is valid.
  4. Canonicalize each reference, compute its digest and verify that it matches the digest provided in SignedInfo.

If all of the above checks out, the document is authentic and has not been tampered with.
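For what it’s worth, the verification steps above map onto robrichards/xmlseclibs roughly as follows. This is only a sketch based on my reading of the library’s source: the input file name is made up, error handling is minimal, and the exact API may differ between library versions.

```php
<?php
require 'vendor/autoload.php';

use RobRichards\XmlSecLibs\XMLSecurityDSig;
use RobRichards\XmlSecLibs\XMLSecEnc;

$doc = new DOMDocument();
$doc->load('signed-response.xml');        // hypothetical input document

$dsig = new XMLSecurityDSig();
$sigNode = $dsig->locateSignature($doc);  // find the <Signature> element
$dsig->canonicalizeSignedInfo();          // step 2: canonicalize <SignedInfo>

// Step 4: recompute each reference digest and compare to <SignedInfo>
if (!$dsig->validateReference()) {
    die('Reference digests do not match');
}

// Steps 1 and 3: obtain the key (here from <KeyInfo>) and check the signature.
// If the key came from the document itself, you must still verify it against
// your database of known issuers (it could belong to a man in the middle!).
$key = $dsig->locateKey($sigNode);
XMLSecEnc::staticLocateKeyInfo($key, $sigNode);
echo $dsig->verify($key) === 1 ? "Signature valid\n" : "Signature INVALID\n";
```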

The nightmare of canonicalization

Note that the document text may (and probably will) be sent in non-canonicalized form. So there is a pretty good chance that what was digitally signed is not byte-for-byte what you see in the document. For signature verification to succeed, both the issuer and the verifier must perform the canonicalization process, and they must do it in exactly the same way, or the digests won’t match.

The canonicalization algorithm is, to put it mildly, non-trivial. To make things easier for us, “exclusive canonicalization” is defined as “Canonical XML” with a couple of “exceptions”. Canonical XML is a hefty algorithm by itself: it converts self-closed tags (<tag />) to open-close tag pairs (<tag></tag>), and deals with white space, attribute order, namespace propagation, and the like. Exclusive canonicalization adds a paragraph or two of refinements that should be applied under simple conditions like “if the prefix has not yet been rendered by any output ancestor, or the nearest output ancestor of its parent element that visibly utilizes the namespace prefix does not have a namespace node in the node-set with the same namespace prefix and value as N”.

A “non-normative implementation” provided in the standard works in “many straightforward cases” but, by the standard authors’ own admission, is “constrained”.
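PHP itself exposes both algorithms through DOMNode::C14N(), which is handy for seeing what a signer actually hashed. A small sketch (the first argument selects exclusive canonicalization):

```php
<?php
$doc = new DOMDocument();
// Note the unsorted attributes and the self-closed tag:
$doc->loadXML('<root b="2" a="1"><child/></root>');

// Inclusive Canonical XML: attributes get sorted, self-closed tags are
// expanded into open-close pairs, and the XML declaration is dropped.
echo $doc->C14N(false, false) . "\n"; // <root a="1" b="2"><child></child></root>

// Exclusive canonicalization; identical on this input, but it treats
// inherited namespace declarations differently in document fragments.
echo $doc->C14N(true, false) . "\n";
```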

PHP libraries and online signature verifiers

I found two PHP libraries that purport to implement XML signature verification:

robrichards/xmlseclibs: the use case in the documentation shows how to create an XML signature, but not how to verify one. There is a verify() method, but it is not quite clear how to use it.

Marcel Tyszkiewicz’s php-XMLDigitalSignature also has a verify() method, but the examples only verify signatures that have just been created. It is not quite clear (even after looking at the source code for some time) how to verify a signature that came from someone else.

There is also FR3D/XmlDSig, which is a wrapper around xmlseclibs. Its code has a straightforward verify() method that uses xmlseclibs’s verify().

However, even with this library I could not verify the signatures, because it turns out that our server was not doing the canonicalization right.

XML signature verifiers

To sanity-check my documents, I wrote a console verifier program using the .NET Framework (perhaps more about it later).

I found two online verifiers. The first works well with documents signed in their entirety (ReferenceUri=""), but fails if the signature is only applied to a portion of the document (ReferenceUri="#mySignedElement"); my offline .NET version handles both cases fine. The second does not seem to work at all: it failed to verify any document that was recognized as good by the other verifiers.

Bottom line

Frankly, guys, this is ridiculous. Signature verification should not involve so much complexity. The “keep it simple” principle was violated big time here. And I still have not achieved nirvana by verifying my signatures in PHP, although the FR3D library does look promising.

December 2, 2015

Windows 10 Install

I did a clean install of Windows 10 on a brand new SSD.

Good things:

  • It was relatively straightforward and quick.
  • It also rebooted seamlessly without requiring me to remove the DVD so it could boot from the hard drive.

OK things:

  • Most hardware worked out of the box, but not the speakers.
  • I was able to solve the speaker problem with a few clicks (update a driver here, change properties there), without going to Google.

Bad things:

  • By default a lot of data goes to Microsoft: browsing history, typing information (whatever that is), et cetera.
  • It asked me a lot of questions about how I want to use Wi-Fi. The problem is, my desktop does not have Wi-Fi capability (shocking, eh?), but the installer never bothered to check.
  • It never asked me about the time zone, instead determining it “automatically” and putting me on the West Coast. Wrong!

November 11, 2015

The art of doubt

I run into the same story time and time again.

Someone stumbles upon a bug that they cannot explain. All plausible reasons are eliminated, but it still does not work. They call me for help.

When all plausible reasons are eliminated, what’s the next logical step? Start looking into less plausible reasons.

The amazing thing is, most people categorically refuse to “waste time” verifying their “obvious” assumptions, even when faced with a problem they cannot explain. People know that a custom port works exactly like the standard port, that function Foo() never throws exceptions, that messages always arrive in a particular order, that water in the Southern hemisphere whirls in the opposite direction, et cetera, et cetera. Furthermore, they put up significant resistance when advised to take actions that would verify that these assumptions actually hold.

“I see you are using the beta version of Bazooka service here. Let’s try production”
“But the Beta is exactly like production! I know it works just fine. It would be a waste of time!”
“Well, we don’t know what’s going on, maybe it is not exactly the same, let’s try it”
“Why should we try it? The problem must be something else!”

And of course, we switch to production Bazooka and it works. Bazooka Beta has a bug. If they were able to systematically test the “obvious” assumptions, they would not have needed me to find that out. But they can’t.

I guess this is more or less how religion works: experimentally verifying the dogmas is not only unnecessary, but highly undesirable. I can find the bug not because I am smarter, but because I am willing to systematically question “the obvious”, even if it sounds crazy.

November 5, 2015

Big Brother is seriously watching you

4:18PM: Went to Google Maps and entered “Tirana” in the search box. This was a random act of curiosity: we recently had a conversation about Albania at work, and I was wondering what the city looks like. I am certain I have not mentioned Tirana anywhere on the Internet for at least the last two or three months, and perhaps longer. I was using Chrome and was logged in to my GMail account.

5:22PM: Received an e-mail from a third-party site with the subject “Great Hotels in Tirana – Book Now!”

Conclusion: most likely Google shares my map search history (and G-d knows what else) with third-party sites, where it can be acted upon within minutes.