Mar 13, 2016

MongoDB v3.2+ silently breaks backward compatibility, affecting the behavior of capped collections

Capped collections are collections with some very special characteristics. Although they are used internally by MongoDB for replication (the oplog), they are also offered as a feature for general use, and their limitations and advantages are well documented.
Because of those advantages, developers use capped collections widely for many purposes. This usage is not a hack or the exploitation of an undocumented feature: capped collections are a well documented and well known feature of MongoDB.
I have seen capped collections used as:

  1. FIFO buffers
  2. Trigger-like mechanisms
    (as a matter of fact, a well known framework is based on this feature)
  3. PubSub architectures
  4. Message queues
    Yes, you can implement a message queue with MongoDB. Although it will not be as fast as ZeroMQ or some other tools, it has some attractive benefits such as persistence, built-in redundancy, limiting stack complexity etc. (My implementation of a PubSub/message queue)
Apart from the fact that you can't delete a document in a capped collection, the other major limitation used to be that a document couldn't grow in size as a result of an update. This is well understood by developers and could be addressed by various techniques such as:
- pre-filling field(s) with dummy values
- unsetting a field during an update operation to make space for new field(s) etc.
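
The pre-filling technique can be sketched as follows. This is only a minimal illustration in plain Python, assuming string fields that are padded to a fixed byte width so that every later value occupies exactly the same space; the names `pad_field`, `unpad_field`, `PAD_CHAR` and `STATUS_WIDTH` and the sample document are hypothetical, and no driver call is made here.

```python
PAD_CHAR = " "      # hypothetical filler; one byte in UTF-8
STATUS_WIDTH = 16   # hypothetical fixed width reserved for the field


def pad_field(value: str, width: int) -> str:
    """Right-pad value so that its UTF-8 encoding is exactly `width` bytes."""
    used = len(value.encode("utf-8"))
    if used > width:
        raise ValueError("value needs %d bytes, only %d reserved" % (used, width))
    return value + PAD_CHAR * (width - used)


def unpad_field(stored: str) -> str:
    """Strip the filler again when reading the document back."""
    return stored.rstrip(PAD_CHAR)


# Document as it would be inserted: the field is pre-filled to full width.
doc = {"job": "resize-images", "status": pad_field("new", STATUS_WIDTH)}

# Later update: the replacement value has the same byte length,
# so the document neither grows nor shrinks.
update = {"$set": {"status": pad_field("processing", STATUS_WIDTH)}}

print(len(doc["status"].encode("utf-8")),
      len(update["$set"]["status"].encode("utf-8")))
```

Because the value is padded to a constant width, the same trick keeps the size unchanged in both directions. The obvious trade-off is that real values ending in the filler character are altered by `unpad_field`.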

Now all of a sudden MongoDB, from version 3.2.0-rc0 onwards, decided to introduce one more limitation: "the document can't shrink in size" as a result of an update.
This breaks backward compatibility of a well publicized and widely used feature. Although I understand the technical reasons behind this decision as described here, I still can't accept that such a decision can be taken so lightly, without consulting the developers/ecosystem. If you follow the discussion in the above ticket in Jira, you get the impression that the only thing they really cared about was how this could affect their internal use of the feature, and since they found no side effects they went ahead with it.
To add insult to injury, the breaking change is not yet published in the manual as of today, nor can I find any reference to it in the change logs.
So it is left to developers to find out the hard way, when their code breaks after an upgrade.
I filed a ticket, then realized that a related ticket already exists, but no action has been taken yet.
Of course those tickets deal with the documentation only, since the code-breaking change is there and we have to live with it.
Too bad many developers didn't realize this change was planned. That is why a compatibility-breaking change should get as much publicity as possible, so that developers don't get caught off guard, since it is impossible for a developer to follow each and every ticket in Jira.

This is a sad story that I hope will not be repeated in the future, since some of the things that made MongoDB so successful, IMO, are:
a) developers are not caught by surprise
b) the high quality of the manuals

Dec 9, 2015

MongoDB 3.2: Now Powered by PostgreSQL ?

Having said some months ago that MongoDB is listening to developers/ecosystem, today I read an article, MongoDB 3.2: Now Powered by PostgreSQL by @jdegoes, that contradicts my own experience.
I will not categorize this article as one more of the mongophobic articles that make headlines from time to time here and there like:
Although I understand the writer has some personal interest, as he has invested in MongoDB analytics, I have to admit that he makes a case that looks technically very sound to me: "flattening out the data and using a different database to execute the SQL" is not the way to go for MongoDB BI solutions.
I also agree with many of his arguments when he tries to describe what is going wrong with MongoDB's ecosystem.

Still, I am not as pessimistic as the writer, and I hope that:
  • a) This connector to BI tools is only a temporary quick-fix solution and there are better tools coming in the pipeline.
  • b) MongoDB will listen to his arguments and come up with a revised policy regarding its partners and the ecosystem at large.
    The story of how $lookup ended up being part of the community edition makes me believe that MongoDB can do that.

Sep 1, 2015

MongoDB is listening to developers/ecosystem

Today MongoDB 3.1.7 was released, and the shell includes the new CRUD API.
I am happy that this feature was implemented and got into production so fast, and I feel really glad that I am the one who triggered the introduction of this new API.
It all started 5 months ago, when PyMongo 3.0 was introduced and A. Jesse Jiryu Davis, the coauthor of PyMongo, wrote about the new CRUD API in his blog, where I posted a comment complaining that this is a step backward unless it is implemented in the shell API as well.
Jesse embraced the idea and opened a ticket in MongoDB's Jira, then things started rolling. Today I was reading his blog again and was excited to realise that the suggestion was in production. I am also thankful to Jesse for his kind words and attribution, and feel obliged to repeat those here:
The official announcement focuses on bug fixes, but I'm much more excited about a new feature: the mongo shell includes the new CRUD API! In addition to the old insert, update, and remove, the shell now supports insertMany, replaceOne, and a variety of other new methods. Why do I care about this, and why should you? MongoDB's next-generation drivers, released this spring, include the new API for CRUD operations, but the shell did not initially follow suit. My reader Nick Milon commented that this is a step in the wrong direction: drivers are now less consistent with the shell. He pointed out, "developers switch more often between a driver and shell than drivers in different programming languages." So I proposed the feature, Christian Kvalheim coded it, and Kay Kim is updating the user's manual. It's satisfying when a stranger's suggestion is so obviously right that we hurry to implement it.
I'm so glad we took the time to implement the new CRUD API in the shell. It was a big effort building, testing, and documenting it—the diff for the initial patch alone is frightening—but it's well worth it to give the next generation of developers a consistent experience when they first learn MongoDB. Thanks again to Nick Milon for giving us the nudge.
On another occasion, a few days ago, I requested a feature from PyMongo's team. It was a trivial one and easy to implement: naming the threads that PyMongo creates, for debugging purposes. Of course I could have done it myself and submitted a pull request on GitHub, but it involved naming conventions about which I was not sure, so I preferred to post a feature request in Jira. Bernie Hackett responded "it seems a good idea", and to my surprise I saw this implemented in the next release a few days later.
What this story tells us developers is that we can request features/fixes and can expect to see them implemented in reasonable time, even if it is a major effort, provided that:
  • our requests are reasonable and technically sound.
  • we document those properly.
  • we use proper channels to communicate.
  • the company/organization has a culture/history of listening to developers/ecosystem, and these 2 examples prove that MongoDB is one of those.

May 18, 2011

New App Engine Pricing policy, the good the bad and the ugly.

There is not much good news from the cloud recently: Amazon’s AWS had a long downtime 3 weeks ago, and Google announced a very controversial new pricing model for App Engine during Google I/O 2011. I will concentrate on the latter, since a lot of developers keep asking what this really means to them.
To start with, the announcement was premature and took developers by surprise. Up to now many details are sketchy and a lot of things remain to be defined (remember: the devil hides in the details). I understand it was made in a hurry in order to be in time for I/O 2011, but this is not a good enough excuse. Google could have just announced the basics and held the official announcement until they were really ready to present a well defined pricing policy, preferably after some more consultation with developers and other platform stakeholders. Yes, there was a survey last February, but no results were published, and judging from users' comments at the time, it seems their concerns are not resolved by the new pricing policy.

Good things first :
Sure, there are some good things announced, so let me name a few:

  • App Engine is leaving Preview status so it becomes a mainstream product thus offsetting some of the worries that Google would possibly discontinue the platform.
  • It will come soon with a 99.95% uptime service level agreement for paying customers, which means it is mature enough for enterprise level applications.
  • The “Go” language is added to the stack along with Python and Java.
  • Back end (always on) servers are available now.
  • High Replication Datastore prices got a haircut, probably as an incentive for developers to move their applications from the Master-Slave Datastore.
  • Blobstore is available now to free applications, as well as back ends and other APIs. This is great since it allows newcomers to experiment with all available tools and APIs before they commit to the platform.
  • New interesting features are added to the roadmap, as well as promises for some badly needed tools (sockets etc.)

Bad things :

  • Free application usage is much more restricted, with new limits applicable to the Datastore API (max 50k operations per day), email recipients per day drastically reduced, and XMPP and Channel API quotas trimmed down drastically.
  • On-demand Frontend Instances (max 24 Instance Hours). On paper this looks good compared to the 6.50 CPU hours of the current quota, until we take into consideration that the new unit is instances, with a minimum charge of 15 minutes per instance per use, as was disclosed in the forums. This, combined with the new datastore quotas, makes it absolutely prohibitive for any free application (especially Python ones, which lack multi-threading capability) to serve reliably 24 hours a day even for a minimal amount of traffic, and defies the promised 5,000,000 requests / month free (well ... that was the promise made 3+ years ago when App Engine came to life).
  • Datastore departs from the actual-CPU-cycles-used model and joins a not yet defined model that charges per various datastore operations. Although I understand the motives here (more transparency, they claim, and they are right: CPU usage is hard for enterprise accountants to understand), I do not see how they can make it measurable and transparent given the many different types of datastore operations (reads, writes, key-only fetches, deletes etc.). For example, how will they charge for a normal fetch of 1000 entities vs an enumerated fetch where a coder trades memory for execution speed? This and many more questions remain unanswered by the released Pricing and Features preview table.
  • Pricing based on CPU usage is over; new billing will be on a per live instance per hour basis with a minimum of 15 minutes (reasons given: again more transparency, and the inability to charge for memory used by an instance while serving). Well, this is the issue that created a lot of backlash among the developer community, and with very good reason. How in the world we are moving from ms pricing granularity to 15 minutes is beyond my imagination. This is against the long-standing App Engine motto “pay as you go”. Now you have to pay going or not going. If you want to secure an instance on standby in case it is needed, or if 2 requests happen to come at the same time in a Python application, your application can serve those consuming just 100 ms of CPU time, yet you still have to pay for 15 minutes of instance time while the instance will probably sit there idling for 899,900 ms. This is not green computing.
  • Google’s answer is that the new scheduler will take care of those things to some extent, which I really doubt, but even if this comes true, why should we be charged according to the scheduler's efficiency, which we do not control?
  • Reserved instances with reduced pricing (an idea borrowed from AWS?) are a new toy. We have to see how it works out, but still it makes application utilization planning and billing much more complicated.
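
The arithmetic behind the 15-minute minimum in the billing bullet above can be checked with a few lines of Python. This is only a sketch of the minimum-charge rule as it was described in the forums; the function name is hypothetical, and how Google rounds usage beyond the minimum is not modelled because it has not been defined yet.

```python
MIN_BILLED_MINUTES = 15  # minimum charge per instance per use, as disclosed in the forums


def billed_and_idle_ms(busy_ms, min_minutes=MIN_BILLED_MINUTES):
    """Return (billed_ms, idle_ms) for an instance that was busy for busy_ms."""
    min_ms = min_minutes * 60 * 1000      # 15 minutes = 900,000 ms
    billed_ms = max(busy_ms, min_ms)      # never billed for less than the minimum
    return billed_ms, billed_ms - busy_ms


billed, idle = billed_and_idle_ms(100)    # 100 ms of actual work
print(billed, idle)                       # 900000 899900
```
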
Ugly things :

  1. The way the new scheme was introduced created a lot of confusion among App Engine advocates, while it helped its enemies spread a lot of FUD around. All in all, it was close to a PR disaster. Some App Engine engineers writing in the forums and talking at I/O 2011 helped calm down the crowds for the time being, but I am not sure about the end result.
  2. The tactics used left the developer community with the impression that GAE is concentrating on the enterprise and abandoning developers and small businesses. This may be unfounded, but if you take a look at the URL pointing to the new pricing list you can see it written, loud and clear: “enterprise/appengine/appengine_pricing”.
  3. It looks like, after abandoning App Engine for Business, Google tried to accommodate that project into the existing App Engine platform by squeezing some of the breathing space used by existing developers.
  4. The new pricing model looks more like an IaaS than the PaaS service which GAE claims to be.

So what do the Years Ahead for Google App Engine really look like?
I have no simple answer to that, and I do not want to jump to premature conclusions until the dust raised by the latest announcements settles down and more concrete pricing policy details emerge. Unfortunately this is going to take some time. Meanwhile, I feel that unless there are some changes to the policies just announced, and some pleasant surprises by the GAE team when all this gets finalized, GAE’s future looks grim.
Do not get me wrong: I am an early adopter and advocate of App Engine and I want it to succeed, but I am a grown-up man and can’t be turned into a fanboy applauding everything that comes out of the Googleplex.
I am disappointed, but I hope things will turn out better than they look now, and I do see some signs on the horizon that tell me this is happening already, as engineers are trying to take back their baby from the short-sighted accountants and the GAE4B group who hijacked the plane.
I fully understand that accountants do have a place in managing this business and helping make it sustainable and profitable, something that will benefit Google’s shareholders and developers alike. My objection is that they do not really understand the product, its strengths and virtues, so it is up to the engineers to communicate those to them and only then finalize a pricing policy. From what I read in the groups and social media, developers are ready to support the product and willing to pay double or even triple the price they are paying now; what they really do not like are new policies that ruin the work and time they spent trying hard to optimize their code.
App Engine's main attractions are:

  • Automatic unlimited scalability. Do not spoil this by introducing policies that fight it. I am referring to reserved instances and to passing the responsibility of fine-tuning the scheduler to developers.
  • Pay as you go. Billing by the instance, especially at this coarse granularity, is against this principle. Furthermore, it violently repositions App Engine as more of a virtual server kind of thing, closer to an IaaS than a PaaS service. I believe this is also a bad marketing policy, because App Engine can never compete with IaaS offers like AWS and the vast ecosystem that exists around these products. Of course App Engine’s people argue that a managed environment can’t be compared to an unmanaged one, but their actions make this differentiation very difficult.
  • Startup and small business friendly. That used to mean a smooth, gradual transition path between the free and the pay-per-use system. The new policies destroy this by drastically lowering the quota of the free package and steeply raising the entrance fee of a paying account. This gap has to be bridged somehow. I am not talking about the $9 per month fee, which is reasonable, but about the accumulating costs of running instances, datastore operations quota etc. Perhaps a step in this direction that can be considered is the introduction of an intermediate pricing level between free and fully paying applications, for developers who are not really ready for prime time and need neither an SSL certificate nor an SLA contract.

Steering GAE’s ship toward the enterprise is not inherently wrong for most developers, since it provides opportunities for them. But putting most of the effort into the enterprise while individual developers feel abandoned (rightly or wrongly does not really matter) is wrong and is not going to work in the long run. Not many Fortune 500 type customers will join unless they know a healthy and growing ecosystem exists around the product. I do not have the numbers, but Google says there are around 100k active developers; although this is not a small number, it still cannot be considered a game changer. So even if the product development strategy is looking to big enterprise customers, the timing is wrong; priority at this point in time should be given to developing the ecosystem.

I want to believe all this is a nightmare that will pass soon, as GAE’s team starts to understand what is happening to their ecosystem and steers clear of trouble, and my plane keeps on "flying into the clouds".

Update May 18, 2011
Google's Gregory D'alesandre has posted a "FAQ for out of preview pricing changes" where he tries to answer some of the questions and clear up some of the mess the new policy has created. There are also some definitions there of what constitutes what in the new App Engine speak; it looks like we have to study a new science and a brand new terminology before we can proceed.
IMHO this is a Sisyphean task: the new policy has opened Pandora's box, with questions popping out of it at a much faster rate than they can be answered.
Update May 19, 2011
Lots of talk and fighting in the forums with developers comparing App Engine vs AWS vs Rackspace vs any_other_VPS_service_on_earth.
There was no such talk before, coz App Engine looked different from those products both in terms of pricing and in terms of features.
Now, thanks to the latest news, it managed to be transformed into Yet_An_Other_VPS overnight, and I cannot blame developers for those comparisons.
To tell you the truth, I have seen that coming ever since I saw SQL databases on the roadmap.
IMHO these are signs that we are on the wrong path, but .... then again who am I to give advice ?
or ... if I can quote @saidimu : "A true mark of a dysfunctional platform: in-fighting among developers who formerly only sang praises of the platform. #AppEngine"
Update May 20, 2011
Plenty of new questions waiting for replies.
A really interesting one by Raymond C: "Is MapReduce still a flexible solution on AppEngine under the new pricing model ?"
My answer: probably not; the new pricing model makes mapreduce operations a no-no. The price will be prohibitive for such operations, especially ones that depend on many instances to run a job fast, unless those used to take hours rather than minutes to complete. So I guess the team can drop the "reduce" part and the query-based mapreduce things from the roadmap; the new model renders those irrelevant for most use cases. Also, drawing a "danger - high $$$" icon as a precaution next to the copy/delete model buttons on the control panel would be a good idea.
You can read a great summary of what the new changes bring to App Engine by johnP here

Apr 9, 2011

Google Maps API rate limiting and Google App Engine again

@bFlood : Maps Premium is an expensive service, usually used by non-public-facing, password-protected web sites, while what we are talking about here is the free Google Maps service, where G is penalizing all apps running on top of GAE, since they have to share GAE's pool of IP addresses.
@Ikai : what you write applies to the old Maps (V2) API, where you can obtain an application authorization key and applications are rate limited based on this key, although I feel that some kind of IP-based limitation exists also.
Maps V3, which is the way to go especially for mobile apps, does not require an application key; instead there are rate limitations based just on originating IP addresses. This puts GAE-based apps at a disadvantage, since we have to share those addresses with all other GAE-based apps using the service.
(see : post published in App Engine group)

Mar 25, 2011

App Engine's Email service future

From time to time I have seen posts in App Engine's groups from people complaining about Email service glitches. Some of those are real, some not so, sometimes coming from people who did not bother to read the documentation or are just ignorant of what an Email service is all about.
The App Engine team's response alarmed me a little.
IMHO it is not a good policy for GAE to abandon(?) services midway instead of improving and enhancing them.
So I wrote the following post, and now I am waiting to see how the conversation evolves.

I agree on most of what you write above, and I understand that you prefer to focus on more important things, also having run Email services for enterprises in the past I do know it is not trivial.
But ....
still I believe Email service is a major asset for GAE and dropping it (or anything to that effect) will constitute a major blow to App Engine.
GAE offers a limited subset of services compared to what a LAMP box or an IaaS box can offer, but being a PaaS it provides trouble-free operation and automatic scalability.
Email service is usually part of any web operation, so by dropping it you further limit the number of potential applications that fit well into what GAE offers.
Of course, developers can look into alternative options, but this makes our life difficult since we have to integrate several other third party services in order to make a working web solution, i.e. setting up multiple accounts, feeding traffic back and forth to other services, having to monitor and deal with one more possible point of failure. All this defies to some extent the benefits of GAE as a PaaS.
Also, dropping a service at a time when the competition is adding services will send the wrong signal to App Engine's developers/users ecosystem, and bearing in mind that G is associated with the best email service, it can possibly turn into a PR disaster.

Furthermore, being a regular reader of the groups and having followed App Engine since the very early days, I do not see that Email service has raised a lot of issues. I believe that for most people who know what they are doing and do not abuse the service, it works quite smoothly. Some of the issues raised (mainly spam flagging) a) happen to the best of Email services and b) are addressed by well known techniques and practices described by others here and elsewhere.
In conclusion:
I would welcome any measure taken to fight service abuse like using GAE primarily as a mail server - we all understand that this is not what GAE is all about.
Instead of dropping the service I would prefer to consider:
a) put false positive spam flagging issues under the responsibility of developers.
b) exclude the service or part of it (like delivery assurances) from the future SLA offer.
c) think about the technical possibility to integrate it to gmail which is the *most* reliable email service in town.

Feb 15, 2011

Selecting distinct entities across a large table

I have faced this kind of problem some time ago.
I tried some of the solutions suggested below (in-memory sort and filtering, encoding things into keys etc.) and I have benchmarked those for both latency and CPU cycles using test data of around 100K entities.
Another approach I have taken is encoding the date as an integer (days since the start of the epoch or days since the start of the year; same for hour of day or month, depending on how much detail you need in your output) and saving this into a property. This way you turn your date query filter into an equality-only filter (which does not even need an index to be specified); then you can sort or filter on other properties.
Benchmarking the latter solution, I found that when the filtered result set is a small fraction of the unfiltered original set, it is 1+ order of magnitude faster and more CPU-efficient. In the worst case, when filtering does not reduce the result set, the latency and CPU usage were comparable to the previous solutions.
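
The date encoding described above could look roughly like this. It is a plain-Python sketch, not datastore code: the function names and the `day` property are hypothetical, and the list comprehension only stands in for the equality filter you would put in the real query.

```python
from datetime import date, datetime

EPOCH = date(1970, 1, 1)


def day_number(d: date) -> int:
    """Whole days since the start of the epoch."""
    return (d - EPOCH).days


def hour_number(dt: datetime) -> int:
    """Whole hours since the start of the epoch, if hourly detail is needed."""
    return day_number(dt.date()) * 24 + dt.hour


# Store the integer next to the original timestamp when the entity is saved.
entities = [
    {"name": "a", "created": datetime(2011, 2, 14, 23, 50)},
    {"name": "b", "created": datetime(2011, 2, 15, 8, 5)},
    {"name": "c", "created": datetime(2011, 2, 15, 17, 30)},
]
for e in entities:
    e["day"] = day_number(e["created"].date())

# "All entities of 2011-02-15" becomes an equality test instead of a range.
wanted = day_number(date(2011, 2, 15))
same_day = [e["name"] for e in entities if e["day"] == wanted]
print(wanted, same_day)
```

Because the date part is now an equality, the inequality or sort order stays free for another property.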

Hope this helps, or did I miss something?
Happy coding-:)