Tuesday, August 12, 2014

Open issues

This is the last article for the moment. It is about things I don't know how to solve or have not yet had time to solve.

Double transactions
I've already mentioned this issue but I still don't know what is the root cause and how to mitigate it. The check to not insert already inserted transaction prevents creation of ~25 double transactions per day. Still, I see there is usually one double transaction per day (that is per ~2500 transactions). I am quite confident this is a sever side issue as __createdAt times are very close to each other.

POS does not work on iPhone
For some reason, the application does not work on iPhone. The main page is displayed, user is correctly redirected to the Google's sign in page but after successful sing in, she is redirected back to the main page which does not proceed to the next page. I've briefly tested the issue on borrowed iPhone and found out it is caused by combination of AWS and Durandal. Pure AWS page works just fine and I have not found any indicia that Durandal does not work on iPhone. I believe I just need to borrow the iPhone for longer time to fix this issue.

Built-in *.azurewebsites.net SSL certificate
The server is hosted on azurewebsites.net domain and uses built-in wildcard certificate for secure access. Although I have read several articles that using wildcard certificates is bad, I don't think it is bad in our case. It is a REST service where both endpoints are fixed. If you disagree, please let me know an example of real threat.

For reference:

Authentication token expiration
The authentication token expires approximately every month. Which is the reason why I still have not developed automatic re-authentication when the token expires. It hasn't had enough priority.

There is a way how to do it generally for all requests by using withFilter (usage example). Or I will just sign out and in every couple of days. Whatever will be easier to implement.

No tests for server side JavaScript
I have tests for client side JavaScript and for SQL queries but I do not have any tests for sever side JavaScript. The main reason is there was usually just a few lines of simple code so I did not bother to create a testing set up. But this is slowly changing so creating server side JavaScript tests climbs up to the top of my to-do list.

Saturday, August 9, 2014

Lessons, Surprises, Mistakes III. - AMS

Next set of lessons, surprises and mistakes; this time about Azure Mobile Services (AMS).

Download location
Maybe I am blind but I cannot find a download link for the actual version of MobileServices.Web.js script. WinJs versions are easily available via nuget. Digging into tutorials I have found a location with all Web versions. Comparing it with nuget page, the latest available Web version is 1.2.2 (although WinJs is already at 1.2.3):
http://ajax.aspnetcdn.com/ajax/mobileservices/MobileServices.Web-1.2.2.min.js

Update 24.1.2015: There is a changelog from which you can see that at this moment, the latest Javascript SDK is 1.2.5.

Shared Scripts
There is a possibility to manually upload a script to shared folder on server:
azure mobile script upload shared/helper.js
You can then use it via require in other scripts:
var helper = require('../shared/helper');

Unfortunately, the scripts there are periodically deleted. It turned out that manual upload is not supported and you can only use this feature if you enable source control. See more info on forum.

Stuck scheduled tasks
I've managed to get the scheduled AMS task into state where it ignored all my updates. It ran the old script version no matter what I did. Even deleting the task and creating a new one with different name did not help. I asked on forum and got quickly an advice how to work around the issue by restarting the whole mobile services.

Wrong authorization
Although I require an authorized user when accessing all AMS endpoints, it took me a while to spot an error in my implementation. I did not refuse cashier operations from unknown users. You couldn't do it from UI, of course. But any script kiddie could download the POS, get the application id, authorize with any google account, and send as many transactions/shifts as wanted. The fix was simple - only allow server operations to known users.

SQL Tables Initialization
I let AMS to create my tables. It was a mistake. Only when I started to have performance issues with getting report data, I realized what types are used for the data. For example, as JavaScript does not have integers but only floats, all integer values were stored in float columns. Not very fast and index-able… I have changed type of some columns but next time I will create proper SQL init script.

SQL query optimizations
I had not written complex SQL query for a long time so the first version of the report query was quite embarrassing. There were too many inner queries and other issues. After a consultation with my friend, I speeded up the query by
  • Use null id trick to select the whole row with max value
  • Use union all instead of union because all sub-queries are distinct
  • Converting float columns to integer columns
  • Index on date column did not help because server still clustered index on __createdAt column which has very similar content to the date columns

Monday, August 4, 2014

Lessons, Surprises, Mistakes II.

Next set of lessons, surprises and mistakes; this time various issues in no particular order.

Hardware issues
We use some cheap Android Sencor Element tablets. From 10 tablets, 2 were faulty and we had to replace them. You can recognize faulty tablet by its freezing. Yet there were also more interesting manifestations (appeared only on one tablet):
  • Time speeding up by minutes per hour. That produced transactions and shifts with wrong time. Automatic time synchronization does not help as I assume it is synchronized only once per day.
  • Touch events were not registered in some areas of the screen or were shifted (touching on one place triggers action from different place).

User feedback
It's always good to observe and talk to your very end users. Based on seeing the POS in action and talking to one cashier, I realized the buttons were too small. The cashier had hard time to aim at the right button. So I've made all buttons and texts as big as possible:


Wrong development browser size
I use Chrome Resolution Test plugin to easily resize browser window to tablet's resolution (1024 x 786). But I made a stupid mistake. The POS web page on tablet spreads across the full tablet screen (minus Android menu bar) while the plugin resizes the whole browser window to the specified size. So page areas were different. To have the same area in browser and on tablet (1024 x 720), the correct resolution for the plugin is 1034 x 826.

Browser zooming
Cashiers had also problem with browser zooming triggered by double tap. For a long time, I did not know how to disable it because I thought it is as a browser functionality and looked for an option in browser. Eventually, I've found out it is a HTML5 meta viewport property and you can turn it off by:
<meta name="viewport" content="user-scalable=0" />

JavaScript caching
I was surprised that browser loads JavaScript files from cache even when you ask to bypass it (e.g. by Ctrl+F5). I have not found a way how to force reloading JavaScript files. As a fix, I append a timestamp to main.js in the release package (and index.css because it can be cached too).

Automatic updates
Uptake of a new version was slow. It was normal that an old version was used several days after the new version was released because page refresh was needed on POS and only company employees could do it. Luckily, there was no critical update needed.

I've implemented automatic updates like this: successful ping request returns the actual server version. When it comes back to a tablet, it means there is internet connection (necessary because of mobile internet - see the previous post). When the server version does not match with the client version and there is no cashier signed in, the web page is refreshed which performs the update. The server version is set manually as part of release deployment.

Friday, August 1, 2014

Lessons, Surprises, Mistakes I. - Internet Connection

This is the first of several posts about my mistakes, surprises and lessons learned. I'm always eager to see other's experiences so I can learn from them. Here are mines so you can learn from me.

Going Live
1st April morning. It kind of works. Most kiosks are online and I see signed in cashiers and some transactions. Some kiosks are offline. I guessed internet connection problems.

It was very similar during the next couple of days. There were rare moments when all kiosks were online. Most of the time at least one kiosk was offline. We discussed the cause of the issues and it was clear that the culprit was the internet connection. The mobile internet is ***. It disconnects every now and then, it drops packets, it is sloooooow. Especially in the kiosk which is a big metal box. I had to mitigate the problems.

Status emails
First step was to have better visibility of the problems. Having status page with kiosks overview was nice but it was not enough - nobody periodically checked it. That's why I've implemented status emails. AMS offers scheduled jobs and I use them to periodically check kiosk statuses and send warning email with problematic kiosks. Initially, the check was every 30 minutes but there were too many emails and nobody cared about them. Currently, the status is checked once per hour during the opening hours and a warning is sent only when kiosk does not ping for more than one hour.

Automatic router restarts
We've discovered the biggest issue in routers. They disconnect from the internet quite often. Luckily for us, routers have monitoring functionality. They can ping specified servers and if no reply comes back, they restart themselves. Once turned on (with 30 minutes threshold), the internet reliability increased dramatically.

Inability to sign in/out
Pages in Durandal are stored in separate files. That means there has to be internet connection to successfully navigate to another page (e.g. starting/ending shift). I thought (but not tested) that browser would cache all pages. But I was wrong and with such bad internet connection, some cashiers could not sign in or out - the screen just turned white because the target page could not be loaded.

The solution is to build the whole app into three files (index.html, index.css, and main.js) which are loaded at once during POS initialization. To do that, I use grunt-durandal and grunt-uncss tasks when creating release package. The only communication then is sending data to server which is backed up in local storage.

Double transactions
Another sign of bad internet connection is double transactions. POS sends data, data is written into database, but the confirmation is lost on a way back and POS sends data again. I have actually seen several triple transactions in the database. To solve this issue, I have added a check before inserting a new data into database.

It did not fully mitigated the issue though. I'm still seeing some duplicate transactions. AMS provides createdAt column and according to this column these duplicate transactions are usually milliseconds apart or even at the same time. I have no idea how these duplicate transactions are created. It can be strange network behavior, bug in AMS…

Success transaction visibility
This is a small tweak. POS displays whether the last server operation was successful or not. It's not for cashiers because they don't care. It's mainly for company employees when they check the kiosk on site to know that everything is OK.

Summary
After implementing above fixes, the reliability is quite good. It is rare to see offline kiosk. There is still some work I'd like to do though. For example, I do not properly handle authentication token expiration so somebody has to manually re-login kiosks every ~30 days otherwise they would be offline because of missing authentication.

Tuesday, July 29, 2014

POS Architecture

The initial design proved itself to be working nicely so I've stuck to it:
  • Android tablet in each kiosk connected to the Internet via 3G Wi-Fi router. Tablets with 3G modem are more expensive and every kiosk had already been equipped with 3G Wi-Fi router because of the previous system.
  • HTML SPA (Single Page Application) as client POS application running in the browser on the tablet
  • JavaScript Azure Mobile Services (AMS) with SQL database as server
The software architecture is guided by using Durandal (MVVM design pattern) on the client and AMS on the server. I decided to use Durandal because I like the testability of MVVM applications and AMS because of its easy use.

POS as web page
The whole POS software runs as a HTML5 application in full screen browser. Cashiers cannot use other apps on the tablet because the special tablet holder hides main Android menu, all hardware buttons and prevents edge swiping which is the only way how to control the browser. Here is the picture of the real device:

POS

Authentication
All AMS endpoints require authenticated user. AMS supports several authentication providers. I've chosen Google provider as all company employees have Google account. Special Google account has been created for every kiosk. The kiosk account is signed in in the POS browser all the time so no cashier needs to know its password.

Authorization
There are only two groups of users:
  1. Cashiers
    They can only insert data (transactions, shifts)
    . Every cashier has assinged a PIN which is used to start her shift.
  2. Company employees
    Additionally to cashiers, they can also edit some data (cashier management) and read the data (reports). As there are only 3 employees at this moment, their Google ids are hardcoded in scripts.

Offline mode
Because kiosks are connected to the Internet via mobile network, I expected they can be offline from time to time. That’s why every operation is first saved to browser local storage and only when server confirms it is saved in the database, it is deleted from the local storage. When confirmation comes, then next unsaved operation is sent to the server until the queue is empty. Operations are sent one by one to not trigger multiple requests at the same time (imagine sending all transactions from one day at once).

Operations
As the launch day approached, I started to wonder/worry how to monitor the health of the kiosks. There is no way to connect to them from the Internet and check the POS status. So I did it other way round. Every POS pings the server in specified interval (10 minutes) and reports its status (signed in cashier, locally stored operations, POS version). Company employees and I have access to the web page which displays statuses of all kiosks. The most important information is whether kiosk is alive (pinging) or dead. Repair is simple - restart tablet and/or router.

Release package
Grunt task for creating release package contains steps to uglify JavaScript and uncss/mincss (uglify for CSS rules). Initially, the Durandal pages were separate files even in the release package but I had to change it (more on that later). Now, the whole POS consists of three files: index.html, index.css, and main.js.

Sunday, July 27, 2014

Development Environment

I took this project as an opportunity to learn recent web development techniques. This is the list of buzz words I've used:

First, I used Bower to download Durandal and Bootstrap and prepared some basic Grunt configuration (run web server for development). Then, I dived into Durandal and learned how to structure the application. I tried to scaffold the app structure with existing Durandal's Grunt configuration but eventually, I created everything from scratch using the scaffolding structure as a tutorial. One learns more with this approach.

Having basic app structure (2 pages wired together) and grunt configuration, I created my first Jasmine test. I wanted to practice TDD for development but I found out that it does not work well for me. I did not have any clear vision how things would work and I did a lot of changes during the development. That's why I have just classic unit tests. Once I am satisfied with a feature, I create unit tests for it to make sure I won't break it in future.

Although Durandal uses AMD, I worried about referencing source files from tests. The research, experimentation and creating Grunt configuration took me some time. But it works nicely in the end. Grunt's Jasmine task references two require.config.js files. The first one is the main application require.config.js file and the second one is Jasmine's one with baseUrl pointing to application's folder. All unit tests run when any of my JavaScript files change.

The last thing to figure out was how to test SQL queries. I set it up only later when I was developing the report and it was obvious that with manual testing I would not be able to cover all the corner cases properly. I use T.S.T. and I set it up like unit tests. I have special query file which contains just the SQL query. Using simple script, this query is inserted into production and testing JavaScript template files. If any of these files change, tests are triggered. Of course, there is a Grunt task for firing it up.

Note at the end: web technologies change rapidly. Today, instead of Durandal, I might use just Knockout 3 based on Steve Sanderson NDC presentation:  Architecting large Single Page Applications with Knockout.js.

Saturday, July 26, 2014

POS for Ice-cream Kiosks

I have a neighbor who sells ice cream. He owns several kiosks in various towns around the country.  Two years ago we talked and he asked how hard would be to develop a simple POS (Point Of Sale) for him. He was very unsatisfied with his solution. He had a general cash register from which he had to download all transactions every day otherwise they would disappear. It had functionality he did not need but also lacked some functionality he needed. And it was expensive.

I worked for Radiant Systems so I know a thing or two about POS systems. This one is very simple. All it has to do is to send transactions to the server. The items and prices are fixed during the year which means there is no inventory management. The only changing variables are cashiers but their management is simple - we need to track just name, pin and kiosk where the cashier work. Reporting consists of one report with sold items and shift durations. Overall, quite simple functionality, right?

Then I started thinking about the server requirements which is a grey area for me. I don't know enough about it, especially about operations. I played with Node in Azure VM a little bit but I was not sure enough. The project went to sleep at the end of the year 2012…

To be resumed a year later. My neighbor was more angry about the cash registers. And there was a new feature in Azure - the Mobile Services. I don't need to deal with the server details any more. Exactly what I wanted. So in the autumn 2013, we agreed on an attempt to implement it and I started to work on a proof of concept.

Following the KISS principle, the initial design was as follows:
  • Android tablet in each kiosk connected to the Internet via 3G modem
  • HTML SPA (Single Page Application) as client POS application running in the browser on the tablet
  • JavaScript Azure Mobile Services with SQL database as server

I prototyped an initial solution during the rest of the year 2013. Three screens (sign in, cash box, report), all in one big awful HTML page. But that is what prototypes are for - to see whether it can work, then throw it away and do it properly. Here is the main screen from the prototype:


We tested it on a tablet and it looked good - a green light for my little, fun, real, side project. I had three months to properly develop it during my spare time. Yes, it was optimistic :-)