Maintenance and log management
KioskNet needs to be deployed in areas with little or no other infrastructure.
Therefore, one of our key design goals was to build a system that
could be maintained with the least
possible effort by semi-skilled field technicians.
We also desired a means to cheaply, securely, and reliably monitor both
ferries and kiosks from an NGO central office.
These two features would allow a handful of skilled workers at the central office,
helped by a larger number of field technicians,
to support hundreds or even thousands of kiosks and ferries.
In this section, we describe KioskNet maintenance and monitoring.
Maintenance
Routine software maintenance requires software running on kiosk controllers
to be upgraded and patched from time to time.
To avoid having technicians travel to each kiosk location
to install or upgrade software, we provide a sub-system for centralized
management and maintenance of kiosk controllers.
This mechanism, similar to the Disruption Tolerant Shell,
is described next.
In KioskNet terminology, an update is a zipped and signed file that contains an executable script, the recipients' GUIDs,
a unique sequence number, and all other files that the script needs for execution
(this is similar to a RedHat RPM).
When a KioskNet component receives an update, it first checks the signature. An authentic update is uncompressed in a pre-specified location, and the script is then run with root privilege in a forked shell. When the shell terminates, the update's sequence number is recorded along with the exit value of the controller script, and the output logs are submitted to the logging sub-system (described next).
The controller script performs the following steps:
- Checking pre-conditions: The script may check the sequence number records, along with other preconditions.
- Running the main task: In this stage the controller script can use any of the local files in addition to the files shipped with the update.
- Generating short and long logs: KioskNet requires that updates generate two log files. Short logs are immediately reported back to the central administration, potentially using the SMS control channel, and long logs are treated as normal system logs.
- Returning a status value: The returned status value is recorded along with the sequence number.
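The handling of a received update can be sketched as follows. This is a minimal illustration under stated assumptions, not the KioskNet implementation: HMAC-SHA256 stands in for the signature scheme, which the text does not specify; the archive member names `SEQUENCE` and `script` are hypothetical; a temporary directory replaces the pre-specified location; and the `runner` parameter is an illustrative hook for choosing the interpreter. Running with root privilege is not modelled.

```python
import hashlib
import hmac
import io
import subprocess
import tempfile
import zipfile


def sign(payload: bytes, key: bytes) -> bytes:
    # Assumption: HMAC-SHA256 stands in for KioskNet's real signature scheme.
    return hmac.new(key, payload, hashlib.sha256).digest()


def apply_update(payload, signature, key, records, runner=("sh",)):
    """Verify, unpack, and run one update. Returns True if the script ran."""
    if not hmac.compare_digest(sign(payload, key), signature):
        return False  # not authentic: discard without unpacking
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        # Hypothetical layout: "SEQUENCE" holds the number, "script" the task.
        seq = int(archive.read("SEQUENCE").decode().strip())
        with tempfile.TemporaryDirectory() as workdir:
            archive.extractall(workdir)
            # Run the controller script in a child process and wait for it.
            result = subprocess.run([*runner, "script"], cwd=workdir,
                                    capture_output=True, text=True)
    # Record the sequence number with the exit value and the captured output.
    records[seq] = (result.returncode, result.stdout)
    return True
```

In this sketch the `records` dictionary plays the role of the sequence number records that a later controller script may consult when checking its pre-conditions.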
Updates can reach KioskNet nodes over one of three channels. The normal DTN/OCMP
mechanical backhaul
channel is the preferred transmission mechanism. When this channel does not work,
the central office can choose to flood updates to all KioskNet nodes.
In the rare cases when a node is not reachable using either of these two channels,
a field technician can apply the update using a USB key. On detecting an authenticated USB key,
the controller reads the update on the key and applies it, just as if it had received it over the wireless link.
Logging
KioskNet has been designed to be robust and tolerant to failures.
However, both DTN and OCMP, which are critical software layers, are under active
development. Therefore, software failure is a distinct possibility.
When a failure does occur,
central office technicians require a means to collect and debug system logs
that does not rely on OCMP or DTN. We have, therefore, designed and implemented a
mechanism that floods logs across a disconnected network to the Internet
using opportunistic connections. We call this application log-flood.
Log-flood periodically compresses the contents of /var/log/,
timestamps the resulting archive, and
signs it with a sequence number.
It then periodically sends a broadcast ping to detect neighbouring KioskNet
components. When a neighbour is detected, the two components exchange log archives
opportunistically using the standard Unix rsync utility.
For secure transfer, we tunnel rsync over ssh using an
ssh key installed by the central office when configuring the KioskNet
component.
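The archiving and transfer steps might look roughly like the sketch below. It is illustrative only: the file naming scheme, the spool directory, and the function names are assumptions, signing is omitted, and the rsync command is only constructed, not executed. The rsync options used (`-a`, `--ignore-existing`, and `-e` to select ssh as the remote shell) are standard.

```python
import tarfile
import time
from pathlib import Path
from typing import List, Optional


def make_archive(log_dir: Path, spool: Path, node_id: str, seq: int,
                 now: Optional[float] = None) -> Path:
    """Compress log_dir into a timestamped, sequence-numbered archive."""
    # Hypothetical naming scheme: <node>-<sequence>-<UTC timestamp>.tar.gz
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    spool.mkdir(parents=True, exist_ok=True)
    target = spool / f"{node_id}-{seq:06d}-{stamp}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        tar.add(log_dir, arcname="log")
    return target


def rsync_command(spool: Path, peer: str, ssh_key: Path) -> List[str]:
    """Build the rsync-over-ssh command that pushes our spool to a peer."""
    # Assumption: the peer keeps its spool at the same path as we do.
    return ["rsync", "-a", "--ignore-existing",
            "-e", f"ssh -i {ssh_key}",
            f"{spool}/", f"{peer}:{spool}/"]
```

Under these assumptions, two neighbours would each run the command against the other, so that after the exchange both hold the union of the two spools.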
KioskNet components flood log archives to one another until the files reach
a gateway.
To prevent redundant flooding, the gateway does not flood logs to neighbouring ferries;
it simply forwards log archives to the proxy on the Internet. The
proxy subsequently acknowledges the delivery of each log archive and forwards an
acknowledgement file to the gateway. Acknowledgement files are then transferred
from the gateway to neighbouring ferries, and flooded back across the
disconnected network. When a KioskNet component receives an acknowledgement
file, it deletes the originating log archive. Acknowledgement files eventually
expire on each component. In this way, by mimicking DTN using rsync, we allow
robust log propagation.
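The acknowledgement handling on each component can be illustrated with a short sketch. The `.ack` suffix, the one-week lifetime, and the use of file modification time to age acknowledgements are all assumptions made for the example; the text does not specify them.

```python
import time
from pathlib import Path
from typing import Optional

ACK_SUFFIX = ".ack"               # hypothetical: ack for X is named X.ack
ACK_TTL_SECONDS = 7 * 24 * 3600   # hypothetical expiry period


def process_acks(spool: Path, now: Optional[float] = None) -> None:
    """Delete acknowledged archives, then drop acknowledgements that expired."""
    now = time.time() if now is None else now
    for ack in list(spool.glob("*" + ACK_SUFFIX)):
        archive = spool / ack.name[: -len(ACK_SUFFIX)]
        if archive.exists():
            archive.unlink()      # the log has reached the proxy
        if now - ack.stat().st_mtime > ACK_TTL_SECONDS:
            ack.unlink()          # the acknowledgement itself expires
```

Keeping the acknowledgement file for some time after deleting the archive presumably lets a component also discard copies of that archive that arrive later from neighbours that have not yet seen the acknowledgement.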