Author Archive: Eduardo Jezierski
If you buy into the idea that information about your body is ‘owned’ by you, then it’s obvious you should get to have a say about what happens with that data once it leaves your body. Unfortunately, today there is no clear and easy way to express how you want people to use your data. In most cases, you never even get asked.
As a result I see many eHealth project implementers making cavalier decisions about data management that impact the rights and privacy of populations, patients and doctors alike. People grab, share, and analyze information they may not have the rights to, sometimes even by accident. There are so many incentives to produce and move data freely: folks could use the data for the promise of big data analysis, research publications, commercial product improvement, efficiency, medical research, marketing, grant writing, or operations analysis. Violating rights and expectations is especially easy because these eHealth projects tend to have many players involved, each with their own language, objectives and culture.
Unfortunately, there is a lack of frameworks and common language in which to discuss rights to share and use health data. Academics do IRB reviews but rarely understand licensing terms. Doctors use eHealth systems but are not information specialists, and typically don’t think beyond clinical and public health use. Private systems may or may not have end user license agreements (EULAs); when they do, they impose a one-size-fits-all policy, and nobody reads the EULAs anyway because they are complicated and each one is so different. Doctors in the USA mention HIPAA, and folks from other countries snicker. Governments roll out an eHealth HIV project and the data ends up on some intern’s laptop in California because he happened to help with the database system. As you can see, this information flow rarely involves the consent and opt-in of the population and the health providers in whom the population places their trust.
Imagine this situation: a mom shows up at a clinic and gets a diagnostic test for her child. Who has which rights to the data? What about the child? The mother? What about the doctor who runs the test, the manufacturer of the diagnostics machine, the clinic where the doctor works, the NGO that implemented the diagnostic program, the funder that funded the NGO and bought the diagnostic machine, the government of the country, the WHO?
I was recently on a panel about surveillance at the AAAS Annual Meeting. It was a good chance to catch up with Nigel Collier from Biocaster and to hear some poignant questions from Vint Cerf, one of the ‘fathers of the internet’. We had representatives of all sorts of surveillance work, from anti-terrorism to meme propagation to infectious disease tracking; and there I presented a sketch of an idea:
What if we created a simple licensing framework that made it clear what rights and constraints go with different bits of your health data as it gets stored, aggregated, and analyzed?
If Creative Commons licensing enables wide sharing of creative work under predictable terms that respect the intent of the creators, could a “Health Commons” do the same thing for health data? What can we learn from the evolution of sharing of information on the web and apply it to this critical space?
I would like to one day be able to share information about my health on some mobile app, a wellness site, or a diagnostic procedure, and specify restrictions such as whether the data may be linked to my other records, aggregated, or used for health, science, or commerce.
Sometimes I would say it is OK to link the data to my other records, sometimes not: it all depends on the context and what it is that I am sharing. The important thing is that I am in control of data about my health.
Or, conversely, if I am participating in some survey, taking a diagnostic test, or going to a new health care provider, I would like to know what license will be imposed on my data, so I can make an informed decision about whether to actually participate or not.
How would it work?
The idea roughly sketched would be to:
- Treat personal information as data covered under copyright law, with the patient/originator as the original copyright holder.
- Build a licensing scheme that grants explicit rights and restrictions to receivers of that data.
- Make sure the rights and restrictions are worded so that re-licensing and aggregation have clear and simple rules.
- Embed licensing options into all relevant diagnostic and medical record platforms, as well as wellness websites, social networking sites, and so on.
- Communicate & advocate the framework especially building conscience in the public.
I don’t know if the license terms I invented for the example above (linking to other personal information, aggregating, and use for health, science, and commerce) are ‘the right ones’. I would love to hear more ideas for the sorts of constraints and freedoms a simple license would allow.
Maybe other terms would be more important. Are there levels of anonymization I could specify for my data in aggregate form? Are there clauses for natural disasters or crises that would allow me to temporarily bypass privacy concerns in order to help me reunite with my family? The nice thing about the model is that it provides a framework in which to resolve these questions.
The genius of Creative Commons was to choose a few simple rules that would be easy to understand for many, instead of trying to make it a comprehensive license for all cases and preferences; a Health Commons would have to emulate that approach. Each time you see the Creative Commons icon it carries beneath it a smart and legally sound set of terms and licenses.
If anyone feels inclined to develop this further please let me know. The idea needs work from copyright attorneys, IP wonks, IRB data geeks, healthcare providers — and most importantly, anyone in the general population who would like to have a tool like this. I am especially interested in the licensing framework required for safe sharing of personal health information. I have seen “Health Commons” used to describe a knowledge commons with intellectual property such as genetic sequences, but I think much more focus is needed on the incoming tidal wave of integrated personal data from electronic records, sensors, and surveillance.
I think especially large funders and companies who sit at the intersection of humanitarian field work and scientific investments need to improve their frameworks to make sure their programs take an ethical approach to protecting the rights of their beneficiaries. In the meantime, maybe they should get into the mindset that they are just storing borrowed copyrighted information…
Please leave comments if you have an opinion on the topic.
We use Creative Commons extensively in our work at InSTEDD. Most of our presentations are explicitly licensed under CC BY-NC-SA 3.0 (Attribution, Noncommercial, Share Alike), as is the material of this blog.
Like InSTEDD, Creative Commons is a non-profit organization that can always use your support: consider donating to them here.
At InSTEDD right now we are involved in a wide spectrum of projects. Many of our projects are short and grassroots-driven, such as our recent work with UNICEF using aerial photography to map environmental vulnerabilities in the slums of Rio de Janeiro (see blog from our iLab Latin America: Part 1 and Part 2). In addition, many of our projects are longer and more complex, as we help Ministries of Health or large NGOs lay the technology foundation that helps them better serve their country over the long run, such as our work in Cambodia and Rwanda.
These latter projects require more patience to see the impact and are sometimes more difficult to work on due to the timelines and the large number of stakeholders and interests involved. But, when done right, the long-term impact can be transformational.
We are lucky to be involved with Jembi, the Regenstrief Institute and others on implementing such a project in Rwanda, under the leadership of Dr. Richard Gakuba, who works for the Ministry of Health as the National eHealth Coordinator. Dr. Gakuba is leading the charge in transforming Rwanda’s Health Information System. A big part of this modernization is the implementation of shared health records, terminology services, and facility and provider registries. When this phase of the project is done, Rwanda will have a variety of independent, but interoperating, web services that implement these capabilities. It may sound like a 2002 buzzword to call it a “fabric”, but it evokes the right image: a supporting net of independent but inter-woven services.
Having a fabric of services makes a lot of sense in this context, starting with the impact of this architecture pattern on human and organizational dynamics. Distributing the ownership, management and maintenance of different areas of data is appropriate when the organization itself is made of different departments with different workflows, incentives, and management styles. Centralizing all these processes and information into a monolithic block would cause a collapse as the only entity able to change things would rapidly become a bottleneck. Just the provisioning and maintenance of one big system can be an insurmountable obstacle. Modularity allows piecemeal evolution.
Having a “fabric” of services has many advantages:
Creating an Ecosystem — services and their data can be used by others to create new “value-add services”. For example, a facility registry could be extended with a call-in system to let immigrants get directions to a nearby clinic in their own dialects. Enabling these value-add services (including mashups and client apps) allows the ecosystem of users (the general population, local NGOs, the international ICT community, the national and international private sector) to act as a gap-finder for better health services and businesses.
Potential for Big Data — Services provide a quantum leap over the traditional approach of keeping Excel spreadsheets, Access databases, and ad-hoc CSVs for research and retrospective data analysis. Data can be stored and versioned appropriately, simplifying future retrospective analysis. Taking advantage of these datasets comes with many challenges, however. For example, most countries do not have or enforce a framework where people opt in to having their data stored, shared, and/or analyzed for research or commercial purposes. Rwanda has taken what I consider to be a great stance: all data that will be kept centrally about individuals has to be approved ‘column by column’ (using a relational metaphor), and the data can be used only with scientific-journal-backed evidence that the information can improve health outcomes, within an actual project/program to do so. Notice this may turn some big data fans pale (“What, you are not saving everything centrally and then figuring out what to do with it?”), but I think it is a smarter place to start.
Steps to Open Government — While modest, the decision to have public data available as web services on the internet can be a milestone towards “open gov”. Opening up government data increases accountability, trust, and feedback loops. Many governments would probably prefer to pay lip service to open data principles rather than embrace them; but there are so many benefits to doing so in the health sector that it may be a great place to start.
What are the services that could exist in such a fabric? The theoretical list runs long but here are some examples of the services we are dealing with in the real world:
- Facility Registry: A service to keep track of facilities, their admin information, the health services they provide, and data about their catchment population.
- Vital Registration: A service to keep track of births, deaths and health ID management for the population. (Note that a separate ID assignment authority is needed: the United States’ practice of using a single social security number for financial ID, health ID and immigration purposes is considered ‘bad practice’ by modern standards, and I am happy that where we are working, people are staying away from unified IDs.)
- Shared Health Record: Services that keep track of individual people’s health data over time.
- Provider Registry: A service that tracks the institutions and individuals who are licensed to work in the health system. This can be enormously important for HR, education, and performance-based-financing work. Having a current provider registry also is a foundation for maintaining privacy.
- Terminology Registry: Services that collect, map and standardize the meaning of different words and fields. This makes it easier to see if the “blood pressure” field used in system A can be equated to “blood pressure” in system B (If the blood pressure is taken in different parts of the body in different conditions, the data semantics are different, regardless of the shared label).
- mHealth SMS, Voice and USSD gateway: Having these helps consolidate agreements with operators & aggregators and provides a simpler way to manage collections of mHealth initiatives.
Of course, having many services makes it necessary to have better federated authentication/authorization capabilities (unless you want users to forget 10 passwords instead of only one) and to have some external services that act as controllers/orchestrators for complex multi-step operations (for example, someone dying or moving may trigger a cascade of operations on all the services above).
To be good citizens of the fabric, the services have to play well with each other. Here are some expectations:
- Master vs Reference Data: Each service has clear ownership of its master data, as distinct from reference data that is externally managed and may be continuously updated from some external system.
- Accommodating Dynamic Changes: Services — especially the registries and shared health record — have to accommodate dynamic changes in information schemas and uses over time, and provide a good long-term versioning strategy.
- Updates and Queries: Services must expose a REST API or equivalent endpoints for state updates and queries, as well as a stream-of-events API (e.g. Atom/RSS feeds with some pingback/notification mechanism) that lets other services adjust themselves to changes in real time or in batch mode (see the sketch after this list).
- Compatibility: Services share compatible approaches to authentication/authorization/auditing and other crosscutting aspects.
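As a rough illustration of these expectations, here is a minimal sketch of a client for such a service. The host, paths, and query parameter are hypothetical; they stand in for whatever endpoints a real registry would expose:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for a facility registry that exposes both a REST
// query endpoint and an Atom feed of changes. Updates (POST/PUT) are
// omitted for brevity.
public class FacilityRegistryClient {
    private static final String BASE = "https://registry.example.org"; // hypothetical host
    private final HttpClient http = HttpClient.newHttpClient();

    // Query the current state of one facility record.
    String getFacility(String id) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(
                URI.create(BASE + "/facilities/" + id)).GET().build();
        return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Poll the stream-of-events feed so dependent services can react to
    // changes in batch mode (a pingback mechanism would enable real time).
    String pollChangesSince(String isoTimestamp) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(
                URI.create(BASE + "/facilities/feed?since=" + isoTimestamp)).GET().build();
        return http.send(req, HttpResponse.BodyHandlers.ofString()).body(); // Atom XML
    }
}
```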
For the work we are doing, Rwanda has chosen use cases in maternal-child health as the ‘red thread’ that will drive priorities in the project. Part of their well-thought-out strategy is to keep governance over the technology but rely on local and international partners to help build it, instead of having an in-house dev shop within the Ministry of Health.
InSTEDD contributes the Facility Registry
Rwanda is currently evaluating good starting points for the services discussed above. For the Facility Registry services, Rwanda is evaluating Resource Map, an InSTEDD tool that was originally developed in 2009 by our iLab Southeast Asia.
Resource Map evolved to help people make better use of their data. Data that isn’t used is stale, and stale data isn’t used. We have seen lots of projects and facilities collect and forget about the data as soon as its reliability became suspect. Tens of thousands of dollars per country are spent every year on collecting information that could have simply received minor updates from previous versions.
Originally, we called the tool ‘Dynamic Resource Map’ to emphasize the dynamic nature of the tool. We wanted to ensure that the tool supported making the data operational, not obsolete. Some key aspects of the tool include the ability to define your own layers, with ‘points’ (resources or reports) on those layers, each shown with its own fields. The tool also has query and update features that work through SMS and smartphones (using Open Data Kit). The tool is designed to manage a resource database that happens to have a ‘geo’ component to it, which adds critical behaviors on top of the typical alternatives of semantic-less spreadsheets or generic GIS tools.
A real-world example of the value of the tool is its ability to track stocks of supplies for malaria treatment at the health center level. The simplicity of being able to just text in your current stock and have it automatically trigger an alert to the folks in the capital who will send you more medicines is invaluable.
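As a sketch of how such a flow might work (the message format and reorder threshold here are hypothetical, not Resource Map’s actual syntax):

```java
// Hypothetical handler for a stock-report SMS like "STOCK ACT 12"
// (drug code and units remaining). Not Resource Map's actual syntax.
public class StockReportHandler {
    private static final int REORDER_THRESHOLD = 20; // assumed reorder policy

    public String handle(String smsBody, String healthCenterId) {
        String[] parts = smsBody.trim().split("\\s+");
        if (parts.length != 3 || !parts[0].equalsIgnoreCase("STOCK")
                || !parts[2].matches("\\d+")) {
            return "Could not read report. Send: STOCK <drug code> <units left>";
        }
        String drug = parts[1].toUpperCase();
        int units = Integer.parseInt(parts[2]);
        if (units < REORDER_THRESHOLD) {
            alertCapital(healthCenterId, drug, units); // resupply workflow starts here
        }
        return "Recorded " + units + " units of " + drug + ". Thank you!";
    }

    private void alertCapital(String centerId, String drug, int units) {
        // In a real deployment this would notify central stores (SMS/email).
        System.out.printf("ALERT: %s is low on %s (%d units left)%n", centerId, drug, units);
    }
}
```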
Seeing individual or clustered facilities with your own icons or alerts based on your rules can give you a real-time operational picture that otherwise would be impossible to visualize. Other uses include tracking water quality measured periodically at different pumps, as well as project tracking and monitoring & evaluation (M&E) data gathering.
Data that isn’t used is stale, and stale data isn’t used.
As with any InSTEDD tool since 2007, we took the perspective of providing a cloud ‘product’ that is generic and usable worldwide, that each user can configure to their needs. For example, folks can add their own fields, manage their own user permissions, and have total freedom to import and export their own data, etc. And with the APIs and import/export features, people can move the data to spreadsheets or to more specific tools like ArcGIS or GeoCommons as needed.
As we work with the Rwanda Ministry of Health on their eHealth foundation, our Resource Map tool will evolve to incorporate the experiences and feedback of the people using the tools. We will use their stories to maximize the benefit to the global eHealth/ICT4D community as we develop new versions over the upcoming months. In addition, we have started engaging with other amazing implementation partners so that this work can be incorporated into the shared commons of technology. We are excited about the changes that this initiative is already bringing on the API front as well.
Back to Rwanda
Dr. Gakuba warns about making implementations of health programs revolve around evaluations (instead of making evaluations revolve around implementations) and gives an idea of the progress Rwanda has made in Maternal/Child Health, and how eHealth can help further the MDGs and health delivery.
He also reinforces the importance of starting small and working directly with your users as much as possible. Dr. Gakuba also recommends deploying simpler, smaller parts in a sequence versus going after large, complex technologies from the onset. Fortunately InSTEDD’s services-based approach is a perfect fit for that strategy, which allows the Ministry of Health to stay focused on priorities and gives a more iterative, agile frame to the project.
In the past, we’ve blogged about some of the many lessons we’ve learned when it comes to working on projects that require some form of data collection. As a follow up, we’ve added more of what we’ve learned in the hopes that it will help you better deal with similar situations. Read more »
Since our tools are open source and can be used from any location, we saw that people from New York to Bahrain had discovered that when you design for a constrained environment, the result is simple enough that it’s applicable anywhere else, and ready to roll as soon as the need appears. We intentionally combined our humanitarian mission with smart business acumen so that we could set our first iLab on track to scale and become a financially independent social enterprise. Read more »
As January 12th approaches and with it the anniversary of the tragic Haiti earthquake, I have some hopes:
- That everyone in Haiti has a moment of respite and peace to focus on their loved ones, the ones that are here and the ones that are gone.
- That folks consider Haitians’ perspective over the international community’s perspective & party line. Thanks @mediahacker for your constant voice.
- That folks respect the pain in Haiti rather than trying to celebrate an overall immaterial progress in reconstruction. This post was particularly touching for me.
- Finally, I hope that more attention gets put to the dynamics in and around Haiti on January 11th, than to the international response on and after the 12th. What would a stronger, more resilient Haiti have looked like on the 11th to start with?
The “eHealth” space (which obviously includes the mobile, mHealth aspects) is a bit too chaotic from the perspective of a typical developing country. Imagine you are responsible for ICT (Information and Communication Technology) at a ministry of health or a hospital wanting to modernize to improve patient outcomes or disease detection. Where do you start? What could work, and what won’t, for you? What is reliable? What is the fine print?
Unfortunately, this is not just because of a rapid pace of innovation in technology, or the extreme conditions in which these health solutions have to exist.
Some of the confusion is created and perpetuated, unintentionally, by the same organizations that are trying to help in the space. This includes international organizations, academia, NGOs, funders, open technology groups, private tech vendors, etc. Types of issues I’ve run into first-hand include:
- Academic projects that collect data with a preference towards information that will help publish a paper rather than information that is the most actionable or helps community health the most. These projects rarely fit in with other technologies already deployed.
- Funders that sponsor the construction of specialized, one-off, disease-specific systems, that are built from scratch even if architecturally they are the same as other specialized, one-off, disease-specific projects.
- Technology vendors fostering ‘data sharing’ projects where the data ends up shared but, unfortunately, ‘owned’ by the vendor.
- Open technology projects that would rather accrete features or add cool gizmos that attract users into a do-it-all system rather than open up information and let the data flow around to other applications.
- Groups that would rather implement anything new, now, regardless of what already works, than to help a developing country figure out what they really need.
Some of these organizations are fortunately waking up to these issues and starting initiatives to reduce their occurrence. A key component of these initiatives is bringing an architectural approach to the evaluation, planning, implementation and assessment of ICT needs. And fortunately these organizations have people who both know the problem space and have worked as architects in other contexts.
By an ‘architectural approach’ I mean an approach that:
- Separates the discussions of capability from implementation. e.g. a medical record system is a capability a hospital needs, OpenMRS or OpenVISTA are two implementation alternatives that could fulfill that need.
- Understands the role of standards in supporting interoperable building blocks that can evolve over time, not as an end in itself.
- Helps transition the end goals, requirements and capabilities of the overall health system — the ‘business’ architecture — into ‘technology’, ‘integration’ and ‘infrastructure’ architectures that only exist to support the end goals.
- Navigates the tension between the potential benefits of centralized, top-down decision making around ICT versus the potential benefits of decentralized, bottom-up decision making.
What would it look like from the perspective of an implementer if the eHealth/mHealth community took such an approach? Here are some things you could imagine:
- You would get something like a capability map, a set of boxes with labels and lines that describe common elements of an eHealth countrywide health information system (HIS), including capabilities such as medical records, biosurveillance, pharmacy stock management, etc.
- You would be able to write on this map which capabilities you have implemented (digitally or not), and for each capability get some performance metrics that can help you rank its effectiveness. For example, a biosurveillance component would assess the timeliness and completeness of reports. Your capability map would help you do an assessment against these metrics, letting you see your maturity and your weak spots. This assessment by itself is a huge asset for a country and its funders, as it lets you understand the landscape before you aim your efforts.
- Using the same taxonomy of capabilities, a technology team should be able to find open source solutions, papers, and case studies that describe if/how the capability can be improved. Ideally, these case studies should roll up to a community-maintained pattern library that describes the distilled “solutions to a problem in a context” that have been discovered previously.
- Any improvements can be measured over time and pilots can be assessed objectively as to how much they contribute to the goals of the country (currently, organizations running pilots set up their own measures and they aren’t always traceable to the measures a host country cares about).
- Funders could work together to implement solutions that interoperate, rather than funding on a per-project, per-disease basis.
- Finally, any local innovations could be tracked and published against that map, helping discovery by others wanting to implement them elsewhere, contribute code, etc. Assisting the discovery and amplification of bottom-up ideas is critical, as the eHealth space is very much taking its first steps.
An architectural approach thus makes it easier to implement, build and fund technology for eHealth. Now let’s look at what holds this space back, and at some potential issues that may crop up by rushing in.
Pitfalls of an architectural approach
These pitfalls are not inherent to any and all architecture efforts; rather, they are risks that can be managed and mitigated. I am sharing them because I’ve seen these sap energy out of what otherwise could have been a great contribution.
I hope this doesn’t sound like complaining. Rather, I am proactively sharing experience for which I have first-hand scars, after having worked in the enterprise architecture space for many years. I keep coming back to the idea of drafting a book on technology patterns for developing countries to share this, but would like to make it a collaborative effort. It is simpler to point out pitfalls than to steer a course that avoids them, but that was not the point of this post. Also, any architecture is a starting point, not an endgame that does the decision-making job for you: it is a place from which to begin the conversations. Even with the best architecture efforts, the responsibility of coming up with the right solutions lies with the implementers.
The landscape is improving
Here are some efforts I like because I think they are taking the right steps to creating long-lasting value. If you know of other relevant initiatives please feel free to add comments below.
HMN is a multilateral effort supported by funders, WHO and many organizations to define and help implement a framework for health information systems.
OASIS: Chris Seebregts and others have been putting together an effort called OASIS to help contribute to this space. I haven’t seen much official content about OASIS yet, but knowing Chris and his deep experience in the field I know that he is likely to endorse things that really work, and has direct access to the ‘proven practices’ in his work on OpenMRS and other technology efforts in Africa.
(This is not to be confused with the well-known OASIS consortium http://www.oasis-open.org/ which has IBM, Microsoft, Oracle and Sun as founding members)
- HMN’s framework
- Taha’s blog (Taha is quietly helping in some continent-wide health system integration efforts, and has a lot of experience in this area)
- Christopher Alexander and The Timeless Way of Building introduced patterns and pattern languages to describe what would otherwise be a complex, multidimensional knowledge base of architectural approaches to building homes.
- One of Chris Seebregts’s latest presentations on SlideShare.
- Sketching User Experiences, about the role of design and how it relates to successes in technology.
Inspired by the Wired article “Scientists Hack Cellphone to Analyze Blood, Detect Disease, Help Developing Nations” by Dave Bullock there has been a lot of activity under the change.org post “The Cellphone that could change the world” by Nathaniel Whittemore.
Nate’s post takes a ‘remember the future’ approach where he fast-forwards to 2011 and paints a scenario where mobile technologies are widely deployed and used. I really like that approach to visualizing possibility, and wish it were used more as a social activity. Strong Angel and Superstruct do this too, in a way. The realm of the imaginable could be further expanded by more science fiction about community and civilization resilience (this year I enjoyed reading Kim Stanley Robinson’s fiction books about the onset of sudden climate change and the response of a “fictionalized NSF” and a US government that isn’t afraid to change). But I digress. I liked Nate’s post and the ideas there. The comments were riveting.
Katrin urged me to engage in the discussion at change.org. Reading through the original post and then through the comments (with a lot of ‘strong players’ from the mobile applications community), a couple of thoughts emerged about the state of mobile technology applications for health and other social purposes. Here are some.
In the future…where are the business models?
If you are curious, here is the reality today: in June, the week before the elections, I visited Zimbabwe. Here you can see a real, resilient, working Guava machine for CD4 counts on the outskirts of Harare. It uses microfluidic technology (for smaller blood samples and reactant costs) and, if I recall correctly, the operating principle is the same as the phone above, so this is tested technology. The thing is solid, and the staff deemed it highly reliable. Calibration was not an issue. They were able to multiply the number of daily CD4 lab counts manifold, to 300+ per day. I was there discussing the possibility of linking it to the lab record system, but it wasn’t the highest priority.
A lot of the discussion did center around how disruptive it would be to have an open platform (open hardware, open software, open assays, open IP on the test methods, open reactant formulas and manufacturing) for these tests.
Just as a $99 iPhone is a red herring for the phone network costs you are going to pay every year, a cheaper test sensor that becomes widely deployed and relies on proprietary reactants has a hidden, more insidious cost.
I did not check which assays or lab system the LUCAS phone in the Wired article uses, or whether they are open. I was just surprised this dimension wasn’t part of the interview. I encourage Ozcan from UCLA to open-source the hardware specification to allow others to build on it!
Question: When you plug something in, do you say “I’m using electricity” or “I’m using the wall socket”? Sometimes I feel the discussion about innovation in mobile tech sounds like a discussion of innovation in energy… where the discussion centers on the design of plugs & sockets. A phone is just a conduit to a network, and a powerful, sensor-rich, user-friendly device can end up underused as a collaboration tool that helps people work better together if network reliability and costs are not managed in unison.
In my 2011, I hope that there are hybrid social-enterprise efforts that can make inroads into working with wireless providers and carriers. They need to evolve their offerings and provide the types of cost structures needed for health and social good to scale, without depending on infusions of donations to keep running or pushing costs onto people who can’t pay them while willing customers can’t spend their money. Even just helping providers make money differently would help a lot. Examples: toll-free SMS? Free-to-send? Free-to-receive? Mobile banking? Shared-cost billing? Provider-supplied location tracking of registered gov’t health staff? Anonymized tracking of random individuals for disease migration modeling? The list goes on. Providers could make more money (gasp!) and they don’t.
Beyond 2011 I hope more effort gets put into creating connectivity approaches that would be disruptive to current wireless systems. And I mean the “system” of government spectrum licensing + carriers + wireless providers + device manufacturers. But who would fund this research? Sigh…we need smaller, personal, cheaper GSM ‘towers’ that can be linked up more than phones. What would happen if every smartphone could host a 802.13 ‘peer’ network?
Centralized or Distributed mobile apps? There are no ‘best’ practices…
There are only proven practices, in context.
When evaluating whether an approach fits a new situation, you have to consider the context in which other solutions succeeded or failed. I face this all the time in the discussion of ‘centralized’ versus ‘individual’ mobile solutions. Sometimes I get asked which approach is better and the answer is a) it depends b) you want both, not either/or.
The centralized approach uses national or international-scale gateways, like Ushahidi with Clickatell, RapidSMS, or InSTEDD GeoChat with Clickatell and BT. These are appropriate for national-scale programs, where certain levels of reliability, performance, security and availability are provided.
FrontlineSMS is the archetypal individual or grassroots approach: a phone attached to a computer acts as a gateway, and you control costs, numbers, location, etc., providing different types of reliability, performance, security and availability for different contexts. This type of ‘individual’ solution can even run on a smartphone, and FrontlineSMS and other projects are already proposing such a migration. For GeoChat, we put it on the backlog until we see more demand for this approach from our Asia programs.
Approaches like RapidSMS which rely on an Asterisk server can also work on a laptop, or on a server, and can help span a ‘middle ground’ between other solutions.
Scalability is important, but I see discussions of scalability center around numbers of messages and numbers of registered users, which is in most cases profoundly irrelevant. Again, scalability is context-specific, and is measured by how well you grow with your users’ needs.
I know a chap (I consider him a hero) who spends most of his month travelling rural Cambodia supporting a national program to send data via SMS using plugged-phone installations. Imagine it: phones in locked enclosures get forced open and misused, SIM cards get swapped, chargers burn out, USB drivers fail, phones lock up… The support cost of a site is his scalability denominator. For GeoChat, for example, our main scalability metric is the latency of roundtrip messages under sustained use (like Twitter, responses have to come out fast) across all channels (SMS, email, Twitter) with large numbers of group users and groups.
But why one approach or the other?
Some applications support both centralized and decentralized models (like GeoChat), but as we start working together in this budding mobile community it makes sense to pool efforts and re-use each other’s technologies. I don’t see why InSTEDD, for example, should build yet another phone-detection-and-driver layer if other “social good” applications have it. For example, FrontlineSMS can forward messages on to Ushahidi (acting as a local gateway). We will take a similar approach at InSTEDD, and it should be emulated by the rest of the community. By working on common protocols all our apps could forward messages to each other as required (see this example as a working draft from the Open Mobile Consortium Katrin mentioned). (And Ken, if you are reading this, contributing to the FrontlineSMS source was on last year’s resolutions, and now that we have access to the source code we can really start work on integrating it with GeoChat, Mesh4x, etc. I’m optimistic about ’09!)
The goal is to be able to pick the right tool for the context, and all the applications mentioned above are already working on protocols that would let you have a hybrid deployment that would allow you to scale up or out as needed. As contexts change, having freedom to evolve your app and not be locked into one or another is key.
Once you are moving messages around, how do you make sure different applications interpret the information in similar ways?
Shared formats for data exchange
To achieve interoperability, and to reuse the human capital of trained users, mobile apps should also share conventions on what gets put IN the messages. There is a huge gap in defining what goes into SMS messages for diverse uses:
- Free text, with specified language
- Free text with explicit tags
- Locations (lat/long, place names, village PCodes etc)
- Delimited data (e.g. Ed, Jezierski, Cambodia)
- Self-Describing Data (e.g. firstn=Ed|lastn=Jez|city=Seattle; see the parsing sketch after this list)
- Multi-Message batching, sequenced or order-agnostic
- Message batch retries
- and the list goes on…
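As a sketch, here is how the self-describing style above could be parsed. The delimiter and key names are illustrative only; agreeing on a shared convention is exactly the point:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a parser for the self-describing convention above
// (e.g. "firstn=Ed|lastn=Jez|city=Seattle"). The delimiter and key
// names are illustrative, not an agreed standard.
public class SelfDescribingSms {
    public static Map<String, String> parse(String smsBody) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : smsBody.split("\\|")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(),
                           pair.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Prints: {firstn=Ed, lastn=Jez, city=Seattle}
        System.out.println(parse("firstn=Ed|lastn=Jez|city=Seattle"));
    }
}
```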
The community of builders of mobile apps for social purposes has to start catching up in this space. I suggest re-using the leadership of Twitter and other services by evolving some conventions (e.g. @user, #tag) in common ways where applicable. I would also like industry players (e.g. Nokia’s data gathering solutions, Google) to participate in the open forum, too.
For example, in the Cambodian Avian Influenza hotline pilot we implement batching and self-describing data over SMS. We should get together with RapidSMS and define a common approach. This would let the Cambodian government switch out InSTEDD’s backend and put in RapidSMS transparently, if they chose to do so.
One example of this is GeoChat + JavaROSA. We want to support JavaROSA front ends to send structured data to GeoChat, and if we documented the format well, other clients (like Nokia’s?) or servers could be used interchangeably.
JavaROSA is an excellent open source project, with great technology, and well run. We have already contributed the ability to do 2-way sync between phones and between a phone and a server.
Even with these agreements, interoperability can also lead to a shallow openness, where applications work with others… as long as they can continue to hoard the data and lock in users. You can see this happening over the last year in the space of social networking technologies, where many announcements of open approaches veil an underlying strategy of trying to become the ‘hub’ or the ‘one stop shop’.
Do the benefited populations really gain much if folks can collect more data, but they can’t move it around?
We all know the limits to sharing data are political or incentive-based, more than technical. But technology makes a fine excuse for not sharing information.
In the field one faces many silos – NGOs with different mandates, Government agencies with different domains (animal health, human health), research programs funded by different ivy league universities, not to mention ethnic, language and country borders.
This is an area where InSTEDD has been doing a lot of work as part of the Mesh4x project, which basically allows data to be shared two ways between disparate systems.
Here are some of the latest updates:
- For Geeks: Progress on Mesh4x: Cloud Services, Architecture, Adapters, and Adopters – here you can see how Frontline would play with Mesh4x.
- Mesh4x goes mobile with JavaROSA, allows you to sync data on your handset with no Internet
The goal: An Open & Sustainable Platform for the end users
Ken uses the \o/ logo for FrontlineSMS, a gesture of empowerment. I smile every time I see it.
We can’t forget that all these technology efforts are trying to empower individuals and organizations, and simplify the work of caring for one’s own community or for others.
All the teams mentioned here are working together already in different capacities towards this end goal. Resources, timelines, tools are always an issue, but over time things will be more integrated.
All the technologies mentioned here are converging towards a shared architecture –a platform for data exchange and collaboration built around mobile users in the harshest environments. A platform that can start small and grow transparently, or start large and continue running even if the centralized networks are unavailable. Because of this shared architecture, the end portfolio will be stronger, dollars spent on technology will go further, and users will have a simpler entry point to learn what are the right tools for their context.
So when a new phone comes out with a CD4 blood cell sensor, its users will know that it can send its data and “it just works”…and then go change the world one CD4 test at a time!
As the year wraps to an end we have a mixed blessing: On one side we have a small but growing portfolio of technology stemming from our organization’s immediate goals to improve disease detection and public health in South East Asia, being built at a steady pace by our small but ultra-capable team. On the other hand, the scenarios we are addressing are proving to be relevant in all walks of life of the health and humanitarian space, generating an increasing demand and with it, a simultaneous increase in breadth and depth on the demand side. Exciting times indeed!
“The goal of mesh4x is to provide a portfolio of libraries, tools and applications that simplify using standards-based data meshes from multiple platforms and languages…”
The libraries can be used right away by developers who integrate them in their own applications, so there was no need for them to wait for a more packaged set of user interfaces and end to end experiences.
Why it matters and why InSTEDD is working on this
Data meshes have appealing characteristics for our users, so our contributions to the Mesh4x project are driven by observed data-sharing needs in the health and humanitarian space.
- Symmetrical: They allow data to exist in a concurrent multi-master environment where updates can be applied at any node in the mesh.
- Asynchronous: They allow offline updates to information and synchronization with other nodes without requiring data locks, essential for occasionally connected applications.
- Dynamic: The synchronization can happen even in constantly changing connectivity topologies. I can sync with a server and later the sync can be done between my client and another client, who could then sync with another server if the first one isn’t there, and so on.
This matters to us as these characteristics help information flow and data sharing even in the tough contexts we face:
- Symmetrical: No organization or application has, de-facto, greater control over information than any other. Symmetry allows power to be shared equally amongst partners, in a true multi-master way, resulting in less hoarding of live data.
- Asynchronous: Connectivity is an occasional luxury, and the most up-to-date information is found where a connection is least likely. Storing changes locally and sharing them opportunistically keeps information moving.
- Dynamic: Connections are opportunistic – you may not have Internet access at all, but you have access to local wifi networks, physical contact with other devices, etc. Data will eventually get to the desired endpoints as it leaps opportunistically between participants.
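To make the multi-master idea concrete, here is a generic sketch of symmetric sync between two replicas. This is illustrative only, not the actual Mesh4x API; a real implementation also tracks per-item histories so it can detect true conflicts instead of letting the higher version win:

```java
import java.util.HashMap;
import java.util.Map;

// Generic sketch of symmetric, multi-master sync (not the actual Mesh4x API).
// Each item carries a version counter; on sync, each replica adopts the
// other's newer items. No node is special: any two nodes can sync.
class Item {
    final String id;
    String payload;
    int version; // incremented on every local edit

    Item(String id, String payload, int version) {
        this.id = id; this.payload = payload; this.version = version;
    }
}

class Replica {
    final Map<String, Item> items = new HashMap<>();

    void edit(String id, String payload) {
        Item it = items.get(id);
        if (it == null) items.put(id, new Item(id, payload, 1));
        else { it.payload = payload; it.version++; }
    }

    // Symmetric sync: either side may initiate; there is no master.
    void syncWith(Replica other) {
        mergeFrom(other);
        other.mergeFrom(this);
    }

    private void mergeFrom(Replica other) {
        for (Item theirs : other.items.values()) {
            Item mine = items.get(theirs.id);
            if (mine == null || theirs.version > mine.version) {
                items.put(theirs.id, new Item(theirs.id, theirs.payload, theirs.version));
            }
        }
    }
}
```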
There are many concrete applications of mesh4x in this space.
I have another blog post I should release soon that highlights the proven value of meshes and Groove in the humanitarian space, and my personal introduction to the uses of this architectural pattern.
But this post is about the progress & directions for the project.
In the last post we mentioned building a cloud-based service as a contribution to the space. The demand was for an always-online, cheap-to-host, simple server that could act as storage for data and as a relay point for devices connected to the Internet.
The implementation was embarrassingly simple on Amazon’s Elastic Compute Cloud (EC2, a dynamic and virtualized hosting environment) and S3. As a matter of fact, a single Java servlet running on Tomcat + Linux and driving the Java Mesh4x sync libraries (“Mesh4j”) provides the heart of the logic. Less code is the best code!
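To give a feel for the shape of such a servlet (the paths and the storage facade below are hypothetical, not the actual Mesh4j code):

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch of a single-servlet sync relay (hypothetical; not the actual Mesh4j
// code). GET returns the current feed for a mesh; POST merges a client's
// feed into it and returns the result so the client ends up with both sides.
public class SyncServlet extends HttpServlet {
    private final FeedStore store = new FeedStore(); // hypothetical S3-backed store

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String meshId = req.getPathInfo(); // e.g. /myMesh/myFeed
        resp.setContentType("application/xml");
        resp.getWriter().write(store.readFeed(meshId));
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String meshId = req.getPathInfo();
        String clientFeed = new String(req.getInputStream().readAllBytes(), "UTF-8");
        resp.setContentType("application/xml");
        resp.getWriter().write(store.mergeAndRead(meshId, clientFeed));
    }
}

// Hypothetical storage facade; the real service kept this data in S3.
class FeedStore {
    String readFeed(String meshId) { return "<feed/>"; }
    String mergeAndRead(String meshId, String clientFeed) { return "<feed/>"; }
}
```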
We are doing a pilot with the Centers for Disease Control, synchronizing their Microsoft Access-based EpiInfo application, and they asked if the health surveys they were taking could be automatically geo-mapped as the users synchronized to share their information. This led us to incorporate an ontology (“schema”) mapping aspect to tell the server “expose a KML feed taking THIS as the title, description, address, and timestamp for the items”.
Taha describes the work with CDC on his Biosurveillance 2.0 blog and why using mesh4x will help them extend the effectiveness of EpiInfo for outbreak investigation.
We will be opening this service up progressively as we test it out with initial users and tweak it based on their feedback; I hope in a couple of months to have a tested version we can point you to publicly! In the meantime, contact us if you are interested via email or if you are a developer via the Mesh4x.org code project.
Part of the forcing function for writing this post this week is that we’ve been chatting with CDC, JavaROSA, and others about these store/endpoint/mapping capabilities and I’d rather we start the collaboration early before we accidentally diverge codebases or approaches.
Under the Hood
This is the architecture that the server has been moving towards over the last couple of weeks:
GeoChat and FrontlineSMS bridges would allow message forwarding and sending semistructured data directly in via SMS.
This is the storage layer for all the data and the configuration, security information, etc needed to keep the service running. In our web-based instance, all this data is stored in S3, but if you wanted to host this in your own office or in a clinic, it would all be sitting inside a MySQL instance. As a matter of fact, all the mesh4x services’ information is managed by mesh4x itself, so the actual configuration data is stored via an adapter.
Our service differs from a database in that you don’t need to tell it the schema of your information up front. As a matter of fact, we would like to know as little as possible about the format of your data. We prefer to let applications change and evolve the data they use without having to ask developers to change database structures or write specific code for each case. But knowing just a little about the structure of your data helps with things such as defining mappings and filters, so we try to infer as much as we can. The Ontology Extraction component allows you to submit RDF-formed information (or XForms-based, or any other format that has a transformer) and we keep track of (for example) what fields make up your entities. If you supply such ontologies yourself (in RDFS, or an XForm definition) we keep them around, too (e.g. ‘Patient Date of Birth is a Date/Time field’).
Internally, we are using RDF as the default standard to represent data and ontologies. RDF has many properties that make it the simplest appropriate choice, but that would be the topic of a whole different post in itself.
Ontology Mapping allows us to map fields and entities of different ontologies to help us make sense of your data. For example, to draw a nice map of your data we need a title, a descriptive summary, a position, and a timestamp associated with the entity. Which field should provide the timestamp? Which address or coordinate fields should be used to put an item on the map? How should the description be composed from the data? Mappers allow us to do this, and in the future you will be able to define these yourself through the user interface.
Filtering is essential in a mesh where little devices and big devices coexist. You could have refugee records for a whole country in one mesh4x mesh, but on a mobile phone you’d probably only want to keep a subset of that. As soon as we expose filters it will be easy for a phone to say ‘I work with patients in village X’ and just sync that subset of data.
Format Transformers are components built to translate data into specific formats. GeoRSS and KML are standard formats for representing information with geographic aspects to them. You can see the KML in Google Earth, for example, and items would appear on the map as people sync their data to the server.
Transformers for XForms models and XForms forms allow us to translate the information of your entities and their ontologies into XForms formats. We see the utility and the pragmatism of XForms models as a way of exchanging records, and of XForms as a way to define the UI model of the forms users see, so these transformers allow us to go from our internal RDF-centric representations to these broadly adopted formats.
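As a small sketch of what such a mapping does (the field names here are hypothetical; in the service, mappings are defined per ontology):

```java
import java.util.Map;

// Sketch of an ontology mapping from a record's fields to the pieces KML
// needs (title and position; description and timestamp would follow the
// same pattern). The field names are hypothetical.
public class KmlMapper {
    // Which source fields play which role for this entity type.
    private final String titleField = "clinic_name";
    private final String latField = "lat";
    private final String lonField = "lon";

    public String toPlacemark(Map<String, String> record) {
        return "<Placemark>"
             + "<name>" + record.get(titleField) + "</name>"
             + "<Point><coordinates>"
             + record.get(lonField) + "," + record.get(latField)
             + "</coordinates></Point>"
             + "</Placemark>";
    }

    public static void main(String[] args) {
        // Example record synced into the mesh, rendered for Google Earth.
        System.out.println(new KmlMapper().toPlacemark(
                Map.of("clinic_name", "Mukdahan Clinic", "lat", "16.54", "lon", "104.72")));
    }
}
```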
Finally, you have all this data here, but you probably want to work with it elsewhere! Folks have suggested/requested the following as potential endpoints for the data:
- Google Spreadsheets: we have a Microsoft Excel adapter, so why not a Google Spreadsheets one? Imagine creating a form and having it fill out a spreadsheet with gadgets for analytics. Google spreadsheets are also great when lots of people online have to work live on the same data.
- Zoho is coming up with lots of useful applications. Imagine synchronizing your Zoho app with a table in your MySQL or MS-Access database.
- MySQL: a lot of websites out there –for good or for bad– run with their MySQL instance exposed on an open network port. Someone we were working with in Mukdahan, Thailand (a 12-hour truck ride from Bangkok), asked the simple question: if I give you my connection string, can you just put the data there for me? Seemed simple and straightforward, so we will line it up in front of other needs!
Together with running sync adapters we will have to have some user interface to schedule these updates, define mappings between schemas/ontologies, and resolve conflicts. A nice UI for this may end up taking a big part of the project effort, so if you can point us to open source projects that do any of this, or want to contribute, don’t be shy!
These mappings are part of the mesh too, so in the future (assuming anyone requests it from InSTEDD or contributes the source) you could be offline, mark an Excel spreadsheet as ‘shared’, and when you sync, not only would the data travel back and forth, but the server itself could create a Google Spreadsheets endpoint (or something similar) with the same information for others in your team to use!
Putting it all together
In my next post I will explain how all the pieces of the Mesh4x project come together to help integrate data from disparate systems, connecting these applications into a coherent whole instead of leaving dozens of islands of information.
http://www.cdc.gov/epiinfo/ EpiInfo is CDC’s outbreak investigation surveying tool. You can participate in their Open Source project on CodePlex: http://www.codeplex.com/EpiInfo. We are working with them to enable synchronization over the cloud of their MySQL/Access based tool.
…And it recently had a release, announced hours ago. Congratulations to the CDC team!
After amazing work by the volunteers doing the translation from English to Burmese, we have translations for all Sahana strings, which will make a deployment of Sahana in the country reach a broader audience and be useful beyond the few … Read more »
You can define a generic form, load the form definition onto any Java-enabled cell phone running the JavaROSA forms client extended with a mesh4x transport component, do data entry on your phone and synchronize the data 2-way with a server or directly peer-to-peer with another phone handset.
The form definitions are saved and exchanged as XForms, and the data as XForms models. The data can be exchanged over HTTP (if the phone users can afford GPRS and have a data connection) or over compressed SMS messages. This can even happen between phones directly — you enter the phone number of another handset running the app and press “sync”. Tondat describes this in detail in his latest blog post. The clients depicted here look awful on the emulators as they use J2ME Polish (http://www.j2mepolish.org), which then makes the app look great on specific handset models and adapts the UI to the capabilities of each phone.
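To give a feel for the compressed-SMS channel, here is a generic sketch of batching a payload into numbered segments (this is illustrative only; the actual Mesh4x wire format differs):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Generic sketch of batching a payload over SMS (not the actual Mesh4x wire
// format): compress, base64-encode, and split into numbered segments with a
// batch id so the receiver can reassemble them in any order.
public class SmsBatcher {
    private static final int SMS_CHARS = 160;

    public static List<String> toMessages(String batchId, String payload) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(payload.getBytes("UTF-8"));
        }
        String encoded = Base64.getEncoder().encodeToString(buf.toByteArray());

        List<String> messages = new ArrayList<>();
        int chunkSize = SMS_CHARS - 16; // reserve room for the header below
        int total = (encoded.length() + chunkSize - 1) / chunkSize;
        for (int i = 0; i < total; i++) {
            String chunk = encoded.substring(i * chunkSize,
                    Math.min(encoded.length(), (i + 1) * chunkSize));
            // Header: batch id, segment index, segment count.
            messages.add(batchId + "|" + (i + 1) + "/" + total + "|" + chunk);
        }
        return messages;
    }
}
```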
This extends the scenarios of JavaROSA from data entry, bringing it closer to a collaboration tool where the information being entered can be edited by multiple users and shared from a central database back to the phone in the field.
This contrasts with the “data collection” pattern of data entry solutions… if you believe information is power, data collection creates a vast vacuum cleaner shifting the balance of power away from those in the field, who understand the data the best and can act on it the soonest, towards the center. But does it need to be this way?
At InSTEDD we look at information flows such as these and ask ourselves: What information should be flowing back to the field? How can the person at A work better with the person at B, beyond just sending data? How do we shift from ‘sending’ data to sharing realtime, enriched information two ways? The mesh4x + JavaROSA effort addresses some of these questions.
This was made possible through a collaboration and a set of code contributions with the JavaROSA team. JavaROSA is an implementation of OpenROSA which could become a strong player in the mobile data gathering and sharing space in the near future. Kudos to Clayton Sims, Jonathan Jackson, Andreas Kollegger, and everyone else from the JavaROSA team for your work and friendly attitude!
The XForm definitions are stored in an HTTP service behind a REST API (http://sync.instedd.org/, which is a strawman of a mesh4x cloud-based service; if you played with our map-sync technology you have already used this service).
Our strategy with mesh4x is to contribute code to existing projects being deployed in the field that need 2-way synchronization, data exchange over SMS, or standards-based multi-master storage. EpiSurveyor, Gather, Pendragon, etc. come to mind.
In line with this strategy, our roadmap for mesh4x will involve effort in four areas:
- Cloud Services
- Data Standards
- Client Applications
- Transformers and Adapters
1. Cloud Services: A scalable server implementation supporting security standards. We have a skeletal solution built in C# that we grew from the ‘sse’ open source project on CodePlex (which has moved to http://mesh4x.org as well). We host an instance at http://sync.instedd.org/. But it uses a relational database, so we would have to change the storage layer if we wanted to grow it for real. So what are our options? Java on EC2/S3 seems to be the shortest path given the code we already have in the project, but Python on Google App Engine sounds enticingly simple to maintain and scale, at the expense of the initial effort to port the sync libraries to Python. Which seems unnecessary until you consider that Inveneo, the African Access Point and other platforms prefer Python or Ruby for a bunch of good reasons. We’d like your input — .NET+MySQL, Java, or Python + GAE?
2. Default Data Exchange Standards: Using XForms is simple and works for easy scenarios. We’ll advocate use of RDF where XForms falls short, but by all means we wanted to avoid a custom/ad hoc way of defining a typed dictionary ‘schema’, versioning that schema, and encoding entities following the schema.
Following a clear set of standards for data formats will allow easier mapping of information from one system to another, and the creation of tools that allow end-users to define how their systems integrate.
Isn’t this obvious? Defining which standards to support early in the process is critical because it is easy to reinvent the wheel in this space, even accidentally. Anyone who can code their way out of a paper bag can define a custom way of serializing dictionaries (a collection of names and values such as name:Ed, country:Cambodia) and define a schema model for it in an hour or so. But inventing one just tends to lead to incompatibilities in the long run, and lack of interoperability in humanitarian systems is an obstacle that anyone with experience has seen get in the way of collaboration and data sharing. It is much smarter to support a well documented subset of a standard such as RDF or XForms, and define extensions as needed (both standards allow schemas/ontologies to be extended). If we all pledge to play along, applications from multiple organizations will add up to be ‘more than the sum of the parts’.
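For instance, the name/country dictionary above can be expressed in standard RDF in a few lines. This sketch uses Apache Jena, and the URIs are made up for illustration:

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

// Sketch: the dictionary {name: Ed, country: Cambodia} as standard RDF
// instead of an ad-hoc serialization. URIs are made up for illustration.
public class RdfDictionary {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        String ns = "http://example.org/schema#";
        Resource person = model.createResource("http://example.org/people/ed")
                .addProperty(model.createProperty(ns, "name"), "Ed")
                .addProperty(model.createProperty(ns, "country"), "Cambodia");
        model.write(System.out, "TURTLE"); // any RDF-aware tool can read this
    }
}
```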
3. Client Applications: We desperately would like to implement or contribute to a stand-alone fat (aka rich) client that you can use on your desktop to synchronize two data endpoints. Ideally this client would allow you to set the endpoints for synchronization, mapping of data schemas, filters, and managing conflicts — in a secure and easy-to-use interface. Any pointers?
4. Transformers and Adapters: There are many existing applications out there that do their work very well. Sometimes two applications serve similar purposes for different audiences or contexts. Sometimes new applications have to coexist with politically entrenched older systems. While we are building common-purpose adapters for mesh4x (such as the Hibernate, Java RMS, and KML adapters we already have, plus CSV and Google Spreadsheets, for example), we already hear demand for specific adapters that take into account the particular needs of real-world applications already deployed in the field. Which systems should we start with? We have been approached with questions about mesh4x and OpenMRS (http://openmrs.org/) or Sahana (http://www.sahana.lk/).
We’d love your input!