



















14.
NIMA and Its Information Architecture--A Clean Sheet
As mentioned previously,
the Commission is enthusiastic about the Director's reformulation of NIMA
as custodian of the US Information and Geospatial Service (USIGS). Sometimes
misunderstood, this reformulation is emblematic of a healthy change in
focus, away from systems, away from products, away from processes, and
toward information services. This is not to say that NIMA will no longer
produce its hallmark maps and imagery intelligence products.
As NIMA focuses on information services, the hardcopy maps and reports
are byproducts--intentionally useful derivatives, but not the essence
of NIMA.
A critical consequence
of the reformulation is the need to get the information architecture just
right. Otherwise, the future extensibility of USIGS will be severely limited.
New applications will not be able to flower.
A sub-panel of the
Commission examined a possible architecture unconstrained by any
legacy issues--a "clean sheet" was the starting point for a top-level
design exercise. The conclusion of the sub-panel, endorsed by the Commission
as a whole, is that to support NIMA's transition to an information service,
the USIGS information architecture must become "data-centric." To anticipate
the discussion, this means that all TPED processes--and subsequent analytic
processes, as well--become transactions against the database, each deriving
value from, and adding value to, the database.
14.1
The Importance of Architecture
The importance of
focusing considerable energy on NIMA's information architecture cannot
be overstated. NIMA is embarked on a major acquisition initiative for
its tasking, processing, exploitation, and dissemination (TPED) process,
which will, for better or worse, solidify its information architecture
for a decade or two to come. The Commission fears that, left to its own
devices, NIMA's information architecture could well remain system/function-centric,
structured around discrete systems purchases made several hundred million
dollars at a time. While these systems could be individually coherent,
and would likely meet current stated requirements, they would neither
position NIMA to take full and continuing advantage of the revolution
in information technology, nor interface gracefully to systems and processes
as yet unimagined.
To oversimplify slightly,
the Commission is inclined to believe that TPED and other major applications
would be best served if NIMA were to develop a new architecture, a new
process by which to acquire this architecture, and a new organizational
form to take advantage of it. The new architecture would be built upon
a distributed database that integrates geospatial and imagery information--and
can extend to encompass information derived from other "INTs". The new
process would adopt COTS to the maximum useful extent, proceed in periodic
increments, and cut back on requirements for systems integration.
The new organization would focus NIMA on its emerging role as content
provider for the Global Information Grid (GIG).
It is with temerity
that the Commission offers for consideration this more detailed discussion,
not to provide a blueprint, but to illustrate how fundamental changes
in architecture create fresh possibilities--yes, and raise new issues.
It should neither be accepted uncritically, nor discarded petulantly.
It should serve merely to illustrate how rethinking TPED without preconceptions
can inform the structure and composition of NIMA's information systems,
and indeed, NIMA itself. The Commission realizes that insofar as there
are sound ideas here, they are neither unique to the Commission, nor absent
in NIMA's own thinking.
14.2
Toward a New Architecture
Only half jokingly
has NIMA, in its current configuration, been described as "two communities
separated by a common agency." Imagery analysis, with its intelligence
heritage, is quite comfortable with its functionality allocated as TPED.
Geospatial analysis, with its cartographic heritage, is less well served
by the TPED nomenclature and more at home with order entry tracking (OET)
and work flow management (WFM). While either argot could be adapted to
(or adopted by) either community, the data-centric construct accommodates
both. The Commission cautiously asserts that beyond being an inclusive
construct, data-centricity is a unifying construct.
NIMA is perched on
the edge of a systems acquisition that will influence its information
environment for years to come. This provides NIMA with a unique opportunity
to consolidate its information architecture. The Commission believes that
NIMA's information infrastructure should be built around an integrated
data architecture, not around a collage of systems, products, or processes.42
Actually, the Commission's view is grander still. If done skillfully,
NIMA would become the architect, if not the custodian, of the Geospatial
Information System for the larger national security community--intelligence
and operations, diplomatic and military, strategic and tactical.
This "mother of all
databases"43
at the center should be the conceptualization, if not the container, of
all the national security community's geo-referenced (and time-tagged)
information.44
Indeed, nearly all relevant information is, or could profitably be, geo-referenced.
"The Central Database"--which need be neither singular nor centralized--must
be widely and easily shared among users and, in the first instance, should
hold vector data (the stuff of maps) and raster data (the stuff of images)
as a seamlessly packaged whole. The database should be structured to be
independent of client or application, fully distributed, and capable of
accepting successive value-additions and user annotations. These features
would depart from NIMA's current information architecture (though some
of NIMA's as-yet-unimplemented plans pull in that direction).
14.3
A Database to Support the TPED Process

As shown in the accompanying
illustration, such a database could constitute the primary--not necessarily
sole--support for the imagery TPED process; indeed, it would support any
number of TPED processes as such.
All TPED functionality--from
requirements and tasking, to data reception, processing, exploitation,
and dissemination--can be seen as transactions against a database. That
this database may be parsed, distributed, replicated, aggregated, and
so on is key. Transactions--the value added to data in the database--need
not adhere to the sequential implications of traditional TPED interpretation.
14.4
Tasking, Processing, Exploitation, and Dissemination as Transactions
Tasking flows from
an expression of information needs and logically starts with an investigation
of what already exists--Are the data in a database? Is the product already
in inventory? If so, pull it. If not, order it. Ask that it be pushed
to you, or ask to be advised as to when it is available to be pulled.
In the "back office" the order is processed--pulled from a queue, or pushed
to the fulfillment process. These are different views--depending upon whether
one is in front of the counter or behind it--but both can be reconciled
as transactions against a database. Much can be relegated to server applications:
notification, standing taskings, and the like.
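By way of illustration only, the following sketch (in Python, with hypothetical
names such as holdings, orders, task, and fulfill) suggests how "check the
database first, then order" might be expressed as transactions against the
database; it is a sketch of the idea, not a proposed design.

    # Sketch only: the store, the order queue, and the delivery modes are
    # hypothetical, illustrating tasking as a transaction against the database.
    holdings = {}   # (area, product_type) -> dataset already on hand
    orders = []     # the "back office" queue of unfilled taskings

    def task(area, product_type, delivery="pull"):
        """Satisfy a need from the database if possible; otherwise order it."""
        key = (area, product_type)
        if key in holdings:
            return holdings[key]            # it already exists: pull it
        orders.append({"key": key, "delivery": delivery})
        if delivery == "push":
            return "will be pushed when available"
        return "will be notified when available to pull"

    def fulfill(key, dataset):
        """Back-office side: new data arrive and are posted to the database."""
        holdings[key] = dataset
        for order in [o for o in orders if o["key"] == key]:
            orders.remove(order)
            # push the data, or send a notice, per the user's stated preference
            print("push" if order["delivery"] == "push" else "notify", key)
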
Processing, in the
first instance, refers to turning the information downlinked from the
satellite (in what we might refer to as a "proprietary" format) into a
"picture" ready for exploitation, on film or on soft-copy. Processing
operations are, generally, done for each picture and so it makes sense
to do these prior to the exploitation phase, on large capable hardware
close to the downlink entry point. If and when exploitation operations
become so routinized that they can be done automatically--say, change
detection--then that process might well migrate from the exploitation
segment and move "upstream" into the processing segment. In organizational
terms, this could mean that NIMA cedes control and execution of these
processes to the National Reconnaissance Office (NRO) or commercial operator.
No matter who performs them, insofar as the original downlinked information
is archived, successive processing operations can also be seen as transactions
against a database.
In the same sense,
the succession of value-added exploitation steps can be seen as transactions
against the database. The (copy of the) image is pulled from the database,
value is added, and the modifications and/or modified picture are written
back into the database. Thus, exploitation can also be seen, as in the
accompanying figure, as a series of transactions (involving imagery but
also related vector information), which can continually enrich the database
with new features (e.g., a newly discovered double-perimeter fence
line) and annotations upon old features.

Dissemination--the
intellectual task of deciding to whom information should go, as distinct
from distribution, which is the process of carriage--entails both "push"
and "pull." In the former case, a background process--driven, say, by
tables that codify users' expressions of needs and wants--runs against
new postings to the database and sends that information, or a notice of
new information, to the users who want it. In the pull case, users run queries
against the database holdings. Indeed, if the query language allows the
user to specify not only how far back in the archive the search should
be conducted, but also how far into the future, the distinction between
push and pull logically disappears.
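The collapse of the push/pull distinction can be made concrete: a query whose
time window extends into the future is simply a standing subscription. A
minimal sketch follows (Python; the postings archive and query interface are
hypothetical), offered only to illustrate the point.

    from datetime import datetime

    postings = []       # (timestamp, item) pairs already in the archive
    subscriptions = []  # open queries whose windows extend into the future

    def query(predicate, start, end):
        """Return matching holdings; if 'end' lies in the future, also register
        the query so that later postings are pushed to the requester."""
        now = datetime.utcnow()
        results = [item for (ts, item) in postings
                   if start <= ts <= min(end, now) and predicate(item)]
        if end > now:
            subscriptions.append((predicate, end))   # pull quietly becomes push
        return results

    def post(item):
        """New information arrives; run it against the open subscriptions."""
        now = datetime.utcnow()
        postings.append((now, item))
        for predicate, end in subscriptions:
            if now <= end and predicate(item):
                print("pushing to subscriber:", item)
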
We have taken the
liberty, in the preceding discussion, of pretending that there is actually
one integral database. That need not be the case, and some would argue
that in terms of implementation, no one database could possibly satisfy
all. But the master geo-referenced database still holds its position
as the logical source of and sink for NIMA work.
14.5
Vector-Raster Integration
The NIMA database
ought to permit clients to access vector and raster information in an
integrated fashion--i.e., "normalized" to each other so that the user
can drape one over the other seamlessly and transparently. As the accompanying
figure suggests, image analysts themselves may be able to do their jobs
better by being able to see "through" images into underlying geospatial
data (or take advantage of geospatial analysis that may indicate, for
instance, likely hiding areas for SCUDs; see A Tale of Two Cities,
elsewhere in this report).

Today, such a database
would naturally contain "chips" of an image--e.g., polygons containing
interesting pieces of the larger image. Today, the polygon would be determined
by geospatial coordinates--say, a rectangle 2km by 3km centered on a set
of geo-coordinates, the "aim point." Eventually, we can expect the chips
to be determined more by imagery content--a building, or a compound, or
the right-of-way along a road. In either case, a goal is to accommodate
better the "bandwidth-challenged" user--fielded forces, those at sea,
or airborne. Even with conventional compression, the "last tactical mile"
generally constrains us from sending full-size images, which will, themselves,
get larger with the next generation of imagery satellites just about as
fast as bandwidth will increase. So, the ability to combine vector-map
data (which are generally compact for the area covered) with imagery extracts
of key visual features may be the best of all worlds.
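The arithmetic behind the chip is simple to sketch. The fragment below (Python,
with an assumed ground sample distance and an invented feature encoding,
neither of which is a NIMA format) cuts a 2 km by 3 km extract around an aim
point and pairs it with a compact vector feature:

    # Illustrative only: a scene as a 2-D array, a chip cut around an aim
    # point, and a compact vector feature sent alongside it.
    METERS_PER_PIXEL = 10.0   # assumed ground sample distance

    def chip(image, aim_row, aim_col, height_m=2000, width_m=3000):
        """Return the sub-array covering a height_m x width_m rectangle
        centered on the aim point (clipped at the image boundary)."""
        half_h = int(height_m / METERS_PER_PIXEL / 2)
        half_w = int(width_m / METERS_PER_PIXEL / 2)
        top, left = max(0, aim_row - half_h), max(0, aim_col - half_w)
        return [row[left:aim_col + half_w] for row in image[top:aim_row + half_h]]

    # A vector feature is tiny compared with the raster it annotates.
    road = {"type": "road", "points": [(38.92, 40.11), (38.95, 40.13)]}

    scene = [[0] * 1000 for _ in range(1000)]          # stand-in for a full image
    extract = chip(scene, aim_row=500, aim_col=500)
    print(len(extract), "x", len(extract[0]), "pixel chip, plus", road["type"])
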

14.6
Product, Application, and Client Independence
For many users, NIMA
still is defined by its catalog of standard map products, paper or CD-ROM.45
The Commission believes, however, that such products are better thought
of as renderings of datasets extracted for specific purposes from a larger
database. Users themselves create "products" from the database that NIMA
provisions. A "standard" product becomes one where a script has been generated
to ensure some uniformity in the data extraction and rendering.
Where once NIMA's
job was to make maps, tomorrow its job will be to provision the database
and ensure the availability of applications that enable a user (or another
application) to call for data using a combination of coordinates, scale,
feature sets, and in some cases, currency (what time period is relevant)
from an integrated database. Data should be accessible through multiple
methods, as shown in the accompanying figure. GIS data can also be used
(and thus should be formatted to be easily used) as an input to planning,
modeling and simulation, and planners may be able to exploit the database
without ever having to see a map or an image.
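As a sketch of what this could mean in practice, the fragment below (Python;
the function names, parameters, and feature classes are hypothetical) treats a
"standard product" as nothing more than a saved script over a data extraction
specified by coordinates, scale, feature set, and currency:

    def extract(db, bbox, feature_classes, as_of=None):
        """Pull every feature of the requested classes that falls inside bbox
        and is current as of 'as_of' (None means latest)."""
        return [f for f in db
                if f["class"] in feature_classes
                and bbox[0] <= f["x"] <= bbox[2] and bbox[1] <= f["y"] <= bbox[3]
                and (as_of is None or f["valid_from"] <= as_of)]

    def render(features, scale):
        """Turn an extracted dataset into a 'product' (here, a plain listing)."""
        return "\n".join(f"{f['class']} at ({f['x']}, {f['y']}) at 1:{scale}"
                         for f in features)

    # A "standard" product is simply a stored script over the same database.
    def standard_topo_product(db, bbox):
        return render(extract(db, bbox, {"road", "river", "contour"}), 50000)

    db = [{"class": "road", "x": 3.0, "y": 5.0, "valid_from": 2000}]
    print(standard_topo_product(db, bbox=(0, 0, 10, 10)))
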

The ability to call
on NIMA's database through standardized function calls should be a capability
that others can build into their products. The separation of client and
server functions through modular interfaces also eases the systems integration
problems (the importance of which is discussed below). Support must be
provided both for thick clients, with software powerful enough to manipulate
and finish the product, and for thin clients, which can only display a map as
a picture but cannot manipulate it as data. Overall, the user interface
should be a function, not of the database, but of the user's requirements.

Making GIS data broadly
accessible via standard protocols permits anyone to build new applications
for users. This frees NIMA from having to guess how its data will be used,
and allows unanticipated uses to flourish. The data provider simply cannot
be prescient enough to anticipate all the uses to which the data will
be put. Traditionally, however, data can be seen only through conforming
applications, and manipulated only through routines built into the applications
themselves. The software behind the Common Operational Picture (COP: the
real-time view of the battlefield), for instance, has no macro language.
Best commercial practice, however, avoids this dead end, and so, too,
must NIMA.

14.7
Location Independence
The "NIMA database"
can (and should) be distributed both physically and virtually. As the
accompanying figure illustrates, it suffices that one node "know" where
all the relevant data sits; the many data streams that go into a GIS system
may sit in various locations (and be managed by various owners within
and without NIMA) as long as their interconnections--through the GIG,
say--are sufficiently robust. Storage, communications and processing all
trade off against each other, and the best effect is achieved when a single
architect has the freedom to make all the tradeoffs--i.e., to globally
optimize the network design.
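A sketch of the "one node knows where the data sit" idea follows (Python; the
catalog entries and holder names are invented for illustration). The user
addresses one logical database, and the catalog resolves where, and by whom,
each piece is actually held:

    # Hypothetical catalog: the single node that "knows" where the data sit.
    catalog = {
        "imagery/theater-x":  "server at site A (NIMA)",
        "vector/world-roads": "server at site B (allied partner)",
        "weather/gridded":    "server at site C (another agency)",
    }

    def locate(dataset):
        """Resolve a dataset name to whichever node actually holds it."""
        try:
            return catalog[dataset]
        except KeyError:
            raise LookupError(f"no holder registered for {dataset!r}")

    def fetch(dataset):
        """The caller never cares about the physical layout of the database."""
        return f"retrieved {dataset} from {locate(dataset)}"

    print(fetch("vector/world-roads"))
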
"Ownership" of data
ought to be divorced from locality. There is no need to invest the CINCs
with responsibility to hold and manage a set of images taken with national
assets over their AORs (areas of operational responsibility); it is not even
clear that information acquired with theater assets (e.g., UAVs)
ought to be part of an exclusive CINC image library either. True, leaving
the command image libraries in place may be optimal from the networking
point of view--as long as they are globally accessible. But how users
"see" the database can be expected to vary only with their employer, clearance,
and need to know.
14.8
Annotation
The "NIMA database"
must support value-added contributions from anyone, anywhere--the database
must host user-supplied annotation. This opens it to a good deal of informed
(but, alas, also uninformed) commentary but it also gives users a stake
in understanding the GIS database because of their ability to contribute
to it. (Although the emergence of client-to-client programs, such as Napster,
suggests that the distinction between clients and servers is eroding, all NIMA
information should be server-accessible because client connections are
uncertain and the security implications of client-to-client connectivity have
yet to be fully explored.)
Over time, annotations
should become a very significant part of the total database. Indeed, the
value of having the database capture the feedback of users (both from
DoD and the rest of the Intelligence Community) could rival that of the
database itself. Annotation should be understood as exactly that: not
the official database, itself, but commentary thereon. Thus, NIMA would
retain responsibility for the master plot.
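One way to keep official holdings and commentary distinct is to store
annotations in their own table, keyed to the features they discuss, leaving the
master plot untouched. A minimal sketch follows (Python; the record layouts are
invented):

    # The master plot remains NIMA's responsibility; annotations sit beside it.
    master = {"feature-001": {"type": "fence line", "source": "NIMA exploitation"}}
    annotations = []   # commentary on the master plot, never merged into it

    def annotate(feature_id, author, text):
        """Attach commentary to a feature without altering the feature itself."""
        if feature_id not in master:
            raise KeyError(f"unknown feature {feature_id!r}")
        annotations.append({"feature": feature_id, "author": author, "text": text})

    def view(feature_id):
        """Readers see the official record plus the accumulated commentary."""
        notes = [a for a in annotations if a["feature"] == feature_id]
        return master[feature_id], notes

    annotate("feature-001", "theater analyst", "second perimeter visible since May")
    print(view("feature-001"))
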
14.9
The Need for a Rigorous Data Model
In developing an architecture
for the NIMA database, a rigorous data model inherently comes first. All
other decisions (such as the systems model) ought to follow, not lead.
Such a data model can be conceptualized as the three concentric rings
of the accompanying figure. In the center are the core scalable database
and network structures (i.e., the processing, storage, and distribution
engines).

In the middle ring
are the basic data types of a GIS: raster data, vector data, feature
data, networks, grids, TINs (triangulated irregular networks), fundamental
objects, etc. In the outer ring are constructed objects (e.g., a
street, a multi-spectral image, a vertical obstruction, an "urbanized
area"). Such a data model, therefore, would contain a definition of feature
classes, metadata, and symbology.
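Read as a type hierarchy, the rings might be sketched as follows (Python; the
class names are illustrative, not a proposed NIMA schema), with basic GIS data
types in the middle ring and constructed objects in the outer ring composed
from them:

    from dataclasses import dataclass, field
    from typing import List, Tuple

    # Middle ring: basic GIS data types.
    @dataclass
    class Raster:
        pixels: List[List[int]]

    @dataclass
    class Vector:
        points: List[Tuple[float, float]]

    # Outer ring: constructed objects built from the basic types, carrying
    # the feature class and metadata that the data model defines.
    @dataclass
    class Street:
        centerline: Vector
        feature_class: str = "road"
        metadata: dict = field(default_factory=dict)

    @dataclass
    class MultiSpectralImage:
        bands: List[Raster]
        metadata: dict = field(default_factory=dict)

    street = Street(Vector(points=[(0.0, 0.0), (1.0, 2.0)]),
                    metadata={"source": "survey", "currency": "2000"})
    print(street.feature_class, len(street.centerline.points), "vertices")
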
14.10
Ways to Absorb Data from Third Parties
Commercial GIS users
are beginning to benefit from the widespread sharing of data sets. NIMA
need not create all the information it provides. NIMA already has information-sharing
agreements with many governments, and prospects for further sharing appear
likely. Datasets can be acquired from other US departments and agencies,
as well as from industry.
There are many data
sets (e.g., where embassies are located) that other entities (e.g.,
the State Department) can affordably keep track of much more accurately
than can NIMA, itself. There is no good reason for NIMA not to mirror
such databases within its own system (mirroring eliminates two very significant
problems: first, that of combining classified data with unclassified data and,
second, that of thin or unreliable connections to third-party servers).
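The mirroring idea can be reduced to a few lines (Python; the source dataset
and refresh cadence are invented): NIMA serves a local, read-only copy, while
the owning agency keeps responsibility for accuracy.

    import copy

    # The authoritative copy lives with its owner (here, a stand-in for the
    # State Department's list of embassy locations).
    state_dept_embassies = {"Paris": (48.87, 2.30), "Nairobi": (-1.28, 36.82)}

    mirror = {}   # NIMA's local copy, refreshed on a schedule, served locally

    def refresh_mirror():
        """Copy the third-party dataset into NIMA's own system so users are not
        hostage to thin or unreliable outside connections."""
        global mirror
        mirror = copy.deepcopy(state_dept_embassies)

    refresh_mirror()
    print("embassy at", mirror["Nairobi"], "(served locally, owned elsewhere)")
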
Overall, the more
NIMA's data model is compatible with counterpart data models used by the
USGS, NOAA, FEMA, major allies, or key NGOs (e.g., the World Bank)--the
better. NIMA is best off adapting and adopting commercial standards that
work. But where standards do not yet exist, NIMA has to step in to foster
their creation to permit greater interoperability and collaboration. The
VPF format used in VMAP was developed by NIMA; its success was verified
when others (e.g., NATO) adopted it. It helped that NIMA reached
out to the community in developing VPF; like activities in the future
should have as much participation from the commercial world as they can
get.
14.11
Methods to Deal with Logical Inconsistencies
At one level, logical
consistency appears to be the sine qua non of a map. Roads are expected
to connect, boundary lines to join at their edges, and most buildings
to sit over land, not water.
Unfortunately, although
reality may be consistent, databases often are not, especially when they
come from different sources or were made at different times (each may
have been right when made).
The traditional approach--make it right--may not be the best. The desire
to make things consistent inhibits incremental database updating in favor
of explicit versioning. Flagging contradictions may be better than arbitrarily
declaring one right and one wrong.
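A sketch of "flag it rather than force it": when two sources disagree about the
same feature, keep both versions with their provenance and leave the
contradiction visible (Python; the record layout is invented).

    def reconcile(record_a, record_b):
        """Compare two versions of the same feature. If they disagree, do not
        arbitrarily declare one right: keep both and flag the contradiction."""
        if record_a["geometry"] == record_b["geometry"]:
            return {"status": "consistent", "geometry": record_a["geometry"]}
        return {"status": "contradiction flagged",
                "versions": [record_a, record_b]}   # each may have been right when made

    survey_1995 = {"source": "allied survey", "year": 1995,
                   "geometry": [(10.0, 20.0), (10.5, 20.4)]}
    imagery_1999 = {"source": "imagery extraction", "year": 1999,
                    "geometry": [(10.0, 20.0), (10.6, 20.5)]}

    print(reconcile(survey_1995, imagery_1999)["status"])
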
14.12
Methods to Separate Public from Restricted Information
NIMA's total information
base can be divided into what is unrestricted and what is restricted--either
by license and agreement or because of sources and methods. Currently
almost all of NIMA's digital cartographic products are restricted for
one or another reason. NIMA should continue to exert care in not confusing
the protection of intellectual property with the protection of sources
and methods so that legitimate government users need not have a security
clearance merely to access "the database" for information that is not
classified. The discerning reader will recognize the need for separation,
yet integration, of information as that old bugaboo of multi-level security.
The Commission has no answer other than to suggest that multiple levels
of security are a here-and-now solution. The paradigm shift that is hard
for some to make is to do database operations at the lowest possible level
(not "policy high") and then replicate the data to higher levels. To NIMA's
credit, they seem to understand this. NIMA will also benefit from the
DOD-wide rollout of a Public Key Infrastructure (PKI) and a concerted
effort at Information Warfare Defense/Defensive Information Operations
(IWD/DIO) designed to preserve the confidentiality, integrity, non-repudiability
and availability of essential information. And fortunately, although security
is an area where the federal government often leads the private sector,
commercial firms have increasing motivation to solve these problems of
protecting intellectual property and preserving the privacy of proprietary data.
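The "operate low, replicate upward" paradigm can be reduced to a one-way rule:
data posted at a given level flow to equal or higher levels, never downward.
The sketch below (Python) is an illustration of the rule, not a security
design; the level names and keys are invented.

    LEVELS = ["UNCLASSIFIED", "SECRET", "TOP SECRET"]   # illustrative ordering

    stores = {level: {} for level in LEVELS}   # one database instance per level

    def replicate_up(key, value, from_level):
        """Post data at the lowest possible level, then copy it to every higher
        level; nothing is ever replicated downward."""
        start = LEVELS.index(from_level)
        for level in LEVELS[start:]:
            stores[level][key] = value

    replicate_up("airfield-runway-length-m", 3050, from_level="UNCLASSIFIED")
    print([level for level in LEVELS if "airfield-runway-length-m" in stores[level]])
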
14.13
New Data Types
"The database" should
be capable of holding new data types such as HSI, video, SAR-MTI and urban
data. Each presents its own problems and taxes the extensibility of database
design and the prescience of the data model. No simple answers are at
hand except an open mind.
Powerful examples
of the benefits of fusing multiple sources of intelligence are widely
known, even if less-widely emulated. The challenge for NIMA is to ensure
that its data model and database designs do not constrain the incorporation
of new data types.

The logic of using
geo-referencing to break the tyranny of the intelligence stovepipes is
clear. Thus, the burden of multi-INT integration falls on NIMA--NIMA is
clearly the enterprise to organize such an endeavor by virtue of its deep
geospatial knowledge and its capacious storage and networking capability
(even if, as argued further below, it needs more technological capability
to assume the job).
14.14
Precision and Persistence
Resolution, or ground
sample distance (GSD), is a watchword in the imagery world. Information
differs in how accurately it can be measured. Imagery (both EO and synthetic
aperture radar), for instance, can be accurate to the sub-meter level--but
not always: e.g., MSI, HSI, and USI, for technical reasons, have
successively less resolution, and correspondingly less geospatial precision.
ELINT data are even less precise; so is most acoustic and seismic information.
Most weather data are measured over kilometers.

Information also differs
to the extent that accurate measurement is meaningful. Some phenomena
are inherently fuzzy. Neither the habitat of a species, the turf of
a gang, the catchment area of a shopping center, nor the track of a storm
can be usefully measured in meters. Assigning geospatial attributions
to other phenomena is a stretch. Rumors, for instance, about impending
governmental decisions in Ethiopia may be geospatially tagged to a specific
office building in downtown Addis Ababa, but such tagging feels artificial
or at least of questionable value since its source and impact may be geospatially
distant from the office. Some information has no real geospatial content
whatsoever: the characteristics of a weapons system, or reports on an
impending religious schism.
It is pointless to
give geospatial information more precision than is warranted. But every
datum has to be anchored to some location in a geospatial database.
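One way to anchor every datum to a location without overstating its precision
is to carry an explicit positional uncertainty alongside each record. A sketch
follows (Python; the figures and field names are invented):

    from dataclasses import dataclass

    @dataclass
    class GeoDatum:
        """Every datum gets a location, plus an honest statement (in meters)
        of how precisely that location is known."""
        what: str
        lat: float
        lon: float
        uncertainty_m: float

    holdings = [
        GeoDatum("double-perimeter fence line", 35.1234, 44.5678, 1.0),
        GeoDatum("ELINT emitter fix",           35.10,   44.57,   5000.0),
        GeoDatum("storm track, leading edge",   34.9,    44.2,    50000.0),
    ]

    # Consumers can then filter by the precision a task actually requires.
    precise_enough = [d.what for d in holdings if d.uncertainty_m <= 10.0]
    print(precise_enough)
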
Persistence marks
NIMA's products; evanescence marks the Common Operating Picture (COP).
Yet, persistence is not a binary attribute. Take the accompanying figure.
A mountain pass is forever. Successively, a paved road that traverses
the pass, a gravel trail that leads off the road, an assembly point for
mobile-missile launchers, and finally the Scud in flight are increasingly
fleeting. Nevertheless, sensor-based data, for instance, of mobile objects
acquires context, in large part, from a background of immobile objects.
Accounting for trucks requires accounting for roads and passes, in a sense.
So where is the proper
boundary between "NIMA's data" and that which makes up the Common Operating
Picture (COP)? To what extent should NIMA's data model be built for eventual
extension into the COP data model? Good questions, but no good answers,
as yet.
14.15
Toward Multi-INT integration
The Commission believes
that any architecture recommended by NIMA has to be able to evolve to
a multi-INT architecture. Clear minds will separate this from the questions
of who should implement and who should pay for the implementation.
NIMA should begin
to engineer a broader architecture by which such INTs can be captured
and presented in a coherent fashion. In its simplest form, other-INT data
should be available as layers normalized to NIMA data. From whichever
layer the user starts, he must be able to drill down to access the other
information.
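In sketch form (Python; the layer names and cell keys are invented), "other-INT
data as layers normalized to NIMA data" means that every layer is keyed to the
same geo-reference, so a user starting from any one of them can reach the
others:

    # Each INT contributes a layer keyed to the same geo-referenced cell.
    layers = {
        "geospatial": {("38N", "044E"): "1:50,000 vector coverage"},
        "imagery":    {("38N", "044E"): "EO scene, 0.8 m GSD"},
        "sigint":     {("38N", "044E"): "emitter active within last 6 hours"},
    }

    def drill_down(cell, starting_layer):
        """Whichever layer the user starts from, return what every other layer
        holds for the same geo-referenced cell."""
        if cell not in layers[starting_layer]:
            return {}
        return {name: data[cell] for name, data in layers.items() if cell in data}

    print(drill_down(("38N", "044E"), starting_layer="imagery"))
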

Multi-INT database(s),
as they emerge, should take advantage of the inherent parallelism in TPED
processes across the various INTs--as the accompanying figure suggests,
every INT, as a general proposition, involves tasking, collection, processing,
exploitation, and dissemination.
Still, it is important
to note that the relationships among tasking, collection, and processing
vary by INT. It is also important to note that this multi-INT architecture
does not need to spring into being all at once. We can replace components
as dollars and ideas permit, and invest in those areas that provide the
highest payoff.
Serious thought is
needed on how to manage a federation of databases, separately budgeted,
with crosscutting management structures. Perhaps an intermediate but high-level
interagency group could coordinate the overall data model, and the underlying
technology standards, as well as sponsor consulting and training. DIA's
Joint Intelligence Virtual Architecture (JIVA) provides a model for consideration.
Finally--despite the
Commission's enthusiasm--it is worth remembering that geo-referencing
is not the only way to look at a mass of data.
14.16
Conclusions of the "Clean Sheet" Exercise
Building NIMA's architecture
around a database that integrates maps and images and other relevant intelligence
data, making this database independent of location and client, and permitting
third-party annotation to it together constitute the core recommendations
for the information architecture.
Radical approaches
like these are less risky than they sound. People have been doing data-centric
architectures and databases for many decades, and GIS databases for at
least two of them. The commercial industry is mature in all respects:
workstations, databases, and GIS. Commercial capabilities already exist
to do most of the imagery and geospatial manipulation that NIMA could
want. NIMA is not being asked to approach this architectural requirement
in a way and with a degree of effort that no one has ever done before;
it is asked to apply familiar methods to its problems, which, if unique
in scope, are not unique in form and content.
Footnotes
42
Advocating that NIMA develop a data-centric architecture rather than a
system-centric, product-centric or process-centric architecture may seem,
at first, to run counter to today's government and business practices.
Normally, one first determines the business processes critical to the
organization and then designs an information system to meet these. For
NIMA, though, information is the product.
43
With apologies to Bran Ferren.
44
It will be worth exploring whether, and to what extent, the MIDS-IDB database
administered by DIA should form the conceptual core of a new data-centric
architecture.
45
There were 283 products at last Commission count.