Minimal viable EEG equipment for dissertation research on BCI / BMI

I am planning out a dissertation study of Brain-computer Interface (a.k.a. Brain-machine Interface, BCI, BMI, etc) applications.

One of the 3 papers in that dissertation will involve collecting original data, analyzing it, and writing a biofeedback application based upon this data.

I'm in a Computer Science department at a university which does not have a med school, so as far as I know I'll have to purchase the equipment myself out of a very modest research budget.

Thus, my question is: "What's the lowest-end (i.e., cheapest) kind of EEG machine that would still be considered modern and sophisticated enough to be publication-worthy in journals?"

Of course, I've done a good bit of research on this by reading recent publications on Google Scholar and will continue to do more, but there's a lot of information overload and I have no prior background with EEG equipment, so any insight would be greatly appreciated.

From what I can tell, the main thing driving price is the number of leads or sensors. They seem to range from 1 lead in consumer EEG toys to 100 leads, with a lot of advertised products having 16 or 24 leads. There are of course a lot of other features, so feel free to answer using either specific products or in terms of whatever features you've found important.

I have been working with BCI for some time and would recommend you try these. They have all been widely used in MS/PhD research and their results are more or less accepted everywhere:

  • Emotiv
  • NeuroSky (starts from $99)
  • g.tec (the most accurate and most expensive one; not recommended on a startup budget)

P.S. Last, but not least, feel free to have a look at the BCI Competition datasets. I remember using them in my semester project. Also, if you are interested in invasive BCI datasets, I can provide the email of the respective researchers. Best of luck in your research!

Frontiers in Neurorobotics




    Brain-computer interface technology represents a rapidly growing field of research with many application systems. Its contributions in medical fields range from prevention to neuronal rehabilitation for serious injuries. Mind reading and remote communication have their unique fingerprint in numerous fields such as education, self-regulation, production, marketing, and security, as well as games and entertainment. It creates a mutual understanding between users and the surrounding systems. This paper shows the application areas that could benefit from brain waves in facilitating or achieving their goals. We also discuss major usability and technical challenges that face the utilization of brain signals in the various components of a BCI system. Different solutions that aim to limit and decrease their effects have also been reviewed.


    The field of assistive technologies for mobility rehabilitation is being advanced by the introduction of electrophysiological signals to control these devices. The system runs independently of physical or muscular interventions, using brain signals that reflect the user's intent to control devices/limbs (Millán et al., 2010; Lebedev and Nicolelis, 2017), an approach called brain-computer interface (BCI). The most commonly used non-invasive modality to record brain signals is electroencephalography (EEG). EEG signals are deciphered into control commands in order to restore communication between the brain and the output device when the natural communication channel, i.e., neuronal activity, is disrupted. Recent reviews on EEG-BCI for communication and rehabilitation of the lower limbs (LL) can be found in Cervera et al. (2018), Deng et al. (2018), He et al. (2018a), Lazarou et al. (2018), Semprini et al. (2018), and Slutzky (2018).

    About five decades ago, EEG-BCIs used computer cursor movements to communicate user intents for patient assistance in various applications (Vidal, 1973; Wolpaw et al., 2002; Lebedev and Nicolelis, 2017). The applications are now widespread, as machine learning has become an essential component of BCI, functional in different fields of neurorobotics and neuroprosthesis. For the lower extremity, applications include human locomotion assistance, gait rehabilitation, and enhancement of physical abilities of able-bodied humans (Deng et al., 2018). Devices for locomotion or mobility assistance vary from wearable to (non-wearable) assistive-robot devices. Wearable devices such as exoskeletons, orthoses, and prostheses, and assistive-robot devices including wheelchairs, guiding humanoids, telepresence and mobile robots for navigation, are the focus of our investigation.

    Control schemes offered by these systems rely on inputs derived from electrophysiological signals, electromechanical sensors on the device, and the deployment of a finite state controller that attempts to infer the user's motion intention, in order to generate correct walking trajectories with wearable robots (Duvinage et al., 2012; Jimenez-Fabian and Verlinden, 2012; Herr et al., 2013; Contreras-Vidal et al., 2016). Input signals are typically extracted from the residual limb/muscles, i.e., amputated or disarticulated lower limbs (LL), via electromyography (EMG), from users with no cortical lesion and intact cognitive functions. Such solutions consequently exclude patient groups whose injuries necessitate direct cortical input to the BCI controller, for instance users with neuromotor disorders such as spinal cord injury (SCI) and stroke, or inactive efferent nerves/synergistic muscle groups. In this case, direct cortical inputs from EEG could be the central pattern generators (CPGs) that generate basic motor patterns at the supraspinal or cortical level (premotor and motor cortex) or the LL kinesthetic motor imagery (KMI) signals (Malouin and Richards, 2010). The realization of BCI controllers solely driven by EEG signals, for controlling LL wearable/assistive devices, is therefore possible (Lee et al., 2017). Several investigations indicate that a CPG with less supraspinal control is involved in the control of bipedal locomotion (Dimitrijevic et al., 1998; Beloozerova et al., 2003; Tucker et al., 2015). This provides the basis for the development of controllers directly driven from cortical activity correlated with the user's intent for volitional movements (Nicolas-Alonso and Gomez-Gil, 2012; Angeli et al., 2014; Tucker et al., 2015; Lebedev and Nicolelis, 2017) instead of EMG signals. Consequently, controllers with EEG-based activity mode recognition for portable assistive devices have become an alternative for achieving seamless results (Presacco et al., 2011b). However, when employing EEG signals as input to the BCI controller, the notion that EEG signals from the cortex can be useful for locomotion control must first be validated.

    Though cortical sites encode movement intents, the kinetic and kinematic changes necessary to execute the intended movement are essential factors to be considered. Studies indicate that the selective recruitment of embedded “muscle synergies” provides an efficient means of intent-driven, selective movement, i.e., these synergies, stored as CPGs, specify the spatial organization of muscle activation and characterize different biomechanical subtasks (Chvatal et al., 2011; Chvatal and Ting, 2013). According to Maguire et al. (2018), during human walking, Chvatal and Ting (2012) identified different muscle synergies for the control of muscle activity and coordination. According to Petersen et al. (2012), the swing phase was more influenced by central cortical control, i.e., dorsiflexion in early stance at heel strike, and during pre-swing and swing phases for energy transfer from trunk to leg. They also emphasized the importance of cortical activity during steady unperturbed gait for the support of CPG activity. Descending cortical signals communicate with spinal networks to ensure that accurate changes in limb movement are appropriately integrated into the gait pattern (Armstrong, 1988). Subpopulations of motor-cortical neurons activate sequentially during the step cycle, particularly during the initiation of pre-swing and swing (Drew et al., 2008). The importance of cortical activation upon motor imagery (MI) of locomotor tasks has been reported in Malouin et al. (2003) and Pfurtscheller et al. (2006b). Similarly, the confirmation of electrocortical activity coupled to the gait cycle, during treadmill walking or LL control, for applications such as EEG-BCI exoskeletons and orthotic devices, has been discerned by He et al. (2018b), Gwin et al. (2010, 2011), Wieser et al. (2010), Presacco et al. (2011a,b), Chéron et al. (2012), Bulea et al. (2013, 2015), Jain et al. (2013), Petrofsky and Khowailed (2014), Kumar et al. (2015), and Liu et al. (2015). This provides the rationale for BCI controllers that incorporate cortical signals for high-level commands, based on user intent to walk/bipedal locomotion or kinesthetic motor imagery of the LL.

    While BCIs may not require any voluntary muscle control, they certainly depend on brain response functions; therefore the choice of BCI depends on the user's sensorimotor lesion and adaptability. Non-invasive types of BCI depend on EEG signals used for communication, which are elicited under specific experimental protocols. The deployed electrophysiological signals that we investigate include oscillatory/sensorimotor rhythms (SMR), elicited upon walking intent, MI or motor execution (ME) of a task, and evoked potentials such as event-related potentials (ERP/P300) and visual evoked potentials (VEP). Such a BCI functions as a bridge to bring sensory input into the brain, bypassing damaged visual, auditory, or other sensory abilities. Figure 1 shows a schematic description of a BCI system based on MI, adapted from He et al. (2015). The user performs MI of the limb(s), which is encoded in the EEG; features representing the task are deciphered, processed, and translated into commands to control the assistive-robot device.

    Figure 1. Generic concept/function diagram of BCI controlled assistive LL devices based on motor imagery.
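As a rough illustration of the decoding step in Figure 1, the sketch below maps a single-channel EEG epoch to a binary command via mu-band (8-12 Hz) power, exploiting the event-related desynchronization that accompanies motor imagery. The sampling rate, band limits, and threshold are illustrative assumptions, not values from any reviewed system.

```python
import numpy as np

def bandpower(signal, fs, band):
    """Average power of `signal` within a frequency `band` (Hz), via the periodogram."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()

def decode_intent(eeg_epoch, fs=250.0, mu_band=(8.0, 12.0), threshold=1.0):
    """Toy MI decoder: motor imagery suppresses mu-band power over the
    sensorimotor cortex, so low mu power is mapped to a 'move' command.
    All parameter values here are hypothetical."""
    return "move" if bandpower(eeg_epoch, fs, mu_band) < threshold else "rest"
```

A real system would replace the fixed threshold with a trained classifier (e.g., LDA on multi-channel band-power features), but the structure, epoch in, command out, is the same.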

    The reviewed control schemes deployed by wearable LL and assistive robots are presented in a novel way, i.e., in the form of a general control framework organized in hierarchical layers. It shows the information flow between the user, the BCI operator, the shared controller, and the robot device with its environment. The BCI operator is discussed in detail in light of the feature extraction, classification, and execution methods employed by all reviewed systems. Key features of present state-of-the-art EEG-based BCI applications and their interaction with the environment are presented and summarized in the form of a table. The proposed BCI control framework can accommodate similar systems based on fundamentally different signal classes. We expect progress in the incorporation of the novel framework for the improvement of user-machine adaptation algorithms in a BCI.

    The reviewed control schemes indicated that MI/ME of LL tasks, as aspects of SMR-based BCI, have not been used as extensively as for the upper limbs (Tariq et al., 2017a,b, 2018). This is due to the small representation area of the LL which, in contrast to the upper limbs, is located inside the interhemispheric fissure of the sensorimotor cortex (Penfield and Boldrey, 1937). The review is an effort to advance the development of user mental tasks related to the LL for BCI reliability and confidence measures.

    Challenges presently faced by EEG-BCI controlled wearable and assistive technology, for seamless control in real-time, to regain natural gait cycle followed by a minimal probability of non-volitional commands, and possible future developments in these applications, are discussed in the last section.

    2. Materials and Methods

    2.1. Instrument Requirements

    We identified aspects that an fNIRS device must fulfill in the context of mobile BCI and neuroergonomics. Besides the criterion that the hardware be comparatively low cost, these can be assigned to four groups:

    Usability: Miniaturization and mobility of the device, unobtrusiveness and robustness of the optode attachment.

    Signal Quality: Low inter channel crosstalk, low drifts of light sources and overall system signals, high signal sensitivity/amplifier precision, robustness to background light and high dynamic range.

    Safety: Low heat development, harmless light intensities and galvanic isolation to power lines.

    Configuration/Customization: Scalability of channel number, modularity, configuration of light intensities and receiver gain, interface to custom hard-/software.

    The following subsections will provide detailed information on our approach to fulfill these requirements on a concept, hardware and software level.

    2.2. Instrumentation Design

    2.2.1. System Concept

    The system concept of the modular open instrument is shown in Figure 1. It consists of one or more stand-alone 4-channel Continuous Wave NIRS modules and a mainboard. Each module is controlled by the mainboard via a simple parallel 4 Bit control interface. The mainboard provides the power supply rails, AD conversion of the NIRS signals, and a UART communication interface, and can be replaced by any custom data acquisition (DAQ) equipment when the control interface and symmetric ±5 V power rail are supplied. This enables full customization of the instrument with respect to physical channel number, power consumption, and conversion rate and depth, while spatially distributing the hardware components (and weight) and performing local hardware signal amplification and processing, thus minimizing noise and interference.

    Figure 1. System Concept.

    The fNIRS modules were designed considering the current understanding of fNIRS instrumentation technology as reviewed by Scholkmann et al. (2014) and others (Obrig and Villringer, 2003 Son and Yazici, 2006) with special regard to hardware design and wavelength-selection for SNR maximization/crosstalk minimization and considering potential hazards as identified by Bozkurt and Onaral (2004).

    Each module provides four dual-wavelength fNIRS channels using 750 and 850 nm multi-wavelength Epitex L750/850-04A LEDs. While the LEDs have a broader emission spectrum (Δλ = 30/35 nm) than sharply peaked laser diodes (typically Δλ ≈ 1 nm), their incoherent and uncollimated light allows for a higher tissue interrogation intensity and direct contact with the scalp due to less heating, and is safer for the human eye.

    The LED current is regulated by adjustable current regulator circuits based on high precision amplifiers (Analog Devices AD824A) and field effect transistors (FMB2222A). Channel activation and current modulation for lock-in amplification is performed by analog switches (Analog Devices ADG711) that are accessed via an analog 1:8 demultiplexer (NXP HEF4051). After tissue interrogation, NIR light is detected by a central Si photo detector with integrated trans-impedance amplifier for output noise minimization (Texas Instruments OPT101, 1 MΩ feedback resistor, bandwidth 14 kHz) and is then amplified and lock-in demodulated (using Analog Devices AD630). An 8 Bit Atmel Corp. AtMega16A microcontroller's PWM module creates the 3.125 kHz square wave reference for lock-in (de-)modulation using an external 20 MHz crystal for jitter minimization. It also processes incoming control signals from the 4 Bit control interface and operates and configures the on board hardware. For adjustment of the LED currents, an 8 Bit digital-to-analog converter (DAC Maxim MAX5480) is implemented. It supplies the voltage level at the current regulator inputs that is the command variable for the current regulation level. A programmable gain amplifier (Texas Instruments PGA281) is implemented for pre lock-in amplification of the detected NIR signal with a variable gain from G = 0.688 to 88.
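The analog lock-in chain described above (square-wave modulation at 3.125 kHz, multiplication with the reference, low-pass filtering) can be illustrated with a minimal digital analog. This sketch assumes an idealized ±1 square-wave reference and uses a plain average as the low-pass stage; it is a conceptual model, not the module's actual firmware or circuitry.

```python
import numpy as np

def lockin_demodulate(signal, fs, f_ref, phase=0.0):
    """Square-wave lock-in demodulation: multiply the sampled signal by a
    +/-1 square reference at f_ref, then average (a crude low-pass) to
    recover the amplitude of the component modulated at f_ref."""
    t = np.arange(len(signal)) / fs
    ref = np.sign(np.sin(2 * np.pi * f_ref * t + phase))
    return np.mean(signal * ref)
```

Components of the signal uncorrelated with the reference (stray light, drifts, 1/f noise) average toward zero, which is why the lock-in stage rejects them.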

    During lock-in demodulation, the signal is filtered by a 3rd-order Butterworth low-pass and is then again amplified (G = 5.1) and stabilized by a set of two high precision amplifiers (Texas Instruments LMC6062) before leaving the fNIRS module for external AD conversion.

    The system is designed for Time-Division Multiplexing (TDM) of the fNIRS channels. This is a trade-off between minimizing inter-channel crosstalk, heating (Bozkurt and Onaral, 2004) and battery consumption on the one hand and sacrificing SNR, which is limited by the width of the applied time windows. For demultiplexing of the locked-in output branches, a variable (sample rate dependent) dwell time is inserted after each onset of a single channel activation before sampling the steady state photo detector signal on the mainboard or with custom DAQ equipment.
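The trade-off between dwell time and sampling rate in the TDM scheme can be made concrete with a small timing-budget calculation. The dwell and sampling times used below are hypothetical placeholders, not the instrument's actual values.

```python
def max_frame_rate(n_channels, dwell_s, sample_s):
    """Upper bound on the full-frame rate of a time-division multiplexed
    fNIRS system: each of the n_channels * 2 wavelength slots needs a
    settling dwell plus the ADC sampling time."""
    slot = dwell_s + sample_s
    return 1.0 / (n_channels * 2 * slot)
```

Longer dwell times improve settling (and thus SNR per sample) but shrink the achievable frame rate linearly, which is the trade-off the text describes.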

    Configurable PGA gain (G = 0.6875–88) and LED intensity (256 DAC levels) in combination with a feedback “signal monitor” line allow the signal-dependent adaptation for maximum amplification in the lock-in demodulation process without reaching the dynamic range limit of one of the components.

    Modularity: The above described design of the fNIRS modules allows operation in many configurations—only requiring compatibility with the above mentioned interface consisting of 4 Bit control, power supply and analog output. For an extension of the total channel count, several modules can be used. Changes in set-up and module count only affect the control unit and its routines chosen by the user, which activate the time division multiplexed channels and convert the analog fNIRS signals from the modules:

    • As the objective of this work was the design of miniaturized fNIRS modules for mobile applications, a microcontroller (Atmel AtMega644) based mainboard was developed for mobile data acquisition and module control. Using a 4 channel 16 Bit analog-to-digital converter (ADC Linear Technologies LTC2486) and a Bluetooth wireless controller (Amber Wireless AMB2300), the mainboard acquires the fNIRS signal(s) from up to 4 modules (16 channels), transmits the data to a computer via serial protocol and processes incoming user controls. To scale the number of channels, the user connects the desired number of modules and configures the channel administrator routine on the mainboard's microcontroller (see also Section 2.2.3). The symmetric ±5 V power rail is created from battery DC voltage using a stabilized linear power regulator circuit (based on ON Semiconductor MC7805 and MC7905 ICs). Running on batteries and using only low voltages also ensures user safety.

    • The mainboard is a placeholder for any (custom) peripheral acquisition and control hardware. With DAQ-devices providing digital I/Os and an external power supply, any number of modules (limited by the desired dwell time and sampling rate) can be used and controlled by control- and acquisition routines written and customized by the user.

    2.2.2. Selection of Hardware Design Aspects

    Emitter Branch: For a high accuracy of the fNIRS instrument, a careful design of the NIR-light emitting circuit is crucial, as fluctuations in the radiation intensity cannot be discriminated from changes in absorption due to changes in chromophore concentrations in the tissue.

    To keep the current through the LED semiconductor junctions constant and independent from variations in supply voltage and temperature, and at the same time allow intensity adjustment and current modulation for the lock-in amplification process, a customized current regulator circuit was designed (see Figure 2).

    Figure 2. Current regulator/modulator circuit.

    Similar to a solution proposed by Chenier and Sawan (2007), an analog switch is used in the OpAmp based regulation circuit for square-wave modulation of the current. However, instead of disrupting the regulation process at the transistor base, analog switches (ADG711) are used at the inputs of the regulator circuits to pull the regulator inputs low when deactivated. fNIRS channel activation and modulation is thus realized by simply feeding through the square wave reference to the corresponding current regulator switch selected by the multiplexer.

    As the regulator is modulated in the kHz-range, over- and undershoots influence the ideally square-wave shape of the current. To optimize the shape, a passive negative RC feedback was added and evaluated for best performance.

    Receiver Branch: The receiver branch was designed to maximize SNR by minimizing noise influences from shot, thermal and 1∕f noise, dark currents and stray light from external light sources.

    Shot noise is based on the quantum nature of the photons and therefore unavoidable and, for detectors without internal amplification, proportional to the square root of the average incident intensity (Scholkmann et al., 2014). To maximize SNR, the instrument is operated using the maximum NIR-light intensity level for the current regulators that is feasible in the experimental situation. Opaque cell rubber tubes are used to cover the sides of the NIR emitters and detector and the fNIRS module housing is covered with opaque paint to minimize shot noise influences from background radiation.
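Since shot noise is Poisson-distributed, the shot-noise-limited SNR of a detected signal grows with the square root of the collected photon count, which is why the instrument is operated at the maximum feasible NIR intensity. A minimal sketch of that scaling (photon rates are illustrative, not measured values):

```python
import math

def shot_noise_snr(photon_rate, integration_time):
    """Shot-noise-limited SNR: the detected count N = rate * T is Poisson
    distributed with standard deviation sqrt(N), so SNR = N / sqrt(N) = sqrt(N)."""
    n = photon_rate * integration_time
    return math.sqrt(n)
```

Doubling the interrogation intensity therefore improves the shot-noise-limited SNR only by a factor of sqrt(2), and longer TDM dwell windows buy SNR at the cost of frame rate.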

    To reduce thermal noise influences, a Si photo diode with integrated trans-impedance amplifier circuitry (OPT101) was selected for detection. Lock-in extraction of the detected signal further reduces stray light, dark current and 1/f noise influences. Placing the PGA between the detection and lock-in extraction unit enables maximum pre-amplification of the signal while amplifier noise components added in the amplification process are reduced by the subsequent lock-in demodulation. Non-physiological high frequency components of the signal are attenuated by the 3rd order low pass filter of the lock-in demodulation unit.

    2.2.3. Interfaces and Software Design

    Figure 3 shows the software concept. The fNIRS module software sets up hardware components (PGA, DAC, MUX,…) and is controlled by an interrupt-based architecture that receives its control signals from the 4 Bit parallel interface. Therefore, interface operation and analog signal conversion can be done by the mainboard or any custom or standard DAQ equipment with 4 Bit programmable digital outputs (e.g., the NI USB600x series). Using the mainboard, a channel administration routine both supervises data acquisition and acts as interface between the fNIRS modules and the PC by processing received user commands (configuration, start, stop…), translating them into signals for the 4 Bit fNIRS module interface(s) and sending acquired data packages via the UART interface. On the PC's operating system side, the user can control the instrument and directly read out the data packages in ASCII CSV format via a simple serial port command console or access the serial port with any software such as LabView or Matlab. A LabView graphical user interface was developed for easy configuration and control as well as display and logging of raw and modified Beer-Lambert Law data.

    Figure 3. Software and Interface Concept. Stand alone fNIRS module operated via parallel control interface by the mainboard or any custom control and data acquisition device. Function of the 4 Bit interface (3:RST, 2:TRIG, 1:CH1, 0:CH0): Bits CH1:CH0 select one of the four physical NIRS channels. A rising edge on the TRIG line activates the selected channel, always beginning with wavelength 750 nm of the corresponding LED. Each subsequent rising edge toggles the activation between 750 and 850 nm. When the RST line is pulled up, all channels are turned off. The next rising edge on the TRIG line starts the process again, beginning with 750 nm.
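The caption's 4 Bit protocol can be modeled as a small state machine, as sketched below. The handling of a channel change between triggers (restarting at 750 nm for the newly selected channel) is an assumption beyond what the caption specifies.

```python
class FnirsModuleInterface:
    """Software model of the 4 Bit control interface (3:RST, 2:TRIG, 1:CH1, 0:CH0)
    as described in the Figure 3 caption. `active` is (channel, wavelength_nm)
    or None when all channels are off."""

    def __init__(self):
        self.active = None

    def rising_edge_trig(self, ch1, ch0):
        channel = (ch1 << 1) | ch0
        if self.active is None or self.active[0] != channel:
            # first trigger after reset (or a channel change, assumed here)
            # always starts with the 750 nm wavelength
            self.active = (channel, 750)
        else:
            # each subsequent edge toggles between 750 and 850 nm
            self.active = (channel, 850 if self.active[1] == 750 else 750)

    def pull_rst(self):
        # RST high turns all channels off; the next TRIG restarts at 750 nm
        self.active = None
```

Any DAQ device with four programmable digital outputs can drive this protocol directly, which is what makes the modules usable without the mainboard.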

    2.2.4. Mechanical and Probe Design

    In the fNIRS instrument's mechanical design, the idea of modularity/scalability and robust fixation is continued by providing independent custom 3D printed solutions for the single fNIRS modules and the mainboard:

    The Mainboard, Bluetooth module and batteries are worn on the upper arm of a subject in a chained multiple-unit housing (see also Figure 5, in the next section).

    For the single fNIRS modules, a new mechanical spring-loaded design was approached to optimize signal quality, sensitivity and light penetration depth together with easy and robust, adaptive fixation of the optodes (see Figure 4). Based on a spherical approximation of the head with diameter D = 20 cm, the central NIR light detector and the four NIR LEDs are placed perpendicular to the scalp with a source-detector distance of d = 35 mm. To enable perpendicular fixation of the emitters/detector and at the same time allow alignment to the natural unevenness of the head and its deviations from the spherical approximation, the NIR light LEDs are not stiffly connected to the module body housing but integrated in movable spring-loaded LED holders. These holders are based on two nested tubes that are spring-loaded against each other (S1) and against the module housing (S2) and are able to rotate around an axis (R): Spring S1 presses the LED toward the surface of the head, thus enabling alignment and preventing the loss of contact during movements. Spring S2 and the rotary joint R keep the LED perpendicular to the surface while enabling small deviations for comfort and alignment.

    Figure 4. Mechanical spring-loaded concept: Spherical head approximation (top left), geometric channel arrangement (red: NIR LEDs, black: photo detector, blue: measurement points of highest sensitivity top right), spring-loaded mechanical design illustrated on one LED-holder (bottom). Spring S1 for alignment and buffering, spring S2 and rotatory joint R for perpendicular alignment.

    To minimize stray light influences and for cushioning purposes, the detector and emitters are encased by an opaque cell rubber tubing. To fixate a single module to the head, a flexible ribbon with hook-and-loop fastener can be used that is sewed to the module housing.

    The mechanical concept was designed to allow the modules to be used on the forehead as well as over haired regions of the head: The single spring-loaded optodes are easily accessible due to their modular fixation without a cap or other concealing elements. This enables the user to manually brush aside obstructing hair from under the optodes for better optical contact. Even though we successfully conducted measurements over hairy regions of the head, it has to be pointed out that the usability of the modules on other regions than the forehead has not been proven under controlled conditions so far.

    2.3. System Evaluation

    2.3.1. Hardware Analysis

    To enable a differentiated characterization of the instrument's hardware according to functional units, evaluation and analysis was split into emitter branch (current regulation and modulation), receiver branch (lock-in module), power supply stability and overall drift characteristics:

    Current regulator/modulator speed and current shape/oscillation characteristics: To evaluate and optimize the current regulator design characteristics for a stable and minimally oscillating but steep square wave shape of the regulated current signal, both LTSpice simulations and measurements were conducted and the regulator design parameters iteratively improved using two high-precision operational amplifiers (Analog Devices AD824A and Linear Technologies LMC6064). To minimize transient oscillation and settling times, a negative feedback decoupling capacitor C was introduced to the regulator design. For the determination of its optimal value, the shape of the regulated square wave current signal was investigated in a range from C = 0 pF to C = 330 pF at different current levels.

    Lock-in performance: The sum of propagation delays that result from each hardware component in the emitter-detector signal path leads to an overall phase shift between input and reference signal in the analog lock-in amplification process. Such a phase shift results in an attenuation of the signal during demodulation (Meade, 1982, 1983). To minimize this effect, all hardware elements in the signal path were selected with respect to high speed/low delay times. The remaining overall phase shift ΔΦ = (Δt/T)·2π between the reference signal (with period T) and the detected pre-amplified signal was measured before demodulation. Using the established straightforward mathematical model for square wave reference lock-in demodulation, as in Meade (1983), a phase-shift-dependent attenuation factor

    a(ΔΦ) = 1 − 2|ΔΦ|/π

    was used to estimate the resulting attenuation.
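The attenuation model can be evaluated numerically; for the module's 3.125 kHz reference (T = 320 µs), even a few microseconds of propagation delay produce a measurable loss. The delay values below are illustrative, not measured figures from the instrument.

```python
import math

def lockin_attenuation(delta_t, period):
    """Attenuation of a square-wave-reference lock-in output caused by a
    propagation delay delta_t relative to a reference of the given period:
    a(dPhi) = 1 - 2*|dPhi|/pi, with dPhi = 2*pi*delta_t/period (Meade's model
    for square-on-square demodulation)."""
    d_phi = 2 * math.pi * (delta_t / period)
    return 1.0 - 2.0 * abs(d_phi) / math.pi
```

For example, an 8 µs total delay at T = 320 µs already costs 10% of the demodulated amplitude, which motivates the high-speed component selection described above.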

    For an estimation of the receiver sensitivity using the noise equivalent power (NEP), dark voltage noise levels (no incident light to the photo detector) were measured at the output of the lock-in-module.

    System drifts: The following possible sources of system drift were considered: changes in the 1 Ω LED current regulation resistance due to temperature changes, changes in the total radiated power of the LEDs due to semiconductor junction temperature changes despite constant currents, and supply voltage variations. Changes in stray light, amplifier and thermal resistor noise are strongly suppressed by the lock-in amplification process. To minimize signal drifts resulting from changes in the 1 Ω current regulator resistance, Panasonic current sensing resistors with a low temperature coefficient of resistance were chosen.

    The overall system drift of a single fNIRS module was specified with 20 min continuous acquisition windows of a single active channel at maximum intensity (100 mA) with the PGA set to G = 44 and the module being placed at a fixed position in an opaque closed box.

    Mainboard power supply stability: DC supply voltage drifts during 20 min signal acquisition periods and current modulation impacts on the supply voltage were evaluated. As the 100 mA (max.) square wave 3.125 kHz modulation can influence the power supply voltage stability and noise, it can degrade the performance of the signal detection and amplification elements. Their output signals during active modulation were acquired while zero optical input to the photo detector was ensured by encasing the active LED with an opaque metal box. For customization, the layout of the fNIRS module allows both separate and common supply of the LED currents and module hardware.

    2.3.2. Physiological Verification

    Simple qualitative experiments were conducted using a channel at 10–20 point Fp1 to verify significant strength of physiological information in the raw signal and its power spectrum. Amongst others, visibility and strength of pulse artifacts are indicators for the signal quality and have been widely documented in fNIRS literature, with the pulse artifact's amplitude being in the order of metabolic variations due to brain activity (Boas et al., 2004; Lareau et al., 2011; Scholkmann et al., 2014). Thus, with the fNIRS module pressed firmly against the head to reduce the sensitivity to scalp signals (decreased blood flow under the optodes), a clearly visible pulse artifact is a first indicator for sufficient signal quality to measure brain activation. The pulse rate was verified with conventional reference pulse measurements.
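The pulse-artifact check described above can be automated by locating the dominant spectral peak in the typical cardiac band. The band limits (0.8-2.5 Hz, i.e., roughly 48-150 beats/min) are a common-sense assumption, not values from the text.

```python
import numpy as np

def estimate_pulse_rate(signal, fs, band=(0.8, 2.5)):
    """Estimate the pulse rate (beats/min) from a raw fNIRS trace by locating
    the dominant power-spectrum peak within an assumed cardiac band (Hz)."""
    sig = signal - np.mean(signal)  # remove DC before the spectrum
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(sig)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(psd[mask])] * 60.0
```

Agreement between this estimate and a conventional reference pulse measurement is the kind of consistency check the paragraph describes.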

    For verification and quantification of the device's capability to measure metabolic brain activity, a mental arithmetic BCI experiment was conducted with 12 subjects. In this experiment, it is shown that the measured hemodynamic responses can be classified on a single-trial basis, i.e., each trial can be classified as containing mental arithmetic or relaxation, instead of measuring only the difference in the average hemodynamic response.

    Mental arithmetic tasks are known to elicit strong hemodynamic reactions in frontal brain areas and have been investigated in a variety of fNIRS studies (Ang et al., 2010; Herff et al., 2013; Bauernfeind et al., 2014). Here, 30 trials of mental arithmetic data were recorded for each participant. During each 10 s trial, participants were asked to repeatedly subtract a number between 7 and 19 (excluding 10) from a number between 501 and 999. Both numbers were presented on a screen at a distance of roughly 50 cm. After each mental arithmetic trial, participants were asked to relax for 25 s. These pause intervals were indicated by a fixation cross on the screen. A longer resting period of variable length was included after 15 trials to allow participants to rest and drink. No data from these extended resting periods were used in our analysis.

    The open fNIRS device was placed on the forehead and secured around the head with the flexible ribbon with hook-and-loop fastener sewn to its housing. It was positioned such that both active emitters were placed on the locations Fp1 and Fp2 of the international 10–20 system. The light detector was placed on AFz, resulting in an emitter-detector distance of approximately 3.5 cm.

    All subjects were informed prior to the experiment and gave written consent.

    The signal processing of the recorded data was performed in a straightforward and simple manner, since we focus on the developed hardware in this paper. More advanced methods have been shown to improve classification accuracies in neuroimaging (Calhoun et al., 2001; Blankertz et al., 2008; Lemm et al., 2011; Heger et al., 2014). The raw optical densities were converted to concentration changes of oxygenated and deoxygenated hemoglobin (HbO and HbR, respectively) using the modified Beer-Lambert Law (Sassaroli and Fantini, 2004). HbO and HbR values were then linearly detrended in windows of 300 s. Low-frequency noise was attenuated by subtracting from every sample a moving average computed over the 30 s before and after it. Finally, the data was low-pass filtered using an elliptic IIR filter of order 6 with a cut-off frequency of 0.5 Hz to reduce high-frequency systemic noise such as pulse artifacts.
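
    As an illustrative sketch of this preprocessing chain (not the authors' code), the steps can be written in Python with NumPy/SciPy as below. The 10 Hz sampling rate and the elliptic filter's 1 dB passband ripple and 40 dB stopband attenuation are assumptions; neither is stated in this excerpt.

```python
import numpy as np
from scipy.signal import ellip, filtfilt

def preprocess(hb, fs=10.0):
    """Detrend, remove slow drifts, and low-pass filter one Hb channel."""
    out = hb.astype(float).copy()
    # Linear detrend in 300 s windows
    win = int(300 * fs)
    for start in range(0, len(out), win):
        seg = out[start:start + win]          # view into `out`
        if len(seg) > 1:
            t = np.arange(len(seg))
            slope, intercept = np.polyfit(t, seg, 1)
            seg -= slope * t + intercept      # modifies `out` in place
    # Subtract a moving average over the 30 s before and after each sample
    k = int(30 * fs)
    kernel = np.ones(2 * k + 1) / (2 * k + 1)
    out -= np.convolve(out, kernel, mode="same")
    # 6th-order elliptic low-pass at 0.5 Hz (ripple/attenuation assumed)
    b, a = ellip(6, 1, 40, 0.5 / (fs / 2), btype="low")
    return filtfilt(b, a, out)               # zero-phase filtering
```

    Because `filtfilt` runs the filter forward and backward, the 0.5 Hz low-pass adds no phase shift to the trials extracted afterwards.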

    After preprocessing, trials were extracted based on the experiment timings. For the pause blocks, we extracted the last 10 s of the 25 s pause intervals to ensure that hemoglobin levels had returned to baseline. For each mental arithmetic trial, we extracted 10 s of data starting 5 s after stimulus presentation to ensure that the hemodynamic response had already developed. Labels were assigned to the trials referring to either mental arithmetic or pause data. For each trial, we extracted as a feature the slope of a straight line fitted to the HbO and HbR data of each channel. The line was fitted using linear regression with a least-squares approach. Slope features have been shown to work well in previous studies (Herff et al., 2014).
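
    The slope feature described above amounts to a degree-1 least-squares fit per channel and chromophore; a minimal Python sketch follows (the 10 Hz sampling rate is an assumption):

```python
import numpy as np

def slope_feature(trial, fs=10.0):
    """Least-squares slope of one Hb time series over a trial window."""
    t = np.arange(len(trial)) / fs            # time axis in seconds
    slope, _intercept = np.polyfit(t, trial, 1)
    return slope
```

    A full feature vector would concatenate the slopes of HbO and HbR for every channel.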

    Evaluation was performed using 10-fold cross-validation and classification by Linear Discriminant Analysis. In addition to the single-trial analysis, the average hemodynamic response was calculated by averaging over all mental arithmetic trials or all pause trials.
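
    A minimal sketch of this evaluation scheme, using a NumPy-only two-class Fisher LDA in place of a library implementation (the small ridge term and the random fold assignment are choices of this sketch, not the paper's):

```python
import numpy as np

def lda_fit(X, y):
    """Two-class Fisher LDA: weight vector and midpoint threshold."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    S = np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False)
    S += 1e-6 * np.eye(X.shape[1])            # small ridge for stability
    w = np.linalg.solve(S, m1 - m0)
    thr = w @ (m0 + m1) / 2                   # decision threshold
    return w, thr

def cross_validate(X, y, k=10, seed=0):
    """Mean accuracy over k random folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        w, thr = lda_fit(X[train], y[train])
        pred = (X[fold] @ w > thr).astype(int)
        accs.append(np.mean(pred == y[fold]))
    return float(np.mean(accs))
```

    With the per-trial slope features as `X` and mental-arithmetic/pause labels as `y`, `cross_validate(X, y)` returns the mean 10-fold accuracy.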

    Technology and Research

    Pediatric BCI Technology: A Promising Outlook for the Future of Childhood Disability

    The future of BCI technology is promising. Pediatric BCI technology in particular is seeing a surge of interest, with possible improvements in therapeutic outcomes already reported for various patient populations. Early research has suggested that, through the assistance of a BCI, neurofeedback-based rehabilitation in children with CP may help encourage early neuroplastic changes which could improve the acquisition of motor control that would otherwise be more challenging (Daly et al., 2014). As this field moves forward, we should collectively strive to improve the quality of life both for those with a documented history of motor control and for those whose movement patterns and motor control may not have been previously apparent.

    Varied future applications for BCI technology are on the horizon. These include a range of projects, from improving the information transfer rate of speech- and communication-based BCIs to evolving current pediatric BCI systems to include multi-modal data and input streams. These goals aim to meet the complex needs of the individuals who will benefit the most from these systems while improving their autonomous communication, social responsiveness, and movement fluidity. Events like the Cybathlon are providing unique, competitive environments that help promote longer-term BCI use and practice by BCI end-users (Perdikis et al., 2018). BCI teams across the world are invested in transforming lives which might benefit from the development and synthesis of BCI, yet almost none are focused on the needs of disabled children.

    The development of pediatric BCI systems faces its own set of unique challenges. BCIs depend on controllable, evoked patterns of brain activity for analysis. Due to the current dearth of pediatric BCI research, however, evoked activity in the developing brains of children has yet to be well characterized for BCI. Many existing BCI paradigms involve dry, repetitive tasks that do not take into consideration the shorter attention spans of children, prompting the need for more engaging BCI paradigms. Children are also likely to be more sensitive to fatigue and discomfort from wearing BCI hardware for long periods. Improvements in BCI hardware, signal processing methods, and classification algorithms can all contribute to reducing the overall amount of data that needs to be collected, thus reducing the likelihood of fatigue and discomfort (Kinney-Lang et al., 2016). The underlying conditions will certainly differ in children (CP rather than ALS), where unique brain injuries, neurophysiology, and the effects of ongoing development must all be considered. Perhaps most importantly, the needs and goals of individual users will be different in children and youth, and patient-driven development of BCI technologies and applications is essential. These challenges, while prevalent and clear limitations for modern BCI systems, should not be seen as insurmountable. Rather, they should serve as areas of focus for those motivated to advance BCI applications for children, with the understanding that overcoming these existing barriers will require extensive collaboration among BCI stakeholders across the spectrum.

    Hybrid Brain-Computer Interfaces

    If BCIs are to be accepted clinically, it is critically important that the system can accurately predict user intent (Wolpaw, 2007; Daly and Wolpaw, 2008; Wolpaw and Wolpaw, 2012). Due to current hardware and software limitations, the classification accuracy of predictions is less than 100%. Even for typically developing individuals, accuracy ranges from 50 to 98% (Lee and Choi, 2018; Wang et al., 2018) and drops further for individuals with disabilities (Daly et al., 2014; Lazarou et al., 2018). Studies with children are few and report variable results, some finding lower performance (Mikołajewska and Mikołajewski, 2014; Kinney-Lang et al., 2016) while others suggest results comparable to adults (Zhang et al., 2019). Hybrid-BCIs may be one approach to help improve BCI accuracy.

    Hybrid-BCIs combine different types of brain signals with other physiological measures or another access method to improve classification accuracy by increasing the confidence in the predictions made by the system (Wolpaw and Wolpaw, 2012; Choi et al., 2017). Researchers at Holland Bloorview Kids Rehabilitation Hospital have already begun research in this area. For example, classification accuracy improved in a covert speech task using a hybrid functional near-infrared spectroscopy (fNIRS) and EEG-BCI rather than using either alone (Rezazadeh Sereshkeh et al., 2019). Similarly, classification accuracy improved in a verbal fluency task using a hybrid fNIRS and transcranial Doppler ultrasonography (TCD) BCI rather than either alone (Faress and Chau, 2013). Also, researchers have looked at readiness potentials as a cue for identifying when movements are voluntary rather than unintentional, such as those seen in persons with athetoid CP. These techniques could be used to determine if a switch selection was accidental or intentional (Zeid et al., 2017). Another area to be explored is monitoring the brain’s error-related potential, which occurs when a user knowingly observes an erroneous selection occurring, permitting a BCI system to self-correct when a prediction was incorrect (Schalk et al., 2000). Techniques such as these appear to be applicable in youth and might be expanded and trialed across a pediatric BCI network.
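
    At its simplest, the confidence-based fusion underlying such hybrid designs can be sketched as a weighted combination of the per-class probabilities produced by each modality's classifier (the equal weighting below is an assumption of this sketch, not a scheme from the cited studies):

```python
import numpy as np

def fuse_predictions(p_a, p_b, w=0.5):
    """Combine class-probability vectors from two modalities (e.g., EEG
    and fNIRS) and return the fused class label and its confidence."""
    p = w * np.asarray(p_a) + (1 - w) * np.asarray(p_b)
    return int(np.argmax(p)), float(np.max(p))
```

    A practical system would accept a selection only when the fused confidence exceeds a threshold, which is how combining modalities raises confidence in the prediction.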

    In addition to improving classification accuracy, hybrid BCIs can extend the control capacity of BCI systems. Hybrid BCIs can be used to initiate asynchronous control, with one of the systems used to activate the other. This is highlighted in the “brain switch” device of Pfurtscheller et al. (2010), where motor imagery was used to turn on flashing LED lights for an SSVEP-controlled hand orthosis. Similarly, hybrid BCIs can be used to separate target identification and target selection, with one system used to guide the user towards the desired target and the other to select the target, such as in combined eye-gaze and EEG BCIs (Kim et al., 2015). Hybrid BCIs can also be used to increase the dimensionality of target selection, combining motor imagery with visual evoked potentials to achieve two-dimensional cursor control (Ma et al., 2017). Whether improving classification accuracy or increasing functionality, hybrid BCIs facilitate the utilization of the strengths of different BCI modalities while mitigating their limitations. This approach may be fruitful in designing BCI systems that address the unique needs of pediatric users.
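
    The asynchronous “brain switch” idea can be illustrated with a toy control loop (a sketch of the concept only, not the Pfurtscheller et al. implementation): a detected motor-imagery event toggles the SSVEP stage, and SSVEP commands reach the orthosis only while that stage is active.

```python
def brain_switch(events):
    """Toy asynchronous hybrid-BCI loop: a motor-imagery event toggles the
    SSVEP stage; SSVEP commands pass through only while it is active.
    `events` is a sequence of ("MI", None) or ("SSVEP", command) pairs."""
    active = False
    forwarded = []
    for kind, cmd in events:
        if kind == "MI":               # motor imagery acts as the switch
            active = not active
        elif kind == "SSVEP" and active:
            forwarded.append(cmd)      # command reaches the hand orthosis
    return forwarded
```

    Separating the on/off decision from the command stream is what lets such a system idle safely when the user is not engaging it.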

    Tapping Into Play: Games and Play in BCI

    Play is a critical part of learning and development for all children. Both neurotypical and neurodivergent children benefit more from participating in tasks that keep them interested, engaged, and provide embedded learning opportunities. Current BCI software, however, tends to focus on simple, utility-driven applications, such as spelling grids or moving a mouse cursor. While useful, such applications have limited appeal for sustained use, particularly for young BCI users. Evidence indicates that increasing engagement in BCI through gamified learning may result in longer adoption of the technology while helping promote the practice of BCI control schemes (Powers et al., 2015; De Oliveira et al., 2016; Kinney-Lang et al., 2016; Mullins, 2017). A growing trend across BCI research endeavors reveals that more engaging, user-friendly activities may be able to promote a variety of tangible boons in BCI use, both in short-term task learning and long-term BCI accuracy (Perdikis et al., 2018; Edelman et al., 2019; Faller et al., 2019). Thus, there is a clear need to promote the development of more captivating, accessible BCI software which incorporates essential elements of play into pediatric BCI.

    BCI systems offer unique and attractive opportunities for novel approaches to both virtual play (e.g., video games and digital media) and physical play (e.g., manipulation of toy robots, cars, et cetera). By tapping into the non-muscular nature of BCI, such systems may be able to provide previously excluded populations the opportunity to explore and learn through play. Previous research has demonstrated such mediums to be important avenues that encourage continued learning and rehabilitation in children with disabilities (Harris and Reid, 2005; Howcroft et al., 2012; Hernandez et al., 2013; van den Heuvel et al., 2016). Advancements in BCI research furthering the interaction between BCI systems and play thus represent a promising untapped potential for pediatric BCI end-users.

    Current Technological and Practical Limitations of Modern BCIs

    Despite their significant potential, modern BCI systems and applications suffer from several limitations, which are only compounded when implementing programs with pediatric populations (Mikołajewska and Mikołajewski, 2014; Kinney-Lang et al., 2016; Letourneau et al., 2020). Such barriers exist across the BCI experience: from technological limitations, through design issues affecting user experience, to long-term implementation challenges (Millán et al., 2010; Powers et al., 2015; Lazarou et al., 2018; Lotte et al., 2018). Highlighting some of these existing limitations may help shed light on potential pathways to investigate. At the technological level, currently available BCI systems can be limited by their trade-off between accuracy, speed, and degrees of freedom for selection. Moreover, these systems are designed without consideration for children as potential end-users, leading to preconfigured signal processing and analysis schemes which may neglect neurophysiological differences between adults and children (Cowan et al., 2006; Vuckovic et al., 2014; Kinney-Lang et al., 2018a,b). At the user-experience level, BCI may be limited by the comfort of headsets, slow “set-up” times, as well as difficulties in maintaining attention and control for extended periods due to fatigability. Finally, at the implementation level, relevant applications of interest may be limited for end-users, particularly children, as few long-term engagement applications have been explored with BCI systems, leading to a potential drop in motivation and an increase in frustration as time is invested in learning the BCI control system.

    4. Discussion

    The paper reviewed nine real-time implementations of robotic and VR BCI paradigms developed by the BCI-lab research group. Real-time virtual reality agent and robotic device control scenarios have been explained. The novel information geometry-based MDM method has been introduced, which boosted the VR and robotic BCI accuracies in offline EEG processing applications.

    The previously reported VR and robotic BCI paradigms employed the SWLDA classifier, which required a larger number of averaged ERPs (10 or more) and thus resulted in slower interfacing speeds. The MDM classifier introduced in this paper and compared with the previous SWLDA method required smaller ERP averaging (1–5 responses), which allowed perfect scores to be achieved for the majority of the VR and robotic BCIs tested and reviewed in this paper.

    The results obtained with the proposed MDM classifier were significantly better (p < 0.001), as tested with pairwise rank-sum Wilcoxon tests, compared to the classical SWLDA method. The MDM-based classification improvements were also independent of the presented oddball BCI stimulation modalities (auditory, tactile, bone-conduction auditory, or mixed) and applications (VR or robotic), which demonstrates the validity of the proposed approach for the human augmentation solutions our society expects.
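
    A rank-sum Wilcoxon comparison of two classifiers' per-condition accuracies can be run with SciPy; the accuracy values below are purely hypothetical placeholders, not the paper's results.

```python
import numpy as np
from scipy.stats import ranksums

# Hypothetical per-condition accuracies (illustrative values only)
acc_mdm = np.array([0.95, 0.92, 1.00, 0.98, 0.94, 0.97, 1.00, 0.96])
acc_swlda = np.array([0.78, 0.74, 0.85, 0.80, 0.76, 0.82, 0.88, 0.79])

stat, p = ranksums(acc_mdm, acc_swlda)  # two-sided rank-sum test
```

    The rank-sum (Mann-Whitney style) test compares two independent samples; a paired design would instead use the signed-rank test (`scipy.stats.wilcoxon`).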


    Information has been the key to better organization and new developments. The more information we have, the more optimally we can organize ourselves to deliver the best outcomes. That is why data collection is an important part of every organization. We can also use this data for the prediction of current trends of certain parameters and future events. As we are becoming more and more aware of this, we have started producing and collecting more data about almost everything by introducing technological developments in this direction. Today, we are facing a situation wherein we are flooded with tons of data from every aspect of our life such as social activities, science, work, health, etc. In a way, we can compare the present situation to a data deluge. Technological advances have helped us in generating more and more data, even to a level where it has become unmanageable with currently available technologies. This has led to the creation of the term ‘big data’ to describe data that is large and unmanageable. In order to meet our present and future social needs, we need to develop new strategies to organize this data and derive meaningful information. One such special social need is healthcare. Like every other industry, healthcare organizations are producing data at a tremendous rate that presents many advantages and challenges at the same time. In this review, we discuss the basics of big data including its management, analysis and future prospects, especially in the healthcare sector.

    The data overload

    Every day, people working with various organizations around the world are generating a massive amount of data. The term “digital universe” quantitatively defines such massive amounts of data created, replicated, and consumed in a single year. International Data Corporation (IDC) estimated the approximate size of the digital universe in 2005 to be 130 exabytes (EB). The digital universe in 2017 expanded to about 16,000 EB or 16 zettabytes (ZB). IDC predicted that the digital universe would expand to 40,000 EB by the year 2020. To imagine this size, we would have to assign about 5200 gigabytes (GB) of data to every individual. This exemplifies the phenomenal speed at which the digital universe is expanding. Internet giants like Google and Facebook have been collecting and storing massive amounts of data. For instance, depending on our preferences, Google may store a variety of information including user location, advertisement preferences, list of applications used, internet browsing history, contacts, bookmarks, emails, and other necessary information associated with the user. Similarly, Facebook stores and analyzes more than about 30 petabytes (PB) of user-generated data. Such large amounts of data constitute ‘big data’. Over the past decade, big data has been successfully used by the IT industry to generate critical information that can generate significant revenue.
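
    The per-person figure follows directly from the projection: dividing 40,000 EB by an assumed world population of about 7.7 billion gives roughly the 5200 GB quoted above (the population figure is an assumption of this back-of-the-envelope check).

```python
digital_universe_eb = 40_000        # IDC projection for 2020, in exabytes
population = 7.7e9                  # assumed world population
eb_to_gb = 1e9                      # 1 EB = 10^9 GB (decimal units)

gb_per_person = digital_universe_eb * eb_to_gb / population
```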

    These observations have become so conspicuous that they have eventually led to the birth of a new field of science termed ‘Data Science’. Data science deals with various aspects including data management and analysis, to extract deeper insights for improving the functionality or services of a system (for example, healthcare and transport systems). Additionally, with the availability of some of the most creative and meaningful ways to visualize big data post-analysis, it has become easier to understand the functioning of any complex system. As a large section of society is becoming aware of, and involved in, generating big data, it has become necessary to define what big data is. Therefore, in this review, we attempt to provide details on the impact of big data in the transformation of the global healthcare sector and its impact on our daily lives.

    Defining big data

    As the name suggests, ‘big data’ represents large amounts of data that are unmanageable using traditional software or internet-based platforms. It surpasses the traditionally used amounts of storage, processing and analytical power. Even though a number of definitions for big data exist, the most popular and well-accepted definition was given by Douglas Laney. Laney observed that (big) data was growing in three different dimensions namely, volume, velocity and variety (known as the 3 Vs) [1]. The ‘big’ part of big data is indicative of its large volume. In addition to volume, the big data description also includes velocity and variety. Velocity indicates the speed or rate of data collection and making it accessible for further analysis, while variety refers to the different types of organized and unorganized data that any firm or system can collect, such as transaction-level data, video, audio, text or log files. These three Vs have become the standard definition of big data. Although other researchers have added several other Vs to this definition [2], the most accepted 4th V remains ‘veracity’.

    The term “big data” has become extremely popular across the globe in recent years. Almost every sector of research, whether it relates to industry or academia, is generating and analyzing big data for various purposes. The most challenging task regarding this huge heap of data, which can be organized or unorganized, is its management. Given the fact that big data is unmanageable using traditional software, we need technically advanced applications and software that can utilize fast and cost-efficient high-end computational power for such tasks. Implementation of artificial intelligence (AI) algorithms and novel fusion algorithms would be necessary to make sense of this large amount of data. Indeed, it would be a great feat to achieve automated decision-making by the implementation of machine learning (ML) methods like neural networks and other AI techniques. However, in the absence of appropriate software and hardware support, big data can be quite hazy. We need to develop better techniques to handle this ‘endless sea’ of data and smart web applications for efficient analysis to gain workable insights. With proper storage and analytical tools in hand, the information and insights derived from big data can make the critical social infrastructure components and services (like healthcare, safety or transportation) more aware, interactive and efficient [3]. In addition, visualization of big data in a user-friendly manner will be a critical factor for societal development.

    Healthcare as a big-data repository

    Healthcare is a multi-dimensional system established with the sole aim of the prevention, diagnosis, and treatment of health-related issues or impairments in human beings. The major components of a healthcare system are the health professionals (physicians or nurses), health facilities (clinics, hospitals for delivering medicines and other diagnosis or treatment technologies), and a financing institution supporting the former two. The health professionals belong to various health sectors like dentistry, medicine, midwifery, nursing, psychology, physiotherapy, and many others. Healthcare is required at several levels depending on the urgency of the situation. It is delivered as the first point of consultation (primary care), acute care requiring skilled professionals (secondary care), advanced medical investigation and treatment (tertiary care), and highly uncommon diagnostic or surgical procedures (quaternary care). At all these levels, the health professionals are responsible for different kinds of information such as the patient’s medical history (diagnosis and prescription related data), medical and clinical data (like data from imaging and laboratory examinations), and other private or personal medical data. Previously, the common practice was to store such medical records for a patient in the form of either handwritten notes or typed reports [4]. Even the results from a medical examination were stored in a paper file system. In fact, this practice is really old, with the oldest case reports existing on a papyrus text from Egypt that dates back to 1600 BC [5]. In Stanley Reiser’s words, the clinical case records “freeze the episode of illness as a story in which patient, family and the doctor are a part of the plot” [6].

    With the advent of computer systems and their potential, the digitization of all clinical exams and medical records in healthcare systems has become a standard and widely adopted practice nowadays. In 2003, a division of the National Academies of Sciences, Engineering, and Medicine known as the Institute of Medicine chose the term “electronic health records” to represent records maintained for improving the health care sector towards the benefit of patients and clinicians. Electronic health records (EHR), as defined by Murphy, Hanken and Waters, are computerized medical records for patients: “any information relating to the past, present or future physical/mental health or condition of an individual which resides in electronic system(s) used to capture, transmit, receive, store, retrieve, link and manipulate multimedia data for the primary purpose of providing healthcare and health-related services” [7].

    Electronic health records

    It is important to note that the National Institutes of Health (NIH) recently announced the “All of Us” initiative, which aims to collect one million or more patients’ data such as EHR, including medical imaging, socio-behavioral, and environmental data over the next few years. EHRs have introduced many advantages for handling modern healthcare related data. Below, we describe some of the characteristic advantages of using EHRs. The first advantage of EHRs is that healthcare professionals have improved access to the entire medical history of a patient. The information includes medical diagnoses, prescriptions, data related to known allergies, demographics, clinical narratives, and the results obtained from various laboratory tests. The recognition and treatment of medical conditions is thus more time-efficient due to a reduction in the lag time of previous test results. With time, we have observed a significant decrease in redundant and additional examinations, lost orders and ambiguities caused by illegible handwriting, and an improved care coordination between multiple healthcare providers. Overcoming such logistical errors has led to a reduction in the number of drug allergies by reducing errors in medication dose and frequency. Healthcare professionals have also found access over web-based and electronic platforms to improve their medical practices significantly using automatic reminders and prompts regarding vaccinations, abnormal laboratory results, cancer screening, and other periodic checkups. There would be greater continuity of care and timely interventions by facilitating communication among multiple healthcare providers and patients. These can be associated with electronic authorization and immediate insurance approvals due to less paperwork. EHRs enable faster data retrieval and facilitate reporting of key healthcare quality indicators to the organizations, and also improve public health surveillance by immediate reporting of disease outbreaks.
EHRs also provide relevant data regarding the quality of care for the beneficiaries of employee health insurance programs and can help control the increasing costs of health insurance benefits. Finally, EHRs can reduce or absolutely eliminate delays and confusion in the billing and claims management area. Together, the EHRs and the internet help provide access to vast amounts of health-related medical information critical for patient life.

    Digitization of healthcare and big data

    Similar to an EHR, an electronic medical record (EMR) stores the standard medical and clinical data gathered from patients. EHRs, EMRs, personal health records (PHR), medical practice management software (MPM), and many other healthcare data components collectively have the potential to improve the quality, service efficiency, and costs of healthcare along with the reduction of medical errors. Big data in healthcare includes the healthcare payer-provider data (such as EMRs, pharmacy prescriptions, and insurance records) along with the genomics-driven experiments (such as genotyping and gene expression data) and other data acquired from the smart web of internet of things (IoT) (Fig. 1). The adoption of EHRs was slow at the beginning of the 21st century; however, it has grown substantially after 2009 [7, 8]. The management and usage of such healthcare data has been increasingly dependent on information technology. The development and usage of wellness monitoring devices and related software that can generate alerts and share the health-related data of a patient with the respective health care providers has gained momentum, especially in establishing a real-time biomedical and health monitoring system. These devices are generating a huge amount of data that can be analyzed to provide real-time clinical or medical care [9]. The use of big data from healthcare shows promise for improving health outcomes and controlling costs.

    Workflow of Big data Analytics. Data warehouses store massive amounts of data generated from various sources. This data is processed using analytic pipelines to obtain smarter and affordable healthcare options

    Big data in biomedical research

    A biological system, such as a human cell, exhibits molecular and physical events of complex interplay. In order to understand the interdependencies of the various components and events of such a complex system, a biomedical or biological experiment usually gathers data on a smaller and/or simpler component. Consequently, it requires multiple simplified experiments to generate a wide map of a given biological phenomenon of interest. This indicates that the more data we have, the better we understand biological processes. With this idea, modern techniques have evolved at a great pace. For instance, one can imagine the amount of data generated since the integration of efficient technologies like next-generation sequencing (NGS) and genome-wide association studies (GWAS) to decode human genetics. NGS-based data provides information at depths that were previously inaccessible and takes the experimental scenario to a completely new dimension. It has increased the resolution at which we observe or record biological events associated with specific diseases in a real-time manner. The idea that large amounts of data can provide us a good amount of information that often remains unidentified or hidden in smaller experimental methods has ushered in the ‘-omics’ era. The ‘omics’ discipline has witnessed significant progress: instead of studying a single ‘gene’, scientists can now study the whole ‘genome’ of an organism in ‘genomics’ studies within a given amount of time. Similarly, instead of studying the expression or ‘transcription’ of a single gene, we can now study the expression of all the genes or the entire ‘transcriptome’ of an organism under ‘transcriptomics’ studies. Each of these individual experiments generates a large amount of data with more depth of information than ever before. Yet, this depth and resolution might be insufficient to provide all the details required to explain a particular mechanism or event.
Therefore, one usually finds oneself analyzing a large amount of data obtained from multiple experiments to gain novel insights. This fact is supported by a continuous rise in the number of publications regarding big data in healthcare (Fig. 2). Analysis of such big data from medical and healthcare systems can be of immense help in providing novel strategies for healthcare. The latest technological developments in data generation, collection and analysis, have raised expectations towards a revolution in the field of personalized medicine in near future.

    Publications associated with big data in healthcare. The numbers of publications in PubMed are plotted by year

    Big data from omics studies

    NGS has greatly simplified sequencing and decreased the costs of generating whole genome sequence data. The cost of complete genome sequencing has fallen from millions to a couple of thousand dollars [10]. NGS technology has resulted in an increased volume of biomedical data that comes from genomic and transcriptomic studies. According to one estimate, the number of human genomes sequenced by 2025 could be between 100 million and 2 billion [11]. Combining the genomic and transcriptomic data with proteomic and metabolomic data can greatly enhance our knowledge about the individual profile of a patient—an approach often ascribed as “individual, personalized or precision health care”. Systematic and integrative analysis of omics data in conjunction with healthcare analytics can help design better treatment strategies towards precision and personalized medicine (Fig. 3). Genomics-driven experiments, e.g., genotyping, gene expression, and NGS-based studies, are the major source of big data in biomedical healthcare, along with EMRs, pharmacy prescription information, and insurance records. Healthcare requires a strong integration of such biomedical data from various sources to provide better treatments and patient care. These prospects are so exciting that, even though genomic data from patients would have many variables to be accounted for, commercial organizations are already using human genome data to help providers make personalized medical decisions. This might turn out to be a game-changer in future medicine and health.

    A framework for integrating omics data and health care analytics to promote personalized treatment

    Internet of Things (IoT)

    The healthcare industry has been slower than other industries to adapt to the big data movement. Consequently, big data usage in the healthcare sector is still in its infancy. For example, healthcare and biomedical big data have not yet converged to enhance healthcare data with molecular pathology. Such convergence can help unravel various mechanisms of action or other aspects of predictive biology. Therefore, to assess an individual’s health status, biomolecular and clinical datasets need to be integrated. One such source of clinical data in healthcare is the ‘internet of things’ (IoT).

    In fact, IoT is another big player implemented in a number of other industries, including healthcare. Until recently, the objects of common use such as cars, watches, refrigerators and health-monitoring devices did not usually produce or handle data and lacked internet connectivity. However, furnishing such objects with computer chips and sensors that enable data collection and transmission over the internet has opened new avenues. Device technologies such as Radio Frequency IDentification (RFID) tags and readers, and Near Field Communication (NFC) devices, which can not only gather information but also interact physically, are being increasingly used as information and communication systems [3]. This enables objects with RFID or NFC to communicate and function as a web of smart things. The analysis of data collected from these chips or sensors may reveal critical information beneficial for improving lifestyle, establishing measures for energy conservation, improving transportation, and healthcare. In fact, IoT has become a rising movement in the field of healthcare. IoT devices create a continuous stream of data while monitoring the health of people (or patients), which makes these devices a major contributor to big data in healthcare. Such resources can interconnect various devices to provide a reliable, effective and smart healthcare service to the elderly and patients with a chronic illness [12].

    Advantages of IoT in healthcare

    Using the web of IoT devices, a doctor can measure and monitor various parameters of his/her clients in their respective locations, for example, at home or the office. Therefore, through early intervention and treatment, a patient might not need hospitalization, or even a doctor's visit, resulting in a significant reduction in healthcare expenses. Some examples of IoT devices used in healthcare include fitness or health-tracking wearable devices, biosensors, clinical devices for monitoring vital signs, and other types of devices or clinical instruments. Such IoT devices generate a large amount of health-related data. If we can integrate this data with other existing healthcare data like EMRs or PHRs, we can predict a patient's health status and its progression from a subclinical to a pathological state [9]. In fact, big data generated from IoT has been quite advantageous in several areas in offering better investigation and predictions. On a larger scale, the data from such devices can help in personal health monitoring, modelling the spread of a disease, and finding ways to contain a particular disease outbreak.

    The analysis of data from IoT would require an updated operating software because of its specific nature, along with advanced hardware and software applications. We would need to manage data inflow from IoT instruments in real-time and analyze it by the minute. Stakeholders in the healthcare system are trying to reduce costs and improve the quality of care by applying advanced analytics to both internally and externally generated data.

    Mobile computing and mobile health (mHealth)

    In today’s digital world, every individual seems keen to track their fitness and health statistics using the in-built pedometer of their portable and wearable devices such as smartphones, smartwatches, fitness dashboards or tablets. With an increasingly mobile society in almost all aspects of life, the healthcare infrastructure needs remodeling to accommodate mobile devices [13]. The practice of medicine and public health using mobile devices, known as mHealth or mobile health, pervades different degrees of health care, especially for chronic diseases such as diabetes and cancer [14]. Healthcare organizations are increasingly using mobile health and wellness services for implementing novel and innovative ways to provide care and coordinate health as well as wellness. Mobile platforms can improve healthcare by accelerating interactive communication between patients and healthcare providers. In fact, Apple and Google have developed dedicated platforms, Apple’s ResearchKit and Google Fit, for developing research applications for fitness and health statistics [15]. These applications support seamless interaction with various consumer devices and embedded sensors for data integration. These apps give doctors direct access to a user's overall health data, so both users and their doctors can know the real-time status of the user's body. These apps and smart devices also help by improving our wellness planning and encouraging healthy lifestyles. The users or patients can become advocates for their own health.

    Nature of the big data in healthcare

    EHRs can enable advanced analytics and help clinical decision-making by providing enormous data. However, a large proportion of this data is currently unstructured. Unstructured data is information that does not adhere to a pre-defined model or organizational framework. The reason for this choice may simply be that we can record it in a myriad of formats. Another reason for opting for an unstructured format is that the structured input options (drop-down menus, radio buttons, and check boxes) often fall short of capturing data of a complex nature. For example, we cannot record non-standard data regarding a patient’s clinical suspicions, socioeconomic data, patient preferences, key lifestyle factors, and other related information in any way but an unstructured format. It is difficult to group such varied, yet critical, sources of information into an intuitive or unified data format for further analysis using algorithms to understand and leverage patient care. Nonetheless, the healthcare industry is required to utilize the full potential of these rich streams of information to enhance the patient experience. In the healthcare sector, this could materialize in terms of better management, care and lower-cost treatments. We are miles away from realizing the benefits of big data in a meaningful way and harnessing the insights that come from it. In order to achieve these goals, we need to manage and analyze big data in a systematic manner.

    Management and analysis of big data

    Big data refers to huge amounts of a variety of data generated at a rapid rate. The data gathered from various sources is mostly required for optimizing consumer services rather than consumer consumption. This is also true for big data from biomedical research and healthcare. The major challenge with big data is how to handle this large volume of information. To make it available to the scientific community, the data needs to be stored in a file format that is easily accessible and readable for efficient analysis. In the context of healthcare data, another major challenge is the implementation of high-end computing tools, protocols and high-end hardware in the clinical setting. Experts from diverse backgrounds including biology, information technology, statistics, and mathematics are required to work together to achieve this goal. The data collected using sensors can be made available on a storage cloud with pre-installed software tools developed by analytic tool developers. These tools would have data mining and ML functions, developed by AI experts, to convert the information stored as data into knowledge. Upon implementation, this would enhance the efficiency of acquiring, storing, analyzing, and visualizing big data from healthcare. The main task is to annotate, integrate, and present this complex data in an appropriate manner for better understanding. In the absence of such relevant information, the (healthcare) data remains quite cloudy and may not lead biomedical researchers any further. Finally, visualization tools developed by computer graphics designers can efficiently display this newly gained knowledge.

    Heterogeneity of data is another challenge in big data analysis. The huge size and highly heterogeneous nature of big data in healthcare renders it relatively less informative when using conventional technologies. The most common platforms for operating the software frameworks that assist big data analysis are high-power computing clusters accessed via grid computing infrastructures. Cloud computing is one such system, with virtualized storage technologies that provide reliable services. It offers high reliability, scalability and autonomy along with ubiquitous access, dynamic resource discovery and composability. Such platforms can act as a receiver of data from ubiquitous sensors, as a computer to analyze and interpret the data, and as a provider of easy-to-understand web-based visualization to the user. In IoT, big data processing and analytics can be performed closer to the data source using the services of mobile edge computing cloudlets and fog computing. Advanced algorithms are required to implement ML and AI approaches for big data analysis on computing clusters. A programming language suitable for working on big data (e.g. Python, R or other languages) can be used to write such algorithms or software. Therefore, good knowledge of both biology and IT is required to handle big data from biomedical research; such a combination of the two trades usually fits bioinformaticians. The most common platforms used for working with big data include Hadoop and Apache Spark. We briefly introduce these platforms below.


    Hadoop

    Loading large amounts of (big) data into the memory of even the most powerful computing clusters is not an efficient way to work with big data. Therefore, the best logical approach for analyzing huge volumes of complex big data is to distribute and process it in parallel on multiple nodes. However, the size of the data is usually so large that thousands of computing machines are required to distribute and finish processing in a reasonable amount of time. When working with hundreds or thousands of nodes, one has to handle issues like how to parallelize the computation, distribute the data, and handle failures. One of the most popular open-source distributed applications for this purpose is Hadoop [16]. Hadoop implements the MapReduce algorithm for processing and generating large datasets. MapReduce uses map and reduce primitives to map each logical record in the input into a set of intermediate key/value pairs, and a reduce operation combines all the values that share the same key [17]. It efficiently parallelizes the computation, handles failures, and schedules inter-machine communication across large-scale clusters of machines. Hadoop Distributed File System (HDFS) is the file system component that provides scalable, efficient, and replica-based storage of data at the various nodes that form a cluster [16]. Hadoop has other tools that enhance the storage and processing components, which is why many large companies like Yahoo, Facebook, and others have rapidly adopted it. Hadoop has enabled researchers to use data sets that would otherwise be impossible to handle. Many large projects, like the determination of a correlation between air quality data and asthma admissions, drug development using genomic and proteomic data, and other such aspects of healthcare, are implementing Hadoop. Therefore, with the implementation of the Hadoop system, healthcare analytics will not be held back.
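    The map and reduce primitives can be sketched in pure Python. This toy single-machine version (counting word occurrences across hypothetical text records) only illustrates the programming model; Hadoop adds the distribution, scheduling, and fault tolerance described above:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit an intermediate (key, value) pair for each word occurrence
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle: group all intermediate values that share the same key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Reduce: combine the grouped values for each key
    return {key: sum(values) for key, values in grouped.items()}

# Illustrative input records (e.g. lines from hospital admission logs)
records = ["asthma admission", "asthma readmission", "admission"]
counts = reduce_phase(map_phase(records))  # {'asthma': 2, 'admission': 2, 'readmission': 1}
```

In a real cluster, the map tasks run on the nodes holding each data block and only the intermediate key/value pairs travel over the network.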

    Apache Spark

    Apache Spark is another open source alternative to Hadoop. It is a unified engine for distributed data processing that includes higher-level libraries for supporting SQL queries (Spark SQL), streaming data (Spark Streaming), machine learning (MLlib) and graph processing (GraphX) [18]. These libraries help increase developer productivity because the programming interface requires less coding effort and can be seamlessly combined to create more types of complex computations. By implementing Resilient Distributed Datasets (RDDs), in-memory processing of data is supported, which can make Spark about 100× faster than Hadoop in multi-pass analytics (on smaller datasets) [19, 20]. This is especially true when the data size is smaller than the available memory [21]. This indicates that processing really big data with Apache Spark would require a large amount of memory. Since the cost of memory is higher than that of hard drives, MapReduce is expected to be more cost-effective for large datasets compared to Apache Spark. Similarly, Apache Storm was developed to provide a real-time framework for data stream processing. This platform supports most programming languages. Additionally, it offers good horizontal scalability and built-in fault-tolerance capability for big data analysis.
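    The benefit of RDD-style in-memory caching for multi-pass analytics can be illustrated with a toy Python class. `MiniRDD` is a hypothetical stand-in, not Spark's API: it applies a transformation lazily and, once cached, lets repeated analytic passes reuse the in-memory result instead of recomputing it:

```python
class MiniRDD:
    """Toy stand-in for an RDD: a lazy transformation with optional in-memory caching."""

    def __init__(self, source, transform):
        self.source = source
        self.transform = transform
        self._cache = None
        self.computations = 0  # counts how often the transform actually runs

    def cache(self):
        # Materialize the transformed data once and keep it in memory
        self._cache = [self.transform(x) for x in self.source]
        self.computations += 1
        return self

    def collect(self):
        if self._cache is not None:
            return list(self._cache)     # served from memory, no recomputation
        self.computations += 1
        return [self.transform(x) for x in self.source]

# Illustrative data: flag feverish temperature readings, then run two passes
readings = [98.6, 101.2, 99.1, 102.5]
feverish = MiniRDD(readings, lambda t: t > 100.5).cache()
pass1 = sum(feverish.collect())  # first analytic pass
pass2 = sum(feverish.collect())  # second pass reuses the cached result
```

Without `cache()`, each `collect()` would rerun the transformation, which is the behavior MapReduce-style pipelines exhibit when every pass rereads data from disk.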

    Machine learning for information extraction, data analysis and predictions

    In healthcare, patient data includes recorded signals, for instance electrocardiograms (ECG), images, and videos. Healthcare providers have barely managed to convert such healthcare data into EHRs. Efforts are underway to digitize patient histories from pre-EHR era notes and supplement the standardization process by turning static images into machine-readable text. For example, optical character recognition (OCR) software is one such approach that can recognize handwriting as well as computer fonts and push digitization. Such unstructured and structured healthcare datasets hold an untapped wealth of information that can be harnessed using advanced AI programs to draw critical actionable insights in the context of patient care. In fact, AI has emerged as the method of choice for big data applications in medicine. This smart system has quickly found its niche in the decision-making process for the diagnosis of diseases. Healthcare professionals analyze such data for targeted abnormalities using appropriate ML approaches. ML can filter out structured information from such raw data.

    Extracting information from EHR datasets

    Emerging ML- or AI-based strategies are helping to refine the healthcare industry’s information processing capabilities. For example, natural language processing (NLP) is a rapidly developing area of machine learning that can identify key syntactic structures in free text, help in speech recognition and extract the meaning behind a narrative. NLP tools can help generate new documents, like a clinical visit summary, or dictate clinical notes. The unique content and complexity of clinical documentation can be challenging for many NLP developers. Nonetheless, we should be able to extract relevant information from healthcare data using approaches such as NLP.
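    As a rough sketch of how structured fields might be pulled out of free clinical text, the example below uses simple regular expressions. The note text, field names, and patterns are invented for illustration; production NLP systems use far richer linguistic models than pattern matching:

```python
import re

# Hypothetical free-text clinical note (invented example)
note = "Patient reports BP 140/90, temperature 38.2 C. Prescribed metformin 500 mg daily."

def extract_structured(text):
    """Pull a few structured fields out of an unstructured narrative."""
    bp = re.search(r"BP (\d{2,3})/(\d{2,3})", text)
    temp = re.search(r"temperature (\d+\.?\d*)", text)
    meds = re.findall(r"Prescribed (\w+) (\d+ mg)", text)
    return {
        "systolic": int(bp.group(1)) if bp else None,
        "diastolic": int(bp.group(2)) if bp else None,
        "temperature_c": float(temp.group(1)) if temp else None,
        "medications": meds,
    }

record = extract_structured(note)
# record["systolic"] -> 140; record["medications"] -> [('metformin', '500 mg')]
```

The output is the kind of structured record that can then be merged into an EHR or queried alongside other clinical data.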

    AI has also been used to provide predictive capabilities to healthcare big data. For example, ML algorithms can convert the diagnostic system of medical images into automated decision-making. Though it is apparent that healthcare professionals may not be replaced by machines in the near future, AI can definitely assist physicians to make better clinical decisions or even replace human judgment in certain functional areas of healthcare.

    Image analytics

    Some of the most widely used imaging techniques in healthcare include computed tomography (CT), magnetic resonance imaging (MRI), X-ray, molecular imaging, ultrasound, photo-acoustic imaging, functional MRI (fMRI), positron emission tomography (PET), electroencephalography (EEG), and mammograms. These techniques capture high-definition medical images (patient data) of large sizes. Healthcare professionals like radiologists, doctors and others do an excellent job of analyzing medical data in the form of these files for targeted abnormalities. However, it is also important to acknowledge the lack of specialized professionals for many diseases. To compensate for this dearth of professionals, efficient systems like the Picture Archiving and Communication System (PACS) have been developed for storing and conveniently accessing medical image and report data [22]. PACSs are popular for delivering images to local workstations, a task accomplished by protocols such as digital image communication in medicine (DICOM). However, data exchange with a PACS relies on structured data to retrieve medical images. This by nature misses out on the unstructured information contained in some biomedical images. Moreover, it is possible to miss additional information about a patient’s health status that is present in these images or similar data. A professional focused on diagnosing an unrelated condition might not observe it, especially when the condition is still emerging. To help in such situations, image analytics is making an impact on healthcare by actively extracting disease biomarkers from biomedical images. This approach uses ML and pattern recognition techniques to draw insights from massive volumes of clinical image data to transform the diagnosis, treatment and monitoring of patients. It focuses on enhancing the diagnostic capability of medical imaging for clinical decision-making.

    A number of software tools have been developed, based on functionalities such as generic processing, registration, segmentation, visualization, reconstruction, simulation and diffusion, to perform medical image analysis and extract hidden information. For example, the Visualization Toolkit is a freely available software package which allows powerful processing and analysis of 3D images from medical tests [23], while SPM can process and analyze 5 different types of brain images (e.g. MRI, fMRI, PET, CT-Scan and EEG) [24]. Other software like GIMIAS, Elastix, and MITK support all types of images. Various other widely used tools and their features in this domain are listed in Table 1. Such bioinformatics-based big data analysis may extract greater insights and value from imaging data to boost and support precision medicine projects, clinical decision support tools, and other modes of healthcare. For example, we can also use it to monitor new targeted treatments for cancer.

    Big data from omics

    The big data from “omics” studies is a new kind of challenge for bioinformaticians. Robust algorithms are required to analyze such complex data from biological systems. The ultimate goal is to convert this huge data into an informative knowledge base. The application of bioinformatics approaches to transform biomedical and genomics data into predictive and preventive health is known as translational bioinformatics. It is at the forefront of data-driven healthcare. Various kinds of quantitative data in healthcare, for example from laboratory measurements, medication data and genomic profiles, can be combined and used to identify new meta-data that can help precision therapies [25]. This is why emerging new technologies are required to help analyze this digital wealth. In fact, highly ambitious multimillion-dollar projects like the “Big Data Research and Development Initiative” have been launched, aiming to enhance the quality of big data tools and techniques for better organization, efficient access and smart analysis of big data. There are many advantages anticipated from processing ‘omics’ data from the large-scale Human Genome Project and other population sequencing projects. In population sequencing projects like 1000 Genomes, researchers have access to a vast amount of raw data. Similarly, the Human Genome Project-based Encyclopedia of DNA Elements (ENCODE) project aimed to determine all functional elements in the human genome using bioinformatics approaches. Here, we list some of the widely used bioinformatics-based tools for big data analytics on omics data.

    SparkSeq is an efficient and cloud-ready platform based on the Apache Spark framework and Hadoop library that is used for interactive analysis of genomic data with nucleotide precision.

    SAMQA identifies errors and ensures the quality of large-scale genomic data. This tool was originally built for the National Institutes of Health Cancer Genome Atlas project to identify and report errors including sequence alignment/map [SAM] format error and empty reads.

    ART can simulate profiles of read errors and read lengths for data obtained using high throughput sequencing platforms including SOLiD and Illumina platforms.

    DistMap is another toolkit used for distributed short-read mapping based on a Hadoop cluster that aims to cover a wider range of sequencing applications. For instance, one of its supported mappers, BWA, can process 500 million read pairs in about 6 h, approximately 13 times faster than a conventional single-node mapper.

    SeqWare is a query engine based on Apache HBase database system that enables access for large-scale whole-genome datasets by integrating genome browsers and tools.

    CloudBurst is a parallel computing model utilized in genome mapping experiments to improve the scalability of reading large sequencing data.

    Hydra uses the Hadoop-distributed computing framework for processing large peptide and spectra databases for proteomics datasets. This specific tool is capable of performing 27 billion peptide scorings in less than 60 min on a Hadoop cluster.

    BlueSNP is an R package based on the Hadoop platform used for genome-wide association study (GWAS) analysis, focusing primarily on the statistical readouts needed to obtain significant associations between genotype–phenotype datasets. This tool is estimated to analyze 1000 phenotypes with 10^6 SNPs in 10^4 individuals in about half an hour.

    Myrna, a cloud-based pipeline, provides information on the expression level differences of genes, including read alignments, data normalization, and statistical modeling.

    The past few years have witnessed a tremendous increase in disease-specific datasets from omics platforms. For example, the ArrayExpress Archive of Functional Genomics data repository contains information from approximately 30,000 experiments and more than one million functional assays. The growing amount of data demands better and more efficient bioinformatics-driven packages to analyze and interpret the information obtained. This has also led to the birth of specific tools to analyze such massive amounts of data. Below, we mention some of the most popular commercial platforms for big data analytics.

    Commercial platforms for healthcare data analytics

    In order to tackle big data challenges and perform smoother analytics, various companies have implemented AI to analyze published results, textual data, and image data to obtain meaningful outcomes. IBM Corporation is one of the biggest and most experienced players in this sector, providing healthcare analytics services commercially. IBM’s Watson Health is an AI platform for sharing and analyzing health data among hospitals, providers and researchers. Similarly, Flatiron Health provides technology-oriented services in healthcare analytics, especially focused on cancer research. Other big companies such as Oracle Corporation and Google Inc. are also focusing on developing cloud-based storage and distributed computing power platforms. Interestingly, in recent years, several companies and start-ups have also emerged to provide healthcare-based analytics and solutions. Some of the vendors in the healthcare sector are listed in Table 2. Below we discuss a few of these commercial solutions.


    Ayasdi

    Ayasdi is one such big vendor, which focuses on ML-based methodologies to provide a machine intelligence platform along with an application framework with tried-and-tested enterprise scalability. It provides various applications for healthcare analytics, for example, to understand and manage clinical variation and to transform clinical care costs. It is also capable of analyzing and managing how hospitals are organized, conversations between doctors, risk-oriented decisions by doctors for treatment, and the care they deliver to patients. It also provides an application for the assessment and management of population health, a proactive strategy that goes beyond traditional risk analysis methodologies. It uses ML intelligence for predicting future risk trajectories, identifying risk drivers, and providing solutions for best outcomes. A strategic illustration of the company’s methodology for analytics is provided in Fig. 4.

    Illustration of application of “Intelligent Application Suite” provided by AYASDI for various analyses such as clinical variation, population health, and risk management in healthcare sector


    Linguamatics

    Linguamatics' platform is based on NLP, relying on an interactive text mining algorithm (I2E). I2E can extract and analyze a wide array of information. Results are obtained tenfold faster than with other tools and do not require expert knowledge for data interpretation. This approach can provide information on genetic relationships and facts from unstructured data. Classical ML requires well-curated data as input to generate clean and filtered results. However, NLP, when integrated with EHRs or clinical records per se, facilitates the extraction of clean and structured information that often remains hidden in unstructured input data (Fig. 5).

    Schematic representation for the working principle of NLP-based AI system used in massive data retention and analysis in Linguamatics

    IBM Watson

    This is one of the unique initiatives of the tech giant IBM, targeting big data analytics in almost every professional sector. This platform utilizes ML- and AI-based algorithms extensively to extract the maximum information from minimal input. IBM Watson enforces the regimen of integrating a wide array of healthcare domains to provide meaningful and structured data (Fig. 6). In an attempt to uncover novel drug targets, specifically in cancer, IBM Watson and Pfizer have formed a productive collaboration to accelerate the discovery of novel immuno-oncology combinations. Watson’s deep learning modules, integrated with AI technologies, allow researchers to interpret complex genomic data sets. IBM Watson has been used to predict specific types of cancer based on gene expression profiles obtained from various large data sets, providing signs of multiple druggable targets. IBM Watson is also used in drug discovery programs by integrating curated literature and forming network maps to provide a detailed overview of the molecular landscape in a specific disease model.

    IBM Watson in healthcare data analytics. Schematic representation of the various functional modules in IBM Watson’s big-data healthcare package. For instance, the drug discovery domain involves network of highly coordinated data acquisition and analysis within the spectrum of curating database to building meaningful pathways towards elucidating novel druggable targets

    To analyze diversified medical data, the healthcare domain describes analytics in four categories: descriptive, diagnostic, predictive, and prescriptive analytics. Descriptive analytics refers to describing and commenting on current medical situations, whereas diagnostic analytics explains the reasons and factors behind the occurrence of certain events, for example, choosing a treatment option for a patient based on clustering and decision trees. Predictive analytics focuses on predicting future outcomes by determining trends and probabilities. These methods are mainly built on machine learning techniques and are helpful in the context of understanding complications that a patient can develop. Prescriptive analytics proposes actions towards optimal decision-making; for example, deciding to avoid a given treatment for a patient based on observed side effects and predicted complications. Integrating big data into healthcare analytics can be a major factor in improving the performance of current medical systems; however, sophisticated strategies need to be developed. An architecture of best practices of the different analytics in the healthcare domain is required for integrating big data technologies to improve outcomes. However, there are many challenges associated with the implementation of such strategies.

    Challenges associated with healthcare big data

    Methods for big data management and analysis are being continuously developed, especially for real-time data streaming, capture, aggregation, analytics (using ML and predictive modelling), and visualization solutions that can help integrate a better utilization of EMRs with healthcare. For example, the EHR adoption rate of federally tested and certified EHR programs in the healthcare sector in the U.S.A. is nearly complete [7]. However, the availability of hundreds of government-certified EHR products, each with different clinical terminologies, technical specifications, and functional capabilities, has led to difficulties in interoperability and data sharing. Nonetheless, we can safely say that the healthcare industry has entered a ‘post-EMR’ deployment phase. Now, the main objective is to gain actionable insights from the vast amounts of data collected as EMRs. Here, we discuss some of these challenges in brief.


    Storage

    Storing large volumes of data is one of the primary challenges, but many organizations are comfortable with data storage on their own premises. It has several advantages, like control over security, access, and up-time. However, an on-site server network can be expensive to scale and difficult to maintain. It appears that, with decreasing costs and increasing reliability, cloud-based storage using IT infrastructure is a better option, which most healthcare organizations have opted for. Organizations must choose cloud partners that understand the importance of healthcare-specific compliance and security issues. Additionally, cloud storage offers lower up-front costs, nimble disaster recovery, and easier expansion. Organizations can also take a hybrid approach to their data storage programs, which may be the most flexible and workable approach for providers with varying data access and storage needs.


    Data cleaning

    The data needs to be cleansed or scrubbed after acquisition to ensure its accuracy, correctness, consistency, relevancy, and purity. This cleaning process can be manual or automated using logic rules to ensure high levels of accuracy and integrity. More sophisticated and precise tools use machine-learning techniques to reduce time and expenses and to stop foul data from derailing big data projects.
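    Rule-based cleaning can be sketched as a small validation function. The field names and plausibility ranges below are illustrative assumptions, not a clinical standard:

```python
def clean_record(record):
    """Apply simple logic rules to validate a patient record.

    Returns a list of rule violations; an empty list means the record
    passes these (invented, illustrative) plausibility checks.
    """
    errors = []
    if not (0 <= record.get("age", -1) <= 120):
        errors.append("age out of range")
    hr = record.get("heart_rate")
    if hr is not None and not (20 <= hr <= 250):
        errors.append("implausible heart rate")
    if not record.get("patient_id"):
        errors.append("missing patient_id")
    return errors

good = {"patient_id": "P001", "age": 54, "heart_rate": 72}
bad = {"patient_id": "", "age": 430, "heart_rate": 72}
# clean_record(good) -> []; clean_record(bad) -> two violations
```

In practice such rules would be maintained alongside the data dictionary, with ML-based anomaly detection layered on top for the cases rules cannot anticipate.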

    Unified format

    Patients produce a huge volume of data that is not easy to capture in the traditional EHR format, as it is complex and not easily manageable. It is very difficult for healthcare providers to handle big data, especially when it comes without a proper data organization. A need therefore surfaced to codify all clinically relevant information for the purposes of claims, billing, and clinical analytics. Thus, medical coding systems like Current Procedural Terminology (CPT) and International Classification of Diseases (ICD) code sets were developed to represent the core clinical concepts. However, these code sets have their own limitations.


    Accuracy

    Some studies have observed that the reporting of patient data into EMRs or EHRs is not entirely accurate yet [26,27,28,29], probably because of poor EHR utility, complex workflows, and a poor understanding of why capturing big data well is important. All these factors can contribute to quality issues for big data all along its lifecycle. The EHRs are intended to improve the quality and communication of data in clinical workflows, though reports indicate discrepancies in these contexts. The documentation quality might improve by using self-report questionnaires from patients for their symptoms.

    Image pre-processing

    Studies have identified various physical factors that can degrade data quality and lead to misinterpretation of existing medical records [30]. Medical images often suffer from technical barriers involving multiple types of noise and artifacts, and improper handling can effectively tamper with an image, for instance by delineating anatomical structures such as veins in a way that does not correspond to the real anatomy. Noise reduction, artifact removal, contrast adjustment, and quality correction after mishandling are some of the measures that can be implemented to address these problems.


    Security

    There have been so many security breaches, hackings, phishing attacks, and ransomware episodes that data security is now a priority for healthcare organizations. After an array of vulnerabilities came to light, a list of technical safeguards was developed for protected health information (PHI). These rules, termed the HIPAA Security Rules, guide organizations on storage, transmission, authentication protocols, and controls over access, integrity, and auditing. Common security measures, such as up-to-date anti-virus software, firewalls, encryption of sensitive data, and multi-factor authentication, can save a lot of trouble.
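    One small, concrete piece of the "encrypting sensitive data" measure is deriving an encryption key from a passphrase. The sketch below uses PBKDF2 from Python's standard library; it is illustrative only, and a real deployment would pair the derived key with a vetted cipher (e.g. AES-GCM from a maintained cryptography library) rather than roll its own encryption.

```python
import hashlib
import hmac
import secrets

# Derive a 32-byte key from a passphrase using PBKDF2-HMAC-SHA256.
# The iteration count (200,000) is an illustrative choice; pick it to
# match current guidance for your hardware.
def derive_key(passphrase: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 200_000)

salt = secrets.token_bytes(16)          # random per-record salt
key = derive_key("correct horse battery staple", salt)
print(len(key))  # 32

# Same passphrase + salt always reproduces the key; use a
# constant-time comparison when verifying.
assert hmac.compare_digest(key, derive_key("correct horse battery staple", salt))
```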


    Metadata

    A successful data governance plan requires complete, accurate, and up-to-date metadata about all stored data. The metadata would include information such as the time of creation, the purpose of the data and the person responsible for it, and its previous usage (by whom, why, how, and when) for researchers and data analysts. This would allow analysts to replicate previous queries, supporting later scientific studies and accurate benchmarking. It increases the usefulness of data and prevents the creation of “data dumpsters” of little or no use.
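    One possible shape for such a metadata record is sketched below. The class and field names are invented for illustration (there is no single standard schema implied by the article), but they cover the elements listed above: creation time, owner, purpose, and a usage log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative metadata record for a governed dataset. Field names are
# hypothetical, chosen to mirror the elements discussed in the text.
@dataclass
class DatasetMetadata:
    name: str
    created_at: datetime
    owner: str
    purpose: str
    usage_log: list = field(default_factory=list)  # (who, why, when) entries

    def record_use(self, who: str, why: str):
        self.usage_log.append((who, why, datetime.now(timezone.utc)))

meta = DatasetMetadata(
    name="icu_vitals_2023",
    created_at=datetime.now(timezone.utc),
    owner="data-team",
    purpose="sepsis early-warning study",
)
meta.record_use("analyst-7", "replicate baseline query")
print(len(meta.usage_log))  # 1
```

    Logging each use in this way is what lets a later analyst see how the data was queried before and replicate that work rather than starting from scratch.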


    Querying

    Metadata would make it easier for organizations to query their data and get answers. However, without proper interoperability between datasets, query tools may not be able to reach the entire data repository. Likewise, the different components of a dataset must be well interconnected and easily accessible, or a complete portrait of an individual patient’s health cannot be generated. Medical coding systems such as ICD-10, SNOMED CT, or LOINC must be implemented to reduce free-form concepts to a shared ontology. Once the accuracy, completeness, and standardization of the data are no longer in question, Structured Query Language (SQL) can be used to query large datasets and relational databases.
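    The kind of SQL query this enables can be shown with Python's built-in `sqlite3` module. The table and column names below are invented for illustration; the point is that once diagnoses are standardized to codes, a single aggregate query works across the whole repository.

```python
import sqlite3

# In-memory relational store with an invented schema: one row per
# encounter, with the diagnosis already standardized to an ICD-10 code.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE encounters (patient_id TEXT, icd10 TEXT, cost REAL)")
con.executemany("INSERT INTO encounters VALUES (?, ?, ?)", [
    ("P1", "E11", 1200.0),
    ("P1", "I10", 300.0),
    ("P2", "E11", 900.0),
])

# Aggregate cost per diagnosis code -- the kind of query that only
# becomes possible once codes and schemas are consistent.
rows = con.execute(
    "SELECT icd10, SUM(cost) FROM encounters GROUP BY icd10 ORDER BY icd10"
).fetchall()
print(rows)  # [('E11', 2100.0), ('I10', 300.0)]
```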


    Visualization

    A clean and engaging visualization of data, using charts, heat maps, and histograms to illustrate contrasting figures, together with correct labeling to reduce potential confusion, makes it much easier to absorb information and use it appropriately. Bar charts, pie charts, and scatterplots each have their own specific ways of conveying data.
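    Underneath any histogram is a simple binning computation. The stdlib-only sketch below bins a made-up list of patient ages into decades and prints a text-mode histogram; a plotting library would render the same counts as bars.

```python
from collections import Counter

# Invented sample data: patient ages binned into decades.
ages = [34, 41, 45, 52, 58, 63, 67, 67, 72, 80]
bins = Counter((age // 10) * 10 for age in ages)

# Text-mode histogram; each '#' is one patient in that decade.
for decade in sorted(bins):
    print(f"{decade}-{decade + 9}: {'#' * bins[decade]}")
```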

    Data sharing

    Patients may or may not receive their care at multiple locations. When they do, sharing data with other healthcare organizations becomes essential. If the shared data is not interoperable, its movement between disparate organizations can be severely curtailed by technical and organizational barriers, leaving clinicians without key information for decisions about follow-up and treatment strategies. Solutions such as the Fast Healthcare Interoperability Resources (FHIR) standard with public APIs, CommonWell (a not-for-profit trade association), and Carequality (a consensus-built, common interoperability framework) are making data interoperability and sharing easier and more secure. The biggest roadblock to data sharing remains the treatment of data as a commodity that confers competitive advantage; for this reason, both providers and vendors sometimes intentionally block the flow of information between different EHR systems [31].
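    FHIR's practical appeal is that resources are exchanged as plain JSON documents with standardized field names. The sketch below builds a minimal FHIR Patient resource; `resourceType`, `id`, `name`, and `birthDate` are genuine FHIR R4 Patient fields, while the values and the idea of a two-system round-trip are illustrative.

```python
import json

# Minimal FHIR R4 Patient resource as a Python dict. The field names
# are real FHIR fields; the values are made up for illustration.
patient = {
    "resourceType": "Patient",
    "id": "example-001",
    "name": [{"family": "Rivera", "given": ["Ana"]}],
    "birthDate": "1980-04-12",
}

# FHIR resources travel as JSON, so any system that speaks the
# standard can reconstruct the same record on the other side.
payload = json.dumps(patient)
received = json.loads(payload)
print(received["resourceType"], received["name"][0]["family"])
```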

    Healthcare providers will need to overcome every challenge on this list, and more, to develop a big data exchange ecosystem that provides trustworthy, timely, and meaningful information by connecting all members of the care continuum. Overcoming these challenges will take time, commitment, funding, and communication.

    Big data analytics for cutting costs

    To develop a healthcare system that can exchange big data and provide us with trustworthy, timely, and meaningful information, we need to overcome every challenge mentioned above, which will require investment in time, funding, and commitment. However, as with other technological advances, success would ease the present burdens on healthcare, especially its costs. It is believed that implementing big data analytics could save healthcare organizations over 25% in annual costs in the coming years. Better diagnosis and disease prediction through big data analytics can cut costs by decreasing hospital readmission rates: healthcare firms do not yet understand the variables responsible for readmissions well, and determining these relationships would let organizations improve their protocols for dealing with patients and prevent readmissions. Big data analytics can also help optimize staffing, forecast operating-room demand, streamline patient care, and improve the pharmaceutical supply chain. All of these factors lead to an ultimate reduction in healthcare costs.

    Quantum mechanics and big data analysis

    Big data sets can be staggering in size, so their analysis remains daunting even for the most powerful modern computers. For most analyses, the bottleneck lies in the computer’s ability to access its memory, not in the processor [32, 33]. The capacity, bandwidth, and latency requirements of the memory hierarchy outweigh the computational requirements so much that supercomputers are increasingly used for big data analysis [34, 35]. An additional solution is to apply a quantum approach to big data analysis.

    Quantum computing and its advantages

    Common digital computing encodes data in binary digits, whereas quantum computation uses quantum bits, or qubits [36]. A qubit is the quantum version of the classical binary bit: it can represent a zero, a one, or any linear combination (superposition) of those two states [37]. Rather than being restricted to two discrete states, a qubit can therefore occupy a continuum of superposition states, which is what allows quantum computers to outperform classical computers on certain problems. For example, representing a dataset of 2^n points conventionally requires on the order of 2^n processing units, whereas n qubits suffice on a quantum computer. Quantum computers exploit quantum mechanical phenomena such as superposition and quantum entanglement to perform computations [38, 39].
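    The superposition idea can be made concrete without any quantum library: a single-qubit state is just a pair of complex amplitudes (alpha, beta) with |alpha|^2 + |beta|^2 = 1, and measurement yields 0 or 1 with those probabilities. The sketch below is a classical simulation for intuition only, not a quantum speed-up.

```python
import math

# A qubit state as two complex amplitudes (alpha, beta); measurement
# yields outcome 0 with probability |alpha|^2 and 1 with |beta|^2.
def measurement_probs(alpha: complex, beta: complex):
    norm = abs(alpha) ** 2 + abs(beta) ** 2  # normalize defensively
    return abs(alpha) ** 2 / norm, abs(beta) ** 2 / norm

# Equal superposition (alpha = beta = 1/sqrt(2)): a 50/50 outcome.
p0, p1 = measurement_probs(1 / math.sqrt(2), 1 / math.sqrt(2))
print(p0, p1)  # 0.5 0.5 (up to floating-point rounding)

# n qubits span 2**n basis states, which is where the exponential
# compactness mentioned above comes from.
n = 10
print(2 ** n)  # 1024 basis states from just 10 qubits
```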

    Quantum algorithms can speed up big data analysis exponentially [40]. Some complex problems believed to be intractable on conventional computers can be solved by quantum approaches. For example, current encryption techniques such as RSA (a public-key scheme) and the Data Encryption Standard (DES), thought to be unbreakable now, would become irrelevant once quantum computers can break them quickly [41]. Quantum approaches can also dramatically reduce the information required for big data analysis: quantum theory can maximize the distinguishability of multilayer networks using a minimum number of layers [42], and quantum approaches require comparatively small datasets to achieve maximally sensitive analysis relative to conventional machine-learning techniques. Quantum approaches can therefore drastically reduce the computational power required to analyze big data. Even though quantum computing is still in its infancy and presents many open challenges, it is already being applied to healthcare data.

    Applications in big data analysis

    Quantum computing is gaining momentum and seems a potential solution for big data analysis. For example, the identification of rare events, such as the production of Higgs bosons at the Large Hadron Collider (LHC), can now be performed with quantum approaches [43]. The LHC generates huge amounts of collision data (about 1 PB/s) that must be filtered and analyzed. One such approach, quantum annealing for ML (QAML), which combines machine learning and quantum computing on a programmable quantum annealer, helps reduce human intervention and increase the accuracy of assessing particle-collision data. In another example, a quantum support vector machine was implemented for both the training and classification stages of classifying new data [44]. Such quantum approaches could find applications in many areas of science [43]. Indeed, a recurrent quantum neural network (RQNN) has been implemented to increase signal separability in electroencephalogram (EEG) signals [45], and quantum annealing has been applied to beamlet intensity optimization in intensity-modulated radiotherapy (IMRT) [46]. Further healthcare-related applications of quantum approaches include quantum sensors and quantum microscopes [47].

    AKA brain machine interfaces (BMI). Craver (2010) calls these a kind of prosthetic. I often use Datteri’s (2009) preferred terms, bionic and hybrid. For the purposes of this article, the terms prosthetic, bionic, or hybrid should be considered interchangeable in reference to models and experiments.

    Illustrative videos are available as supplementary materials at the Nature website,

    This clarification is needed because there is a sense in which any skill learning extends the brain beyond its previous repertoire of functions—e.g. learning to type, play the violin. I do not assume that there is a difference in kind between the kinds of brain plasticity and extension of function required for skill learning, and those observed following BCI use. It is just that the latter will not be observed in the absence of specific technological interventions because they rely on new kinds of brain-implant-body connections offered by the technology.

    Scare quotes because I do not aim to reinforce the simplistic picture of the brain as sandwiched between sensory inputs and motor outputs (see Hurley 1998).

    Thanks to an anonymous reviewer for raising this concern and for suggesting the electric plug metaphor.

    To pursue the electric plug metaphor, imagine an electrical motor built in the UK and designed to operate on a 240 V supply. Using a standard plug adaptor, the device is switched on in the USA and because it now only has a 110 V supply it doesn’t operate at full speed. But this device has an inherent capacity to modify internal components in response to the demands of the new electrical input, and in time begins to run as it did in the UK. It behaves as if it has grown an internal step-up transformer. This is what the brain is like as it adapts to the BCI.

    Note, however, that it need not be movement in an artificial body part that is generated, since many BCI experiments just require subjects to control the movement of a cursor on a computer monitor and also, it has been shown that parts of the brain other than motor cortex can be co-opted for this purpose (Leuthardt et al. 2011).

    It might be suggested that a prosthetic that used both accurate electrode placement and a more naturalistic decoding algorithm would have no need to rely on cortical plasticity. In a follow up to this paper, I explain the practical and theoretical limitations on making decoding models maximally realistic in this way.

    One may also object to my claim that BCI’s functionally extend the motor cortex by suggesting the alternative hypothesis that the co-opting of circuits for the new tasks is just normal re-use (see Anderson (2010); thanks to an anonymous reviewer for raising this suggestion). As it happens, there are grounds for thinking that some phenomena commonly attributed to plasticity may actually be instances of re-use, e.g. that M1 has been described by different labs as encoding abstract direction of movement or as controlling muscle activity, depending on the experiments performed in those labs (Meel Velliste, personal communication). According to the simple plasticity account, one or both of these functions is not naturally performed by M1 and must be learned; but it could be that M1 is able to perform, and switch between, both of these functions even under non-experimental conditions. Importantly, the re-use hypothesis does not predict that there will be structural changes in neural circuits called on to perform different functions. However, what is clear from the literature on BCI’s, and on normal motor skill learning, is that such changes are also taking place, e.g. in the form of alterations of motor cortical neurons’ directional tuning preferences and domain of control (see Sanes and Donoghue 2000 for review), and so such effects are universally considered instances of plasticity. It is these phenomena that I focus on.

    I will return to this issue in “Conclusions and questions” below.

    “Second, as far as ArB2 is concerned, many studies show that bionic implantation is likely to produce long-term changes in the biological system. It has been widely demonstrated … that the implantation of a bionic interface and the connection with external devices typically produces plastic changes in parts of the biological system, such as long-term changes of neural connectivity. Other plastic changes affect the activity of neurons.” (313).

    One could of course argue that Datteri ought to have treated the lamprey and the motor cortex cases differently because of the difference in degree of input–output matching, instead of lumping them together. In effect, that is to concede my point that Datteri’s framework is inappropriate for most of BCI research. It does seem, however, that Datteri underestimates the prevalence of BCI’s showing poor input–output matching, and the importance of plasticity for the working of most BCI’s. He writes that in the case of M1 interfaces, ArB2 is likely to be contravened by undetected changes just because the initial state of the biological system is less well characterised and so “plastic changes may be hard to detect and predict due to the lack of adequate theoretical models” (315). But from what is known already about the way that such techniques extend brain function, there is no question of any researchers being unaware of plastic changes they induce in motor cortex! Datteri neglects the importance of plasticity to the actual working of the BCI. To reiterate, functioning prosthetic implants are possible because the brain adapts to them.

    Following p. 11 above, the sense of “identity” here is that of having anatomical components and physiological properties that are effectively indistinguishable for the scientists comparing the systems. A plastically modified system will not be identical, in this sense, to the original one.

    For simplicity of exposition, and consistency with the rest of the paper, I focus on Craver’s example of the BCI for movement control, rather than the alternative case study of Berger’s prosthetic hippocampus. The conclusions he draws are not different for the two examples.

    The dimensions of completeness and verification describe how exhaustively and faithfully the model or simulation reproduces features of the biological mechanism. As Craver writes “All models and simulations of mechanisms omit details to emphasize certain key features of a target mechanism over others. Models are useful in part because they commit such sins of omission” (842). I will return to this point in “Conclusions and questions” below, and in this section concentrate on the three kinds of validity.

    Given that the topic is systems neuroscience, rather than cellular or molecular neuroscience which study sub-neuronal mechanisms, I understand the key “parts” here to be neurons, so that for a model of a brain circuit to be mechanistically valid it must be quite anatomically accurate, featuring the same number and type of neurons as in the actual mechanism.

    One wonders if Craver is saying that if multiple realizability were to occur in a non-bionic experiment, this would cause the same epistemic problem. In fact one cannot assume that mechanisms in systems neuroscience are not multiply-realized across individuals and across the lifespan. No two brains are identical, and circuits controlling perceptions and actions are sculpted and personalized by genetics and experience. It seems that the problem of failing to achieve mechanistic and phenomenal validity generalizes to non-bionic systems neuroscience, on Craver’s analysis. This point is comparable to the one made above (“Neuroplasticity in non-bionic experiments” section) that Datteri’s no-plasticity constraint must apply to non-bionic experiments in systems neuroscience, if it is to apply to bionic ones. However, a more charitable reading of Craver takes up the point that the range of inputs and outputs used by nature is much narrower than that used by engineers (“The space of functional inputs and outputs is larger than the space of functional inputs and outputs that development and evolution have thus far had occasion to exploit.” p.847). Basic neuroscience, in its quest for phenomenal validity, can be said to be targeting this subspace of the expanse of possible inputs and outputs. Likewise, systems neuroscientists could be said to be working towards a description of the small range of mechanisms employed by different people for a specific function.

    One might object that this experiment works by intervening on a natural mechanism in the brain, not by modelling the hybrid mechanism as a route towards modelling the brain. I would disagree with this interpretation of the experiment. While the BCI is certainly a tool for intervening on the natural system, my central point is that findings from the hybrid system serve rather straightforwardly as the bases for hypotheses about the natural system. Scientists are modelling the hybrid system, but it turns out that coding in the hybrid system need not be characterised any differently from the natural one in spite of cortical reorganisation.

    This finding is somewhat controversial as other research groups have reported BCI’s operating with fewer neurons being recorded (Serruya et al. 2002; Taylor et al. 2002). Still, it seems that a population code of some sort is in play since no groups advocate a single-neuron code for motor control.

    I will discuss this result in the next section.

    Note that in Craver’s definition of mechanistic validity, the model’s representations of parts, activities, and organizational features must all be relevantly similar to the actual mechanism’s. The crucial point of this section was that validity with respect to organization can come apart from more anatomical accuracy concerning parts (neurons), and so needs to be evaluated separately. For further discussion see “Conclusions and questions” section below.

    Here is another example from (non-bionic) visual neuroscience: Freeman et al. (2011) present new fMRI data on orientation tuning of neurons in primary visual cortex, which they account for in terms of the retinotopic organisation of V1. They write that, “our results provide a mechanistic explanation” (p. 4804) of the pattern of findings. Again, what they describe is an organisational principle, rather than a detailed circuit model.

    See Craver (2010: 842) quoted in note 16 above: incomplete models are primarily “useful”, and omissions are “sins” rather than explanatory virtues cf. “How-possibly models are often heuristically useful in constructing and exploring the space of possible mechanisms, but they are not adequate explanations. How-actually models, in contrast, describe real components, activities, and organizational features of the mechanism that in fact produces the phenomenon. They show how a mechanism works, not merely how it might work” (2007: 112) and Datteri (2009: 308) “Underspecified models and mechanism sketches are progressively refined as model discovery proceeds, until a full-fledged mechanism model is worked out.”

    This is obviously a very brief sketch of an alternative approach, which will be presented more fully in a follow up to this paper.


    Abbott A (2006) In search of the sixth sense. Nature 442:125–127

    Carmena JM, Lebedev MA, Crist RE, O’Doherty JE, Santucci DM, Dimitrov DF, Patil PG, Henriquez CS, Nicolelis MAL (2003) Learning to control a brain–machine interface for reaching and grasping by primates. PLoS Biol 1(2):193–208

    Carmena JM, Lebedev MA, Henriquez CS, Nicolelis MAL (2005) Stable ensemble performance with single-neuron variability during reaching movements in primates. J Neurosci 25(46):10712–10716

    Carrozza MC, Cappiello G, Micera S, Edin BB, Beccai L, Cipriani C (2006) Design of a cybernetic hand for perception and action. Biol Cybern 95(6):629–644

    Cohen D, Nicolelis MAL (2004) Reduction of single-neuron firing uncertainty by cortical ensembles during motor skill learning. J Neurosci 24(12):3574–3582

    Dario P, Carrozza MC, Guglielmelli E, Laschi C, Menciassi A, Micera S, Vecchi F (2005) Robotics as a future and emerging technology. IEEE Robot Autom Mag 12(2):29–45

    Datteri E, Tamburrini G (2007) Biorobotic experiments for the discovery of biological mechanisms. Philos Sci (in press)

    Donoghue JP, Nurmikko A, Black M, Hochberg LR (2007) Assistive technology and robotic control using motor cortex ensemble-based neural interface systems in humans with tetraplegia. J Physiol 579(3):603–611

    Farah MJ (2002) Emerging ethical issues in neurosciences. Nature Neurosci 5(11):1123–1129

    Farah MJ (2005) Neuroethics: the practical and the philosophical. Trends Cogn Sci 9(1):34–40

    Hochberg LR, Serruya MD, Friehs GM, Mukand JA, Saleh M, Caplan AH, Branner A, Chen D, Penn R, Donoghue JP (2006) Neuronal ensemble control of prosthetic devices by a human with tetraplegia. Nature 442:164–171

    Kandel ER, Schwartz JH, Jessell TM (2000) Principles of neural science, 4th edn. McGraw-Hill Medical, New York

    Karniel A, Kositsky M, Fleming KM, Chiappalone M, Sanguineti V, Alford ST, Mussa-Ivaldi F (2005) Computational analysis in vitro: dynamics and plasticity of a neuro-robotic system. J Neural Eng 2:S250–S265

    Keiper A (2006) The age of neuroelectronics, The New Atlantis (Online), 11, Winter, 2006, available at

    Lebedev MA, Nicolelis MAL (2006) Brain–machine interfaces: past, present and future. Trends Neurosci 29(9):536–546

    Lucivero F (2007) Brain machine interfaces and persons: ontological and ethical issues, Graduation thesis, Department of Philosophy, University of Pisa

    Marino D, Tamburrini G (2007) Learning Automata and human responsibility. Int Rev Inform Ethics e-journal (forthcoming)

    Marino D, Santoro M, Tamburrini G (2007) Learning robots and human–machine interactions: from epistemology to applied ethics. Art Intell Soc (this issue)

    Micera S, Carrozza MC, Beccai L, Vecchi F, Dario P (2006) Hybrid bionic systems for the replacement of hand function. Proc IEEE 94(9):1752–1762

    Millán JdR, Renkens F, Mouriño J, Gerstner W (2004) Brain-actuated interaction. Art Intell 159(1–2):241–259

    Moreno JD (2006) Mind wars: brain research and national defense. Dana Press, Washington DC

    Musallam S, Corneil BD, Greger B, Scherberger H, Andersen RA (2004) Cognitive control signals for neural prosthetics. Science 305(5681):258–262

    Mussa-Ivaldi FA, Miller LE (2003) Brain–machine interfaces: computational demands and clinical needs meet basic neuroscience. Trends Neurosci 26(6):329–334

    Navarro X, Krueger TB, Lago N, Micera S, Stieglitz T, Dario P (2005) A critical review of interfaces with the peripheral nervous system for the control of neuroprostheses and hybrid bionic systems. J Peripher Nerv Syst 10:229–258

    Pfurtscheller G, Leeb R, Keinrath C, Friedman D, Neuper C, Guger C, Slater M (2006) Walking from thought. Brain Res 1071(1):145–152

    Porro CA, Francescato MP, Cettolo V, Diamond ME, Baraldi P, Zuiani C, Bazzocchi M, di Prampero PE (1996) Primary motor and sensory cortex activation during motor performance and motor imagery: a functional magnetic resonance imaging study. J Neurosci 16:7688–7698

    Pylatiuk C, Mounier S, Kargov A, Schulz S, Bretthauer G (2004) Progress in the development of a multifunctional hand prosthesis. In: Proceedings of the 26th IEEE annual conference of the engineering in medicine and biology society, pp 4260–4263

    Reger BD, Fleming KM, Sanguineti V, Alford S, Mussa-Ivaldi FA (2000) Connecting brains to robots: an artificial body for studying the computational properties of neural tissues. Art Life 6(4):307–324

    Salvini P, Laschi C, Dario P (2007) Roboethics in biorobotics: discussion of case studies. In: Proceedings of 2007 IEEE international conference on robotics automation (ICRA’07), workshop on roboethics (invited paper), 14th April, Rome, Italy

    Sanchez JC, Carmena JM, Lebedev MA, Nicolelis MAL, Harris JG, Principe JC (2004) Ascertaining the importance of neurons to develop better brain–machine interfaces. IEEE Trans Biomed Eng 51(6):943–953

    Santhanam G, Ryu SI, Yu BM, Afshar A, Shenoy KV (2006) A high-performance brain–computer interface. Nature 442:195–198

    Santucci DM, Kralik JD, Lebedev MA, Nicolelis MA (2005) Frontal and parietal cortical ensembles predict single-trial muscle activity during reaching movements in primates. Eur J Neurosci 22(6):1529–1540

    Sasaki Y, Nakayama Y, Yoshida M (2002) Sensory feedback system using interferential current for EMG prosthetic hand. In: Proceedings of the 2nd joint EMBS international conference, Houston, TX

    Serruya MD, Donoghue JP (2004) Design principles for intracortical neuromotor prosthetics. In: Horch KW, Dhillon GS (eds) Neuroprosthetics: theory and practice. Imperial College Press, London, pp 1158–1196

    Spencer PE, Marschark M (2003) Cochlear implants: issues and implications. In: Marschark M, Spencer PE (eds) Oxford handbook of deaf studies, language and education. Oxford University Press, Oxford, pp 434–450

    Talwar SK, Xu S, Hawley ES, Weiss SA, Moxon KA, Chapin JK (2002) Rat navigation guided by remote control. Nature 417(6884):37–38

    Tamburrini G, Datteri E (2005) Machine experiments and theoretical modelling: from cybernetic methodology to neuro-robotics. Minds Mach 15(3–4):335–358

    Taylor DM, Helms Tillery SI, Schwartz AB (2002) Direct cortical control of 3d neuroprosthetic devices. Science 296:1829–1832

    Tyler DJ, Durand DM (2002) Functionally selective peripheral nerve stimulation with a flat interface nerve electrode. IEEE Trans Neural Syst Rehabil Eng 10:294–303

    Warwick K (2003) Cyborg morals, cyborg values, cyborg ethics. Ethics Inform Technol 5:131–137

    Wessberg J, Stambaugh CR, Kralik JD, Beck PD, Laubach M, Chapin JK, Kim J, Biggs SJ, Srinivasan MA, Nicolelis MAL (2000) Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature 408:361–365

    Wiener N (1964) God & Golem, Inc.: a comment on certain points where cybernetics impinges on religion. MIT Press, Cambridge

    Wolpaw JR (2007) Brain–computer interfaces as new brain output pathways. J Physiol 579(3):613–619

    Wolpaw JR, Birbaumer N, Mc Farland DJ, Pfurtscheller G, Vaughan TM (2002) Brain–computer interfaces for communication and control. Clin Neurophysiol 113:767–791

    Zecca M, Micera S, Carrozza MC, Dario P (2002) On the control of multifunctional prosthetic hands by processing the electromyographic signal. Crit Rev Biomed Eng 30(4–6):459–485
