This tutorial provides a simple yet powerful introduction to the ncpi library. It walks through the core workflow and highlights the library's main functionalities through a practical example. We define a simplified toy simulator that generates spike trains (for an example using a real spiking simulator, see the tutorial Forward Modelling of Neural Activity), transform the spikes into field potentials using a field potential proxy, and extract the catch22 feature set from the simulated data. Finally, we split the data into training and test sets to evaluate how well a RandomForestRegressor inverse model recovers the original simulation parameters.

First, we define a simulator of synthetic spike trains for a population of neurons, where each neuron's activity is driven by a combination of independent noise and a shared sinusoidal signal. Both inputs are convolved with an exponential synaptic kernel whose decay is controlled by the parameter θ. For simplicity, we assume the shared input's influence is also scaled by θ, introducing θ-dependent correlations across neurons. Spike trains are generated by thresholding the resulting time-varying spike probabilities.
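
Concretely, the kernel used below is exp(-t/τ) with τ = sampling_rate · (θ + 0.01); the small 0.01 offset keeps the time constant positive at θ = 0. The shared input enters with a mixing coefficient of 0.5·θ, so a larger θ means both slower synaptic decay and stronger correlations across neurons.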

In [1]:
import numpy as np

def simulator(θ, n_neurons=50, sim_time=2000, sampling_rate=100):
    """
    Simulates spike trains modulated by a shared sinusoidal signal and independent noise.

    Parameters
    ----------
    θ : float
        Controls the exponential decay of the synaptic kernel and the influence of the shared signal.
    n_neurons: int, optional
        Number of neurons to simulate. Default is 50.
    sim_time: int, optional
        Length of the simulation in time points. Default is 2000.
    sampling_rate: int, optional
        Sampling rate in Hz (or time points per second). Default is 100.

    Returns
    -------
    spikes : ndarray of shape (n_neurons, sim_time)
        Binary array representing spikes (1s) and no spikes (0s) for each neuron over time.
    exp_kernel : ndarray
        The exponential kernel used for convolution.
    """
    spikes = np.zeros((n_neurons, sim_time))

    # Synaptic kernel: exponential decay whose time constant grows with θ;
    # the 0.01 offset keeps tau positive when θ = 0
    tau = sampling_rate * (θ + 0.01)
    t_kernel = np.arange(int(sampling_rate * 4))  # 4 s of kernel support
    exp_kernel = np.exp(-t_kernel / tau)

    # Sinusoidal shared signal
    freq = 2.0  # Hz
    time = np.arange(sim_time) / sampling_rate
    shared_input = 0.5 * (1 + np.sin(2 * np.pi * freq * time))  # Values in [0, 1]

    for neuron in range(n_neurons):
        # Independent signal for each neuron
        private_input = np.random.rand(sim_time)
        private_modulated = np.convolve(private_input, exp_kernel, mode='same')

        # Combine shared and private inputs
        k = 0.5 * θ  # Mixing coefficient based on θ
        modulated = private_modulated + k * np.convolve(shared_input, exp_kernel, mode='same')

        # Normalize combined modulation
        modulated -= modulated.min()
        modulated /= modulated.max()

        # Generate spikes: shifting the normalized modulation down by 0.9
        # means only the top ~10% of the modulation range can produce spikes
        spike_probs = modulated - 0.9
        spikes[neuron] = np.random.rand(sim_time) < spike_probs

    return spikes, exp_kernel
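
As a quick sanity check (not part of the core workflow), we can call the simulator once and confirm the output shapes and overall spike density; θ = 0.3 is an arbitrary choice here:

spikes, exp_kernel = simulator(0.3)
print(spikes.shape)      # (50, 2000): one binary spike train per neuron
print(exp_kernel.shape)  # (400,): 4 s of kernel support at 100 Hz
print(spikes.mean())     # overall fraction of time points containing a spike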

Let’s plot the spiking activity for several values of the θ parameter.

In [2]:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

# Generate some θs
θ = np.linspace(0, 0.5, 4)

# Simulate spike trains for each parameter
simulations = [simulator(theta) for theta in θ]

# Create a figure with a GridSpec layout
fig = plt.figure(figsize=(10, 7))
gs = gridspec.GridSpec(len(θ), 2, width_ratios=[1, 3], wspace=0.3)

for i, (theta, sim_data) in enumerate(zip(θ, simulations)):
    spikes, exp_kernel = sim_data

    # Plot the exponential kernel
    ax_kernel = fig.add_subplot(gs[i, 0])
    ax_kernel.plot(np.arange(len(exp_kernel)) / 100, exp_kernel, color='blue')  # x-axis in seconds (100 Hz)
    ax_kernel.set_title(f'Kernel (θ={theta:.2f})')
    if i == len(θ) - 1:
        ax_kernel.set_xlabel('Time (s)')
    ax_kernel.set_ylabel('Amplitude')

    # Plot the spike trains as a scatter plot for all 50 neurons
    ax_spike = fig.add_subplot(gs[i, 1])
    for neuron_idx, neuron_spikes in enumerate(spikes):
        spike_times = np.where(neuron_spikes == 1)[0]  # Get spike times (indices where spikes occur)
        ax_spike.scatter(spike_times, [neuron_idx] * len(spike_times), color='black', s=2)

    ax_spike.set_title(f'Spike Trains (θ={theta:.2f})')
    ax_spike.set_ylabel('Neuron Index')
    if i == len(θ) - 1:
        ax_spike.set_xlabel('Time (s)')
    ax_spike.set_xticks(range(200, 1801, 200))
    ax_spike.set_xticklabels([f"{t/100:.1f}" for t in range(200, 1801, 200)])
    ax_spike.set_xlim(200, 1800)
    ax_spike.set_ylim(-0.5, 49.5)  # Adjust y-axis limits to fit 50 neurons

# Adjust spacing between subplots
plt.subplots_adjust(left=0.1, right=0.95, top=0.95, bottom=0.1, wspace=0.3, hspace=0.5)

plt.show()
[Figure: exponential kernels (left) and spike raster plots (right) for four values of θ between 0 and 0.5; larger θ gives slower kernel decay.]

Here, we generate a dataset by simulating spike trains for different values of the parameter θ. For each simulation, we calculate firing rates and a field potential proxy, which will be used later for training and testing our model.

In [3]:
import ncpi

# Set simulation parameters
n_neurons = 50           # Number of neurons in each simulated recording
sim_time = 2000          # Total number of time points per sample
sampling_rate = 100      # Sampling rate in Hz
n_samples = 1000         # Total number of samples (combined training + test set)

# Define the bin size and number of bins for firing rate computation
bin_size_ms = 100                                   # bin size in ms
bin_size = int(bin_size_ms * sampling_rate / 1000)  # in time steps: 100 ms at 100 Hz = 10 steps
n_bins = int(sim_time / bin_size)                   # number of bins: 2000 / 10 = 200

# Preallocate arrays for storing simulation output
sim_data = {
    'X': np.zeros((n_samples, n_bins)),
    'θ': np.zeros((n_samples, 1))
}

# Create the simulation dataset
for sample in range(n_samples):
    # Generate a random parameter θ
    θ = np.random.uniform(0, 0.5)

    # Simulate the spike train
    spikes, exp_kernel = simulator(θ, sampling_rate=sampling_rate,
                                      n_neurons=n_neurons,
                                      sim_time=sim_time)

    # Compute firing rates
    fr = [
        [np.sum(spikes[ii, jj * bin_size:(jj + 1) * bin_size])
         for jj in range(n_bins)]
        for ii in range(spikes.shape[0])
    ]

    # Create a FieldPotential object
    fp = ncpi.FieldPotential(kernel=False)

    # Get the field potential proxy from the binned firing rates
    proxy = fp.compute_proxy(method='FR', sim_data={'FR': fr}, sim_step=None)

    # Save simulation data
    sim_data['X'][sample, :] = proxy
    sim_data['θ'][sample, 0] = θ
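
As a quick check, the assembled dataset should contain one 200-bin proxy signal and one θ value per sample:

print(sim_data['X'].shape)  # (1000, 200)
print(sim_data['θ'].shape)  # (1000, 1)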

Next, we reshape the θ array if needed and extract features from the simulated data using the catch22 feature set. These features will help us train the model to predict θ.

In [4]:
import pandas as pd

# If sim_data['θ'] is a 2D array with one column, reshape it to a 1D array
if sim_data['θ'].shape[1] == 1:
    sim_data['θ'] = np.reshape(sim_data['θ'], (-1,))

# Compute features
df = pd.DataFrame({'Data': sim_data['X'].tolist()})
features = ncpi.Features(method='catch22')
df = features.compute_features(df)
Computing features: 100%|██████████| 334/334 [00:01<00:00, 212.21it/s]

Let’s explore how the 22 catch22 features evolve as θ changes.

In [5]:
import seaborn as sns

# Extract the feature matrix and θ values
feature_matrix = np.array(df['Features'].tolist())  # Shape: (n_samples, 22)
theta_values = sim_data['θ']  # Shape: (n_samples,)

# Create a DataFrame for easier plotting with Seaborn
plot_data = pd.DataFrame(feature_matrix, columns=[f'Feature {i+1}' for i in range(22)])
plot_data['θ'] = theta_values

# Create a figure with subplots for each feature
fig, axes = plt.subplots(11, 2, figsize=(14, 24))  # 11 rows, 2 columns for 22 features
axes = axes.flatten()  # Flatten axes for easy iteration

for i, ax in enumerate(axes):
    if i < 22:  # Plot only for the 22 features
        sns.scatterplot(x='θ', y=f'Feature {i+1}', data=plot_data, ax=ax, alpha=0.5, s=10)
        sns.regplot(x='θ', y=f'Feature {i+1}', data=plot_data, ax=ax, scatter=False, color='red', ci=None)
        ax.set_title(f'Feature {i+1}')
    else:
        ax.axis('off')  # Turn off unused subplots

# Adjust layout to prevent overlap
plt.tight_layout()
plt.show()
[Figure: scatter plots of each of the 22 catch22 features against θ, with linear regression fits in red.]
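
The generic "Feature 1–22" labels can be replaced with the canonical catch22 feature names. As an illustration, assuming the pycatch22 package is available (ncpi's catch22 backend is not specified here, so this is a sketch rather than the library's own API):

import pycatch22

# Run catch22 on a short dummy series purely to retrieve the feature names
dummy = list(np.random.rand(100))
names = pycatch22.catch22_all(dummy)['names']
print(names[:3])  # first three of the 22 feature identifiers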

Now, we split the data into 90% for training and 10% for testing. The training set is used to fit the model, while the test set evaluates how well it generalizes. When working with real recordings, the test set would be replaced with actual experimental data whose underlying parameters you want to infer.

In [6]:
# Split simulation data into 90% training and 10% test data
indices = np.arange(n_samples)
np.random.shuffle(indices)
split = int(0.9 * len(indices))
train_indices = indices[:split]
test_indices = indices[split:]

X_train = np.array(df['Features'].iloc[train_indices].tolist())
X_test = np.array(df['Features'].iloc[test_indices].tolist())
θ_train = np.array(sim_data['θ'][train_indices])
θ_test = np.array(sim_data['θ'][test_indices])
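
Equivalently, scikit-learn's train_test_split performs the shuffle and split in one call; this sketch assumes the same 90/10 proportions as above:

from sklearn.model_selection import train_test_split

X_all = np.array(df['Features'].tolist())
X_train, X_test, θ_train, θ_test = train_test_split(
    X_all, sim_data['θ'], test_size=0.1)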

Here, we set up and train the model. The model learns to predict the parameter θ based on the features we extracted earlier.

In [7]:
# Create the inference object and add simulation data
inference = ncpi.Inference(model='RandomForestRegressor',
                           hyperparams={'n_estimators': 100,
                                        'max_depth': 10,
                                        'min_samples_split': 2})
inference.add_simulation_data(X_train, θ_train)

# Train the model
inference.train(param_grid=None)
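
The train method also accepts a non-None param_grid (None above simply trains with the fixed hyperparameters). Assuming it follows scikit-learn's grid-search convention of mapping hyperparameter names to candidate values, a hypothetical grid might look like this; check the ncpi documentation for the exact semantics:

# Hypothetical grid; the exact param_grid behaviour depends on ncpi
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 20],
}
inference.train(param_grid=param_grid)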

Let’s evaluate how well the model performs! Using the test data, we generate predictions and compare them against the true values of θ. Ideally, the predicted points should fall along the red dashed line, which represents perfect agreement between predictions and true values. Feel free to experiment with different hyperparameters when creating the inference object to observe how the model’s accuracy changes.

In [8]:
from sklearn.metrics import mean_squared_error

# Evaluate the model using the test data
predictions = inference.predict(X_test)

# Calculate MSE
mse = mean_squared_error(θ_test, predictions)

# Plot real vs predicted values
plt.figure(figsize=(8, 6))
plt.scatter(θ_test, predictions, alpha=0.5, label=f'MSE = {mse:.3f}')
plt.plot([θ_test.min(), θ_test.max()], [θ_test.min(), θ_test.max()], 'r--', label='Ideal Fit')
plt.xlabel('True θ')
plt.ylabel('Predicted θ')
plt.legend()
plt.show()
Computing predictions: 100%|██████████| 34/34 [00:01<00:00, 20.15it/s]
[Figure: predicted vs. true θ on the test set, with the ideal-fit diagonal in red and the MSE in the legend.]
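
Beyond the MSE, other standard scikit-learn regression metrics can be computed on the same predictions, for example:

from sklearn.metrics import mean_absolute_error, r2_score

print(f"MAE: {mean_absolute_error(θ_test, predictions):.3f}")
print(f"R²: {r2_score(θ_test, predictions):.3f}")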