
# 7 - AI Architecture: Neural Network Design

1  Introduction

It is now time to start using the various elements covered in the previous papers. We have described how Algebra is necessary to convert the real world into numbers. We have introduced how functions are used to turn past data into future predictions. We have described how a biological neural network works and how we can recreate one in the digital world.

In this paper we will present the 3 key elements that make this architecture work: Feature Selection (deciding what parts of the real world will be used to enter the model), Encoding (converting the world into numbers), and Forward Propagation (testing the system to see how it performs).

Our example will use a classical neural network’s core structure. Recent massive AI systems are more complex, but our purpose is to convey the principles that drive an AI system.

We will use mathematics in this paper, but just enough to understand the principles. No need to be a mathematician to follow.

2  Building our smell detection model

2.1     Short recap of the digital neural network architecture

A digital neural network is made of:

  • Neurons that take numbers as input and produce new numbers as outputs.
  • Synapses that convey the numbers from one neuron to another. These synapses can decide whether or not to transmit the information, and can amplify or reduce its importance on the way.
  • One or more entry points to get the real-world data into the system.
  • One or more exit points to produce the result(s) (see the sketch just after this list).
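
To make this recap a little more concrete, here is a minimal sketch of a single digital neuron in Python. The weights play the role of synapses, and the sigmoid activation decides how strongly the neuron passes its result on; all the numbers used are invented for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """A single digital neuron: weighted sum of the inputs, then an activation."""
    # Each weight acts like a synapse, amplifying or reducing the importance
    # of the number it carries.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation: squashes the result between 0 and 1 and decides
    # how strongly the neuron "fires".
    return 1 / (1 + math.exp(-total))

# Illustrative values only: two numbers entering one neuron.
print(neuron(inputs=[0.9, 0.1], weights=[0.7, -0.3], bias=0.05))
```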

2.2   Step 1: Identify the information structure of the real world – Feature Selection

As we now know, we must use numbers to allow mathematical wizardry to be performed. So, how do we convert the world into numbers? Since the number of possible use cases is very large, we will stick to our initial example: identifying whether food is edible based on its odour.

The idea behind this step is to identify the pieces of information that we believe make sense in order to reach the end result. So, we can ask: how would my nose know whether something is good or bad? How would it decompose the information to reach this decision? Let’s do just that…

The starting point is a bunch of molecules extracted from the physical food and floating in the air.

These molecules could be split into:

  • Smell Type: The types or classes of molecules (e.g., what kind of chemical makes the smell— fruity, floral, spicy, earthy, sweet, pungent, musky).
  • Intensity: How strong or weak the smell is (e.g., faint vs. overpowering).
  • Molecule concentration (e.g., number of molecules hitting the nose, intensity perceived).

Then, we could think of more intuitive characteristics:

  • Pleasantness:
    Is the smell pleasant (e.g., like flowers or fresh bread) or unpleasant (e.g., like garbage or sulphur)? This is intuitive—people naturally categorize smells as good or bad.
    Example: A rose might score high (0.9 on a 0–1 scale), while rotten eggs score low (0.1).
  • Familiarity:
    How familiar the smell is to someone (e.g., everyday smells like coffee vs. rare ones like a specific exotic flower). This mirrors how the brain prioritizes familiar sensory inputs.
    Example: Coffee might be 0.8 (familiar), while a niche perfume could be 0.2.
  • Duration:
    How long the smell lingers (e.g., quick and fleeting vs. persistent). This is relatable—think of a passing whiff of perfume vs. a lingering cooking smell.
    Example: A citrus scent might be 0.3 (short-lived), while a woody smell could be 0.7 (long-lasting).

We now have identified 6 pieces of information that could describe a smell:

  • Smell Type
  • Intensity
  • Molecule concentration
  • Pleasantness
  • Familiarity
  • Duration

At this stage we still do not have numbers to work with. So, let’s convert them into numbers.

2.3     Step 2: Converting the real world into numbers – Encoding

Smell type: we need to list the various types of smell we can recognise, for instance: fruity, floral, spicy, earthy, sweet, pungent, musky, etc. Let’s assign a number to each of these types: Fruity = 1; Floral = 2; Spicy = 3; and so on… We now have a list of possible numbers from 1 to n, where n is the number of types. Let’s say we have 7 types; then the value can be any integer from 1 to 7.
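
As a tiny illustration, this mapping could be held in code like this (the type names and numbers are simply the examples above, not a standard list):

```python
# Hypothetical mapping from smell type to an integer code from 1 to 7.
SMELL_TYPES = {
    "fruity": 1, "floral": 2, "spicy": 3, "earthy": 4,
    "sweet": 5, "pungent": 6, "musky": 7,
}

def encode_smell_type(label: str) -> int:
    """Convert a smell type name into its integer code."""
    return SMELL_TYPES[label.lower()]

print(encode_smell_type("Floral"))  # -> 2
```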

Intensity: we could describe the intensity in two ways. The first is to list the various intensities from faint to overpowering via light, medium, strong and so on. With such a strategy we would end up with a list of numbers, just like we did with the smell type.
Another strategy is to describe the intensity with a number between 0 and 1: 0 would mean you cannot smell it at all, and 1 would mean it is taking over every other smell. Of course, we would have an infinity of possibilities with any number between 0 and 1, like 0.1 is faint, 0.25 is light, 0.54 is high medium, etc. We now have a system to describe the intensity with a number from 0 to 1.
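
The second strategy could be sketched as follows; the labels and the 0 to 1 values attached to them are illustrative assumptions, not a standard scale:

```python
# Illustrative mapping from descriptive intensity labels to a 0-1 scale.
INTENSITY_SCALE = {
    "faint": 0.1,
    "light": 0.25,
    "medium": 0.5,
    "strong": 0.75,
    "overpowering": 1.0,
}

def encode_intensity(label: str) -> float:
    """Convert a descriptive intensity label into a number between 0 and 1."""
    return INTENSITY_SCALE[label.lower()]

print(encode_intensity("light"))  # -> 0.25
```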

Molecule concentration: This could be a simple number indicating the number of molecules hitting the nose, i.e. any non-negative integer.

Pleasantness: Here again we could have a number between 0 and 1, with an extremely unpleasant smell like rotten egg getting a 0.1 and an extremely pleasant smell like roses getting a 0.9.

Familiarity: Another number between 0 and 1 could do the job. 0 would be a smell never encountered before and 1 a perfectly recognised smell, with 0.5 being “I know it but cannot put a name on it”, 0.3 being “it rings a bell but I have no idea”, etc.

Duration: could be, for instance, the number of seconds the smell remains in my nose, i.e. a non-negative integer.

So, in order to make things easier, let’s name these 6 pieces of information.

Smell Type = t

Intensity = i

Concentration = c

Pleasantness = p

Familiarity = f

Duration = d
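
Putting the six pieces together, a single smell can now be described as a small vector of numbers ready to enter a neural network. A minimal sketch, with values invented purely for illustration:

```python
# A freshly baked bread smell, encoded with our six features (t, i, c, p, f, d).
# All values are purely illustrative.
smell = {
    "t": 5,     # Smell Type: "sweet" in our hypothetical list of 7 types
    "i": 0.6,   # Intensity: fairly strong, on the 0-1 scale
    "c": 1200,  # Concentration: number of molecules hitting the nose
    "p": 0.9,   # Pleasantness: very pleasant
    "f": 0.8,   # Familiarity: a well-known smell
    "d": 45,    # Duration: seconds the smell lingers
}

# The vector of numbers that will be fed into the neural network.
x = [smell["t"], smell["i"], smell["c"], smell["p"], smell["f"], smell["d"]]
print(x)  # [5, 0.6, 1200, 0.9, 0.8, 45]
```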

2.4     Step 3: Feeding the model with initial data – Forward Propagation

Now that we have identified our initial data, we have to decide how they enter our model. Indeed, we still have decisions to make on how we shape the entry feed.

We have to choose between 3 big strategies, plus all the possible variations between them.

#1 We can enter each piece of information into its own specialised neuron, ending up with 6 input neurons.

#2 We can group pieces of data and enter them into any number of neurons between 1 and 6.

#3 We can enter all the information into one single neuron.

This choice will have an impact on the behaviour of our Machine Learning model. So, it is indeed a strategic decision. To better understand the impact of this decision, let’s visualise the various options.

Solution #1: we have one neuron per data type. It then looks like this:

[Figure: Solution #1 – one input neuron per data type (six input neurons)]

Solution #2: we group intensity with concentration as they are quite related, and we group pleasantness and familiarity for the same reason. We then end up with the following system:

[Figure: Solution #2 – intensity grouped with concentration, and pleasantness with familiarity (four input neurons)]

Solution #3: we keep it simple and have a single entry point for the 6 parameters. It then looks like this:

[Figure: Solution #3 – a single input neuron for all six parameters]
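
To give a feel for what feeding the model and propagating the data forward actually means, here is a minimal sketch of Solution #1: six input values, one small hidden layer and one output neuron. The hidden-layer size, the random weights and the rescaling of the inputs are all assumptions made for illustration; a real network would learn its weights from data.

```python
import math
import random

random.seed(0)  # reproducible toy weights

def sigmoid(z):
    """Activation function: squashes any number into the 0-1 range."""
    return 1 / (1 + math.exp(-z))

def forward(x, layers):
    """Propagate an input vector through a list of (weights, biases) layers."""
    for weights, biases in layers:
        x = [sigmoid(sum(xi * w for xi, w in zip(x, neuron_w)) + b)
             for neuron_w, b in zip(weights, biases)]
    return x

def random_layer(n_inputs, n_neurons):
    """Build a fully connected layer with random synapse weights."""
    weights = [[random.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

# Our encoded smell (t, i, c, p, f, d), rescaled to comparable 0-1 ranges
# so that no single feature drowns out the others.
x = [5 / 7, 0.6, 0.3, 0.9, 0.8, 0.45]

# Solution #1: six input neurons, a hidden layer of four neurons, one output.
layers = [random_layer(6, 4), random_layer(4, 1)]
print(forward(x, layers))  # one number between 0 and 1: edible or not?
```

The same forward function would work for the other two solutions; only the shape of the input layer changes.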

3  Neural network density

What we can observe from these 3 solutions is that the density of the digital neural network is much higher in the first solution and decreases down to the last one. It is visually clear from the number of synapses in action, or should I say “activable” (remember the Activation Function from our last paper? Think of it as what helps decide whether a neuron shares its information). It means that the “thinking process” or “thinking power” of our system is definitely higher in the first solution. As we can guess, the first system is more “intelligent”: it will be capable of more finesse in its reasoning than the last one. This also means the processing power needed to run the network increases with its complexity. The last solution will be both faster and cheaper to run, and if you do not need a huge level of finesse in the output, it might very well be sufficient. As usual, one solution rarely fits all.
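
A rough way to feel this difference in density is simply to count the synapses between the input layer and a hidden layer. The hidden-layer size of four used below is an arbitrary assumption, chosen only to make the comparison concrete:

```python
def synapse_count(n_inputs, n_hidden):
    # In a fully connected layer every input neuron is linked to every
    # hidden neuron, so the number of synapses is simply the product.
    return n_inputs * n_hidden

hidden = 4  # arbitrary hidden-layer size for the comparison
for name, n_inputs in [("Solution #1", 6), ("Solution #2", 4), ("Solution #3", 1)]:
    print(name, "->", synapse_count(n_inputs, hidden), "synapses to the hidden layer")
```

With these assumptions, Solution #1 uses 24 synapses into the hidden layer, Solution #2 uses 16 and Solution #3 only 4, which matches the intuition above.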
What we get a first taste of here is that digital neural networks are the subject of technical design. We have just observed the very beginning of it, but a significant one.

I would add that in practice the architecture question is much more complex, and the design does not depend only on density. In fact, too high a density can actually be counterproductive. It also depends on the nature of the problem we want to solve and, as we can now guess, the nature, amount and quality of the data fed to the system are absolutely essential. Indeed, a dense network fed with rubbish data will only produce finely crafted rubbish, if you see what I mean. But all in all, we have shown here one important element of the design of a digital neural network.

4  Where is the Intelligence?

In this paper we have illustrated how the “intelligence” is built by converting the real world into numbers and by deciding how these numbers will be processed. The architecture of the network is of great importance for the level of detail expected in the result.

I would nonetheless emphasise that, so far, the intelligence mostly comes from the human beings who decide how to split the real world into smaller pieces and how to convert these pieces into numbers. From there, it seems we still have mathematical wizardry going on.

In our next paper we will start digging into this wizardry. Stay tuned.
