Neural Networks for Tabular Data

This is a quick, complete post on how neural networks fit tabular data. I am using the fastai library to implement it. Let’s start.

Tabular data is the kind of data found in spreadsheets, relational databases, financial reports, and so on. Many areas require tabular data analysis.

Using neural networks for tabular data is a relatively new idea, since there are already well-established methods such as Random Forests and GBMs. But neural networks have turned out to be highly useful for tabular data as well. They require less maintenance, and the approach is effective and reliable.

Fastai has made it very easy to analyse tabular data using neural nets. The library has a module, fastai.tabular, built solely for this purpose.

Let us import the library as below:

Apart from fastai, the pandas library is convenient for analysing tabular data. The pandas DataFrame is the standard format for tabular data in Python.

Let us untar the data. It is available as one of the fastai academic datasets. We will then use pandas to load it.

After we have loaded our data, we need to define the dependent variable, the continuous variables, and the categorical variables.

❓ What is meant by dependent variable, continuous variables, and categorical variables?
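In short, the dependent variable is the column we want to predict, categorical variables take one of a limited set of discrete values, and continuous variables are numeric quantities. The sketch below illustrates the three roles in plain Python. The rows, the column names, and the helper split_columns are all made up for illustration; they are not the dataset or the code used in this post.

```python
# Hypothetical rows, invented purely for illustration.
rows = [
    {"age": 39, "hours_per_week": 40, "occupation": "clerical", "education": "bachelors", "salary": ">=50k"},
    {"age": 23, "hours_per_week": 20, "occupation": "sales", "education": "high-school", "salary": "<50k"},
    {"age": 51, "hours_per_week": 45, "occupation": "manager", "education": "masters", "salary": ">=50k"},
]

dep_var = "salary"  # the column the model should predict


def split_columns(rows, dep_var):
    """Guess each column's role: numeric -> continuous, anything else -> categorical."""
    cat_names, cont_names = [], []
    for name in rows[0]:
        if name == dep_var:
            continue  # the target is neither an input category nor an input number
        values = [r[name] for r in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        (cont_names if is_numeric else cat_names).append(name)
    return cat_names, cont_names


cat_names, cont_names = split_columns(rows, dep_var)
print("dependent:  ", dep_var)
print("categorical:", cat_names)
print("continuous: ", cont_names)
```

A type-based guess like this is only a starting point: an integer code such as a postal code is numeric, but is usually better treated as categorical.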

After this, we need to define the preprocessing steps, which run ahead of time: we pre-process the data frame once rather than transforming it as we go. Preprocessing is fitted on the training dataset before it is fed to the model, and the same preprocessing values are then carried over to the validation and test datasets.
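To show why the preprocessing values must come from the training data, here is a minimal plain-Python sketch. It is not fastai's implementation; it only mirrors the idea behind processors such as fastai's FillMissing, Categorify, and Normalize. The helpers fit_procs and apply_procs and the tiny tables are hypothetical.

```python
from statistics import mean, median, pstdev


def fit_procs(train_rows, cat_names, cont_names):
    """Compute preprocessing statistics from the training rows only."""
    state = {"fill": {}, "mean": {}, "std": {}, "codes": {}}
    for name in cont_names:
        present = [r[name] for r in train_rows if r[name] is not None]
        fill = median(present)  # value used to replace missing entries
        filled = [fill if r[name] is None else r[name] for r in train_rows]
        state["fill"][name] = fill
        state["mean"][name] = mean(filled)
        state["std"][name] = pstdev(filled) or 1.0  # avoid dividing by zero
    for name in cat_names:
        categories = sorted({r[name] for r in train_rows if r[name] is not None})
        # Code 0 is reserved for missing or previously unseen categories.
        state["codes"][name] = {c: i + 1 for i, c in enumerate(categories)}
    return state


def apply_procs(rows, state):
    """Apply the stored training statistics to any split (train, validation, test)."""
    out = []
    for r in rows:
        new = {}
        for name, fill in state["fill"].items():
            value = fill if r[name] is None else r[name]
            new[name] = (value - state["mean"][name]) / state["std"][name]
        for name, codes in state["codes"].items():
            new[name] = codes.get(r[name], 0)
        out.append(new)
    return out


# Hypothetical data: one missing age in training, one unseen job in validation.
train = [
    {"age": 20, "job": "clerk"},
    {"age": None, "job": "nurse"},
    {"age": 40, "job": "clerk"},
]
valid = [{"age": 30, "job": "pilot"}]

state = fit_procs(train, cat_names=["job"], cont_names=["age"])
train_x = apply_procs(train, state)
valid_x = apply_procs(valid, state)
print(state["codes"])
print(valid_x)
```

The validation row is scaled with the training mean and standard deviation, and its unseen category falls back to code 0, so nothing about the validation data leaks into the statistics.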

Let us define the processes mentioned above:

If you want to see a batch of data:

Blue: categorical variables, Green: mean of continuous variables

For a tabular data bunch, we have a tabular learner.

The layers= argument is where we define our architecture, just as we chose ResNet-34 or another model for convolutional nets. It gives the sizes of the hidden fully connected layers that sit between the input (after embedding) and the classification layer. The length of the list is the number of hidden layers.
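To make the meaning of the list concrete, here is a small plain-Python sketch (not fastai code). The helper linear_layer_shapes and the numbers used (42 inputs after embedding, hidden sizes [200, 100], 2 classes) are hypothetical and chosen only for illustration.

```python
def linear_layer_shapes(n_inputs, layers, n_classes):
    """Return (in_features, out_features) for each fully connected layer."""
    sizes = [n_inputs] + list(layers) + [n_classes]
    # Pair each size with the next one: the output of a layer feeds the next layer.
    return list(zip(sizes[:-1], sizes[1:]))


# Two entries in the list -> two hidden layers, plus the final classification layer.
shapes = linear_layer_shapes(n_inputs=42, layers=[200, 100], n_classes=2)
print(shapes)  # [(42, 200), (200, 100), (100, 2)]
```

Adding a third number to the list would add a third hidden layer, and making the numbers larger makes each layer wider.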

You can unfreeze and then fine-tune the learner if you want better results. To know more about this, you may go through my classification post.

Our prediction matches the actual data.

There is always room for more research, and neural nets on tabular data are an active area of it. But in short, this is how we use neural nets for tabular data.
