In session four’s notes on node and twitter bots I covered the basics of working with node, npm, and building an API with “RESTian” routes.
This page picks up on that thread and looks at a few scenarios where running server side code to work with text has advantages over running everything on the client.
One of the main reasons you might go down this road is if you have a large dataset that takes a significant amount of time to process. For example, let’s say you want to build a word-counting app and you have one million documents to process. This would be unrealistic to do on the client side, but reasonable on the server.
To build this example, the first thing I’ll do is go and grab concordance.js from the text analysis examples. Functions and objects from separate JS files can be used in node just like in an HTML page. However, this must be done via node modules and the require() function.
For example, say you have a constructor function called Concordance in a file called concordance.js.
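Since the original listing isn’t reproduced here, the following is a minimal sketch of what such a concordance.js might contain. The method names (process(), getCount(), getKeys()) and the word-splitting rule are assumptions for illustration; the file ends with the module.exports line described next.

```javascript
// concordance.js
// A sketch of a word-counting Concordance constructor.
// Method names (process, getCount, getKeys) are assumptions for illustration.
function Concordance() {
  // word -> count; Object.create(null) avoids inherited keys like "constructor"
  this.dict = Object.create(null);
  // each unique word, in the order it was first seen
  this.keys = [];
}

// Add every word in a chunk of text to the counts.
Concordance.prototype.process = function (text) {
  const tokens = text.toLowerCase().split(/\W+/);
  for (const word of tokens) {
    // skip empty strings and tokens with no letters (e.g. "42")
    if (!/[a-z]/.test(word)) continue;
    if (this.dict[word] === undefined) {
      this.dict[word] = 1;
      this.keys.push(word);
    } else {
      this.dict[word]++;
    }
  }
};

// How many times a word appeared (0 if it was never seen).
Concordance.prototype.getCount = function (word) {
  return this.dict[word.toLowerCase()] || 0;
};

// All unique words seen so far.
Concordance.prototype.getKeys = function () {
  return this.keys;
};

// Export the constructor so other files can require() it.
module.exports = Concordance;
```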
You can access this constructor function in your main app (server.js) with two additional steps. First, you must add Concordance to module.exports in concordance.js. module.exports is the object that is actually returned when you call require().
Second, in server.js, get access to Concordance with a call to require(), passing the relative path to the file (./concordance.js).
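To see the two steps working together in a single runnable file, the sketch below writes a tiny concordance.js into a temporary folder (purely so the example is self-contained) and then loads it with require(). In a real project the two files simply sit side by side and server.js calls require('./concordance.js').

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// The contents of a tiny concordance.js. Its last line is the key part:
// whatever is assigned to module.exports is what require() returns.
const source = [
  'function Concordance() {',
  '  this.dict = {};',
  '}',
  'module.exports = Concordance;',
].join('\n');

// For this sketch only: write the module into a fresh temporary folder.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'a2z-'));
fs.writeFileSync(path.join(dir, 'concordance.js'), source);

// In server.js, with concordance.js next to it, this is simply:
//   const Concordance = require('./concordance.js');
const Concordance = require(path.join(dir, 'concordance.js'));

const concordance = new Concordance();
console.log(typeof Concordance); // function
console.log(concordance.dict); // {}
```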
Now that I have a Concordance object, I can start filling it with data on the server. Let’s say I have a sequence of numbered files sitting on the server that I want to process. I can read those files and pass their contents to the concordance with node’s file system module (aka fs). The fs module has functions for getting a list of the files in a directory as well as for reading specific files.
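A minimal sketch of the loading step, assuming the numbered files live in a single folder. The temporary folder, the file names, and the small counting function standing in for the concordance’s own processing method are all assumptions made so the example runs on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Stand-in for the concordance: count lowercase words in a plain object.
const counts = Object.create(null);
function processText(text) {
  for (const word of text.toLowerCase().split(/\W+/)) {
    if (word.length > 0) counts[word] = (counts[word] || 0) + 1;
  }
}

// For this sketch only: create a temporary "data" folder with numbered files.
// On a real server these files would already exist (e.g. in ./data).
const dataDir = fs.mkdtempSync(path.join(os.tmpdir(), 'data-'));
fs.writeFileSync(path.join(dataDir, '0.txt'), 'The cat sat.');
fs.writeFileSync(path.join(dataDir, '1.txt'), 'The hat!');

// List every file in the folder, then read and process each one in turn.
// Both calls are synchronous, so all data is loaded before the next line runs.
const files = fs.readdirSync(dataDir);
for (const file of files) {
  const text = fs.readFileSync(path.join(dataDir, file), 'utf8');
  processText(text);
}

console.log(files.length + ' files processed'); // 2 files processed
```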
For this job I use readdirSync() and readFileSync() rather than readdir() and readFile(). The “Sync” stands for “synchronous”, meaning these lines of code are “blocking”: the data has to be read before the program moves on to the next line. This is unusual in JavaScript, where you typically supply a callback that is executed once the data has been read. Here, though, I am happy for the program to stop and wait, because I want to process all of the data before the server starts listening for connections. It’s OK if it takes a long time because this only happens once, when the server starts. (This would not be advisable at other points in the code, such as when handling a client request to the server.)
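To make the difference concrete, here is a small sketch (the file and its contents are made up for the example) that records the order in which things happen with readFileSync() versus readFile().

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// For this sketch only: a small file to read.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sync-'));
const file = path.join(dir, 'words.txt');
fs.writeFileSync(file, 'hello world');

const order = [];

// Blocking: nothing else happens until the file has been read.
const text = fs.readFileSync(file, 'utf8');
order.push('sync read done: ' + text);

// Non-blocking: readFile() returns immediately and the callback runs later.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  order.push('async callback: ' + data);
  console.log(order.join('\n'));
});

// This line runs before the readFile() callback above.
order.push('after readFile() call');
```

The synchronous read finishes before the next line runs, while the readFile() callback only fires after all of the synchronous code has finished, which is why it is logged last.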
Now that the data is read, I can create routes that send the data to a client making a loadJSON() request with p5 (or any other JS library that can make HTTP requests). The simplest route sends everything in the concordance object.
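A sketch of such a route, assuming an express app object named app and a concordance filled in at startup (stand-in data here). The handler is written as a named function so it can be tried out with a fake response object; the app.get() registration is shown in a comment.

```javascript
// Stand-in for the concordance built at startup (hypothetical data).
const concordance = {
  dict: { the: 3, cat: 1, hat: 1 },
  keys: ['the', 'cat', 'hat'],
};

// Handler for GET /all: send the entire concordance back as JSON.
// In server.js it would be registered with express like this:
//   app.get('/all', showAll);
function showAll(req, res) {
  // res.json() serializes the object; res.send() with an object does the same.
  res.json(concordance);
}
```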
I can also get fancier and make up my own protocol for sending back pieces of data from the concordance. For example, a search route can return the count for a specific word, or “word not found” if the word is not present in the concordance. The point is to send back a JavaScript object; it’s up to you to put in that object whatever you think makes the most sense.
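One possible shape for the search route, again as a sketch: the route /search/:word, the reply fields (word, status, count), and the stand-in counts are assumptions, not a fixed protocol.

```javascript
// Stand-in word counts (hypothetical; the real ones come from the concordance).
const counts = { the: 3, cat: 1, hat: 1 };

// Handler for GET /search/:word, registered in server.js with:
//   app.get('/search/:word', showOne);
// Express fills req.params.word with whatever appears in that part of the URL.
function showOne(req, res) {
  const word = req.params.word.toLowerCase();
  const reply = { word: word };
  if (Object.prototype.hasOwnProperty.call(counts, word)) {
    reply.status = 'success';
    reply.count = counts[word];
  } else {
    reply.status = 'word not found';
  }
  res.json(reply);
}
```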
The client can then access this data with loadJSON().
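On the client, a p5 sketch along these lines could request a word and display the reply. The word “cat” and the helper formatReply() are hypothetical, and the code assumes the server replies with an object containing word, status, and (on success) count.

```javascript
// sketch.js (p5): look up a word when the sketch starts.
// The relative URL assumes the same node app serves this sketch.
function setup() {
  noCanvas();
  loadJSON('search/cat', gotData);
}

// p5 calls this once the server's JSON reply has arrived.
function gotData(data) {
  createP(formatReply(data));
}

// Turn a reply such as { word, status, count } into a line of text.
function formatReply(data) {
  if (data.status === 'success') {
    return data.word + ': ' + data.count;
  }
  return data.word + ': word not found';
}
```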
Notice that when the sketch calls loadJSON(), it does not reference the domain itself, only the route, such as “all” or “search”. This works because I am assuming that the p5 sketch is hosted by the same node app that is running the API code. In fact, in my example I’m doing exactly this by placing the p5 sketch in a “public” folder and serving those files statically using node and express (via express’s express.static middleware).
However, let’s say you want others to be able to access your API from their own code, hosted on a different domain. For this to be possible you must enable something called CORS (Cross-Origin Resource Sharing). Enabling it prevents others from getting that nasty error: “XMLHttpRequest cannot load. No ‘Access-Control-Allow-Origin’ header is present on the requested resource.” (More about Cross Domain Requests in JS.)
This is easy enough to do with the Node CORS package.
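For a sense of what enabling CORS actually does, here is a hand-written express-style middleware that adds the key response header. This is a swapped-in illustration rather than the cors package itself, which also takes care of details such as preflight requests; the package’s usual setup is shown in the comments.

```javascript
// Illustration only: enabling CORS boils down to adding a response header
// that tells browsers pages from other origins may read the reply.
// With the cors package the usual setup is:
//   const cors = require('cors');
//   app.use(cors());
// (The package also handles details like preflight OPTIONS requests.)
function allowCrossOrigin(req, res, next) {
  res.header('Access-Control-Allow-Origin', '*'); // '*' means any origin
  next(); // hand off to the route that actually answers the request
}
// Registered before your routes with: app.use(allowCrossOrigin);
```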
Another topic relevant to server-side programming is “persistence”. For example, let’s say you want to build a text classifier. NaturalNode includes Bayesian text classification as one of its features, which I briefly covered in text analysis.
Let’s assume your application classifies text as “happy” or “sad”. The system is “trained” by users submitting text tagged with the appropriate category (happy or sad). The server passes all of this text to a Classifier object, which stores all the relevant counts and probabilities for the submitted text. After running your application for a week, you have hundreds of submissions. What would happen if you had to close and restart the server?
If all of the data is stored only in memory in the Classifier object, it will all be lost as soon as the server quits. The solution is to save the data somewhere permanent, so that it persists even when the server stops running. One option is to use a database (like MongoDB or the simpler NeDB), but for some basic scenarios these solutions can be overkill.
Another option is to just write out a text file filled with JSON. The fs module can handle the reading and writing, and the built-in JSON object has the functions parse() and stringify() to convert back and forth between a JS object and raw text.
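A quick sketch of that round trip, with made-up word scores:

```javascript
// A JavaScript object in memory (hypothetical word scores).
const scores = { happy: 3, sad: -2 };

// Object -> text: the third argument (2) pretty-prints with two-space indents.
const text = JSON.stringify(scores, null, 2);
console.log(text);

// Text -> object: parse() rebuilds an equivalent, independent object.
const copy = JSON.parse(text);
console.log(copy.happy); // 3
```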
The examples follow a simple skeleton. Step 1 for the server is to check whether the file exists. (The examples come from a sentiment analysis scenario where words and their positive/negative scores are stored in a file called additional.json.)
If it does in fact exist, the data can be read using fs and stored in a variable using JSON.parse().
If it doesn’t exist, the code simply creates an empty JavaScript object.
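The check-then-load step might look like this. Wrapping it in a function called loadScores() is my own choice for illustration, and a fresh temporary folder stands in for the project folder so the example can run anywhere.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Step 1: if the saved file exists, read it and parse the JSON;
// otherwise start with an empty object.
function loadScores(file) {
  if (fs.existsSync(file)) {
    const raw = fs.readFileSync(file, 'utf8');
    return JSON.parse(raw);
  }
  return {};
}

// The examples use additional.json; a fresh temporary folder is used here
// so the sketch runs anywhere and the file is guaranteed not to exist yet.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'scores-'));
const file = path.join(dir, 'additional.json');

let additional = loadScores(file);
console.log(additional); // {}

// After something has been saved, the next startup picks it up.
fs.writeFileSync(file, JSON.stringify({ joyful: 3 }));
additional = loadScores(file);
console.log(additional.joyful); // 3
```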
The example includes an API call to add a word and score to the sentiment analysis. The server can then write out a new JSON file by calling JSON.stringify() on the variable additional. Here, the file must be written asynchronously, since the action is triggered by a web request (rather than happening once when the server starts up) and you don’t want the server to get stuck on an operation while other clients are trying to connect.
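A sketch of such a handler, assuming a hypothetical route /add/:word/:score and a temporary file path in place of additional.json. The in-memory object is updated immediately, and fs.writeFile() (the asynchronous version) saves it, replying to the client only once the write has finished.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// In the examples this is additional.json next to server.js; a temp path is used here.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'scores-'));
const file = path.join(dir, 'additional.json');
const additional = {}; // would be loaded at startup as in Step 1

// Handler for a hypothetical route, registered with:
//   app.get('/add/:word/:score', addWord);
function addWord(req, res) {
  const word = req.params.word.toLowerCase();
  const score = Number(req.params.score);
  if (Number.isNaN(score)) {
    res.json({ status: 'error', message: 'score must be a number' });
    return;
  }
  // Update the in-memory data right away...
  additional[word] = score;
  // ...then save everything asynchronously so other requests aren't blocked.
  const json = JSON.stringify(additional, null, 2);
  fs.writeFile(file, json, 'utf8', (err) => {
    if (err) {
      res.json({ status: 'error', message: 'could not save' });
    } else {
      res.json({ status: 'success', word: word, score: score });
    }
  });
}
```

One caveat with this quick-and-dirty approach: Node’s documentation warns that calling fs.writeFile() repeatedly on the same file without waiting for the callback is unsafe, so if requests arrive very close together their writes can collide.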
Writing out a text file is a nice quick-and-dirty solution that can work for many small-scale creative projects in a prototyping stage. Often, however, a more involved database is required. In addition to creating a database on your own server, there is also the option of using an API service like Firebase to store your data. I have notes and examples on Firebase here.
HTTP (Hypertext Transfer Protocol) includes several different kinds of requests. The one you are probably most familiar with is a GET request. This is the request that happens when you type a URL into an address bar. You are asking the server if you can “get” something, and what is sent back is some sort of data, often in the form of HTML, but it can be anything.
In fact, you are also making GET requests in code all the time. If you pass a URL to the loadJSON() function, a GET request is made.
In this week’s examples, I am handling GET requests in node as well, by using the get() function on the express app object, which takes a route and a callback to run when a request for that route arrives.
There are some scenarios, however, where a POST request is preferable to a GET. POST requests are designed for cases where the data sent will be stored on the server (or will cause some other change of state on the server, like deleting a database record). They are also useful for sending sensitive data like passwords, since the data in a POST travels in the body of the request rather than being visible in the URL (though it is only encrypted in transit if the site uses HTTPS).
For creative ITP projects we can be a little loosey-goosey about these distinctions. I’m using a POST here because my examples might send a large paragraph (or even more) of text, and GET requests are limited in how much data they can carry, since the data has to fit in the URL.
To send a POST from p5, use the httpPost() method. Pass it the appropriate URL along with a JavaScript object containing the data for the post. You can also define success and error callbacks to track the request.
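A minimal p5 sketch of the client side, assuming a hypothetical route named analyze and a function submitText() that you would call from, say, a button press. It uses p5’s httpPost(path, datatype, data, callback, errorCallback) form; how p5 encodes the object in the request body can depend on the p5 version and the datatype argument, so check the p5 reference if the server receives an empty body.

```javascript
// sketch.js (p5): send a paragraph of text to the server as a POST.
// 'analyze' is a hypothetical route name; use whatever route your server defines.
function submitText(text) {
  const data = { text: text };
  // httpPost(path, datatype, data, success callback, error callback)
  httpPost('analyze', 'json', data, gotReply, gotError);
}

// Called with the server's reply if the request succeeds.
function gotReply(reply) {
  console.log('Server replied:', reply);
}

// Called if the request fails.
function gotError(err) {
  console.log('Request failed:', err);
}
```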
On the server side, receiving the data looks almost identical to handling a GET, except that the route is registered with app.post() instead of app.get().
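A sketch of the matching server-side handler, assuming the hypothetical analyze route and a text field in the posted data; the word count is just a placeholder for whatever analysis your app performs.

```javascript
// Handler for a hypothetical POST /analyze route, registered in server.js with:
//   app.post('/analyze', analyzeThis);
// It assumes body-parsing middleware has already placed the parsed data on req.body.
function analyzeThis(req, res) {
  const body = req.body || {};
  const text = typeof body.text === 'string' ? body.text : '';
  // Placeholder analysis: count the words in the submitted text.
  const words = text.split(/\W+/).filter((w) => w.length > 0);
  res.json({ status: 'success', wordCount: words.length });
}
```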
Notice that in a POST handler the data is pulled from req.body rather than from req.params, as with a GET request. This only works because earlier in my code I include the express body-parser package.
This package handles the parsing of a POST body and gives you easy access to it in req.body.
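To demystify what the package does, here is a rough, hand-written stand-in for its JSON parser: it collects the request body as it streams in, parses it, and stores the result on req.body. It is an illustration only; in practice use body-parser (or express.json(), built into Express 4.16 and later), whose usual setup appears in the comments.

```javascript
// Illustration only: roughly what bodyParser.json() does for each request.
// The usual setup with the real package is:
//   const bodyParser = require('body-parser');
//   app.use(bodyParser.json());
//   app.use(bodyParser.urlencoded({ extended: true }));
// Unlike this sketch, the real parsers also check Content-Type, enforce size
// limits, and handle different character encodings.
function parseJsonBody(req, res, next) {
  const chunks = [];
  // The request is a readable stream: the body arrives in chunks.
  req.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
  req.on('end', () => {
    const raw = Buffer.concat(chunks).toString('utf8');
    let parsed;
    try {
      parsed = raw.length > 0 ? JSON.parse(raw) : {};
    } catch (err) {
      next(err); // malformed JSON: pass the error along to express
      return;
    }
    req.body = parsed; // route handlers can now read req.body
    next();
  });
}
```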