What JavaScript Developers Can Learn from C++

Types, memory and how learning a lower-level language can make you a better programmer

Published on
Apr 23, 2019

Read time
9 min read

Introduction

Like many new developers, I learnt JavaScript as my first language. It’s the front-end programming language of the web and, thanks to Node.js, it is also a popular back-end tool.

I also believe that, as a ‘higher-level’ language, JavaScript is a fantastic choice for beginners. You can run it on any web browser, and features such as prototypal inheritance and dynamic types give learners fewer obstacles to overcome before they write and execute their first piece of code.

But what makes JavaScript easier for beginners can also make it harder to master. It can behave in seemingly unintuitive ways, and so many developers rely on a trial-and-error approach when it comes to more opaque features, such as implicit type coercion or the this keyword. It’s much easier to know features like these than to understand them.

“Any fool can know. The point is to understand.” — Albert Einstein

So, to become a more advanced JavaScript developer, it’s useful to try to better understand what’s going on under the hood. Ultimately, the best place to look is the V8 JavaScript engine: it’s the most widely used JavaScript engine (underlying Google Chrome, Node.js and more) and it’s open-source, so you can see exactly how JavaScript’s features are implemented in C++, the language V8 is written in.

But this article isn’t a guide to V8. Rather, it’s a look at how lower-level languages like C++ can help us improve our understanding of higher-level languages like JavaScript. Not only can C++ help us understand the underlying compiler code, but — by looking into the things C++ developers must do which JavaScript developers can avoid doing — we can get a much better sense of where JavaScript is saving us time, and why sometimes that can cause problems.

In particular, we’ll look at data types and memory management in C++, and how knowledge of these can help us avoid type errors and prevent memory leaks in JavaScript. We’ll also look into what memory management has to do with the end of time.

Type Coercion in JavaScript

Before jumping into C++, let’s look at how JavaScript handles data types and some of the pitfalls of its system of ‘type coercion’.

JavaScript uses type coercion to automatically convert one data type to another: strings to numbers, numbers to strings, numbers or strings to booleans, and so on. In other words, if you don’t explicitly specify what type you want, JavaScript will guess based on a set of rules. Sometimes this is useful, and it can help us write code quickly and concisely. Other times, it can be the cause of confusion.

Indeed, even though this behaviour is ultimately predictable, certain automatic decisions are less than intuitive, and in a large codebase it’s easy to see how type coercion could lead to unexpected errors. For example, here are the results of a few expressions that combine strings and numbers:

"10" - 4;
// 6

"10" + 4;
// "104"

"20" - "5";
// 15

"20" + "5";
// "205"

"20" + +"5";
// "205"

"foo" + "bar";
// "foobar"

"foo" + +"bar";
// "fooNaN"

"6" - 3 + 3;
// 6

"6" + 3 - 3;
// 60

In these examples, much of the potential confusion revolves around the + operator. As a unary operator (as in +"5"), it coerces a string to a number. As a binary operator, it concatenates if either operand is a string and adds otherwise. The - operator, by contrast, always converts its operands to numbers.

Although type coercion may help developers write code more quickly and concisely, and gives beginners one less thing to think about, it is clear why such a system could lead to errors, particularly in a larger, more complex codebase. The results above may make perfect sense to seasoned JavaScript developers, but they are not all intuitive!
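One way to sidestep these surprises is to convert values explicitly before combining them. The snippet below is a minimal sketch of that idea rather than a definitive pattern; the helper names addAsNumbers and joinAsStrings are hypothetical and exist only for illustration.

```javascript
// Hypothetical helpers: each one states the intended type explicitly,
// so the result no longer depends on which operator triggers coercion.
function addAsNumbers(a, b) {
  return Number(a) + Number(b);
}

function joinAsStrings(a, b) {
  return String(a) + String(b);
}

console.log(addAsNumbers("20", "5")); // 25
console.log(joinAsStrings("20", 5));  // 205 (a string)
console.log(addAsNumbers("foo", 1));  // NaN, because "foo" is not numeric
```

Using Number() and String() makes the intent visible to the next reader, even though the language would happily guess for us.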

With the benefits and flaws of JavaScript’s type coercion system in mind, let’s now see how C++ handles data types.

Types and Memory Management in C++

Lower-level languages such as C++ largely avoid these pitfalls because a variable’s data type is fixed at the point of declaration. While JavaScript has three keywords (var, let and const) for the declaration of new variables, in C++ each basic data type has its own keyword.

So, for example, the 7 basic data types in C++ are integer, floating point, double floating point, character, wide character, boolean, and valueless. The keywords used to define them are int, float, double, char, wchar_t, bool, and void, respectively.

The following snippet contains a sample declaration of each of these types, with additional notes in the comments:

#include <iostream>
#include <string>
using namespace std;

// VOID
// A common use of void is to define functions which don't return anything
// C++ does not allow one function to be defined inside another,
// so printMessage is defined here, before main
void printMessage()
{
  cout << "Hello, world!" << endl;
}

int main()
{
  // BOOLEANS
  bool isChecked = true;

  // INTEGERS
  int age = 24;

  // FLOATS
  // In general, a float has 7 decimal digits of precision, while a double has 15
  float pi7 = 3.1415926f;
  double pi15 = 3.141592653589793;

  // CHARACTERS
  // Regular characters can contain only values stored in the ISO Latin tables
  // Wide characters, however, can contain unicode values (note the L prefix)
  char englishGreeting[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
  wchar_t mandarinGreeting[7] = {L'n', L'ǐ', L' ', L'h', L'ǎ', L'o', L'\0'};

  // STRINGS
  // In C++, string is not a data type (as it is in JavaScript and many other languages)
  // It is a class, and so we must write #include <string> at the top of the document
  string greeting = "Hello";

  printMessage();

  return 0;
}

Unlike JavaScript, C++ places a lot of control over memory management in the hands of developers. In C++, every time we declare a variable we are also making a decision about how much memory to reserve. For example, an ordinary char usually contains just 8 bits (1 byte), limiting it to 256 distinct values, enough for the characters of an ISO Latin table. By contrast, a wchar_t contains 16 or 32 bits (depending on the platform), taking up more memory but allowing us to access the much larger variety of Unicode characters.

The greatest variety of options is found in the integer type, where the basic int keyword can be combined with the size keywords short, long and long long and the “signedness” keywords signed and unsigned.

The basic int type contains the natural size suggested by the system architecture. On a 64-bit operating system, that is usually a size of 32 bits. In practice, that means such a signed variable can contain values varying between -2,147,483,648 and 2,147,483,647, while an unsigned variable can contain values between 0 and 4,294,967,295.

If you know the range of possible integers is smaller than that, you can use a short int to save memory. Or, if you’re dealing with extremely large integers, you can use an unsigned long long int to write 64-bit numbers as large as 2^64-1 (roughly 18 quintillion).
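JavaScript developers can get a feel for these fixed-width ranges using typed arrays, whose elements have a set size in bytes. The following is a rough sketch, not a like-for-like equivalent of C++ integer types; storeAs is a hypothetical helper written for this example.

```javascript
// Hypothetical helper: store a value in a one-element typed array
// and read back whatever actually fits in that element.
function storeAs(TypedArray, value) {
  const cell = new TypedArray(1);
  cell[0] = value;
  return cell[0];
}

console.log(Int16Array.BYTES_PER_ELEMENT); // 2 bytes per element
console.log(Int32Array.BYTES_PER_ELEMENT); // 4 bytes per element

console.log(storeAs(Int16Array, 32767));       // 32767, the largest signed 16-bit value
console.log(storeAs(Int16Array, 32768));       // -32768, one past the maximum wraps around
console.log(storeAs(Uint32Array, 4294967295)); // 4294967295, the largest unsigned 32-bit value
console.log(storeAs(Uint32Array, 4294967296)); // 0, one past the maximum wraps around

// 64-bit integers need BigInt, because ordinary numbers
// cannot represent every integer beyond 2^53.
console.log(storeAs(BigUint64Array, 2n ** 64n - 1n)); // 18446744073709551615n
```

The smaller the element, the less memory each value takes and the sooner it overflows, which is the same trade-off a C++ developer makes when choosing between short, int and long long.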

Why Memory Matters: A Use-Case About the End of Time

Using a signed 64-bit type such as long long int to count seconds allows computers to measure dates some 292 billion years into the future. This may seem like a needlessly large amount of time, but it actually solves a very practical problem.

By convention, most dates in computing are measured using Unix time, which is dated from midnight on 1 January 1970 UTC and which is accurate to the nearest second. On systems where Unix time is stored as a signed 32-bit number, the largest value that can be recorded is 2,147,483,647. This might seem large, but given we’re recording every single second, two billion actually doesn’t get us very far.

In fact, dates stored this way will reach their maximum value on 19 January 2038 UTC (at exactly 03:14:07). One second later, the value will wrap around to negative 2,147,483,648, appearing as 13 December 1901. This is known as the 2038 Problem, and it has led to many hyperbolic headlines, such as “All computers are going to be wiped out in 2038”, courtesy of the Metro, a tabloid in the UK.
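The wrap-around can be reproduced in JavaScript, since the elements of an Int32Array behave like signed 32-bit integers. This is only an illustrative sketch; unixSecondsToISO is a hypothetical helper, not a built-in.

```javascript
const INT32_MAX = 2147483647;

// Hypothetical helper: convert a Unix timestamp in seconds to an ISO 8601 string.
function unixSecondsToISO(seconds) {
  return new Date(seconds * 1000).toISOString();
}

// A one-element Int32Array stands in for a 32-bit Unix clock.
const clock = new Int32Array([INT32_MAX]);
console.log(unixSecondsToISO(clock[0])); // 2038-01-19T03:14:07.000Z

clock[0] += 1; // one more second overflows the signed 32-bit range
console.log(clock[0]);                   // -2147483648
console.log(unixSecondsToISO(clock[0])); // 1901-12-13T20:45:52.000Z
```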

That sensationalist headline may be far from the truth, but when 2038 comes around the problem may cause issues for 32-bit operating systems and even older versions of entire programming languages. I first encountered the problem using PHP, which, before version 5.2, had no built-in way of recording dates past 2038. (For the record, JavaScript stores dates as 64-bit floating-point numbers of milliseconds, so we JavaScript developers don’t need to worry about this!)
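A quick check shows how far JavaScript’s Date reaches past 2038. This sketch relies on the ECMAScript limit of 8.64e15 milliseconds either side of the epoch; the constant name MAX_DATE_MS is my own label, not a built-in.

```javascript
// One second past the signed 32-bit limit is handled without trouble.
const justAfterLimit = new Date(2147483648 * 1000);
console.log(justAfterLimit.toISOString()); // 2038-01-19T03:14:08.000Z

// MAX_DATE_MS is a hypothetical name for the largest time value a Date accepts.
const MAX_DATE_MS = 8.64e15;
console.log(new Date(MAX_DATE_MS).toISOString()); // +275760-09-13T00:00:00.000Z

// One millisecond more produces an invalid date, whose time value is NaN.
console.log(Number.isNaN(new Date(MAX_DATE_MS + 1).getTime())); // true
```

So the usable range is narrower than a full 64-bit integer count of seconds would allow, but it still extends hundreds of thousands of years beyond 2038.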

The 2038 Problem demonstrates the potential usefulness of managing memory ourselves. Where we require a smaller range of values, we can save memory. And where we require a larger range, we can make sure our system stores an adequate amount.

Memory Management in JavaScript

“JavaScript automatically allocates memory when objects are created and frees it when they are not used anymore (garbage collection). This automaticity is a potential source of confusion: it can give developers the false impression that they don’t need to worry about memory management.” — MDN

JavaScript is known as a “garbage-collected” language. It uses a mark-and-sweep algorithm to check which pieces of memory are still reachable and which are “garbage”. The collector can then free up the “garbage”, returning the unused memory to the operating system.

Garbage collection is a characteristic of higher-level languages, and it helps free up memory that is no longer needed, as far as it is possible to tell without explicit instructions from a developer. For a useful look at garbage collection in JavaScript, check out the javascript.info article on garbage collection and MDN’s page on Memory Management, both listed in the references below.

Garbage collection is a powerful system for automatic memory management, but it’s not foolproof. In particular, so-called “unwanted references” can lead to memory leaks, meaning that a program takes up more memory than is necessary, making it less efficient. However, if we are aware of the risk of memory leaks, we can take steps to remove them.
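To make the idea of an unwanted reference concrete, here is a small sketch. The sessions registry and the openSession and closeSession functions are hypothetical names invented for this example.

```javascript
// A long-lived registry that holds a reference to everything it is given.
const sessions = new Map();

function openSession(id) {
  // The array stands in for a large chunk of data tied to the session.
  const session = { id, buffer: new Array(1000).fill(0) };
  sessions.set(id, session);
  return session;
}

function closeSession(id) {
  // Without this delete, the Map would keep the session reachable,
  // so the garbage collector could never reclaim it.
  return sessions.delete(id);
}

openSession("a");
openSession("b");
console.log(sessions.size); // 2

closeSession("a");
console.log(sessions.size); // 1
```

As long as an object can be reached from something that is still alive, such as this Map, the collector has to assume it is still needed.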

One common reason for memory leaks is the accidental use of global variables. Any time we assign to a variable in JavaScript without first declaring it with var, let or const, it is automatically treated as a global variable (outside strict mode). In a browser, unless foo is already defined, the expression foo = "bar" is equivalent to window.foo = "bar".

A linting tool like ESLint will help you look out for errors like this, but JavaScript’s in-built strict mode also prevents the accidental use of global variables by throwing an error. To activate strict mode, add the directive "use strict"; at the beginning of any script or function where you want to use it. For more ways to remove memory leaks from your code, check out the Auth0 article on memory leaks listed in the references below.
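The difference between the two modes can be seen side by side in the sketch below. It uses the Function constructor only as a convenient way to get non-strict code regardless of how the surrounding file is loaded; all variable and function names are hypothetical.

```javascript
// Non-strict code: a function built with the Function constructor is
// non-strict unless its body opts in, so this assignment creates a global.
const sloppyAssign = new Function("accidentalGlobal = 'bar';");
sloppyAssign();
console.log(globalThis.accidentalGlobal); // bar

// Strict mode: the same kind of assignment throws instead.
function strictAssign() {
  "use strict";
  try {
    anotherAccidentalGlobal = "bar"; // no var, let or const
    return "assigned";
  } catch (error) {
    return error.name;
  }
}

console.log(strictAssign());                            // ReferenceError
console.log(typeof globalThis.anotherAccidentalGlobal); // undefined
```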

Types in JavaScript

There are also ways to specify variable types and create your own types in JavaScript, in a way reminiscent of lower-level languages. The most popular and comprehensive solution is TypeScript, a syntactical superset of JavaScript, which adds the option of static typing to the language.

There are lots of great resources out there on TypeScript. Suffice it to say that it is a great way to catch type errors before your code runs, and it will help us avoid many of the unintuitive results we saw above, in the section on type coercion. TypeScript files use the extension .ts, and there is an equivalent for .jsx files: .tsx. One of the best starting points for beginners is TypeScript in 5 Minutes.
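If you are not ready to switch file extensions, TypeScript’s checker can also read JSDoc annotations in a plain .js file when the file starts with a // @ts-check comment. The sketch below assumes an editor or build step that runs the TypeScript checker; totalCost is a hypothetical function used only for illustration.

```javascript
// @ts-check
// With the comment above, TypeScript-aware tooling checks this file
// against the JSDoc annotations below. At runtime it is ordinary JavaScript.

/**
 * @param {number} price
 * @param {number} quantity
 * @returns {number}
 */
function totalCost(price, quantity) {
  return price * quantity;
}

console.log(totalCost(10, 4)); // 40

// A type checker would flag the next call, because "10" is a string:
// totalCost("10", 4);
```

Without the checker, the commented-out call would run and silently coerce "10" to a number, which is exactly the kind of guess static typing is meant to surface.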

It’s also worth noting that there are type annotation solutions specific to different JavaScript technologies. For example, you can add the official PropTypes node module to your React projects. This enables you to document the intended data types for props passed to a component as well as setting default values. Especially when combined with a linter like ESLint, PropTypes is a powerful addition to any React-based setup.

Conclusion

Overall, I hope this article has helped elucidate some of the differences between lower-level languages like C++ and higher-level languages like JavaScript.

I also hope it has equipped you with the tools to bring some of the benefits of C++ into JavaScript, in the form of TypeScript or PropTypes, and demonstrated that it is possible to influence and improve memory management in JavaScript.

If you’re interested in learning more, I’ve included several links below to the articles and resources I found most useful when writing this article. And if you’ve got a decent understanding of C++ and you want to learn more about the way JavaScript is implemented, the best place to go is probably either the official V8 site or the official Git repo. Happy coding!

References

Types in JavaScript

JavaScript data types and data structures
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data...

Types in C++

Fundamental Types in C++
https://en.cppreference.com/w/cpp/language/types

Memory in JavaScript

JavaScript Garbage Collection
https://javascript.info/garbage-collection
JavaScript Memory Management
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Memo...
4 Types of Memory Leaks in JavaScript and How to G...
https://auth0.com/blog/four-types-of-leaks-in-your-javascrip...

TypeScript

TypeScript in 5 minutes
https://www.typescriptlang.org/docs/handbook/typescript-in-5...

PropTypes

prop-types: Runtime type checking for React props ...
https://www.npmjs.com/package/prop-types

© 2024 Bret Cameron