Why Publishers need an industry-specific CMS

As publishers realize that using a Content Management System (CMS) is not just good organizational practice but increasingly indispensable for staying robust and competitive, a common question arises: what sort of CMS should they acquire? While it might be tempting to simply use free services like Dropbox or Google Drive, I’ve found that there are four reasons why a specialized system designed specifically for publishing makes far more sense.

The virtues of Book Folders

While it is a truism that every book is unique, this doesn’t mean that certain patterns don’t repeat. Recognizing that most books are split into chapters, with different teams working on art, editorial, design, proofreading, and so on, a CMS built for publishing automatically creates a comprehensive folder structure for each book. A sample screenshot from PageMajik is provided as an example:

For a production team where art, editorial, design, and proofreading are handled by different people or at different stages, distinct folders are provided to store their files.

Beyond the convenience of sparing teams from constantly tracking which files are theirs and adopting intricate naming conventions, the folder structure enforces version control throughout by storing each version of every file, along with detailed metadata on each of those files. Users can open any previous version to compare it with newer ones, and if the latest version proves unsatisfactory, a previous one can be restored.

As figure 1 shows, the metadata stored with each file includes who created and updated it, when it was updated, and how many previous versions are kept. This bird’s eye view of all content lets you monitor, search, and retrieve any information required, granting unprecedented control over the publishing process.
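As a rough sketch of how such versioning might work (this is an illustration only, not PageMajik’s actual implementation; the class and field names are invented), every save can append a new immutable version rather than overwriting the file, so the metadata above falls out naturally:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class FileVersion:
    content: bytes
    updated_by: str
    updated_at: datetime

class VersionedFile:
    """Every save appends a new version; nothing is ever overwritten."""

    def __init__(self, name: str, created_by: str):
        self.name = name
        self.created_by = created_by
        self.versions: list[FileVersion] = []

    def save(self, content: bytes, user: str) -> int:
        self.versions.append(
            FileVersion(content, user, datetime.now(timezone.utc)))
        return len(self.versions)  # version numbers start at 1

    def open(self, version: Optional[int] = None) -> bytes:
        """Open the latest version, or any earlier one for comparison."""
        idx = (version or len(self.versions)) - 1
        return self.versions[idx].content

    def revert(self, version: int, user: str) -> int:
        """Reverting re-saves an old version as the new latest one,
        so the full history is still preserved."""
        return self.save(self.versions[version - 1].content, user)

    @property
    def metadata(self) -> dict:
        latest = self.versions[-1]
        return {
            "created_by": self.created_by,
            "updated_by": latest.updated_by,
            "updated_at": latest.updated_at.isoformat(),
            "version_count": len(self.versions),
        }
```

Because `revert` is just another save, a "restore" never destroys information, which is the property that makes comparing and rolling back versions safe.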

You get a Workflow, you get a Workflow, everybody gets a Workflow!

Your CMS doesn’t have to be a cluttered space where everyone has access to every file. You can specify in advance who is allowed to access what, tailoring the system to your particular needs. This doesn’t just keep files safe; it also removes the need to remember onerous instructions about whom to inform or where to send your file when your work on it is done. Pre-set rules ensure that everything required at a given stage is completed, and once all tasks are finished and signed off on, the system automatically triggers the next stage of production and informs everyone with the relevant permissions.

This minimizes errors by not depending on human supervision alone to ensure all the work gets done. It is also simply more convenient for everyone involved: they can focus on their work without dealing with the hassles of the larger process itself.
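The stage-triggering logic described above can be sketched as a small state machine. The stage, task, and user names here are hypothetical, and a real system such as PageMajik would add persistence, richer roles, and file-level permissions:

```python
class Workflow:
    """Toy production workflow: when every task in the current stage is
    signed off, the next stage starts and its team is notified."""

    def __init__(self, stages, permissions, notify):
        self.stages = stages              # {stage: [task, ...]}, in production order
        self.permissions = permissions    # {stage: {user, ...}} allowed per stage
        self.notify = notify              # callback(users, message)
        self.stage_names = list(stages)   # dicts preserve insertion order
        self.current = self.stage_names[0]
        self.done = {s: set() for s in stages}

    def can_access(self, user, stage):
        return user in self.permissions.get(stage, set())

    def complete(self, user, task):
        if not self.can_access(user, self.current):
            raise PermissionError(f"{user} has no access to {self.current}")
        self.done[self.current].add(task)
        # Once every task in the stage is signed off, trigger the next stage
        if self.done[self.current] >= set(self.stages[self.current]):
            i = self.stage_names.index(self.current)
            if i + 1 < len(self.stage_names):
                self.current = self.stage_names[i + 1]
                self.notify(self.permissions[self.current],
                            f"Stage '{self.current}' has started")
```

The point of the sketch is the hand-off: no person has to remember to tell the design team that editorial is finished, because completing the last editorial task does it for them.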



The AI Wars

How Does Publishing Compare to Other Industries?

A report published last year noted that by 2023 the artificial intelligence market will be a $14.2 billion industry, up from $525 million in 2015, with most of the growth taking place in North America. “The reason behind the positive growth of AI markets in this region is the wide-scale adoption of AI technologies in various end-use industries such as manufacturing, media and e-commerce,” the report noted.

But how is AI currently being integrated into our lives? And can publishing learn anything from these other industries?


In online shopping, AI plays a role in recommendations based on previous purchases, in programmatic advertising based on behavior, and in chatbots that answer simple questions during the shopping experience. Today, almost every retail site features these tools, with Amazon and Apple’s iTunes leading development in this field.


In the media space, machine learning is already being employed on both the editorial and advertising sides of operations. In a previous blog post, we noted how The Washington Post used bots to help with their Olympic reporting. In addition, Associated Press partnered with Automated Insights to use AI technology to automate quarterly reports. Content producers from every segment of the media are beginning to use AI software to improve the speed and efficiency of their workflow, the production process, and their ability to organize and categorize content.


As with retail sites, advertisers are exploring a variety of ways to tailor messaging based on reader/user behavior. In addition, as mentioned in this AdWeek article, advertising agencies are using AI to discover new consumer targets and to customize information based on region or interests of individual users. What’s more, McCann Erickson Japan even hired an AI Creative Director to direct commercial design.


For music, film, and TV, today’s users expect curation and personalization. Netflix’s 104 million global users and Spotify’s 140 million turn to those streaming services to be recommended the films, television, and music they will want. AI helps make that possible.


Though technology was blamed for the music industry’s decline a decade ago, AI now seems to be helping bring it back. AI-generated music can reduce time and cost, saving record labels significant money, while also letting musicians who cannot afford a backing band create the music they want with GarageBand and similar programs. According to a Goldman Sachs report, streaming services such as Spotify will generate over $34 billion in music-industry revenue by 2030. As noted in this Forbes article, the user behavior and interest data that streaming services generate can help the music industry better understand the market, decide what types of music and artists to invest in, and judge how quickly to roll out new music.


In 2016, for the film “Morgan,” 20th Century Fox partnered with IBM Research to create the first ever cognitive movie trailer. As noted in IBM’s Think blog, “Traditionally, creating a movie trailer is a labor-intensive, completely manual process. Teams have to sort through hours of footage and manually select each and every potential candidate moment. This process is expensive and time consuming, taking anywhere between 10 and 30 days to complete. From a 90-minute movie, our system provided our filmmaker a total of six minutes of footage. From the moment our system watched ‘Morgan’ for the first time, to the moment our filmmaker finished the final editing, the entire process took about 24 hours.” It is in streamlining such time-consuming processes that AI can be of greatest service across the industry.


For publishing, there are many possible uses for AI, but so far the need has outpaced the development. For example, technology and a more direct connection to readers have given publishers an extraordinary amount of granular information about customers and products in the marketplace. Unfortunately, no human can easily process all of this information and develop ways to use it.

As with retail, recommendations on bookselling sites are probably the most prominent use of AI in the industry at this moment.

For academic publishers, AI can measure a student’s understanding of concepts and tailor a specific framework for that student’s learning.

For the PageMajik product suite, we are using AI to help speed up the workflow from author to the marketplace in order to save the publisher time and money. We hope to eliminate some of the redundant and time-consuming tasks throughout the publishing process by automating significant portions with AI.

Is Technology Fatigue Holding Publishers Back?

If the CEO Roundtable at BookExpo is any indicator, publishers are still focusing on traditional channels in which to reach readers. As Shelf Awareness reported, “[Macmillan CEO John] Sargent agreed that the ‘long-term health of the industry’ was good, but said he thought that in the coming years publishers will face ‘some serious issues’ pertaining to ‘changing consumer buying behaviors.’ As consumers shop more and more online, it will be harder for them to discover books; Sargent argued that what publishers need to protect is ‘lots and lots of shelf space’ in which customers can browse and discover books.”

Music, film, and television have embraced discovery tools, and companies like Spotify, Netflix, and Hulu have helped them reach both tried-and-true and new audiences using AI. Books and readers have yet to embrace that technology: other than subscription models and the Amazon algorithm, the publishing industry has explored few avenues for discovery via AI.

Is this due to a lack of understanding of the changing marketplace? Or an unwillingness to give up on existing channels and modes of discovery? Or is it something to do with how readers discover books?

Traditionally, discovery has been about browsing a bookshop, as Sargent noted; seeing an enticing cover, reading the flyleaf, scanning the first page. Today, that isn’t the speed at which the world works and traffic to bookstores isn’t what it once was. We need new discovery tools and a way to connect to readers where they are—on their computers, smartphones, and tablets.

Discovery isn’t the only place in which publishers continue to follow traditional channels. Back-end systems for workflow and rights management continue to be maintained using older methods. AI can help speed up time-consuming processes and provide better record-keeping, but something else is slowing publishers down: technology fatigue.

For the past 11 years, since the Kindle turned the world on its ear, the centuries-old industry of the printed word has been trying to catch up with the ever-changing consumer. Every year there are new tools, new channels, new ways of consuming content, and new perspectives on the industry. Are publishers simply exhausted by the churn and tempted to revert to old ways?


At April’s Book Industry Study Group annual meeting, Maureen McMahon, president and publisher of Kaplan Publishing and BISG chair, discussed the challenges the book industry faces as technology continues to reshape it. When blockchain came up, she joked, “I’m not ready to think about it.”

And yet, even though some of these existing sales channels, discovery tools, and systems still work, publishers could do better by embracing tools that make jobs simpler and connect them to readers more directly.

Our customers who have taken a chance on our product suite have seen a 40% increase in efficiency in the publishing process. Buying back that time in the day, freeing up staff to work on other projects, and speeding books and journals to market to meet growing demand can help a publisher increase revenue dramatically. So, while the ever-changing technological landscape can sometimes be daunting and exhausting, it is worth the struggle for publishers to embrace these changes, adapt, and take control of their own future.

The Mona Lisa and Machines

A psychological theory for why we don’t take AI as seriously as we should

The artistic machines are coming. Artificial intelligence is already starting to upend deep assumptions about the indispensability of human input in areas as diverse as journalism, archaeology, writing, and even musical composition. Although a lot of this technology is still in its infancy, there doesn’t seem to be any real limitation in principle to the extent to which machines could take over in these domains, at least in the long run.

This awareness of our possible looming obsolescence should be a source of anxiety, but to be honest I just don’t feel it. At a visceral level, I still have a persistent gut-feeling that the richness of human art and creativity simply cannot be replicated by non-human machines, and this is unshaken by the accumulating evidence suggesting otherwise. A theory by Yale psychologist Paul Bloom explains why.

In a 2005 piece for the Atlantic, Bloom summarizes a fascinating theory of two distinct ways humans categorize objects in the world:

A distinction between the physical and the psychological is fundamental to human thought. Purely physical things, such as rocks and trees, are subject to the pitiless laws of Newton. Throw a rock, and it will fly through space on a certain path; if you put a branch on the ground, it will not disappear, scamper away, or fly into space. Psychological things, such as people, possess minds, intentions, beliefs, goals, and desires. They move unexpectedly, according to volition and whim; they can chase or run away.

From this difference arises two distinct domains of objects—the physical and the social—with their own interior logic and expectations. While both these domains are descriptions of the same world, they operate in non-overlapping ways:

We perceive the world of objects as essentially separate from the world of minds. This separateness of these two mechanisms, one for understanding the physical world and one for understanding the social world, gives rise to a duality of experience. We experience the world of material things as separate from the world of goals and desires.

While “physical” and “social” might be distinguished easily enough conceptually, in the real world the same object can have both a physical aspect and a social aspect. Consider, for example, the Mona Lisa. Of course, a big part of what makes this so valuable is how it looks—the way the light blends, the use of perspective, the enigmatic smile. But notice that these physical aspects (after all, just a specific placement of pigments) are replicable given the technology today. A 3D printer can probably generate a fake so similar to the original that even experts would be unable to tell the difference. But even if such a fake were produced, the value of the original would be undiminished and the fake would not suddenly be valued in the millions.

This indicates that a necessary part of what makes the Mona Lisa so valuable are the social aspects of the original painting—its particular history, including the fact that it was painted by Leonardo da Vinci in the 16th century using certain experimental techniques. Machine-made art lacks social aspects since we don’t impute intentions or goals to their makers, and these social aspects are necessary to make sense of why art in general is held to be valuable at all. 

So while it is amusing to consider the abstract possibility that a monkey hitting a typewriter for an infinite amount of time would almost surely type out the entire corpus of Shakespeare, for all intents and purposes the social aspects of human art, the fact that it was made by a particular human being with particular intentions, goals, and purposes, remain essential to how we identify and value art. For now the social aspects are considered necessary for art, but it isn’t implausible to think that this might change.

For instance, if you can’t tell human-made artifacts from machine-made ones, social origins would simply matter less in any marketplace where the merits of the physical aspects are an independent measure of value. After all, given time, AI might even start composing music that surpasses music with human creators. At that point, the dominance of the social aspects in gatekeeping what counts as art will slowly wither away, as more and more people realize that their hang-up over origins is keeping them from superior art.

This isn’t to say that machine-made artifacts would necessarily be embraced rapidly or by everyone, but it has to be conceded that the distinction between the physical and social we currently rely on tacitly in privileging human-made art, and the consequent dismissal of the possibility of machines making inroads into the human world of creativity, is far shakier than we might think.

To come back to where we started: I still have a visceral sense that the richness of human art and creativity simply cannot be replicated by non-human machines, a feeling that seems hard-wired into our brains. But I’ve come to realize that this feeling shouldn’t be counted on.

The Ripple Effects of Blockchain Investment

This week, US-based cryptocurrency start-up Ripple announced the launch of its University Blockchain Research Initiative (UBRI), committing over $50m to 17 global universities to support and accelerate education and technical development around blockchain.

This significant move by one of the most widely talked about crypto firms will see Ripple form close collaborations with institutions and offer technical resources and expertise, in addition to funding. Projects being undertaken by universities as part of the programme include research by Princeton University into the global policy impact of blockchain and a blockchain research program being built at the University of Luxembourg. Other prestigious schools involved in the UBRI include the University of North Carolina, MIT, and the University of Pennsylvania.

While the UBRI is likely to focus predominantly on digital payments in the financial sector, Ripple’s main business interest, the initiative is just one example of many buoyant investment drives taking place all over the globe, which aim to nurture a new generation of blockchainers and help blockchain realize its massive potential.

Worldwide bragging rights

Although most of the major blockchain success stories to date undoubtedly come from Silicon Valley and some of the main tech hubs in Europe, China is also becoming increasingly active and ambitious at encouraging blockchain innovation. In April, a new Blockchain Industrial Park opened in Hangzhou, home of Alibaba, which is designed to act as an incubation centre for blockchain start-ups.

Concurrently, a fund of $1.6bn was made available to help support some of the country’s most promising blockchain projects. The Xiong’An Global Blockchain Innovation Fund is partially funded by the city government and will be managed by Li Xiaolai, a renowned blockchain investor and bitcoin tycoon.

On an international level, governments are extremely eager to be in the blockchain game, and venture capitalists are fully aware of the benefits which come with investing in this phenomenal growth industry. There is a lot of money being made available and a huge appetite to drive blockchain innovation across multiple industries, across multiple usage scenarios.

Plugging the skills gap

But there is a problem. As was the case with all the major tech booms of yesteryear (the Cloud being the most frequently cited predecessor), demand and hunger for innovation far outweigh supply.

A report by freelance employment website Upwork stated that blockchain technologists have become one of the most sought-after, hottest commodities on the job market, second only to those working in robotics. Meanwhile LinkedIn reported that last year there were 4,500 job openings posted containing the term “blockchain” in their description, a threefold increase on the previous year. But, unfortunately, many of these positions will not be filled. A TechCrunch article from earlier this year claimed that there are now 14 job openings for every single blockchain developer/engineer.

The demand for technical expertise in developing blockchain-based technologies is through the roof, yet in reality very few technology professionals possess the skills or knowledge required to satisfy this growth in demand.

If we want to address this global shortfall, the Ripple approach may be the best way forward. Investing in grassroots-level education and training so that the next wave of graduates becomes blockchain-savvy is a sure-fire way of bringing blockchain supply closer to blockchain demand. And whether you work in law, accountancy, or sales, in industries as broad as healthcare, government, and publishing, the impact of these long-term investments will be felt for years to come as blockchain gathers pace and becomes an integral part of everyday life.

Archaeology and…machine learning?

Studying 2,600-year-old artifacts with algorithmic techniques

With the increase in use of machine learning and artificial intelligence in every domain, it is now commonplace to find reports about how humans are likely to become increasingly otiose in the coming world. A more clear-eyed analysis of how technology is being used, however, reveals that these pronouncements are still very much premature, and that an alternate (and more plausible) outcome is one where technology doesn’t replace but supplements human labour in complex ways.

An excellent example of this kind of work is documented in a 2016 Proceedings of the National Academy of Sciences paper by a team from Tel Aviv University. A central question in biblical scholarship concerns when exactly the various parts of the Bible were written, which is made particularly complex because we have so little background information about life 2,500 years ago. Some traction was gained through the innovative use of machine learning algorithms to estimate the level of literacy in the community, giving us an idea of whether its people would have been capable of producing a work of enormous complexity like the Bible. The project considered 16 inscriptions found in the area of the desert fort of Arad.

Each of these was an ostracon, a piece of broken pottery used to write on, as in the figure above. Notice how it is chipped, meaning that typically only brief excerpts survive. In addition, the writing can fade over time, making it difficult to read, let alone to compare different pieces. That’s where the tech comes in.

After restoring the script as much as possible, the researchers used machine learning software to identify individual characters and then compare the same letter across different ostraca on a range of metrics: overall shape, the angles between strokes, the character’s center of gravity, and its horizontal and vertical projections. Allowing for some natural variability in handwriting, the programme identified distinct authors wherever letters exceeded a threshold of difference. Through this method, the authors concluded that at least six people wrote the artifacts they had.
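A toy version of that comparison can be sketched as follows, assuming each glyph is available as a binary ink/no-ink image. The features and threshold here are illustrative inventions, not the paper’s actual metrics or values:

```python
import numpy as np

def glyph_features(img: np.ndarray) -> np.ndarray:
    """img: 2D array, 1 = ink. Returns a small feature vector:
    normalised centre of gravity plus horizontal and vertical
    ink projections, resampled to a fixed length."""
    ys, xs = np.nonzero(img)
    h, w = img.shape
    cog = np.array([ys.mean() / h, xs.mean() / w])

    horiz = img.sum(axis=1).astype(float)  # ink per row
    vert = img.sum(axis=0).astype(float)   # ink per column

    def resample(v, n=8):
        # Normalise to an ink distribution, then fix the length
        v = v / (v.sum() or 1.0)
        return np.interp(np.linspace(0, len(v) - 1, n),
                         np.arange(len(v)), v)

    return np.concatenate([cog, resample(horiz), resample(vert)])

def same_hand(a: np.ndarray, b: np.ndarray, threshold: float = 0.15) -> bool:
    """Treat two instances of the same letter as one writer's work
    unless their feature distance exceeds the (arbitrary) threshold."""
    return float(np.linalg.norm(glyph_features(a) - glyph_features(b))) <= threshold
```

Two identical strokes fall well under any sensible threshold, while a vertical and a horizontal stroke differ sharply in both centre of gravity and projections; clustering many such pairwise decisions is what lets a system lower-bound the number of distinct writers.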

This was clearly a case of machine learning performing tasks that humans cannot even dream of doing with their naked eye, and someone who wanted to push the narrative of a coming apocalypse of job losses for human beings can treat this research as confirming their world view. But a closer examination of the variety of methods indicates a slightly more complex story.

Although the programme did identify at least six authors, this fact by itself says very little about the extent of literacy in the population; after all, it could be that only six people in the area were literate, or that many more were. To make progress on this question, human researchers analyzed the results and constructed a model of the hierarchical relationships between the authors and the intended recipients of each message:

Since there appeared to be people from every sociopolitical strata represented, the authors concluded that it was likely literacy was widespread among the inhabitants of the area in the kingdom of Judah near Fort Arad in 600 BCE.

For the wider audience, the lesson from this study is that we shouldn’t be too certain that machine learning and AI will mean the end of jobs, since older ways of working can be modified to incorporate technology while still relying substantially on human minds and hands. Rather than prophesying about the effects of the coming machine learning revolution in general terms, we should engage in nuanced studies and projections of individual fields and sub-fields.

The future is neither completely opaque nor transparent, and what we can glean about it is almost definitely going to be fragmentary, tentative, and context-dependent, instead of a single grand narrative.


Preview of Society for Scholarly Publishing and BookExpo

Next week in the US, two big annual events will take place—The Society for Scholarly Publishing Annual Meeting in Chicago and BookExpo in New York.

The focus of the Society for Scholarly Publishing’s Annual Meeting is “Scholarly Publishing at the Crossroads: What’s working, what’s holding us back, where do we go from here?” and, as they celebrate the organization’s 40th anniversary, the meeting will focus on past and future practices, technology, establishing and reaching new markets, and how publishers keep up with the changing needs of researchers and academics as both authors and users.

This year’s BookExpo is “Reimagined,” according to parent company Reed Exhibitions. BookExpo will become the “first end-to-end business solution for the global publishing industry,” with attendees experiencing “how content creation, rights trading, retail strategy and consumer behavior will increase profit and give you the tools to succeed in today’s shifting marketplace.”

The two events highlight how far scholarly and STM publishing have come in embracing technology in the workflow and in addressing user needs as they exist today, whereas trade publishing continues to focus on print vs. digital, metadata, and predominantly on adapting existing systems rather than creating something entirely new.

Below are our highlights from both events’ programs: a selection of sessions that can help publishers improve their business structure.

SSP Annual Meeting

Wednesday, May 30th

8:30–11:30 Pre-Meeting Seminar: Humans, AI, and Decision Making: How Do We Make Use of Data, Text Mining and Machine Learning for Better Decision Making

AI represents a suite of technologies that are already supporting and assisting human decision-making in a whole host of settings. In this seminar, we’ll discuss some of the ways in which publishers and institutions are using big data, semantics and analytics to make smarter strategic decisions.

Thursday, May 31st

10:30–12:00 pm Artificial Intelligence: How Will Publishers Benefit from Artificial Intelligence?

Smart publishers are beginning to embrace AI and are weaving it into the core of their business—to source new content, to inform and improve content and for new product development. Publishers are also using AI to reduce costs in their editorial processes.

3:30–4:30 pm Strange Bedfellows: Integrating Editorial and Sales to Maximize Success

The scholarly communications landscape is increasing in complexity. Publishers can no longer afford to allow departments to operate in silos. Sales colleagues at a publishing house need to understand the goals and objectives of their Editorial colleagues—and vice versa—in order to make the most of market conditions and partner effectively.

Friday, June 1st

11:00–12:30pm New Tools and Trends in Discovery Technologies

With over 2.5 million scholarly articles published each year—more than 8,000 each day—the glut of available scholarly content poses challenges to researchers, authors, publishers, and libraries. For authors and publishers, getting their work discovered and read, and ultimately cited, can be a career-defining challenge. Libraries compete with the open web by providing enhanced discovery services which they hope will be valued by their users. No single solution has emerged to satisfy all of these needs.


BookExpo

Thursday, May 31st

9:45am Leadership Round Table: Publishers on Publishing

This roundtable will feature CEOs from top publishing houses, including Markus Dohle, CEO of Penguin Random House; Carolyn Reidy, President and CEO of Simon & Schuster; and John Sargent, CEO of Macmillan in a powerhouse presentation that will surely be a highlight of BookExpo. Together, these leaders will reflect on industry trends, market highlights, and the power and responsibilities of publishers as global, corporate citizens. Maria A. Pallante, Association of American Publishers President and CEO, will moderate.

11:00am The Content Liberation Movement

Even well into the digital age, publishers have persisted in maintaining processes that confine their businesses to a specific format (usually, the book) and to a single business model. Forward-thinking editors today demand freedom to reuse and repurpose content in innovative, high value ways, especially on mobile devices. Content management systems, though, aren’t fast enough at identifying assets and don’t go far enough when assembling new products.

1:00 pm The State of the Publishing Industry Today

Join Jonathan Stolper, the President of NPD Books, as he breaks down the latest outlook for the US book market. Drawing on data from NPD’s BookScan, PubTrack, and Books & Consumer platforms, this presentation will deliver essential insights into the latest trends from book publishing’s most authoritative source of industry information, including:
 • A recap of key industry performance in 2017/2018
 • The significant trends in content and platform
 • The outlook for digital versus print in the next few years
 • The opportunities (and risks) for publishers and retailers in 2018 and beyond

Friday, June 1st

12:00 pm Keywords: Enhance Discoverability and Increase Sales on Amazon

Hear from technology experts and publishers about how they are using the latest machine learning and AI tools to increase discoverability, drive sales, and make more effective marketing decisions.



Stalking the Muse with Kanye West

A technological response to the question of the origins of creativity

Human beings have always had a close affinity to art. Our humanoid ancestors etched shells hundreds of thousands of years ago, and we have continued to make and celebrate artistic achievement in an unbroken line since then. But this importance placed on art inevitably raises a question—where does creativity come from?

In Plato’s Ion, Socrates faces the rhapsode Ion, a performer of epic poetry, and argues that while his talents were indeed impressive, they were not the application of any skill. Rather, it was divine inspiration coursing through his mind:

Many are the noble words in which poets speak concerning the actions of men; but like yourself when speaking about Homer, they do not speak of them by any rules of art: they are simply inspired to utter that to which the Muse impels them…for not by art does the poet sing, but by power divine. The poets are only the interpreters of the Gods by whom they are severally possessed.

We might reject this as quaint, but what it gets right is that creativity is not generated by an insular process cut off from others and the past, but through the interaction of the artist with something outside of the artist. What Plato attributed wholly to the work of the Gods, however, we now partially attribute to prior art itself.

As a culture, we recognize that creative work is often part inspiration and part adaptation, with artists drawing on earlier work that influenced their novels, plays, or films. For example, when the smash-hit musical “Hamilton” first appeared on the scene, multiple mainstream sources, including Slate, Vulture, The Guardian, and The New York Times, traced the influences that inspired Lin-Manuel Miranda to create such a groundbreaking work.

Some artists very clearly outline their influences, such as beloved children’s book writer and illustrator Maurice Sendak, who made no secret of his antecedents and sources and instead wore them on his sleeve. While reading a biography of William Blake, he stated with rare honesty:

I read Blake because I want to schlep something from him that I can eat raw, have…Why am I clinging to every word Blake says in this book? I’m trying to suck all his strength out.

And it wasn’t just Blake he was drawing from. It was his standard modus operandi, a part of his creative process:

The muse does not come pay visits, so you go out stalking, hoping that something will catch you. Where do I steal from?

While these remarks might suggest that Sendak was simply borrowing other people’s ideas, the real story is far more complicated. Sendak’s “stealing” was not mere appropriation, but a transmutation of prior work into something unseen. We can note the influences, but no one who has read Where the Wild Things Are or In the Night Kitchen can deny that these are Sendak originals, unquestionably terrific works of art.

Sendak shows that even if we draw heavily on past works for inspiration, our art can be wholly our own and new.

Or to put a modern spin on it:

This idea of inspiration sparked PageMajik’s newest idea: an AI engine that analyzes scenes and points out similar contexts and ideas in the works of great authors. For example, a writer working on a dramatic scene involving a dysfunctional family might be shown brief excerpts from Long Day’s Journey into Night or August: Osage County.

Why would this be useful for publishers?

Plagiarism and the inadvertent reuse of material have become serious concerns in recent years, for self-published authors and bestselling writers alike. The engine can check new submissions to make sure they don’t match previously published work.

Why would this be useful to writers?

By enabling Sendak-style inspiration, it can give authors an opportunity to boost their creative ideas by highlighting excerpts from similar works that might help them figure out a plot point or interpret a scene in a new and interesting way.

By making overt some of these influences, this system can ensure that what’s being written really does vary from earlier texts and isn’t just an accidental copy.
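To make this concrete, here is a minimal sketch of how such an engine might score a draft scene against a corpus of earlier works, using simple bag-of-words cosine similarity. A production system would use far richer representations; the corpus, passages, and function names below are invented purely for illustration.

```python
# Sketch: scoring a draft passage against a corpus of earlier works.
# Representations and data are hypothetical illustrations.
import math
import re
from collections import Counter

def vectorize(text):
    """Lowercase bag-of-words counts for a passage."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_similar(draft, corpus):
    """Return (title, score) of the corpus passage closest to the draft."""
    draft_vec = vectorize(draft)
    scored = [(title, cosine_similarity(draft_vec, vectorize(text)))
              for title, text in corpus.items()]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical mini-corpus of scene summaries.
corpus = {
    "family drama": "a bitter family gathers and old wounds reopen",
    "sea adventure": "a ship battles the storm far from any harbor",
}
```

A high score against a known work could either surface a useful excerpt for the writer or flag a passage for a plagiarism check, depending on how the result is used.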

As someone who firmly believes we can’t know how good a tech idea is until multiple people use it independently over a decent period of time, I can’t wait to see how this works out.

Journalism in the Age of AI

How technology is upending how we produce and consume the news

An Olympic Achievement

The Washington Post’s coverage of the 2018 Winter Games in PyeongChang was somewhat unusual. To someone glancing through the paper’s social media feed, the articles and updates might not have looked particularly different. But that was precisely what was unusual. For, you see, much of the coverage was not composed by a human reporter.

The Washington Post Olympics Bot (@WPOlyBot) generated constant updates on Twitter during the Games, letting viewers stay on top of the latest developments. The updates included announcements about events that were beginning soon:

A line about the winners of events and any specific achievements:

And even a periodic calculation of the cumulative medals won by various countries:

These updates are based on data from sports data companies, ensuring total coverage without burdening human journalists or depending on the speed and accuracy of manual reporting. While this certainly eased the burden on the Post’s journalists, it was not meant to be a total replacement for them. Rather, it was intended to “free up Post reporters and editors to add analysis, color from the scene and real insight to stories in ways only they can.”

While the benefits of the Olympics Bot are very real, there are also easily spotted limitations. Its data was taken from other sites, which meant it was still dependent on human activity at some point in the chain. Moreover, as you scroll through the Twitter feed, you notice that the tweets themselves are somewhat plain, favoring clarity over style. Human journalists, then, are still quite essential to journalism.

A Dowsing Rod for Information

Although it has to be conceded that AI cannot simply replace human journalists, we can still ask whether it can help us cope with the avalanche of online content produced every day. One interesting proposal comes from a recent paper by Google’s Yinfei Yang and UPenn’s Ani Nenkova, who propose testing for “content density”.

According to the authors, “content density” is a measure of how much information there actually is in a given piece of writing. It offers a way of separating serious, informative articles from mere fluff, ensuring that readers can focus their finite time and energy as effectively as possible on actual content.

To get a sense of the difference between informative and non-informative content, consider the two examples they provide to illustrate this distinction: the first passage below is information-dense news reporting, while the second, though entertaining, carries far less content.


The European Union’s chief trade negotiator, Peter Mandelson, urged the United States on Monday to reduce subsidies to its farmers and to address unsolved issues on the trade in services to avert a breakdown in global trade talks.

Ahead of a meeting with President Bush on Tuesday, Mr. Mandelson said the latest round of trade talks, begun in Doha, Qatar, in 2001, are at a crucial stage. He warned of a “serious potential breakdown” if rapid progress is not made in the coming months.


“ART consists of limitation,” G. K. Chesterton said. “The most beautiful part of every picture is the frame.” Well put, although the buyer of the latest multimillion-dollar Picasso may not agree.

But there are pictures—whether sketches on paper or oils on canvas—that may look like nothing but scratch marks or listless piles of paint when you bring them home from the auction house or dealer. But with the addition of the perfect frame, these works of art may glow or gleam or rustle or whatever their makers intended them to do.

Assuming that journalistic conventions will remain more or less the same, the authors trained a machine learning classifier to differentiate between informative and non-informative text using lexical features (e.g., words and their average age of acquisition, imagery, and concreteness) and syntactic features (e.g., the flow between sentences in terms of discourse relations and entity mentions).

The classifier then categorized a test set of articles from different domains with 67–75% accuracy. Admittedly this is not quite 100%, and the assumptions made about steady journalistic conventions mean it cannot be applied broadly just yet. Still, by showing that a model can select for content density better than chance, Yinfei Yang and Ani Nenkova open the possibility of cutting down on time lost wading through the ubiquitous fluff we seem to be awash in.
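For a sense of what a lexical-feature classifier involves, here is a toy sketch, far simpler than the authors’ actual model. The features, weights, and threshold below are hypothetical, chosen only to make the example run, not drawn from the paper.

```python
# Toy illustration of scoring text for "content density" with a few
# simple lexical features. Weights and threshold are invented.
import re

def lexical_features(text):
    """Extract a few crude lexical signals from a passage."""
    words = re.findall(r"[A-Za-z']+", text)
    n = len(words) or 1
    return {
        "avg_word_len": sum(len(w) for w in words) / n,   # longer words
        "numeral_density": len(re.findall(r"\d", text)) / n,  # dates, figures
        "capitalized_ratio": sum(w[0].isupper() for w in words) / n,  # names
    }

# Hypothetical weights: dense news tends to carry numbers and named entities.
WEIGHTS = {"avg_word_len": 0.4, "numeral_density": 3.0, "capitalized_ratio": 2.0}
THRESHOLD = 2.5  # hypothetical decision boundary

def is_informative(text):
    feats = lexical_features(text)
    score = sum(WEIGHTS[k] * feats[k] for k in WEIGHTS)
    return score >= THRESHOLD
```

A real classifier would learn its weights from labeled data and add the syntactic features described above; the point here is only the overall shape of the approach.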

As impressive as content density is as a measure of news-worthiness, a problem it cannot address is political bias. After all, there is no dearth of sites which produce article after article stuffed to the brim with deeply partisan content, so just being able to detect content density is not going to be enough.

Knowhere Else to Go

A startup that tries to deal with precisely this is Knowhere News, which boasts of offering “the world’s most unbiased news”.

The way it works is quite straightforward: the site’s AI engine identifies whatever topic is popular at a given time, scours multiple articles on that topic, and then generates an unbiased version of the news. Since no original reporting is required, the actual writing can take as little as 60 seconds!

To work around the fact that not all news sources are equally reliable, humans pre-set trustworthiness scores so that reliable sources are weighted over fringe views.
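In spirit, the weighting might look something like the sketch below, where pre-set trust scores decide which of several conflicting claims survives into the aggregate story. The sources, scores, and claims are all invented for illustration; Knowhere’s actual system is proprietary.

```python
# Sketch: resolving conflicting claims by summing pre-set trust scores.
# Sources, scores, and claims are hypothetical.
TRUST = {"wire_service": 0.9, "national_paper": 0.8, "fringe_blog": 0.2}

def weighted_consensus(reports):
    """reports: list of (source, claim) pairs. Returns the claim with the
    highest total trust weight across the sources asserting it."""
    totals = {}
    for source, claim in reports:
        # Unknown sources get a low default weight rather than zero.
        totals[claim] = totals.get(claim, 0.0) + TRUST.get(source, 0.1)
    return max(totals, key=totals.get)

reports = [
    ("wire_service", "records missing"),
    ("national_paper", "records missing"),
    ("fringe_blog", "deliberate cover-up"),
]
```

Under this scheme, a claim repeated by several trusted outlets easily outweighs a single fringe assertion, which is roughly the behavior the pre-set scores are meant to produce.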

For political stories, Knowhere News even produces two further articles, one slanted left and one slanted right, alongside the impartial version. For example, the headlines of a recent topic were:

Impartial: Whistleblower on Trump lawyer finances says records are missing

Left: Whistleblower on Trump lawyer finances fears cover-up

Right: Person who leaked Cohen’s financial information questioned

This example really emphasizes how important such a tool can be in our era of hyper-partisan politics. But its limitations are also clear—for one, it too depends on human journalists to produce the mass of articles it works from.

More importantly, an assumption animating this project is that the “impartial” view from the center is the most appropriate one to take. While this might very well be true in some cases, there is a risk of legitimating extremist views if we are always willing to meet in the middle. For example, if one political party started moving towards fascism while the other remained moderate, the impartial view generated by Knowhere News would be a moderated version of fascist claims rather than a repudiation of them. It is important to recognize that while moderation and conciliation are valuable ideals, they can be taken too far.

A Future of Robot Journalism?

Admittedly, the examples examined here don’t exactly mean pink slips for journalists just yet. AI still relies on machine learning algorithms that rapidly sift through existing data to provide updates and create bias-free versions. And these algorithms still need human hands to create the material they draw on.

But let’s not get complacent about AI just yet—there are still many paths through which AI can make inroads into original journalism. As more academics and cultural influencers become active on social media, it is conceivable that an AI system might directly engage them through journalistic activity. Interviews conducted over email don’t necessarily need a human asking the questions, putting a new spin on the Turing test. And with the network of cameras and audio devices like phones present in every community, there might even be enough raw data for AI field reporting some day.

Granted, the technology required for these advances doesn’t look close to materializing just yet. But given the speed at which technology has been overturning entrenched assumptions, it would be hubris to be too confident about its limitations.

GDPR—how publishers can navigate the choppy waters

If you live in Europe, the odds are that in recent months your inbox has been inundated with emails from pretty much every company you’ve ever had dealings with, asking you whether you’d like to continue to hear from them, or “opt in”. From monthly newsletters to special offers, curated content to advertisements, we as consumers have become accustomed to having our data harvested by companies who then target us with tailored and untailored marketing messages to promote their products and services.

You may have unwittingly forgotten to untick a box when you purchased flights five years ago and have been receiving weekly emails from the airline ever since. Or perhaps, in order to log into a café’s Wi-Fi once upon a time, you were subsequently asked to subscribe to direct mail from them in exchange for a silky-smooth internet connection. And now, finally, you are being given the chance to right all those wrongs and do away with all those unwanted or unsolicited emails once and for all. You may ask why this is happening and why you are being given the golden opportunity to finally cleanse your life of spam. The answer is GDPR.

What is GDPR?

During the course of the last year, citizens of Europe have been collectively rolling their eyes every time they hear any mention of something called the General Data Protection Regulation, or GDPR, as it has become more affectionately known. Coming into effect next week, the new regulation in EU law addresses data protection and privacy for all individuals in the EU, aiming to give consumers control over their personal data and to simplify the regulatory environment for international business around the continent.

In essence, this means that businesses that directly contact consumers can no longer do so without renewed affirmative consent and recorded approval from the individual before any further data is collected. In addition, consumers can demand that any data held on them be accessed, amended or completely deleted whenever they like. Any failure to comply with the new regulations could result in hefty fines and crippling penalties for businesses from the Information Commissioner’s Office (ICO).

Why should publishers care?

While consumers click “unsubscribe” and “opt out” en masse, what are the key implications for businesses? And more specifically, how are publishers likely to be affected by the new legal framework?

First and foremost, and perhaps inevitably, any company that collects consumer data and then uses it to communicate directly with them will see the impact of its direct marketing efforts weakened dramatically. While most publishers operate as B2B entities working through retailers, many have conducted, and still do conduct, direct-to-consumer (B2C) marketing and sales activity. Some publishers, particularly those with recognisable and strong consumer-friendly brand identities, have had great success at building networks and communities around their content and marketing directly to book buyers. And it is these publishers who will need to be most wary of GDPR as it comes into play.

Another thing to consider is that GDPR extends beyond a company’s proprietary systems. If a publisher is using a third-party ecommerce system, for example, it is automatically considered an extension of their own customer database. Therefore, it is the responsibility of the publisher to ensure that those system providers, which harvest customer data on their behalf, are also taking measures to be fully compliant with GDPR.

Surviving the data minefield

While the ICO has sought to reaffirm on several occasions that GDPR should not be a cause for panic, the office has also stated that inaction is not an option either. Legal experts in the industry are suggesting that the first step publishers should take is to conduct a data audit to recognise what kind of personal data they hold, where it came from and with whom it has been shared, and to make efforts to track and document the relationship the company has with each individual.

If the publisher would like to continue engaging with consumers as it has done previously it will need to establish an opt-in/opt-out consent mechanism for both new and existing customers, and carefully record every communication it instigates with these individuals as well as any data collected on them in the future. Steps should also be taken to update privacy policies and notices on company websites and other relevant legal documentation.

Finally, as it will become more challenging for publishers to proactively engage with consumers through direct channels, it is highly likely that they will have to instead put more focus on search and discoverability. To this end, ensuring that metadata is accurate and that a publisher’s content is ubiquitous across every possible channel will never be more important.

There are many ways in which GDPR will likely impact the way publishers go about their day-to-day business, and a week ahead of deadline day it’s still not too late for companies to get more informed, seek legal advice and start taking the necessary steps to become more compliant.