Investigating Final Draft's XML document format with Ruby

So apparently there is no open source screenplay format. I was poking around and the closest I came was this and this, which ultimately led me to this. At the time of writing, the Open Screenplay Format (OSF) apparently doesn’t exist anymore.

The paranoid-conspiratorial side of me suspects that the nefarious folk at Final Draft are behind the OSF’s disappearance. In retaliation for their imagined meddling in the affairs of their competitors, I decided to straight-up jack their FDX (Final Draft-flavoured XML) file format and make it better.

First, I obtained a script.

Since I’d already been poking around a bit, I knew about fountain.io. They’ve got some sort of Markdown-flavoured screenplay-writing utility (which is awesome). It just so happens that they have a copy of Big Fish in FDX format. Perfect.

This is kind of what FDX looks like:

<FinalDraft DocumentType="Script" Template="No" Version="1">
<Content>
<Paragraph Type="General">
<Text>This is a Southern story, full of lies and fabrications, but truer for their inclusion.</Text>
</Paragraph>
<Paragraph Type="Transition">
<Text>FADE IN</Text>
</Paragraph>
<Paragraph Type="Scene Heading">
<Text>A RIVER.</Text>
</Paragraph>
<Paragraph Type="Action">
<Text>We’re underwater, watching a fat catfish swim along.</Text>
</Paragraph>
<Paragraph Type="Action">
<Text>This is The Beast.</Text>
</Paragraph>
<Paragraph Type="Character">
<Text>EDWARD (V.O.)</Text>
</Paragraph>
<Paragraph Type="Dialogue">
<Text>There are some fish that cannot be caught. It’s not that they’re faster or stronger than other fish. They’re just touched by something extra. Call it luck. Call it grace. One such fish was The Beast.</Text>
</Paragraph>
<!-- And so forth... -->

Upon a cursory inspection, I quickly concluded that FDX is primarily concerned with the visual format of the exported screenplay (obviously). I, on the other hand, am only concerned with visual format insofar as it provides me clues as to how to import typeset screenplays and shoehorn them into a new format… something less XML and more JSON, perhaps.

Though I generally think XML is a pain to work with, it is well-suited for the kind of typesetting and document structuring that is the purview of Final Draft. Less so for a DevOps-driven movie studio that wants the ever-changing script to automatically orchestrate its own production. I’m not sure what the final Open Screenplay Format (I’m stealing the name now) will look like at this early stage of the game, but I do know that FDX will provide a good starting point.
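To make the "less XML and more JSON" idea concrete, here is a minimal sketch that flattens FDX paragraphs into plain type/text pairs and prints them as JSON. It is only an illustration, not a proposal for the final format. It uses REXML rather than Nokogiri so that it runs without extra gems, the inline sample is cut down from the Big Fish excerpt above, and the `paragraphs_to_hashes` helper and the `type`/`text` key names are made up for this example. It also assumes that the script body lives in the top-level `Content` element and that a paragraph's `Text` children can simply be joined.

```ruby
require 'rexml/document'
require 'json'

# A tiny inline FDX fragment so the sketch runs without the Big Fish file.
FDX_SAMPLE = <<~XML
  <FinalDraft DocumentType="Script" Template="No" Version="1">
    <Content>
      <Paragraph Type="Scene Heading">
        <Text>A RIVER.</Text>
      </Paragraph>
      <Paragraph Type="Action">
        <Text>This is The Beast.</Text>
      </Paragraph>
    </Content>
  </FinalDraft>
XML

# Hypothetical helper: reduce each Paragraph to a {type, text} hash,
# ignoring every typesetting attribute.
def paragraphs_to_hashes(xml)
  doc = REXML::Document.new(xml)
  # Assumption: the screenplay body is the top-level Content element.
  REXML::XPath.match(doc, '/FinalDraft/Content/Paragraph').map do |para|
    # Assumption: a paragraph's Text children can be concatenated as-is.
    text = REXML::XPath.match(para, 'Text').map { |t| t.text.to_s }.join
    { 'type' => para.attributes['Type'], 'text' => text }
  end
end

puts JSON.pretty_generate(paragraphs_to_hashes(FDX_SAMPLE))
```

Everything about fonts, indents, and margins is dropped on purpose here. Whether any of it needs to survive in a new format is still an open question.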

As such, I need some basic information about FDX, namely:

  • The elements it is composed of
  • The attributes of each of those elements
  • The values each of those attributes takes in a real script

It didn’t take long to realize that reading the FDX screenplay and cataloging this information by hand is dumb, so I whipped up this groovy little ad hoc Ruby script:

require 'nokogiri'
require 'pp'

# Parse the FDX file; the block form of File.open closes the file for us
doc = File.open('Big Fish.fdx') { |f| Nokogiri::XML(f) }

# schema maps element name => attribute name => list of distinct values seen
schema = {}
doc.xpath('//*').each do |e|
  # Add a new element, if necessary
  schema[e.name] ||= {}
  # Walk the element's attributes
  e.attributes.each_value do |attr|
    values = (schema[e.name][attr.name] ||= [])
    # Record each attribute value only once
    values << attr.value unless values.include?(attr.value)
  end
end

pp schema

All that produced this (simplified):

{"FinalDraft"=>
{"DocumentType"=>["Script"], "Template"=>["No"], "Version"=>["1"]},
"Content"=>{},
"Paragraph"=>
{"Type"=>
["General",
"Transition",
"Scene Heading",
"Action",
"Character",
"Dialogue",
"Parenthetical"],
"Alignment"=>["Center", "Right"],
"FirstIndent"=>["0.00"],
"Leading"=>["Regular"],
"LeftIndent"=>["1.25"],
"RightIndent"=>["-1.25"],
"SpaceBefore"=>["0"],
"Spacing"=>["1"],
"StartsNewPage"=>["No"]},
"Text"=>
{"AdornmentStyle"=>["0"],
"Background"=>["#FFFFFFFFFFFF"],
"Color"=>["#000000000000"],
"Font"=>["Courier Final Draft"],
"RevisionID"=>["0"],
"Size"=>["12"],
"Style"=>[""]},
"TitlePage"=>{},
"HeaderAndFooter"=>
{"FooterFirstPage"=>["No"],
"FooterVisible"=>["No"],
"HeaderFirstPage"=>["No"],
"HeaderVisible"=>["Yes"],
"StartingPage"=>["1"]},
"Header"=>{},
"DynamicLabel"=>{"Type"=>["Page #"]},
"Footer"=>{},
"PageLayout"=>
{"BackgroundColor"=>["#FFFFFFFFFFFF"],
"BottomMargin"=>["72"],
"BreakDialogueAndActionAtSentences"=>["Yes"],
"DocumentLeading"=>["Normal"],
"FooterMargin"=>["36"],
"ForegroundColor"=>["#000000000000"],
"HeaderMargin"=>["36"],
"InvisiblesColor"=>["#A0A0A0A0A0A0"],
"TopMargin"=>["72"],
"UsesSmartQuotes"=>["No"]},
"AutoCastList"=>
{"AddParentheses"=>["Yes"],
"AutomaticallyGenerate"=>["No"],
"CastListElement"=>["Cast List"]},
"ElementSettings"=>
{"Type"=>
["General",
"Scene Heading",
"Action",
"Character",
"Parenthetical",
"Dialogue",
"Transition",
"Shot",
"Cast List",
"New Act"]},
"FontSpec"=>
{"AdornmentStyle"=>["0"],
"Background"=>["#FFFFFFFFFFFF"],
"Color"=>["#000000000000"],
"Font"=>["Courier Final Draft"],
"RevisionID"=>["0"],
"Size"=>["12"],
"Style"=>["", "AllCaps", "Underline+AllCaps"]},
"ParagraphSpec"=>
{"Alignment"=>["Left", "Right", "Center"],
"FirstIndent"=>["0.00", "-0.10"],
"Leading"=>["Regular"],
"LeftIndent"=>["1.50", "3.50", "3.00", "2.50", "5.50"],
"RightIndent"=>["7.50", "7.25", "5.50", "6.00", "7.10"],
"SpaceBefore"=>["0", "24", "12", "120"],
"Spacing"=>["1"],
"StartsNewPage"=>["No", "Yes"]},
"Behavior"=>
{"PaginateAs"=>
["General",
"Scene Heading",
"Action",
"Character",
"Parenthetical",
"Dialogue",
"Transition"],
"ReturnKey"=>["General", "Action", "Dialogue", "Scene Heading"],
"Shortcut"=>["0", "1", "2", "3", "4", "5", "6", "7", "8", ""]}}

This enabled me to identify the most important script elements, which, upon inspection, are the Paragraph elements, distinguished from one another by their Type attribute. That is, a movie (Big Fish at the very least) is composed of the following:

  • General
  • Transition
  • Scene Heading
  • Action
  • Character
  • Dialogue
  • Parenthetical

These are the components of a screenplay that direct the action on screen. The rest appear mostly concerned with typesetting the document exported from Final Draft.
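A quick way to see how a script breaks down into those components is to count paragraphs by their Type. The following is a rough sketch under a few assumptions: it uses REXML instead of Nokogiri so it runs on a stock Ruby install, the inline sample is trimmed from the Big Fish excerpt above, and `paragraph_type_counts` is a hypothetical helper name. It only looks at the top-level `Content` element, on the assumption that this is where the screenplay body lives.

```ruby
require 'rexml/document'

# Trimmed-down excerpt standing in for the full Big Fish file.
FDX_EXCERPT = <<~XML
  <FinalDraft DocumentType="Script" Template="No" Version="1">
    <Content>
      <Paragraph Type="Scene Heading"><Text>A RIVER.</Text></Paragraph>
      <Paragraph Type="Action"><Text>We're underwater.</Text></Paragraph>
      <Paragraph Type="Action"><Text>This is The Beast.</Text></Paragraph>
      <Paragraph Type="Character"><Text>EDWARD (V.O.)</Text></Paragraph>
      <Paragraph Type="Dialogue"><Text>There are some fish...</Text></Paragraph>
    </Content>
  </FinalDraft>
XML

# Hypothetical helper: count how often each Paragraph Type occurs
# in the top-level Content element.
def paragraph_type_counts(xml)
  doc = REXML::Document.new(xml)
  paragraphs = REXML::XPath.match(doc, '/FinalDraft/Content/Paragraph')
  paragraphs.each_with_object(Hash.new(0)) do |para, counts|
    counts[para.attributes['Type']] += 1
  end
end

paragraph_type_counts(FDX_EXCERPT).each do |type, count|
  puts "#{type}: #{count}"
end
```

Pointing the same helper at the contents of the real file (for example via `File.read`) should give a rough picture of how much of a script is dialogue versus action.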

From here I will investigate the best way to structure the new and improved OSF. Stay tuned…