Kepler
Running workflows from the command line
Running workflows from the command line is simple: include all the classes used by the workflow in the CLASSPATH environment variable (or pass them to the java command), then call ptolemy.actor.gui.MoMLSimpleApplication with the workflow XML file as input.
For example:
$ export CLASSPATH=$PTII/build/ptolemy.jar:$KEPLER/build/classes/:$KEPLER/lib/jar/jargon_v1.4.17.jar
$ java ptolemy.actor.gui.MoMLSimpleApplication ~/dart/kepler/workflows/tristan/QA-and-SRB.xml
As a side note, some of the actors don't work without the GUI, such as the display actor, which writes text to a display window. So far I've only used such actors for debugging, and if the need arises an actor can easily be written to write data to the console using System.out.print().
Another thing to look into is how to handle exceptions so that workflows can recover correctly from non-fatal exceptions (I'll have to build a few use cases to test this some time; add comments if you have any ideas). I've got a feeling that a workflow running as a service will hit cases where an exception is thrown and terminates the workflow unnecessarily.
Quality assurance and SRB
New SRB actors have been written to work with my workflow, and I've had success running files through a quality assurance test and writing only the files that pass the test to SRB.
SPut and Sinit (previously SRBConnect) have been written and work together.
I also wrote a modified BooleanMultiplexor actor, because the one that comes with Ptolemy only works if there are tokens on all the ports, whereas I needed it to still fire when there was only a token on the port that would be passed through.
Bugs: need to put in idiot tests (make sure all exceptions are caught and all required inputs are available).
There is also the possibility that my code has fatal flaws that will cripple the actors in the future. I will address that when it comes up.
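The firing rule of the modified multiplexor can be modelled in plain Java, outside Ptolemy. This is only a rough sketch of the behaviour described above, not the actor's actual code: the class name LenientMultiplexor, the queue fields standing in for ports, and the choice to leave the unselected port untouched are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

/** Plain-Java model of a multiplexor that fires on partial input (hypothetical). */
final class LenientMultiplexor<T> {
    // Queues stand in for actor input ports; this is not the Ptolemy port API.
    final Queue<Boolean> select = new ArrayDeque<>();
    final Queue<T> trueInput = new ArrayDeque<>();
    final Queue<T> falseInput = new ArrayDeque<>();

    /**
     * Fires if a select token and a token on the selected port are available.
     * Assumption: the port that was not selected is left untouched.
     */
    Optional<T> fire() {
        Boolean sel = select.peek();
        if (sel == null) {
            return Optional.empty();        // nothing to select with yet
        }
        Queue<T> chosen = sel ? trueInput : falseInput;
        if (chosen.isEmpty()) {
            return Optional.empty();        // selected port has no token yet
        }
        select.remove();                    // consume the select token
        return Optional.of(chosen.remove()); // pass the selected token through
    }
}
```

With a true select token and a token only on trueInput, fire() returns that token even though falseInput is empty.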
Rewriting SRB actors
It seems the SRB actors already implemented in Kepler have been written to work in their own specific situations. Unfortunately they spit the dummy when I try to use them as-is in my workflows. So I'm rewriting the SRB actors I need, which will hopefully just be the Sput actor, but the SRBConnect actor may need a rewrite as well (depending on how the two play together).
Temporary solution to file polling problem
I've put in a temporary fix for the problem where files would be picked up and passed to the processing layers while they were still being copied to the filesystem (or still being created, etc.).
The Directory Poller now checks that at least 5 seconds have passed since the file was last modified. This should work for most cases (except for file copies over high-latency networks and similar situations where a file might not be modified for more than 5 seconds even though it is still in use).
I don't like this solution: it makes data processing take 5 seconds longer than it needs to, and for the reason mentioned above it's not a general solution. But it's the best I can do for now.
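The 5-second check can be sketched roughly as follows. This is a minimal illustration, not the actual Directory Poller code: the class name QuietPeriodFilter and its method names are assumptions, and the current time is passed in as a parameter only to make the logic easy to test.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: selects files that have not been modified recently. */
final class QuietPeriodFilter {
    /** Minimum time since last modification before a file is treated as complete. */
    static final long QUIET_MILLIS = 5_000;

    /** True if at least QUIET_MILLIS have passed since the file was last modified. */
    static boolean isSettled(long lastModifiedMillis, long nowMillis) {
        return nowMillis - lastModifiedMillis >= QUIET_MILLIS;
    }

    /** Returns the regular files in dir that look finished at time nowMillis. */
    static List<File> settledFiles(File dir, long nowMillis) {
        List<File> ready = new ArrayList<>();
        File[] entries = dir.listFiles();   // null if dir is not a readable directory
        if (entries == null) {
            return ready;
        }
        for (File f : entries) {
            if (f.isFile() && isSettled(f.lastModified(), nowMillis)) {
                ready.add(f);
            }
        }
        return ready;
    }
}
```

A poller would call settledFiles(dir, System.currentTimeMillis()) on each pass and hand on only the files it returns.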
Another solution would be to copy the data into the polling directory with a special extension that the poller recognises and ignores, and to remove the extension once the data has finished being copied. The problem is that this has to be implemented at the source's end, so it only works if the source can be modified to conform to this protocol.
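A rough sketch of that protocol, for illustration only. The ".part" suffix, the class name PartFileProtocol, and the method names are all assumptions; nothing like this exists in the workflow yet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical write-then-rename protocol between a data source and the poller. */
final class PartFileProtocol {
    /** Assumed marker for files that are still being written. */
    static final String PART_SUFFIX = ".part";

    /** Poller side: skip anything that still carries the in-progress suffix. */
    static boolean isReadyForPickup(Path file) {
        return !file.getFileName().toString().endsWith(PART_SUFFIX);
    }

    /** Source side: write under a temporary name, then rename once complete. */
    static Path deliver(Path pollDir, String finalName, byte[] data) throws IOException {
        Path temp = pollDir.resolve(finalName + PART_SUFFIX);
        Path done = pollDir.resolve(finalName);
        Files.write(temp, data);
        // Rename within the same directory so the poller never sees a half-written file.
        return Files.move(temp, done, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The rename stays inside one directory because an atomic move is generally only available within a single filesystem.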
Bug fixes
The problem involved:
* synchronize section, which prevented another file from being processed until the Quality Assurance actor was finished with the previous file.
* binary file reader, which didn't send any data to the QA actor if the file was empty, thus
* QA actor never had any input to act upon, so it never sent out an exitCode, thus
* synchronize section would not let another iteration continue
Solution:
Added an exitCode output port to the binary file reader (BFR) actor, which sends out a code if there is no data to be sent.
Added a zero data notification input port to the QA actor, so that the BFR actor can let it know when there is no data. The QA actor can then send out the appropriate exit code and allow the next iteration to fire.
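The idea behind the fix is that the QA step must always emit an exit code, even for an empty file. A plain-Java model of that handshake, under loud assumptions: this is not Ptolemy actor code, the exit code values are made up (the real ones are not given here), and an empty Optional stands in for the zero data notification.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Plain-Java model of the reader/QA handshake; not Ptolemy actor code. */
final class EmptyFileHandshake {
    // Hypothetical exit codes; the real actors' values are not given in the text.
    static final int PASS = 0;
    static final int FAIL = 1;
    static final int NO_DATA = 2;

    /** Reader side: an empty Optional plays the role of the zero data notification. */
    static Optional<byte[]> read(byte[] fileContents) {
        return fileContents.length == 0 ? Optional.empty() : Optional.of(fileContents);
    }

    /** QA side: always produces an exit code, even when no data arrived. */
    static int qualityAssurance(Optional<byte[]> data, Predicate<byte[]> check) {
        if (!data.isPresent()) {
            return NO_DATA;   // lets the synchronize section release the next iteration
        }
        return check.test(data.get()) ? PASS : FAIL;
    }
}
```

Because every call returns a code, whatever waits on the exit code is never left blocked by an empty file.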
Problem with polling for new files
There is a problem I've encountered while running a few tests with my Directory Polling actor.
If a file is still being written when the directory is checked for new files (for example, a large data file being copied to the data directory), the incomplete data file is flagged as a new file and passed on to the other actors.
This causes a big problem, since the file then being written to SRB etc. may not be complete.
A solution may be to check each file every second until the lastModified value has remained the same for a number of iterations. However, I don't like this solution, as it adds the constraint that each file takes at least n seconds to process, although this wouldn't be a problem for large data sets that aren't produced rapidly.
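A minimal sketch of that stability check, assuming the poller calls it once per pass (for example once a second) with each file's current lastModified value. The class name StabilityTracker and its method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker: a file is stable once lastModified stops changing. */
final class StabilityTracker {
    private final int requiredStableChecks;
    // Per file: element 0 is the last seen timestamp, element 1 the unchanged count.
    private final Map<String, long[]> state = new HashMap<>();

    StabilityTracker(int requiredStableChecks) {
        if (requiredStableChecks < 1) {
            throw new IllegalArgumentException("requiredStableChecks must be >= 1");
        }
        this.requiredStableChecks = requiredStableChecks;
    }

    /**
     * Records one observation of a file. Returns true once the timestamp has
     * been seen unchanged for requiredStableChecks consecutive observations.
     */
    boolean observe(String path, long lastModified) {
        long[] s = state.get(path);
        if (s == null || s[0] != lastModified) {
            // First sighting, or the file changed: restart the count.
            state.put(path, new long[] {lastModified, 0});
            return false;
        }
        s[1]++;
        return s[1] >= requiredStableChecks;
    }
}
```

With one check per second and requiredStableChecks set to n, a file is released no sooner than n seconds after its last write, which is the constraint mentioned above.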
Quality assurance with Kepler, proof of concept
Here's the basis of my Quality Assurance test with Kepler:
Built a program that creates random data files containing letters of the alphabet.
Quality Assurance is to make sure the data files don't contain any characters which are not part of the alphabet.
Directory Poller: watches a directory and outputs a list of any newly arriving files.
Binary File Reader: Kepler's current binary file reading actor did not play nice with numerous iterations of the workflow, so I've written one that does.
Quality Assurance usecase: performs the quality assurance tasks on the binary data stream from the Binary File Reader actor.
It now works at a "proof of concept" level.
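The test itself is simple enough to sketch in a few lines. This is an illustration rather than the actor's code: it assumes "letters of the alphabet" means ASCII A-Z and a-z, and the class and method names are made up.

```java
import java.util.Random;

/** Hypothetical stand-alone version of the alphabet-only quality check. */
final class AlphabetQa {
    /**
     * True if every byte is an ASCII letter (assumption: A-Z or a-z).
     * Note that empty input passes trivially in this sketch.
     */
    static boolean passes(byte[] data) {
        for (byte b : data) {
            boolean upper = b >= 'A' && b <= 'Z';
            boolean lower = b >= 'a' && b <= 'z';
            if (!upper && !lower) {
                return false;
            }
        }
        return true;
    }

    /** Generates random lowercase letters, i.e. data that should pass. */
    static byte[] randomLetters(int length, Random rng) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = (byte) ('a' + rng.nextInt(26));
        }
        return out;
    }
}
```

To produce a failing file for testing, overwrite any byte of the generated data with a digit or punctuation character.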
A few bugs to work out:
* Directory Poller sometimes outputs a file twice.
* The workflow sometimes gets stuck somewhere (only noticed when passing files that fail the quality assurance test).
Next step:
Write files that pass the QA test to SRB.
UPDATE:
The bug where a file sometimes shows up twice happens when the function that checks for new/modified files runs twice while the file is being created or written to.
The getting-stuck bug is caused by the file being read containing no data. Not sure yet why things get stuck; it may have something to do with the QA actor.
Kepler and Eclipse
http://kepler-project.org/Wiki.jsp?page=UsingEclipseForKeplerDevelopment
Errors and workarounds for Setting up Ptolemy II:
- Project ptII is missing required library: 'ptolemy/domains/ptinyos/lib/nesc-dump.jar'
  - Properties for ptII -> Java Build Path -> Source -> ptII -> Excluded -> Edit:
    - Exclusion patterns -> Add: ptolemy/domains/ptinyos/
  - Properties for ptII -> Java Build Path -> Libraries:
    - Remove nesc-dump.jar
- ThalesGraphFrame.java:107: The method _createGraphPane(NamedObj) in the type ActorGraphFrame is not applicable for the arguments ()
  - Properties for ptII -> Java Build Path -> Source -> ptII -> Excluded -> Edit:
    - Exclusion patterns -> Add: jni/ThalesGraphFrame.java
Errors and workarounds for Setting up Kepler:
- ptII/lib/ptolemy.jar and ptII/lib/ptolemy-doc.jar don't exist
  - figure out how to make Eclipse compile jars (I'm too lazy)
  - use ant to build jars
    - Ant build ptolemy using Eclipse ant plugin fails:
      - Window -> Preferences -> Ant -> Properties:
        - Add Property...:
          - Name: env.KEPLER
          - Value: /home/tristan/dart/kepler
        - Add Property...:
          - Name: env.PTII
          - Value: /home/tristan/dart/ptII
    - Add ptII/lib/ptolemy*.jar to kepler -> Properties -> Java Build Path -> Libraries
- figure out how to make Eclipse compile jars (i'm too lazy)
- DataCacheViewer.java and UDF_split.java have compile errors (however, ant builds them fine)
DataCacheViewer.java:114: The method getItemAt(int) is undefined for the type DataCacheManager
DataCacheViewer.java:158: The method getItems() is undefined for the type DataCacheManager
DataCacheViewer.java:210: The method getItems() is undefined for the type DataCacheManager
DataCacheViewer.java:220: The method getData() is undefined for the type DataCacheObject
DataCacheViewer.java:222: The method getData() is undefined for the type DataCacheObject
DataCacheViewer.java:231: The method getData() is undefined for the type DataCacheObject
DataCacheViewer.java:318: The method getSize() is undefined for the type DataCacheManager
DataCacheViewer.java:321: The method getSize() is undefined for the type DataCacheManager
DataCacheViewer.java:344: The method saveCache() is undefined for the type DataCacheManager
DataCacheViewer.java:626: The method getSize() is undefined for the type DataCacheManager
DataCacheViewer.java:631: The method removeItem(DataCacheObject) in the type DataCacheManager is not applicable for the arguments (int)
DataCacheViewer.java:637: The method removeItems(int[]) is undefined for the type DataCacheManager
DataCacheViewer.java:643: The method getSize() is undefined for the type DataCacheManager
DataCacheViewer.java:644: The method clear() is undefined for the type DataCacheManager
DataCacheViewer.java:650: The method getItemAt(int) is undefined for the type DataCacheManager
DataCacheViewer.java:690: The method getSize() is undefined for the type DataCacheManager
DataCacheViewer.java:716: The method getAttrs() is undefined for the type DataCacheObject
DataCacheViewer.java:738: The method getItemAt(int) is undefined for the type DataCacheManager
DataCacheViewer.java:755: The method getLocalFileName() is undefined for the type DataCacheObject
DataCacheViewer.java:757: The method getSerialClassName() is undefined for the type DataCacheObject
DataCacheViewer.java:758: The method getSize() is undefined for the type DataCacheManager
DataCacheViewer.java:874: The method saveCache() is undefined for the type DataCacheManager
UDF_split.java:1: The declared package does not match the expected package org.kepler.scia
- dead in the water
For now, it is simpler to use Eclipse just to edit source files and use ant to compile. It works fine that way.
How to get data into Kepler
One main problem is how data comes from the data sources.
This entry makes a few assumptions on how data may come from the data sources, and the possible solutions are wrapped around these assumptions.
Possible solutions (with no info on how data comes from data sources):
- Read from a file on the local filesystem.
  Have a directory in which raw data files are placed, and have Kepler read the files in from there.
  Problems: How does Kepler know there is new data to be read?
  Possible solutions/problems:
  - Constantly poll the directory and check whether new files are available. This would be very inefficient.
  - Have a trigger execute the Kepler workflow when a new file is available. Is this even possible?
    - Apparently this is possible in Linux. TASK: figure out how!
- Read from a file in SRB.
  Same as the previous method, except the raw data is stored in a special directory in SRB.
- Have data incoming from a network socket.
  This avoids the problem of how to know new data is available, as the sockets can be set to wait for new data.
  Problems:
  - Network listening actors would have to be written. This shouldn't really be a problem.
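On the trigger idea for local files: Linux offers file change notification through inotify, and on Java 7 or later the standard java.nio.file.WatchService can be used to wait for new files instead of polling (in OpenJDK on Linux it is typically backed by inotify). The sketch below is only an illustration of that approach; the class name NewFileWatcher and its methods are made up, and nothing here is wired into Kepler.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper that waits for files to appear in one directory. */
final class NewFileWatcher implements AutoCloseable {
    private final Path dir;
    private final WatchService watcher;

    NewFileWatcher(Path dir) throws IOException {
        this.dir = dir;
        this.watcher = dir.getFileSystem().newWatchService();
        // Only creation events are requested in this sketch.
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
    }

    /** Blocks for up to timeoutMillis; returns created paths, or an empty list. */
    List<Path> awaitNewFiles(long timeoutMillis) throws InterruptedException {
        List<Path> created = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return created;                 // timed out, nothing new
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                created.add(dir.resolve((Path) event.context()));
            }
        }
        key.reset();                        // re-arm the key for further events
        return created;
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

A creation event fires when the file first appears, not when it has finished being written, so a completeness check on the file would still be needed before processing it.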